LLM Agents (also called Agent AI Workflows) are the next major evolution of LLMs (Large Language Models), enabling an LLM to answer more complex questions, handle your private data better, and reduce the cost of using LLMs.
LLM Agents will also make the current technique of RAG (Retrieval Augmented Generation) obsolete.
LLMs like ChatGPT can already do impressive feats on public data. However, try to use an LLM with your own data and you run into a problem:
How do you provide the LLM with your own data?
Providing your own data to the LLM
If your data is small then you can just include it in the prompt. However, most real-world data runs to gigabytes, terabytes or petabytes, while the maximum number of tokens (roughly, words) that GPT-4 can accept is 8k.
So there is no way we can fit all our data in the prompt to an LLM.
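If you want to check whether a given piece of data even fits, you can count its tokens before building the prompt. Here is a minimal sketch using the tiktoken library (the file name is just an illustration):

```python
import tiktoken

# Tokenizer used by GPT-4 family models
encoding = tiktoken.encoding_for_model("gpt-4")

# Hypothetical example: count the tokens in one patient record
with open("patient_record.txt") as f:
    record = f.read()

num_tokens = len(encoding.encode(record))
print(f"{num_tokens} tokens (GPT-4 accepts at most ~8k)")
```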
Retrieval Augmented Generation (RAG)
The current technique to solve this is called Retrieval Augmented Generation or RAG.
Here’s how it works. The user asks a question. The software finds content related to the question and passes that content to the LLM to answer the question.
How does the software decide which content is relevant to the current question?
The most common technique is to find content containing words that relate to the words in the question. This is done with a vector similarity search between the embedding of the question and the embedding of each piece of content.
This technique works well if the question relates to words in the content. For example, a question like “Does this patient have diabetes?” will work well since the patient’s record will have a mention of “diabetes”.
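To make the retrieval step concrete, here is a minimal sketch of vector similarity search, assuming an OpenAI embedding model and an API key in the environment (any embedding model would work, and the patient record snippets are made up):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def embed(text: str) -> np.ndarray:
    # Any embedding model works; text-embedding-3-small is just one choice
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(response.data[0].embedding)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical content pieces (e.g. chunks of a patient's record)
documents = [
    "Patient has a history of type 2 diabetes, managed with metformin.",
    "Patient presented with a fractured wrist after a fall.",
]
doc_embeddings = [embed(d) for d in documents]

question = "Does this patient have diabetes?"
question_embedding = embed(question)

# Retrieve the most similar chunk and pass it to the LLM with the question
scores = [cosine_similarity(question_embedding, e) for e in doc_embeddings]
best_chunk = documents[int(np.argmax(scores))]
prompt = f"Context:\n{best_chunk}\n\nQuestion: {question}"
```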
Where This Breaks Down
However, take a question like “find me patients who are teenagers”. No patient record is likely to have the word “teenager” (or related words) in it, so RAG will generally fail for questions like this.
How do we solve this?
Well, the LLM can actually figure out what data it needs. So instead of sending data to the LLM, could we invert the process and provide the LLM with some “tools” it can use to fetch the data it needs?
Take an example where we tell the LLM that it has three “tools” available (sketched in code after this list):
- A tool that finds patients based on demographics (name, birth date, gender, address, etc.)
- A tool that finds whether a patient has certain phrases in his/her clinical record
- A tool that finds whether a patient meets a quality measure
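Here is a minimal sketch of what those three tools could look like as plain Python functions. The function names and the data layer behind them are hypothetical; substitute your own database or FHIR queries:

```python
from datetime import date
from typing import Optional

def find_patients_by_demographics(name: Optional[str] = None,
                                  birth_date_from: Optional[date] = None,
                                  birth_date_to: Optional[date] = None,
                                  address: Optional[str] = None) -> list[dict]:
    """Tool #1: find patients matching demographic criteria."""
    ...  # hypothetical: query your patient database here

def patient_record_contains(patient_id: str, phrase: str) -> bool:
    """Tool #2: check whether a phrase appears in a patient's clinical record."""
    ...  # hypothetical: full-text search over the patient's clinical notes

def patient_meets_quality_measure(patient_id: str, measure: str) -> bool:
    """Tool #3: check whether a patient meets a quality measure (e.g. a HEDIS measure)."""
    ...  # hypothetical: call your quality-measure engine here
```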
Now we just send the user’s question to the LLM and let the LLM figure out which tool to call when.
If the LLM gets a question like “find me patients who are teenagers” then it will call tool #1, passing in a birth date between 1/1/2005 and 1/1/2011. Notice that LLMs are smart enough to figure out how to convert “teenagers” into a query over birth dates.
If the LLM gets a question like “Does this patient have diabetes?” then it will call tool #2 passing in “diabetes”.
If the LLM gets a question like “find me patients over 67 years who are not meeting the HEDIS Measure A” then it will call tool #1 to find patients who are over 67, then take that list of patients and pass it to tool #3 to find which of them are not meeting the measure.
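If we traced the agent’s plan for that last question by hand, the chain of tool calls would look roughly like this (reusing the hypothetical tool functions sketched above; the age cutoff calculation is simplified):

```python
from datetime import date, timedelta

# Step 1: tool #1 - find patients over 67 (simplified cutoff, ignoring leap years)
cutoff = date.today() - timedelta(days=67 * 365)
older_patients = find_patients_by_demographics(birth_date_to=cutoff)

# Step 2: tool #3 - the output of step 1 becomes the input of step 2
not_meeting_measure = [
    p for p in older_patients
    if not patient_meets_quality_measure(p["patient_id"], "HEDIS Measure A")
]
```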
Multiple Steps Instead Of One Step
Notice how the LLM Agent can go through multiple steps until it has the answer. This is more representative of the real world where we go through multiple steps using different “tools” until we find the answer.
Typically we get some piece of data from one source (tool), then use that data to interrogate another data source, and so on until we have the answer.
It is rare that a single data source holds the answer to a complex question, and the query we send to the second data source depends on the answer we got from the first.
So if we follow multi-step workflows in our own process today, it stands to reason that LLMs need to follow multiple steps also to answer complex questions.
How can I use LLM Agents Today?
Support for LLM Agents is available in the popular LLM APIs today.
You can either use the native APIs of each LLM or use a framework like LangChain that allows you to use any LLM.
The concept is the same. You provide a list of “tools” to the API that the LLM can call.
Here is the metadata you provide to the LLM about each tool:
- How to call the tool. The simplest option is to provide a Python function that the LLM can call. There are also built-in tools that search the web and do other tasks.
- Instructions for what the tool does. The LLM uses this to understand when to use the tool.
- For example: “This tool finds a patient when provided either a name, a birth date or an address.”
- A schema for input. The LLM will convert the input into this schema so your tool can handle it.
- For example: {"query": {"name": "string", "birth_date": "date", "address": "string"}}
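As an illustration, here is what that metadata for tool #1 could look like in OpenAI’s function-calling format (the tool name and schema fields are hypothetical):

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "find_patients_by_demographics",
            "description": "This tool finds a patient when provided "
                           "either a name, a birth date or an address.",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "birth_date": {"type": "string", "description": "ISO date, e.g. 2005-01-01"},
                    "address": {"type": "string"},
                },
            },
        },
    }
]
```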
If you want to use the native API of OpenAI: https://platform.openai.com/docs/assistants/overview?context=with-streaming
If you want to use LangChain framework: https://python.langchain.com/docs/modules/agents/quick_start
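Putting it together, here is a minimal sketch of an agent loop using the OpenAI chat completions API. This is one option among several; it assumes the hypothetical tool functions and the tools list from the sketches above, plus an API key in the environment:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# Map tool names to the hypothetical Python functions defined earlier
TOOL_FUNCTIONS = {
    "find_patients_by_demographics": find_patients_by_demographics,
    "patient_record_contains": patient_record_contains,
    "patient_meets_quality_measure": patient_meets_quality_measure,
}

def run_agent(question: str, tools: list[dict]) -> str:
    messages = [{"role": "user", "content": question}]
    while True:
        response = client.chat.completions.create(
            model="gpt-4", messages=messages, tools=tools
        )
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content  # the LLM has its final answer
        messages.append(message)
        # Execute each tool the LLM asked for and feed the result back
        for call in message.tool_calls:
            args = json.loads(call.function.arguments)
            result = TOOL_FUNCTIONS[call.function.name](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result, default=str),
            })

answer = run_agent("Find me patients who are teenagers", tools)
```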
Summary
Currently, LLMs suffer from the problem that it is hard to give them access to the large amounts of private data that companies hold.
The current technique of using RAG to solve this works for some questions, but fails for questions where the data does not contain the same (or related) words as the words in the question.
A question like “find me all the patients who are teenagers” is hard to solve using current techniques.
Similarly, a question that involves accessing multiple data sources, such as “find me patients over 67 years who are not meeting the HEDIS Measure A”, is also hard to handle today.
LLM Agents (Agent AI Workflows) enable us to take better advantage of the intelligence in LLMs by letting the LLM figure out how to use the tools we give it.
This reduces the amount of data we send to LLMs, cutting costs, limiting business risk and speeding up responses.
In addition, by using multi-step workflows, LLMs can answer more complex questions that involve referencing multiple data sources.
LLM Agent Workflows are also great for reducing hallucinations, because the LLM can rephrase the question multiple ways, evaluate its responses and cross-check the answers against the underlying data. More on that in a future article…