Towards real-world use of large language models
The problem with LLMs
In 2022, a relatively unknown company called OpenAI took the world by storm by introducing ChatGPT. Unlike its predecessors, which offered statistical answers to predefined questions, this machine learning model could actually understand the user and give natural, accurate responses to almost anything it was asked. Pundits predicted that large language models (LLMs) such as GPT would reshape every aspect of society in short order, completing every task people used to do, faster and more accurately. But years down the line, where are these models today? As it turns out, not far from where they were at release. They are faster and more accurate now, but their core competence has not progressed far beyond the chatbot that started it all.
Most use cases for LLMs revolve around building a chatbot and priming it, either through fine-tuning or prompting, with appropriate domain “knowledge” (in truth, properly classified training sets of data from that domain), then letting it do what it does best: answer questions. This has not been without its own problems, as the models are notoriously unreliable when it comes to facts and happily make things up, as long as the output sort of resembles the answer the LLM thinks it should be giving. This phenomenon was dubbed “hallucination” [1] and has been one of the reasons for the slow real-world adoption of the technology.
The other significant obstacle is the sheer difficulty of integrating these systems with existing processes. The answer an LLM gives is fundamentally unstable: it is the result of a stochastic prediction algorithm that feeds it natural-sounding words judged to have a high chance of appearing next in the answer sequence. Simply put, it is a very complex autocomplete. That explains why LLMs make great chatbots and related solutions (knowledge bases, summarizers, feedback analyzers, …) but struggle to replace real business processes and pipelines.
Prompt smarter, not harder
One approach to resolving these obstacles to real-world use relies on reducing the LLM's surface area to discretely defined tasks. Attempting to handle the entire problem with a single prompt generally leads to a result that is nearly impossible to process further: natural text doesn't conform to structured patterns, and neither will an LLM's answers.
We reduce this issue by splitting the problem into smaller sub-problems, confining every step into as small an answer space as we can make it. Instead of allowing the model the freedom to interpret its own context for answering, we provide the boundaries and force it to adapt its outputs to a structured form of our choice. This output can then be fed forward into the next step of the process, which will solve the next small sub-problem, bringing us closer to the end result. The starting point for this kind of sequence of steps is often some form of text classification aimed at understanding the input. When we receive unstructured text input, it has to be classified into possible actions the application can take or questions it can answer.
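As a rough sketch of what this first classification step can look like, the snippet below constrains the model to a fixed set of labels and rejects anything outside it. The prompt wording, the action names, and the call_llm helper are illustrative assumptions, not Tessa's actual implementation:

```python
# Illustrative sketch only: call_llm stands in for whatever chat-completion
# client you use; the action names are hypothetical.
ACTIONS = {"add_requirement", "edit_requirement", "remove_requirement", "ask_question"}

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call (e.g. a chat-completion endpoint)."""
    raise NotImplementedError

def classify_request(user_text: str) -> str:
    prompt = (
        "Classify the user's request into exactly one of these actions: "
        + ", ".join(sorted(ACTIONS))
        + ".\nAnswer with the action name only.\n\n"
        + f"Request: {user_text}"
    )
    label = call_llm(prompt).strip().lower()
    if label not in ACTIONS:
        # The answer space is deliberately tiny, so anything else is a failure
        # we can detect and retry or route to a fallback.
        raise ValueError(f"Unexpected classification: {label!r}")
    return label
```

Keeping the answer space this small is what makes the output usable by the next step: the rest of the pipeline only ever has to handle a handful of known labels.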
For example, in Tessa, we need to understand which actions the user is asking us to perform against the current specification. This is a very hard problem for a conventional application, but it is an area where LLMs excel. Their grasp of natural language is best used to transform language into structure, or structure into language, one transform at a time.
First, we ask the model to transform the inputs into calls to specific actions in a computer-friendly format, usually something like JSON or CSV. This sort of format is not generally natural to an LLM, and it is where fine-tuning becomes most useful: you can nudge the model towards providing the correct output for every stage of the process. Once the inputs are broken down and classified, we can proceed with the pipeline as normal, updating our data layer and preparing the corresponding results.
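A minimal sketch of this language-to-structure step might look like the following. The schema, the field names, and the call_llm helper are assumptions for illustration, not the exact format Tessa uses:

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    raise NotImplementedError

# Hypothetical schema: the model must answer with a single JSON object
# describing one action call, and nothing else.
PROMPT_TEMPLATE = """\
Convert the user's request into a JSON object with the fields:
  "action": one of "add_requirement", "edit_requirement", "remove_requirement"
  "target": the identifier of the item being changed (or null)
  "payload": the new text (or null)
Respond with JSON only.

Request: {request}
"""

def extract_action_call(user_text: str) -> dict:
    raw = call_llm(PROMPT_TEMPLATE.format(request=user_text))
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        # Malformed output is caught here instead of leaking into the pipeline;
        # in practice this is where a retry or a repair prompt would go.
        raise ValueError(f"Model did not return valid JSON: {raw!r}")
    if call.get("action") not in {"add_requirement", "edit_requirement", "remove_requirement"}:
        raise ValueError(f"Unknown action: {call.get('action')!r}")
    return call
```

The point of the validation is that the rest of the pipeline never sees free-form text, only a checked structure it already knows how to handle.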
This is the stage where most of the work to prevent hallucinations and improve result reliability is done. It includes strategies such as using external tools via RAG [2] to collect associated data from the other inputs and previous states and feed it to the model, which grounds the response in verified information instead of letting the LLM invent an answer from its own internal data, disconnected from the task at hand. When we return the end result to the user, we call on the LLM again, this time in the opposite direction: it takes the computer-friendly representation of the data and turns it into a natural text response for the user.
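To make both directions concrete, here is a loose sketch of grounding the model with retrieved context before it answers, and of rendering the structured result back into natural text at the end. The retrieve_related helper, the result dictionary, and call_llm are illustrative stand-ins, not Tessa's actual code:

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    raise NotImplementedError

def retrieve_related(query: str) -> list[str]:
    """Hypothetical retrieval step: look up passages from the specification,
    previous pipeline states, or other verified sources (the 'R' in RAG)."""
    raise NotImplementedError

def grounded_answer(question: str) -> str:
    context = retrieve_related(question)
    prompt = (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say so.\n\n"
        "Context:\n" + "\n".join(f"- {c}" for c in context)
        + f"\n\nQuestion: {question}"
    )
    return call_llm(prompt)

def render_result(result: dict) -> str:
    """Opposite direction: structure back into language for the user."""
    prompt = (
        "Summarise the following update for the user in one or two friendly "
        "sentences. Do not add any information that is not in the data.\n\n"
        + json.dumps(result, indent=2)
    )
    return call_llm(prompt)
```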
Agent Tessa, Esq.
This style of iterative problem solving is what has recently become known as “agentic AI” [3]. Because one-shot prompting (solving the entire problem in a single prompt and response) is so intractable, the process is decomposed so that isolated instances of the model, called agents, each perform a subtask in service of the overall pipeline. That allows for a better-defined interface between the inputs and outputs of every step.
One of the key advantages of agent decomposition is that agents can communicate with each other and use that information to iterate on their own outputs or form prompting pipelines that produce better and more accurate results. In Tessa, we dedicate an agent to each step of document generation and pipe its output into the next, which lets us control every output with a precision we could never achieve with a one-shot prompt. This is what enables us to keep track of different types of content in a single document without confusing the model about the expected output format at every step.
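In spirit, such a pipeline behaves like a chain of small, well-defined steps where each agent's output is the next agent's input. The sketch below uses generic step names and trivial stand-in logic; it is not Tessa's actual architecture, only an illustration of the shape of the composition:

```python
from typing import Callable

# Each 'agent' is just a function from one well-defined representation to the
# next; the step names below are generic placeholders, not Tessa's real steps.
Agent = Callable[[dict], dict]

def run_pipeline(steps: list[Agent], initial: dict) -> dict:
    state = initial
    for step in steps:
        # Every agent sees the accumulated state and returns an updated one,
        # so later steps can build on and refine earlier outputs.
        state = step(state)
    return state

def classify_step(state: dict) -> dict:
    state["action"] = "edit_requirement"  # would come from an LLM classifier
    return state

def generate_step(state: dict) -> dict:
    state["draft"] = f"Draft text for {state['action']}"  # would come from an LLM
    return state

def review_step(state: dict) -> dict:
    state["final"] = state["draft"].strip()  # a second agent checking the first
    return state

result = run_pipeline([classify_step, generate_step, review_step], {"input": "..."})
```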
Our approach to solving these issues highlights the importance of adapting LLM workflows to real-world scenarios. Just as it is difficult for traditional software to handle unstructured data and free-form text, it is difficult for an LLM to interact with traditional pipelines that expect stable (always the same for the same set of inputs) and structured responses. If we want to design a solution that combines the strengths of both approaches, we must first reduce the process to its component parts and then identify where LLMs fit and where we must rely on our existing tools. The best outcomes appear when we are able to combine both.
by Primož Jeras