AI models have a limited memory: the context window. In simple chatbots, that is rarely a problem. But in long-running workflows, such as agentic systems or multi-layered processes, context management quickly becomes a technical challenge.
Every AI model has a context window: the maximum amount of text it can process at once. For a simple conversation, that is more than enough. But in long-running workflows, where an AI takes multiple steps, processes documents and makes decisions over time, you quickly hit that limit. Smart context management then becomes a necessity, not a luxury.
The context window of modern models is large, but not unlimited. Claude 3.5 Sonnet has a context window of 200,000 tokens; GPT-4o has 128,000. That sounds like a lot, but in a workflow processing dozens of documents, taking multiple steps and sending the full conversation history, it fills up quickly.
Moreover, the fuller the context window, the more expensive each request becomes. And there is evidence that models recall information at the beginning and end of a long context better than information in the middle: the so-called "lost in the middle" effect.
Instead of sending the full conversation history or document content, you summarise what is relevant. After each step in a workflow, you have the model generate a summary of what was decided and what the current status is. You send that summary to the next step, not the full history.
This requires thinking carefully about what your model needs to take the next step. What is essential? What is background? What can be dropped?
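As a minimal sketch of this pattern, the loop below passes only a running summary between steps instead of the full history. The `call_model` function is a hypothetical placeholder for a real LLM API call:

```python
# Sketch of summary-based context passing between workflow steps.
# `call_model` is a hypothetical stand-in for a real LLM API call.

def call_model(prompt: str) -> str:
    # Placeholder: a real implementation would call an LLM API here.
    return f"[model output for: {prompt[:40]}...]"

def run_step(step_instruction: str, summary_so_far: str) -> tuple[str, str]:
    """Run one workflow step, then compress the result into a new summary."""
    # Send only the running summary plus the current instruction,
    # not the full conversation history.
    output = call_model(
        f"Context so far: {summary_so_far}\nTask: {step_instruction}"
    )
    new_summary = call_model(
        f"Summarise the decisions and current status in a few sentences:\n"
        f"{summary_so_far}\n{output}"
    )
    return output, new_summary

summary = "Start of workflow; nothing decided yet."
for step in ["analyse document A", "compare with document B", "draft conclusion"]:
    output, summary = run_step(step, summary)
```

The key design choice is that each step receives a fixed-size summary rather than an ever-growing history, so context cost stays roughly constant regardless of workflow length.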
Anything too large for the context window is stored outside the model. That can be a relational database, a vector database or a simple key-value store. Relevant information is retrieved only when the model needs it.
This is the same approach as RAG, but for workflow state rather than document content. The workflow context lives outside the model; the model only receives what it needs at that moment.
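A minimal sketch of externalised workflow state, using an in-memory dictionary where a real system would use a database (the `WorkflowStore` class and its keys are illustrative assumptions):

```python
# Sketch: keep workflow state outside the model in a key-value store.
# An in-memory dict stands in for a database.

class WorkflowStore:
    def __init__(self) -> None:
        self._state: dict[str, str] = {}

    def save(self, key: str, value: str) -> None:
        self._state[key] = value

    def load(self, key: str, default: str = "") -> str:
        return self._state.get(key, default)

store = WorkflowStore()
store.save("step_1_decision", "Proceed with supplier X; budget approved.")

# When building the next prompt, retrieve only the relevant piece of state
# instead of sending everything the workflow has ever produced:
prompt_context = store.load("step_1_decision")
```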
In conversational systems you use a "sliding window": you send only the last N messages, not the full conversation history. Add a short summary of the earlier conversation to maintain continuity.
The overlap ensures the transition is smooth: the summary covers what the window no longer contains.
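The sliding window can be sketched as follows; the `summarise` function is a trivial placeholder where a real implementation would call the model:

```python
# Sketch of a sliding window: send a short summary of older messages
# plus only the last N messages.

def summarise(messages: list[str]) -> str:
    # Placeholder: a real implementation would have the model summarise.
    return f"[summary of {len(messages)} earlier messages]"

def build_context(history: list[str], window: int = 4) -> list[str]:
    if len(history) <= window:
        return history
    older, recent = history[:-window], history[-window:]
    # The summary covers exactly what the window no longer contains.
    return [summarise(older)] + recent

history = [f"message {i}" for i in range(1, 11)]
context = build_context(history, window=4)
# context: ["[summary of 6 earlier messages]", "message 7", ..., "message 10"]
```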
For very long workflows it can be smart to build in checkpoints. After each significant step you save the full state of the workflow to a database. If the workflow is interrupted or if the context becomes too large, you restart from the last checkpoint.
This requires more architectural work, but makes workflows more robust and scalable over longer time horizons.
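A bare-bones checkpointing sketch, serialising state to a JSON file where a production system would use a database (the file name and state shape are assumptions for illustration):

```python
# Sketch of checkpointing: save the full workflow state after each
# significant step, and resume from the last checkpoint on restart.

import json
from pathlib import Path

CHECKPOINT = Path("workflow_checkpoint.json")

def save_checkpoint(state: dict) -> None:
    CHECKPOINT.write_text(json.dumps(state))

def load_checkpoint() -> dict:
    if CHECKPOINT.exists():
        return json.loads(CHECKPOINT.read_text())
    # No checkpoint yet: start from the beginning with empty state.
    return {"step": 0, "summary": ""}

state = load_checkpoint()
for step in range(state["step"], 3):
    # ... run the actual workflow step here, update the summary ...
    state = {"step": step + 1, "summary": f"completed step {step + 1}"}
    save_checkpoint(state)
```

If the process crashes mid-run, the next invocation picks up at the last saved step instead of replaying the whole workflow, which also keeps the context that must be rebuilt small.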
Frameworks like LangChain and LlamaIndex offer built-in abstractions for context management. They provide memory modules that automatically summarise, integrate external storage and manage sliding windows. That saves implementation time but introduces dependencies.
For simpler use cases, manual context management, where you decide yourself what gets sent, is more transparent and easier to debug.
Context management becomes critical in workflows that last longer than one exchange, where the model needs to remember information established earlier in the process, or where large documents are processed. Chatbots for short customer service conversations rarely encounter this; an AI agent guiding a project over a week encounters it constantly.
Smart context management is one of the less visible but most decisive technical choices when building AI workflows. Mach8 designs AI systems where context management is properly arranged from the ground up, so long-running processes work reliably.
Want to build an AI workflow that also works well for complex, long-running processes? Get in touch with Mach8.
We help you go from strategy to implementation. Schedule a no-obligation call.
Schedule a call