Context Engineering for AI Agents Explained in 10 Minutes

By the 10xdev team · August 03, 2025

You might have heard the term "context engineering" recently. It's a powerful way to describe the numerous different things that we do when building AI agents. At their core, agents need context. They need instructions, external knowledge, and feedback from tool calls. Context engineering is the art and science of filling the context window with just the right information at each step of an agent's trajectory.

In this article, I'll explore several different strategies for context engineering, which can be grouped into four main areas: writing context, selecting context, compressing context, and isolating context. We'll walk through interesting examples of each from popular agents and discuss how a framework like LangGraph is designed to support them all.

What is Context Engineering?

The term has gained traction recently. Tobi Lütke, CEO of Shopify, highlighted his preference for "context engineering" over "prompt engineering," and Andrej Karpathy offered a solid definition: "the delicate art and science of filling the context window with just the right information for the next step."

Karpathy also drew an interesting analogy between Large Language Models (LLMs) and operating systems. The LLM acts as the CPU, and the context window is like RAM—its working memory. Crucially, this memory has a limited capacity. Just as an operating system curates what fits in RAM, context engineering is the discipline of deciding what information needs to fit in the LLM's context at each step of an agent's process.

So, what types of context are we talking about? It's an umbrella over a few different themes:

  • Instructions: This includes prompt engineering, few-shot examples, and tool descriptions.
  • Knowledge: These are facts and memories that inform the agent.
  • Tools: This is feedback from the environment, such as from using APIs, a calculator, or other external tools.

All these sources of context flow into the LLM when you're building applications.

Why is This Especially Tricky for Agents?

Agents present unique challenges due to two key properties: they often handle longer-running or more complex tasks, and they rely on tool calling. Both properties drive up context usage: feedback from tool calls accumulates, and long-running tasks burn through significant tokens over many turns.

As the context grows, so does the risk of failure. A blog post by Drew Breunig outlines several specific context failures, including:

  • Context Poisoning: A hallucination makes it into the context and is repeatedly referenced, adversely influencing responses.
  • Context Distraction: The context grows so large that the model over-focuses on it at the expense of what it learned in training.
  • Context Confusion: Superfluous context influences the response.
  • Context Clash: Parts of the context conflict with one another.

For these reasons, context engineering is particularly critical when building agents. As a recent post from Cognition AI highlighted, "context engineering is effectively the number one job of engineers building AI agents."

Four Key Strategies for Context Engineering

After reviewing numerous popular agents and reflecting on practical experience, we can distill the approaches down into four main categories.

  1. Writing Context: Saving information outside the context window to help an agent perform a task.
  2. Selecting Context: Selectively pulling information into the context window when needed.
  3. Compressing Context: Retaining only the most relevant tokens from a larger body of information.
  4. Isolating Context: Splitting context into separate, manageable parts.

Let's dive into examples for each of these categories.


1. Writing Context

Writing context means saving information outside the main context window for later use. When humans solve complex tasks, we take notes and remember things for the future. Agents can do the same using scratchpads and memory.

Note-Taking with Scratchpads

A scratchpad persists information while an agent is performing a single task. A great example is Anthropic's recent multi-agent researcher: the lead researcher agent begins by thinking through its approach and saves that plan to persistent memory. This is crucial because if the context window exceeds its limit (e.g., 200,000 tokens) and gets truncated, the plan can still be retrieved rather than lost.

Note: The implementation of a scratchpad can differ. It could be a simple file or a runtime state object within an agent framework. The core idea is to write information down so the agent can recall it later if needed.
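To make the idea concrete, here is a minimal file-backed scratchpad sketch. The `Scratchpad` class and its methods are illustrative inventions, not the API of any particular framework:

```python
import json
from pathlib import Path

class Scratchpad:
    """A minimal, hypothetical file-backed scratchpad for a single agent session."""

    def __init__(self, path: str = "scratchpad.json"):
        self.path = Path(path)
        self.notes: list[str] = json.loads(self.path.read_text()) if self.path.exists() else []

    def write(self, note: str) -> None:
        # Persist the note outside the context window so it survives truncation.
        self.notes.append(note)
        self.path.write_text(json.dumps(self.notes))

    def read(self) -> str:
        # Recall everything written so far, e.g., to re-inject a saved plan.
        return "\n".join(self.notes)

# Usage: save the plan up front, retrieve it on a later turn.
pad = Scratchpad()
pad.write("Plan: 1) search the docs, 2) draft the answer, 3) verify with tests")
print(pad.read())
```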

Long-Term Memory

Sometimes, we want an agent to save information across many different sessions. While scratchpads are typically for a single session, memories are for long-term retention.

  • Generative Agents, for example, synthesize memories from collections of past agent feedback.
  • ChatGPT's memory feature is a well-known example of this pattern in a popular AI product.
  • Cursor will also auto-generate memories based on user-agent interactions.

The intuition is clear: as new context comes in, the agent can dynamically update its existing memories, creating a more personalized and effective experience over time.
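One hedged way to picture dynamic memory updating is a small keyed store that persists across process runs, where new observations revise prior entries. The file name and keys below are assumptions for illustration, not any product's actual mechanism:

```python
import json
from pathlib import Path

MEMORY_FILE = Path("user_memory.json")  # persists across sessions, unlike a scratchpad

def load_memories() -> dict[str, str]:
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}

def update_memory(key: str, observation: str) -> None:
    # As new context arrives, revise the existing memory rather than appending blindly.
    memories = load_memories()
    memories[key] = observation
    MEMORY_FILE.write_text(json.dumps(memories))

# Session 1: the agent learns a preference.
update_memory("preferred_language", "User prefers Python examples")
# Session 2 (a later run of the process): the memory is still available.
print(load_memories()["preferred_language"])
```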


2. Selecting Context

Selection means pulling specific context into the context window to help an agent perform a task. This is the other side of the coin to writing context.

Types of Memory to Select

Depending on the problem, an agent might need to pull in different types of memories:

  • Procedural Memories (Instructions): These are often captured in configuration files (e.g., claude.md in code agents) that contain style guidelines or instructions for using tools (a minimal loading sketch follows this list).
  • Semantic Memories (Facts): When dealing with a large collection of facts, it's common to use techniques like embedding-based similarity search or graph databases to retrieve only the most relevant information at the right time.
  • Episodic Memories (Few-Shot Examples): These are past experiences that provide specific instructions for a desired behavior.
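For the procedural case, selection can be as simple as reading an instructions file into the system prompt at startup. The file name mirrors the claude.md convention above, but the loader itself is a hypothetical sketch:

```python
from pathlib import Path

def build_system_prompt(base: str, instructions_file: str = "claude.md") -> str:
    """Prepend procedural memory (style guides, tool rules) to the system prompt."""
    path = Path(instructions_file)
    instructions = path.read_text() if path.exists() else ""
    return f"{base}\n\n# Project instructions\n{instructions}" if instructions else base

print(build_system_prompt("You are a careful coding assistant."))
```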

Selecting Tools

One of the challenges with agents is their difficulty in handling large collections of tools. Research has shown performance degradation after just a few dozen tools and near-complete failure when approaching 100 tools.

A powerful technique to overcome this is to use RAG (Retrieval-Augmented Generation) over tool descriptions. This involves embedding the descriptions and using semantic similarity to fetch only the most relevant tools for a given task, significantly improving performance.
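A minimal sketch of the idea: embed each tool description once, then embed the incoming task and keep only the top-k most similar tools. The `embed` function here is a stand-in for a real embedding model:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: swap in a real embedding model (e.g., an API call) here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)

TOOLS = {
    "search_web": "Search the web for up-to-date information.",
    "run_sql": "Execute a SQL query against the analytics database.",
    "send_email": "Send an email to a specified recipient.",
}

# Index tool descriptions once, up front.
tool_vectors = {name: embed(desc) for name, desc in TOOLS.items()}

def select_tools(task: str, k: int = 2) -> list[str]:
    """Return only the k tools most semantically similar to the task."""
    q = embed(task)
    scores = {name: float(q @ v) for name, v in tool_vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(select_tools("Find this quarter's revenue in the warehouse"))
```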

Selecting Knowledge (Advanced RAG)

RAG is a huge topic, and knowledge selection in popular agents is highly non-trivial. Code agents are some of the largest-scale RAG applications today. A post from the CEO of Cursor revealed that their approach is quite sophisticated.

Of course, they use indexing and embedding-based similarity search, but they also employ advanced techniques:

  • Smarter Chunking: Parsing code along semantically meaningful boundaries, not just random blocks.
  • Hybrid Search: Combining embedding-based search with keyword search and knowledge graphs.
  • LLM-based Re-ranking: Using a model to rank the retrieved results for relevance.

A huge amount of context engineering goes into effective knowledge selection.
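To make the hybrid idea concrete, here is one hedged way to blend a keyword signal with an embedding signal before ranking. The weighting scheme and the placeholder `embed` function are assumptions for illustration, not Cursor's actual pipeline:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder embedding; replace with a real model.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

def keyword_score(query: str, doc: str) -> float:
    # Crude keyword overlap; a real system would use BM25 or similar.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def hybrid_rank(query: str, chunks: list[str], alpha: float = 0.5) -> list[str]:
    """Blend semantic and keyword signals, then sort; alpha is an assumed weight."""
    qv = embed(query)
    scored = [
        (alpha * float(qv @ embed(c)) + (1 - alpha) * keyword_score(query, c), c)
        for c in chunks
    ]
    return [c for _, c in sorted(scored, reverse=True)]

chunks = ["def parse_config(path): ...", "class UserSession: ...", "# config loading helpers"]
print(hybrid_rank("where is config parsed", chunks))
```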


3. Compressing Context

Compression involves retaining only the tokens required to perform a task, discarding the rest to manage token bloat.

Summarization

Summarization is a common and effective technique.

  • If you've used Claude Code, you might have noticed it calls an autocompact function once the session reaches 95% of its context window. This is summarization applied across a full agent-user trajectory.
  • Anthropic's multi-agent paper applied summarization more narrowly, only to completed work sections.
  • Cognition's research showed summarization being used at the interface between different sub-agents, acting as a compressed information handoff.
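A minimal threshold-triggered compaction sketch: the 95% trigger mirrors the Claude Code behavior described above, while the token counting and the `summarize` callable are simplifying assumptions:

```python
from typing import Callable

CONTEXT_LIMIT = 200_000   # assumed window size, in tokens
COMPACT_THRESHOLD = 0.95  # compact at 95% usage, as Claude Code reportedly does

def count_tokens(messages: list[str]) -> int:
    # Rough proxy; a real system would use the model's tokenizer.
    return sum(len(m.split()) for m in messages)

def maybe_compact(messages: list[str], summarize: Callable[[str], str]) -> list[str]:
    """Replace the transcript with a summary once usage crosses the threshold."""
    if count_tokens(messages) < CONTEXT_LIMIT * COMPACT_THRESHOLD:
        return messages
    summary = summarize("\n".join(messages))
    # Keep the summary plus the most recent message for continuity.
    return [f"[Summary of earlier conversation]\n{summary}", messages[-1]]
```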

Trimming

Trimming is a more selective removal of tokens. This can be done with simple heuristics, like keeping only the most recent messages, or with more advanced learned approaches that use an LLM to perform intelligent context pruning.
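Trimming by recency is even simpler. This sketch keeps messages from the newest backwards until an assumed token budget is spent:

```python
def trim_to_budget(messages: list[str], budget: int = 4_000) -> list[str]:
    """Keep the most recent messages that fit within a token budget (heuristic count)."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = len(message.split())     # crude token proxy
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))         # restore chronological order
```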


4. Isolating Context

Isolating context involves splitting it up to help an agent perform a task more effectively.

Multi-Agent Systems

The most intuitive example here is a multi-agent system.

  • The Swarm library from OpenAI was designed around a separation of concerns, where a team of agents can each have their own context window, tools, and instructions.
  • Anthropic's multi-agent researcher made this explicit, noting that sub-agents operate in parallel with their own context windows. This expands the total number of tokens the overall system can process, allowing for richer and more detailed results.
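The isolation itself can be shown without any framework. In this sketch, each sub-agent starts from a fresh message list and hands back only a compact result, so its working context never enters the lead agent's window; `call_llm` is a placeholder:

```python
def call_llm(messages: list[str]) -> str:
    # Placeholder for a real model call.
    return f"(answer based on {len(messages)} messages)"

def run_subagent(instructions: str, task: str) -> str:
    """Each sub-agent gets its own fresh context window: instructions + task only."""
    context = [instructions, task]  # isolated; no shared history
    return call_llm(context)

# The lead agent fans out tasks and receives only compact results back.
results = [
    run_subagent("You research academic papers.", "Find work on context compression."),
    run_subagent("You research industry blogs.", "Find posts on multi-agent systems."),
]
lead_context = ["Synthesize these findings:"] + results
print(call_llm(lead_context))
```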

Sandboxed Environments

Another powerful technique is to use a sandboxed environment for execution.

  • Hugging Face's Open Deep Research described a code agent that generates executable code. This code is run in a sandbox, which can persist state over multiple turns.
  • This isolates token-heavy information like images or audio files from the LLM's context window. Only selective information (return values, variable names) is passed back to the LLM to reason about.
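Here is a rough sketch of the pattern, using a subprocess as a stand-in for a real sandbox (a production system would use proper isolation such as a container or a hosted sandbox). The agent sees only the printed result, never the heavy intermediate state:

```python
import subprocess
import sys

def run_in_sandbox(code: str, timeout: int = 10) -> str:
    """Execute generated code in a separate process and return only its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout if result.returncode == 0 else f"Error: {result.stderr}"

# The heavy list lives only inside the sandbox; the LLM sees one number.
generated_code = (
    "data = list(range(1_000_000))\n"  # token-heavy intermediate state
    "print(sum(data))"                 # only this value is passed back
)
print(run_in_sandbox(generated_code))  # -> 499999500000
```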

State Objects

A simple and intuitive way to isolate context is by using a structured state object. You can define a data model (e.g., using Pydantic) with different fields for different types of context. One field, like messages, might always be exposed to the LLM, while other fields can hold information that is only used at specific points in the agent's trajectory.
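A hedged sketch of such a state object with Pydantic: only messages would be rendered into every LLM call, while the other fields stay isolated until a specific node needs them. The field names are illustrative:

```python
from pydantic import BaseModel, Field

class AgentState(BaseModel):
    # Always exposed to the LLM on every turn.
    messages: list[str] = Field(default_factory=list)
    # Isolated context: consulted only by specific nodes, never dumped into the prompt.
    scratchpad: list[str] = Field(default_factory=list)
    raw_tool_output: str = ""  # e.g., a large API payload awaiting post-processing

state = AgentState(messages=["user: summarize the quarterly report"])
state.raw_tool_output = "<10,000 tokens of raw API JSON>"
# Only the messages field would be rendered into the next LLM call.
print(state.messages)
```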


How LangGraph Enables These Strategies

Before starting any context engineering effort, it's useful to have two things:

  1. Observability: The ability to track token usage. LangSmith is a great tool for this.
  2. Evaluation: A way to measure the effect of your efforts to ensure you haven't degraded the agent's behavior.

LangGraph is a low-level orchestration framework for building agents as a graph of nodes and edges. This design provides powerful primitives for all four context engineering strategies; a minimal sketch combining a few of them follows the list below.

  • Writing Context: LangGraph is designed around a state object that is accessible in every node. An agent can take notes (a scratchpad) and write them to the state, which is checkpointed and available throughout the agent's session. For persistence across sessions, LangGraph has built-in support for long-term memory.

  • Selecting Context: Within any node, you can select from the state object (scratchpad) or retrieve from long-term memory. LangGraph's memory can store different types, and you can use techniques like embedding-based search. There are even pre-built utilities, like a tool selector that uses semantic search over tool descriptions.

  • Compressing Context: LangGraph has out-of-the-box utilities for summarizing and trimming message history. As a low-level framework, it also gives you the flexibility to define custom logic within any node, such as post-processing token-heavy tool calls.

  • Isolating Context: LangGraph has excellent support for multi-agent systems, with open-source implementations for both Supervisor and Swarm patterns. It also works nicely with sandboxed environments like E2B for code execution. Finally, the central state object can be designed with a schema to partition and isolate context as needed.
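Putting a few of these together, here is a minimal sketch using LangGraph's documented StateGraph API: a plan is written to checkpointed state (a scratchpad), then selected by a later node. Exact imports and signatures can shift between LangGraph versions, so treat this as illustrative:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver

class State(TypedDict):
    messages: list[str]
    scratchpad: str

def plan(state: State) -> dict:
    # Writing context: persist a plan into checkpointed state.
    return {"scratchpad": "1) research, 2) draft, 3) verify"}

def act(state: State) -> dict:
    # Selecting context: read the scratchpad back only when this node needs it.
    return {"messages": state["messages"] + [f"Acting on plan: {state['scratchpad']}"]}

builder = StateGraph(State)
builder.add_node("plan", plan)
builder.add_node("act", act)
builder.add_edge(START, "plan")
builder.add_edge("plan", "act")
builder.add_edge("act", END)

graph = builder.compile(checkpointer=MemorySaver())
config = {"configurable": {"thread_id": "session-1"}}
result = graph.invoke({"messages": ["user: help me write a report"], "scratchpad": ""}, config)
print(result["messages"])
```

Because the state is checkpointed under a thread_id, the same scratchpad remains available if the session resumes later.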

Key Takeaways

To summarize, we've explored four overarching categories for context engineering that are visible across many popular agents today:

  • Writing: Saving context outside the window for later retrieval (e.g., scratchpads, long-term memory).
  • Selecting: Retrieving the right context at the right time (e.g., tools, facts, knowledge via RAG).
  • Compressing: Reducing token bloat by retaining only the most relevant information (e.g., summarization, trimming).
  • Isolating: Splitting context into manageable parts (e.g., multi-agent systems, sandboxing, state objects).

This is a rapidly emerging field, and this list is by no means complete. However, it provides a framework for organizing the space and offers a starting point for applying these powerful techniques in your own agent development.
