Context Engineering Explained in 7 Minutes
A new term has been floating around in the AI space: context engineering. Unlike some other trends, this one actually makes sense. Still, I have mixed feelings about the newly coined term: it frustrates me, yet at the same time I really like it, and you probably will too. I know that sounds contradictory, but I'll explain.
We'll explore a great example from a public GitHub repository on improving AI code generation and then break down what context engineering means for an AI agent.
The "New" Term That Isn't New
The first issue with this newly coined term is that we've already been doing context engineering. The practice isn't new; it has simply been given a name.
If you've been using Retrieval-Augmented Generation (RAG) for semantic search to give more information to your agents, you've already been doing some form of context engineering. If you're retrieving information from files, providing specific tools for agents to use, or using an orchestration agent to create and manage subtasks to shorten the context window for each agent, then you've done context engineering. It's not a new concept; it's just now been officially given a name, so it seems new.
From "Vibe Coding" to Context Engineering
If you remember about five months ago, a new term, "vibe coding," emerged. The idea was to give in to the "vibes" and just tell the AI what to do, almost forgetting the code exists. While it can be fun for small personal projects, it's not a robust methodology.
This leads us back to the same thought leader who popularized "vibe coding," and who recently posted in favor of "context engineering" over "prompt engineering." He noted that people associate prompts with the short, day-to-day instructions you give an LLM. In contrast, every industrial-strength LLM application involves the delicate art and science of filling the context window with just the right information for the next step.
He clarified that he wasn't trying to coin a new term, but rather to distinguish the process from simple prompting. You prompt an LLM to tell you why the sky is blue, but production applications build context meticulously for LLMs to solve custom tasks.
Just a couple of days before that post, a major AI publication published a blog post titled "The Rise of Context Engineering."
What is Context Engineering?
According to industry leaders, context engineering is about building dynamic systems to provide the right information and tools in the right format so the LLM can plausibly accomplish a task.
Complex agents likely get context from many sources. This can come from the application developer, the user, previous interactions, tool calls, or other external data. Pulling all these together involves a complex system.
Here are the various components that can be used as context for your system:

* Basic Instructions & System Prompt: The foundational commands you're already used to.
* Long-Term Memory: Memory that persists after a session is over, so information from past sessions can be used in future ones.
* State or History: Short-term memory. For instance, some AI frameworks have a state that can be saved and shared among different agents and processes during a single running flow.
* Retrieval-Augmented Generation (RAG): We've been using this for a while. It involves semantically retrieving information from a vector database to give our agent or system better data to perform its task. This is not new.
* The User Prompt: The direct input from the user.
* Available Tools: This also isn't new. We've been giving tools to our agents for some time, and modern tool servers make it even easier to offer a suite of tools for an agent to select from, provided the prompt is good enough for the agent to know when and which tool to use. These tools can perform web searches to bring in even more context.
* Structured Output: I haven't used unstructured output from an agent in months. Validating the output against a schema with a library like Pydantic is far more reliable than asking for free-form JSON.
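To make the structured-output point concrete, here is a minimal sketch assuming Pydantic v2; the `CompanyProfile` schema and its fields are invented for illustration. The model's raw response is validated against a schema instead of being trusted as free-form JSON.

```python
from pydantic import BaseModel, Field

# Hypothetical schema for an agent that extracts company information;
# the fields below are illustrative, not prescribed by the article.
class CompanyProfile(BaseModel):
    name: str
    focus_area: str = Field(description="What the company builds, e.g. 'AI coding agents'")
    founded_year: int | None = None

# Instead of accepting whatever JSON the model emits, validate it against the
# schema and fail fast on malformed or incomplete responses.
raw_output = '{"name": "Acme AI", "focus_area": "AI coding agents", "founded_year": 2023}'
profile = CompanyProfile.model_validate_json(raw_output)
print(profile.name, profile.focus_area)
```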
Why Is This Important?
When an agentic system messes up, it's largely because the LLM made a mistake. Thinking from first principles, there are two main reasons for failure:

1. The underlying model just isn't good enough.
2. The model was not passed the appropriate context to produce a good output.
The bold statement being made is that more often than not, model mistakes are caused by the second reason. As users and developers, we are often not giving enough context—or sometimes, we're giving too much, causing hallucinations.
How Is Context Engineering Different from Prompt Engineering?
That's a great question. Why the shift in terminology? For years, we've seen countless guides on prompt engineering, which essentially focuses on injecting clever sentences, phrases, and steps into a prompt to get the correct output. You're front-loading everything onto the prompt.
Early on, developers focused on phrasing prompts cleverly to coax better answers. As applications grew more complex, it became clear that providing complete and structured context is more important than finding the exact right phrasing.
It's argued that prompt engineering is a subset of context engineering. Even if you have all the context, how you assemble it in the prompt still absolutely matters. The difference is that you are not architecting your prompt to work well with a single set of input data, but rather to take a set of dynamic data and format it properly.
This is a really good point. Prompt engineering isn't being replaced. Now that we have all these sources helping an agent perform a task, the prompt that uses that information still needs to be good. As an engineer, I wouldn't feel comfortable just giving a basic prompt and expecting it to handle a bunch of data. I would still need to provide proper prompts so the model knows how to use all the information I'm giving it.
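As a rough illustration of that division of labor, here is a minimal sketch; the template and function names are hypothetical, not taken from any particular framework. The template and its formatting rules are the prompt engineering; deciding what data flows into it is the context engineering.

```python
# The template is fixed, but the data slotted into it is dynamic and changes
# on every request. All variable names here are illustrative.
SYSTEM_TEMPLATE = """You are a research assistant.

Relevant documents (retrieved for this request):
{retrieved_docs}

Conversation so far:
{history}

Tools you may call: {tool_names}

Answer the user's request using only the context above. If the context
is insufficient, say so instead of guessing."""

def build_prompt(retrieved_docs: list[str], history: list[str], tool_names: list[str]) -> str:
    # Ordering, delimiters, and fallbacks are the "prompt engineering" part;
    # sourcing the data that lands here is the "context engineering" part.
    return SYSTEM_TEMPLATE.format(
        retrieved_docs="\n---\n".join(retrieved_docs) or "(none)",
        history="\n".join(history) or "(no prior turns)",
        tool_names=", ".join(tool_names) or "(no tools)",
    )

print(build_prompt(["Doc about AI agents..."], ["User: hi"], ["web_search"]))
```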
Context Engineering for Agents in 4 Steps
Let's look at this specifically for agents. The core idea is that context engineering is the delicate art and science of filling the context window with just the right information—instructions, knowledge, and tools—for the next step.
This isn't a new concept. We have been doing this. Maybe we can do it better, and that's a fair statement, but we've been doing this.
The Problem: Context Hallucination
Long-running tasks and accumulating feedback from tool calls mean that agents often utilize a large number of tokens, which can cause problems. Longer context can lead to several performance issues:

* Context Poisoning: When a hallucination makes it into the context.
* Context Distraction: When the context overwhelms the model's original training.
* Context Confusion: When superfluous context influences the response.
* Context Clash: When different parts of the context disagree with each other.
AI research firms have noted that agents often engage in conversations spanning hundreds of turns, which means careful context management strategies are needed. Here is a four-step framework for how to do this.
1. Write Context
This means writing information down so it can be reused, within a session or across sessions.

* Long-Term Memory: Stored across agent sessions (e.g., in a database).
* Scratchpad/State: Stored within a single agent session.
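A minimal sketch of the "write" step, assuming a plain JSON file as the long-term store and an in-memory list as the scratchpad; both are illustrative choices, not a specific framework's API.

```python
import json
from pathlib import Path

scratchpad: list[str] = []                     # short-term: lives for one agent session
MEMORY_FILE = Path("agent_long_term_memory.json")  # long-term: survives across sessions

def write_scratchpad(note: str) -> None:
    # Notes the agent jots down while working on the current task.
    scratchpad.append(note)

def write_long_term(key: str, value: str) -> None:
    # Persist a fact to disk so a future session can read it back.
    memory = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}
    memory[key] = value
    MEMORY_FILE.write_text(json.dumps(memory, indent=2))

write_scratchpad("User asked for a list of US AI-agent companies.")
write_long_term("user_preference", "Prefers answers as bulleted lists.")
```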
2. Select Context
This means retrieving the information you wrote down so the agent can use it.

* Retrieve relevant tools for the task.
* Retrieve information from the scratchpad, state, or long-term memory.
* Retrieve any other relevant information to help the agent perform better.
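Here is a toy version of the "select" step. A production system would typically use embeddings and a vector store; plain keyword overlap keeps the sketch self-contained, and all the note strings are invented.

```python
def select_context(query: str, notes: list[str], k: int = 3) -> list[str]:
    # Score each stored note by how many query terms it shares, keep the top k.
    query_terms = set(query.lower().split())
    scored = sorted(
        notes,
        key=lambda note: len(query_terms & set(note.lower().split())),
        reverse=True,
    )
    return scored[:k]

notes = [
    "User prefers bulleted lists.",
    "Earlier search found 40 companies building coding agents.",
    "User's favorite color is blue.",
]
print(select_context("companies building AI agents", notes, k=2))
```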
3. Compress Context
The context window can fill up quickly, and managing tokens becomes crucial.

* Summarize: Condense parts of the context to retain only the most relevant tokens.
* Trim: Remove irrelevant tokens entirely to shrink the context window and avoid information disagreements.
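A rough sketch of the "compress" step: keep the newest messages that fit a token budget and leave a placeholder where an LLM-generated summary of the rest would go. Word count stands in for real tokenization here, which is an assumption for brevity.

```python
def rough_tokens(text: str) -> int:
    # Crude proxy; a real system would use the model's tokenizer.
    return len(text.split())

def compress_history(messages: list[str], budget: int = 200) -> list[str]:
    kept: list[str] = []
    used = 0
    for message in reversed(messages):        # walk from newest to oldest
        cost = rough_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    dropped = len(messages) - len(kept)
    summary = f"[Summary placeholder: {dropped} older messages condensed by an LLM call]"
    return ([summary] if dropped else []) + list(reversed(kept))

history = [f"turn {i}: " + "word " * 50 for i in range(10)]
print(compress_history(history, budget=120))
```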
4. Isolate Context
This is a strategy that is likely underutilized. It means breaking down large tasks into smaller subtasks. If you have a long prompt for a complex task that requires retrieving a lot of information, why not split it up?

* Create subtasks for the agent to handle sequentially or in parallel.
* For each subtask, provide only the specific context needed to complete that part.
This results in a smaller context window for each task, which makes a good output far more likely and reduces hallucinations. Research on multi-agent systems makes a strong case for this, finding that a system of many agents with isolated contexts outperformed a single agent, largely because each sub-agent's context window could be devoted to a narrower subtask.
Note: The one drawback is that while each agent uses fewer tokens, the total number of tokens used across all agents might be higher.
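Below is a minimal sketch of the "isolate" idea. The hard-coded plan and the stubbed agent call are placeholders for real LLM calls; the shape is the point: each subtask carries only the small slice of context it needs.

```python
from dataclasses import dataclass

@dataclass
class Subtask:
    instruction: str
    context: list[str]   # only the context this subtask needs

def plan_subtasks(request: str) -> list[Subtask]:
    # In a real system the lead agent would generate this plan from the request.
    return [
        Subtask("List AI-agent companies on the US West Coast", ["region: West Coast"]),
        Subtask("List AI-agent companies on the US East Coast", ["region: East Coast"]),
    ]

def run_subtask(task: Subtask) -> str:
    # Each call sees a small, focused prompt instead of the full conversation.
    prompt = f"{task.instruction}\n\nContext:\n" + "\n".join(task.context)
    return f"(stubbed agent answer; prompt was only {len(prompt)} characters)"

results = [run_subtask(t) for t in plan_subtasks("List 100 US companies working on AI agents")]
print(results)
```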
A Practical Multi-Agent System
Let's consider a high-level example of a multi-agent research system.
User Request: "What are all the companies in the United States working on AI agents in 2025? Make a list of at least 100, and for each, include [specific information]."
This request goes into the multi-agent system. Here's what happens:

1. Lead Agent (Orchestrator): This agent doesn't complete the task itself. Instead, it plans the work and delegates it to other specialized agents, providing each sub-agent with the tools and context it needs.
2. Search Sub-Agents: The lead agent spawns multiple sub-agents to complete the research tasks in parallel. This is a direct application of isolating context: by creating many subtasks, the system has a better chance of delivering what you want.
3. Memory: A shared memory system is used to retrieve the specific information each subtask needs, so every agent knows exactly what it's working with.
4. Citation Sub-Agent: Another specialized agent is responsible for finding and formatting citations for the information gathered.
This approach avoids giving a single, massive prompt and hoping for the best. You want the system to perform well, and providing all these contextual elements is how you achieve that.
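A toy version of this orchestration pattern might look like the following, with plain functions standing in for LLM-backed agents and every name invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Shared memory the sub-agents can consult for task-specific requirements.
shared_memory: dict[str, str] = {"style": "bulleted list, include funding stage"}

def search_sub_agent(topic: str) -> str:
    # Would normally call an LLM plus a web-search tool with its own isolated context.
    return f"findings for '{topic}' (formatted per: {shared_memory['style']})"

def citation_sub_agent(findings: list[str]) -> list[str]:
    # A specialized agent that attaches citations to what the searchers found.
    return [f"{f} [citation placeholder]" for f in findings]

def lead_agent(request: str) -> list[str]:
    # The orchestrator plans subtopics and fans them out to sub-agents in parallel.
    subtopics = ["AI coding agents", "AI research agents", "AI customer-support agents"]
    with ThreadPoolExecutor() as pool:
        findings = list(pool.map(search_sub_agent, subtopics))
    return citation_sub_agent(findings)

print(lead_agent("List US companies working on AI agents in 2025"))
```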
Final Thoughts: A Love-Hate Relationship
So, back to my original point. Why do I have mixed feelings about "context engineering"?
What I Don't Like: The main problem is that we've been doing this for a while. I dislike it when a new term emerges for an existing practice, making it seem like a new frontier in AI when it's not. This doesn't mean you have to drop everything and learn a "new" skill. It just means we can focus on doing what we already do, but better.
What I Really Like: On the other hand, I like it for the very same reason: we've already been doing it! It validates that many of us have been on the right track. If you've used modern AI assistants, you've already experienced this to some extent with their built-in memory systems. When you build your own AI agents, you can use frameworks that offer multiple memory types (long-term, short-term, entity memory, etc.) to achieve similar results.
Another thing I like is that if you want to build a production-grade application, you must perform context engineering. Imagine an automation flow that uses AI to summarize information and generate emails. You wouldn't just use a simple prompt. You would want to provide examples of past emails, examples of what the user doesn't like, and a list of things that absolutely cannot be part of the email. You're just bringing in context so the model can perform better.
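For instance, a hedged sketch of that email flow might assemble the context like this; every example string below is invented.

```python
def build_email_context(summary: str,
                        past_emails: list[str],
                        dislikes: list[str],
                        forbidden: list[str]) -> str:
    # The model gets examples, dislikes, and hard constraints, not just "write an email".
    return "\n\n".join([
        "Write an email based on the summary below.",
        f"Summary:\n{summary}",
        "Examples of emails the user liked:\n" + "\n---\n".join(past_emails),
        "Things the user dislikes:\n" + "\n".join(f"- {d}" for d in dislikes),
        "Never include:\n" + "\n".join(f"- {f}" for f in forbidden),
    ])

prompt = build_email_context(
    summary="Q3 numbers are up 12%; the team meeting moved to Friday.",
    past_emails=["Hi team, quick update on where we landed this quarter..."],
    dislikes=["overly formal greetings"],
    forbidden=["pricing details", "legal commitments"],
)
print(prompt)
```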
Ultimately, as much as I might critique context engineering for being a new term for an old concept, it's a helpful framing. It brings to light that while prompt engineering is still needed, the broader practice of context engineering is what will allow us to build more powerful and reliable AI systems for the future.