Context Engineering Explained in 5 Minutes
We aren't talking clearly enough about context engineering, and we're getting it wrong in some important ways. If you're unfamiliar with the term, context engineering is the successor to prompt engineering. It acknowledges that large language models consider much more than just the initial prompt. They analyze system instructions, chat rules, and any uploaded documents. The engineer's responsibility is to ensure all this context is correct and leads to the desired outcome.
So far, so good. The issue is that most of the dialogue around context engineering focuses on what could be called the smaller part: the things we can deterministically control.
Part 1: The Deterministic Context We Control
Numerous papers and shared advice focus on how to efficiently shrink and use the context window sent directly to the large language model. This assumes communication with a cloud-based model where token burn is a primary concern.
A famous example is the paper on "chain of draft," which suggests that an LLM can save tokens by using its own symbols and shorthand for logical thinking instead of full English sentences. This method saves a significant number of tokens while maintaining nearly the same quality, as the act of writing things down helps the LLM "think" more clearly.
This entire domain—static prompts, knowledge bases, documentation, and data feeds—is what we call deterministic context. It's the part we can directly control, but it's also the smaller part.
Part 2: The Unpredictable Probabilistic Context
The larger, less-discussed part of context engineering is probabilistic context. When you grant an LLM any form of web access, you introduce a vast, uncontrollable context. Your deterministic context becomes a drop in the bucket compared to the probabilistic context the model can acquire.
For instance, if a multi-agent system like Claude Opus is tasked with researching a topic based on a single document, it might consult over 500+ websites. In this scenario, the original document and prompt are a minuscule percentage of the total tokens processed. The model maintains focus only because it's trained to prioritize the user's request.
This transfers the responsibility for shaping the model's context-gathering from a deterministic input to the prompt itself. The prompt becomes probabilistic—it doesn't control the context, but it shapes the agent's search for it. The focus shifts from cost-cutting through token optimization to achieving more correct and congruent answers.
Key Principles for Modern Context Engineering
As we move toward a world of web-connected, increasingly autonomous agents, we must understand the impact of our prompts on the overall probabilistic context. Here are several principles to guide this new approach:
Expect Discovery: Design for "semantic highways." Measure the rate at which a desired response appears when you include probabilistic context. Can you write prompts that consistently yield good results even when the context window is open to the web?
Monitor Information Quality: Reliably monitor the quality of the information sources the agent uses. If you ask it to use "reliable and verified news sites," does it? Auditing the sources is crucial, even if the final output seems correct. It's alarming how often an agent can produce a good result from sketchy sources.
Prioritize Security: With probabilistic context, security is paramount. LLM injection attacks from agents searching the web and MCP servers are inevitable. It's surprising this hasn't become a more widespread issue already, and we must anticipate it.
Measure Decision Accuracy: Instead of traditional precision and recall, which assume a deterministic context, focus on relevance scoring for the inputs. The quality of the sources within the probabilistic context is a better predictor of the final output's quality. This may require a dedicated evaluation harness.
Version Everything: This principle is straightforward but critical. You must carefully test and version your prompts to manage their impact over time.
The Future is Agentic
These principles point toward a future where we acknowledge the security threats of the open web, understand that larger context windows can lead to better decisions, and design our evaluations around source quality.
The most important aspect of the deterministic context we control is not token efficiency, but its ability to shape the probabilistic window. Simple constraints like "go search verified news sites" or "go search academic articles" are a start, but our evaluation harnesses must evolve. They need to handle a world where deterministic context is just a small part of the equation.
This is critical because chatbots are no longer just large language models; they are agents in a trench coat. Most frontline AI experiences use agent-like behaviors on the backend. Our context engineering practices must catch up to this agentic future and focus on deliberately engineering context even when we can't control all the pieces.