
TL;DR
Efficient agents do not stuff every tool result into the model context. They keep intermediate state in code, files, and execution environments, then return compact summaries and receipts.
| Source | Description |
|---|---|
| Effective Context Engineering for AI Agents | Anthropic's engineering post on context management strategies for building reliable agent systems |
| Code Execution with MCP | Anthropic's engineering post showing a 98.7% token reduction by processing MCP tool results in code instead of model context |
| Claude Code Overview | Official documentation for Claude Code - the agentic coding tool that implements context reduction patterns |
| Model Context Protocol Specification | The MCP spec defining how tools communicate with AI agents - the protocol layer where context reduction happens |
| MCP Architecture | MCP architectural concepts including how servers can process data and return compact results |
Most agent systems waste context by default.
They call a tool. The tool returns a large JSON blob. The model reads the blob, chooses the next tool, gets another large blob, and repeats. After a few steps, the context window is full of intermediate data the agent no longer needs.
The result is familiar: slower runs, higher costs, worse reasoning, and failures that look mysterious until you inspect the transcript.
The fix is simple but underused: keep intermediate state outside the model context.
Anthropic's engineering post on code execution with MCP describes this pattern and reports a 98.7% token reduction from processing tool results in code instead of in the model context. Instead of making the model directly inspect every row, page, or event, give the agent an execution environment where it can write small programs, process data locally, and return only the answer, the evidence, and the receipt.
For the foundations, read about progressive disclosure in Claude Code and the context engineering guide. This post covers the implementation pattern.
A naive agent loop looks like this:
```
agent -> list all customers
tool  -> returns 10,000 rows
agent -> filter for accounts with failed invoices
tool  -> returns 1,200 rows
agent -> inspect invoice events
tool  -> returns 30,000 events
agent -> summarize failures
```
The model context becomes a data warehouse. That is not what a language model is good at.
Even if the context window is large enough, the reasoning quality suffers. The model has to search through raw data, remember which parts matter, and avoid being distracted by irrelevant fields.
Large context is useful. It is not a substitute for data processing.
A better loop gives the agent a place to compute:
```
agent                 -> write a script that queries customers, joins invoices, filters failures, and outputs a compact report
execution environment -> runs the script
tool                  -> returns summary, counts, source IDs, and errors
agent                 -> reasons from the report
```
The model does not need every row. It needs the result and enough evidence to trust it.
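A minimal sketch of the compute-side script in that loop, assuming a SQL database with hypothetical `customers` and `invoices` tables (an in-memory SQLite database stands in for the real one). The function name, schema, and `max_evidence` cap are illustrative, not taken from any particular system. The join and filter run entirely outside the model; only the returned dictionary would go back into context.

```python
import sqlite3


def failed_invoice_report(conn: sqlite3.Connection, max_evidence: int = 5) -> dict:
    """Join customers to invoices, keep the failed ones, and return a compact report."""
    scanned = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
    rows = conn.execute(
        """
        SELECT c.id, i.id
        FROM customers AS c
        JOIN invoices AS i ON i.customer_id = c.id
        WHERE i.status = 'failed'
        ORDER BY c.id, i.id
        """
    ).fetchall()
    affected = {customer_id for customer_id, _ in rows}
    return {
        "summary": f"Found {len(rows)} failed invoices across {len(affected)} customers.",
        "counts": {
            "customersScanned": scanned,
            "failedInvoices": len(rows),
            "affectedCustomers": len(affected),
        },
        # A handful of IDs as evidence; the model never sees the full row set.
        "evidence": [f"{c} invoice {i}" for c, i in rows[:max_evidence]],
    }


def sample_db() -> sqlite3.Connection:
    """Hypothetical in-memory data standing in for a real billing database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id TEXT PRIMARY KEY);
        CREATE TABLE invoices (id TEXT PRIMARY KEY, customer_id TEXT, status TEXT);
        INSERT INTO customers VALUES ('customer_123'), ('customer_456'), ('customer_789');
        INSERT INTO invoices VALUES
            ('inv_456', 'customer_123', 'failed'),
            ('inv_457', 'customer_123', 'paid'),
            ('inv_999', 'customer_789', 'failed');
        """
    )
    return conn


report = failed_invoice_report(sample_db())
print(report["summary"])
```

Against a real database only the connection changes; the shape of what comes back stays small no matter how many rows the query touches.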
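The difference is not cosmetic. It changes the shape of the whole agent system: the model context holds the goal, the plan, and compact evidence, while the raw data stays in the execution environment.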
That is the 98% context reduction pattern.
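A rough way to see the scale of the saving is to serialize the raw rows and the compact report and compare their sizes. This sketch uses character counts as a crude stand-in for tokens and made-up data, so the number illustrates the order of magnitude; it does not reproduce Anthropic's measurement. `context_reduction` is a hypothetical helper name.

```python
import json


def context_reduction(raw: object, compact: object) -> float:
    """Fraction of serialized characters saved by returning `compact` instead of `raw`."""
    return 1 - len(json.dumps(compact)) / len(json.dumps(raw))


# Hypothetical raw tool output: 10,000 customer rows with a few fields each.
raw_rows = [
    {"id": f"customer_{n}", "plan": "pro", "region": "us-east", "status": "active"}
    for n in range(10_000)
]
receipt = {
    "summary": "Found 18 failed invoices across 7 customers.",
    "counts": {"customersScanned": 10_000, "failedInvoices": 18, "affectedCustomers": 7},
}
print(f"{context_reduction(raw_rows, receipt):.2%} fewer characters in context")
```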
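Move these out of the model context whenever possible: full query results, raw API responses, large logs, dependency trees, generated files, repeated tool schemas, and verbose test output after the first failure.
Keep these in context: the user goal, relevant constraints, the current plan, compact findings, source references, and the next decision.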
The model should reason. Code should crunch.
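One way to apply this to large outputs is a small helper that writes the full text to a file and hands the model only the first few lines plus the path. `offload` and its return fields are hypothetical names for this sketch, and the 1,000-line log is made-up data.

```python
import tempfile
from pathlib import Path


def offload(text: str, path: Path, keep_lines: int = 20) -> dict:
    """Write the full text to disk and return only a short head plus a pointer to it."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    lines = text.splitlines()
    return {
        "path": str(path),
        "totalLines": len(lines),
        "head": lines[:keep_lines],  # the only raw lines that reach the model
        "truncated": len(lines) > keep_lines,
    }


# Hypothetical verbose output, e.g. a long test or build log.
log = "\n".join(f"line {n}" for n in range(1_000))
workdir = Path(tempfile.mkdtemp())
result = offload(log, workdir / ".agent-work" / "test-output.log", keep_lines=3)
print(result["head"], result["truncated"], result["path"])
```

The model can still open the file if the head is not enough; the raw lines just do not sit in the prompt by default.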
The most underrated memory primitive is still the filesystem.
If an agent processes 50 files, it does not need to paste the full contents of all 50 into context. It can write:
```
.agent-work/
  findings.json
  failing-tests.txt
  candidate-files.txt
  summary.md
```
Then it can read the compact summary when needed. The raw evidence remains available without living in the prompt forever.
This is why local coding agents feel powerful. They can use files as durable scratch space. The context window becomes the active working set, not the entire workspace.
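A minimal sketch of that scratch-space pattern, assuming findings are plain dictionaries with hypothetical `file` and `issues` keys. The full findings go to `findings.json`, and only a short `summary.md` is meant to be read back into context. `write_findings` is an illustrative name, and the 50 findings are made up.

```python
import json
import tempfile
from pathlib import Path


def write_findings(workdir: Path, findings: list) -> Path:
    """Persist full findings as JSON plus a short Markdown summary; return the summary path."""
    workdir.mkdir(parents=True, exist_ok=True)
    (workdir / "findings.json").write_text(json.dumps(findings, indent=2), encoding="utf-8")
    flagged = [f for f in findings if f["issues"]]
    lines = ["# Summary", "", f"Scanned {len(findings)} files; {len(flagged)} have issues."]
    lines += [f"- {f['file']}: {len(f['issues'])} issue(s)" for f in flagged]
    summary_path = workdir / "summary.md"
    summary_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return summary_path


# Hypothetical per-file findings for 50 files; every tenth one has an issue.
findings = [
    {"file": f"src/module_{n}.py", "issues": ["unused import"] if n % 10 == 0 else []}
    for n in range(50)
]
workdir = Path(tempfile.mkdtemp()) / ".agent-work"
summary_path = write_findings(workdir, findings)
# Only this short file goes back into the model context.
print(summary_path.read_text(encoding="utf-8"))
```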
A good reduced-context tool response should include a summary, counts, evidence, the files written, and a suggested next action:
```json
{
  "summary": "Found 18 failed invoices across 7 customers.",
  "counts": {
    "customersScanned": 10421,
    "failedInvoices": 18,
    "affectedCustomers": 7
  },
  "evidence": [
    "customer_123 invoice inv_456",
    "customer_789 invoice inv_999"
  ],
  "filesWritten": [
    ".agent-work/failed-invoices.csv",
    ".agent-work/invoice-summary.md"
  ],
  "nextSuggestedAction": "Inspect payment provider webhook logs for these invoice IDs."
}
```
The model can act on that. The human can audit it. The raw data is still available if deeper inspection is needed.
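If tool responses are built in code, a small typed container keeps them consistent. This is a sketch, not a standard schema: the `Receipt` class and `to_json` method are hypothetical, the field names mirror the JSON response shape in this post, the evidence IDs are placeholders, and the evidence cap is an assumption added so a large result set cannot slip back into context.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Receipt:
    """Compact tool response: what happened and where to verify it.

    Field names are camelCase to match the JSON response shape.
    """

    summary: str
    counts: dict
    evidence: list = field(default_factory=list)
    filesWritten: list = field(default_factory=list)
    nextSuggestedAction: Optional[str] = None

    def to_json(self, max_evidence: int = 10) -> str:
        data = asdict(self)
        # Cap evidence; the full list belongs in a file named in filesWritten.
        data["evidence"] = data["evidence"][:max_evidence]
        return json.dumps(data, indent=2)


receipt = Receipt(
    summary="Found 18 failed invoices across 7 customers.",
    counts={"customersScanned": 10421, "failedInvoices": 18, "affectedCustomers": 7},
    # Hypothetical placeholder IDs for illustration.
    evidence=[f"customer_{n} invoice inv_{n}" for n in range(18)],
    filesWritten=[".agent-work/failed-invoices.csv", ".agent-work/invoice-summary.md"],
    nextSuggestedAction="Inspect payment provider webhook logs for these invoice IDs.",
)
print(receipt.to_json(max_evidence=3))
```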
You do not need a new framework.
For Claude Code, ask it to write analysis scripts and keep raw outputs in files. For MCP servers, expose workflow tools that process data server-side and return receipts. For custom agent apps, add a workspace directory and persist intermediate state between steps.
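For a custom agent app, one way to persist intermediate state between steps is a workspace wrapper that saves each step's result to disk, so a later step or a restarted run loads the saved result instead of redoing the work. `Workspace`, `step`, and `scan_invoices` are hypothetical names; the sketch assumes results are JSON-serializable and leaves out cache invalidation.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


class Workspace:
    """Hypothetical scratch directory that persists each step's result as JSON."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def step(self, name: str, compute: Callable[[], Any]) -> Any:
        """Return the saved result for `name` if present; otherwise compute, save, and return it."""
        path = self.root / f"{name}.json"
        if path.exists():
            return json.loads(path.read_text(encoding="utf-8"))
        result = compute()
        path.write_text(json.dumps(result), encoding="utf-8")
        return result


calls = []


def scan_invoices() -> dict:
    """Hypothetical expensive step; appends to `calls` so reruns are visible."""
    calls.append("scan")
    return {"failedInvoices": 18, "affectedCustomers": 7}


ws = Workspace(Path(tempfile.mkdtemp()) / ".agent-work")
first = ws.step("invoice-scan", scan_invoices)
second = ws.step("invoice-scan", scan_invoices)  # loaded from disk, not recomputed
print(first == second, len(calls))
```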
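The architecture is boring: an execution environment, a workspace directory, and tools that return receipts instead of raw payloads.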
That boring pattern is what makes long-running agents cheaper and more reliable.
Context is not a trash can.
Efficient agents keep the model focused on decisions and keep intermediate state in the systems built for it: code, files, databases, logs, and traces. The best agent architectures do not ask the model to remember everything. They give it a reliable way to retrieve what matters.
That is how you cut context without cutting capability.
The 98% context reduction pattern is an agent architecture approach where intermediate data - database query results, API responses, raw logs - stays outside the model's context window. Instead of pasting full datasets into the prompt, agents write scripts that process data in an execution environment and return compact summaries with evidence links. The model reasons from summaries rather than raw data, cutting context usage by 90-98% on data-heavy tasks while improving reasoning quality.
Large context windows do not equal better reasoning. When models process thousands of rows or pages of raw data, they get distracted by irrelevant fields, lose track of the goal, and make extraction errors. Context reduction forces the agent to process data programmatically - where code is reliable - and reserve the model for decisions where reasoning matters. The result is fewer hallucinations, faster runs, and lower token costs.
Ask Claude Code to write analysis scripts rather than reading data directly. For example, instead of "show me all failing tests," say "write a script that runs the test suite, captures failures, and writes a summary to .agent-work/failures.md." Claude Code will create the script, run it, and read the compact output. The raw test output stays in files, not in context. This pattern scales to database queries, log analysis, and any data-heavy task.
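A sketch of the kind of script that prompt might produce, assuming the test runner prints pytest-style `FAILED` lines. The `summarize_failures` helper is hypothetical, and a short Python command stands in for the real suite so the example runs anywhere. Raw output goes to a `failures.log` next to the summary, and only the short `failures.md` is meant for the model.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def summarize_failures(cmd: list, out_path: Path, max_listed: int = 20) -> dict:
    """Run a test command, keep the raw output on disk, and write a short failure summary."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    raw = proc.stdout + proc.stderr
    out_path.parent.mkdir(parents=True, exist_ok=True)
    raw_path = out_path.with_suffix(".log")
    raw_path.write_text(raw, encoding="utf-8")
    # Assumes pytest-style lines such as "FAILED tests/test_x.py::test_y - AssertionError".
    failures = [line for line in raw.splitlines() if line.startswith("FAILED ")]
    lines = [f"Exit code: {proc.returncode}", f"Failures: {len(failures)}", ""]
    lines += [f"- {line[len('FAILED '):]}" for line in failures[:max_listed]]
    lines += ["", f"Raw output: {raw_path}"]
    out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return {"exitCode": proc.returncode, "failures": len(failures), "summary": str(out_path)}


# Stand-in for a real suite such as [sys.executable, "-m", "pytest"]:
# prints two pytest-style FAILED lines and exits with status 1.
fake_suite = (
    "import sys;"
    "print('FAILED tests/test_billing.py::test_retry - AssertionError');"
    "print('FAILED tests/test_api.py::test_auth - KeyError');"
    "sys.exit(1)"
)
workdir = Path(tempfile.mkdtemp()) / ".agent-work"
result = summarize_failures([sys.executable, "-c", fake_suite], workdir / "failures.md")
print(result)
```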
Keep in context: the user goal, relevant constraints, the current plan, compact findings, source references, and the next decision. Move outside context: full query results, raw API responses, large logs, dependency trees, generated files, repeated tool schemas, and verbose test output after the first failure. The rule is that the model should reason about decisions while code handles data processing.
You do not need a specific framework. The pattern works with any agent that can write and execute code. Claude Code implements it natively - the agent writes scripts, runs them, and reads outputs. MCP servers can implement it by returning receipts instead of full payloads. Custom agent apps implement it by adding a workspace directory for intermediate state. The tooling is optional; the architecture is the point.
Files are durable and cheap. An agent processing 50 documents can write findings to .agent-work/findings.json instead of keeping all 50 documents in context. On subsequent turns, it reads the compact summary file rather than re-processing. The filesystem becomes a working memory that persists across reasoning steps without consuming context tokens. This is why local coding agents feel powerful - they have natural scratch space.
A receipt is a compact tool response that includes a summary, counts, evidence links, files written, and a suggested next action. It answers the question "what happened and where can I verify it?" without including the raw data. For example, a database query receipt might say "Found 18 failed invoices across 7 customers" with IDs and a CSV path, not 18 full invoice objects. The agent acts on the receipt; the human can audit the files.
The pattern still applies with long context windows, and it matters more as windows grow. A 200K-token window can hold more data, but that does not make the model better at processing data. Long context creates the illusion that stuffing everything into the prompt is fine. In practice, agents with long context and no reduction pattern still hit reasoning degradation, slower responses, and higher costs. Treat long context as headroom for complex reasoning, not storage for raw data.