
TL;DR
Efficient agents do not stuff every tool result into the model context. They keep intermediate state in code, files, and execution environments, then return compact summaries and receipts.
Most agent systems waste context by default.
They call a tool. The tool returns a large JSON blob. The model reads the blob, chooses the next tool, gets another large blob, and repeats. After a few steps, the context window is full of intermediate data the agent no longer needs.
The result is familiar: slower runs, higher costs, worse reasoning, and failures that look mysterious until you inspect the transcript.
The fix is simple but underused: keep intermediate state outside the model context.
Anthropic's recent engineering work around code execution with MCP points at this pattern. Instead of making the model directly inspect every row, page, or event, give the agent an execution environment where it can write small programs, process data locally, and return only the answer, the evidence, and the receipt.
For the foundation, read progressive disclosure in Claude Code and the context engineering guide. This post is the implementation pattern.
A naive agent loop looks like this:
agent -> list all customers
tool -> returns 10,000 rows
agent -> filter for accounts with failed invoices
tool -> returns 1,200 rows
agent -> inspect invoice events
tool -> returns 30,000 events
agent -> summarize failures
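In code, the naive loop looks something like this. The tool functions and row counts are hypothetical stand-ins for real API calls, but the shape of the problem is accurate: every raw result gets serialized straight into the prompt.

```python
import json

# Hypothetical tools -- in a real system these would be API or database calls.
def list_customers():
    return [{"id": f"customer_{i}", "plan": "pro"} for i in range(10_000)]

def list_invoice_events(customer_ids):
    return [{"customer": c, "event": "payment_failed"}
            for c in customer_ids for _ in range(25)]

# Naive loop: each raw tool result is appended to the model context verbatim.
messages = []
customers = list_customers()
messages.append({"role": "tool", "content": json.dumps(customers)})  # 10,000 rows in context

failed = [c["id"] for c in customers[:1200]]  # pretend 1,200 accounts have failures
events = list_invoice_events(failed)
messages.append({"role": "tool", "content": json.dumps(events)})  # 30,000 events in context

context_chars = sum(len(m["content"]) for m in messages)
print(f"context size: {context_chars:,} characters")
```

Two tool calls and the context already holds millions of characters of data the agent will never need again.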
The model context becomes a data warehouse. That is not what a language model is good at.
Even if the context window is large enough, the reasoning quality suffers. The model has to search through raw data, remember which parts matter, and avoid being distracted by irrelevant fields.
Large context is useful. It is not a substitute for data processing.
A better loop gives the agent a place to compute:
agent -> write a script that queries customers, joins invoices, filters failures, and outputs a compact report
execution environment -> runs the script
tool -> returns summary, counts, source IDs, and errors
agent -> reasons from the report
The model does not need every row. It needs the result and enough evidence to trust it.
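Here is a sketch of the kind of script the agent might write, with the data layer stubbed out (the query functions and failure rates are illustrative). All the row-by-row work happens in code; only the compact report is printed back to the model.

```python
import json

# Stubbed data access -- stands in for real database or API queries.
def query_customers():
    return [{"id": f"customer_{i}"} for i in range(10_421)]

def query_invoices(customer_id):
    # Pretend a handful of customers have a failed invoice.
    n = int(customer_id.split("_")[1])
    status = "failed" if n > 0 and n % 1500 == 0 else "paid"
    return [{"id": f"inv_{n}", "status": status}]

# The heavy lifting: scan every customer and invoice outside the model context.
failures = []
for c in query_customers():
    for inv in query_invoices(c["id"]):
        if inv["status"] == "failed":
            failures.append((c["id"], inv["id"]))

report = {
    "summary": f"Found {len(failures)} failed invoices across "
               f"{len({c for c, _ in failures})} customers.",
    "counts": {"customersScanned": 10_421, "failedInvoices": len(failures)},
    "evidence": [f"{c} invoice {i}" for c, i in failures[:5]],  # capped sample
}
print(json.dumps(report, indent=2))
```

Ten thousand rows go in; a few hundred characters come out.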
The difference is not cosmetic. It changes the shape of the whole agent system: token usage stops scaling with data size, the model reasons over summaries instead of raw rows, and every step leaves auditable files and receipts behind.
That is the 98% context reduction pattern.
Move these out of the model context whenever possible: raw query results, full file contents, large tool outputs, and any intermediate data that exists only to produce the next step.
Keep these in context: decisions, compact summaries, counts, a small sample of evidence IDs, and pointers to where the raw data lives.
The model should reason. Code should crunch.
The most underrated memory primitive is still the filesystem.
If an agent processes 50 files, it does not need to paste the full contents of all 50 into context. It can write:
.agent-work/
  findings.json
  failing-tests.txt
  candidate-files.txt
  summary.md
Then it can read the compact summary when needed. The raw evidence remains available without living in the prompt forever.
This is why local coding agents feel powerful. They can use files as durable scratch space. The context window becomes the active working set, not the entire workspace.
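A minimal sketch of the scratch-space pattern, using a temporary directory in place of a real repo (the file names match the layout above; the findings data is made up):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical raw analysis output -- far too big to paste into the prompt.
raw_findings = [{"file": f"src/module_{i}.py", "issues": i % 3} for i in range(50)]

workdir = Path(tempfile.mkdtemp()) / ".agent-work"
workdir.mkdir()

# Durable evidence lives on disk...
(workdir / "findings.json").write_text(json.dumps(raw_findings))

# ...while only a compact summary is read back into context.
flagged = [f["file"] for f in raw_findings if f["issues"] > 0]
(workdir / "summary.md").write_text(
    f"Scanned {len(raw_findings)} files; {len(flagged)} have open issues.\n"
)

summary = (workdir / "summary.md").read_text()
print(summary)
```

The full findings stay retrievable in findings.json; the prompt only ever carries the one-line summary.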
A good reduced-context tool response should include:
{
  "summary": "Found 18 failed invoices across 7 customers.",
  "counts": {
    "customersScanned": 10421,
    "failedInvoices": 18,
    "affectedCustomers": 7
  },
  "evidence": [
    "customer_123 invoice inv_456",
    "customer_789 invoice inv_999"
  ],
  "filesWritten": [
    ".agent-work/failed-invoices.csv",
    ".agent-work/invoice-summary.md"
  ],
  "nextSuggestedAction": "Inspect payment provider webhook logs for these invoice IDs."
}
The model can act on that. The human can audit it. The raw data is still available if deeper inspection is needed.
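One way to enforce that shape is a small helper that caps the evidence list, so a receipt can never balloon back into a data dump. This is a sketch, not a library API; the field names simply follow the example above.

```python
def make_receipt(summary, counts, evidence, files_written, next_action,
                 max_evidence=10):
    """Build a compact tool response; raw data stays behind in files_written."""
    return {
        "summary": summary,
        "counts": counts,
        "evidence": list(evidence)[:max_evidence],  # hard cap on inline samples
        "filesWritten": files_written,
        "nextSuggestedAction": next_action,
    }

receipt = make_receipt(
    summary="Found 18 failed invoices across 7 customers.",
    counts={"customersScanned": 10421, "failedInvoices": 18, "affectedCustomers": 7},
    evidence=[f"customer_{i} invoice inv_{i}" for i in range(18)],
    files_written=[".agent-work/failed-invoices.csv"],
    next_action="Inspect payment provider webhook logs for these invoice IDs.",
)
print(len(receipt["evidence"]))  # capped at 10 even though 18 failures exist
```

The counts still report all 18 failures; the inline evidence is just a sample, and the CSV on disk holds the rest.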
You do not need a new framework.
For Claude Code, ask it to write analysis scripts and keep raw outputs in files. For MCP servers, expose workflow tools that process data server-side and return receipts. For custom agent apps, add a workspace directory and persist intermediate state between steps.
The architecture is boring: the model decides, a script runs in a workspace, raw output lands in files, and a compact receipt comes back into context.
That boring pattern is what makes long-running agents cheaper and more reliable.
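As a sketch of how small that loop can be: run the agent's script in a throwaway workspace, capture stdout as the receipt, and leave everything else on disk. The runner below is illustrative, not a real framework.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

def run_step(script: str) -> dict:
    """Execute an agent-written script in a workspace; only stdout returns to context."""
    workspace = Path(tempfile.mkdtemp())
    script_path = workspace / "step.py"
    script_path.write_text(script)
    result = subprocess.run(
        [sys.executable, str(script_path)],
        cwd=workspace, capture_output=True, text=True, timeout=60,
    )
    # The model sees the receipt; intermediate files stay in the workspace.
    return json.loads(result.stdout)

receipt = run_step(
    'import json; print(json.dumps('
    '{"summary": "3 checks passed", "counts": {"passed": 3}}))'
)
print(receipt["summary"])
```

Anything the script writes under its working directory persists between steps; only the JSON it prints ever reaches the model.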
Context is not a trash can.
Efficient agents keep the model focused on decisions and keep intermediate state in the systems built for it: code, files, databases, logs, and traces. The best agent architectures do not ask the model to remember everything. They give it a reliable way to retrieve what matters.
That is how you cut context without cutting capability.