Progressive Disclosure: How Claude Code Cut Token Usage by 98%

In September 2025, Cloudflare published a blog post titled "Code Mode: The Better Way to Use MCP." It contained a single, devastating observation: we've been using MCP wrong.
The problem wasn't theoretical. When you load MCP tool definitions directly into an LLM's context window, you're forcing the model to see every available tool for every request, whether it needs them or not. Most of the time, those tools sit idle, burning tokens for nothing.
Cloudflare's insight was radical: models are excellent at writing code, and only middling at driving MCP directly. So why not let the model write TypeScript to find and call the tools it needs, instead of embedding all the schemas upfront?
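A minimal sketch of the idea, where `ToolBinding` is a hypothetical stand-in for whatever API the sandbox actually exposes (Cloudflare's real bindings differ):

```typescript
// Hypothetical sandbox binding, invented for illustration.
interface ToolBinding {
  listTools(): Promise<{ name: string; description: string }[]>;
  callTool(name: string, args: Record<string, unknown>): Promise<unknown>;
}

// The model writes code like this instead of emitting tool-call JSON.
async function research(mcp: ToolBinding, query: string): Promise<unknown> {
  // Discover tools at runtime rather than loading every schema upfront.
  const tools = await mcp.listTools();
  const search = tools.find((t) => t.name.includes("search"));
  if (!search) throw new Error("no search tool available");

  // Only the tool actually used enters the conversation; idle tools cost nothing.
  return mcp.callTool(search.name, { query });
}
```

Tools the model never touches never enter the context window at all.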
Three months later, Anthropic and Cursor independently arrived at the same conclusion. The pattern now has a name: progressive disclosure.
The Numbers Don't Lie

Anthropic's tool search feature shows the math clearly. Using a full MCP tool library with every definition loaded upfront consumed 77,000 tokens. With tool search, discovering tools on demand, that dropped to 8,700 tokens. That's a reduction of roughly 89% while maintaining access to the entire tool library.
Accuracy improved too. In MCP evaluations:
- Opus 4: 49% → 74%
- Opus 4.5: 79.5% → 88.1%
Cursor reported similar wins. By implementing dynamic context discovery, they achieved a 46.9% reduction in total agent tokens. One week later, Cloudflare dropped their findings: a 98.7% reduction in token usage using TypeScript sandboxes instead of MCP schemas.
This isn't incremental optimization. This is a paradigm shift.
The Shift from GPUs to Sandboxes
Six months ago, the industry obsessed over inference speed and GPU efficiency. The conversation has moved on. Cloudflare, Anthropic, Vercel, Cursor, Daytona, and Lovable are all converging on the same infrastructure: sandboxes, file systems, and bash.
The pattern is elegant. Instead of tokenizing every tool definition, you give agents three things:
- A file system (read, write, search)
- Bash (execute commands, run scripts)
- Code execution (call MCP servers on demand)
The agent's job becomes simple: discover what you need, load it, use it. No context bloat. No unused tool schemas. No wasted tokens.
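Concretely, the loop can be as small as this sketch, which assumes a hypothetical layout of one directory per tool, each holding a short manifest.md summary:

```typescript
import { readdirSync, readFileSync, existsSync } from "node:fs";
import { join } from "node:path";

// Discover → load → use: scan cheap summaries, return the one match.
function discoverTool(root: string, keyword: string): string | null {
  for (const entry of readdirSync(root, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue;
    const manifest = join(root, entry.name, "manifest.md");
    if (!existsSync(manifest)) continue;
    // Scan only the short summary; the full implementation enters
    // context later, and only if this tool is actually chosen.
    const summary = readFileSync(manifest, "utf8");
    if (summary.toLowerCase().includes(keyword.toLowerCase())) {
      return manifest;
    }
  }
  return null;
}
```

Scanning a summary costs a few dozen tokens per candidate; the expensive content loads only after a match.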
How to Build This in Claude Code

Claude Code implements progressive disclosure through skills. A skill is a markdown file (SKILL.md) with YAML frontmatter (the summary) and references to the actual scripts and supporting files (the implementation).
Here's the pattern:
```markdown
---
name: "Web Research"
description: "Search and summarize web content using Firecrawl"
---

## Usage
Call this skill when you need current web information.

## Implementation
- [[firecrawl.sh]] - Core search and scraping
- [[research-template.md]] - Output format
```
The agent sees only the frontmatter in context (10-30 tokens). When it invokes the skill, it reads the full implementation, and only then. Scale to 1,000 skills or 10,000 skills and the upfront context cost barely moves.
You can nest skills hierarchically. A skill can reference sub-skills. An agent can walk the directory structure, find what it needs, and load only that.
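A hypothetical layout, with the sub-skill name invented for illustration:

```
skills/
  web-research/
    SKILL.md              # summary frontmatter, cheap to scan
    firecrawl.sh
    research-template.md
    source-vetting/       # nested sub-skill, loaded only if needed
      SKILL.md
```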
Advanced Tool Use: Memory and Code Execution

Anthropic's advanced tool use releases included two other pieces that complete the picture:
Programmatic Tool Calling: Tool results no longer have to stream straight into the conversation. Calls execute in a code environment, so the agent can inspect output, transform it, and chain operations without routing every intermediate result through the context window.
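The shape of the win, in a sketch; the `callTool` binding and the `ci.listRuns` tool are invented for illustration, not Anthropic's API:

```typescript
// Hypothetical binding into the code environment.
type CallTool = (name: string, args: object) => Promise<any>;

async function topFailingTests(callTool: CallTool) {
  const runs = await callTool("ci.listRuns", { branch: "main", limit: 100 });
  // Filter and aggregate in code instead of streaming 100 raw run
  // objects through the model's context window.
  const counts = new Map<string, number>();
  for (const run of runs.filter((r: any) => r.status === "failed")) {
    for (const test of run.failedTests) {
      counts.set(test, (counts.get(test) ?? 0) + 1);
    }
  }
  // Only this small, digested result costs context tokens.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, 5);
}
```

A hundred raw results stay in the sandbox; five numbers come back.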
Memory Tool: Not embeddings. Not vector databases. Just files. Markdown documents stored in the file system, read and updated as needed. Simple. Searchable. Manageable.
The principle extends to Claude Code. Instead of complex vector retrieval, read sections of files on demand. Update a memory.md when something matters. Let the agent grep and find its way to what it needs. It works.
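A minimal file-based sketch, assuming a single memory.md; the real memory tool manages its own files and formats:

```typescript
import { appendFileSync, readFileSync, existsSync } from "node:fs";

const MEMORY = "memory.md";

function remember(note: string): void {
  // Append a dated bullet the agent can grep for later. No embeddings.
  const date = new Date().toISOString().slice(0, 10);
  appendFileSync(MEMORY, `- ${date}: ${note}\n`);
}

function recall(keyword: string): string[] {
  if (!existsSync(MEMORY)) return [];
  // Plain substring search stands in for grep here.
  return readFileSync(MEMORY, "utf8")
    .split("\n")
    .filter((line) => line.toLowerCase().includes(keyword.toLowerCase()));
}
```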
What This Enables
Before progressive disclosure, agent tasks had to be small and contained. You watched token limits. You minimized tool use. You feared the context reset.
Now:
- Multi-hour workflows without context resets
- Hundreds or thousands of tool integrations available instantly
- Complex orchestration without hand-written orchestration logic: if the agent can look up tools and skills, it handles the complexity itself
- Autonomous systems that run for extended periods
Context is no longer the bottleneck.
The Experimental MCP CLI Flag
Cloudflare and Anthropic's approach inspired an experimental feature in Claude Code: the MCP CLI flag. When enabled, instead of embedding all MCP schemas in context, the model uses tool search to discover and invoke servers on demand.
Is it perfect? Not yet. It's actively being refined. But the direction is clear: near-zero upfront context cost for tools. Tens of thousands of tokens saved per request.
The Convergence

What's remarkable is that Cloudflare, Anthropic, Cursor, and others arrived here independently. No coordination. Same conclusion: tools as files, loaded on demand, bash is all you need.
This wasn't what anyone predicted six months ago. It's counterintuitive. Most of us assumed you'd load everything up front. But the data is overwhelming.
The industry is converging on the same answer: progressive disclosure works.
Build Boldly
If you've been cautious about Claude Code's scope because of context limits, stop. The bottleneck just moved. File systems, bash, and progressive disclosure unlock agents that can tackle ambitious, complex work without the orchestration overhead that held us back before.
Give the agent a file system. Get out of the way. Let it discover what it needs. The results speak for themselves.
Further Reading
- Cloudflare Code Mode — How TypeScript sandboxes beat MCP schema bloat
- Anthropic Advanced Tool Use — Tool search, programmatic calling, memory tools
- Cursor's Dynamic Context Discovery — 46.9% token reduction in practice
- Claude Code Skills — Implementation guide


