Fast Mode - Claude Code
2.5x faster Opus at a higher token cost (research preview).
Fast mode runs Opus with an accelerated inference path - roughly 2.5x the throughput at a higher per-token price.
What it does
When fast mode is enabled, Claude Code routes Opus calls through a lower-latency backend. You pay more per token, but turns complete faster. Quality matches standard Opus. It's a straight speed-for-cost tradeoff for sessions where wall-clock time matters more than spend.
When to use it
- Interactive work where latency hurts flow.
- Pair-programming sessions where Claude needs to keep up with you.
- Time-critical debugging or incident response.
- Demos and recordings where dead air looks bad.
Gotchas
- Fast mode is a research preview. Availability and pricing can change.
- Cost can balloon on long sessions. Watch your
/statusregularly. - Only Opus is accelerated. Sonnet and Haiku ignore the flag.
Official docs: https://code.claude.com/docs/en/fast-mode.md
Technical content at the intersection of AI and development. Building with AI agents, Claude Code, and modern dev tools - then showing you exactly how it works.
Get the weekly deep dive
Tutorials on Claude Code, AI agents, and dev tools - delivered free every week.
Was this helpful?




