Claude Code Fast Mode: When 2.5x Speed Is Worth 2x Price

Official Sources#

Resource	Link
Fast Mode Documentation	code.claude.com/docs/en/fast-mode
Claude Pricing	platform.claude.com/docs/en/about-claude/pricing
Claude Code Overview	code.claude.com/docs/en/overview
Prompt Caching	code.claude.com/docs/en/prompt-caching
Claude Code Changelog	github.com/anthropics/claude-code/blob/main/CHANGELOG.md

Last updated: June 11, 2026

Fast mode is the rare Claude Code feature where the whole decision comes down to arithmetic. Type /fast, and the same Opus model answers up to 2.5x faster at double the per-token price. No quality change, no different weights, just a different API configuration. Anthropic is explicit in the fast mode documentation: "Fast mode is not a different model."

That makes it a pure pricing question, and the pricing has real wrinkles: a one-time full-context charge when you first enable it mid-conversation, a separate rate limit pool that never touches plan usage, and a 6x premium on older Opus models. Every number below was verified against the live docs on June 11, 2026.

What Fast Mode Actually Is#

Fast mode is a research preview feature in Claude Code that serves Claude Opus through a higher-speed API configuration, "up to 2.5x faster at a higher cost per token" (verified June 11, 2026, fast mode docs). The basics:

Toggle it with /fast in the CLI, or set "fastMode": true in user settings
A ↯ icon appears next to the prompt while active
It requires Claude Code v2.1.36 or later and is not supported in the VS Code extension
It works on Opus 4.8, 4.7, and 4.6 only - not Sonnet, Haiku, or Fable 5
Enabling it from a non-Opus model switches you to Opus automatically

Since v2.1.154, fast mode defaults to Opus 4.8. On versions 2.1.142 through 2.1.153 it defaulted to Opus 4.7. That default matters more than it sounds, because the price gap between models is large.

Fast Mode Pricing: The Numbers#

Here is the head-to-head, with standard Opus rates from the Claude pricing page and fast mode rates from the same page plus the fast mode docs. All figures verified June 11, 2026.

Configuration	Input / MTok	Output / MTok	Multiple of standard Opus
Opus 4.8 standard	$5	$25	1x
Opus 4.8 fast mode	$10	$50	2x
Opus 4.7 / 4.6 standard	$5	$25	1x
Opus 4.7 / 4.6 fast mode	$30	$150	6x

Three things jump out of that table.

First, the Opus 4.8 deal is dramatically better. Fast mode launched on earlier Opus models at a 6x premium. The Week 22 digest announced the Opus 4.8 rate as "2x the standard rate for about 2.5x the speed," and the CHANGELOG calls it "a fraction of its previous cost" (both verified June 11, 2026). Opus 4.8 is the only configuration where the math is close.

Second, 2x price for up to 2.5x speed is favorable whenever wall-clock time is your bottleneck: the seconds cost less than the speedup delivers.

Third, the premium is flat across the full 1M token context window. The pricing page confirms it "applies across the full context window, including requests over 200k input tokens" (verified June 11, 2026). No long-context surcharge on top.

Two stacking rules from the same pricing page: prompt caching multipliers apply on top of fast mode rates (a cache read on Opus 4.8 fast mode works out to $1/MTok using the documented 0.1x read multiplier), and fast mode is not available with the Batch API. Verified June 11, 2026.

Newsletter

Get the weekly deep dive

Tutorials on Claude Code, AI agents, and dev tools, delivered free every week.

From the archive

Claude Code Routines vs Managed Agents Schedules: Where Recurring Agent Work Should Live

Jun 11, 2026 • 9 min read

Subagents vs Agent Teams vs Workflows: Claude Code's Parallelism Primitives, Compared

Jun 11, 2026 • 9 min read

Claude Fable 5 vs Gemini 3.1 Pro: The June 2026 Frontier Comparison

Jun 11, 2026 • 8 min read

The Claude Tokenizer Change: What ~30% More Tokens Means for Your Bill

Jun 11, 2026 • 8 min read

The First-Enable Charge: Why Session Start Matters#

This is the billing quirk that catches people, documented on the prompt caching page (verified June 11, 2026).

Enabling fast mode adds a request header that is part of the prompt cache key. The moment you toggle it on, your next request no longer matches any cached prefix, so the API re-reads your entire conversation history as uncached input, billed at the fast mode rate.

The practical math: 400,000 tokens deep in a session, your first /fast re-processes all 400K tokens at $10/MTok uncached fast mode input, roughly $4.00 for the toggle alone. Enable it on turn one, when context is a few thousand tokens, and the same toll is around a cent. That is why the docs say "enabling fast mode from the start is cheaper."

The good news: this is genuinely a one-time cost per conversation.

After the first fast mode turn, Claude Code keeps sending the header and varies only the speed setting, which is not part of the cache key
Toggling fast mode off and back on later does not repeat the charge
The automatic fallback to standard speed after a rate limit also keeps the cache intact
/clear and /compact reset the slate, since they rebuild the cache anyway

One version caveat: keeping the header across toggles requires Claude Code v2.1.86 or later; on older versions every toggle invalidates the cache. To watch these cache dynamics live, the techniques in our token burn and cache observability guide apply directly here.

Separate Limits, Separate Money#

Fast mode billing is unusual on subscription plans. Per the fast mode docs (verified June 11, 2026), for Pro, Max, Team, and Enterprise users it is available via usage credits only. Fast mode tokens never count against your plan's included usage, and they draw from usage credits from the very first token, even if you have plan capacity remaining.

That cuts both ways. Your plan budget is untouched - a long fast mode session will not push you toward the caps we cover in the Claude Code usage limits playbook. But every fast mode token is real, metered spend. If you have ever been surprised by a usage credits line item, fast mode is a likely culprit, and a spend dashboard like the one in the CodeBurn TUI walkthrough makes it visible.

Rate limits follow the same separation. Fast mode has its own pool, distinct from standard Opus, shared across Opus 4.8, 4.7, and 4.6 fast mode. When you hit it (or exhaust your usage credits), Claude Code degrades gracefully: it falls back to standard speed and pricing, the ↯ icon turns gray, and fast mode re-enables after the cooldown. You keep working the whole time. For how the newer model tiers interact with plan limits, see our breakdown of Claude usage limits in the Fable 5 era.

Where Fast Mode Is Not Available#

The availability list is short and worth checking before you build a habit around /fast. Per the fast mode requirements, verified June 11, 2026:

Available: Anthropic API (Console) accounts, and subscription plans (Pro/Max/Team/Enterprise) with usage credits turned on
Not available: Amazon Bedrock, Google Vertex AI, Microsoft Azure Foundry, and Claude Platform on AWS
Not in the VS Code extension: the /fast toggle is CLI-only today
Not in Managed Agents: the pricing page lists the fast mode premium as inapplicable there
Off by default for Team and Enterprise: until an admin enables it, users see "Fast mode has been disabled by your organization"

Admins get two extra controls: CLAUDE_CODE_DISABLE_FAST_MODE=1 kills the feature entirely, and the fastModePerSessionOptIn managed setting forces fast mode to start off in every session.

One deprecation note: fast mode on Opus 4.6 is deprecated, with removal roughly 30 days after the Opus 4.8 launch, after which it falls back to standard speed at standard pricing (verified June 11, 2026, fast mode docs).

Decision Guide by Persona#

The interactive debugger#

Turn it on, at session start. This is the designed use case: rapid iteration, live debugging, tight deadlines. You read every response before acting, so model latency is your real bottleneck, and 2x token cost on short interactive turns is small in absolute dollars. Enable it on turn one so the first-enable charge is negligible.

The Pro or Max subscriber watching the budget#

Use it deliberately, not by default. Fast mode bills usage credits from the first token, converting "included" work into metered spend. Keep it off, then toggle it on at the start of sessions where you know you will be in a tight read-respond loop. Since it persists across sessions once enabled, run /fast to check its state if unsure - the same kind of habit we recommend in our Claude Code tips and tricks roundup.

The agent and automation builder#

Skip it. Long autonomous runs, batch processing, and CI pipelines are exactly the workloads the docs steer toward standard mode. Nobody is watching the terminal, so latency is free, and output tokens are where agentic spend concentrates - at $50/MTok instead of $25/MTok on Opus 4.8, an overnight run doubles in cost for zero perceived benefit. Fast mode is also unavailable in the Batch API, which is the actual discount lever (50% off) for asynchronous work.

The Team or Enterprise admin#

Enable it with guardrails. Fast mode is off by default for your org. If your engineers do real interactive work, enabling it plus setting fastModePerSessionOptIn: true gives them speed on request while preventing forgotten always-on fast mode across dozens of concurrent sessions.

The Bedrock, Vertex, or Foundry user#

Nothing to decide. Fast mode is not available on those platforms today. The only current path is first-party access through the Anthropic API or a subscription plan.

When to Skip It Entirely#

An honest accounting of where fast mode is the wrong call:

You are deep in a long session. The first-enable charge bills your whole context at uncached fast mode rates. At 500K tokens of context, that is about $5 before the speedup does anything. Wait for the next session.
You are still on Opus 4.7 or 4.6. The 6x premium ($30/$150) is hard to justify when switching to Opus 4.8 gets the same speedup at 2x. Move models first.
Your bottleneck is not the model. If turns are dominated by test suites, builds, or tool calls, a faster model changes little. Lowering the effort level is the cheaper speed lever, and the docs note you can combine fast mode with lower effort when you genuinely need maximum speed.
Cost predictability matters more than minutes. Research preview pricing is explicitly subject to change, and usage credits are open-ended metered spend. On a fixed monthly budget, standard mode inside plan limits is the calmer choice.

The 30-second version: fast mode on Opus 4.8 at $10/$50 is a fair trade for humans in the loop, enabled at session start. It is a bad trade for agents, for late-session toggles, and for anything still pointed at an older Opus model.

FAQ#

How much does Claude Code fast mode cost?#

On Opus 4.8, fast mode costs $10 per million input tokens and $50 per million output tokens, 2x the standard Opus 4.8 rate of $5/$25. On Opus 4.7 and 4.6, it costs $30/$150, a 6x premium. Pricing is flat across the full 1M context window. Verified June 11, 2026 against the fast mode docs and the Claude pricing page.

Does fast mode count against my Pro or Max plan limits?#

No. On subscription plans, fast mode tokens bill to usage credits from the first token and never count against included plan usage. Usage credits must be turned on, and Team or Enterprise organizations also need an admin to enable fast mode.

Why is enabling fast mode mid-session more expensive?#

Fast mode adds a request header that is part of the prompt cache key, so the first fast mode request re-reads your entire conversation as uncached input at fast mode rates. The charge applies once per conversation: later toggles, rate limit fallbacks, and re-enables keep the cache (on Claude Code v2.1.86+). Enabling it at session start, when context is small, makes the charge negligible.

Is fast mode available on Amazon Bedrock or Google Vertex AI?#

No. Fast mode is only available through the Anthropic API (Console) and Claude subscription plans. It is not on Amazon Bedrock, Google Vertex AI, Microsoft Azure Foundry, or Claude Platform on AWS, and the /fast toggle is not supported in the VS Code extension.

Is fast mode a faster model or the same model?#

The same model. Anthropic documents fast mode as Claude Opus with a different API configuration that prioritizes speed over cost efficiency, with identical quality and capabilities. It runs on Opus 4.8, 4.7, and 4.6 only.

Sources#

Claude Code fast mode documentation - accessed June 11, 2026
Claude Code What's New, Week 22 (May 25-29, 2026) - accessed June 11, 2026
Claude API pricing reference - accessed June 11, 2026
Claude Code CHANGELOG - accessed June 11, 2026
How Claude Code uses prompt caching - accessed June 11, 2026

Official Sources#

Resource	Link
Fast Mode Documentation	code.claude.com/docs/en/fast-mode
Claude Pricing	platform.claude.com/docs/en/about-claude/pricing
Claude Code Overview	code.claude.com/docs/en/overview
Prompt Caching	code.claude.com/docs/en/prompt-caching
Claude Code Changelog	github.com/anthropics/claude-code/blob/main/CHANGELOG.md

Last updated: June 11, 2026

What Fast Mode Actually Is#

Toggle it with /fast in the CLI, or set "fastMode": true in user settings
A ↯ icon appears next to the prompt while active
It requires Claude Code v2.1.36 or later and is not supported in the VS Code extension
It works on Opus 4.8, 4.7, and 4.6 only - not Sonnet, Haiku, or Fable 5
Enabling it from a non-Opus model switches you to Opus automatically

Since v2.1.154, fast mode defaults to Opus 4.8. On versions 2.1.142 through 2.1.153 it defaulted to Opus 4.7. That default matters more than it sounds, because the price gap between models is large.

Fast Mode Pricing: The Numbers#

Here is the head-to-head, with standard Opus rates from the Claude pricing page and fast mode rates from the same page plus the fast mode docs. All figures verified June 11, 2026.

Configuration	Input / MTok	Output / MTok	Multiple of standard Opus
Opus 4.8 standard	$5	$25	1x
Opus 4.8 fast mode	$10	$50	2x
Opus 4.7 / 4.6 standard	$5	$25	1x
Opus 4.7 / 4.6 fast mode	$30	$150	6x

Three things jump out of that table.

Second, 2x price for up to 2.5x speed is favorable whenever wall-clock time is your bottleneck: the seconds cost less than the speedup delivers.

Newsletter

Get the weekly deep dive

Tutorials on Claude Code, AI agents, and dev tools, delivered free every week.

From the archive

Claude Code Routines vs Managed Agents Schedules: Where Recurring Agent Work Should Live

Jun 11, 2026 • 9 min read

Subagents vs Agent Teams vs Workflows: Claude Code's Parallelism Primitives, Compared

Jun 11, 2026 • 9 min read

Claude Fable 5 vs Gemini 3.1 Pro: The June 2026 Frontier Comparison

Jun 11, 2026 • 8 min read

The Claude Tokenizer Change: What ~30% More Tokens Means for Your Bill

Jun 11, 2026 • 8 min read

The First-Enable Charge: Why Session Start Matters#

This is the billing quirk that catches people, documented on the prompt caching page (verified June 11, 2026).

The good news: this is genuinely a one-time cost per conversation.

After the first fast mode turn, Claude Code keeps sending the header and varies only the speed setting, which is not part of the cache key
Toggling fast mode off and back on later does not repeat the charge
The automatic fallback to standard speed after a rate limit also keeps the cache intact
/clear and /compact reset the slate, since they rebuild the cache anyway

Separate Limits, Separate Money#

Where Fast Mode Is Not Available#

The availability list is short and worth checking before you build a habit around /fast. Per the fast mode requirements, verified June 11, 2026:

Available: Anthropic API (Console) accounts, and subscription plans (Pro/Max/Team/Enterprise) with usage credits turned on
Not available: Amazon Bedrock, Google Vertex AI, Microsoft Azure Foundry, and Claude Platform on AWS
Not in the VS Code extension: the /fast toggle is CLI-only today
Not in Managed Agents: the pricing page lists the fast mode premium as inapplicable there
Off by default for Team and Enterprise: until an admin enables it, users see "Fast mode has been disabled by your organization"

Admins get two extra controls: CLAUDE_CODE_DISABLE_FAST_MODE=1 kills the feature entirely, and the fastModePerSessionOptIn managed setting forces fast mode to start off in every session.