TL;DR
Claude Code fast mode pricing explained: $10/$50 per MTok on Opus 4.8, the first-enable context charge, separate rate limit pools, and when 2.5x speed pays off.
Last updated: June 11, 2026
Fast mode is the rare Claude Code feature where the whole decision comes down to arithmetic. Type /fast, and the same Opus model answers up to 2.5x faster at double the per-token price. No quality change, no different weights, just a different API configuration. Anthropic is explicit in the fast mode documentation: "Fast mode is not a different model."
That makes it a pure pricing question, and the pricing has real wrinkles: a one-time full-context charge when you first enable it mid-conversation, a separate rate limit pool that never touches plan usage, and a 6x premium on older Opus models. Every number below was verified against the live docs on June 11, 2026.
Fast mode is a research preview feature in Claude Code that serves Claude Opus through a higher-speed API configuration, "up to 2.5x faster at a higher cost per token" (verified June 11, 2026, fast mode docs). The basics:
- Toggle it with /fast in the CLI, or set "fastMode": true in user settings
- A ↯ icon appears next to the prompt while active

Since v2.1.154, fast mode defaults to Opus 4.8. On versions 2.1.142 through 2.1.153 it defaulted to Opus 4.7. That default matters more than it sounds, because the price gap between models is large.
Here is the head-to-head, with standard Opus rates from the Claude pricing page and fast mode rates from the same page plus the fast mode docs. All figures verified June 11, 2026.
| Configuration | Input / MTok | Output / MTok | Multiple of standard Opus |
|---|---|---|---|
| Opus 4.8 standard | $5 | $25 | 1x |
| Opus 4.8 fast mode | $10 | $50 | 2x |
| Opus 4.7 / 4.6 standard | $5 | $25 | 1x |
| Opus 4.7 / 4.6 fast mode | $30 | $150 | 6x |
Three things jump out of that table.
First, the Opus 4.8 deal is dramatically better. Fast mode launched on earlier Opus models at a 6x premium. The Week 22 digest announced the Opus 4.8 rate as "2x the standard rate for about 2.5x the speed," and the CHANGELOG calls it "a fraction of its previous cost" (both verified June 11, 2026). Opus 4.8 is the only configuration where the math is close.
Second, 2x price for up to 2.5x speed is favorable whenever wall-clock time is your bottleneck: at the full speedup, the price multiple is smaller than the speed multiple.
Third, the premium is flat across the full 1M token context window. The pricing page confirms it "applies across the full context window, including requests over 200k input tokens" (verified June 11, 2026). No long-context surcharge on top.
Two stacking rules from the same pricing page: prompt caching multipliers apply on top of fast mode rates (a cache read on Opus 4.8 fast mode works out to $1/MTok using the documented 0.1x read multiplier), and fast mode is not available with the Batch API. Verified June 11, 2026.
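To make the table and the caching rule concrete, here is a minimal Python sketch that prices a single request under each configuration. The rates and the 0.1x cache read multiplier are the ones quoted above; the function name, the configuration keys, and the token counts are illustrative assumptions, not part of any Anthropic SDK.

```python
# USD per million tokens, copied from the table above.
RATES = {
    "opus-4.8-standard": {"input": 5.0, "output": 25.0},
    "opus-4.8-fast": {"input": 10.0, "output": 50.0},
    "opus-older-fast": {"input": 30.0, "output": 150.0},  # Opus 4.7 / 4.6
}
CACHE_READ_MULTIPLIER = 0.1  # read multiplier quoted above, applied to the input rate


def request_cost(config, uncached_input, cached_input, output):
    """Return the USD cost of one request (hypothetical helper)."""
    rate = RATES[config]
    input_cost = uncached_input * rate["input"]
    cache_cost = cached_input * rate["input"] * CACHE_READ_MULTIPLIER
    output_cost = output * rate["output"]
    return (input_cost + cache_cost + output_cost) / 1_000_000


# Example request: 50K uncached input tokens, 2K output tokens (made-up sizes).
for config in RATES:
    print(f"{config:<18} ${request_cost(config, 50_000, 0, 2_000):.2f}")
```

With those made-up sizes, the same request comes to $0.30 at the standard rate, $0.60 in Opus 4.8 fast mode, and $1.80 in fast mode on the older models. Cache writes are left out of the sketch to keep it short.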
This is the billing quirk that catches people, documented on the prompt caching page (verified June 11, 2026).
Enabling fast mode adds a request header that is part of the prompt cache key. The moment you toggle it on, your next request no longer matches any cached prefix, so the API re-reads your entire conversation history as uncached input, billed at the fast mode rate.
The practical math: 400,000 tokens deep in a session, your first /fast re-processes all 400K tokens at $10/MTok uncached fast mode input, roughly $4.00 for the toggle alone. Enable it on turn one, when context is a few thousand tokens, and the same toll is a few cents. That is why the docs say "enabling fast mode from the start is cheaper."
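The same arithmetic as a small Python sketch, assuming the Opus 4.8 fast mode input rate and that the whole conversation is billed once as uncached input. The helper name and the context sizes are hypothetical.

```python
FAST_INPUT_RATE_USD_PER_MTOK = 10.0  # Opus 4.8 fast mode, uncached input


def first_enable_charge(context_tokens):
    """USD billed to re-read the whole conversation as uncached fast mode input."""
    return context_tokens * FAST_INPUT_RATE_USD_PER_MTOK / 1_000_000


# Illustrative context sizes: start of session, mid-session, deep session.
for tokens in (3_000, 50_000, 400_000):
    print(f"{tokens:>7,} tokens in context -> ${first_enable_charge(tokens):.2f}")
```

The charge scales linearly with context, so the only lever is when you toggle: 3,000 tokens costs $0.03, while 400,000 tokens costs $4.00.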
The good news: this is genuinely a one-time cost per conversation.
- /clear and /compact reset the slate, since they rebuild the cache anyway

One version caveat: keeping the header across toggles requires Claude Code v2.1.86 or later; on older versions every toggle invalidates the cache. To watch these cache dynamics live, the techniques in our token burn and cache observability guide apply directly here.
Fast mode billing is unusual on subscription plans. Per the fast mode docs (verified June 11, 2026), for Pro, Max, Team, and Enterprise users it is available via usage credits only. Fast mode tokens never count against your plan's included usage, and they draw from usage credits from the very first token, even if you have plan capacity remaining.
That cuts both ways. Your plan budget is untouched - a long fast mode session will not push you toward the caps we cover in the Claude Code usage limits playbook. But every fast mode token is real, metered spend. If you have ever been surprised by a usage credits line item, fast mode is a likely culprit, and a spend dashboard like the one in the CodeBurn TUI walkthrough makes it visible.
Rate limits follow the same separation. Fast mode has its own pool, distinct from standard Opus, shared across Opus 4.8, 4.7, and 4.6 fast mode. When you hit it (or exhaust your usage credits), Claude Code degrades gracefully: it falls back to standard speed and pricing, the ↯ icon turns gray, and fast mode re-enables after the cooldown. You keep working the whole time. For how the newer model tiers interact with plan limits, see our breakdown of Claude usage limits in the Fable 5 era.
The availability list is short and worth checking before you build a habit around /fast. Per the fast mode requirements, verified June 11, 2026:
- The /fast toggle is CLI-only today

Admins get two extra controls: CLAUDE_CODE_DISABLE_FAST_MODE=1 kills the feature entirely, and the fastModePerSessionOptIn managed setting forces fast mode to start off in every session.
One deprecation note: fast mode on Opus 4.6 is deprecated, with removal roughly 30 days after the Opus 4.8 launch, after which it falls back to standard speed at standard pricing (verified June 11, 2026, fast mode docs).
Turn it on, at session start. This is the designed use case: rapid iteration, live debugging, tight deadlines. You read every response before acting, so model latency is your real bottleneck, and 2x token cost on short interactive turns is small in absolute dollars. Enable it on turn one so the first-enable charge is negligible.
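One way to sanity-check that trade for your own work is to compare the fast mode premium on a turn against the value of the waiting time it removes. The Python sketch below rests on loud assumptions: it treats all input as uncached (a worst case), assumes fast mode reaches the full 2.5x speedup (the docs only say "up to"), and uses an hourly rate and a wait time that are invented for illustration.

```python
STANDARD_INPUT = 5.0    # USD per MTok, Opus 4.8 standard
STANDARD_OUTPUT = 25.0  # USD per MTok, Opus 4.8 standard


def fast_mode_tradeoff(input_tokens, output_tokens, standard_wait_s,
                       hourly_rate_usd, speedup=2.5):
    """Return (extra_cost_usd, seconds_saved, time_value_usd) for one turn.

    Hypothetical helper: speedup=2.5 is the best case, not a guarantee.
    """
    standard_cost = (input_tokens * STANDARD_INPUT
                     + output_tokens * STANDARD_OUTPUT) / 1_000_000
    extra_cost = standard_cost  # fast mode is 2x, so the premium equals the standard cost
    seconds_saved = standard_wait_s * (1 - 1 / speedup)
    time_value = seconds_saved / 3600 * hourly_rate_usd
    return extra_cost, seconds_saved, time_value


# Assumed turn: 20K input, 1.5K output, 40 s wait at standard speed, $100/hour.
extra, saved, value = fast_mode_tradeoff(20_000, 1_500, 40, 100)
print(f"premium ${extra:.4f}, saves {saved:.0f}s worth ${value:.2f}")
```

Under these assumptions the premium is about $0.14 for 24 seconds saved, which is worth about $0.67 at the assumed rate. A lower hourly rate, a smaller real-world speedup, or a much larger uncached context can flip the result.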
Use it deliberately, not by default. Fast mode bills usage credits from the first token, converting "included" work into metered spend. Keep it off, then toggle it on at the start of sessions where you know you will be in a tight read-respond loop. Since it persists across sessions once enabled, run /fast to check its state if unsure - the same kind of habit we recommend in our Claude Code tips and tricks roundup.
Skip it. Long autonomous runs, batch processing, and CI pipelines are exactly the workloads the docs steer toward standard mode. Nobody is watching the terminal, so latency is free, and output tokens are where agentic spend concentrates - at $50/MTok instead of $25/MTok on Opus 4.8, an overnight run doubles in cost for zero perceived benefit. Fast mode is also unavailable in the Batch API, which is the actual discount lever (50% off) for asynchronous work.
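For a sense of scale, here is a rough Python sketch of one unattended run priced three ways. The token counts are hypothetical, caching is ignored, and the sketch assumes the 50% Batch API discount applies evenly to the standard input and output rates.

```python
STANDARD = {"input": 5.0, "output": 25.0}  # USD per MTok, Opus 4.8 standard
FAST_MULTIPLIER = 2.0    # Opus 4.8 fast mode
BATCH_MULTIPLIER = 0.5   # assumption: 50% off both input and output


def run_cost(input_tokens, output_tokens, multiplier=1.0):
    """USD cost of a run at the standard rate scaled by a multiplier."""
    base = (input_tokens * STANDARD["input"]
            + output_tokens * STANDARD["output"]) / 1_000_000
    return base * multiplier


# Hypothetical overnight run: 30M input tokens, 4M output tokens.
tokens = (30_000_000, 4_000_000)
print(f"standard: ${run_cost(*tokens):.2f}")
print(f"fast:     ${run_cost(*tokens, FAST_MULTIPLIER):.2f}")
print(f"batch:    ${run_cost(*tokens, BATCH_MULTIPLIER):.2f}")
```

For this made-up run the totals are $250 standard, $500 in fast mode, and $125 through the Batch API, a 4x spread between the fastest and cheapest option for work nobody is waiting on.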
Enable it with guardrails. Fast mode is off by default for your org. If your engineers do real interactive work, enabling it plus setting fastModePerSessionOptIn: true gives them speed on request while preventing forgotten always-on fast mode across dozens of concurrent sessions.
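Nothing to decide if you reach Claude through Amazon Bedrock, Google Vertex AI, Microsoft Azure Foundry, or Claude Platform on AWS. Fast mode is not available on those platforms today. The only current path is first-party access through the Anthropic API or a subscription plan.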
An honest accounting of where fast mode is the wrong call:
The 30-second version: fast mode on Opus 4.8 at $10/$50 is a fair trade for humans in the loop, enabled at session start. It is a bad trade for agents, for late-session toggles, and for anything still pointed at an older Opus model.
On Opus 4.8, fast mode costs $10 per million input tokens and $50 per million output tokens, 2x the standard Opus 4.8 rate of $5/$25. On Opus 4.7 and 4.6, it costs $30/$150, a 6x premium. Pricing is flat across the full 1M context window. Verified June 11, 2026 against the fast mode docs and the Claude pricing page.
No. On subscription plans, fast mode tokens bill to usage credits from the first token and never count against included plan usage. Usage credits must be turned on, and Team or Enterprise organizations also need an admin to enable fast mode.
Fast mode adds a request header that is part of the prompt cache key, so the first fast mode request re-reads your entire conversation as uncached input at fast mode rates. The charge applies once per conversation: later toggles, rate limit fallbacks, and re-enables keep the cache (on Claude Code v2.1.86+). Enabling it at session start, when context is small, makes the charge negligible.
No. Fast mode is only available through the Anthropic API (Console) and Claude subscription plans. It is not on Amazon Bedrock, Google Vertex AI, Microsoft Azure Foundry, or Claude Platform on AWS, and the /fast toggle is not supported in the VS Code extension.
The same model. Anthropic documents fast mode as Claude Opus with a different API configuration that prioritizes speed over cost efficiency, with identical quality and capabilities. It runs on Opus 4.8, 4.7, and 4.6 only.