TL;DR
Anthropic's docs say the tokenizer introduced with Opus 4.7 can use up to 35% more tokens for the same text. Here is what that does to per-request cost, max_tokens, and cross-model comparisons.
Last updated: June 11, 2026
There is a line in Anthropic's pricing docs that matters more to your invoice than any headline per-token rate: "Opus 4.7 and later use a new tokenizer compared to previous models, contributing to their improved performance on a wide range of tasks. This new tokenizer may use up to 35% more tokens for the same fixed text."
The sticker price of Opus 4.8 is identical to that of Opus 4.6: $5 per million input tokens and $25 per million output tokens. But a million tokens is no longer the same amount of text. If you migrated from Opus 4.6 to Opus 4.8 and your bill crept up while traffic stayed flat, this is probably why. And if you are comparing Claude Fable 5's $10/$50 rate against anything on the older tokenizer, raw per-MTok numbers understate the real gap.
Three official pages describe the change, all fetched and verified on June 11, 2026.
The pricing page gives the ceiling: the new tokenizer "may use up to 35% more tokens for the same fixed text."
The migration guide gives the range, under "Updated token counting" in the Opus 4.7 section: "The new tokenizer may use roughly 1x to 1.35x as many tokens when processing text compared to previous models (up to ~35% more, varying by content)." It also tells you to "update your max_tokens parameters to give additional headroom, including compaction triggers."
The token counting page gives the typical figure: "Claude Fable 5 and Claude Mythos 5 use the tokenizer introduced with Claude Opus 4.7, which produces roughly 30% more tokens than models before Claude Opus 4.7 for the same text."
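So the lineage is clean: Opus 4.6, Sonnet 4.6, Haiku 4.5, and earlier models use the previous tokenizer, while Opus 4.7, Opus 4.8, Claude Fable 5, and Claude Mythos 5 all use the tokenizer introduced with Opus 4.7.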
The models overview quietly encodes the ratio in its context window tooltips: 1M tokens is listed as roughly 555k words on Opus 4.8 and 4.7, versus roughly 750k words on Sonnet 4.6 and Opus 4.6. Divide those and you get 1.35x - exactly the stated ceiling.
One more wrinkle from the migration guide: high-resolution images on Opus 4.7 and later can use "up to approximately 3x more image tokens per full-resolution image" - vision pipelines get a separate, larger multiplier.
Think in effective cost per unit of text, not per token. Illustrative arithmetic using the documented range:
Per request. A prompt that counts 100,000 tokens on Opus 4.6 counts roughly 130,000 on Opus 4.8 at the typical 30% figure, up to 135,000 at the ceiling. At $5/MTok input, the request goes from $0.50 to $0.65-$0.675. Same price, same text, more billed input.
Per month. 200M input tokens measured on Opus 4.6 ($1,000) becomes roughly 260M on the new tokenizer (about $1,300) without a single extra request.
Compounding into Fable 5. Moving from Opus 4.6 directly to Fable 5 stacks two multipliers: 2x on price ($5 to $10 input) and roughly 1.3x on token count, so effective input cost for the same text lands around 2.6x, not 2x. The Fable 5 cost-per-task analysis covers the other side of that ledger - whether fewer retries claw the premium back.
Moving from Opus 4.8 to Fable 5. Here the tokenizer is a non-event: both count the same way, so the comparison really is the clean 2x that the Fable 5 vs Opus 4.8 guide works from.
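The arithmetic above reduces to one formula: tokens measured on the old tokenizer, times the re-count ratio, times the rate. A minimal sketch; `effective_input_cost` is a hypothetical helper, the rates are the list prices quoted in this article, and 1.30/1.35 are the documented typical and ceiling figures rather than a ratio measured on your traffic.

```python
# Input list prices in $ per million tokens, as quoted in this article.
OPUS_INPUT = 5.00   # Opus 4.6 and Opus 4.8
FABLE_INPUT = 10.00  # Fable 5


def effective_input_cost(old_tokens: int, rate_per_mtok: float, ratio: float = 1.30) -> float:
    """Dollar cost of text that counted `old_tokens` on the pre-Opus-4.7
    tokenizer, re-counted at `ratio` and billed at `rate_per_mtok`."""
    new_tokens = old_tokens * ratio
    return new_tokens / 1_000_000 * rate_per_mtok


# Per request: 100k old-tokenizer tokens
print(f"{effective_input_cost(100_000, OPUS_INPUT, ratio=1.0):.3f}")   # Opus 4.6 -> 0.500
print(f"{effective_input_cost(100_000, OPUS_INPUT):.3f}")              # typical 30% -> 0.650
print(f"{effective_input_cost(100_000, OPUS_INPUT, ratio=1.35):.3f}")  # ceiling -> 0.675

# Per month: 200M old-tokenizer tokens
print(f"{effective_input_cost(200_000_000, OPUS_INPUT, ratio=1.0):,.0f}")  # -> 1,000
print(f"{effective_input_cost(200_000_000, OPUS_INPUT):,.0f}")             # -> 1,300

# Opus 4.6 -> Fable 5: the price and token-count multipliers stack
baseline = effective_input_cost(1_000_000, OPUS_INPUT, ratio=1.0)
fable = effective_input_cost(1_000_000, FABLE_INPUT)
print(f"{fable / baseline:.2f}x")  # -> 2.60x
```

Swap in the ratio you measure with the token counting endpoint once you have it; the formula stays the same.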
One honest caveat: "same fixed text" applies cleanly to input. Output cost depends on both the tokenizer and how verbose the model chooses to be, and the migration guide only commits to "token efficiency can vary by workload shape." Measure output on your traffic rather than applying a blanket 1.3x.
The token counting endpoint returns counts under the tokenizer of whatever model you pass. So the docs' recommended workflow is to count the same request twice and compare. The endpoint is free, and it has its own rate limit pool (100 requests per minute on tier 1, up to 8,000 on tier 4) separate from message creation.
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("prompts/system_prompt.txt") as f:
    SYSTEM = f.read()
with open("prompts/representative_user_turn.txt") as f:
    SAMPLE = f.read()


def count(model: str) -> int:
    """Count input tokens for the same request under the given model's tokenizer."""
    return client.messages.count_tokens(
        model=model,
        system=SYSTEM,
        messages=[{"role": "user", "content": SAMPLE}],
    ).input_tokens


before = count("claude-opus-4-6")  # prior tokenizer
after = count("claude-fable-5")    # tokenizer introduced with Opus 4.7
print(f"old: {before:,} new: {after:,} ratio: {after / before:.3f}")
```
Run this over a representative sample of real prompts: the ratio "varies by content and workload shape" per the migration guide, and different content mixes - code, JSON, non-English text - can land at different points in the 1.0x to 1.35x range. Whatever ratio your workload produces belongs in your cost model - not 30%.
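To turn per-prompt counts into a single multiplier for a cost model, a token-weighted aggregate (total new tokens over total old tokens) is arguably closer to what you pay than a plain average of ratios, because long prompts dominate the bill. A minimal sketch, assuming you have already collected `(old_count, new_count)` pairs with a function like `count()` above; `workload_ratio` and the sample numbers are hypothetical.

```python
from statistics import median


def workload_ratio(pairs: list[tuple[int, int]]) -> dict[str, float]:
    """Summarize new/old token ratios from (old_count, new_count) pairs.

    'weighted' divides total new tokens by total old tokens, so each prompt
    counts in proportion to its size; 'median' and 'max' show the spread.
    """
    if not pairs:
        raise ValueError("need at least one (old, new) pair")
    per_prompt = [new / old for old, new in pairs]
    total_old = sum(old for old, _ in pairs)
    total_new = sum(new for _, new in pairs)
    return {
        "weighted": total_new / total_old,
        "median": median(per_prompt),
        "max": max(per_prompt),
    }


# Hypothetical counts for three prompts (say prose, JSON, and code)
sample = [(1_000, 1_250), (4_000, 5_300), (10_000, 13_100)]
print(workload_ratio(sample))
```

Keeping `max` alongside the weighted figure is useful for the max_tokens audit, where the worst case matters more than the average.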
Two caveats from the docs: counts are estimates that "may differ by a small amount" from the billed figure, and they can include system-added tokens you are not billed for. Close enough for budgeting, not a substitute for checking usage on real responses.
The migration guide's checklist says to "re-benchmark end-to-end cost and latency under the updated tokenization" and "re-tune max_tokens." In practice, five audits:
max_tokens caps output in new-tokenizer tokens. A structured response that comfortably fit in 3,000 tokens on Opus 4.6 can need closer to 4,000 tokens of room for the same text, so tightly sized caps start truncating with stop_reason: "max_tokens" on responses that used to complete. Give the parameter 35% headroom as a starting point, then tune from measurements.
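A minimal sketch of that starting point, using integer math so the rounding is exact. `rescale_max_tokens` and `was_truncated` are hypothetical helpers, and `output_limit` is left for you to fill in from your model's documented maximum output.

```python
from typing import Optional


def rescale_max_tokens(
    old_cap: int, headroom_pct: int = 35, output_limit: Optional[int] = None
) -> int:
    """Scale a max_tokens value that was sized on the old tokenizer.

    Computes ceil(old_cap * (100 + headroom_pct) / 100) with integers, then
    optionally clamps to the model's output limit, which you supply.
    """
    new_cap = (old_cap * (100 + headroom_pct) + 99) // 100
    return new_cap if output_limit is None else min(new_cap, output_limit)


def was_truncated(stop_reason: Optional[str]) -> bool:
    """True when a response ended because it hit the max_tokens cap."""
    return stop_reason == "max_tokens"


print(rescale_max_tokens(3_000))                   # ceiling headroom -> 4050
print(rescale_max_tokens(3_000, headroom_pct=30))  # typical figure -> 3900
print(was_truncated("max_tokens"), was_truncated("end_turn"))  # -> True False
```

In a real client, pass `stop_reason` from each Messages API response and log the requests that return True. Those are the caps that still need room.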
The 1M window holds about 555k words on the new tokenizer versus about 750k before, per the models overview tooltips. Logic that decides "this document fits" from character or word counts calibrated on the old tokenizer is now optimistic by up to a third, and the migration guide explicitly calls out compaction triggers as something to re-tune for the same reason.
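The tooltip figures translate directly into words-per-token constants, which shows the over-optimism concretely. A minimal sketch of a word-count pre-check. It assumes the 555k/750k tooltip averages hold for your text and uses a hypothetical compaction trigger at 80% of the window. Because the migration guide warns against fixed token-to-character ratios, treat this only as a coarse pre-filter, and confirm near-limit cases with the token counting endpoint.

```python
# Words per token implied by the models overview tooltips (1M-token window).
OLD_WORDS_PER_TOKEN = 750_000 / 1_000_000  # Sonnet 4.6, Opus 4.6
NEW_WORDS_PER_TOKEN = 555_000 / 1_000_000  # Opus 4.7, Opus 4.8

CONTEXT_TOKENS = 1_000_000
COMPACT_AT = 0.8  # hypothetical trigger: compact at 80% of the window


def estimated_tokens(word_count: int, words_per_token: float) -> int:
    """Rough token estimate from a word count; a coarse pre-check only."""
    return round(word_count / words_per_token)


def should_compact(word_count: int, words_per_token: float = NEW_WORDS_PER_TOKEN) -> bool:
    """True once the estimated size crosses the compaction trigger."""
    return estimated_tokens(word_count, words_per_token) >= COMPACT_AT * CONTEXT_TOKENS


print(f"{OLD_WORDS_PER_TOKEN / NEW_WORDS_PER_TOKEN:.2f}x")  # implied ratio -> 1.35x

# A 500k-word history: fine under the old calibration, over the trigger now.
words = 500_000
print(should_compact(words, OLD_WORDS_PER_TOKEN), should_compact(words))  # -> False True
```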
Any code that estimates tokens from character counts, or any cached token measurement from before the migration, is stale. The migration guide is direct: re-test "any code path that estimates tokens client-side or assumes a fixed token-to-character ratio." That includes cost dashboards that multiply estimated tokens by a rate card.
Token-per-minute rate limits are consumed in billed tokens, so the same traffic eats up to 35% more of your TPM allowance and throughput ceilings arrive sooner than your old capacity math predicted. Budget alerts keyed to monthly token totals need the same adjustment or they fire late. The production cost modeling guide goes deeper on building these from usage data.
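Both adjustments are multiplications by your measured ratio. A minimal sketch with a hypothetical 2,000,000 tokens-per-minute limit and 10,000-token requests sized on the old tokenizer. Real limits depend on model and tier, so substitute the limit that applies to your traffic. `requests_per_minute` and `rescale_budget_tokens` are hypothetical helpers.

```python
def requests_per_minute(tpm_limit: int, old_tokens_per_request: int, ratio: float = 1.30) -> int:
    """Requests per minute that fit under a token-per-minute limit once request
    sizes measured on the old tokenizer are re-counted at `ratio`."""
    return int(tpm_limit // (old_tokens_per_request * ratio))


def rescale_budget_tokens(old_threshold: int, ratio: float = 1.30) -> int:
    """Move a monthly token alert threshold onto the new tokenizer's scale."""
    return round(old_threshold * ratio)


# Hypothetical limit and request size
print(requests_per_minute(2_000_000, 10_000, ratio=1.0))   # old capacity math -> 200
print(requests_per_minute(2_000_000, 10_000))              # typical 30% -> 153
print(requests_per_minute(2_000_000, 10_000, ratio=1.35))  # ceiling -> 148
print(rescale_budget_tokens(200_000_000))                  # -> 260000000
```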
The multipliers that soften all of this are unchanged on the current pricing page: cache writes at 1.25x base input (5-minute TTL) or 2x (1-hour), cache reads at 0.1x, and a flat 50% Batch API discount. More tokens make caching more valuable, not less: a cached prefix that got 30% heavier still reads back at a tenth of the input rate. The Fable 5 prompt caching economics breakdown runs those numbers, and the docs point to effort and the beta task_budget as further spend controls, with the caveat that both can trade off capability.
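For cost dashboards, pricing each response from the billed counts it reports is sturdier than multiplying client-side estimates by a rate card. A minimal sketch with several assumptions. It uses the Messages API `usage` field names shown, where `input_tokens` excludes cache reads and writes, which are reported separately. It assumes 5-minute cache writes. It also assumes, as something to verify on the pricing page, that the batch discount applies to every line item. `Rates` and `request_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    input_per_mtok: float
    output_per_mtok: float
    cache_write_mult: float = 1.25  # 5-minute TTL; use 2.0 for 1-hour writes
    cache_read_mult: float = 0.10


OPUS_4_8 = Rates(input_per_mtok=5.0, output_per_mtok=25.0)


def request_cost(usage: dict, rates: Rates, batch: bool = False) -> float:
    """Dollar cost of one response from its billed usage counts.

    Missing or None fields count as zero.
    """
    per_token_in = rates.input_per_mtok / 1_000_000
    cost = (
        (usage.get("input_tokens") or 0) * per_token_in
        + (usage.get("cache_creation_input_tokens") or 0) * per_token_in * rates.cache_write_mult
        + (usage.get("cache_read_input_tokens") or 0) * per_token_in * rates.cache_read_mult
        + (usage.get("output_tokens") or 0) * rates.output_per_mtok / 1_000_000
    )
    # Assumption: the 50% Batch API discount applies to every line item.
    return cost * 0.5 if batch else cost


# A 100k-token prefix re-counted at 1.3 (130k) read from cache, plus fresh turns
cached = {"input_tokens": 2_000, "cache_read_input_tokens": 130_000, "output_tokens": 1_000}
uncached = {"input_tokens": 132_000, "output_tokens": 1_000}
print(f"{request_cost(cached, OPUS_4_8):.3f} vs {request_cost(uncached, OPUS_4_8):.3f}")
```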
This is the part most pricing tables silently get wrong. Comparing $/MTok across models only works when a token means the same thing on both sides, and within Anthropic's own lineup it now does not. Sonnet 4.6 at $3/MTok input and Opus 4.8 at $5/MTok differ by more than the sticker 1.67x for identical text, because Opus 4.8 counts that text into more tokens. Against an old Sonnet-measured baseline, Opus 4.8 input behaves more like $6.50-$6.75 per old-baseline MTok at the documented 30% typical and 35% ceiling figures, and Fable 5 like $13-$13.50.
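Normalizing every model to "dollars per million tokens as measured on the old tokenizer" makes the comparison explicit. A minimal sketch using the list input prices quoted in this article and the documented 30% and 35% figures. `per_old_baseline_mtok` is a hypothetical helper.

```python
# Input list prices in $ per million tokens, as quoted in this article.
LIST_INPUT = {"Sonnet 4.6": 3.00, "Opus 4.8": 5.00, "Fable 5": 10.00}
# Models on the tokenizer introduced with Opus 4.7.
NEW_TOKENIZER = {"Opus 4.8", "Fable 5"}


def per_old_baseline_mtok(model: str, ratio: float) -> float:
    """Input price per million old-tokenizer tokens, i.e. per fixed amount of
    text measured on the pre-Opus-4.7 tokenizer."""
    price = LIST_INPUT[model]
    return price * ratio if model in NEW_TOKENIZER else price


for ratio in (1.30, 1.35):
    row = {m: round(per_old_baseline_mtok(m, ratio), 2) for m in LIST_INPUT}
    gap = per_old_baseline_mtok("Opus 4.8", ratio) / per_old_baseline_mtok("Sonnet 4.6", ratio)
    print(ratio, row, f"Opus/Sonnet gap: {gap:.2f}x")
```

The Opus-to-Sonnet gap comes out above the 1.67x sticker ratio at either figure, which is the point of the paragraph above.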
The same correction applies against other providers, since every vendor tokenizes differently and per-MTok tables hide that. The honest comparison unit is cost per request or per completed task, measured on your own workload - our June 2026 frontier API pricing roundup carries this exact caveat, and the Fable 5 migration guide treats re-baselining as a first-class migration step.
Why the change exists: the docs frame the new tokenizer as "contributing to their improved performance on a wide range of tasks," and the full 1M window bills at standard per-token rates with no long-context premium - true on the new-tokenizer models just as on Opus 4.6 and Sonnet 4.6. You are paying more tokens for the same text, on models that do more per request. Whether that nets out is what the measurement workflow above answers.
Does Opus 4.8 use more tokens than Opus 4.6 for the same text?
Yes. Anthropic's pricing docs say Opus 4.7 and later use a new tokenizer that "may use up to 35% more tokens for the same fixed text," and the migration guide gives the range as roughly 1x to 1.35x depending on content. Identical prompts bill more input tokens on 4.8 even though the per-MTok price is the same $5/$25.
Does Fable 5 use more tokens than Opus 4.8?
No. Per the migration guide, Fable 5 and Opus 4.8 token counts are "roughly unchanged because both models use the same tokenizer" - the one introduced with Opus 4.7. The roughly 30% increase only applies against models from before Opus 4.7, such as Opus 4.6, Sonnet 4.6, or Haiku 4.5.
How do I measure the difference on my own prompts?
Count the same request twice with the free /v1/messages/count_tokens endpoint: once with your current model ID, once with claude-fable-5 or claude-opus-4-8. The endpoint counts under the tokenizer of the model you pass, so the ratio between the two input_tokens values is your workload's actual multiplier. Run it across a representative sample, since the ratio varies by content.
Did the new tokenizer change pricing?
Per-token rates did not change - Opus 4.8 costs the same $5/$25 per MTok as Opus 4.6, and cache and batch multipliers are unchanged. But because the same text tokenizes into more tokens on Opus 4.7 and later, effective cost per unit of fixed input text rose by up to 35%. The 1M context window bills at standard pricing with no long-context premium, as it already did on Opus 4.6 and Sonnet 4.6.
Do I need to raise max_tokens after migrating?
Coming from Opus 4.6 or earlier, yes - the migration guide says to "update your max_tokens parameters to give additional headroom, including compaction triggers," because output that fit the old cap can truncate with stop_reason: "max_tokens" under the new tokenizer. A 35% bump is a reasonable starting point. Moving between Opus 4.7, 4.8, and Fable 5 needs no change, since they share a tokenizer.