
TL;DR
One expensive orchestrator plus many cheap workers beats an all-frontier fleet for most workloads. Here is the cost math behind that decision, with verified Fable 5, Sonnet 5, and Opus 4.8 prices, plus the Sonnet 5 tokenizer caveat that changes worker cost.
Part 2 of the Fable 5 agent fleets series. It builds on Part 1, Orchestrating a Fleet of Agents with Fable 5, and the series origin, Fable 5 Is Back.
The manager-model pattern from Part 1 has an obvious objection: Fable 5 is expensive. At $10 per million input tokens and $50 per million output, running your whole fleet on it would be brutal. But that is not the pattern. The pattern is one expensive orchestrator and many cheap workers, and once you do the arithmetic it beats an all-frontier fleet for most workloads. This post works the numbers.
A note on the numbers: every dollar figure below is an illustrative estimate built from published per-token prices and made-up but plausible token counts. The point is the shape of the math, not a quote for your workload. Your real costs depend on your prompts, your caching, and how much your workers actually read and write. For the per-tool subscription side of the budget, the AI coding tools pricing comparison is the companion reference.
All prices are per 1M tokens, input / output:
Fable 5 (claude-fable-5): $10 / $50. Anthropic's most capable widely released model, the orchestrator in this pattern. See the launch post.
Sonnet 5 (claude-sonnet-5): $2 / $10 introductory, through August 31, 2026, then $3 / $15. Anthropic calls it its "most agentic Sonnet yet," near Opus 4.8 on agentic and coding tasks. See the Sonnet 5 announcement.
Opus 4.8: $5 / $25. The promotion tier for riskier worker slices.

The spread is the whole story. Sonnet 5 output is one-fifth the price of Fable 5 output at the intro rate. When most of your fleet's token volume is worker output - and in a fan-out of code or content, it is - moving that volume to Sonnet 5 is where the savings live.
Before the arithmetic, one catch that is easy to miss. Sonnet 5 ships with a new tokenizer that produces roughly 30% more tokens for the same text (see the what's new page). That means a naive per-token price comparison understates Sonnet 5's real cost, because the same work consumes about 30% more billable tokens.
Fold that in and the effective intro output rate is not $10 per "unit of text equivalent to a million old tokens" but closer to $13 once you account for the token inflation. Sonnet 5 is still far cheaper than Fable 5 as a worker. But the tokenizer change narrows the gap, and if you benchmarked worker cost on an older Sonnet's tokenizer, your estimate is low. Re-measure on real Sonnet 5 outputs rather than trusting an old ratio.
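The adjustment above is a single multiplication, and a minimal sketch makes it easy to rerun with your own measured inflation factor. The 1.30 factor is the rough figure quoted in this post, applied uniformly to input and output as an assumption; `effective_rate` is an illustrative helper name, not part of any SDK.

```python
# Published per-1M-token Sonnet 5 prices quoted in this post (USD).
SONNET5_INTRO = {"input": 2.00, "output": 10.00}
SONNET5_STANDARD = {"input": 3.00, "output": 15.00}

# Assumption: the ~30% tokenizer inflation applies uniformly.
# Replace with a factor measured on your own prompts and outputs.
TOKENIZER_INFLATION = 1.30


def effective_rate(price_per_million: float,
                   inflation: float = TOKENIZER_INFLATION) -> float:
    """Price per million old-tokenizer-equivalent tokens of the same text."""
    return price_per_million * inflation


for label, prices in (("intro", SONNET5_INTRO), ("standard", SONNET5_STANDARD)):
    eff_in = effective_rate(prices["input"])
    eff_out = effective_rate(prices["output"])
    print(f"Sonnet 5 {label}: ${eff_in:.2f} in / ${eff_out:.2f} out (effective)")
```

At the intro rate this gives an effective output price of about $13 per million old-tokenizer-equivalent tokens, matching the figure above.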
Take a concrete, made-up job: an orchestrator plans a refactor and fans it out to 10 worker tasks, each editing one module. All token counts below are invented for illustration.
Orchestrator (Fable 5). Say it reads a 200K-token slice of the repo plus spec, and across planning, dispatching, and verifying 10 results it produces 60K output tokens. That is 200K x $10/M = $2.00 of input plus 60K x $50/M = $3.00 of output, or $5.00.
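Workers (Sonnet 5, intro rate). Say each worker reads 40K tokens of context and writes a 15K-token diff plus reasoning. Apply the ~30% tokenizer inflation to both sides, so 40K becomes ~52K input and 15K becomes ~19.5K output. Each worker then costs about $0.10 of input (52K x $2/M) plus about $0.20 of output (19.5K x $10/M), roughly $0.30, or about $3.00 for all 10.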
Fleet total: about $8.00, split roughly $5 orchestrator and $3 workers.
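The split-fleet job described above can be priced in a few lines. This is a sketch using the invented token counts from this example and the per-token prices quoted in this post; the `cost` helper and the dictionary keys are illustrative labels, not API identifiers.

```python
# USD per 1M tokens (input, output), as quoted in this post.
PRICES = {
    "fable-5": (10.00, 50.00),
    "sonnet-5-intro": (2.00, 10.00),
}


def cost(model: str, input_tokens: int, output_tokens: int,
         inflation: float = 1.0) -> float:
    """Dollar cost of one call; `inflation` scales token counts (tokenizer effect)."""
    in_price, out_price = PRICES[model]
    billed_in = input_tokens * inflation
    billed_out = output_tokens * inflation
    return (billed_in * in_price + billed_out * out_price) / 1_000_000


# Invented token counts from the example job.
orchestrator = cost("fable-5", 200_000, 60_000)
# Assumption: ~30% Sonnet 5 tokenizer inflation on both input and output.
worker = cost("sonnet-5-intro", 40_000, 15_000, inflation=1.30)
workers = 10 * worker
total = orchestrator + workers

print(f"orchestrator ${orchestrator:.2f}, workers ${workers:.2f}, total ${total:.2f}")
```

Swap in your own token counts and a measured inflation factor; the split between orchestrator and worker cost is the number to watch.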
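Now price the same job as an all-Fable-5 fleet. The orchestrator cost is unchanged at $5. But each worker's 40K in / 15K out on Fable 5 (no tokenizer inflation, since that is a Sonnet 5 property) is 40K x $10/M = $0.40 of input plus 15K x $50/M = $0.75 of output, or $1.15 per worker and $11.50 for all 10.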
All-Fable-5 total: about $16.50. Same orchestrator, roughly 4x the worker cost, about double the total. The split fleet does the same job for around half the money, and the workers are doing bounded, well-specified tasks where Sonnet 5's near-Opus agentic quality is enough.
That is the core result. As the worker count grows, the gap widens, because worker volume dominates and that is exactly the volume you moved to the cheaper model.
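To see how the gap moves with fleet size, here is a small sketch that reuses the example's invented per-worker token counts and varies only the worker count. It assumes the orchestrator cost stays fixed at $5, which is a simplification: in practice, dispatching and verifying more workers would raise orchestrator output as well.

```python
# USD per 1M tokens (input, output), as quoted in this post.
PRICES = {
    "fable-5": (10.00, 50.00),
    "sonnet-5-intro": (2.00, 10.00),
}
ORCHESTRATOR_COST = 5.00   # assumption: held constant as workers are added
SONNET5_INFLATION = 1.30   # assumption: ~30% tokenizer inflation


def worker_cost(model: str, input_tokens: int = 40_000,
                output_tokens: int = 15_000, inflation: float = 1.0) -> float:
    """Dollar cost of one worker task with the example's token counts."""
    in_price, out_price = PRICES[model]
    return inflation * (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def fleet_totals(n_workers: int) -> tuple[float, float]:
    """Return (split-fleet total, all-Fable-5 total) for n_workers."""
    split = ORCHESTRATOR_COST + n_workers * worker_cost(
        "sonnet-5-intro", inflation=SONNET5_INFLATION)
    all_fable = ORCHESTRATOR_COST + n_workers * worker_cost("fable-5")
    return split, all_fable


for n in (1, 10, 50, 200):
    split, all_fable = fleet_totals(n)
    print(f"{n:>3} workers: split ${split:8.2f}  "
          f"all-Fable-5 ${all_fable:8.2f}  ratio {all_fable / split:.2f}x")
```

Under these assumptions the ratio climbs from about 2x at 10 workers toward the per-worker price ratio ($1.15 vs roughly $0.30, a little under 4x) as the fixed orchestrator cost is amortized.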
Sonnet 5 is the default worker, but not every slice is equal. Promote a worker to Opus 4.8 ($5 / $25) when the task carries more risk than a routine edit: the slice is on a critical path, it needs deeper reasoning, or it keeps failing your verify loop.
Opus 4.8 output at $25 is 2.5x Sonnet 5's intro rate but half of Fable 5's, so it is the sensible middle tier for the handful of slices that need more than a default worker but do not justify the orchestrator's model.
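One way to encode the promotion rule is a small routing function over the signals this post names: critical path, need for deeper reasoning, and repeated verify-loop failures. Everything here is hypothetical scaffolding: the `Slice` fields, the failure threshold of 2, and the returned strings are plain labels for illustration, not API model identifiers.

```python
from dataclasses import dataclass


@dataclass
class Slice:
    """A unit of delegated work (hypothetical shape for illustration)."""
    name: str
    on_critical_path: bool = False
    needs_deep_reasoning: bool = False
    verify_failures: int = 0  # times this slice has failed the verify loop


def pick_worker_tier(s: Slice, max_verify_failures: int = 2) -> str:
    """Default to Sonnet 5; promote risky or repeatedly failing slices to Opus 4.8."""
    promote = (
        s.on_critical_path
        or s.needs_deep_reasoning
        or s.verify_failures >= max_verify_failures  # threshold is an assumption
    )
    return "Opus 4.8" if promote else "Sonnet 5"


slices = [
    Slice("rename-helper"),
    Slice("billing-migration", on_critical_path=True),
    Slice("flaky-parser-fix", verify_failures=3),
]
for s in slices:
    print(f"{s.name}: {pick_worker_tier(s)}")
```

Keeping the rule in one function makes the promotion rate easy to log, which matters because every promoted slice costs 2.5x the intro Sonnet 5 output rate.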
Sometimes the split fleet is the wrong tool and you should just run Fable 5 for the whole thing. That is the right call when the task is long-horizon and hard to decompose cleanly - the exact profile where Anthropic reports Fable 5's lead is largest, and where its vendor-reported results cluster: a codebase-wide migration across a 50M-line Ruby codebase in about a day at Stripe, top scores on Cognition's FrontierCode and Cursor's CursorBench, and outsized gains from file-based memory on long-running tasks (all vendor and partner reported, from the launch post).
The trade is real. If a job cannot be split into independent slices without the slices needing to know about each other constantly, the coordination overhead of a fleet eats the savings, and a single Fable 5 run holding the whole problem in its 1M context can be both cheaper and better. The heuristic: if you can write clean, independent worker specs, run the split fleet. If every slice bleeds into every other, run Fable 5 end to end and pay for the capability.
For most workloads with decomposable work, one Fable 5 orchestrator plus a fleet of Sonnet 5 workers is the cost-quality sweet spot, with Opus 4.8 as the promotion tier for risky slices. Reserve all-Fable-5 for the long-horizon, hard-to-split jobs where its lead is worth the premium. Run the arithmetic on your own token counts before committing - the shape holds, but the exact break-even depends on how much your workers read and write.
An all-Fable-5 fleet is rarely worth it for decomposable work. If your tasks split into clean, independent slices, running every worker on Fable 5 roughly doubles total cost for the same output versus Sonnet 5 workers, because worker volume dominates and Sonnet 5 is near Opus 4.8 on agentic tasks. All-frontier makes sense for a single long-horizon job that cannot be split cleanly, where one Fable 5 run holding the whole problem beats the coordination overhead of a fleet.
Sonnet 5's new tokenizer produces roughly 30% more tokens for the same text, so the same work bills about 30% more tokens on both input and output. A naive per-token price comparison understates its real cost. Sonnet 5 is still far cheaper than Fable 5 as a worker, but re-measure worker cost on actual Sonnet 5 outputs rather than trusting a ratio from an older tokenizer.
Promote a worker to Opus 4.8 when the slice is on a critical path, needs deeper reasoning than a routine edit, or keeps failing your verify loop. Opus 4.8 output at $25 per million is 2.5x Sonnet 5's intro rate but half of Fable 5's, making it the sensible middle tier for the few slices that need more than a default worker but do not justify the orchestrator's model.
Per million tokens, input / output: Fable 5 is $10 / $50, Opus 4.8 is $5 / $25, and Sonnet 5 is $2 / $10 introductory through August 31, 2026, then $3 / $15. All figures are Anthropic's published rates as of July 1, 2026. Confirm current pricing on Anthropic's model pages before budgeting.