The Fable 5 Moment
TL;DR
Fable 5 posts an 80.3% SWE-Bench Pro score and costs 2x Opus 4.8 - here is the task-profile scoring guide that tells you when the premium pays off.
Anthropic released Claude Fable 5 on June 9, 2026 - the first model from its restricted "Mythos" tier to reach general availability. On paper its lead looks decisive: 80.3% on SWE-Bench Pro versus Opus 4.8's 69.4%, a 50-million-line Ruby migration completed in a day at Stripe, and endorsements from engineers who describe it as a qualitative step change. It also costs exactly double Opus 4.8 on every token type. That gap demands a clear framework, not instinct. This post builds one from the practitioner data that is already public.
Last updated: June 10, 2026
Fable 5 is priced at $10 per million input tokens and $50 per million output tokens. Opus 4.8 runs at $5 and $25 respectively - a clean 2x ratio across the board. On a small workload that difference is noise. On a long-horizon agentic task that generates hundreds of thousands of output tokens, the difference is real budget exposure.
Coverage cites a 67-100% price premium, a range that depends on whether you factor in prompt caching. Fable 5 discounts cached input tokens by 90%, to $1 per million (versus $0.50 for Opus 4.8 cache hits). Those cached rates are still 2x apart, so caching shrinks the absolute dollar gap on input rather than the ratio. If your agent reuses a large system prompt or shared context, aggressive caching narrows the effective gap. But it does not close it, and on output tokens - where the real spend accumulates in agentic work - there is no caching discount to lean on.
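The pricing above can be turned into a rough per-run estimate. The sketch below is illustrative only: the model keys are made-up labels rather than real API model IDs, the workload numbers are invented, and it assumes cache hits are billed at the quoted cached rate with no separate cache-write charge.

```python
# USD per million tokens, taken from the list prices quoted in this post.
# The dictionary keys are illustrative labels, not official model IDs.
PRICES = {
    "fable-5": {"input": 10.00, "cached_input": 1.00, "output": 50.00},
    "opus-4.8": {"input": 5.00, "cached_input": 0.50, "output": 25.00},
}


def run_cost(model, input_tokens, output_tokens, cached_fraction=0.0):
    """Estimate the USD cost of one run.

    cached_fraction is the share of input tokens served from the prompt
    cache (0.0 to 1.0). Cache-write costs are ignored (an assumption).
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    price = PRICES[model]
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    total = (
        fresh * price["input"]
        + cached * price["cached_input"]
        + output_tokens * price["output"]
    )
    return total / 1_000_000


# Hypothetical agent run: 2M input tokens (80% cached), 300K output tokens.
fable = run_cost("fable-5", 2_000_000, 300_000, cached_fraction=0.8)
opus = run_cost("opus-4.8", 2_000_000, 300_000, cached_fraction=0.8)
print(round(fable, 2), round(opus, 2))  # 20.6 10.3
```

In this invented workload, caching cuts the dollar gap from $17.50 (uncached) to $10.30, but the ratio stays at 2x and output tokens account for most of the bill on both models.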
The decision rule follows directly: Fable 5 earns its price only when quality improvement translates into fewer total tokens consumed, fewer failed runs, or avoided human rework. Any other framing is rationalizing a luxury purchase.
The headline figure is SWE-Bench Pro, an industry-standard measure of autonomous software engineering on real GitHub issues. Fable 5 scores 80.3% versus Opus 4.8 at 69.4% - an 11-point gap that is meaningful but needs context before it drives routing decisions.
SWE-Bench Pro evaluates models on self-contained repository tasks: fix a bug, implement a feature, resolve a test failure. These tasks are bounded. A model gets a repo, a failing test, and must produce a passing patch. The benchmark rewards autonomy, planning, and code correctness on problems that fit inside a single session.
What that means practically: the gap between Fable 5 and Opus 4.8 is most pronounced on tasks that require the model to explore a codebase, identify the right intervention point, and produce a complete, multi-file solution without hand-holding. On simpler, more constrained coding tasks - apply this diff, rename this function, explain this error - the gap shrinks because both models are capable and the task does not stress the planning layer where Fable 5 differentiates.
Two other benchmarks from Anthropic's launch materials are worth noting: Fable 5 scored more than 2x Opus 4.8 on FrontierCode Diamond (difficult production-codebase tasks), and spatial reasoning improved from 14.5% (Opus 4.8) to 38.6% - a roughly 2.7x improvement that matters for code that involves layout, data structure design, or complex dependency graphs.
CodeRabbit's benchmark data gives the clearest practitioner signal on where Fable 5 earns its price. In their coding project evaluations, the model exhibited a consistent behavior: given a vague prompt, it explored the environment first, identified available files, tools, and constraints, then built from a grounded plan rather than guessing at structure. Fully underspecified prompts still produced complete projects rather than prototype shells.
The Stripe migration is the canonical enterprise example: a codebase-wide Ruby migration across 50 million lines, estimated at over two months for a full team, completed by Fable 5 in a single day. Anthropic published this in their launch materials and it has not been disputed. The task fits the Fable 5 profile exactly - large scope, multiple interdependent files, requires holding the full context of what changed to avoid breaking downstream code.
Specific task types where Fable 5 consistently outperforms based on available data:
The CodeRabbit team noted one counterintuitive finding: Fable 5 produced more thorough implementations in complex coding projects, but the same exploration drive that produces quality also produces timeouts. In their coding task benchmark, 19 of the evaluated tasks hit the agent timeout before completion. The model kept working when given ambiguous scope. That is a feature for some workloads and a cost hazard for others - which is why explicit step and token budgets are not optional when deploying Fable 5 in an agent pipeline.
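One way to make step and token budgets explicit is a small guard around the agent loop. This is a minimal sketch, not any vendor's API: `step_fn` is a hypothetical callable that runs one agent step and returns `(done, output_tokens_used)`, and the limits are placeholders to tune per workload.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    max_steps: int
    max_output_tokens: int


def run_with_budget(step_fn, budget):
    """Call step_fn until it reports completion or a budget runs out.

    step_fn is a hypothetical callable returning (done, output_tokens_used).
    The token check happens between steps, so the final step can overshoot
    max_output_tokens by at most one step's usage.
    """
    steps = 0
    tokens = 0
    while steps < budget.max_steps and tokens < budget.max_output_tokens:
        done, used = step_fn()
        steps += 1
        tokens += used
        if done:
            return {"status": "completed", "steps": steps, "output_tokens": tokens}
    reason = "step_limit" if steps >= budget.max_steps else "token_limit"
    return {"status": reason, "steps": steps, "output_tokens": tokens}
```

Returning the stop reason instead of raising lets the caller log which limit was hit and decide whether to retry with a tighter prompt or hand the task to a human.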
The CodeRabbit benchmarks are specific and worth quoting directly: in a 105-EP code review evaluation, Fable 5 achieved 32.8% actionable precision versus Opus 4.8's 35.5%. On full precision the gap is wider - 19.4% for Fable 5 versus 26.5% for Opus 4.8. Fable 5 also produced 253 comments versus fewer from Opus 4.8, with a larger proportion classified as nitpick or assertive style.
This combination - lower precision, higher volume - is a real operational problem for code review. More comments means more triage work for reviewers. Lower precision means a higher rate of noise that dilutes the signal. CodeRabbit's recommendation is to keep Opus 4.8 as the default code review path until Fable 5's precision improves through future tuning.
| Metric | Fable 5 | Opus 4.8 |
|---|---|---|
| SWE-Bench Pro | 80.3% | 69.4% |
| Actionable precision (code review) | 32.8% | 35.5% |
| Full precision (code review) | 19.4% | 26.5% |
| Comment volume (105-EP benchmark) | 253 | lower |
| Coding task timeouts (CodeRabbit) | 19 | not reported |
| Input price (per million tokens) | $10 | $5 |
| Output price (per million tokens) | $50 | $25 |
| Context window | 1M tokens | 200K tokens |
For interactive chat - where a developer is asking questions, iterating on code snippets, debugging with back-and-forth - Opus 4.8's lower latency profile and tighter precision are practical advantages. Fable 5 is slower to respond because it reasons more deeply before answering. That is useful when the question is hard. It is friction when the question is routine.
The framing that emerges from CodeRabbit's evaluation is precise: Fable 5 behaves like an async engineering tool, not a responsive chat assistant. It works best when you hand it a well-scoped problem with clear success criteria and let it run. It works worst when you need fast, targeted responses with high comment precision.
The 19-timeout figure from their coding benchmark is not a failure of the model - it is a signal about the deployment contract. Those tasks ran until the harness cut them off because the model kept exploring. In a production agent with explicit step limits and a clear completion condition, many of those runs would have finished. Without those guardrails, they are expensive non-completions.
The practical conclusion: treat Fable 5 as you would a senior contractor doing a large, autonomous engagement. Define the deliverable clearly, set a budget, and give it room to work. Do not use it for tasks where you need a quick answer or where review noise is a cost.
The token efficiency argument for Fable 5 depends on the task profile. TrueFoundry's analysis notes that Anthropic and early customers report Fable 5 finishing tasks in fewer turns and tokens on hard problems - meaning a job at 2x the per-token rate can land closer to parity in total cost when the model's deeper planning avoids the retry loops that a less capable model accumulates.
A rough framework for the math:
The honest version: Fable 5 is cheaper than it looks on tasks where Opus 4.8 fails or requires multiple attempts. It is more expensive than it looks on tasks without clear completion conditions. Token-per-dollar analysis requires measuring actual completion rates, not just per-token sticker prices.
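The completion-rate point can be made concrete with a back-of-the-envelope calculation. The sketch below assumes failed runs are simply retried, that attempts are independent with a fixed success rate, and that every attempt costs about the same. Under those assumptions the expected cost per successful outcome is the cost per attempt divided by the success rate. All numbers in the example are invented for illustration, not measured.

```python
def cost_per_success(cost_per_attempt, success_rate):
    """Expected spend per successful completion.

    Assumes independent retries with a constant success rate, so the
    expected number of attempts is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Invented numbers: the pricier model costs 2x per attempt but completes
# far more often on a hard task.
cheaper_model = cost_per_success(cost_per_attempt=10.0, success_rate=0.4)
pricier_model = cost_per_success(cost_per_attempt=20.0, success_rate=0.9)
print(round(cheaper_model, 2), round(pricier_model, 2))  # 25.0 22.22
```

With these made-up inputs the 2x model comes out cheaper per outcome. Swap in completion rates measured on your own tasks, since the result flips as soon as the two success rates are close.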
Use this scoring guide to route between Fable 5 and Opus 4.8. Score each attribute for the task at hand, then total.
Score +1 for Fable 5 if:
Score +1 for Opus 4.8 if:
Routing rule: If Fable 5 score exceeds Opus 4.8 score by 2 or more, use Fable 5. If tied or Opus 4.8 leads, use Opus 4.8 and revisit if quality falls short.
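The routing rule is simple enough to encode directly. A minimal sketch, assuming the two scores have already been tallied from the attributes above; the function name and return labels are illustrative.

```python
def route(fable_score, opus_score, margin=2):
    """Apply the routing rule: pick Fable 5 only when it leads by `margin` or more."""
    if fable_score - opus_score >= margin:
        return "Fable 5"
    # Tied, Opus 4.8 ahead, or a Fable 5 lead of less than `margin`.
    return "Opus 4.8"


print(route(4, 1))  # Fable 5
print(route(2, 2))  # Opus 4.8
```

A Fable 5 lead of exactly one point falls through to Opus 4.8 here, which matches the rule's bias toward the cheaper model unless the case for the premium is clear.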
No. Fable 5 leads on long-horizon agentic coding - multi-file migrations, autonomous project work, underspecified tasks. On code review precision and interactive chat, Opus 4.8 currently outperforms it based on CodeRabbit's benchmarks (35.5% vs 32.8% actionable precision in review).
SWE-Bench Pro measures autonomous resolution of real GitHub issues. An 80.3% score means Fable 5 solved roughly 4 in 5 evaluated issues independently, without human steering. The 11-point gap over Opus 4.8 is largest on complex multi-step issues and narrows on simpler, self-contained tasks.
Fable 5 explores extensively when task scope is ambiguous. In CodeRabbit's coding task harness, 19 tasks hit the agent timeout because the model kept working without converging. This is a deployment configuration issue - not an inherent flaw - but it means Fable 5 requires explicit step limits and completion conditions to avoid runaway costs.
The base token rates are exactly 2x ($10/$50 versus $5/$25 per million tokens). Prompt caching reduces the absolute gap on input tokens, though the cached rates ($1 versus $0.50 per million) remain 2x apart because both models offer the same 90% cache discount. On output tokens, where agentic work accumulates most spend, there is no caching discount and the 2x ratio holds.
Fable 5 ships with classifiers that route cybersecurity, biology, and chemistry queries to Opus 4.8. Anthropic reports this affects fewer than 5% of sessions. Rerouted requests are charged at Opus 4.8 rates, not Fable 5 rates. API customers must configure the Fallback API explicitly - it is not automatic outside the Claude apps.