TL;DR
Fable 5 effort levels explained: what low, medium, high, xhigh, and max actually change, which models support each level, and how effort drives your token bill.
Last updated: June 11, 2026
Fable 5 took away most of the dials developers used to tune Claude. Thinking is always on, budget_tokens returns a 400, and the sampling parameters are gone entirely. What is left is one control that now does almost all the work: the effort parameter. The same dial drives Opus 4.8 and 4.7, and it surfaces in Claude Code as /effort. This guide covers what each level changes, which models accept which levels, and how to reason about cost - all verified against Anthropic's documentation on June 11, 2026.
Effort is set via output_config: {"effort": "..."} in the Messages API. It is GA on supported models with no beta header. Per Anthropic's effort documentation, the parameter affects all tokens in the response, not just thinking.
That is the key difference from the old budget_tokens approach, which only capped thinking. At lower effort, Claude makes fewer tool calls, combines operations into single calls, skips preamble, and confirms tersely. At higher effort, it makes more tool calls, explains its plan before acting, and writes more detailed summaries and code comments.
One important framing from the docs: effort is "a behavioral signal, not a strict token budget." At low, Claude will still think on genuinely hard problems - just less. If you need a hard ceiling, that is what max_tokens is for.
The API accepts exactly five values, and the docs state explicitly that this is the complete set. That matters because Claude Code's menu shows a sixth option (ultracode, covered below).
| Level | What the docs say | Available on |
|---|---|---|
| max | "Absolute maximum capability with no constraints on token spending" | Fable 5, Mythos 5, Opus 4.8, Opus 4.7, Opus 4.6, Sonnet 4.6, Mythos Preview |
| xhigh | "Extended capability for long-horizon work" - agentic and coding tasks over 30 minutes "with token budgets in the millions" | Fable 5, Mythos 5, Opus 4.8, Opus 4.7 only |
| high | The default. "Equivalent to not setting the parameter" | All effort-capable models |
| medium | "Balanced approach with moderate token savings" | All effort-capable models |
| low | "Most efficient. Significant token savings with some capability reduction" | All effort-capable models |
Effort is supported on Fable 5, Mythos 5, Opus 4.8, Opus 4.7, Opus 4.6, Sonnet 4.6, Opus 4.5, and Mythos Preview. Models not on that list (Sonnet 4.5, Haiku 4.5) do not support the parameter at all.
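The support matrix above can be encoded as a lookup table. This is a hypothetical helper transcribed from this article's table, not part of any Anthropic SDK. The keys are the article's display names rather than API model IDs.

```python
# Effort levels the Messages API accepts, per this article.
ALL_LEVELS = ("low", "medium", "high", "xhigh", "max")

# Every effort-capable model supports these three.
_BASE = frozenset({"low", "medium", "high"})

# Display names from the article's table (hypothetical keys, not API model IDs).
SUPPORTED_LEVELS = {
    "Fable 5": _BASE | {"xhigh", "max"},
    "Mythos 5": _BASE | {"xhigh", "max"},
    "Opus 4.8": _BASE | {"xhigh", "max"},
    "Opus 4.7": _BASE | {"xhigh", "max"},
    "Opus 4.6": _BASE | {"max"},
    "Sonnet 4.6": _BASE | {"max"},
    "Mythos Preview": _BASE | {"max"},
    "Opus 4.5": _BASE,
}


def supports_effort(model: str, level: str) -> bool:
    """Return True if the article's table lists `level` for `model`."""
    if level not in ALL_LEVELS:
        raise ValueError(f"unknown effort level: {level!r}")
    # Models missing from the table (e.g. Sonnet 4.5, Haiku 4.5) support no levels.
    return level in SUPPORTED_LEVELS.get(model, frozenset())
```

For example, supports_effort("Opus 4.6", "xhigh") is False, which matches the gap in the ladder described next.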
Two subtleties worth internalizing:

- xhigh is the exclusive club. Only Fable 5, Mythos 5, Opus 4.8, and Opus 4.7 have it. On Opus 4.6 and Sonnet 4.6 the ladder jumps from high straight to max.
- The API default is high everywhere - omitting the parameter and setting "high" behave identically.

Claude Code defaults differ per model:
| Model | API default | Claude Code default |
|---|---|---|
| Fable 5 | high | high |
| Opus 4.8 | high | high |
| Opus 4.7 | high | xhigh |
| Opus 4.6 / Sonnet 4.6 | high | high |
Claude Code also has fallback behavior: set a level the active model does not support and it falls back to the highest supported level at or below it. Set xhigh and switch to Opus 4.6, and you silently run at high. And when you first run Fable 5, Opus 4.8, or Opus 4.7, Claude Code applies that model's default effort even if you had set a different level for another model.
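The fallback rule amounts to walking down the effort ladder until you reach a supported level. This is an illustrative re-implementation of the behavior described above, not Claude Code's source. The `fallback_effort` name and the level sets are assumptions made for the example.

```python
# Effort ladder from lowest to highest, as listed in this article.
LADDER = ["low", "medium", "high", "xhigh", "max"]


def fallback_effort(requested: str, supported: set[str]) -> str:
    """Return the highest supported level at or below `requested`.

    Hypothetical sketch of the Claude Code fallback rule described above.
    """
    idx = LADDER.index(requested)  # raises ValueError for an unknown level
    # Walk from the requested level down toward "low".
    for level in reversed(LADDER[: idx + 1]):
        if level in supported:
            return level
    raise ValueError(f"no supported level at or below {requested!r}")
```

With Opus 4.6's levels ({"low", "medium", "high", "max"}), a requested xhigh resolves to high. This is the silent downgrade described above.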
One more wrinkle: low through xhigh persist across Claude Code sessions, but max applies to the current session only (unless forced through the CLAUDE_CODE_EFFORT_LEVEL environment variable). Anthropic clearly does not want anyone accidentally living at max.
On Fable 5, thinking cannot be turned off - effort is the only depth control. On Opus 4.8 and 4.7, adaptive thinking is the only mode, and it is off unless you send thinking: {"type": "adaptive"}. The adaptive thinking docs describe how each effort level steers it:
- max: Claude always thinks, with no constraints on depth
- xhigh: always thinks deeply, with extended exploration
- high: almost always thinks
- medium: moderate thinking; may skip it for very simple queries
- low: minimizes thinking; skips it where speed matters most

If you are coming from budget_tokens code, this mapping is your migration path - there is no 1:1 token conversion. The full migration checklist covers the other breaking changes that land alongside it.
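A budget_tokens-era request can be rewritten mechanically for Fable 5, provided you pick the effort level yourself instead of deriving it from the old budget. This is a minimal sketch of a hypothetical `migrate_request` helper. The list of removed sampling parameters (temperature, top_p, top_k) is an assumption, because this article only says they are gone.

```python
import copy

# Assumption: the sampling parameters removed on Fable 5. The article only
# says "the sampling parameters are gone entirely".
REMOVED_SAMPLING_PARAMS = ("temperature", "top_p", "top_k")


def migrate_request(old: dict, effort: str) -> dict:
    """Hypothetical helper: rewrite budget_tokens-era kwargs for Fable 5.

    There is no 1:1 conversion from budget_tokens to effort, so the caller
    chooses `effort` explicitly instead of deriving it from the old budget.
    """
    new = copy.deepcopy(old)  # leave the caller's dict untouched
    # Fable 5 always thinks; a budget_tokens config returns a 400, so drop it.
    new.pop("thinking", None)
    for key in REMOVED_SAMPLING_PARAMS:
        new.pop(key, None)
    new["model"] = "claude-fable-5"
    new.setdefault("output_config", {})["effort"] = effort
    return new
```

Choose the effort level by running evals at two or three levels, not by converting the old budget.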
Here is the part pricing pages will not spell out for you: effort does not change the per-token rate. Fable 5 bills $10 per million input tokens and $50 per million output at every effort level; Opus 4.8 and 4.7 bill $5/$25, per the pricing page. The only documented price multipliers are caching, batch, fast mode, and data residency - effort is not on the list.
What effort changes is volume, almost entirely on the output side, because thinking tokens bill as output tokens even when you never see them (the default display on Fable 5 and Opus 4.8/4.7 is "omitted", which hides thinking text but bills it identically).
A worked illustration. Suppose an agentic task on Opus 4.8 sends 20,000 input tokens per turn. These output volumes are hypothetical, but the shape matches how the docs describe the levels:
| Scenario | Output tokens | Input cost | Output cost | Turn total |
|---|---|---|---|---|
| low - terse, few tool calls | 5,000 | $0.10 | $0.125 | ~$0.23 |
| high - thinks, plans, summarizes | 20,000 | $0.10 | $0.50 | ~$0.60 |
| xhigh - extended exploration | 60,000 | $0.10 | $1.50 | ~$1.60 |
Same model, same prices, roughly 7x spread per turn purely from behavior. On Fable 5, double every number. Multiply by hundreds of turns in a long agent run and the effort setting is, in many cases, a bigger budget lever than the model choice - which is why effort deserves a row in any cost-per-task analysis you run.
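The table's arithmetic can be reproduced with a small calculator. The rates are the ones quoted above and the output volumes are the same hypothetical figures. The helper deliberately ignores the caching, batch, fast mode, and data residency multipliers.

```python
# Per-million-token rates quoted in this article (USD): (input, output).
RATES = {
    "Opus 4.8": (5.00, 25.00),
    "Fable 5": (10.00, 50.00),
}


def turn_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one turn in USD.

    output_tokens should include thinking tokens, which bill as output even
    when hidden. Caching, batch, fast mode, and residency multipliers are ignored.
    """
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Hypothetical output volumes from the table, with 20,000 input tokens per turn.
for label, out in [("low", 5_000), ("high", 20_000), ("xhigh", 60_000)]:
    opus = turn_cost("Opus 4.8", 20_000, out)
    fable = turn_cost("Fable 5", 20_000, out)
    print(f"{label:>5}: Opus 4.8 ${opus:.3f}  Fable 5 ${fable:.3f}")
```

To estimate a whole agent run, replace the per-turn figures with token counts from your own logs and sum over turns.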
Three practical cost notes, all from the docs:
- usage.output_tokens_details.thinking_tokens reports how many billed output tokens went to reasoning. Run the same eval at two effort levels and compare.
- At xhigh or max, Anthropic recommends starting max_tokens at 64,000 - it is a hard ceiling on thinking plus response text. Seeing stop_reason: "max_tokens" means raise the ceiling or lower the effort.
- Counterintuitively, higher effort is sometimes the cheaper total: a run that finishes in one pass at xhigh can undercut three failed retries at medium. The Fable 5 vs Opus 4.8 decision guide works through that completion-rate math for model choice; the same logic applies at the effort dial.
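Two of these notes can be turned into quick calculations. The sketch assumes you have the raw JSON usage object, including the output_tokens_details.thinking_tokens field described above, as a dict. It also assumes retries are independent attempts with a fixed success rate. Both helpers are hypothetical.

```python
def thinking_share(usage: dict) -> float:
    """Fraction of billed output tokens that went to thinking.

    Assumes `usage` is the raw usage object as a dict, with the
    output_tokens_details.thinking_tokens field this article describes.
    """
    details = usage.get("output_tokens_details") or {}
    thinking = details.get("thinking_tokens", 0)
    total = usage.get("output_tokens", 0)
    return thinking / total if total else 0.0


def expected_cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend until one attempt succeeds, retrying on failure.

    Assumes independent attempts, so the attempt count is geometric
    with mean 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate
```

Here is an example with made-up numbers:

- medium costs $0.50 per attempt and succeeds 25% of the time, so it expects $2.00 per completed task.
- xhigh costs $1.60 per attempt and succeeds 90% of the time, so it expects about $1.78 per completed task.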
Synthesizing Anthropic's per-model recommendations:
On Fable 5: start at high (the default) for most work and reserve xhigh for the most capability-sensitive workloads. Anthropic's docs note that lower effort settings on Fable 5 "still perform well and often exceed xhigh performance on prior models" - a vendor claim, but a useful prior: do not assume Fable 5 needs the dial maxed. Step down to medium or low if tasks complete correctly but take longer than necessary.
On Opus 4.8 and 4.7: start with xhigh for coding and agentic work, treat high as the floor for anything intelligence-sensitive, and drop to medium only after evals confirm quality holds. The docs are blunt about max: on most workloads it "adds significant cost for relatively small quality gains" and it "can lead to overthinking" on structured-output tasks. Reserve it for genuinely frontier problems.
Everywhere: low is the documented home for subagents, classification, quick lookups, and high-volume latency-sensitive paths. If you orchestrate subagents, agent teams, or workflows, running workers at low while the orchestrator sits at high or xhigh is the cleanest cost win available.
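The recommendations above can be written down as a starting-point policy. This is a hypothetical sketch and the task labels are invented for the example. It does not check whether the model supports the chosen level; Haiku 4.5, for instance, does not accept the parameter at all.

```python
# Hypothetical task labels that map to the documented home for low effort.
LOW_EFFORT_TASKS = {"subagent", "classification", "lookup", "high_volume"}


def starting_effort(model: str, task: str) -> str:
    """Starting effort level distilled from this article's recommendations.

    A starting point for evals, not a rule; model names are display names.
    """
    if task in LOW_EFFORT_TASKS:
        return "low"  # workers, classification, quick lookups, latency-sensitive paths
    if model == "Fable 5":
        return "high"  # the default; reserve xhigh for capability-sensitive work
    if model in {"Opus 4.8", "Opus 4.7"} and task in {"coding", "agentic"}:
        return "xhigh"  # recommended start for coding and agentic work
    return "high"  # the API default, and the floor for intelligence-sensitive work
```

Here is how it applies to an orchestrator setup:

- starting_effort("Opus 4.8", "agentic") returns xhigh for the orchestrator.
- starting_effort("Opus 4.8", "subagent") returns low for each worker.

Treat both as starting points to confirm with evals.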
In Claude Code, the CLAUDE_CODE_EFFORT_LEVEL environment variable beats everything, then your configured level, then the model default. The mechanisms:
- /effort opens an interactive slider; /effort xhigh sets the level directly; /effort auto resets to the model default
- The /model picker
- --effort <level> at launch, for a single session
- effortLevel in settings (accepts low through xhigh; max and ultracode are session-only)
- effort in skill or subagent frontmatter, overriding the session level while that skill or subagent runs

About ultracode: it appears in the /effort menu but is not an API effort level. It sends xhigh to the model and additionally grants Claude Code standing permission to orchestrate dynamic workflows. It is session-only and deliberately excluded from the effortLevel setting and the --effort flag.

Related but different: typing ultrathink in a prompt requests deeper reasoning for that one turn via an in-context instruction - the effort level sent to the API is unchanged.
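The precedence order can be modeled as a short resolver. This is an illustrative sketch, not Claude Code's implementation. Ranking skill or subagent frontmatter below the environment variable is an interpretation of "beats everything". The resolver also does not apply the per-model fallback rule covered earlier.

```python
import os
from typing import Mapping, Optional


def effective_effort(
    model_default: str,
    configured: Optional[str] = None,
    frontmatter: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Resolve the effort level using the precedence described in this article.

    Hypothetical sketch: env var > skill/subagent frontmatter > configured
    session level > model default.
    """
    forced = env.get("CLAUDE_CODE_EFFORT_LEVEL")
    if forced:  # the environment variable beats everything
        return forced
    if frontmatter:  # overrides the session level while the skill/subagent runs
        return frontmatter
    if configured:  # /effort, /model, --effort, or effortLevel in settings
        return configured
    return model_default  # nothing set, or reset with /effort auto
```

Passing env={} in tests isolates the resolver from whatever happens to be exported in your shell.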
On the API, the syntax is one field:
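```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-fable-5",
    max_tokens=64000,  # recommended starting ceiling at xhigh or max
    output_config={"effort": "xhigh"},
    messages=[{"role": "user", "content": "..."}],
)
```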
On Opus 4.8 or 4.7, add thinking={"type": "adaptive"} if you want thinking; on Fable 5, omit the thinking parameter entirely (it is always on, and an explicit disabled returns a 400). If raw speed is the constraint rather than depth, see whether fast mode is worth it - fast mode changes price-per-token, effort changes token volume.
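The per-model thinking rules can be folded into one request builder. This is a minimal sketch under stated assumptions. `request_kwargs` is a hypothetical helper, and "claude-fable-5" is the only model ID taken from this article. The Opus IDs in ADAPTIVE_THINKING_MODELS are placeholders; replace them with the IDs from Anthropic's model list.

```python
# Placeholder IDs for Opus 4.8 / 4.7; replace with the real IDs from
# Anthropic's model list. Only "claude-fable-5" comes from this article.
ADAPTIVE_THINKING_MODELS = {"claude-opus-4-8", "claude-opus-4-7"}


def request_kwargs(
    model: str,
    effort: str,
    prompt: str,
    max_tokens: int = 64_000,
    thinking: bool = True,
) -> dict:
    """Hypothetical helper that builds Messages API arguments per model."""
    kwargs = {
        "model": model,
        "max_tokens": max_tokens,  # hard ceiling on thinking plus response text
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }
    if thinking and model in ADAPTIVE_THINKING_MODELS:
        # Opus 4.8 / 4.7: thinking stays off unless adaptive is requested.
        kwargs["thinking"] = {"type": "adaptive"}
    # Fable 5 (and any other model): no `thinking` key is sent; on Fable 5 it is always on.
    return kwargs
```

You can pass the result straight to the SDK, for example client.messages.create(**request_kwargs("claude-fable-5", "high", "...")).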
Does every effort-capable model support xhigh?
No. xhigh exists only on Fable 5, Mythos 5, Opus 4.8, and Opus 4.7. Opus 4.6 and Sonnet 4.6 support low, medium, high, and max but skip xhigh. In Claude Code, setting xhigh on an unsupported model silently falls back to high.
Does higher effort raise the per-token price?
No. Per-token rates are fixed per model ($10/$50 for Fable 5, $5/$25 for Opus 4.8 and 4.7) and effort is not a pricing multiplier. Higher effort costs more because the model generates more tokens - more thinking (billed as output even when hidden), more tool calls, longer explanations. Check usage.output_tokens_details.thinking_tokens to see where the spend goes.
What is ultracode?
It is a Claude Code setting, not an API level. The API accepts exactly low, medium, high, xhigh, and max. Ultracode sends xhigh to the model and adds standing permission for Claude Code to launch multi-agent dynamic workflows. It applies to the current session only.
Should I run Fable 5 at xhigh by default?
Anthropic says no - start at high, the default. Fable 5's lower effort levels are documented as often exceeding the xhigh performance of prior models, and blanket xhigh gets expensive fast. Move up only when a capability-sensitive task measurably benefits.
Sources:
- thinking_tokens usage field (accessed June 11, 2026)
- /effort command, per-model level tables, defaults, ultracode, ultrathink, fallback behavior (accessed June 11, 2026)
- /effort xhigh guidance (accessed June 11, 2026)