Fable 5 with 1M Context: What Actually Works in Practice

Last updated: June 11, 2026

Claude Fable 5 launched on June 9 with a 1M token context window as the default, not an opt-in beta, and no long-context price premium - a 900K-token request bills at the same per-token rate as a 9K one. That combination changes which workflows are practical. But "1M tokens" is a spec, not a workflow, and the gap between the two is where teams will waste money this month. This guide works through three patterns that hold up in practice, with cost math attached, and is honest about where retrieval still beats stuffing the window.

What You Actually Get: The Window on Paper#

The specs, from Anthropic's models overview: 1M token context, 128K max output, $10 per million input tokens and $50 per million output. The context windows documentation confirms the 1M maximum is also the default on the Claude API, and a single request can include up to 600 images or PDF pages.

Two caveats before you size anything:

The tokenizer counts differently. Fable 5 uses the tokenizer introduced with Opus 4.7, which produces roughly 30% more tokens for the same text than pre-4.7 models (the pricing docs say up to 35%). Anthropic's own tooltip pegs 1M tokens at roughly 555K words or about 2.5M unicode characters. If your sizing intuition was built on older Claude models, re-baseline it.

The usable envelope is smaller than the spec. In Claude Code specifically, Verdent's analysis of the 1M window puts the practical budget at about 830K tokens once the auto-compaction buffer and usage thresholds are accounted for. Plan around 800K, not a million.
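A minimal sizing sketch for the two caveats above. It assumes one byte per character and the rough 2.5 characters-per-token ratio implied by the tooltip; the function names and the 830K default are illustrative, not part of any Anthropic tooling.

```python
CHARS_PER_TOKEN = 2.5      # rough ratio implied by 1M tokens ~ 2.5M characters
USABLE_BUDGET = 830_000    # Verdent's practical Claude Code envelope, not the 1M spec


def estimate_tokens(byte_count: int, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Rough token estimate from a byte count (treats one byte as one character)."""
    return round(byte_count / chars_per_token)


def fits_budget(byte_count: int, budget: int = USABLE_BUDGET) -> bool:
    """True when the estimated token count stays within the usable budget."""
    return estimate_tokens(byte_count) <= budget


print(estimate_tokens(1_500_000))  # 1.5 MB of text -> 600000
print(fits_budget(2_500_000))      # a "full" 1M-token load -> False
```

Treat the result as a planning number only; a real token count from the API will differ, especially for non-ASCII text.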

Simon Willison's release-day assessment sets the temperament: "slow, expensive" but "the challenge is finding tasks that it can't do." The long-context tier is an async tool. Every workflow below assumes you are not sitting there watching a spinner.

Worked Example 1: Whole-Repo Review#

The classic 1M-context pitch is loading an entire codebase and asking questions that span it: dead code, dependency cycles, inconsistent error handling across modules. This works, but only if you size and cache deliberately.

Step 1: Measure before you load. Using the documented ~2.5 characters per token on the new tokenizer:

Terminal

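git ls-files \
  | grep -vE '(^|/)(node_modules|dist|build|fixtures)/|\.lock$|-lock\.(json|yaml)$|\.svg$' \
  | tr '\n' '\0' | xargs -0 cat | wc -c
# prints total bytes across all kept files; divide by ~2.5 for a rough token estimate
# (cat | wc -c avoids the per-batch "total" lines that xargs wc -c emits on large repos)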

Verdent's exclusion list matches what we have seen: never load node_modules/, build output, lockfiles, generated protobuf or GraphQL code, or large test fixtures. They are token-dense and signal-poor.
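The same measurement can be sketched in Python when you want the exclusion list under version control. The regex below is a hypothetical pattern covering the categories named above (the generated-code suffixes in particular are assumptions; adjust them to your repo), and the input is a plain mapping of path to byte size.

```python
import re

# Hypothetical exclusion pattern mirroring the categories listed above.
EXCLUDE = re.compile(
    r"(^|/)(node_modules|dist|build|fixtures)/"   # vendored deps, build output, fixtures
    r"|\.lock$|-lock\.(json|yaml)$"               # lockfiles
    r"|\.pb\.go$|_pb2\.py$|(^|/)__generated__/"   # generated protobuf / GraphQL code
)


def repo_token_estimate(files: dict[str, int], chars_per_token: float = 2.5) -> int:
    """Sum byte sizes of non-excluded paths and convert to a rough token count."""
    kept = sum(size for path, size in files.items() if not EXCLUDE.search(path))
    return round(kept / chars_per_token)


sample = {
    "src/app.ts": 50_000,
    "node_modules/left-pad/index.js": 900_000,
    "package-lock.json": 400_000,
    "api/__generated__/schema.ts": 300_000,
}
print(repo_token_estimate(sample))  # only src/app.ts counts -> 20000
```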

Step 2: Do the cache math. Say the repo lands at 600K tokens and you want a ten-question architecture review session. Per the prompt caching docs, a 1-hour cache write runs 2x base input and reads run 0.1x:

Uncached: 10 questions x 600K x $10/MTok = $60.00 in input alone
Cached: one $12.00 write (600K x $20/MTok) + 9 reads at $0.60 each = $17.40

That is roughly a 70% input-cost reduction for the session, before output tokens either way. The docs note the 1-hour cache pays for itself after two reads; the 5-minute tier ($12.50/MTok write) pays off after one. Fable 5's minimum cacheable prompt is 512 tokens on the Claude API, so even small stable preambles cache.
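The arithmetic above, as a small sketch you can rerun with your own numbers. It assumes the $10/MTok base rate and the 2x / 1.25x / 0.1x multipliers quoted in this section, and that the cache stays warm for the whole session; the function name is illustrative.

```python
from typing import Optional

BASE_INPUT = 10.00  # $ per million input tokens (Fable 5 rate quoted above)


def session_input_cost(tokens: int, questions: int,
                       write_mult: Optional[float] = None,
                       read_mult: float = 0.1) -> float:
    """Input cost in dollars for `questions` turns over the same `tokens` of context.

    write_mult=None means no caching; 2.0 is the 1-hour tier, 1.25 the 5-minute tier.
    """
    per_load = tokens / 1_000_000 * BASE_INPUT
    if write_mult is None:
        return round(questions * per_load, 2)
    # First turn pays the cache write; every later turn pays a cache read.
    return round(per_load * write_mult + (questions - 1) * per_load * read_mult, 2)


uncached = session_input_cost(600_000, 10)
cached = session_input_cost(600_000, 10, write_mult=2.0)
print(uncached, cached)                 # 60.0 17.4
print(round(1 - cached / uncached, 2))  # 0.71
```

Output tokens and the per-question prompt text are left out on purpose, since they cost the same in both cases.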

Step 3: Use Batch for the overnight pass. A non-interactive full-repo review (run the audit, file the report) qualifies for the Batch API's 50% discount - $5/$25 per MTok - which turns the $6.00 uncached load (600K x $10/MTok) into $3.00. For recurring scheduled reviews, batch plus caching is the only configuration where whole-repo passes stay cheap enough to run routinely. We covered the per-task framing in more depth in the Fable 5 cost-per-task analysis.
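A rough sketch of the batch comparison that also counts output tokens, which the $6.00 to $3.00 figure leaves out. The 20K-token report size is a made-up example, and the function assumes the flat 50% discount applies to input and output alike, as the $5/$25 rates imply.

```python
def batch_pass_cost(input_tokens: int, output_tokens: int,
                    input_rate: float = 10.0, output_rate: float = 50.0,
                    batch_discount: float = 0.5) -> dict[str, float]:
    """Compare standard vs Batch API cost (dollars) for one non-interactive pass."""
    standard = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return {"standard": round(standard, 2),
            "batch": round(standard * (1 - batch_discount), 2)}


# 600K-token repo plus a hypothetical 20K-token report.
print(batch_pass_cost(600_000, 20_000))  # {'standard': 7.0, 'batch': 3.5}
```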

Worked Example 2: Log Archaeology#

This is the workflow where intuitions are most wrong. A million tokens sounds like "all the logs." It is not. At ~2.5 characters per token, the window holds roughly 2.5MB of raw text - on the order of 16,000 to 17,000 typical 150-byte log lines. A busy service emits that in minutes.
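To make that capacity estimate reusable, here is a small sketch. The 150-byte average line and 2.5 characters per token are the same assumptions as in the paragraph above; the function name is illustrative.

```python
def max_log_lines(token_budget: int, avg_line_bytes: int = 150,
                  chars_per_token: float = 2.5) -> int:
    """How many log lines of a given average size fit in a token budget."""
    return int(token_budget * chars_per_token // avg_line_bytes)


print(max_log_lines(1_000_000))  # full spec window -> 16666
print(max_log_lines(830_000))    # practical envelope -> 13833
```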

So the pattern is filter-then-load, not load-everything:

Terminal

# cut to the incident window and suspect services first
grep -h "2026-06-10T2[2-3]" logs/api-*.log logs/worker-*.log \
  | grep -vE 'healthz|heartbeat|GET /metrics' > incident.log
wc -c incident.log   # bytes / 2.5 ~= tokens

What the long window buys you is not capacity for everything - it is that the entire filtered slice fits in one shot. Cross-service timeline reconstruction ("the worker retries started 40 seconds before the API 502s, and both correlate with this deploy marker") is exactly the reasoning that chunked retrieval breaks, because the causal chain spans chunks no single query retrieves together. Anthropic's launch post claims Fable 5 "stays focused across millions of tokens in long-running tasks" - a vendor claim, but incident reconstruction is where that focus is most visibly useful.
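A filtered slice is easier to reason over when the services are interleaved into one timeline rather than concatenated file by file. The sketch below is one way to do that, assuming every line starts with a 19-character ISO-8601 timestamp and each service's log is already in time order; the function name and the trailing `[service]` tag are illustrative choices.

```python
import heapq
import re

NOISE = re.compile(r"healthz|heartbeat|GET /metrics")


def incident_slice(streams: dict[str, list[str]], start: str, end: str) -> list[str]:
    """Merge per-service logs into one timeline, keeping only [start, end).

    Assumes each line begins with an ISO-8601 timestamp (which sorts
    lexicographically) and each stream is already in time order.
    """
    tagged = (
        [f"{line} [{service}]" for line in lines
         if start <= line[:19] < end and not NOISE.search(line)]
        for service, lines in streams.items()
    )
    return list(heapq.merge(*tagged, key=lambda line: line[:19]))


logs = {
    "api": ["2026-06-10T22:00:40 502 Bad Gateway",
            "2026-06-10T22:00:41 GET /metrics 200"],
    "worker": ["2026-06-10T21:59:59 job ok",
               "2026-06-10T22:00:00 retry attempt 1"],
}
# The worker retry sorts ahead of the API 502; noise and out-of-window lines drop out.
for line in incident_slice(logs, "2026-06-10T22:00:00", "2026-06-11T00:00:00"):
    print(line)
```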

One honest limit: this is a single-session pattern. If your incident review spans days and the log corpus keeps growing, you are back in retrieval territory (more on that below).

Worked Example 3: Multi-Doc Synthesis#

The third pattern is loading a document set whole: an RFC plus the four design docs it supersedes, a vendor contract stack, or six to eight research papers (the pricing docs estimate a 500KB research PDF at ~125K tokens, so eight of those fill the whole window on paper, and six fit within the ~830K practical budget). For page-heavy PDFs, the 600-page-per-request cap can bind before the token budget does.

Anthropic's own guidance here predates Fable 5 and still holds. The contextual retrieval post says it plainly: if your knowledge base is under 200K tokens (about 500 pages), skip RAG entirely and put it all in the prompt, with caching making that "significantly faster and more cost-effective" - they cite latency improvements over 2x and cost reductions up to 90% from caching alone.

What Fable 5's tier adds is the 200K-to-830K band: document sets that were previously forced into RAG purely by capacity now fit in one request. Synthesis tasks ("where do these five specs contradict each other") benefit most, because contradictions are relational facts that live between documents, and retrieval pipelines surface documents one relevance-ranked chunk at a time. For the mechanics of structuring large prompts, our context engineering guide covers ordering, stable-prefix layout, and cache breakpoint placement.
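One way to assemble such a request is sketched below, under stated assumptions: the `<document>` wrapper and the `Doc` type are illustrative, the token check uses the rough 2.5 characters-per-token ratio, and the page check simply sums page counts you supply. Documents go first so the prefix stays stable for caching; the question goes last.

```python
from dataclasses import dataclass

TOKEN_BUDGET = 830_000   # practical envelope cited above
PAGE_CAP = 600           # per-request PDF page limit cited above


@dataclass
class Doc:
    name: str
    text: str
    pages: int = 0       # PDF page count; 0 for plain text


def build_synthesis_prompt(docs: list[Doc], question: str,
                           chars_per_token: float = 2.5) -> str:
    """Wrap each document in a labelled block and append the question.

    Raises ValueError if the set exceeds the page cap or estimated token budget.
    """
    if sum(d.pages for d in docs) > PAGE_CAP:
        raise ValueError("document set exceeds the per-request page cap")
    body = "\n".join(
        f'<document name="{d.name}">\n{d.text}\n</document>' for d in docs
    )
    if len(body) / chars_per_token > TOKEN_BUDGET:
        raise ValueError("document set exceeds the usable token budget")
    # Documents first (stable, cacheable prefix), question last.
    return f"{body}\n\n{question}"
```

Failing loudly before the request is sent is the design choice here: an oversized set is cheaper to catch locally than after a slow, billed call.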

The Tradeoffs the Launch Post Skips#

Latency is real. Willison called the model "a beast" that is slow and expensive, and long-input requests compound that: processing 800K input tokens takes meaningfully longer than 8K regardless of model. Treat full-window calls as async jobs with retries, not interactive turns. If you need fast iterative chat over a big codebase, Opus 4.8 with tighter context is often the better tool.

Context rot is acknowledged, measured, and not solved. Anthropic's own docs state that "as token count grows, accuracy and recall degrade," and their context engineering post frames attention as a finite budget that every token depletes. Verdent's roundup of independent testing found degradation starting around 400K tokens and retrieval becoming unreliable past 600K on Sonnet 4.6, with a working heuristic of about 2% effectiveness loss per 100K tokens. The best published long-context retrieval number in that set is Opus 4.6 at 78.3% on 8-needle MRCR at 1M tokens - strong, and still means missed needles. No equivalent independent Fable 5 figure existed in anything we could fetch this week; until one does, assume the back half of the window is softer than the front and put your highest-value content early.
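"Put your highest-value content early" can be made mechanical. The sketch below is a simple greedy packer, not a tuned algorithm: the priority numbers, segment names, and token estimates are all inputs you would supply, and anything that would overflow the budget is skipped rather than truncated.

```python
def order_for_window(segments: list[tuple[int, str, int]],
                     budget: int = 830_000) -> list[str]:
    """Return segment names, highest priority first, that fit in `budget` tokens.

    Each segment is (priority, name, estimated_tokens); a lower priority number
    means more important. Segments that would overflow are skipped.
    """
    ordered, used = [], 0
    for _, name, tokens in sorted(segments, key=lambda s: s[0]):
        if used + tokens <= budget:
            ordered.append(name)
            used += tokens
    return ordered


segments = [(2, "tests", 300_000), (1, "core", 400_000), (3, "docs", 200_000)]
print(order_for_window(segments))  # ['core', 'tests']
```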

Cost compounds per turn. Every conversational turn re-sends the whole context. An uncached 800K-token context costs $8.00 of input per turn. Caching is not an optimization here; it is the difference between viable and absurd. And if generation overruns the window, requests on 4.5+ models stop with stop_reason: "model_context_window_exceeded" rather than erroring - handle it.
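A minimal sketch of what "handle it" might look like. It works on the parsed JSON body of a Messages API response and only inspects `stop_reason`; the returned action names are hypothetical labels for your own retry logic, not API values.

```python
def next_action(response: dict) -> str:
    """Map a Messages API response's stop_reason to a follow-up action.

    `response` is the parsed JSON body; the action names are hypothetical.
    """
    reason = response.get("stop_reason")
    if reason == "model_context_window_exceeded":
        # Generation was cut off because input plus output hit the window:
        # trim or compact the context before retrying, do not just resend.
        return "trim_context_and_retry"
    if reason == "max_tokens":
        # Hit the requested max_tokens, not the window: raise it or continue.
        return "continue_generation"
    return "accept"


print(next_action({"stop_reason": "model_context_window_exceeded"}))
# trim_context_and_retry
```

The point of branching explicitly is that this stop arrives as a normal successful response, so code that only catches exceptions will treat a truncated answer as complete.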

When Retrieval Still Wins#

The honest decision table, synthesized from Anthropic's guidance and Verdent's analysis:

Situation	Load the window	Use retrieval
Corpus size	Under ~830K tokens	Over ~830K tokens
Query pattern	One deep session, many questions	High-frequency queries over weeks
Content change rate	Static snapshot	Frequently updated (cache invalidation kills you)
Question shape	Cross-cutting, relational	Pointy, lookup-style
Session shape	Single session	Multi-session, many users

The per-query economics are the part people skip. A cached full-window read still costs about $0.80 in input per query at 800K tokens. A RAG pipeline retrieving 5K relevant tokens costs about $0.05 uncached. At hundreds of queries a day, retrieval wins by an order of magnitude even before latency. And retrieval itself has gotten better: Anthropic's contextual retrieval technique cuts retrieval failure rates by 49% (combined with BM25) and 67% with reranking. If your workload is lookup-shaped, a well-built RAG pipeline is not the legacy option - it is the right one.
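The per-query comparison scales linearly, which the sketch below makes concrete. It uses the rates from this section ($10/MTok base, 0.1x for cached reads) and a made-up 300 queries per day; it ignores the initial cache write, output tokens, and the cost of running the retrieval stack itself.

```python
def daily_input_cost(queries_per_day: int, tokens_per_query: int,
                     rate_per_mtok: float) -> float:
    """Daily input spend in dollars at a flat per-million-token rate."""
    return round(queries_per_day * tokens_per_query * rate_per_mtok / 1_000_000, 2)


CACHED_READ_RATE = 1.00   # $/MTok: 0.1x the $10 base input rate
BASE_RATE = 10.00         # $/MTok: uncached input

full_window = daily_input_cost(300, 800_000, CACHED_READ_RATE)
rag = daily_input_cost(300, 5_000, BASE_RATE)
print(full_window, rag)   # 240.0 15.0
```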

The hybrid that Anthropic's engineering team actually recommends is just-in-time loading: keep lightweight identifiers in context, pull full content on demand, and use sub-agents that return condensed summaries instead of raw exploration. That pattern, plus compaction (in beta for Fable 5), is how the long-horizon agent harnesses get multi-day runs out of a finite window.
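The just-in-time idea can be sketched in a few lines. This is an illustration of the pattern, not Anthropic's implementation: the class name is made up, and `loader` stands in for whatever fetches full content in your setup (a file read, a database lookup, a sub-agent call).

```python
from typing import Callable


class JustInTimeContext:
    """Keep identifiers in context; fetch full content only when asked.

    `loader` is a hypothetical callable (file read, DB lookup, sub-agent call).
    """

    def __init__(self, loader: Callable[[str], str]) -> None:
        self._loader = loader
        self.index: dict[str, str] = {}    # identifier -> one-line description
        self.loaded: dict[str, str] = {}   # identifier -> full content pulled so far

    def register(self, ident: str, description: str) -> None:
        self.index[ident] = description

    def manifest(self) -> str:
        """The cheap listing that stays in the prompt on every turn."""
        return "\n".join(f"{i}: {d}" for i, d in self.index.items())

    def pull(self, ident: str) -> str:
        """Load full content on demand, once per identifier."""
        if ident not in self.loaded:
            self.loaded[ident] = self._loader(ident)
        return self.loaded[ident]
```

Only the manifest is paid for on every turn; full content enters the window when a question actually needs it.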

FAQ#

What is the Claude Fable 5 context window?#

Claude Fable 5 has a 1M token context window with a 128K token maximum output, per Anthropic's models overview (accessed June 10, 2026). The 1M window is the default on the Claude API, not an opt-in beta, and there is no long-context price premium - input bills at the standard $10 per million tokens whether the request is 9K or 900K. Anthropic's docs peg 1M tokens at roughly 555K words on the new tokenizer.

Does the 1M context window cost extra on Fable 5?#

No. Anthropic's pricing docs state that Fable 5, Opus 4.8, and Sonnet 4.6 include the full 1M window at standard per-token rates, with caching and batch discounts applying across the whole window. The cost driver is volume, not a surcharge: 800K input tokens is $8.00 at the base rate regardless.

How much code actually fits in the window?#

Anthropic's docs estimate 1M tokens at roughly 555K words or 2.5M characters on the new tokenizer. In Claude Code, plan for ~830K usable tokens after the auto-compaction buffer. As a rough rule, divide your repo's byte count (after excluding vendored deps, lockfiles, and generated code) by 2.5.

Is recall reliable across the full million tokens?#

Not uniformly. Anthropic's docs acknowledge context rot, and independent testing collected by Verdent found measurable degradation from ~400K tokens with a ~2% effectiveness loss per 100K added. Put critical content early, and verify long-context outputs the same way you would verify a junior engineer's first pass.

Should I replace my RAG pipeline with full-context loading?#

Only for corpora under ~830K tokens that are static, queried in concentrated sessions, and benefit from cross-document reasoning. High-frequency lookup workloads over large or changing corpora stay cheaper and faster on retrieval, especially with contextual retrieval cutting failure rates by up to 67%.

Can I use Fable 5's long context on my Claude subscription?#

Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans only through June 22, 2026; from June 23 it requires usage credits. If you are mid-evaluation, the June 22 deadline post covers what changes, and the migration guide covers the API-side moves.

Sources#

Anthropic pricing documentation - model rates, caching multipliers, batch discounts, long-context pricing, tokenizer note. Accessed June 10, 2026.
Anthropic models overview - Fable 5 specs, context window, GA date, word-per-token tooltips. Accessed June 10, 2026.
Simon Willison: Initial impressions of Claude Fable 5 - latency and capability assessment, pricing confirmation. Accessed June 10, 2026.
Anthropic context windows documentation - 1M default, 600-page PDF cap, context rot acknowledgment, compaction beta, overflow stop reason. Accessed June 11, 2026.
Anthropic: Introducing Claude Fable 5 and Claude Mythos 5 - launch claims, Stripe migration, subscription timeline. Accessed June 11, 2026.
Verdent: Claude Code 1M context window guide - 830K usable envelope, degradation thresholds, MRCR figures, exclusion list. Accessed June 11, 2026.
Anthropic prompt caching documentation - cache minimums per model, breakpoints, TTLs, payoff math. Accessed June 11, 2026.
Anthropic: Effective context engineering for AI agents - attention budget, just-in-time retrieval, sub-agent patterns. Accessed June 11, 2026.
Anthropic: Introducing Contextual Retrieval - 200K-token rule of thumb, retrieval failure rate improvements. Accessed June 11, 2026.
