Handling Fable 5 Refusals: A Working Guide to the Fallback API

Developers Digest • June 10, 2026 • 10 min read

TL;DR

Fable 5 ships with safety classifiers that route flagged requests away from the model. In production you need to handle this, and Anthropic shipped three ways to do it. Here's how each one works, with code, plus the billing rules nobody has written up.

Claude Fable 5 is the same model as the restricted-access Mythos 5, wrapped in safety classifiers. When a classifier fires, Fable 5 doesn't answer. In the Claude apps, the request silently falls back to Opus 4.8. On the API, handling that is your job. If you have not yet moved your integration over, see Migrating to Claude Fable 5 and the June 22 deadline post for the cutover timeline.

This matters even if your product has nothing to do with security or biology. The classifiers are tuned conservative and they catch normal work: developers reported a base64 implementation flagged as cyber, genome-alignment pipelines rerouted, and prompts asking the model to "explain its reasoning" tripping the extraction filter. Anthropic's own number is that under 5% of sessions hit a fallback. Across production traffic, that's not an edge case. That's Tuesday.

Here's how refusals work on the wire, and the three ways to handle them.

What a refusal looks like

A refusal is not an error. You get HTTP 200 with a new stop reason:

JSON

{
  "id": "msg_...",
  "model": "claude-fable-5",
  "stop_reason": "refusal",
  "stop_details": {
    "type": "refusal",
    "category": "cyber",
    "explanation": "This request was flagged by safety classifiers..."
  },
  "content": [...],
  "usage": {...}
}

Three things to know:

stop_details.category is "cyber", "bio", "reasoning_extraction", or null.
A refusal can happen before any output or mid-stream. If it fires before output, you are not billed and it doesn't count against rate limits. Mid-stream, you pay for input plus whatever already streamed.
If your code only checks for HTTP errors, refusals will surface as mysteriously short responses. Check stop_reason on every call.
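Because a refusal arrives as an ordinary HTTP 200, the check has to live in your response handling. Here is a minimal sketch that works on the response as a plain dict (raw JSON, or an SDK response converted to a dict). The `inspect_response` helper and `RefusalInfo` type are hypothetical names, and treating non-empty text blocks as evidence of a mid-stream refusal is an assumption, not documented behavior.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class RefusalInfo:
    refused: bool
    category: Optional[str]  # "cyber", "bio", "reasoning_extraction", or None
    partial_output: bool     # True if some text arrived before the refusal


def inspect_response(response: dict[str, Any]) -> RefusalInfo:
    """Classify a Messages API response (as a dict) for refusal handling."""
    if response.get("stop_reason") != "refusal":
        return RefusalInfo(refused=False, category=None, partial_output=False)

    details = response.get("stop_details") or {}
    # Assumption: a mid-stream refusal leaves non-empty text blocks in `content`,
    # while a refusal before any output leaves none.
    partial = any(
        block.get("type") == "text" and block.get("text")
        for block in response.get("content", [])
    )
    return RefusalInfo(refused=True, category=details.get("category"), partial_output=partial)
```

Run it on every response, not just suspiciously short ones; `partial_output` is what tells you whether the refused attempt was billed.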

Option 1: Manual retry on Opus 4.8

The simplest approach. Catch the refusal, replay the conversation on claude-opus-4-8:

Python

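import anthropic

THINKING_TYPES = {"thinking", "redacted_thinking"}

def strip_thinking_blocks(messages):
    # Drop thinking/redacted_thinking blocks; handles dict blocks and SDK block objects
    def kind(b):
        return b.get("type") if isinstance(b, dict) else getattr(b, "type", None)
    return [{**m, "content": [b for b in m["content"] if kind(b) not in THINKING_TYPES]}
            if isinstance(m["content"], list) else m
            for m in messages]

def create_with_fallback(client: anthropic.Anthropic, **kwargs):
    response = client.messages.create(model="claude-fable-5", **kwargs)

    if response.stop_reason == "refusal":
        # Thinking blocks are model-specific: strip them before cross-model replay
        clean_messages = strip_thinking_blocks(kwargs["messages"])
        response = client.messages.create(
            model="claude-opus-4-8",
            **{**kwargs, "messages": clean_messages},
        )

    return response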

The gotcha is in that strip_thinking_blocks call. Thinking blocks are model-specific: you pass them back unchanged on same-model multi-turn conversations, but you must strip thinking and redacted_thinking blocks when replaying history on a different model.

The bigger problem with manual retry is cost. Prompt caches are per-model. If you've built up a large cached prefix on Fable 5 and a refusal forces you to Opus 4.8, you pay full cache-write costs all over again on the new model. On a long agentic session with hundreds of thousands of cached tokens, that's real money. Anthropic shipped a fix for this, covered below.
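To put a rough number on "real money", here is a back-of-the-envelope sketch. It assumes cache writes are billed at 1.25x the base input price (Anthropic's published multiplier for 5-minute cache writes; whether it carries over unchanged to Opus 4.8 is an assumption) and uses the $5/MTok Opus 4.8 input price quoted in the billing rules. The `cache_rewrite_cost` name is hypothetical.

```python
def cache_rewrite_cost(cached_tokens: int, base_input_per_mtok: float,
                       write_multiplier: float = 1.25) -> float:
    """Approximate dollar cost of re-writing a cached prefix on a new model.

    Assumes cache writes cost `write_multiplier` times the base input price
    per million tokens (1.25x is the assumed 5-minute cache-write rate).
    """
    return cached_tokens * base_input_per_mtok * write_multiplier / 1_000_000


# A 300k-token cached prefix re-written on Opus 4.8 at an assumed $5/MTok input
print(f"${cache_rewrite_cost(300_000, 5.0):.2f}")
```

Under those assumptions a single 300k-token prefix costs about $1.88 to rebuild, and an agent that hits several refusals in one session pays it each time it switches models.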

Option 2: Server-side fallback (the good one)

New with Fable 5, in beta. You declare fallback models in the request and Anthropic handles the reroute server-side:

Python

response = client.messages.create(
    model="claude-fable-5",
    max_tokens=32000,
    messages=messages,
    fallbacks=[{"model": "claude-opus-4-8"}],
    extra_headers={"anthropic-beta": "server-side-fallback-2026-06-01"},
)

Details that matter:

The beta header must be exactly server-side-fallback-2026-06-01. Other date values return a 400.
Up to 3 fallback models, tried in order. Each entry can override max_tokens and thinking for that attempt only, which you'll want, because Opus 4.8 has different thinking semantics than Fable 5.
Only a safety-classifier decline triggers the fallback. Rate limits, overload, and 5xx errors do not. This is not a general resilience mechanism; pair it with your normal retry logic.
Permitted fallback targets are published per-model as allowed_fallback_models on the Models API when you set the beta header.
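Since the list is capped at three and targets must appear in `allowed_fallback_models`, it can be worth validating the `fallbacks` value before sending it. A minimal sketch, assuming you have already fetched the allowed list from the Models API (that call and its response shape are not shown here) and assuming per-entry overrides sit as extra keys next to `"model"`. `build_fallbacks` is a hypothetical helper.

```python
from typing import Any, Optional

MAX_FALLBACKS = 3  # limit described above


def build_fallbacks(models: list[str], allowed: set[str],
                    overrides: Optional[dict[str, dict[str, Any]]] = None) -> list[dict[str, Any]]:
    """Build a validated `fallbacks` request value, tried in list order."""
    if len(models) > MAX_FALLBACKS:
        raise ValueError(f"at most {MAX_FALLBACKS} fallback models are allowed")
    overrides = overrides or {}
    entries = []
    for model in models:
        if model not in allowed:
            raise ValueError(f"{model} is not a permitted fallback target")
        # Assumed shape: overrides such as max_tokens or thinking apply to this attempt only
        entries.append({"model": model, **overrides.get(model, {})})
    return entries
```

For example, `build_fallbacks(["claude-opus-4-8"], allowed=allowed_models, overrides={"claude-opus-4-8": {"max_tokens": 16000}})` (with `allowed_models` read from the Models API) yields a value you could pass as `fallbacks=` in the request above.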

The response tells you exactly what happened. The top-level model field reports whichever model actually answered, and a new fallback content block marks the handoff:

JSON

{"type": "fallback", "from": {"model": "claude-fable-5"}, "to": {"model": "claude-opus-4-8"}}

Finally, usage.iterations[] breaks out token usage per attempted model, which is what you want for cost attribution.
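Those three signals can be folded into one record for logging or cost attribution. A sketch over the response as a dict: the `fallback` block shape is taken from the example above, but the per-iteration keys (`model`, `input_tokens`, `output_tokens`) are an assumption, since the iteration entry format is not shown here. `summarize_fallback` is a hypothetical name.

```python
from typing import Any


def summarize_fallback(response: dict[str, Any]) -> dict[str, Any]:
    """Report which model answered and whether a server-side fallback happened."""
    handoffs = [
        (block["from"]["model"], block["to"]["model"])
        for block in response.get("content", [])
        if block.get("type") == "fallback"
    ]
    # Assumption: each usage.iterations[] entry has "model", "input_tokens", "output_tokens"
    per_model: dict[str, dict[str, int]] = {}
    for it in (response.get("usage") or {}).get("iterations", []):
        totals = per_model.setdefault(it["model"], {"input_tokens": 0, "output_tokens": 0})
        totals["input_tokens"] += it.get("input_tokens", 0)
        totals["output_tokens"] += it.get("output_tokens", 0)
    return {
        "answered_by": response.get("model"),
        "fell_back": bool(handoffs),
        "handoffs": handoffs,
        "usage_by_model": per_model,
    }
```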

Availability caveat: server-side fallback works on the Claude API and Claude Platform on AWS only. It's rejected on the Batches API, and it doesn't exist on Bedrock, Vertex, or Microsoft Foundry. On those platforms you need option 3.

Option 3: SDK middleware

Anthropic shipped refusal-fallback middleware for the TypeScript, Python, Go, Java, and C# SDKs alongside the launch. It implements the catch-refusal-retry-and-bill-correctly loop client-side:

TypeScript

import Anthropic from "@anthropic-ai/sdk";
import { refusalFallback } from "@anthropic-ai/sdk/middleware";

const client = new Anthropic({
  middleware: [refusalFallback({ fallbackModel: "claude-opus-4-8" })],
});

This is the right answer on Bedrock and Vertex where server-side fallback isn't available, and for teams who want the behavior without taking a beta header into production. It also applies the fallback credit automatically, which brings us to billing.

The billing rules

Scattered across three docs pages, collected here:

Refusal before any output: free. No billing, no rate-limit hit.
Mid-stream refusal: you pay for input plus already-streamed output at Fable 5 rates.
The fallback answer is billed at the fallback model's rates. Anthropic states you won't be charged Fable prices for rerouted requests: Opus 4.8 answers cost Opus 4.8 prices ($5/$25 per MTok input/output, instead of Fable 5's $10/$50).
Cache re-write costs are refundable via the fallback-credit beta.
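The first three rules can be turned into a rough per-request estimate. This sketch uses only the per-MTok prices quoted above and deliberately ignores cache pricing and the fallback credit; the function names are hypothetical, and using "zero streamed tokens" to mean "refused before any output" is a simplification.

```python
PRICES_PER_MTOK = {  # (input, output) USD per million tokens, as quoted above
    "claude-fable-5": (10.0, 50.0),
    "claude-opus-4-8": (5.0, 25.0),
}


def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    inp, out = PRICES_PER_MTOK[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000


def refusal_then_fallback_cost(input_tokens: int, streamed_before_refusal: int,
                               fallback_output_tokens: int) -> float:
    """Cost of a Fable 5 refusal followed by an Opus 4.8 answer (no caching).

    A refusal before any output is free; a mid-stream refusal bills input
    plus already-streamed output at Fable 5 rates.
    """
    refused = 0.0
    if streamed_before_refusal > 0:
        refused = cost("claude-fable-5", input_tokens, streamed_before_refusal)
    return refused + cost("claude-opus-4-8", input_tokens, fallback_output_tokens)
```

For a 100k-token prompt that streams 2k tokens before a mid-stream refusal, followed by a 4k-token Opus 4.8 answer, this comes to $1.10 + $0.60 = $1.70. The same refusal before any output costs only the $0.60 fallback.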

The cache re-write refund deserves explanation. When a refusal forces you onto a new model, the refusal response carries a one-time token in stop_details:

JSON

"stop_details": {
  "type": "refusal",
  "fallback_credit_token": "fct_...",
  "fallback_has_prefill_claim": true
}

Echo it as a top-level fallback_credit_token on your retry (with the fallback-credit-2026-06-01 beta header, which the server-side-fallback header also grants), and the retry is billed as if the conversation had always been on the new model. No double-paying for cache writes.

You only need to touch this if you're hand-rolling retries in Ruby, PHP, or raw HTTP. Server-side fallback and the SDK middleware both apply it for you.
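For the raw-HTTP case, the retry is just the original request body with the model swapped, the token echoed at the top level, and the beta header added. A sketch using only the Python standard library: the endpoint, `x-api-key`, and `anthropic-version` headers are the standard Messages API ones, while the `fallback_credit_token` field and header value come from the description above. `build_credited_retry` is a hypothetical helper that builds the request without sending it.

```python
import json
import urllib.request
from typing import Any

API_URL = "https://api.anthropic.com/v1/messages"


def build_credited_retry(original_body: dict[str, Any], refusal: dict[str, Any],
                         api_key: str, fallback_model: str = "claude-opus-4-8") -> urllib.request.Request:
    """Build (but don't send) a retry that echoes the fallback credit token."""
    token = (refusal.get("stop_details") or {}).get("fallback_credit_token")
    body = {**original_body, "model": fallback_model}
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    if token:
        body["fallback_credit_token"] = token  # top-level field on the retry
        headers["anthropic-beta"] = "fallback-credit-2026-06-01"
    # Thinking/redacted_thinking blocks in body["messages"] still need stripping
    # before a cross-model replay (see Option 1); omitted here for brevity.
    return urllib.request.Request(API_URL, data=json.dumps(body).encode("utf-8"),
                                  headers=headers, method="POST")
```

Send it with `urllib.request.urlopen(req)` and parse the body with `json.load`; if the refusal carried no token, the request falls back to a plain retry without the beta header.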

Which option to use

New project on the Claude API: server-side fallback. Least code, correct billing, per-model usage breakdown.
Bedrock, Vertex, or Foundry: SDK middleware. Server-side fallback doesn't exist there.
Batch workloads: manual handling. The Batches API rejects the fallbacks parameter, so split refused items into a retry batch against Opus 4.8.
No SDK (raw HTTP): manual retry plus the fallback-credit token.
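For batch workloads, the split can be done once results come back. A sketch, assuming the Message Batches request shape (`custom_id`, `params`) and result shape (`result.type == "succeeded"` with a `message`), and assuming a refusal shows up as a succeeded result whose message has `stop_reason` set to `"refusal"`. `build_retry_batch` is a hypothetical name.

```python
from typing import Any


def build_retry_batch(requests: list[dict[str, Any]], results: list[dict[str, Any]],
                      fallback_model: str = "claude-opus-4-8") -> list[dict[str, Any]]:
    """Return batch requests for refused items, retargeted at the fallback model."""
    refused_ids = {
        r["custom_id"]
        for r in results
        if r["result"]["type"] == "succeeded"
        and r["result"]["message"].get("stop_reason") == "refusal"
    }
    retry = []
    for req in requests:
        if req["custom_id"] in refused_ids:
            # Any thinking blocks in the history would also need stripping (see Option 1)
            retry.append({"custom_id": req["custom_id"],
                          "params": {**req["params"], "model": fallback_model}})
    return retry
```

The resulting list can be submitted as a new batch, for example with the Python SDK's `client.messages.batches.create(requests=retry)`. Keeping the original `custom_id` values makes it easy to merge the two result sets afterwards.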

Reduce refusals before you handle them

The cheapest refusal is the one that never fires. Two prompt-level fixes with outsized impact:

First, remove any instruction asking the model to reproduce its reasoning in responses. "Show your thought process" style prompts trigger the reasoning_extraction classifier. If you need process visibility, set thinking: {"display": "summarized"} and read the thinking blocks.

Second, if you work in security or life sciences legitimately, expect false positives and design for them rather than fighting them. The fallback path to Opus 4.8 is the supported answer. Opus 4.8 has no blocking cyber safeguards and remains a very capable model; for flagged requests, a clean fallback is a better experience than an argument with a classifier.

Log every refusal with its category. A week of data will tell you whether your workload has a 0.1% fallback rate or a 15% one, and that number should drive how much engineering you invest here.
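A minimal way to collect that number, sketched with the standard `logging` and `collections.Counter` modules over responses as dicts. The `RefusalTracker` class is a hypothetical helper; a null category is bucketed as `"uncategorized"`.

```python
import logging
from collections import Counter
from typing import Any

log = logging.getLogger("refusals")


class RefusalTracker:
    """Counts calls and refusals per category to compute a fallback rate."""

    def __init__(self) -> None:
        self.calls = 0
        self.refusals: Counter = Counter()

    def record(self, response: dict[str, Any]) -> None:
        self.calls += 1
        if response.get("stop_reason") == "refusal":
            category = (response.get("stop_details") or {}).get("category")
            key = category or "uncategorized"  # category may be null
            self.refusals[key] += 1
            log.warning("refusal id=%s category=%s", response.get("id"), key)

    def fallback_rate(self) -> float:
        return sum(self.refusals.values()) / self.calls if self.calls else 0.0
```

In production you would ship these counts to your metrics system rather than keep them in memory, but the per-category breakdown is the part that tells you which prompts to revisit.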

FAQ

Is a Fable 5 refusal billed the same as a normal request?

No. A refusal that fires before any output is free and does not count against rate limits. A mid-stream refusal bills you for input plus whatever output already streamed at Fable 5 rates. The fallback answer itself is billed at the fallback model's rates, not Fable 5 rates.

Does server-side fallback work on Bedrock or Vertex?

No. Server-side fallback is available only on the Claude API and Claude Platform on AWS. On Bedrock, Vertex, or Microsoft Foundry, use the SDK middleware instead, since it applies the same catch-refusal-retry-and-bill-correctly logic client-side.

Can I avoid refusals instead of just handling them?

Partially. The prompt-level fix with the most impact is dropping instructions that ask the model to reproduce its reasoning in the response, which can trip the reasoning-extraction classifier. If you legitimately work in security or life sciences, expect some false positives regardless and design the fallback path for them rather than fighting the classifier.

What should I use if I am hand-rolling retries in raw HTTP?

Manual retry plus the fallback-credit token. Echo the fallback_credit_token from the refusal response back on your retry with the fallback-credit-2026-06-01 beta header so the retry is billed as if the conversation had always been on the fallback model, avoiding a double charge for cache writes.

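Sources: Anthropic's refusals and fallback (https://platform.claude.com/docs/en/build-with-claude/refusals-and-fallback), fallback credit (https://platform.claude.com/docs/en/build-with-claude/fallback-credit), and SDK middleware (https://platform.claude.com/docs/en/cli-sdks-libraries/middleware) docs, and the fallback billing cookbook (https://platform.claude.com/cookbook/fable-5-fallback-billing-guide).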

Developers Digest

Technical content at the intersection of AI and development. Building with AI agents, Claude Code, and modern dev tools - then showing you exactly how it works.

300+ videos · 30K+ GitHub stars · 50+ articles
