Refusals at Fleet Scale: Building Fable 5 Agents That Do Not Silently Fail

Q: How do I tell a refusal apart from a normal short answer?

Check `stop_reason`, not the length of the text. A refusal returns `stop_reason: "refusal"` on an otherwise normal `200` response. A short but legitimate answer returns `end_turn`. Never infer a refusal from a suspiciously short string, because that is exactly the ambiguity that lets refusals slip through as "successful" results.

This is Part 3 of the Fable 5 agent fleets series. Part 1 covered what Fable 5 is and why it came back. Part 4 looks at what its 1M context and memory actually unlock. This post is the applied one: the single API behavior that will corrupt a fleet run if you ignore it, and how to build around it from day one.

The failure that does not look like a failure

When Fable 5's safety classifier refuses a request, it does not raise an HTTP error. It returns a normal 200 response with stop_reason: "refusal". There is no exception to catch, no non-2xx status to branch on, no timeout. From the outside it looks like the model completed and returned very little.

For a single interactive chat, this is a minor annoyance. A user sees a short or empty answer, shrugs, and retries. For a fleet, it is a different class of problem. If your orchestrator hands work to 40 parallel workers (the fan-out coordination pattern) and treats each 200 as a successful worker result, a refused request becomes a hole in the middle of the run that nothing flags. The worker "succeeded." The aggregation step consumes its empty or truncated output as if it were real. The final artifact is quietly wrong, and the only signal you get is a downstream result that does not add up.

This is worse than a crash. A crash is loud and local. A silent refusal is quiet and it propagates. It is the reliability equivalent of a function that returns undefined instead of throwing.

The reason this matters now, and not as some theoretical edge case, is the post-return classifier. When Fable 5 came back on July 1, 2026, Anthropic shipped a new safety classifier that blocks the specific reported jailbreak technique more than 99 percent of the time. The tradeoff, stated plainly in the redeployment post, is more false positives on benign coding and debugging work. Blocked requests are re-served by Opus 4.8. If your fleet does security research, vulnerability triage, exploit-adjacent defensive work, or even ordinary debugging that touches those topics, you will see refusals, and you will see them on requests that are completely legitimate.

Build the fallback path on day one. It is not an optimization. It is table stakes.

Detecting a refusal in every worker loop

The first rule: never trust a 200 to mean "the model produced a usable answer." Check stop_reason explicitly on every completion, in every worker.

// illustrative - fields follow the documented response shape
type StopReason = "end_turn" | "max_tokens" | "tool_use" | "refusal";

interface WorkerResult {
  ok: boolean;
  refused: boolean;
  text: string;
  model: string;
  stopReason: StopReason;
}

function interpret(response: MessageResponse): WorkerResult {
  const refused = response.stop_reason === "refusal";
  return {
    ok: !refused && response.stop_reason !== "max_tokens",
    refused,
    text: extractText(response),
    model: response.model,
    stopReason: response.stop_reason,
  };
}

The important discipline is structural, not clever: a refusal must be a first-class outcome in your worker's return type, not something inferred later from a suspiciously short string. If refused is a real field that the aggregation layer can read, you can decide what to do with it. If it is buried inside an empty text, you cannot.

A refusal also has one useful billing property worth noting. If Fable 5 refuses before producing any output, you are not billed for that request. That makes the "detect and retry on a different model" pattern cheap: the refused attempt costs nothing, and only the fallback attempt bills.

Three ways to fall back, and when to use each

Anthropic supports three routes for handling a refusal. They are not interchangeable, and picking the wrong one for your fleet shape creates its own problems.

1. Server-side `fallbacks`

You pass a fallbacks parameter and the platform re-serves a refused request on the fallback model for you. This is the least code and the most consistent behavior across every call site, because the retry happens before the response ever reaches your app.

Use it when you want a uniform policy across the whole fleet and you are comfortable letting the platform decide when to hand off. The cost is control: the fallback fires on the platform's terms, and your orchestrator sees the final answer without necessarily knowing a handoff happened unless you inspect the returned model field.

2. SDK middleware

You wrap the client so that a refusal is intercepted and retried according to your own logic before your business code sees it. This sits between the raw API and your worker loop.

// illustrative middleware wrapper
async function withRefusalFallback(
  req: MessageRequest,
  primary = "claude-fable-5",
  fallback = "claude-opus-4-8",
): Promise<WorkerResult> {
  const first = interpret(await client.messages.create({ ...req, model: primary }));
  if (!first.refused) return first;

  metrics.increment("fable5.refusal", { stage: req.stage });
  const second = interpret(await client.messages.create({ ...req, model: fallback }));
  return { ...second, refused: false }; // resolved by fallback
}

Use middleware when you want one consistent fallback policy but also want to emit metrics, tag the result, or vary the fallback per task type. It is the sweet spot for most fleets: centralized, observable, and still yours.

3. Manual fallback in the worker

The worker itself catches the refusal and decides what to do. Maximum control, maximum boilerplate, and the easiest to get subtly wrong because every worker has to remember to do it. Reserve manual handling for workers with genuinely special requirements, for example a step where the fallback prompt has to differ from the primary prompt, or where a refusal should route to a human review queue instead of another model.

For most teams the answer is middleware as the default, with server-side fallbacks as a floor so that even an un-wrapped call site is covered. Manual handling stays the exception.

Get the weekly deep dive

Tutorials on Claude Code, AI agents, and dev tools - delivered free every week.

From the archive

The MCP 2026-07-28 Rewrite: What Breaks and How to Migrate

Jul 1, 2026 • 11 min read

Webernetes: Kubernetes Ported to the Browser in TypeScript

Jul 1, 2026 • 5 min read

Claude Code Is Steganographically Marking Requests

Jun 30, 2026 • 7 min read

Claude in Microsoft Foundry on Azure: Developer Guide 2026

Jun 30, 2026 • 8 min read

Designing the Opus 4.8 fallback so results stay consistent

Falling back to a different model is not free of consequences. Opus 4.8 is a different model than Fable 5, with a different context window, different pricing, and different output characteristics. If half your fleet's results came from Fable 5 and the other half came from Opus 4.8 because of scattered refusals, you can end up with an inconsistent final artifact: two coding styles in one migration, two summary voices in one report, two verdicts from what should be one rubric.

A few practices keep the fallback path coherent:

Keep the prompt model-agnostic. The same prompt should produce compatible output on both models. Avoid instructions that lean on Fable-5-only behavior. If you must specialize, specialize on the fallback path explicitly rather than hoping the primary prompt transfers.
Tag every result with the model that produced it. Carry model through to the aggregation layer. When a downstream reviewer or a human sees an odd result, "this one came from the fallback" is the first thing they should be able to check.
Normalize at the seams. If your fleet stitches worker outputs into one artifact, run a consistency pass (formatting, naming, voice) after aggregation so mixed-model output does not leak into the deliverable.
Decide whether a fallback result is acceptable per task. For a bulk code migration, an Opus 4.8 result for one file is fine. For a task where only Fable 5's depth is the point, a refusal might mean "escalate to a human," not "silently downgrade."

The goal is that a reader of the final artifact cannot tell which workers were refused and re-served, because you designed for that outcome instead of discovering it.

Idempotency and retry budgets

Retries are where a naive fallback turns into a runaway. Two guardrails matter.

Idempotency. A worker that gets refused, retried, and then partially completes must not double-apply its side effects. If a worker writes a file, opens a PR, or posts a result, tag each unit of work with a stable idempotency key so a retried attempt overwrites rather than duplicates. This is ordinary distributed-systems hygiene, but refusals make it non-optional because refusal-driven retries are now a normal, frequent path rather than a rare error case.

Retry budgets. Cap how many fallback attempts a single task gets, and cap the aggregate fallback rate for a run. A per-task budget of one Fable 5 attempt plus one Opus 4.8 attempt is a reasonable default. Without a budget, a systematically refused category of work (say, every worker touching a security module) can quietly double your spend and latency as every task burns its full retry allowance.

// illustrative retry-budget guard
async function runWorker(task: Task, budget: RetryBudget): Promise<WorkerResult> {
  const result = await withRefusalFallback(task.request);
  if (result.refused && !budget.tryConsume()) {
    return { ...result, ok: false }; // out of budget - escalate, do not loop
  }
  return result;
}

The failure mode to design against is the fleet that "works" but silently costs 2x because a whole task category is being refused and re-served on every run, and nobody is watching the number.

Refusal rate is a fleet health metric

The most important shift is treating the refusal rate as a first-class operational signal, right next to latency and error rate.

Emit a counter every time a worker is refused, tagged by task type, stage, and prompt template. Then watch it:

A sudden spike in one task category usually means a prompt or an input started tripping the classifier. That is a debugging lead, not noise.
A slow climb across the fleet can mean your workload is drifting toward topics the post-return classifier treats conservatively.
A near-zero rate everywhere on a fleet that touches security or debugging work is itself suspicious. It may mean your detection is broken and refusals are being silently swallowed as "successful" empty results.

Because the post-return classifier deliberately trades false positives for safety, a healthy Fable 5 fleet has a non-zero baseline refusal rate. The number to alert on is a change, not the presence of refusals. Establish the baseline in the first week, chart it per task type, and page on deviations.

A practical dashboard has three lines: total requests, refusals, and fallback resolutions. When those three move together, your fleet is absorbing refusals as designed. When refusals climb but fallback resolutions do not, you have workers dropping refused work on the floor, which is exactly the silent corruption this whole post is about.

The day-one checklist

Check stop_reason on every completion. Make refused a real field on the worker result.
Wrap the client in middleware with an Opus 4.8 fallback. Keep server-side fallbacks on as a floor.
Keep prompts model-agnostic and tag every result with its producing model.
Add idempotency keys to any worker with side effects.
Set per-task and per-run retry budgets. Escalate out-of-budget refusals instead of looping.
Emit and chart refusal rate by task type. Alert on change, not on presence.

None of this is exotic. It is the reliability engineering that a 200-that-means-refusal forces you to do up front instead of after your first quietly corrupted run.

Frequently Asked Questions

How do I tell a refusal apart from a normal short answer?

Check stop_reason, not the length of the text. A refusal returns stop_reason: "refusal" on an otherwise normal 200 response. A short but legitimate answer returns end_turn. Never infer a refusal from a suspiciously short string, because that is exactly the ambiguity that lets refusals slip through as "successful" results.

Am I billed for a refused request?

If Fable 5 refuses before producing any output, you are not billed for that attempt, per Anthropic's guidance. That is what makes the detect-and-fall-back pattern cheap: the refused attempt costs nothing and only the fallback attempt on Opus 4.8 bills.

Should I use the server-side `fallbacks` parameter or write my own?

Use SDK middleware as your default so you can emit metrics and tag results, and keep server-side fallbacks enabled as a floor so even un-wrapped call sites are covered. Reserve fully manual, per-worker handling for steps with special requirements like a different fallback prompt or routing to a human queue.

Why is this a day-one requirement instead of an edge case?

Because the post-return safety classifier that shipped with Fable 5's July 1 redeployment blocks the reported technique more than 99 percent of the time at the cost of more false positives on benign coding and debugging. If your fleet does that kind of work, legitimate requests will be refused, so the fallback path is a normal operating condition rather than a rare exception.

The failure that does not look like a failure

Detecting a refusal in every worker loop

Three ways to fall back, and when to use each

1. Server-side fallbacks

2. SDK middleware

3. Manual fallback in the worker

The MCP 2026-07-28 Rewrite: What Breaks and How to Migrate

Webernetes: Kubernetes Ported to the Browser in TypeScript

Claude Code Is Steganographically Marking Requests

Claude in Microsoft Foundry on Azure: Developer Guide 2026

Designing the Opus 4.8 fallback so results stay consistent

Idempotency and retry budgets

Refusal rate is a fleet health metric

The day-one checklist

Frequently Asked Questions

How do I tell a refusal apart from a normal short answer?

Am I billed for a refused request?

Should I use the server-side fallbacks parameter or write my own?

Why is this a day-one requirement instead of an edge case?

Sources

Long-Horizon Agents: What Fable 5's 1M Context and Memory Actually Unlock

Handling Fable 5 Refusals: A Working Guide to the Fallback API

Orchestrating a Fleet of Agents with Fable 5

Related Tools

Claude Fable 5

Claude Agent SDK

Claude Code

Vercel AI SDK

Apps from Developers Digest

Overnight Agents

agentfs

Subagent Studio

Related Guides

Claude Code Setup Guide

Building Your First MCP Server

AI Agent Frameworks Compared: LangGraph vs CrewAI vs Mastra vs CopilotKit

Related Videos

Claude Mythos & Fable 5 Banned

Claude Fable 5 in 7 Minutes

TRAE: Custom AI Agents That Actually Understand Your Codebase

Related Posts

Long-Horizon Agents: What Fable 5's 1M Context and Memory Actually Unlock

Orchestrating a Fleet of Agents with Fable 5

Running Fable 5 Agent Fleets in Production: The Operations Guide

Fable 5 Is Back: The Anthropic Model the Government Switched Off

Running Fable 5 Agents on Vercel's eve Framework

Fable 5 vs Opus 4.8: Which Should Orchestrate Your Agents?

Get Smarter About AI Dev

The failure that does not look like a failure

Detecting a refusal in every worker loop

Three ways to fall back, and when to use each

1. Server-side fallbacks

2. SDK middleware

3. Manual fallback in the worker

The MCP 2026-07-28 Rewrite: What Breaks and How to Migrate

Webernetes: Kubernetes Ported to the Browser in TypeScript

Claude Code Is Steganographically Marking Requests

Claude in Microsoft Foundry on Azure: Developer Guide 2026

Designing the Opus 4.8 fallback so results stay consistent

Idempotency and retry budgets

Refusal rate is a fleet health metric

The day-one checklist

Frequently Asked Questions

How do I tell a refusal apart from a normal short answer?

Am I billed for a refused request?

Should I use the server-side fallbacks parameter or write my own?

Why is this a day-one requirement instead of an edge case?

Sources

Long-Horizon Agents: What Fable 5's 1M Context and Memory Actually Unlock

Handling Fable 5 Refusals: A Working Guide to the Fallback API

Orchestrating a Fleet of Agents with Fable 5

Related Tools

Claude Fable 5

Claude Agent SDK

Claude Code

Vercel AI SDK

Apps from Developers Digest

Overnight Agents

agentfs

Subagent Studio

1. Server-side `fallbacks`

Should I use the server-side `fallbacks` parameter or write my own?

1. Server-side `fallbacks`

Should I use the server-side `fallbacks` parameter or write my own?