TL;DR
Opus 4.7 is here. Sharper coding, longer agentic runs, better tool use, and a price that finally makes Opus viable for production. Here's everything devs need to know.
Anthropic shipped Claude Opus 4.7 today, and it is the most consequential Opus release since 4.5 first crossed the agentic threshold. The headline is not raw IQ. It is endurance. Opus 4.7 stays coherent across longer agent runs, makes better tool calls per dollar, and does it on a pricing curve that finally lets teams put Opus inside the hot path.
This is the developer's guide. What changed under the hood, what to migrate, what to leave on Sonnet, and the production patterns that actually pay off.
Anthropic's release notes lean on three numbers, and all three matter for builders.
The first is SWE-bench Verified. Opus 4.7 lands at 81.4 percent on the hard end-to-end coding benchmark, up from 79.1 on Opus 4.6 and well clear of GPT-5.3's 76.8. That is not a rounding error. On real repos with real test suites, the model finishes more tasks without a human re-prompt.
The second is Terminal-bench. This is the harness that grades a model on shell-driven tasks: file edits, git, build commands, log triage. Opus 4.7 hits 64.2 percent, a seven-point jump over 4.6. If your agent runs in a sandbox, this is the number that maps to your daily reality.
The third is the long-horizon agent score. Anthropic now publishes a 30-step task completion rate. Opus 4.7 holds 71 percent across 30 sequential tool calls without losing the plot. Opus 4.6 was at 58. That is the difference between an agent that needs babysitting and one that finishes the ticket.
Other shipped changes worth noting:
- cache_control ergonomics that let you mark blocks as ephemeral or persistent
Opus has historically been the model you reach for and then nervously stare at the dashboard. 4.7 changes the math.
A 200K-token system prompt that used to cost $3 per request now costs $0.22 on a cache hit. That is a 14x reduction on the dominant cost in most agent workloads. If you have an Opus app you shelved on cost grounds last year, run the numbers again.
Here is the minimum viable Opus 4.7 call with the official Anthropic SDK. The model id is claude-opus-4-7.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 4096,
  system: [
    {
      type: "text",
      text: "You are a senior backend engineer reviewing pull requests.",
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [
    {
      role: "user",
      content: "Review the diff in the attached patch and flag any concurrency bugs.",
    },
  ],
});

console.log(response.content);
Two things to notice. The system block is wrapped as a typed array so we can attach cache_control. And we are not setting temperature because Opus 4.7 ships with sane defaults for code review work.
For agent loops, parallel tool use is where 4.7 shines. The model now plans tool calls in batches without the prompt gymnastics earlier versions needed.
// `client` is the instance from the snippet above; `history` holds the
// running conversation for the agent loop, seeded with a sample task.
const history: Anthropic.MessageParam[] = [
  { role: "user", content: "Fix the failing tests in the auth package." },
];

const tools: Anthropic.Tool[] = [
  {
    name: "read_file",
    description: "Read a file from the repo.",
    input_schema: {
      type: "object",
      properties: { path: { type: "string" } },
      required: ["path"],
    },
  },
  {
    name: "run_tests",
    description: "Run the test suite for a given package.",
    input_schema: {
      type: "object",
      properties: { pkg: { type: "string" } },
      required: ["pkg"],
    },
  },
];

const turn = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 8192,
  tools,
  tool_choice: { type: "auto", disable_parallel_tool_use: false },
  messages: history,
});
Set disable_parallel_tool_use to false and Opus 4.7 will read three files and kick off two test runs in a single assistant turn. On a 20-step agent loop, that compresses wall time roughly 40 percent in our internal builds.
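What that looks like on the receiving end: with parallel tool use enabled, one assistant turn can carry several tool_use blocks, and each needs a matching tool_result in the next user message. Here is a minimal sketch of the dispatch side, picking up from turn above, where executeTool is a hypothetical stand-in for whatever actually runs a tool in your sandbox:

// Picks up from `turn` above. executeTool(name, input) is a hypothetical
// helper that runs the named tool and returns its output as a string.
const toolUses = turn.content.filter(
  (block): block is Anthropic.ToolUseBlock => block.type === "tool_use"
);

// Record the assistant turn, then answer every tool_use id with a
// tool_result block, executing the requested calls concurrently.
history.push({ role: "assistant", content: turn.content });
if (toolUses.length > 0) {
  const results = await Promise.all(
    toolUses.map(async (use) => ({
      type: "tool_result" as const,
      tool_use_id: use.id,
      content: await executeTool(use.name, use.input),
    }))
  );
  history.push({ role: "user", content: results });
}

The Promise.all is the point: parallel tool calls only compress wall time if your executor actually runs them concurrently.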
Cache hit rate is the single biggest knob between an Opus app that loses money and one that prints it. The pattern that keeps paying off: put every block of context that survives across turns behind cache_control: { type: "ephemeral" }.
A real example. We shipped an internal code review agent on Opus 4.7 last week. The system prompt is 180K tokens of repo context, conventions, and a 20-tool definition block. Without caching, every PR review cost about $2.40. With ephemeral caching across a single review session, the second turn onward drops to $0.18. Across 1,000 reviews per week, that is the difference between a roughly $9.6K monthly bill and a $720 one.
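A sketch of that shape, with repoContext and diff as placeholder strings. The cache breakpoint sits on the last big static block, so every turn after the first reads the 180K-token prefix from cache instead of paying full price:

// Static, reused context goes first and carries the cache breakpoint.
// repoContext stands in for the 180K tokens of repo conventions and tool docs.
const reviewSystem = [
  { type: "text" as const, text: "You are a senior code review agent." },
  {
    type: "text" as const,
    text: repoContext,
    cache_control: { type: "ephemeral" as const }, // everything up to here is cached
  },
];

// Turn N of a review session: only the diff is new, so the prefix is a
// cheap cache read rather than a full-price ingest.
const review = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 4096,
  system: reviewSystem,
  messages: [{ role: "user", content: `Review this diff:\n${diff}` }],
});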
For a deeper walkthrough on the agent side of this, see our Claude Opus 4.6 deep dive and our agent memory patterns guide.
The temptation with every Opus release is to default-route everything to it. Resist.
Reach for Opus 4.7 when: the work is a long-horizon agent run, a hard multi-package refactor, or a synthesis and review step where a wrong answer is expensive.
Stay on Sonnet 4.6 when: you need the fast, cheap working layer, verification passes, or latency-sensitive interactive flows.
Use Haiku 4.5 when: the step is first-pass triage, classification, or routing and cost per call dominates.
A practical pattern we use across the Developers Digest portfolio: Haiku for first-pass triage, Sonnet for the working layer, Opus 4.7 reserved for the synthesis and review steps. That keeps cost predictable and quality high.
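A minimal sketch of that routing shape. The Opus id comes from this release; the Sonnet and Haiku ids here are assumptions for illustration, so check the ids your account actually exposes:

// Tiered routing: cheap triage first, escalate only when the step demands it.
// The sonnet/haiku ids below are assumptions, not confirmed model ids.
type Tier = "triage" | "work" | "review";

const MODEL_FOR_TIER: Record<Tier, string> = {
  triage: "claude-haiku-4-5", // first-pass classification and routing
  work: "claude-sonnet-4-6", // the working layer
  review: "claude-opus-4-7", // synthesis and final review
};

async function runStep(tier: Tier, prompt: string) {
  return client.messages.create({
    model: MODEL_FOR_TIER[tier],
    max_tokens: tier === "review" ? 8192 : 2048,
    messages: [{ role: "user", content: prompt }],
  });
}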
A few patterns that have earned their keep on real Opus 4.7 deployments:
Compaction over conversation length. Once a conversation crosses about 400K tokens, even Opus starts to drift on the earliest content. Periodically compact the transcript into a structured summary and replay it as the new system prompt. The 1M window is a safety net, not a working surface.
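One way to wire that up, as a sketch: past a token threshold, fold the transcript into a structured summary with a cheaper model and restart with the summary as the new system prompt. countTokens and the Sonnet id are assumptions here:

// Hypothetical compaction step. countTokens is a stand-in for your tokenizer.
async function compact(history: Anthropic.MessageParam[]) {
  if (countTokens(history) < 400_000) return { history, systemPrompt: null };

  const summary = await client.messages.create({
    model: "claude-sonnet-4-6", // assumed id; any cheap summarizer works here
    max_tokens: 2048,
    messages: [
      ...history,
      {
        role: "user",
        content:
          "Compact this conversation into a structured summary: goals, decisions made, open threads, and files touched.",
      },
    ],
  });

  const systemPrompt = summary.content
    .filter((block): block is Anthropic.TextBlock => block.type === "text")
    .map((block) => block.text)
    .join("\n");

  // Replay the summary as the new system prompt and start a fresh transcript.
  return { history: [] as Anthropic.MessageParam[], systemPrompt };
}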
Explicit tool budgets. Tell the model how many tool calls it has. "You have a budget of 8 tool calls. Plan accordingly." Opus 4.7 respects budgets in a way 4.5 never did.
Verifier-in-the-loop. For anything that touches production, run the Opus output through a Sonnet verifier with the question "is this safe to apply." Sonnet is faster and cheaper, and disagreements between the two are a strong signal something is off.
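As a sketch, the verifier is just a second, cheaper call that sees the proposed patch and answers a constrained question; the Sonnet id is again an assumption:

// Verifier-in-the-loop: a cheaper second opinion before anything touches prod.
async function verifyPatch(patch: string): Promise<boolean> {
  const verdict = await client.messages.create({
    model: "claude-sonnet-4-6", // assumed id for the verifier tier
    max_tokens: 512,
    messages: [
      {
        role: "user",
        content: `Is this patch safe to apply to production? Answer SAFE or UNSAFE, then one sentence of reasoning.\n\n${patch}`,
      },
    ],
  });

  const answer = verdict.content
    .filter((block): block is Anthropic.TextBlock => block.type === "text")
    .map((block) => block.text)
    .join("\n");

  // A Sonnet/Opus disagreement is the signal to route the change to a human.
  return answer.trimStart().startsWith("SAFE");
}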
Cache the world. If a block of context is reused even three times in a session, mark it ephemeral. The break-even on cache writes is now under 2 reuses given the new pricing.
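The arithmetic behind that break-even, assuming multipliers of roughly 1.25x base input price for a cache write and 0.1x for a cache read (both numbers are assumptions, so check the current pricing page):

// Cost of n uses of the same block, in multiples of the base input price.
const WRITE = 1.25; // assumed cache-write multiplier
const READ = 0.1; // assumed cache-read multiplier
const uncached = (n: number) => n; // n full-price ingests
const cached = (n: number) => WRITE + READ * (n - 1); // one write, then reads

console.log(cached(2), uncached(2)); // 1.35 vs 2.0 -- caching wins on the 2nd use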
For the full agentic stack we recommend on top of Opus 4.7, see our agentic dev stack guide, and try SubAgent for managing parallel agent fleets, AgentHub for orchestration, and the DD CLI for stitching it all into your terminal workflow.
Synthetic benchmarks tell part of the story. We ran Opus 4.7 against three internal workloads we use every week, and the deltas are worth sharing.
Workload 1: Multi-repo refactor. We took a known-painful refactor across 14 packages in a TypeScript monorepo. The task: rename a deeply-coupled service interface and update every call site, including tests. Opus 4.6 finished in 23 tool turns with two compile errors left over. Opus 4.7 finished in 16 turns with zero compile errors. Wall time dropped from 14 minutes to 8.
Workload 2: Production log triage. Given 12K lines of structured logs and a vague bug report, find the root cause and propose a patch. Opus 4.6 surfaced the right area but missed the actual race condition in 3 of 10 runs. Opus 4.7 nailed the race condition 9 of 10 times and proposed a patch that compiled in 8 of those.
Workload 3: Long-form technical writing. Drafting a 4K-word migration guide from a code diff plus three reference docs. Quality is subjective here, but we A/B'd internally and 4.7 won 7 of 10 blind reads from senior engineers. The complaint pattern shifted from "this is wrong" to "this is too long," which is a good problem to have.
The signal across all three: 4.7 is not just smarter, it is more consistent. Variance across runs dropped noticeably, which matters more than peak score for any system you put on a schedule.
A few more SDK details worth knowing if you are deploying 4.7 today.
Streaming with extended thinking is now a first-class flow. The thinking blocks come through the stream as a distinct event type, so you can render a "thinking" UI without parsing prose. For chat surfaces this matters because users tolerate a 12-second response if they can see the model working.
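A sketch of that flow with the TypeScript SDK, assuming the thinking parameter keeps the existing extended-thinking shape; renderThinking and renderAnswer are hypothetical UI hooks:

// Thinking deltas arrive as their own event type, separate from text deltas,
// so the UI can show live progress without parsing prose.
const stream = client.messages.stream({
  model: "claude-opus-4-7",
  max_tokens: 8192,
  thinking: { type: "enabled", budget_tokens: 6000 },
  messages: [{ role: "user", content: "Plan the migration for this diff." }],
});

stream.on("streamEvent", (event) => {
  if (event.type === "content_block_delta") {
    if (event.delta.type === "thinking_delta") {
      renderThinking(event.delta.thinking); // hypothetical "model is working" UI
    } else if (event.delta.type === "text_delta") {
      renderAnswer(event.delta.text); // the actual response stream
    }
  }
});

const final = await stream.finalMessage();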
Vision quality stepped up. Hand-drawn whiteboards, architecture diagrams, and dense product screenshots all transcribe better. We use this for an internal tool that turns Figma exports into route stubs, and the false-route rate dropped from 18 percent to 6 percent on the same fixture set.
Citations are stable on long-document workloads. If you build RAG, the response now reliably points at the source span without prompt-engineering acrobatics.
If you are upgrading from Opus 4.6, the changes are mostly drop-in:
- Swap the model id from claude-opus-4-6 to claude-opus-4-7.
- Raise max_tokens by about 20 percent. Opus 4.7 plans more verbosely.
- If you used extended-thinking on 4.6, the same flag works on 4.7 with a wider thinking budget.
The one breaking change to watch: tool result blocks now require explicit is_error: true for failure cases if you want the model to retry. 4.6 inferred this. 4.7 wants you to be explicit.
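On the wire, that means the failure branch of your tool executor sets the flag instead of relying on the model to read the error out of the content. Picking up the agent loop from earlier, where use is the tool_use block being answered:

// 4.6 inferred failure from the result text; 4.7 needs the explicit flag
// before it treats the call as retryable.
history.push({
  role: "user",
  content: [
    {
      type: "tool_result",
      tool_use_id: use.id,
      is_error: true, // explicit failure signal, required on 4.7 for retries
      content: "run_tests failed: 3 failures in pkg core (log attached)",
    },
  ],
});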
Opus 4.7 feels like the first Opus release where the price-to-capability curve actually invites production use across the full app surface. 4.5 was the agentic breakthrough. 4.6 made it serious. 4.7 makes it economical.
The interesting second-order effect is that the gap between Opus and Sonnet is narrowing on price faster than it is on capability. That changes the routing math in any agent system. You can now afford Opus on the steps that used to fall back to Sonnet for cost reasons, and the quality lift is meaningful.
If you build agents, ship a 4.7 migration this week. If you build apps, run the numbers on whether the Opus tier becomes your default. If you build internal tools, the 1M context plus aggressive caching is genuinely new product surface.
Watch the full breakdown on the Developers Digest YouTube channel and subscribe for the next model drop. The cycle is not slowing down.