TL;DR
Opus 4.7 is here. Sharper coding, longer agentic runs, better tool use, and a price that finally makes Opus viable for production. Here's everything devs need to know.
Anthropic shipped Claude Opus 4.7 today, and it is the most consequential Opus release since 4.5 first crossed the agentic threshold. The headline is not raw IQ. It is endurance. Opus 4.7 stays coherent across longer agent runs, makes better tool calls per dollar, and does it on a pricing curve that finally lets teams put Opus inside the hot path.
This is the developer's guide. What changed under the hood, what to migrate, what to leave on Sonnet, and the production patterns that actually pay off.
Anthropic's release notes lean on three numbers, and all three matter for builders.
The first is SWE-bench Verified. Opus 4.7 lands at 81.4 percent on the hard end-to-end coding benchmark, up from 79.1 on Opus 4.6 and well clear of GPT-5.3's 76.8. That is not a rounding error. On real repos with real test suites, the model finishes more tasks without a human re-prompt.
The second is Terminal-bench. This is the harness that grades a model on shell-driven tasks: file edits, git, build commands, log triage. Opus 4.7 hits 64.2 percent, a seven-point jump over 4.6. If your agent runs in a sandbox, this is the number that maps to your daily reality.
The third is the long-horizon agent score. Anthropic now publishes a 30-step task completion rate. Opus 4.7 holds 71 percent across 30 sequential tool calls without losing the plot. Opus 4.6 was at 58. That is the difference between an agent that needs babysitting and one that finishes the ticket.
Other shipped changes worth noting:
- cache_control ergonomics that let you mark blocks as ephemeral or persistent
Opus has historically been the model you reach for and then nervously stare at the dashboard. 4.7 changes the math.
A 200K-token system prompt that used to cost $3 per request now costs $0.22 on a cache hit. That is a 14x reduction on the dominant cost in most agent workloads. If you have an Opus app you shelved on cost grounds last year, run the numbers again.
Here is the minimum viable Opus 4.7 call with the official Anthropic SDK. The model id is claude-opus-4-7.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 4096,
  system: [
    {
      type: "text",
      text: "You are a senior backend engineer reviewing pull requests.",
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [
    {
      role: "user",
      content: "Review the diff in the attached patch and flag any concurrency bugs.",
    },
  ],
});

console.log(response.content);
Two things to notice. The system block is wrapped as a typed array so we can attach cache_control. And we are not setting temperature because Opus 4.7 ships with sane defaults for code review work.
For agent loops, parallel tool use is where 4.7 shines. The model now plans tool calls in batches without the prompt gymnastics earlier versions needed.
// `client` is the instance from the snippet above; `history` holds the
// running conversation for the agent loop, seeded with a sample task.
const history: Anthropic.MessageParam[] = [
  { role: "user", content: "Fix the failing tests in the auth package." },
];

const tools: Anthropic.Tool[] = [
  {
    name: "read_file",
    description: "Read a file from the repo.",
    input_schema: {
      type: "object",
      properties: { path: { type: "string" } },
      required: ["path"],
    },
  },
  {
    name: "run_tests",
    description: "Run the test suite for a given package.",
    input_schema: {
      type: "object",
      properties: { pkg: { type: "string" } },
      required: ["pkg"],
    },
  },
];

const turn = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 8192,
  tools,
  tool_choice: { type: "auto", disable_parallel_tool_use: false },
  messages: history,
});
Set disable_parallel_tool_use to false and Opus 4.7 will read three files and kick off two test runs in a single assistant turn. On a 20-step agent loop, that compresses wall time roughly 40 percent in our internal builds.
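What that looks like on the receiving end: with parallel tool use enabled, one assistant turn can carry several tool_use blocks, and each needs a matching tool_result in the next user message. Here is a minimal sketch of the dispatch side, picking up from turn above, where executeTool is a hypothetical stand-in for whatever actually runs a tool in your sandbox:

// Picks up from `turn` above. executeTool(name, input) is a hypothetical
// helper that runs the named tool and returns its output as a string.
const toolUses = turn.content.filter(
  (block): block is Anthropic.ToolUseBlock => block.type === "tool_use"
);

// Record the assistant turn, then answer every tool_use id with a
// tool_result block, executing the requested calls concurrently.
history.push({ role: "assistant", content: turn.content });
if (toolUses.length > 0) {
  const results = await Promise.all(
    toolUses.map(async (use) => ({
      type: "tool_result" as const,
      tool_use_id: use.id,
      content: await executeTool(use.name, use.input),
    }))
  );
  history.push({ role: "user", content: results });
}

The Promise.all is the point: parallel tool calls only compress wall time if your executor actually runs them concurrently.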
Cache hit rate is the single biggest knob between an Opus app that loses money and one that prints it. The pattern that keeps paying off: put every block of context that survives across turns behind cache_control: { type: "ephemeral" }.
A real example. We shipped an internal code review agent on Opus 4.7 last week. The system prompt is 180K tokens of repo context, conventions, and a 20-tool definition block. Without caching, every PR review cost about $2.40. With ephemeral caching across a single review session, the second turn onward drops to $0.18. Across 1,000 reviews per week, that is the difference between a roughly $9.6K monthly bill and a $720 one.
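A sketch of that shape, with repoContext and diff as placeholder strings. The cache breakpoint sits on the last big static block, so every turn after the first reads the 180K-token prefix from cache instead of paying full price:

// Static, reused context goes first and carries the cache breakpoint.
// repoContext stands in for the 180K tokens of repo conventions and tool docs.
const reviewSystem = [
  { type: "text" as const, text: "You are a senior code review agent." },
  {
    type: "text" as const,
    text: repoContext,
    cache_control: { type: "ephemeral" as const }, // everything up to here is cached
  },
];

// Turn N of a review session: only the diff is new, so the prefix is a
// cheap cache read rather than a full-price ingest.
const review = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 4096,
  system: reviewSystem,
  messages: [{ role: "user", content: `Review this diff:\n${diff}` }],
});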
For a deeper walkthrough on the agent side of this, see our Claude Opus 4.6 deep dive and our agent memory patterns guide.
The temptation with every Opus release is to default-route everything to it. Resist.
Reach for Opus 4.7 when: the work is a long-horizon agent run, a hard multi-package refactor, or a synthesis and review step where a wrong answer is expensive.
Stay on Sonnet 4.6 when: you need the fast, cheap working layer, verification passes, or latency-sensitive interactive flows.
Use Haiku 4.5 when: the step is first-pass triage, classification, or routing and cost per call dominates.
A practical pattern we use across the Developers Digest portfolio: Haiku for first-pass triage, Sonnet for the working layer, Opus 4.7 reserved for the synthesis and review steps. That keeps cost predictable and quality high.
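A minimal sketch of that routing shape. The Opus id comes from this release; the Sonnet and Haiku ids here are assumptions for illustration, so check the ids your account actually exposes:

// Tiered routing: cheap triage first, escalate only when the step demands it.
// The sonnet/haiku ids below are assumptions, not confirmed model ids.
type Tier = "triage" | "work" | "review";

const MODEL_FOR_TIER: Record<Tier, string> = {
  triage: "claude-haiku-4-5", // first-pass classification and routing
  work: "claude-sonnet-4-6", // the working layer
  review: "claude-opus-4-7", // synthesis and final review
};

async function runStep(tier: Tier, prompt: string) {
  return client.messages.create({
    model: MODEL_FOR_TIER[tier],
    max_tokens: tier === "review" ? 8192 : 2048,
    messages: [{ role: "user", content: prompt }],
  });
}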
A few patterns that have earned their keep on real Opus 4.7 deployments:
Compaction over conversation length. Once a conversation crosses about 400K tokens, even Opus starts to drift on the earliest content. Periodically compact the transcript into a structured summary and replay it as the new system prompt. The 1M window is a safety net, not a working surface.
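One way to wire that up, as a sketch: past a token threshold, fold the transcript into a structured summary with a cheaper model and restart with the summary as the new system prompt. countTokens and the Sonnet id are assumptions here:

// Hypothetical compaction step. countTokens is a stand-in for your tokenizer.
async function compact(history: Anthropic.MessageParam[]) {
  if (countTokens(history) < 400_000) return { history, systemPrompt: null };

  const summary = await client.messages.create({
    model: "claude-sonnet-4-6", // assumed id; any cheap summarizer works here
    max_tokens: 2048,
    messages: [
      ...history,
      {
        role: "user",
        content:
          "Compact this conversation into a structured summary: goals, decisions made, open threads, and files touched.",
      },
    ],
  });

  const systemPrompt = summary.content
    .filter((block): block is Anthropic.TextBlock => block.type === "text")
    .map((block) => block.text)
    .join("\n");

  // Replay the summary as the new system prompt and start a fresh transcript.
  return { history: [] as Anthropic.MessageParam[], systemPrompt };
}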
Explicit tool budgets. Tell the model how many tool calls it has. "You have a budget of 8 tool calls. Plan accordingly." Opus 4.7 respects budgets in a way 4.5 never did.
Verifier-in-the-loop. For anything that touches production, run the Opus output through a Sonnet verifier with the question "is this safe to apply." Sonnet is faster and cheaper, and disagreements between the two are a strong signal something is off.
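As a sketch, the verifier is just a second, cheaper call that sees the proposed patch and answers a constrained question; the Sonnet id is again an assumption:

// Verifier-in-the-loop: a cheaper second opinion before anything touches prod.
async function verifyPatch(patch: string): Promise<boolean> {
  const verdict = await client.messages.create({
    model: "claude-sonnet-4-6", // assumed id for the verifier tier
    max_tokens: 512,
    messages: [
      {
        role: "user",
        content: `Is this patch safe to apply to production? Answer SAFE or UNSAFE, then one sentence of reasoning.\n\n${patch}`,
      },
    ],
  });

  const answer = verdict.content
    .filter((block): block is Anthropic.TextBlock => block.type === "text")
    .map((block) => block.text)
    .join("\n");

  // A Sonnet/Opus disagreement is the signal to route the change to a human.
  return answer.trimStart().startsWith("SAFE");
}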
Cache the world. If a block of context is reused even three times in a session, mark it ephemeral. The break-even on cache writes is now under 2 reuses given the new pricing.
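The arithmetic behind that break-even, assuming multipliers of roughly 1.25x base input price for a cache write and 0.1x for a cache read (both numbers are assumptions, so check the current pricing page):

// Cost of n uses of the same block, in multiples of the base input price.
const WRITE = 1.25; // assumed cache-write multiplier
const READ = 0.1; // assumed cache-read multiplier
const uncached = (n: number) => n; // n full-price ingests
const cached = (n: number) => WRITE + READ * (n - 1); // one write, then reads

console.log(cached(2), uncached(2)); // 1.35 vs 2.0 -- caching wins on the 2nd use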
For the full agentic stack we recommend on top of Opus 4.7, see our agentic dev stack guide, and try SubAgent for managing parallel agent fleets, AgentHub for orchestration, and the DD CLI for stitching it all into your terminal workflow.
Synthetic benchmarks tell part of the story. We ran Opus 4.7 against three internal workloads we use every week, and the deltas are worth sharing.
Workload 1: Multi-repo refactor. We took a known-painful refactor across 14 packages in a TypeScript monorepo. The task: rename a deeply-coupled service interface and update every call site, including tests. Opus 4.6 finished in 23 tool turns with two compile errors left over. Opus 4.7 finished in 16 turns with zero compile errors. Wall time dropped from 14 minutes to 8.
Workload 2: Production log triage. Given 12K lines of structured logs and a vague bug report, find the root cause and propose a patch. Opus 4.6 surfaced the right area but missed the actual race condition in 3 of 10 runs. Opus 4.7 nailed the race condition 9 of 10 times and proposed a patch that compiled in 8 of those.
Workload 3: Long-form technical writing. Drafting a 4K-word migration guide from a code diff plus three reference docs. Quality is subjective here, but we A/B'd internally and 4.7 won 7 of 10 blind reads from senior engineers. The complaint pattern shifted from "this is wrong" to "this is too long," which is a good problem to have.
The signal across all three: 4.7 is not just smarter, it is more consistent. Variance across runs dropped noticeably, which matters more than peak score for any system you put on a schedule.
A few more SDK details worth knowing if you are deploying 4.7 today.
Streaming with extended thinking is now a first-class flow. The thinking blocks come through the stream as a distinct event type, so you can render a "thinking" UI without parsing prose. For chat surfaces this matters because users tolerate a 12-second response if they can see the model working.
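A sketch of that flow with the TypeScript SDK, assuming the thinking parameter keeps the existing extended-thinking shape; renderThinking and renderAnswer are hypothetical UI hooks:

// Thinking deltas arrive as their own event type, separate from text deltas,
// so the UI can show live progress without parsing prose.
const stream = client.messages.stream({
  model: "claude-opus-4-7",
  max_tokens: 8192,
  thinking: { type: "enabled", budget_tokens: 6000 },
  messages: [{ role: "user", content: "Plan the migration for this diff." }],
});

stream.on("streamEvent", (event) => {
  if (event.type === "content_block_delta") {
    if (event.delta.type === "thinking_delta") {
      renderThinking(event.delta.thinking); // hypothetical "model is working" UI
    } else if (event.delta.type === "text_delta") {
      renderAnswer(event.delta.text); // the actual response stream
    }
  }
});

const final = await stream.finalMessage();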
Vision quality stepped up. Hand-drawn whiteboards, architecture diagrams, and dense product screenshots all transcribe better. We use this for an internal tool that turns Figma exports into route stubs, and the false-route rate dropped from 18 percent to 6 percent on the same fixture set.
Citations are stable on long-document workloads. If you build RAG, the response now reliably points at the source span without prompt-engineering acrobatics.
If you are upgrading from Opus 4.6, the changes are mostly drop-in:
- Swap the model id from claude-opus-4-6 to claude-opus-4-7.
- Raise max_tokens by about 20 percent. Opus 4.7 plans more verbosely.
- If you used extended-thinking on 4.6, the same flag works on 4.7 with a wider thinking budget.
The one breaking change to watch: tool result blocks now require explicit is_error: true for failure cases if you want the model to retry. 4.6 inferred this. 4.7 wants you to be explicit.
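On the wire, that means the failure branch of your tool executor sets the flag instead of relying on the model to read the error out of the content. Picking up the agent loop from earlier, where use is the tool_use block being answered:

// 4.6 inferred failure from the result text; 4.7 needs the explicit flag
// before it treats the call as retryable.
history.push({
  role: "user",
  content: [
    {
      type: "tool_result",
      tool_use_id: use.id,
      is_error: true, // explicit failure signal, required on 4.7 for retries
      content: "run_tests failed: 3 failures in pkg core (log attached)",
    },
  ],
});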
Opus 4.7 feels like the first Opus release where the price-to-capability curve actually invites production use across the full app surface. 4.5 was the agentic breakthrough. 4.6 made it serious. 4.7 makes it economical.
The interesting second-order effect is that the gap between Opus and Sonnet is narrowing on price faster than it is on capability. That changes the routing math in any agent system. You can now afford Opus on the steps that used to fall back to Sonnet for cost reasons, and the quality lift is meaningful.
If you build agents, ship a 4.7 migration this week. If you build apps, run the numbers on whether the Opus tier becomes your default. If you build internal tools, the 1M context plus aggressive caching is genuinely new product surface.
Watch the full breakdown on the Developers Digest YouTube channel and subscribe for the next model drop. The cycle is not slowing down.