
TL;DR
Cursor shipped Composer 2.5 in May 2026 - a 1T parameter agentic coding model that matches Opus 4.7 and GPT-5.5 on benchmarks at roughly one tenth the cost. Here is everything you need to know to use it effectively.
| Resource | Link |
|---|---|
| Cursor Composer 2.5 Announcement | cursor.com/blog/composer-2-5 |
| Cursor Pricing | cursor.com/pricing |
| Cursor Documentation | docs.cursor.com |
| Kimi K2.5 Base Model | Moonshot AI |
| SWE-bench Multilingual | swebench.com |
Cursor shipped Composer 2.5 on May 18, 2026 - just two months after Composer 2. The headline: it matches Claude Opus 4.7 and GPT-5.5 on coding benchmarks at roughly one tenth the cost per token. But the story underneath is more interesting than the benchmark numbers suggest.
Last updated: July 1, 2026
This guide covers what Composer 2.5 actually is, how to set it up, when to use it versus external models, and the training approach that made the performance jump possible.
Composer 2.5 is Cursor's own agentic coding model, purpose-built to plan, edit files, run terminal commands, and verify its own work inside the Cursor editor. It is not a general-purpose chatbot. The training and evaluation targets are software engineering trajectories, not single-shot Q&A.
Like its predecessor, Composer 2.5 is based on Moonshot's open-weights Kimi K2.5. The architecture is a mixture-of-experts transformer with 1.04 trillion parameters total and 32 billion active parameters per token. It supports up to 200,000 tokens of context with native function calling, reasoning, and context caching.
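To put the mixture-of-experts numbers in proportion, here is a quick arithmetic sketch. Only the two parameter counts come from the text; the function name is illustrative.

```python
# Figures quoted in the text; everything else here is illustrative.
TOTAL_PARAMS = 1.04e12   # 1.04 trillion total parameters
ACTIVE_PARAMS = 32e9     # 32 billion active parameters per token


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active / total


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"Active per token: {fraction:.1%} of all parameters")
```

In other words, only about 3% of the weights are used for any single token, which is part of why a 1T-parameter model can be served at these prices.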
Inside Cursor, it can:
The key improvement over Composer 2 is sustained effort. Composer 2.5 maintains focus across long tasks, follows complex instructions more reliably, and calibrates how much work a request actually needs instead of over- or under-doing it.
Composer 2.5 ships in Cursor 3.4 and later (3.5 is the current release as of May 20, 2026).
Step 1: Open the Composer panel or chat sidebar with Cmd+I on macOS or Ctrl+I on Windows and Linux.
Step 2: Click the model picker in the top-right corner of the Composer panel.
Step 3: Select Composer 2.5 from the dropdown.
For interactive coding sessions, leave the default Fast variant on. For background agents and Cloud Agent runs, switch to the Standard variant in Settings > Models > Composer 2.5.
The Fast variant prioritizes low latency for real-time interactions. The Standard variant prioritizes quality for autonomous tasks where you are not waiting on each response.
Composer 2.5 ships in two variants:
| Variant | Input | Cached | Output |
|---|---|---|---|
| Standard | $0.50/MTok | $0.20/MTok | $2.50/MTok |
| Fast | $3.00/MTok | $0.50/MTok | $15.00/MTok |
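For context, Claude Opus 4.8 costs $5 input / $25 output per MTok, and GPT-5.5 runs $10-$15 input / $30-$45 output per MTok depending on variant. Composer 2.5 Standard is one tenth the price of Opus 4.8 on both input and output. The Fast variant, at $3/$15, is cheaper than Opus 4.8 by a much smaller margin.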
The practical impact: Cursor reports that Composer 2.5 completes CursorBench tasks at an average cost of under $1, while Opus 4.7 and GPT-5.5 run between $3 and $11 per task for comparable results.
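To make the per-token prices concrete, the sketch below estimates the cost of a single agent run under each variant. The prices come from the table above. The token counts are hypothetical, and billing cached tokens separately from fresh input tokens is an assumption about how usage is metered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Dollars per million tokens (MTok)."""
    input: float
    cached: float
    output: float


# Composer 2.5 prices from the pricing table.
STANDARD = Price(input=0.50, cached=0.20, output=2.50)
FAST = Price(input=3.00, cached=0.50, output=15.00)


def run_cost(price: Price, input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for one run; cached tokens are billed at the cached rate (assumption)."""
    total = (
        input_tokens * price.input
        + cached_tokens * price.cached
        + output_tokens * price.output
    )
    return total / 1_000_000


# Hypothetical agent run: 400K fresh input, 1.2M cached input, 60K output tokens.
usage = dict(input_tokens=400_000, cached_tokens=1_200_000, output_tokens=60_000)
standard_cost = run_cost(STANDARD, **usage)
fast_cost = run_cost(FAST, **usage)
print(f"Standard: ${standard_cost:.2f}")  # Standard: $0.59
print(f"Fast:     ${fast_cost:.2f}")      # Fast:     $2.70
```

With these made-up token counts the Standard run lands under $1, in line with the per-task figure Cursor reports, while the same run on Fast costs several times more.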
For Cursor subscribers, both variants draw from your usage pool. Pro users get it as part of their $20/month. Teams Standard and Teams Premium get it with their split usage pools (first-party models including Composer 2.5 get their own allocation as of July 1, 2026).
Here is where Composer 2.5 sits against the other frontier models as of mid-2026:
| Benchmark | Composer 2.5 | Claude Opus 4.7 | GPT-5.5 |
|---|---|---|---|
| SWE-bench Multilingual | 79.8% | 80.1% | 78.4% |
| CursorBench v3.1 | 63.2% | 64.8% | 62.7% |
| Terminal-Bench 2.0 | 69.5% | 70.2% | 82.7% |
The numbers tell a clear story:
Where Composer 2.5 competes: On multi-file coding tasks and repository-level refactors, Composer 2.5 matches Opus 4.7 and GPT-5.5 within noise. On SWE-bench Multilingual and CursorBench v3.1 it trails Opus 4.7 by 0.3 and 1.6 points and leads GPT-5.5 by 1.4 and 0.5 points - not enough to change your choice based on raw capability.
Where Composer 2.5 falls behind: Terminal-Bench 2.0 measures shell and terminal workflows - compiling code, setting up servers, system administration. GPT-5.5 leads by roughly 13 points. If your work is heavy in terminal trajectories, GPT-5.5 is the better tool.
Cost efficiency: At one tenth the token cost, Composer 2.5 is the default choice for agentic coding inside Cursor unless your task specifically benefits from Opus or GPT-5.5.
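The gaps discussed above can be recomputed directly from the benchmark table. This is a small sketch: the scores are copied from the table, and the helper name is illustrative.

```python
# Scores (percent) copied from the benchmark table.
SCORES = {
    "SWE-bench Multilingual": {"Composer 2.5": 79.8, "Claude Opus 4.7": 80.1, "GPT-5.5": 78.4},
    "CursorBench v3.1": {"Composer 2.5": 63.2, "Claude Opus 4.7": 64.8, "GPT-5.5": 62.7},
    "Terminal-Bench 2.0": {"Composer 2.5": 69.5, "Claude Opus 4.7": 70.2, "GPT-5.5": 82.7},
}


def gap_to_leader(scores: dict[str, float], model: str) -> tuple[str, float]:
    """Return the top-scoring model and how many points `model` trails it by."""
    leader = max(scores, key=scores.get)
    return leader, round(scores[leader] - scores[model], 1)


gaps = {name: gap_to_leader(row, "Composer 2.5") for name, row in SCORES.items()}
for name, (leader, gap) in gaps.items():
    print(f"{name}: leader {leader}, Composer 2.5 trails by {gap} points")
```

The two coding benchmarks show gaps of 0.3 and 1.6 points to the leader. Terminal-Bench 2.0 is the outlier at 13.2 points.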
Cursor's training approach is worth understanding because it explains why Composer 2.5 improved so much over Composer 2 with the same base model.
25x more synthetic tasks. Composer 2.5 was trained on 25 times as many synthetic tasks as Composer 2. Cursor developed harder synthetic problems dynamically throughout the training run.
Feature deletion training. One method: the agent is given a working codebase with a full set of tests, asked to delete specific features while keeping the codebase functional, and then tasked with reimplementing those features. The tests serve as a verifiable reward signal - either the tests pass or they do not.
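The shape of that task can be sketched with a toy example. This is not Cursor's pipeline: the "codebase" is a dictionary of functions, the "agent" is a hard-coded reimplementation, and every name here is hypothetical. The point is only that the test suite turns the outcome into a binary, checkable reward.

```python
from typing import Callable, Dict, Iterable

Codebase = Dict[str, Callable[..., object]]
Test = Callable[[Codebase], bool]


def suite_passes(codebase: Codebase, tests: Iterable[Test]) -> bool:
    """True only if every test passes; a missing feature counts as a failure."""
    for test in tests:
        try:
            if not test(codebase):
                return False
        except Exception:
            return False
    return True


def reward(codebase: Codebase, tests: Iterable[Test]) -> float:
    """Binary, verifiable reward: 1.0 if the whole suite passes, else 0.0."""
    return 1.0 if suite_passes(codebase, tests) else 0.0


# Toy codebase: two features, one test per feature.
original: Codebase = {
    "add": lambda a, b: a + b,
    "slugify": lambda s: s.strip().lower().replace(" ", "-"),
}
tests: Dict[str, Test] = {
    "add": lambda cb: cb["add"](2, 3) == 5,
    "slugify": lambda cb: cb["slugify"](" Hello World ") == "hello-world",
}

# Step 1: delete one feature; the rest of the codebase must keep working.
stripped = {name: fn for name, fn in original.items() if name != "slugify"}
still_functional = suite_passes(stripped, [tests["add"]])

# Step 2: a stand-in for the agent's reimplementation (hard-coded here).
attempt = dict(stripped, slugify=lambda s: "-".join(s.lower().split()))

rewards = [reward(cb, tests.values()) for cb in (original, stripped, attempt)]
print(still_functional, rewards)  # True [1.0, 0.0, 1.0]
```

The full suite fails once the feature is removed and passes again only if the reimplementation restores the original behavior, so the reward cannot be earned by a partial fix.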
Targeted textual feedback. Instead of one reward signal at the end of a task, Cursor writes a short hint describing the fix they want, drops that hint into the agent's local context, and uses on-policy distillation to incorporate the behavior back into the model. This provides denser credit assignment than end-of-task rewards.
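To illustrate what "denser credit assignment" means, the sketch below computes a per-token distillation loss on toy numbers. It rests on assumptions the source does not confirm: that the teacher is the same model with the hint in context, that the student is the model without the hint, and that the loss is a reverse KL divergence. The distributions are invented.

```python
import math
from typing import List

Dist = List[float]  # probability distribution over a tiny toy vocabulary


def reverse_kl(student: Dist, teacher: Dist) -> float:
    """KL(student || teacher) for one token position."""
    return sum(s * math.log(s / t) for s, t in zip(student, teacher) if s > 0)


# Next-token distributions at three positions of one sampled trajectory.
# Assumption: "teacher" sees the hint in its context, "student" does not.
student_steps = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1], [0.6, 0.3, 0.1]]
teacher_steps = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.6, 0.3, 0.1]]

per_token_loss = [reverse_kl(s, t) for s, t in zip(student_steps, teacher_steps)]
print([round(x, 3) for x in per_token_loss])
```

Only the middle position, where the hint changes what the model should do, receives a non-zero loss. An end-of-task reward would instead assign one scalar to the whole trajectory.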
Agentic monitoring. The training pipeline includes monitors that detect and prevent reward hacking behaviors before they compound.
The infrastructure side: Cursor uses a sharded Muon optimizer with distributed orthogonalization and dual-mesh HSDP. They report 0.2s optimizer step time on the 1T parameter model - fast enough to iterate quickly on training runs.
Pick your model based on task type, not brand loyalty:
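Use Composer 2.5 when: you want a cost-efficient default for agentic coding inside Cursor.
Use Claude Opus 4.8 when: the task calls for deep reasoning.
Use GPT-5.5 when: the work is terminal-heavy.
Use Fable 5 when: the task justifies the premium price.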
Long refactors. Composer 2.5 excels at multi-file refactors that require sustained attention. Start with a clear instruction ("refactor all API handlers to use the new error handling pattern") and let it work through the codebase.
Test-driven development. Write failing tests first, then ask Composer 2.5 to implement the features. The verification loop gives it clear success criteria.
CI fixers. Point Composer 2.5 at a failing CI run and let it iterate. The combination of file editing and terminal access means it can run the tests locally, see the failures, and fix them.
Code review assistance. Use Composer 2.5 to review your own changes before committing. It can catch issues you missed and suggest improvements.
Batch operations. If you have 20 similar changes to make across a codebase, describe the pattern once and let Composer 2.5 apply it everywhere.
Not a replacement for external models in all cases. Terminal-Bench scores show GPT-5.5 is still better for shell-heavy work. For architecture decisions requiring the deepest reasoning, Opus or Fable 5 may justify the cost premium.
Cursor-native. Composer 2.5 is built for Cursor. If you are using VS Code, Neovim, or another editor, you need to use the external model APIs directly.
200K context window. Large but not unlimited. For massive codebases, you still need to be selective about what context you load.
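One way to think about being selective is as a token budget. The sketch below greedily picks files in priority order until the budget is used up. It is an illustration only, not how Cursor assembles context: the 4-characters-per-token estimate, the file names, and the decision to reserve half the window are all assumptions.

```python
from typing import Dict, List

CONTEXT_LIMIT = 200_000   # tokens, the window size quoted in the text
CHARS_PER_TOKEN = 4       # assumption: crude average, real tokenizers differ


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def select_files(files: Dict[str, str], priority: List[str], budget: int) -> List[str]:
    """Walk files in priority order and keep each one that still fits the budget."""
    chosen: List[str] = []
    used = 0
    for path in priority:
        cost = estimate_tokens(files[path])
        if used + cost <= budget:
            chosen.append(path)
            used += cost
    return chosen


# Hypothetical repository: file contents are stand-in strings of a given size.
files = {
    "api/handlers.py": "x" * 200_000,   # ~50K tokens
    "api/errors.py": "x" * 40_000,      # ~10K tokens
    "vendor/bundle.js": "x" * 400_000,  # ~100K tokens
    "README.md": "x" * 8_000,           # ~2K tokens
}
priority = ["api/handlers.py", "api/errors.py", "vendor/bundle.js", "README.md"]

# Keep half the window free for instructions, tool output, and the model's replies.
budget = CONTEXT_LIMIT // 2
chosen = select_files(files, priority, budget)
print(chosen)  # ['api/handlers.py', 'api/errors.py', 'README.md']
```

The large vendored bundle is skipped because it alone would consume the whole budget, while the smaller, more relevant files fit.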
Model-specific behaviors. Composer 2.5 is trained for agentic coding patterns. For general chat, creative writing, or non-coding tasks, general-purpose models may perform better.
Cursor Composer 2.5 is Cursor's own agentic coding model, released in May 2026. It is based on Moonshot's Kimi K2.5 with a mixture-of-experts architecture (1T parameters, 32B active per token). It is purpose-built for multi-file editing, terminal commands, and sustained agentic coding inside the Cursor editor.
Standard variant: $0.50 input, $0.20 cached, $2.50 output per million tokens. Fast variant: $3.00 input, $0.50 cached, $15.00 output per million tokens. For Cursor subscribers, usage draws from your plan's allocation.
On SWE-bench Multilingual and CursorBench, Composer 2.5 matches Opus 4.7 within 1-2 percentage points. Composer 2.5 costs roughly one tenth as much per token. Opus 4.7 may have an edge on tasks requiring the deepest architectural reasoning.
On coding benchmarks, Composer 2.5 and GPT-5.5 are comparable. GPT-5.5 leads significantly on Terminal-Bench 2.0 (82.7% vs 69.5%) - for shell-heavy workflows, GPT-5.5 is the better choice. Composer 2.5 wins on cost.
Use Composer 2.5 as your default for agentic coding inside Cursor when cost matters. Reach for Opus 4.8 for deep reasoning tasks, GPT-5.5 for terminal-heavy work, and Fable 5 when the task justifies the premium price.
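That rule of thumb is simple enough to write down as a lookup. The sketch below is only a restatement of the recommendation: the task categories and the `premium_justified` flag are hypothetical names, not part of any Cursor API.

```python
from enum import Enum


class Task(Enum):
    AGENTIC_CODING = "agentic coding inside Cursor"
    DEEP_REASONING = "deep reasoning"
    TERMINAL_HEAVY = "terminal-heavy work"


# Mapping taken from the recommendation in the text.
DEFAULTS = {
    Task.AGENTIC_CODING: "Composer 2.5",
    Task.DEEP_REASONING: "Claude Opus 4.8",
    Task.TERMINAL_HEAVY: "GPT-5.5",
}


def pick_model(task: Task, premium_justified: bool = False) -> str:
    """Return a model name; `premium_justified` is a hypothetical caller-supplied flag."""
    if premium_justified:
        return "Fable 5"
    return DEFAULTS[task]


print(pick_model(Task.AGENTIC_CODING))   # Composer 2.5
print(pick_model(Task.TERMINAL_HEAVY))   # GPT-5.5
```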
Composer 2.5 supports up to 200,000 tokens of context, with native function calling, reasoning, and context caching.
Composer 2.5 is integrated into Cursor and is not available as a standalone API. For usage outside Cursor, you need Claude, GPT, or another API-accessible model.
The improvement over Composer 2 comes from four training changes: 25x more synthetic training tasks, feature deletion training with test-based rewards, targeted textual feedback for denser credit assignment, and agentic monitoring to prevent reward hacking.