Cursor dropped Composer 2 today. It is their second-generation in-house coding model, and the jump from Composer 1 is significant. CursorBench scores went from 38.0 to 61.3. Terminal-Bench 2.0 went from 40.0 to 61.7. SWE-bench Multilingual climbed from 56.9 to 73.7. These are not incremental improvements. This is a fundamentally better model.
Cursor announced the release on X, noting that Composer 2 hits these numbers while staying cheaper than competing frontier models, and credited the gains to a continued pretraining run on top of the base model. The full writeup is on the Cursor blog.
The pricing is aggressive too. Standard tier runs $0.50/M input and $2.50/M output tokens. There is also a faster variant at $1.50/M input and $7.50/M output that ships as the default. Even the fast option undercuts most competing models at comparable intelligence levels.
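To make those per-million-token rates concrete, here is a quick back-of-the-envelope cost estimate. The rates are the published Composer 2 prices; the session token counts are hypothetical, just to illustrate the gap between the two tiers.

```python
# Published Composer 2 rates, USD per million tokens.
STANDARD = {"input": 0.50, "output": 2.50}
FAST = {"input": 1.50, "output": 7.50}  # ships as the default

def session_cost(rates, input_tokens, output_tokens):
    """Dollar cost of a session at the given per-1M-token rates."""
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000

# Hypothetical heavy session: 2M input tokens, 400k output tokens.
print(f"standard: ${session_cost(STANDARD, 2_000_000, 400_000):.2f}")  # $2.00
print(f"fast:     ${session_cost(FAST, 2_000_000, 400_000):.2f}")      # $6.00
```

Even the pricier fast tier keeps a token-heavy session in single-digit dollars, which is what makes the parallel-agent use cases later in this post economically plausible.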
Composer 2 is the result of Cursor's first continued pretraining run. That is a big deal. Composer 1 was trained primarily through reinforcement learning on top of an existing base model. Composer 2 starts from a much stronger foundation because Cursor actually did continued pretraining on coding-specific data before layering RL on top.
From that stronger base, they scaled their reinforcement learning on long-horizon coding tasks - the kind that require hundreds of sequential actions across files, terminals, and search tools. The model learned to plan more deliberately, use tools in parallel when it makes sense, and avoid premature edits. It reads before it writes. That behavioral shift alone makes it noticeably more reliable on real codebases.
The architecture remains mixture-of-experts, which is why the speed is still there. Most tasks complete in under 30 seconds, even with the quality jump.
Here is how Composer 2 stacks up against its predecessors:
| Model | CursorBench | Terminal-Bench 2.0 | SWE-bench Multilingual |
|---|---|---|---|
| Composer 2 | 61.3 | 61.7 | 73.7 |
| Composer 1.5 | 44.2 | 47.9 | 65.9 |
| Composer 1 | 38.0 | 40.0 | 56.9 |
The Terminal-Bench 2.0 numbers are particularly interesting. That benchmark tests real terminal-based agent work, the same kind of tasks you would use Claude Code or Codex for. Composer 2 scoring 61.7 puts it in the same conversation as the frontier models from Anthropic and OpenAI, but at a fraction of the cost.
SWE-bench Multilingual at 73.7 is strong. For context, that benchmark tests the model's ability to resolve real GitHub issues across multiple programming languages. Going from 56.9 to 73.7 in one generation is roughly a 30% relative improvement.
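The generation-over-generation gains can be checked directly from the scores in the table above:

```python
# Relative gains from Composer 1 to Composer 2, per benchmark.
composer_1 = {"CursorBench": 38.0, "Terminal-Bench 2.0": 40.0,
              "SWE-bench Multilingual": 56.9}
composer_2 = {"CursorBench": 61.3, "Terminal-Bench 2.0": 61.7,
              "SWE-bench Multilingual": 73.7}

for bench, old in composer_1.items():
    new = composer_2[bench]
    print(f"{bench}: {old} -> {new} (+{(new - old) / old:.0%})")
# CursorBench +61%, Terminal-Bench 2.0 +54%,
# SWE-bench Multilingual +30%
```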
We tested Composer 2 against 5 other AI models on 10 web development tasks. Composer 2 achieved 10/10 task completion. See the full results on our Web Dev Arena.
Synthetic benchmarks tell part of the story, but real-world web dev tasks tell the rest. Composer 2 handled everything we threw at it - React component generation, API integration, database queries, auth flows, and multi-file refactors. It completed all 10 tasks without needing manual intervention. That is rare. Most models stumble on at least one or two edge cases in a set like this.
The AI coding landscape has gotten crowded. Here is where Composer 2 fits.
Claude Code still uses the best reasoning models available (Opus 4.6, Sonnet 4.6). For complex architectural decisions, novel problem-solving, and tasks where you need the model to think deeply before acting, Claude Code remains the strongest option. It is terminal-native, which some developers prefer and others avoid. The tradeoff is speed. Claude Code prioritizes accuracy over velocity.
OpenAI Codex runs on GPT-5.3 and has strong performance on structured engineering tasks. It is a solid all-rounder with good IDE integration. But it is more expensive per token than Composer 2, and for iterative coding work, the speed difference matters.
Windsurf takes a more guided approach with its Cascade system. It is good for developers who want more hand-holding and a structured workflow. But it does not have its own frontier model. It relies on third-party models, which means it is always one step behind on model quality.
Composer 2 carves out a specific niche: fast, cheap, and smart enough for most coding tasks. If you are doing iterative development where you send 20-30 prompts in a session, the speed advantage compounds. You stay in flow. You do not context-switch while waiting for responses. That matters more than most benchmarks capture.
The real answer, though, is that most serious developers use multiple tools. Use Composer 2 for fast iteration and routine work. Switch to Claude Code or Codex for the hard stuff. The tools are not mutually exclusive.
Use Composer 2 if you want speed. If your workflow is prompt-heavy and iterative, 30-second completions at $0.50/M input tokens are hard to beat. You will get more iterations per hour than any other option.
Use it for multi-agent parallel work. Cursor's multi-agent interface runs up to eight agents simultaneously with git worktree isolation. Composer 2 is the cheapest frontier-quality model you can run in those parallel slots. Running eight Claude Code agents in parallel gets expensive fast. Eight Composer 2 agents is reasonable.
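For readers unfamiliar with worktree isolation, here is a minimal sketch of the pattern: each agent gets its own branch and its own working directory, so parallel edits never collide. The repo, directory, and branch names are illustrative, not Cursor's actual layout.

```shell
set -e
# Throwaway repo to demonstrate the pattern.
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# One worktree per agent, each on its own branch.
for i in 1 2 3; do
  git worktree add -q "../agent-$i" -b "agent-$i"
done

git worktree list   # main checkout plus one entry per agent
```

Because each worktree is a separate checkout of the same repository, the agents' changes land on separate branches and can be reviewed and merged independently.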
Use it alongside other models. Cursor lets you swap models mid-session. Start with Composer 2 for scaffolding and routine edits, then switch to Sonnet 4.6 or GPT-5 for the parts that need deeper reasoning. This hybrid approach gives you the best of both worlds.
Skip it if accuracy on first attempt matters more than iteration speed. If you are running background agents on long autonomous tasks where you will not be reviewing intermediate steps, you want the smartest model possible. That is still Claude Code with Opus or Sonnet.
Cursor building their own model is the signal that matters here. They are not just wrapping API calls to Anthropic and OpenAI anymore. They are training models specifically for their IDE, their tools, their workflow patterns. That vertical integration is powerful.
The broader trend is clear. The gap between "fast and cheap" models and "smart and expensive" models is closing. Composer 2 at $0.50/M input tokens delivers results that would have required a $15/M token model a year ago. That compression is accelerating.
We are also seeing the rise of model-switching as a first-class workflow. No single model wins every task. The winning setup in 2026 is an IDE that lets you fluidly move between models based on what you are doing right now. Cursor understood this early. Their multi-model, multi-agent architecture is built for exactly this future.
The next frontier is not smarter models. It is smarter coordination of multiple agents running multiple models on different parts of your codebase simultaneously. Cursor is betting heavily on that with Automations, Bugbot, and now Composer 2 as the cost-efficient workhorse model that makes running many agents economically viable.
Composer 2 is available now. Select it from the model dropdown in Cursor or try it in the new Glass interface alpha at cursor.com/glass.