Cursor dropped Composer 2 today. It is their second-generation in-house coding model, and the jump from Composer 1 is significant. CursorBench scores went from 38.0 to 61.3. Terminal-Bench 2.0 went from 40.0 to 61.7. SWE-bench Multilingual climbed from 56.9 to 73.7. These are not incremental improvements. This is a fundamentally better model.
Cursor announced the release on X, noting that Composer 2 hits these numbers while staying cheaper than competing frontier models, and credited the gains to a continued pretraining run on top of the base model. The full writeup is on the Cursor blog.
The pricing is aggressive too. Standard tier runs $0.50/M input and $2.50/M output tokens. There is also a faster variant at $1.50/M input and $7.50/M output that ships as the default. Even the fast option undercuts most competing models at comparable intelligence levels.
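To make those per-million-token rates concrete, here is a quick back-of-the-envelope cost estimate. The rates are the published Composer 2 prices; the session token counts are hypothetical, just to illustrate the gap between the two tiers.

```python
# Published Composer 2 rates, USD per million tokens.
STANDARD = {"input": 0.50, "output": 2.50}
FAST = {"input": 1.50, "output": 7.50}  # ships as the default

def session_cost(rates, input_tokens, output_tokens):
    """Dollar cost of a session at the given per-1M-token rates."""
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000

# Hypothetical heavy session: 2M input tokens, 400k output tokens.
print(f"standard: ${session_cost(STANDARD, 2_000_000, 400_000):.2f}")  # $2.00
print(f"fast:     ${session_cost(FAST, 2_000_000, 400_000):.2f}")      # $6.00
```

Even the pricier fast tier keeps a token-heavy session in single-digit dollars, which is what makes the parallel-agent use cases later in this post economically plausible.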
Composer 2 is the result of Cursor's first continued pretraining run. That is a big deal. Composer 1 was trained primarily through reinforcement learning on top of an existing base model. Composer 2 starts from a much stronger foundation because Cursor actually did continued pretraining on coding-specific data before layering RL on top.
From that stronger base, they scaled their reinforcement learning on long-horizon coding tasks - the kind that require hundreds of sequential actions across files, terminals, and search tools. The model learned to plan more deliberately, use tools in parallel when it makes sense, and avoid premature edits. It reads before it writes. That behavioral shift alone makes it noticeably more reliable on real codebases.
The architecture remains mixture-of-experts, which is why the speed is still there. Most tasks complete in under 30 seconds, even with the quality jump.
Here is how Composer 2 stacks up against its predecessors:
| Model | CursorBench | Terminal-Bench 2.0 | SWE-bench Multilingual |
|---|---|---|---|
| Composer 2 | 61.3 | 61.7 | 73.7 |
| Composer 1.5 | 44.2 | 47.9 | 65.9 |
| Composer 1 | 38.0 | 40.0 | 56.9 |
The Terminal-Bench 2.0 numbers are particularly interesting. That benchmark tests real terminal-based agent work, the same kind of tasks you would use Claude Code or Codex for. Composer 2 scoring 61.7 puts it in the same conversation as the frontier models from Anthropic and OpenAI, but at a fraction of the cost.
SWE-bench Multilingual at 73.7 is strong. For context, that benchmark tests the model's ability to resolve real GitHub issues across multiple programming languages. Going from 56.9 to 73.7 in one generation is roughly a 30% relative improvement.
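The generation-over-generation gains can be checked directly from the scores in the table above:

```python
# Relative gains from Composer 1 to Composer 2, per benchmark.
composer_1 = {"CursorBench": 38.0, "Terminal-Bench 2.0": 40.0,
              "SWE-bench Multilingual": 56.9}
composer_2 = {"CursorBench": 61.3, "Terminal-Bench 2.0": 61.7,
              "SWE-bench Multilingual": 73.7}

for bench, old in composer_1.items():
    new = composer_2[bench]
    print(f"{bench}: {old} -> {new} (+{(new - old) / old:.0%})")
# CursorBench +61%, Terminal-Bench 2.0 +54%,
# SWE-bench Multilingual +30%
```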
We tested Composer 2 against 5 other AI models on 10 web development tasks. Composer 2 achieved 10/10 task completion. See the full results on our Web Dev Arena.
Synthetic benchmarks tell part of the story, but real-world web dev tasks tell the rest. Composer 2 handled everything we threw at it - React component generation, API integration, database queries, auth flows, and multi-file refactors. It completed all 10 tasks without needing manual intervention. That is rare. Most models stumble on at least one or two edge cases in a set like this.
The AI coding landscape has gotten crowded. Here is where Composer 2 fits.
Claude Code still uses the best reasoning models available (Opus 4.6, Sonnet 4.6). For complex architectural decisions, novel problem-solving, and tasks where you need the model to think deeply before acting, Claude Code remains the strongest option. It is terminal-native, which some developers prefer and others avoid. The tradeoff is speed. Claude Code prioritizes accuracy over velocity.
OpenAI Codex runs on GPT-5.3 and has strong performance on structured engineering tasks. It is a solid all-rounder with good IDE integration. But it is more expensive per token than Composer 2, and for iterative coding work, the speed difference matters.
Windsurf takes a more guided approach with its Cascade system. It is good for developers who want more hand-holding and a structured workflow. But it does not have its own frontier model. It relies on third-party models, which means it is always one step behind on model quality.
Composer 2 carves out a specific niche: fast, cheap, and smart enough for most coding tasks. If you are doing iterative development where you send 20-30 prompts in a session, the speed advantage compounds. You stay in flow. You do not context-switch while waiting for responses. That matters more than most benchmarks capture.
The real answer, though, is that most serious developers use multiple tools. Use Composer 2 for fast iteration and routine work. Switch to Claude Code or Codex for the hard stuff. The tools are not mutually exclusive.
Use Composer 2 if you want speed. If your workflow is prompt-heavy and iterative, 30-second completions at $0.50/M input tokens are hard to beat. You will get more iterations per hour than any other option.
Use it for multi-agent parallel work. Cursor's multi-agent interface runs up to eight agents simultaneously with git worktree isolation. Composer 2 is the cheapest frontier-quality model you can run in those parallel slots. Running eight Claude Code agents in parallel gets expensive fast. Eight Composer 2 agents is reasonable.
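For readers unfamiliar with worktree isolation, here is a minimal sketch of the pattern: each agent gets its own branch and its own working directory, so parallel edits never collide. The repo, directory, and branch names are illustrative, not Cursor's actual layout.

```shell
set -e
# Throwaway repo to demonstrate the pattern.
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# One worktree per agent, each on its own branch.
for i in 1 2 3; do
  git worktree add -q "../agent-$i" -b "agent-$i"
done

git worktree list   # main checkout plus one entry per agent
```

Because each worktree is a separate checkout of the same repository, the agents' changes land on separate branches and can be reviewed and merged independently.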
Use it alongside other models. Cursor lets you swap models mid-session. Start with Composer 2 for scaffolding and routine edits, then switch to Sonnet 4.6 or GPT-5 for the parts that need deeper reasoning. This hybrid approach gives you the best of both worlds.
Skip it if accuracy on first attempt matters more than iteration speed. If you are running background agents on long autonomous tasks where you will not be reviewing intermediate steps, you want the smartest model possible. That is still Claude Code with Opus or Sonnet.
Cursor building their own model is the signal that matters here. They are not just wrapping API calls to Anthropic and OpenAI anymore. They are training models specifically for their IDE, their tools, their workflow patterns. That vertical integration is powerful.
The broader trend is clear. The gap between "fast and cheap" models and "smart and expensive" models is closing. Composer 2 at $0.50/M input tokens delivers results that would have required a $15/M token model a year ago. That compression is accelerating.
We are also seeing the rise of model-switching as a first-class workflow. No single model wins every task. The winning setup in 2026 is an IDE that lets you fluidly move between models based on what you are doing right now. Cursor understood this early. Their multi-model, multi-agent architecture is built for exactly this future.
The next frontier is not smarter models. It is smarter coordination of multiple agents running multiple models on different parts of your codebase simultaneously. Cursor is betting heavily on that with Automations, Bugbot, and now Composer 2 as the cost-efficient workhorse model that makes running many agents economically viable.
Composer 2 is available now. Select it from the model dropdown in Cursor or try it in the new Glass interface alpha at cursor.com/glass.