GPT-5: OpenAI's Most Capable Model

A Unified Architecture That Thinks Before It Acts
GPT-5 introduces a fundamentally different approach to inference. Instead of forcing developers to manually configure reasoning parameters, the model operates as a unified system with real-time routing based on query complexity.
Tell it to "think hard" about a difficult problem, and it allocates additional compute. Ask a simple conversational question, and it responds immediately without burning tokens on unnecessary test-time compute. This dynamic routing eliminates the guesswork of selecting between fixed reasoning modes while keeping costs predictable.
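OpenAI has not published how the router decides, so the following is only a toy sketch of the idea: a function that looks at surface signals in a query and picks either a fast path or a reasoning path. The cue phrases, the word-count threshold, the `Route` type, and the token budget are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical cue phrases; the real routing signals are not public.
EFFORT_CUES = ("think hard", "think carefully", "step by step")


@dataclass
class Route:
    mode: str              # "fast" or "reasoning"
    reasoning_budget: int  # illustrative cap on reasoning tokens


def route(query: str, long_query_words: int = 150) -> Route:
    """Pick a mode from crude surface signals in the query."""
    text = query.lower()
    asks_for_effort = any(cue in text for cue in EFFORT_CUES)
    is_long = len(text.split()) > long_query_words
    if asks_for_effort or is_long:
        return Route(mode="reasoning", reasoning_budget=8192)
    return Route(mode="fast", reasoning_budget=0)


print(route("Think hard about this scheduling problem").mode)
print(route("What's the capital of France?").mode)
```

A production router would presumably rely on learned signals rather than keyword matching; the sketch only shows the shape of the decision.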
Real-World Performance Beyond Benchmarks
OpenAI optimized GPT-5 for practical utility, not just leaderboard scores. The three focus areas (writing, coding, and health) represent ChatGPT's most common use cases.
Hallucination rates are down. Instruction following is tighter. But the real difference shows up in qualitative output.
Front-End Coding Leap
The model demonstrates measurable improvements in front-end development. During demonstrations, GPT-5 generated complete interactive applications: a physics-based ball-rolling game, a pixel art canvas, a typing trainer, a drum simulator, and a lo-fi music environment. One standout example was a Three.js-style castle defense game with interactive balloon targeting, built entirely from a text prompt within Cursor.
Health Queries That Actually Feel Human
When asked about cancer risk factors, previous models like o3 responded with dry tables and bullet-point citations. GPT-5 leads with empathy: "I'm sorry you're dealing with this worry. Many people have the same question." The information is equally accurate, but the delivery respects the emotional weight of the query.

Benchmark Analysis: Intelligence Per Token
Artificial Analysis' aggregate Intelligence Index, combining MMLU, GPQA Diamond, Humanity's Last Exam, and LiveCodeBench, places GPT-5 (high mode) at state-of-the-art. Even GPT-5 medium outperforms the best competing models.
The efficiency curve is where it gets interesting. GPT-5 low ranks above Claude 4 Sonnet Thinking and approaches Qwen 3 235B, while using significantly fewer tokens. When plotting intelligence against output tokens consumed, GPT-5 dominates the curve, delivering superior results at lower cost and latency than Grok 4.
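One way to make "dominates the curve" precise is Pareto dominance: a model dominates another if it scores at least as high while using no more output tokens, and is strictly better on at least one of the two. The sketch below computes that frontier. The `Point` type and the three sample entries are placeholders invented for illustration, not measured benchmark results.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    output_tokens: float  # tokens consumed on the eval (lower is better)
    score: float          # intelligence index (higher is better)


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.output_tokens <= b.output_tokens and a.score >= b.score
    better = a.output_tokens < b.output_tokens or a.score > b.score
    return no_worse and better


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Keep the points that no other point dominates, sorted by token use."""
    frontier = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(frontier, key=lambda p: p.output_tokens)


# Placeholder values for illustration only, not measured results.
models = [
    Point("model_a", 20.0, 60.0),
    Point("model_b", 50.0, 55.0),  # more tokens and a lower score than model_a
    Point("model_c", 80.0, 70.0),
]
print([p.name for p in pareto_frontier(models)])
```

With real index scores and token counts substituted in, the models left on the frontier are the ones for which no competitor is both smarter and cheaper to run.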

Where It Wins and Where It Trails
GPT-5 takes best-in-class status on MMLU Pro, Humanity's Last Exam, AMIE medical evaluations, long-context tasks, and instruction following. GPQA Diamond still belongs to Grok 4. On LiveCodeBench, GPT-5 trails o4-mini (high) and Grok 4.
LM Arena human preference data shows GPT-5 beating Gemini 2.5 Pro on text responses and dominating WebDev Arena against Gemini 2.5 Pro, DeepSeek R1, and Claude 4 Opus.
On ARC-AGI, GPT-5 (high) scores 65.7% versus Grok 4's 66.7%, but GPT-5 achieves this at roughly half the cost per task.
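To see what that trade-off means in practice, the two figures above can be folded into a single score-per-cost ratio. This is a back-of-the-envelope sketch: the function name is made up for illustration, and the 0.5 cost ratio is taken literally from "roughly half", so the result is approximate.

```python
def relative_score_per_cost(score_a: float, score_b: float,
                            cost_ratio_a_to_b: float) -> float:
    """How much more score per dollar model A delivers compared with model B."""
    return (score_a / score_b) / cost_ratio_a_to_b


# 65.7 vs 66.7 on ARC-AGI, with GPT-5 at roughly half the cost per task.
ratio = relative_score_per_cost(65.7, 66.7, 0.5)
print(f"{ratio:.2f}x score per dollar")
```

Under these assumptions GPT-5 gives up about one point of accuracy for close to twice the score per dollar.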
The API: Four Models, One Architecture
The GPT-5 family launches with four variants:
| Model | Input | Output | Use Case |
|---|---|---|---|
| GPT-5 | $1.25/M | $10/M | Flagship performance |
| GPT-5 Mini | $0.25/M | $2/M | Balanced speed and capability |
| GPT-5 Nano | Lower cost | Lower cost | Latency-sensitive applications |
| GPT-5 Chat | Optimized | Optimized | Conversational interfaces |
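As a rough guide to what these rates mean per request, the sketch below turns the two rows with published prices into a cost estimate. The dictionary keys are informal labels rather than confirmed API model identifiers, and the token counts in the usage lines are arbitrary examples.

```python
# USD per 1M tokens, copied from the table above.
# Keys are informal labels, not confirmed API model identifiers.
PRICES = {
    "gpt-5": {"input": 1.25, "output": 10.00},
    "gpt-5-mini": {"input": 0.25, "output": 2.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for one request at list prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Example: a 20,000-token prompt that produces a 2,000-token answer.
print(f"${request_cost('gpt-5', 20_000, 2_000):.3f}")
print(f"${request_cost('gpt-5-mini', 20_000, 2_000):.3f}")
```

Because output tokens cost eight times as much as input tokens on both tiers, long answers and heavy reasoning tend to dominate the bill more than long prompts do.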
All four support multimodal inputs (text and image), function calling, structured outputs, and streaming. The flagship model adds predicted outputs for efficient code refactoring and text editing workflows.
Context window is 400,000 tokens across the board, with 128,000 max output tokens. Pricing undercuts Grok 4 and Claude 4 Sonnet Thinking ($3/$15 per million) while matching Gemini 2.5 Pro's rates with superior performance.
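The context limits above imply a simple budgeting rule for long prompts. The sketch assumes that input and output tokens share the 400,000-token window, which the figures suggest but do not state outright; the function name is illustrative.

```python
CONTEXT_WINDOW = 400_000  # total tokens, from the limits above
MAX_OUTPUT = 128_000      # output cap, from the limits above


def max_output_budget(input_tokens: int) -> int:
    """Largest output allowance left for a prompt of the given size.

    Assumes input and output tokens share one context window.
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - input_tokens
    return max(0, min(MAX_OUTPUT, remaining))


print(max_output_budget(100_000))  # prompt leaves room for the full output cap
print(max_output_budget(350_000))  # prompt eats into the output allowance
```

Under that assumption, any prompt up to 272,000 tokens still leaves room for the full 128,000-token output.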
Developer Validation
Cognition's Junior Dev Eval, the benchmark behind the Devin coding agent, shows GPT-5 outperforming Sonnet and GPT-4.1 on exploration, planning, and code execution.
Cursor's CEO publicly called it the best coding model the team has used to date. During OpenAI's livestream, the model resolved a GitHub issue in real time. Both Windsurf and Cursor are offering GPT-5 access to users immediately.

Availability
GPT-5 is rolling out to all ChatGPT users today. Plus subscribers receive expanded usage limits. Pro subscribers unlock GPT-5 Pro, the equivalent of API high mode, for extended reasoning on complex problems.

