OpenAI shipped GPT 5.4, and it matters. Not because it tops every benchmark--it doesn't--but because it changes what you can actually do with a model in production.
Two variants landed: GPT 5.4 Thinking and GPT 5.4. The first is the reasoning powerhouse. The second is the fast, capable default. Both have a million tokens of context and a new steerable thinking UX that lets you redirect the model's reasoning mid-response. That last part is new for everyone.
Let's break it down.
This is where OpenAI's pricing maze gets real.
GPT 5.4 Thinking is available on ChatGPT Plus ($20/mo), Teams, Pro, and Enterprise. That's the reasoning model most people will use.
GPT 5.4 (the non-thinking variant) is locked to the $200/month Pro tier. If you want both, you're paying Pro pricing.
The API is live for both. More on pricing below.
This is the standout UX innovation.
Previous thinking models gave you a plan upfront and then executed it. If the plan was wrong, you waited for it to finish and then corrected. Wasted tokens, wasted time.
GPT 5.4 Thinking shows you the plan as it forms and lets you steer it. Mid-response. You see the model's reasoning unfold and can inject corrections before it commits to a bad path.

This matters for complex tasks where the model's first interpretation of your prompt isn't what you meant. Instead of regenerating from scratch, you nudge. It's closer to pair programming than prompt engineering.
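The interaction pattern is easier to see in code. This is a purely local simulation -- nothing here calls the real OpenAI API, and the plan steps and `steer` hook are hypothetical stand-ins for the streamed reasoning UI:

```python
# Local simulation of the "steer mid-reasoning" pattern: the caller sees
# each plan step as it surfaces and can rewrite the remaining plan before
# the model commits to it. All names here are illustrative.

def run_with_steering(plan_steps, steer):
    """Execute plan steps one at a time, letting `steer` rewrite the
    remaining plan after each step is revealed."""
    executed = []
    remaining = list(plan_steps)
    while remaining:
        step = remaining.pop(0)
        executed.append(step)
        # Expose the step as it "streams" and let the caller redirect.
        remaining = steer(step, remaining)
    return executed

# Example: correct a wrong interpretation before it cascades.
plan = ["parse prompt", "assume SQL backend", "write SQL queries", "write tests"]

def steer(step, remaining):
    if step == "assume SQL backend":  # we actually meant a document store
        return ["assume MongoDB backend", "write Mongo queries", "write tests"]
    return remaining

print(run_with_steering(plan, steer))
```

The point is the shape of the loop: the correction lands after one wrong step instead of after a full wrong response.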
A million tokens of context, same as Opus 4.6. But OpenAI added a pricing twist: anything beyond 272k tokens costs 2x. So you can use the full million, but you'll pay for it.
For most workflows, 272k is plenty. If you're feeding entire codebases or long document chains, budget accordingly.
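A quick way to budget is to flag requests before they cross the surcharge line. This sketch uses the rough 4-characters-per-token heuristic rather than a real tokenizer, so treat the estimate as approximate:

```python
# Rough check for whether a request will cross the 272k-token threshold
# where the 2x long-context surcharge kicks in. The 4-chars-per-token
# ratio is a common heuristic, not an exact tokenizer.

SURCHARGE_THRESHOLD = 272_000

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def crosses_surcharge(prompt: str, expected_output_tokens: int = 0) -> bool:
    total = estimate_tokens(prompt) + expected_output_tokens
    return total > SURCHARGE_THRESHOLD

print(crosses_surcharge("def add(a, b): return a + b"))  # short prompt: False
```

For an exact count, run the actual tokenizer before sending; the heuristic is only for a fast pre-flight check.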
The headline number is OSWorld Verified--a benchmark for computer use tasks. GPT 5.4 hits 75%. Humans score 72.4%. That's not a typo. The model outperforms average human operators on structured computer tasks.
| Benchmark | GPT 5.4 | GPT 5.3 | Claude Opus 4.6 | Humans |
|---|---|---|---|---|
| OSWorld Verified | 75.0% | 58.3% | 62.1% | 72.4% |
| BrowseComp | 71.2% | 49.7% | 53.8% | -- |
| WebArena | 68.4% | 51.2% | 55.6% | -- |
| Agentic Coding (SWE-bench) | 74.1% | 69.2% | 72.8% | -- |
BrowseComp and WebArena show meaningful jumps too. These are real-world browser automation tasks--navigating sites, filling forms, extracting data. If you're building agents that interact with the web, these numbers translate directly.

OpenAI is leaning into "knowledge work" as a category. Think polished documents, presentations, structured reports. The outputs are noticeably more formatted and complete than 5.3. Fewer rough edges. Better structure.
This is less relevant for developers and more relevant if you're using the API to generate client-facing content. But it signals where OpenAI sees the commercial opportunity: enterprise users who need production-ready documents, not raw text.
The computer use capabilities are where GPT 5.4 pulls ahead of the field. OSWorld Verified at 75% isn't just a benchmark win--it means the model can reliably execute multi-step browser workflows.
Navigate to a site. Find the right form. Fill it out. Submit. Verify the result. GPT 5.4 does this with higher reliability than any other model right now, including Opus 4.6.
If you're building browser automation agents, this is the model to test against.
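The navigate → fill → submit → verify loop these benchmarks measure has a simple skeleton. The sketch below runs against a fake page dict; the action names are hypothetical, and in a real agent a browser driver would execute each step and the model would choose it:

```python
# Minimal sketch of the multi-step browser workflow shape measured by
# OSWorld-style benchmarks. The page state and action vocabulary are
# made up for illustration; no real browser is driven here.

def run_agent(actions, page):
    """Apply a scripted list of (action, arg) steps to a fake page dict."""
    for action, arg in actions:
        if action == "navigate":
            page["url"] = arg
        elif action == "fill":
            field, value = arg
            page["form"][field] = value
        elif action == "submit":
            page["submitted"] = all(page["form"].values())
        elif action == "verify":
            assert page.get("submitted"), "submission failed"
    return page

page = {"url": None, "form": {"email": "", "name": ""}, "submitted": False}
steps = [
    ("navigate", "https://example.com/signup"),
    ("fill", ("email", "dev@example.com")),
    ("fill", ("name", "Ada")),
    ("submit", None),
    ("verify", None),
]
print(run_agent(steps, page)["submitted"])  # True
```

The verify step is what the "Verified" in OSWorld Verified is about: the task only counts if the end state checks out, not just if the actions ran.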
The coding demos are strong. Web games, 3D simulations, complex frontend layouts--all generated with fewer iterations than 5.3. The Cursor team gave positive feedback on integration quality, which matters more than synthetic benchmarks for day-to-day coding workflows.
Where it really shines is frontend. HTML/CSS/JS generation is tighter. Fewer layout bugs. Better responsive handling. If you're using an AI coding assistant for UI work, GPT 5.4 is worth switching to.
Standard pricing for the API:
GPT 5.4:
- Input: $2.50 / 1M tokens
- Output: $10.00 / 1M tokens

GPT 5.4 Thinking:
- Input: $5.00 / 1M tokens
- Output: $20.00 / 1M tokens

Context beyond 272k tokens: 2x multiplier on both input and output.
Compared to Opus 4.6 ($5 input / $25 output), GPT 5.4 is cheaper across the board. The non-thinking variant is half the cost of Opus on input. If your workload doesn't need extended reasoning, that's significant savings at scale.
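A back-of-envelope estimator makes the comparison concrete. One assumption to flag: this sketch applies the 2x multiplier to the whole request once it exceeds 272k total tokens, which is one reading of the pricing note -- verify the exact billing rule against the official pricing page:

```python
# Rough per-request cost estimator from the rates listed above.
# ASSUMPTION: the 2x long-context multiplier applies to the full request
# once total tokens exceed 272k; check the official docs for the rule.

RATES = {  # $ per 1M tokens: (input, output)
    "gpt-5.4": (2.50, 10.00),
    "gpt-5.4-thinking": (5.00, 20.00),
}
THRESHOLD = 272_000

def cost(model, input_tokens, output_tokens):
    in_rate, out_rate = RATES[model]
    mult = 2.0 if input_tokens + output_tokens > THRESHOLD else 1.0
    return mult * (input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate)

# 100k in / 5k out on the non-thinking variant:
print(round(cost("gpt-5.4", 100_000, 5_000), 2))  # 0.3
```

At scale, that per-request delta against Opus's $5/$25 rates is where the savings show up.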
The honest comparison: they're different tools for different jobs.
Opus 4.6 wins on: agentic terminal coding, long-horizon multi-step tasks, agent team coordination, agentic search. If you're running Claude Code with agent teams on complex codebases, Opus is still the frontier.
GPT 5.4 wins on: computer use, browser automation, frontend code generation, knowledge work output quality, and price-per-token. If you're building web agents or need polished document generation, GPT 5.4 is the better choice.
Neither model dominates everything. Pick based on your workload.
OpenAI also shipped a fast mode for Codex that runs 1.5x faster than the standard mode. If you're using Codex for batch code generation or CI pipelines, the speed improvement compounds.
This is a quiet but important update. Faster inference means tighter feedback loops. Tighter feedback loops mean more iterations per hour.
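The arithmetic is simple but worth making explicit. With illustrative numbers (a 90-second generation step is an assumption, not a published figure):

```python
# How a 1.5x faster mode plays out over a fixed wall-clock budget:
# same hour, more iterations. The 90s baseline is illustrative.

def iterations_per_hour(seconds_per_run, speedup=1.0):
    return int(3600 // (seconds_per_run / speedup))

baseline = iterations_per_hour(90)           # 90s per run -> 40 runs/hour
fast = iterations_per_hour(90, speedup=1.5)  # 60s per run -> 60 runs/hour
print(baseline, fast)
```

Twenty extra iterations an hour is the difference between trying one fix per failing test and trying several.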