OpenAI's GPT 5.4 in 10 Minutes

Official Sources#

Resource	Link
GPT-5.4 Announcement	openai.com/index/introducing-gpt-5-4
GPT-5.4 System Card	openai.com/index/gpt-5-4-thinking-system-card
OpenAI Models Documentation	developers.openai.com/api/docs/models
OpenAI API Pricing	developers.openai.com/api/docs/pricing
OpenAI Python SDK	github.com/openai/openai-python
OSWorld Benchmark	os-world.github.io

Last updated: June 11, 2026. Verify model availability, pricing, and API details against the official OpenAI documentation.

OpenAI shipped GPT 5.4 and it matters. Not because it tops every benchmark--it doesn't--but because it changes what you can actually do with a model in production.

Two variants landed: GPT 5.4 Thinking and GPT 5.4. The first is the reasoning powerhouse. The second is the fast, capable default. Both have a million tokens of context and a new steerable thinking UX that lets you redirect the model's reasoning mid-response. That last part is new for everyone.

Let's break it down.

Access Tiers#

This is where OpenAI's pricing maze gets real.

For model-selection context, compare this with OpenAI Codex: Cloud AI Coding With GPT-5.3 and OpenAI vs Anthropic in 2026 - Models, Tools, and Developer Experience; the useful question is not only benchmark quality, but where the model fits in a real developer workflow.

GPT 5.4 Thinking is available on ChatGPT Plus ($20/mo), Teams, Pro, and Enterprise. That's the reasoning model most people will use.

GPT 5.4 (the non-thinking variant) is locked to the $200/month Pro tier. If you want both, you're paying Pro pricing.

The API is live for both. More on pricing below.

Steerable Thinking#

This is the standout UX innovation.

Previous thinking models gave you a plan upfront and then executed it. If the plan was wrong, you waited for it to finish and then corrected. Wasted tokens, wasted time.

GPT 5.4 Thinking shows you the plan as it forms and lets you steer it. Mid-response. You see the model's reasoning unfold and can inject corrections before it commits to a bad path.

Steerable thinking UI showing mid-response intervention

This matters for complex tasks where the model's first interpretation of your prompt isn't what you meant. Instead of regenerating from scratch, you nudge. It's closer to pair programming than prompt engineering.

Context and Efficiency#

A million tokens of context (1,050,000 listed on the model page), same ballpark as Opus 4.6. But OpenAI added a pricing twist: prompts beyond 272k input tokens are priced at 2x input and 1.5x output for the full session. So you can use the full million, but you'll pay for it.

For most workflows, 272k is plenty. If you're feeding entire codebases or long document chains, budget accordingly.

Benchmarks#

The headline number is OSWorld Verified--a benchmark for computer use tasks. GPT 5.4 hits 75%. Humans score 72.4%. That's not a typo. The model outperforms average human operators on structured computer tasks.

Benchmark	GPT 5.4	GPT 5.3	Claude Opus 4.6	Humans
OSWorld Verified	75.0%	58.3%	62.1%	72.4%
BrowseComp	71.2%	49.7%	53.8%	--
WebArena	68.4%	51.2%	55.6%	--
Agentic Coding (SWE-bench)	74.1%	69.2%	72.8%	--

BrowseComp and WebArena show meaningful jumps too. These are real-world browser automation tasks--navigating sites, filling forms, extracting data. If you're building agents that interact with the web, these numbers translate directly.

Benchmark comparison chart across computer use and coding tasks

Newsletter

Get the weekly deep dive

Tutorials on Claude Code, AI agents, and dev tools, delivered free every week.

From the archive

Claude Code: Remote Control, Auto Memory, Plugins & More

Feb 28, 2026 • 5 min read

Mercury 2: The LLM That Doesn't Generate Like an LLM

Feb 24, 2026 • 8 min read

Claude Code Worktrees: Parallel Development Without the Chaos

Feb 21, 2026 • 6 min read

Claude Sonnet 4.6: Approaching Opus at Half the Cost

Feb 19, 2026 • 6 min read

Knowledge Work#

OpenAI is leaning into "knowledge work" as a category. Think polished documents, presentations, structured reports. The outputs are noticeably more formatted and complete than 5.3. Fewer rough edges. Better structure.

This is less relevant for developers and more relevant if you're using the API to generate client-facing content. But it signals where OpenAI sees the commercial opportunity: enterprise users who need production-ready documents, not raw text.

Browser Agent Workflows#

The computer use capabilities are where GPT 5.4 pulls ahead of the field. OSWorld Verified at 75% isn't just a benchmark win--it means the model can reliably execute multi-step browser workflows.

Navigate to a site. Find the right form. Fill it out. Submit. Verify the result. GPT 5.4 does this with higher reliability than any other model right now, including Opus 4.6.

If you're building browser automation agents, this is the model to test against.

Coding and Frontend Wins#

The coding demos are strong. Web games, 3D simulations, complex frontend layouts--all generated with fewer iterations than 5.3. The Cursor team gave positive feedback on integration quality, which matters more than synthetic benchmarks for day-to-day coding workflows.

Where it really shines is frontend. HTML/CSS/JS generation is tighter. Fewer layout bugs. Better responsive handling. If you're using an AI coding assistant for UI work, GPT 5.4 is worth switching to.

API Pricing#

Standard pricing for the API:

Code

GPT 5.4:
  Input:  $2.50 / 1M tokens
  Cached input: $0.25 / 1M tokens
  Output: $15.00 / 1M tokens

GPT 5.4 mini:
  Input:  $0.75 / 1M tokens
  Output: $4.50 / 1M tokens

GPT 5.4 nano:
  Input:  $0.20 / 1M tokens
  Output: $1.25 / 1M tokens

Prompts beyond 272k input tokens: 2x input, 1.5x output for the full session

One nuance worth knowing: the API exposes a single gpt-5.4 model rather than a separate Thinking SKU - the Thinking split is a ChatGPT product distinction. OpenAI's pricing page also lists a gpt-5.4-pro tier at $30 input / $180 output for the heaviest workloads.

Compared to Opus 4.6 ($5 input / $25 output), GPT 5.4 is cheaper across the board: half the cost on input, 40% cheaper on output. If your workload doesn't need a premium reasoning tier, that's significant savings at scale.

Versus Claude Opus 4.6#

The honest comparison: they're different tools for different jobs.

Opus 4.6 wins on: agentic terminal coding, long-horizon multi-step tasks, agent team coordination, agentic search. If you're running Claude Code with agent teams on complex codebases, Opus is still the frontier.

GPT 5.4 wins on: computer use, browser automation, frontend code generation, knowledge work output quality, and price-per-token. If you're building web agents or need polished document generation, GPT 5.4 is the better choice.

Neither model dominates everything. Pick based on your workload.

Codex Fast Mode#

OpenAI also shipped a fast mode for Codex that runs 1.5x faster than the standard mode. If you're using Codex for batch code generation or CI pipelines, the speed improvement compounds.

This is a quiet but important update. Faster inference means tighter feedback loops. Tighter feedback loops mean more iterations per hour.

Practical Next Steps#

Test browser automation workflows. If you have agents that navigate websites, GPT 5.4's computer use scores are best-in-class. Run your existing test suite against it.
Try steerable thinking on complex prompts. The mid-response intervention UX is genuinely new. It changes how you interact with reasoning models.
Compare costs. If you're running high-volume API calls with Opus, price out the same workload on GPT 5.4. The savings might justify a switch for certain tasks.
Watch the 272k boundary. That 2x pricing cliff is easy to hit if you're feeding large codebases. Monitor your token usage.

FAQ#

What's the difference between GPT 5.4 and GPT 5.4 Thinking?#

GPT 5.4 is the fast, capable default model optimized for low-latency tasks. GPT 5.4 Thinking is the reasoning variant that shows its chain-of-thought and supports mid-response steering. Use GPT 5.4 for general coding and content tasks; use GPT 5.4 Thinking when you need step-by-step reasoning or want to guide the model mid-response.

How much does GPT 5.4 cost?#

GPT 5.4 API pricing is $2.50 per million input tokens and $15.00 per million output tokens, with cached input at $0.25. There is no separate Thinking SKU in the API; cheaper tiers are covered by gpt-5.4-mini ($0.75 input / $4.50 output) and gpt-5.4-nano ($0.20 input / $1.25 output). Prompts beyond 272k input tokens are priced at 2x input and 1.5x output for the full session. That makes GPT 5.4 half the input cost of Claude Opus 4.6 and 40% cheaper on output.

What is steerable thinking and why does it matter?#

Steerable thinking lets you see GPT 5.4's reasoning as it forms and redirect it mid-response. Previous reasoning models showed a plan and executed it - if wrong, you waited for completion then started over. Steerable thinking lets you correct course before the model commits, saving tokens and time. It's closer to pair programming than prompt engineering.

What's the context window for GPT 5.4?#

GPT 5.4 supports just over 1 million tokens of context (1,050,000 on the official model page) with 128k max output. However, prompts beyond 272k input tokens are priced at 2x input and 1.5x output for the full session. For most workflows, 272k is sufficient. If you're processing entire codebases or long document chains, monitor your token usage - the pricing cliff at 272k can significantly increase costs.

How does GPT 5.4 compare to Claude Opus 4.6?#

They excel at different tasks. Opus 4.6 wins on agentic terminal coding, long-horizon multi-step tasks, agent team coordination, and agentic search - it's better for Claude Code workflows on complex codebases. GPT 5.4 wins on computer use (75% on OSWorld vs Opus's 62%), browser automation, frontend code generation, and has lower API pricing. Neither dominates everything - choose based on your workload.

What is OSWorld and why is GPT 5.4's 75% score significant?#

OSWorld is a benchmark for computer use tasks - navigating websites, filling forms, extracting data, executing multi-step workflows. GPT 5.4 scores 75% on OSWorld Verified, which surpasses average human performance (72.4%). This means GPT 5.4 can reliably execute browser automation workflows with higher accuracy than previous models.

Is GPT 5.4 good for coding?#

Yes - it shows strong performance on frontend work especially. HTML/CSS/JS generation is tighter with fewer layout bugs and better responsive handling. The Cursor team gave positive feedback on integration quality. For backend and complex codebase refactoring, Claude Opus 4.6 may still be preferable, but for UI work and general code generation, GPT 5.4 is competitive.

Who has access to GPT 5.4?#

GPT 5.4 Thinking is available to ChatGPT Plus ($20/month), Teams, Pro, and Enterprise users. The non-thinking GPT 5.4 variant is locked to the $200/month Pro tier. Both variants are available through the API with standard pay-per-token pricing.

Watch the Video#

Official Sources#

Resource	Link
GPT-5.4 Announcement	openai.com/index/introducing-gpt-5-4
GPT-5.4 System Card	openai.com/index/gpt-5-4-thinking-system-card
OpenAI Models Documentation	developers.openai.com/api/docs/models
OpenAI API Pricing	developers.openai.com/api/docs/pricing
OpenAI Python SDK	github.com/openai/openai-python
OSWorld Benchmark	os-world.github.io

Last updated: June 11, 2026. Verify model availability, pricing, and API details against the official OpenAI documentation.

OpenAI shipped GPT 5.4 and it matters. Not because it tops every benchmark--it doesn't--but because it changes what you can actually do with a model in production.

Let's break it down.

Access Tiers#

This is where OpenAI's pricing maze gets real.

GPT 5.4 Thinking is available on ChatGPT Plus ($20/mo), Teams, Pro, and Enterprise. That's the reasoning model most people will use.

GPT 5.4 (the non-thinking variant) is locked to the $200/month Pro tier. If you want both, you're paying Pro pricing.

The API is live for both. More on pricing below.

Steerable Thinking#

This is the standout UX innovation.

Previous thinking models gave you a plan upfront and then executed it. If the plan was wrong, you waited for it to finish and then corrected. Wasted tokens, wasted time.

GPT 5.4 Thinking shows you the plan as it forms and lets you steer it. Mid-response. You see the model's reasoning unfold and can inject corrections before it commits to a bad path.

Context and Efficiency#

For most workflows, 272k is plenty. If you're feeding entire codebases or long document chains, budget accordingly.

Benchmarks#

Benchmark	GPT 5.4	GPT 5.3	Claude Opus 4.6	Humans
OSWorld Verified	75.0%	58.3%	62.1%	72.4%
BrowseComp	71.2%	49.7%	53.8%	--
WebArena	68.4%	51.2%	55.6%	--
Agentic Coding (SWE-bench)	74.1%	69.2%	72.8%	--

Newsletter

Get the weekly deep dive

Tutorials on Claude Code, AI agents, and dev tools, delivered free every week.

From the archive

Claude Code: Remote Control, Auto Memory, Plugins & More

Feb 28, 2026 • 5 min read

Mercury 2: The LLM That Doesn't Generate Like an LLM

Feb 24, 2026 • 8 min read

Claude Code Worktrees: Parallel Development Without the Chaos

Feb 21, 2026 • 6 min read

Claude Sonnet 4.6: Approaching Opus at Half the Cost

Feb 19, 2026 • 6 min read

Knowledge Work#

Browser Agent Workflows#

The computer use capabilities are where GPT 5.4 pulls ahead of the field. OSWorld Verified at 75% isn't just a benchmark win--it means the model can reliably execute multi-step browser workflows.

Navigate to a site. Find the right form. Fill it out. Submit. Verify the result. GPT 5.4 does this with higher reliability than any other model right now, including Opus 4.6.

If you're building browser automation agents, this is the model to test against.

Coding and Frontend Wins#

API Pricing#

Standard pricing for the API:

Code

GPT 5.4:
  Input:  $2.50 / 1M tokens
  Cached input: $0.25 / 1M tokens
  Output: $15.00 / 1M tokens

GPT 5.4 mini:
  Input:  $0.75 / 1M tokens
  Output: $4.50 / 1M tokens

GPT 5.4 nano:
  Input:  $0.20 / 1M tokens
  Output: $1.25 / 1M tokens

Prompts beyond 272k input tokens: 2x input, 1.5x output for the full session

Versus Claude Opus 4.6#

The honest comparison: they're different tools for different jobs.

Neither model dominates everything. Pick based on your workload.

Codex Fast Mode#

OpenAI also shipped a fast mode for Codex that runs 1.5x faster than the standard mode. If you're using Codex for batch code generation or CI pipelines, the speed improvement compounds.

This is a quiet but important update. Faster inference means tighter feedback loops. Tighter feedback loops mean more iterations per hour.

Practical Next Steps#

Test browser automation workflows. If you have agents that navigate websites, GPT 5.4's computer use scores are best-in-class. Run your existing test suite against it.
Try steerable thinking on complex prompts. The mid-response intervention UX is genuinely new. It changes how you interact with reasoning models.
Compare costs. If you're running high-volume API calls with Opus, price out the same workload on GPT 5.4. The savings might justify a switch for certain tasks.
Watch the 272k boundary. That 2x pricing cliff is easy to hit if you're feeding large codebases. Monitor your token usage.

Official Sources#

Access Tiers#

Steerable Thinking#

Context and Efficiency#

Benchmarks#

Claude Code: Remote Control, Auto Memory, Plugins & More

Mercury 2: The LLM That Doesn't Generate Like an LLM

Claude Code Worktrees: Parallel Development Without the Chaos

Claude Sonnet 4.6: Approaching Opus at Half the Cost

Knowledge Work#

Browser Agent Workflows#

Coding and Frontend Wins#

API Pricing#

Versus Claude Opus 4.6#

Codex Fast Mode#

Practical Next Steps#

Further Reading#

FAQ#

What's the difference between GPT 5.4 and GPT 5.4 Thinking?#

How much does GPT 5.4 cost?#

What is steerable thinking and why does it matter?#

What's the context window for GPT 5.4?#

How does GPT 5.4 compare to Claude Opus 4.6?#

What is OSWorld and why is GPT 5.4's 75% score significant?#

Is GPT 5.4 good for coding?#

Who has access to GPT 5.4?#

Watch the Video#

GPT-5: OpenAI's Most Capable Model

GPT-5 Codex: OpenAI's Agentic Coding Model

OpenAI vs Anthropic in 2026 - Models, Tools, and Developer Experience

Related Tools

ChatGPT

OpenAI Codex

GPT-5

Codex CLI

Apps from Developers Digest

Agent Benchmark Lab

DD Canvas

Overnight Agents

Related Guides

MCP Servers Explained

Claude Code Setup Guide

Run AI Models Locally with Ollama and LM Studio

Related Videos