GPT-OSS: OpenAI's First Open-Weight Models

First Open-Weight Models Since GPT-2
OpenAI has released its first open-weight models in over five years. GPT-OSS 120B and GPT-OSS 20B are now available under the Apache 2.0 license, marking a significant strategic shift for the company. Both are reasoning models built on a Mixture of Experts (MoE) architecture, designed to run efficiently on accessible hardware while delivering competitive performance against frontier closed models.

Model Specifications
Two variants are available:
GPT-OSS 20B - The efficient option. Activates 3.6 billion parameters per token and runs on a laptop with 16GB of RAM. Suitable for offline, private deployments where data cannot leave the local environment.
GPT-OSS 120B - The larger variant. Despite its 120-billion-parameter total, it activates only 5.1 billion parameters per token, so it deploys on a single 80GB GPU such as an NVIDIA A100. This model targets production applications requiring higher capability.
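The hardware claims above follow from simple arithmetic on weight precision. A rough sketch, assuming the roughly 4-bit MXFP4 quantization the released checkpoints ship in (the exact effective bits per parameter is an assumption here, and KV cache and activation overheads are ignored):

```python
def weight_memory_gb(total_params_b: float, bits_per_param: float = 4.25) -> float:
    """Approximate weight storage in GB at a given quantized precision.

    bits_per_param=4.25 assumes ~4-bit MXFP4 weights plus shared scales
    (an assumption about the shipped format, not a published spec).
    """
    return total_params_b * 1e9 * bits_per_param / 8 / 1e9

# ~20B parameters at ~4.25 bits fits comfortably under 16GB of RAM;
# ~120B lands in the range of a single 80GB accelerator.
print(round(weight_memory_gb(20), 2), round(weight_memory_gb(120), 2))
```

This is why the MoE design matters for cost: total parameters set the memory footprint, while the much smaller active-parameter count (3.6B and 5.1B per token) sets the per-token compute.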
Both models support a 128,000-token context window and were trained primarily on English text with emphasis on STEM, coding, and general knowledge. As part of this announcement, OpenAI is also open-sourcing the o200k_harmony tokenizer, a superset of the tokenizer used for GPT-4o and o4-mini.
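With a 128,000-token window, prompt budgeting is worth doing before a request goes out. A minimal stdlib-only sketch using the common ~4-characters-per-token heuristic (a crude approximation for English text; for exact counts, use the released o200k_harmony tokenizer instead):

```python
CONTEXT_WINDOW = 128_000  # tokens, for both GPT-OSS variants

def approx_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token for English."""
    return max(1, len(text) // 4)

def fits(prompt: str, reserved_output: int = 4_096) -> bool:
    """Check the prompt against the window, leaving room for the completion."""
    return approx_tokens(prompt) + reserved_output <= CONTEXT_WINDOW

print(approx_tokens("GPT-OSS supports a 128,000 token context window."), fits("hello"))
```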
Chain-of-Thought with Tool Integration
The standout feature is the integration of tool use within the reasoning process. During the post-training phase, OpenAI trained these models to invoke tools like web search and code execution before finalizing responses. This happens inside the chain-of-thought trace.
This architecture eliminates the need for external agent orchestration. The model can search, evaluate results, and decide to search again if the first query fails, all within its internal reasoning loop. For developers building agentic applications, this reduces complexity significantly. No separate agent framework is required to handle tool selection, reflection, and iterative refinement.
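To make the point concrete, here is an illustrative sketch of the dispatch loop that GPT-OSS folds into its own chain-of-thought, shown as the external orchestration it replaces. All names here (`TOOLS`, `run_step`, the stub tools) are hypothetical and not part of any GPT-OSS API:

```python
# Illustrative only: stub tools standing in for web search and code execution.
TOOLS = {
    "web_search": lambda query: f"results for {query!r}",  # stub search tool
    "python":     lambda code: eval(code),                 # stub executor (demo only)
}

def run_step(tool_name: str, argument: str):
    """Dispatch one tool call and return its observation."""
    return TOOLS[tool_name](argument)

# A tool-using reasoning model emits (tool, argument) pairs like these inside
# its reasoning trace, then conditions on each observation before answering.
trace = [("web_search", "gpt-oss context window"), ("python", "2 + 2")]
observations = [run_step(name, arg) for name, arg in trace]
print(observations)
```

With GPT-OSS, this loop runs inside the model's reasoning trace rather than in framework code you maintain.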

Performance Benchmarks
The 120B model outperforms o3-mini across standard benchmarks, even without tool access. Against the full o3 model, it remains competitive.
| Benchmark | GPT-OSS 120B | GPT-OSS 20B |
|---|---|---|
| MMLU | 90.0% | 85.3% |
| GPQA Diamond | 80.1% | 71.5% |
| Humanity's Last Exam | Strong | Strong for size |
| Competition Math | Near o3/o4-mini | Competitive |
On Artificial Analysis aggregations, these models sit respectably against Gemini 2.5, Grok 2, and other frontier systems. The critical caveat: these are not code-generation specialists. They will not build full web applications from a prompt the way Claude Opus or similar top-tier coding models can. They excel at reasoning, analysis, and tool-augmented tasks rather than end-to-end application generation.

Deployment Costs and Options
Because these models are Apache 2.0 licensed, pricing competition among hosting providers is already aggressive:
GPT-OSS 120B:
- Fireworks: $0.10 per million input tokens / $0.50 output
- Groq: $0.15 per million input tokens / $0.75 output
GPT-OSS 20B:
- Fireworks: $0.05 per million input tokens / $0.20 output
- Groq: $0.10 per million input tokens / $0.50 output
Groq delivers over 1,000 tokens per second on the 20B model and approximately 500 tokens per second on the 120B variant. OpenRouter provides unified billing across providers with transparent latency and throughput metrics if you prefer a single integration point.
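At these rates, per-request cost is simple arithmetic. A small sketch using the Fireworks prices quoted above (per-million-token figures; they may change at any time):

```python
# Per-million-token prices quoted above (USD, Fireworks); subject to change.
PRICES = {
    "gpt-oss-120b": {"input": 0.10, "output": 0.50},
    "gpt-oss-20b":  {"input": 0.05, "output": 0.20},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A 10K-token prompt with a 2K-token completion on the 120B model:
print(round(request_cost("gpt-oss-120b", 10_000, 2_000), 6))
```

At these prices, a fairly large request on the 120B model costs a fraction of a cent, which is the main argument for open-weight hosting over closed APIs at volume.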

Running Locally and Getting Started
For local execution, HuggingFace hosts the model weights. Ollama provides the simplest setup path:
```shell
ollama run gpt-oss   # defaults to the 20B model
```
For the 120B model, you need heavier hardware such as an A100, or an Apple Silicon machine like an M3 Max with substantial unified memory.
Cloud deployment options include Groq for low-latency inference, Fireworks for cost optimization, and OpenRouter for multi-provider access. Each platform exposes the standard OpenAI-compatible API, making migration straightforward.
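Because every host exposes an OpenAI-compatible endpoint, switching providers is mostly a matter of changing the base URL and model identifier. A hedged sketch of the chat-completions request body such an endpoint accepts (the exact model identifier varies by provider, so check each host's docs):

```python
import json

# Standard OpenAI-style chat-completions payload; the same shape works
# against Groq, Fireworks, OpenRouter, or a local Ollama server's
# /v1/chat/completions endpoint. The model name here is illustrative.
payload = {
    "model": "gpt-oss-20b",  # provider-specific identifier; varies by host
    "messages": [
        {"role": "user", "content": "Summarize the MoE architecture."}
    ],
    "max_tokens": 256,
}
body = json.dumps(payload)
print(body)
```

Pointing an existing OpenAI SDK client at a different `base_url` with this same payload shape is typically all the migration required.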
The Bottom Line
GPT-OSS fills a specific niche: capable reasoning with tool integration at low cost and manageable hardware requirements. These models are not replacements for top-tier closed models on creative or complex coding tasks. They are practical choices for applications requiring reasoning, moderate coding assistance, and agentic tool use without the infrastructure overhead of massive parameter counts or closed API dependencies.


