Topic

AI MODELS

Large language models - benchmarks, capabilities, and how to choose the right one.

62 resources - 45 posts, 16 tools, 1 guide

Blog Posts

Claude Sonnet 5 vs Sonnet 4.6: Should You Upgrade?

Claude Sonnet 5 lands near Opus 4.8 on some tasks for a fraction of the price - but a new tokenizer runs about 30 percent more tokens. Here is the upgrade decision for builders, with the numbers.

Jul 1, 2026 - 6 min read

Running Fable 5 Agent Fleets in Production: The Operations Guide

Standing up a fleet of Fable 5 agents is the easy part. This is the operations layer - data retention rules, refusal-rate alerting, effort tuning, observability, and availability planning - that keeps the fleet running.

Jul 1, 2026 - 8 min read

Fable 5 Is Back: The Anthropic Model the Government Switched Off

Anthropic's most capable model launched, got suspended by a US export-control order, and returned today. Here is what Fable 5 is, what changed on the way back, and whether builders should reach for it.

Jul 1, 2026 - 6 min read

Fable 5 vs Opus 4.8: Which Should Orchestrate Your Agents?

The orchestrator is the most important model choice in an agent fleet. A fair head-to-head between Fable 5 and Opus 4.8 for that role, with a decision matrix by run length, budget, compliance, and refusal-handling tolerance.

Jul 1, 2026 - 8 min read

GLM 5.2 in 9 Minutes: The Open-Weight Rival to GPT-5.5

A companion guide to the GLM 5.2 video: an open-weight model positioned against GPT-5.5, walked through with benchmarks, pricing, and a live OpenCode demo. Here is what the video covers and where to go deeper.

Jul 1, 2026 - 6 min read

GPT-5.5 in 7 Minutes: Benchmarks, Codex Agents, Context Window, and Pricing

A companion guide to the GPT-5.5 video: OpenAI's newly released model rolling out to ChatGPT and Codex, reviewed through benchmarks, agent capabilities, context window, and pricing. Here is what the video covers and where to go deeper.

Jul 1, 2026 - 6 min read

Claude Sonnet 5 Launch Analysis: The Most Agentic Sonnet Yet

Anthropic releases Claude Sonnet 5 with improved agentic capabilities, better tool use, and an introductory pricing deal. Here's what developers need to know.

Jun 30, 2026 - 6 min read

Apertus: Europe's Answer to AI Sovereignty - and Why HN Is Skeptical

Switzerland's fully open foundation model promises transparent training data and EU compliance. The HN crowd has questions about actual performance.

Jun 22, 2026 - 6 min read

Fugu Ultra's Frontier Performance Claim, Explained Without the Hype

Sakana says Fugu Ultra stands with Fable, Mythos, GPT-5.5, Gemini, and Opus by orchestrating models instead of being one giant model. Here is what the benchmarks show, what is novel, and what still needs proof.

Jun 22, 2026 - 11 min read

Sakana Fugu and the Case for Not Betting Everything on One Proprietary Model

Sakana Fugu makes a timely argument for model routing: frontier performance should come from swappable systems, not a hard dependency on one proprietary API.

Jun 22, 2026 - 9 min read

Sakana Fugu Ultra: The Model Router Making the Frontier Look Less Proprietary

Sakana Fugu Ultra is not just another giant model. It is a learned orchestration layer that routes work across expert models, matches frontier benchmark claims, and makes a serious case for multi-model AI systems.

Jun 22, 2026 - 10 min read

How to Use GLM 5.2 and Other Custom Model Providers in Codex

Codex can point at OpenAI-compatible model providers, local Ollama servers, and internal model proxies. Here is the practical config pattern, the sharp edges, and when to use it.

Jun 21, 2026 - 9 min read

The Router Era: Why Not Owning a Frontier Model Became an Advantage

No single model wins every task anymore, and the companies that never trained one - Factory, Devin, Perplexity, Cursor, OpenCode - are turning that into a moat. This is how model routing works, why open weights and neoclouds make it cheap, and the honest counter-argument.

Jun 20, 2026 - 11 min read

GLM-5.2 Cost Math: When Open-Weights Coding Models Actually Save You Money

Z.ai's GLM-5.2 lands as a 753B open-weights coding model that beats GPT-5.5 on SWE-bench Pro for roughly one-sixth the per-token cost. Here is the real cost math, a worked cost-per-task example, and a when-to-use-which decision guide.

Jun 17, 2026 - 9 min read

Model Routing Recipes: Practical Config Patterns to Cut AI Spend

A code-heavy field guide to model routing. Real, runnable-style configs for tiering tasks by complexity, routing simple work to open-weights, reserving frontier models for hard reasoning, building failover chains, and keeping prompt caches warm with OpenRouter, LiteLLM, and Factory Router.

Jun 17, 2026 - 11 min read

OpenRouter Fusion Makes Model Panels Real. Use Them Like Escalation, Not Autopilot

OpenRouter Fusion turns multi-model panels into an API feature. The useful lesson is not to run every prompt through more models. It is to define when a task deserves an expensive second opinion.

Jun 15, 2026 - 8 min read

The Claude Tokenizer Change: What ~30% More Tokens Means for Your Bill

Anthropic's docs say the tokenizer introduced with Opus 4.7 can use up to 35% more tokens for the same text. Here is what that does to per-request cost, max_tokens, and cross-model comparisons.

Jun 11, 2026 - 8 min read

Fable 5 with 1M Context: What Actually Works in Practice

Fable 5 1M context workflows that actually work: whole-repo reviews, log archaeology, multi-doc synthesis - plus the honest math on when RAG still wins.

Jun 11, 2026 - 10 min read

Fable 5 Effort Levels Explained: low to xhigh, and What They Cost You

Fable 5 effort levels explained: what low, medium, high, xhigh, and max actually change, which models support each level, and how effort drives your token bill.

Jun 11, 2026 - 10 min read

Handling Long-Running Fable 5 Requests: Timeouts, Streaming, and Background Patterns

Fable 5 long-running requests can run for many minutes per turn and hours per autonomous run. Here is how to configure client timeouts, streaming keepalive, batch polling, and background patterns so they actually finish.

Jun 11, 2026 - 8 min read

The Fable 5 Orchestrator Playbook: One Smart Model Managing Cheap Workers

A practical playbook for running Claude Fable 5 as the orchestrator over Sonnet and Haiku workers, with verified cost math on when the premium pays off.

Jun 11, 2026 - 10 min read

The Frontier Model Landscape, June 2026 Edition

A verified directory of the frontier AI models in June 2026 - Claude Fable 5, GPT-5.5, GPT-5.4, Gemini 3.1 Pro, and DeepSeek V4 - with pricing checked against official docs.

Jun 11, 2026 - 10 min read

How to Use Claude Fable 5: Every Access Path Explained

How to use Claude Fable 5 across every access path: claude.ai plans through June 22, the Claude API, Amazon Bedrock, Vertex AI, and Microsoft Foundry, with setup effort and first-prompt tips.

Jun 11, 2026 - 8 min read

Is Claude Fable 5 Slow? Latency in Practice, and When It Matters

Claude Fable 5 latency measured: 109 seconds to first token at max effort vs 1.4s for Sonnet 4.6. When slow is fine, when it hurts, and how to route around it.

Jun 11, 2026 - 8 min read

Migrating Off Retired GPT Models in 2026: A Working Checklist

Migrating off retired GPT models in 2026: the live retirement table, what maps to what, an eval-before-switch day plan, and when to jump providers.

Jun 11, 2026 - 10 min read

Qwen 3.7 Max Developer Guide: 1M Context, $1.25/MTok, and Agent-First Architecture

Alibaba shipped Qwen 3.7 Max on May 19, 2026 with a 1M token context window, Anthropic-compatible API, and agent-first architecture. Here is what developers need to know about pricing, performance, and when to use it.

Jun 11, 2026 - 8 min read

12 Ways Developers Are Actually Leveraging Claude Fable 5

Twelve documented Claude Fable 5 use patterns - agent orchestration, overnight runs, 1M-context refactors, effort tuning - each with a how-to seed and doc link.

Jun 11, 2026 - 10 min read

Decoding Anthropic's Model Names: Fable, Mythos, and What the Naming Shift Signals

Anthropic broke its own naming ladder when it introduced the Mythos class and Claude Fable 5. Here is what the shift means, how to map each tier to a real workload, and what questions it leaves open.

Jun 10, 2026 - 8 min read

Apple's LanguageModel Protocol: Xcode 27 Just Made Model Lock-In Optional

Apple shipped a LanguageModel protocol at WWDC 2026 that lets iOS and macOS developers swap between Claude, Gemini, and local models with a single dependency change. Here is what OS-level provider abstraction actually means for switching costs, moats, and your architecture decisions.

Jun 10, 2026 - 8 min read

Fable 5 vs Opus 4.8: A Data-Driven Decision Guide for Engineering Teams

Fable 5 posts an 80.3% SWE-Bench Pro score and costs 2x Opus 4.8 - here is the task-profile scoring guide that tells you when the premium pays off.

Jun 10, 2026 - 7 min read

Claude Mythos 5 Explained: What It Is, Who Can Access It, and Why It's Gated

Anthropic shipped two names for one architecture on June 9, 2026. Here is what separates Fable 5 from Mythos 5, who can actually get unrestricted access, and what developers should do right now.

Jun 10, 2026 - 7 min read

The Model, IDE, CLI, and Agent Framework Changes That Actually Matter

The AI coding market is noisy. The changes that matter are easier to spot when you separate model capability, editor loops, terminal agents, background agents, agent frameworks, UI layers, context, security, and cost.

May 30, 2026 - 10 min read

Models.dev Makes Model Routing Feel Like Infrastructure

The models.dev project is trending because AI teams need one boring source of truth for model specs, pricing, context windows, modalities, and tool support.

May 23, 2026 - 7 min read

DeepSeek V4 Changes the Coding Agent Cost Equation

DeepSeek V4 is trending because it is close enough to frontier coding models at a much lower token price. The real question for developers is where cheap reasoning belongs in an agent stack.

May 2, 2026 - 8 min read

DeepSeek V4: The Developer's Guide to Flash and Pro

DeepSeek V4 splits into Flash and Pro, ships a 1M context window, and undercuts every closed model on price. Here's how to wire it up with the OpenAI SDK, when to pick it over Claude or GPT, and what changed since V3 and R1.

Apr 29, 2026 - 10 min read

NVIDIA Nemotron 3 Super: A Developer's Guide to the 120B Hybrid MoE

A practical walkthrough of Nemotron 3 Super: latent mixture of experts, hybrid Mamba transformer architecture, 1M context, reasoning modes, and the code you actually need to run it on NVIDIA hardware.

Apr 29, 2026 - 9 min read

Claude Haiku 4.5: Near-Frontier Intelligence at a Fraction of the Cost

Anthropic's Claude Haiku 4.5 delivers Sonnet 4-level coding performance at one-third the cost and twice the speed. Here is what developers need to know.

Apr 2, 2026 - 5 min read

DeepSeek R1 and V3: The Developer's Guide to Open-Source AI

DeepSeek's R1 and V3 models deliver frontier-level performance under an MIT license. Here's how to use them through the API, run them locally with Ollama, and decide when they beat closed-source alternatives.

Mar 26, 2026 - 9 min read

Llama 4: The Complete Developer's Guide to Meta's Open Source Models

Meta's Llama 4 family brings mixture-of-experts to open source with Scout and Maverick. Here's how to run them locally, access them through APIs, and decide when they beat the competition.

Mar 26, 2026 - 10 min read

Claude vs GPT for Coding: Which Model Writes Better TypeScript?

Claude Opus 4.7 vs GPT-5.5 for real TypeScript work. Benchmarks, pricing, model families, and practical differences.

Mar 19, 2026 - 5 min read

OpenAI vs Anthropic in 2026 - Models, Tools, and Developer Experience

A developer's comparison of OpenAI and Anthropic ecosystems - models, coding tools, APIs, pricing, and which to choose for different use cases.

Mar 19, 2026 - 10 min read

NVIDIA's Nemotron 3 Super in 6 Minutes

NVIDIA's Nemotron 3 Super combines latent mixture of experts with hybrid Mamba architecture - 120B total parameters, 12B active per token, 1M context, and up to 4x more experts at the same cost.

Mar 13, 2026 - 5 min read

Grok 4: xAI's Most Powerful AI Model

xAI has launched Grok 4, claiming the title of the world's most powerful AI model. With a $300/month Super Grok tier, saturated AIME benchmarks, and a coding model on the horizon, this is xAI's bigge...

Jul 10, 2025 - 7 min read

Qwen 3: Alibaba's Open-Source Model That Outclassed Llama 4

Alibaba released Qwen 3 with eight models under an Apache 2 license, including a 235B mixture-of-experts flagship that beats Llama 4 Maverick on nearly every benchmark while being smaller and cheaper to run.

Apr 29, 2025 - 8 min read

xAI Grok 3 Launch: The Smartest AI on Earth?

xAI launched Grok 3 with 200,000 GPUs, outperforming GPT-4o, Sonnet 3.5, and DeepSeek R1 on reasoning benchmarks. Here is what the hardware, the benchmarks, and the new features actually mean for developers.

Feb 18, 2025 - 9 min read

Related Tools

Claude

Daily Driver

Anthropic's AI. Opus 4.6 for hard problems, Sonnet 4.6 for speed, Haiku 4.5 for cost. 200K context window. Best coding model I've tested. Max plan ($200/mo).

AI Models

ChatGPT

OpenAI's flagship. GPT-4o for general use, o3 for reasoning, Codex for coding. 300M+ weekly users. Tasks, agents, web browsing, DALL-E, code interpreter.

AI Models

OpenRouter

Unified API for 200+ models. One API key, one billing dashboard. OpenAI, Anthropic, Google, Meta, Mistral, and more. Automatic fallbacks and load balancing.

AI Models

DeepSeek

Open-source reasoning models from China. DeepSeek-R1 rivals o1 on math and code benchmarks. V3 for general use. Fully open weights. Extremely cost-effective API.

AI Models

Llama

Meta's open-source model family. Llama 4 available in Scout (17B active) and Maverick (17B active, 128 experts). Free to use, modify, and deploy commercially.

AI Models

Mistral

European open-weight models. Mistral Large for complex tasks, Mistral Small for speed, Codestral for code. Strong multilingual support. Open and API options.

AI Models

GPT-5

OpenAI's latest flagship model. Major leap in reasoning, coding, and instruction following over GPT-4o. Available via the API and in ChatGPT Plus/Pro.

AI Models

Gemini

Google's frontier model family. Gemini 2.5 Pro has 1M token context and top-tier coding benchmarks. Gemini 3 Pro pushes reasoning further. Free tier via AI Studio.

AI Models

Grok

xAI's model with real-time X/Twitter data access. Grok 3 rivals top models on reasoning. Built-in web search and current events awareness. Available via API.

AI Models

Claude Haiku 4.5

Anthropic's smallest Claude 4.5 model. Near-frontier coding performance at one-third the cost of Sonnet 4 and up to 4-5x faster than Sonnet 4.5. $1/$5 per million tokens.

AI Models

Qwen3-Coder

Alibaba's flagship open-weight coding model. 480B total parameters, 35B active (MoE). Native 256K context, scales to 1M. Apache 2.0 license. State-of-the-art agentic coding.

AI Models

DeepSeek V3.2

DeepSeek's reasoning-first model built for agents. First model to integrate thinking directly into tool use. Ships alongside V3.2-Speciale, which rivals GPT-5 and Gemini 3.0 Pro.

AI Models

Claude Opus 4.7

Anthropic's flagship reasoning model. Best-in-class for coding, long-context analysis, and agentic workflows. 1M token context window. Available via API and in Claude Code.

AI Models

Claude Fable 5

New

Anthropic's first generally available Mythos-class model, released June 9, 2026. 1M context, 128K max output, $10/$50 per million tokens. Built for long-horizon agentic work.

AI Models

Claude Opus 4.8

Anthropic's recommended default for complex work, released May 28, 2026. 1M context, 128K output, $5/$25 per million tokens. Defaults to high effort on all surfaces.

AI Models

DeepSeek V4

Open weights

DeepSeek's open-weights frontier family, previewed April 24, 2026. V4-Pro is 1.6T total / 49B active params; V4-Flash is 284B / 13B. 1M context standard. Weights on Hugging Face.

AI Models

Guides

Run AI Models Locally with Ollama and LM Studio

Install Ollama and LM Studio, pull your first model, and run AI locally for coding, chat, and automation - with zero cloud dependency.

Guide
