
TL;DR
Z.ai's GLM-5.2 lands as a 753B open-weights coding model that beats GPT-5.5 on SWE-bench Pro for roughly one-sixth the per-token cost. Here is the real cost math, a worked cost-per-task example, and a when-to-use-which decision guide.
| Source | What it covers |
|---|---|
| VentureBeat: Z.ai's open-weights GLM-5.2 beats GPT-5.5 | Release coverage, benchmark and cost claims |
| Artificial Analysis: GLM-5.2 | Independent intelligence, performance, and price analysis |
| InfoWorld: GLM-5.2 coverage | Release details and availability |
| Hugging Face: GLM-5.2 weights | Open weights for self-hosting |
The pitch for GLM-5.2 is simple enough to fit on a sticky note: a coding model that scores higher than GPT-5.5 on a real software-engineering benchmark, for roughly one-sixth of the per-token price, with the weights published openly. Released by Z.ai (formerly Zhipu AI) on June 16, 2026, it is a 753-billion-parameter mixture-of-experts model with a 1M-token context window, available on Hugging Face, through the Z.ai API, and inside 20-plus coding environments.
That headline is favourable. It is also, as far as the public numbers go, true. But "cheaper per token" and "cheaper for your workload" are not the same claim, and the gap between them is where most cost decisions actually get made. This post does the math, then turns it into a decision guide.
Last verified: June 17, 2026.
The benchmark lead is narrow but real, and it is on the kind of task that matters for agentic coding rather than a trivia quiz. That is the part worth taking seriously.
Two pricing paths matter, and people routinely conflate them.
Path 1: the Z.ai API (pay per token).
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Combined |
|---|---|---|---|
| GLM-5.2 | ~$1.40 | ~$4.40 | ~$5.80 |
| GPT-5.5 | ~$5.00 | ~$30.00 | ~$35.00 |
Independent trackers list a lower median across providers serving the open weights (closer to ~$0.55 input and ~$1.85 output), because anyone can host an open-weights model and compete on price. So the "one-sixth the cost" line is roughly right at Z.ai's own list price, and the gap can widen further if you shop the open-weights hosting market.
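As a quick check on the "one-sixth" figure, the sketch below recomputes the combined-rate comparison from the numbers quoted in this section. It treats "combined" exactly as the table does, as the plain sum of the input and output rates, which is a simplification rather than a weighted blend. The dictionary keys and function name are illustrative only.

```python
# Per-1M-token rates (USD) quoted in this section: (input, output).
RATES = {
    "GLM-5.2 (Z.ai list)": (1.40, 4.40),
    "GLM-5.2 (open-weights median)": (0.55, 1.85),
    "GPT-5.5": (5.00, 30.00),
}


def combined_rate(rates):
    """Sum of input and output rates, matching the table's 'Combined' column."""
    input_rate, output_rate = rates
    return input_rate + output_rate


baseline = combined_rate(RATES["GPT-5.5"])

# How many times larger GPT-5.5's combined rate is than each GLM-5.2 option.
ratios = {
    name: baseline / combined_rate(rates)
    for name, rates in RATES.items()
    if name != "GPT-5.5"
}

for name, ratio in ratios.items():
    print(f"GPT-5.5's combined rate is about {ratio:.1f}x that of {name}")
```

At Z.ai's list price the ratio comes out at about 6.0x, and at the tracker median it is about 14.6x, which is the "gap can widen further" effect in numbers.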
Path 2: the GLM Coding Plan (flat monthly). Z.ai also sells subscription tiers (Lite, Pro, and Max, roughly $3 to $80 per month depending on tier) that bundle GLM-5.2 access for agentic coding tools.
For steady daily coding, the flat plan is usually the cheaper and more predictable path than metering the API. The API math below is what matters for product builders and high-volume automation, where you are paying per call at scale.
Abstract per-token rates do not tell you what a workday costs. So model a concrete unit: one agentic coding task - the agent reads context, plans, edits a few files, and runs checks.
Assume a representative task consumes 40,000 input tokens (repo context, files, tool results) and 8,000 output tokens (the plan plus diffs). That input-heavy ratio is typical for agentic coding, where the model reads far more than it writes.
Cost of one task:
| Model | Input cost | Output cost | Total per task |
|---|---|---|---|
| GLM-5.2 (Z.ai API) | 40K x $1.40/1M = $0.056 | 8K x $4.40/1M = $0.035 | ~$0.091 |
| GPT-5.5 | 40K x $5.00/1M = $0.200 | 8K x $30.00/1M = $0.240 | ~$0.440 |
| Claude Sonnet 4.6 ($3/$15) | 40K x $3.00/1M = $0.120 | 8K x $15.00/1M = $0.120 | ~$0.240 |
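The per-task arithmetic in the table can be reproduced with a few lines of Python, which makes it easy to substitute your own token counts. This is a minimal sketch: the prices are the list rates quoted above, and the function and variable names are illustrative.

```python
# USD per 1M tokens as quoted in the tables above: (input, output).
PRICES = {
    "GLM-5.2 (Z.ai API)": (1.40, 4.40),
    "GPT-5.5": (5.00, 30.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
}


def cost_per_task(input_tokens, output_tokens, input_price, output_price):
    """Return the USD cost of one task given per-1M-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# The representative task from the text: 40K input tokens, 8K output tokens.
task_costs = {
    model: cost_per_task(40_000, 8_000, input_price, output_price)
    for model, (input_price, output_price) in PRICES.items()
}

for model, cost in task_costs.items():
    print(f"{model}: ${cost:.3f} per task")
```

Swapping in token counts measured from your own agent logs is the quickest way to see whether the 40K/8K assumption is close to your workload.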
Now scale to 1,000 tasks (a busy week of agent runs, or a single large multi-agent batch job):
| Model | Cost per 1,000 tasks |
|---|---|
| GLM-5.2 | ~$91 |
| Claude Sonnet 4.6 | ~$240 |
| GPT-5.5 | ~$440 |
At this volume GLM-5.2 runs roughly 2.6x cheaper than Sonnet 4.6 and 4.8x cheaper than GPT-5.5 for the same unit of work. The exact multiple shifts with your input/output ratio - output-heavy workloads (long generations, verbose explanations) widen GLM-5.2's lead further, because its output rate is where the discount is largest.
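To see how the multiple moves with the input/output mix, the sketch below compares the article's input-heavy task with a hypothetical output-heavy one (8K input, 40K output). That second mix is an illustrative assumption, not a measured workload, and the helper names are made up for this example.

```python
# USD per 1M tokens as quoted in the tables above: (input, output).
PRICES = {
    "GLM-5.2": (1.40, 4.40),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-5.5": (5.00, 30.00),
}


def task_cost(model, input_tokens, output_tokens):
    """USD cost of one task on `model` at the quoted list prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def multiple_vs_glm(model, input_tokens, output_tokens):
    """How many times more a task costs on `model` than on GLM-5.2."""
    return task_cost(model, input_tokens, output_tokens) / task_cost(
        "GLM-5.2", input_tokens, output_tokens
    )


# The article's input-heavy mix, then a hypothetical output-heavy mix.
MIXES = {"input-heavy": (40_000, 8_000), "output-heavy": (8_000, 40_000)}

for label, (input_tokens, output_tokens) in MIXES.items():
    for model in ("Claude Sonnet 4.6", "GPT-5.5"):
        multiple = multiple_vs_glm(model, input_tokens, output_tokens)
        print(f"{label}: {model} costs {multiple:.1f}x GLM-5.2")
```

With the output-heavy mix the multiples rise to roughly 3.3x against Sonnet 4.6 and 6.6x against GPT-5.5.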
The catch the table hides: this assumes the cheaper model lands the task in one pass. If GLM-5.2 needs two attempts where the pricier model needs one, its effective cost per task doubles to about $0.18 and the advantage halves: roughly 2.4x cheaper than GPT-5.5 instead of 4.8x, and only about 1.3x cheaper than Sonnet 4.6 instead of 2.6x. Per-token price only becomes per-task savings when quality holds, which is exactly why the SWE-bench Pro number matters: it is evidence the quality is competitive, not just the price.
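The retry caveat can be made concrete by folding attempts into the comparison. The sketch below uses the per-task costs from the table and assumes every attempt costs the same as the first, which is a simplification because real retries often carry extra context. All names are illustrative.

```python
# USD per attempt for the 40K-input / 8K-output task, from the per-task table.
GLM_COST = 0.0912
SONNET_COST = 0.240
GPT_COST = 0.440


def effective_multiple(cheap_cost, pricey_cost, cheap_attempts=1, pricey_attempts=1):
    """Cost multiple of the pricier model once attempts per task are counted."""
    return (pricey_cost * pricey_attempts) / (cheap_cost * cheap_attempts)


def break_even_attempts(cheap_cost, pricey_cost):
    """Average attempts at which the cheap model costs as much as one pricey pass."""
    return pricey_cost / cheap_cost


for name, pricey_cost in (("Claude Sonnet 4.6", SONNET_COST), ("GPT-5.5", GPT_COST)):
    one_pass = effective_multiple(GLM_COST, pricey_cost)
    two_pass = effective_multiple(GLM_COST, pricey_cost, cheap_attempts=2)
    limit = break_even_attempts(GLM_COST, pricey_cost)
    print(f"{name}: {one_pass:.1f}x at one GLM attempt, "
          f"{two_pass:.1f}x at two, break-even near {limit:.1f} attempts")
```

Under these assumptions GLM-5.2 stays cheaper than GPT-5.5 until it averages almost five attempts per task, but the margin over Sonnet 4.6 disappears at fewer than three.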
Price is one input. Here is the practical routing logic.
Reach for GLM-5.2 when:
Reach for a frontier model (GPT-5.5, Claude Opus/Sonnet) when:
The pragmatic default for most teams is a routed setup: send the bulk of well-scoped, high-volume tasks to the cheapest model that clears your quality bar (often GLM-5.2 or another open-weights model), and escalate the hard, expensive-to-get-wrong tasks to a frontier model. That is exactly the pattern a meta-harness exists to enforce - our writeup on Omnigent and orchestrating Claude Code, Codex, and custom agents covers how to keep that routing logic above any single tool. For the full cross-provider rate card behind these numbers, see the June 2026 AI coding tools pricing reality check.
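The routed setup described above can be sketched as a small policy function. Everything here is hypothetical: the `Task` fields, the two flags, and the model identifiers are placeholders for illustration, not real API model IDs or any tool's actual configuration.

```python
from dataclasses import dataclass

# Placeholder identifiers, not real API model IDs.
CHEAP_MODEL = "glm-5.2"
FRONTIER_MODEL = "frontier-model"


@dataclass
class Task:
    description: str
    well_scoped: bool   # hypothetical flag: clear spec, bounded change
    high_stakes: bool   # hypothetical flag: expensive to get wrong


def route(task: Task) -> str:
    """Send well-scoped, low-stakes work to the cheap model; escalate the rest."""
    if task.high_stakes or not task.well_scoped:
        return FRONTIER_MODEL
    return CHEAP_MODEL


tasks = [
    Task("rename a helper and update call sites", well_scoped=True, high_stakes=False),
    Task("redesign the billing reconciliation job", well_scoped=False, high_stakes=True),
]

for task in tasks:
    print(f"{route(task)}: {task.description}")
```

In practice the two flags would come from whatever signal your team trusts, such as ticket labels or the files a change touches. The point of the sketch is only that the rule lives in one place, above any single tool.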
On the Z.ai API list price, yes - roughly $1.40/$4.40 per million input/output tokens versus GPT-5.5's roughly $5.00/$30.00, about one-sixth the combined per-token cost. The open-weights nature also lets other providers host it competitively, with independent trackers showing even lower medians. The savings only materialize per task if GLM-5.2 completes work in as few attempts as the pricier model.
For a representative agentic task (~40K input, ~8K output tokens) on the Z.ai API, about $0.09 per task, versus roughly $0.24 on Claude Sonnet 4.6 and $0.44 on GPT-5.5. At 1,000 tasks that is about $91 vs $240 vs $440. Your actual ratio of input to output tokens will shift these numbers.
For steady daily coding, the flat GLM Coding Plan (Lite, Pro, or Max tiers, roughly $3 to $80 per month depending on tier) is usually cheaper and more predictable. The per-token API math matters most for product builders and high-volume automation paying per call at scale.
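A rough break-even shows where the flat plan starts to win. The sketch below divides a monthly plan price by the metered cost of the representative task (~$0.091 on the Z.ai API). It assumes the plan's usage allowance actually covers that many tasks, which the figures quoted here do not confirm, and it uses only the two ends of the quoted $3 to $80 range.

```python
# USD per task on the Z.ai API for the 40K-input / 8K-output example.
API_COST_PER_TASK = 0.0912


def break_even_tasks(plan_price_per_month, api_cost_per_task=API_COST_PER_TASK):
    """Tasks per month above which a flat plan costs less than metered API use."""
    return plan_price_per_month / api_cost_per_task


# The two ends of the quoted price range, in USD per month.
for plan_price in (3, 80):
    tasks = break_even_tasks(plan_price)
    print(f"${plan_price}/month plan breaks even at about {tasks:.0f} tasks per month")
```

On these numbers the cheapest tier pays for itself after about 33 representative tasks a month and the most expensive after about 877, provided the plan's limits allow that volume.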
Yes. The weights are open and published on Hugging Face, so you can run it on your own hardware and pay no per-token cost at all. Self-hosting a 753B mixture-of-experts model requires substantial GPU capacity, so the economics only favour it at sustained high volume.
The main non-quality risk flagged in independent coverage is data governance: the hosted Z.ai API is operated by a China-based company, which some teams cannot route source code to under their own policies. Self-hosting the open weights avoids the API routing concern. As with any benchmark leader, the quality lead over frontier models is narrow and can change with the next release.