
TL;DR
GLM-5.2 ships under an MIT license, so it is hosted everywhere - and a few places run it for free right now. Here is every way to access Z.ai's open-weights coding model, from free tiers in Devin and Hugging Face to the cheapest per-token routes on OpenRouter, Fireworks, and DeepInfra, plus local Ollama.
Direct answer
GLM-5.2 ships under an MIT license, so it is hosted everywhere - and a few places run it for free right now. Here is every way to access Z.ai's open-weights coding model, from free tiers in Devin and Hugging Face to the cheapest per-token routes on OpenRouter, Fireworks, and DeepInfra, plus local Ollama.
Best for
Developers comparing real tool tradeoffs before choosing a stack.
Covers
Verdict, tradeoffs, pricing signals, workflow fit, and related alternatives.
| Source | What it covers |
|---|---|
| Z.ai: GLM-5.2 research blog | Official release, architecture, benchmarks |
| Z.ai subscribe + model API pages | GLM Coding Plan tiers and per-token API pricing |
| Hugging Face: zai-org/GLM-5.2 | Open weights, MIT license, framework support |
| OpenRouter: z-ai/glm-5.2 | Live multi-provider routing table and prices |
| Artificial Analysis: GLM-5.2 providers | Independent blended price and throughput across hosts |
Because GLM-5.2 is released under a permissive MIT license, the usual question for a frontier model - "where can I even get access?" - has the opposite answer here. It is hosted almost everywhere within days of release, several providers undercut Z.ai's own list price, and a couple of routes will run it for free right now. The hard part is not finding access. It is picking the route that fits your workload without overpaying or accidentally self-hosting a 756-billion-parameter model you did not need to.
This post maps every path: the genuinely free ones, the cheapest paid ones, and local. Prices are per million tokens and were verified on June 20, 2026. Pricing pages are the only source of truth and they move, so treat the numbers as a snapshot, not a contract.
Last verified: June 20, 2026.
GLM-5.2 is Z.ai's (formerly Zhipu AI) open-weights coding model, released in mid-June 2026. It is a mixture-of-experts model with roughly 756B total parameters, a 1M-token context window (1,048,576 tokens), and an MIT license that places no regional or commercial restrictions on the weights. On Z.ai's own benchmarks it is the strongest open-source coding model available, trailing Anthropic's Opus 4.8 and edging out GPT-5.5 on the FrontierSWE suite. For the full benchmark and cost breakdown, see the GLM-5.2 cost math and developer guide. This page is only about access.
Three paths will run GLM-5.2 at no per-token cost today. None is unlimited, and the genuinely free ones are time-limited, so read the terms before you wire a production agent to them.

If you want free and you want it to last, the honest answer is local: download the weights and run them yourself. That is the Ollama and self-host section below.
Get the weekly deep dive
Tutorials on Claude Code, AI agents, and dev tools - delivered free every week.
From the archive
Jun 20, 2026 • 6 min read
Jun 19, 2026 • 8 min read
Jun 19, 2026 • 5 min read
Jun 19, 2026 • 8 min read
For most teams the right answer is a hosted API at a few dollars per million tokens. GLM-5.2 is open weights, so any inference shop can serve it and compete on price, which is exactly why the per-token cost sits well below comparable closed models. Here is the live picture.
| Provider | Input ($/1M) | Output ($/1M) | Cached input | Context | Notes |
|---|---|---|---|---|---|
| OpenRouter (cheapest route) | 1.20 | 4.10 | 0.20 | 1.05M | Auto-routes across 13+ hosts |
| DeepInfra | 1.20 | 4.20 | 0.20 | 1.05M | fp4, among cheapest blended |
| Z.ai (first-party API) | 1.40 | 4.40 | 0.26 | 1.05M | fp8, the reference price |
| Fireworks AI | 1.40 | 4.40 | 0.26 | 1.04M | Day-zero, dedicated GPU option |
| Novita | 1.40 | 4.40 | 0.26 | 1.05M | fp8 |
A few things worth knowing before you pick a row:
z-ai/glm-5.2, and OpenRouter sends your request to the cheapest or fastest one that meets your constraints. You get failover and price competition without managing keys for each host. There is no :free variant for GLM-5.2 on OpenRouter, despite some other models having one.For the worked cost-per-task math versus closed models, the GLM-5.2 cost math post runs the numbers.
If you go first-party, Z.ai sells two things, and they suit different usage shapes.
| Tier | Monthly | Yearly (per mo) | Rough quota |
|---|---|---|---|
| Lite | $18 | $12.60 | ~80 prompts / 5 hrs |
| Pro | $72 | $50.40 | ~400 prompts / 5 hrs |
| Max | $160 | $112 | ~1,600 prompts / 5 hrs |
The subscription wins when you code with it daily and would otherwise burn through far more than the flat fee in per-token charges. The API wins for spiky or automated workloads. Quotas are from Z.ai's devpack FAQ and are stated per a rolling window, so check the current terms.
Because the weights are MIT-licensed and on Hugging Face, you can run GLM-5.2 with no per-token cost at all - if you have the hardware.

glm-5.2. The currently surfaced tag is glm-5.2:cloud, which runs on Ollama Cloud GPUs rather than your machine. Community GGUF builds (Unsloth, llama.cpp) exist for true local runs, but at roughly 756B total parameters this is a datacenter-class model: even a 4-bit quant targets high-RAM multi-GPU rigs, not a laptop. If your goal is genuinely local coding on modest hardware, a smaller dense model is the better tool - see the best local coding LLMs.vllm serve "zai-org/GLM-5.2" works out of the box. This only pays off above a high, steady token volume where amortized GPU cost beats per-token API pricing. Below that line, a hosted route is cheaper and far less operational work.GLM-5.2 is free to use in a few specific places right now: bundled into Devin's paid Pro plan at no extra model cost, through Z.ai's ZCODE CLI free token quota, and via a limited Hugging Face Inference Providers window. The weights themselves are free under an MIT license, so self-hosting has no per-token cost. There is no permanent, unlimited free hosted API.
On hosted APIs, the cheapest routes are OpenRouter's auto-router and DeepInfra, around $1.20 input and $4.10 to $4.20 output per million tokens (fp4 quantized). On a blended 3:1 mix, independent trackers put the cheapest hosts near $0.72 to $0.80 per million tokens. Self-hosting is cheapest only at high, steady volume.
Yes. The Z.ai GLM Coding Plan is built for exactly this and supports Claude Code, Cursor, Cline, and 20-plus tools. Model-agnostic open-source agents like OpenCode are an especially good fit, since they let you point at any OpenAI-compatible endpoint (Z.ai API, Fireworks, OpenRouter) and swap models freely without leaving the tool.
Not practically. At roughly 756B total parameters, even a 4-bit quant needs high-RAM multi-GPU hardware. Ollama's glm-5.2:cloud tag runs on hosted GPUs, not your machine. For local coding on consumer hardware, a smaller dense model is the right choice.
MIT. Some early posts claimed Apache 2.0, but Z.ai's blog, the Hugging Face model card, and provider pages all confirm MIT - permissive, with no regional or commercial restrictions.
Read next
No single model wins every task anymore, and the companies that never trained one - Factory, Devin, Perplexity, Cursor, OpenCode - are turning that into a moat. This is how model routing works, why open weights and neoclouds make it cheap, and the honest counter-argument.
11 min readZ.ai shipped GLM-5.2 in mid-June with a usable 1M-token context window, two thinking-effort levels, and MIT open weights now released. Here is the setup guide for Claude Code, pricing breakdown, and what to test before the benchmarks arrive.
8 min readZ.ai's GLM-5.2 lands as a 753B open-weights coding model that beats GPT-5.5 on SWE-bench Pro for roughly one-sixth the per-token cost. Here is the real cost math, a worked cost-per-task example, and a when-to-use-which decision guide.
9 min readTechnical content at the intersection of AI and development. Building with AI agents, Claude Code, and modern dev tools - then showing you exactly how it works.
Deep comparison of the top AI agent frameworks - LangGraph, CrewAI, Mastra, CopilotKit, AutoGen, and Claude Code.
AI AgentsPersistent project instructions loaded every session; supports nested dirs.
Claude Code
No single model wins every task anymore, and the companies that never trained one - Factory, Devin, Perplexity, Cursor,...

Z.ai shipped GLM-5.2 in mid-June with a usable 1M-token context window, two thinking-effort levels, and MIT open weights...

Z.ai's GLM-5.2 lands as a 753B open-weights coding model that beats GPT-5.5 on SWE-bench Pro for roughly one-sixth the p...

Choosing a local coding LLM in 2026 means balancing benchmark performance, hardware cost, and the compliance pressure to...

A data-rich, source-cited comparison of the three open-weights coding models that matter in 2026: GLM-5.2, DeepSeek V4,...

JetBrains released Mellum2 on June 2, 2026 - a 12B MoE model with only 2.5B active parameters per token. Here is how to...

New tutorials, open-source projects, and deep dives on coding agents - delivered weekly.