Where to Run GLM-5.2 Free and Cheap: Every Provider Compared (2026)

Developers Digest•June 20, 2026•9 min read

glm z-ai open-weights ai-coding-tools pricing

TL;DR

GLM-5.2 ships under an MIT license, so it is hosted everywhere - and a few places run it for free right now. Here is every way to access Z.ai's open-weights coding model, from free tiers in Devin and Hugging Face to the cheapest per-token routes on OpenRouter, Fireworks, and DeepInfra, plus local Ollama.

Direct answer

Where to Run GLM-5.2 Free and Cheap: Every Provider Compared (2026)

GLM-5.2 ships under an MIT license, so it is hosted everywhere - and a few places run it for free right now. Here is every way to access Z.ai's open-weights coding model, from free tiers in Devin and Hugging Face to the cheapest per-token routes on OpenRouter, Fireworks, and DeepInfra, plus local Ollama.

Best for

Developers comparing real tool tradeoffs before choosing a stack.

Covers

Verdict, tradeoffs, pricing signals, workflow fit, and related alternatives.

Official Sources

Source	What it covers
Z.ai: GLM-5.2 research blog	Official release, architecture, benchmarks
Z.ai subscribe + model API pages	GLM Coding Plan tiers and per-token API pricing
Hugging Face: zai-org/GLM-5.2	Open weights, MIT license, framework support
OpenRouter: z-ai/glm-5.2	Live multi-provider routing table and prices
Artificial Analysis: GLM-5.2 providers	Independent blended price and throughput across hosts

Because GLM-5.2 is released under a permissive MIT license, the usual question for a frontier model - "where can I even get access?" - has the opposite answer here. It is hosted almost everywhere within days of release, several providers undercut Z.ai's own list price, and a couple of routes will run it for free right now. The hard part is not finding access. It is picking the route that fits your workload without overpaying or accidentally self-hosting a 756-billion-parameter model you did not need to.

This post maps every path: the genuinely free ones, the cheapest paid ones, and local. Prices are per million tokens and were verified on June 20, 2026. Pricing pages are the only source of truth and they move, so treat the numbers as a snapshot, not a contract.

Last verified: June 20, 2026.

What GLM-5.2 is, in one paragraph

GLM-5.2 is Z.ai's (formerly Zhipu AI) open-weights coding model, released in mid-June 2026. It is a mixture-of-experts model with roughly 756B total parameters, a 1M-token context window (1,048,576 tokens), and an MIT license that places no regional or commercial restrictions on the weights. On Z.ai's own benchmarks it is the strongest open-source coding model available, trailing Anthropic's Opus 4.8 and edging out GPT-5.5 on the FrontierSWE suite. For the full benchmark and cost breakdown, see the GLM-5.2 cost math and developer guide. This page is only about access.

The free routes (right now)

Three paths will run GLM-5.2 at no per-token cost today. None is unlimited, and the genuinely free ones are time-limited, so read the terms before you wire a production agent to them.

Devin (Cognition), inside the Pro plan. Cognition's Devin pricing lists "free use of leading open source models" and "full model availability" on the Pro plan, with GLM-5.2 among them. The important caveat: this is bundled into a paid plan, not a standalone giveaway. The model itself adds no marginal cost once you are on Pro, which is what people mean when they say GLM-5.2 is "free in Devin." The free tier carries limited model availability and does not include it.
Z.ai ZCODE CLI free quota. Z.ai has been seeding its own coding CLI with a large free token allowance (community reports put it near 300M tokens) to pull developers onto GLM-5.2. Quotas and eligibility change, so confirm on z.ai before relying on it.
Hugging Face Inference Providers, launch window. Hugging Face opened a limited free window for GLM-5.2 through its Inference Providers routing shortly after release. Free windows close, so check the model page for current status.

If you want free and you want it to last, the honest answer is local: download the weights and run them yourself. That is the Ollama and self-host section below.

Get the weekly deep dive

Tutorials on Claude Code, AI agents, and dev tools - delivered free every week.

From the archive

GPT-5.5 Has a 3x Higher Hallucination Rate Than MIT-Licensed GLM-5.2

Jun 20, 2026 • 6 min read

DuckDB Internals: What Makes It So Fast

Jun 19, 2026 • 8 min read

Three Ways to Ignore Files in Git (Beyond .gitignore)

Jun 19, 2026 • 5 min read

GitHub Copilot Agent Finder: What ARD Means for Third-Party AI Tools in 2026

Jun 19, 2026 • 8 min read

The cheapest paid routes

For most teams the right answer is a hosted API at a few dollars per million tokens. GLM-5.2 is open weights, so any inference shop can serve it and compete on price, which is exactly why the per-token cost sits well below comparable closed models. Here is the live picture.

Provider	Input ($/1M)	Output ($/1M)	Cached input	Context	Notes
OpenRouter (cheapest route)	1.20	4.10	0.20	1.05M	Auto-routes across 13+ hosts
DeepInfra	1.20	4.20	0.20	1.05M	fp4, among cheapest blended
Z.ai (first-party API)	1.40	4.40	0.26	1.05M	fp8, the reference price
Fireworks AI	1.40	4.40	0.26	1.04M	Day-zero, dedicated GPU option
Novita	1.40	4.40	0.26	1.05M	fp8

A few things worth knowing before you pick a row:

OpenRouter is a router, not a host. Its endpoints API shows 13-plus providers serving z-ai/glm-5.2, and OpenRouter sends your request to the cheapest or fastest one that meets your constraints. You get failover and price competition without managing keys for each host. There is no :free variant for GLM-5.2 on OpenRouter, despite some other models having one.
Quantization matters. The cheapest routes (DeepInfra, Wafer) serve fp4 quantized weights; Z.ai, Fireworks, and Novita serve fp8. For coding agents the quality gap is usually small but real, and it is the reason the price differs. Test your own task before optimizing purely on price.
Blended price is lower than the table suggests. On a typical 3:1 input-output mix, Artificial Analysis puts the cheapest blended hosts (GMI, Wafer, DeepInfra) near $0.72 to $0.80 per million tokens, with Wafer the fastest at over 200 tokens/sec.

For the worked cost-per-task math versus closed models, the GLM-5.2 cost math post runs the numbers.

Direct from Z.ai: API vs the Coding Plan

If you go first-party, Z.ai sells two things, and they suit different usage shapes.

Per-token API at roughly $1.40 input and $4.40 output per million tokens, with cached input near $0.26. Right for variable or bursty usage where you pay for exactly what you run.
GLM Coding Plan flat-rate subscriptions, which bundle GLM-5.2 access into agentic coding tools (Claude Code, Cursor, Cline, and 20-plus others). As verified on z.ai/subscribe on June 20, 2026:

Tier	Monthly	Yearly (per mo)	Rough quota
Lite	$18	$12.60	~80 prompts / 5 hrs
Pro	$72	$50.40	~400 prompts / 5 hrs
Max	$160	$112	~1,600 prompts / 5 hrs

The subscription wins when you code with it daily and would otherwise burn through far more than the flat fee in per-token charges. The API wins for spiky or automated workloads. Quotas are from Z.ai's devpack FAQ and are stated per a rolling window, so check the current terms.

Local and self-host

Because the weights are MIT-licensed and on Hugging Face, you can run GLM-5.2 with no per-token cost at all - if you have the hardware.

Ollama. GLM-5.2 is published as glm-5.2. The currently surfaced tag is glm-5.2:cloud, which runs on Ollama Cloud GPUs rather than your machine. Community GGUF builds (Unsloth, llama.cpp) exist for true local runs, but at roughly 756B total parameters this is a datacenter-class model: even a 4-bit quant targets high-RAM multi-GPU rigs, not a laptop. If your goal is genuinely local coding on modest hardware, a smaller dense model is the better tool - see the best local coding LLMs.
Self-host at scale. The Hugging Face model card lists first-class support for vLLM (0.23.0+), SGLang, Transformers, KTransformers, and llama.cpp, plus Ascend NPU paths. vllm serve "zai-org/GLM-5.2" works out of the box. This only pays off above a high, steady token volume where amortized GPU cost beats per-token API pricing. Below that line, a hosted route is cheaper and far less operational work.

Which route should you pick?

Just trying it: Devin Pro if you already pay for it, or the Z.ai ZCODE free quota, or OpenRouter with a few dollars of credit.
Cheapest production tokens: OpenRouter's auto-router or DeepInfra, accepting fp4 quantization. Validate on your own task first.
Daily agentic coding in a tool you live in: the GLM Coding Plan (Lite or Pro), so cost is predictable.
Highest control or compliance needs: self-host on vLLM. Only worth it at steady high volume.
Truly offline or air-gapped: local weights, but budget for serious GPU memory or step down to a smaller model.

FAQ

Is GLM-5.2 free?

GLM-5.2 is free to use in a few specific places right now: bundled into Devin's paid Pro plan at no extra model cost, through Z.ai's ZCODE CLI free token quota, and via a limited Hugging Face Inference Providers window. The weights themselves are free under an MIT license, so self-hosting has no per-token cost. There is no permanent, unlimited free hosted API.

What is the cheapest way to use GLM-5.2?

On hosted APIs, the cheapest routes are OpenRouter's auto-router and DeepInfra, around $1.20 input and $4.10 to $4.20 output per million tokens (fp4 quantized). On a blended 3:1 mix, independent trackers put the cheapest hosts near $0.72 to $0.80 per million tokens. Self-hosting is cheapest only at high, steady volume.

Can I run GLM-5.2 with Claude Code, Cursor, or OpenCode?

Yes. The Z.ai GLM Coding Plan is built for exactly this and supports Claude Code, Cursor, Cline, and 20-plus tools. Model-agnostic open-source agents like OpenCode are an especially good fit, since they let you point at any OpenAI-compatible endpoint (Z.ai API, Fireworks, OpenRouter) and swap models freely without leaving the tool.

Can I run GLM-5.2 locally on my laptop?

Not practically. At roughly 756B total parameters, even a 4-bit quant needs high-RAM multi-GPU hardware. Ollama's glm-5.2:cloud tag runs on hosted GPUs, not your machine. For local coding on consumer hardware, a smaller dense model is the right choice.

What license is GLM-5.2 under?

MIT. Some early posts claimed Apache 2.0, but Z.ai's blog, the Hugging Face model card, and provider pages all confirm MIT - permissive, with no regional or commercial restrictions.

Sources

The Router Era: Why Not Owning a Frontier Model Became an Advantage

No single model wins every task anymore, and the companies that never trained one - Factory, Devin, Perplexity, Cursor, OpenCode - are turning that into a moat. This is how model routing works, why open weights and neoclouds make it cheap, and the honest counter-argument.

11 min read

GLM-5.2 Developer Guide: Z.ai's 1M-Context Coding Model

Z.ai shipped GLM-5.2 in mid-June with a usable 1M-token context window, two thinking-effort levels, and MIT open weights now released. Here is the setup guide for Claude Code, pricing breakdown, and what to test before the benchmarks arrive.

8 min read

GLM-5.2 Cost Math: When Open-Weights Coding Models Actually Save You Money

Z.ai's GLM-5.2 lands as a 753B open-weights coding model that beats GPT-5.5 on SWE-bench Pro for roughly one-sixth the per-token cost. Here is the real cost math, a worked cost-per-task example, and a when-to-use-which decision guide.

9 min read

Share

Suggest an editSave

Discuss this article on Twitter/X

Developers Digest

Technical content at the intersection of AI and development. Building with AI agents, Claude Code, and modern dev tools - then showing you exactly how it works.

300+ videos30K+ GitHub stars50+ articles

Subscribe YouTube GitHub Twitter/X

Related Tools

Infrastructure

Groq

LPU-powered inference delivering 500-1,000+ tokens/sec. Purpose-built chip with on-chip SRAM instead of HBM. 5-10x faste...

View Tool

Related Guides

Guide

AI Agent Frameworks Compared: LangGraph vs CrewAI vs Mastra vs CopilotKit

Deep comparison of the top AI agent frameworks - LangGraph, CrewAI, Mastra, CopilotKit, AutoGen, and Claude Code.

AI Agents

Guide

CLAUDE.md Files - Claude Code

Persistent project instructions loaded every session; supports nested dirs.

Claude Code

Where to Run GLM-5.2 Free and Cheap: Every Provider Compared (2026)

Official Sources

What GLM-5.2 is, in one paragraph

The free routes (right now)