
TL;DR
Z.ai shipped GLM-5.2 on June 13 with a usable 1M-token context window, two thinking-effort levels, and MIT open weights coming soon. Here is the setup guide for Claude Code, pricing breakdown, and what to test before the benchmarks arrive.
| Resource | Link |
|---|---|
| Z.AI Developer Documentation | docs.z.ai/devpack/overview |
| Claude Code Setup | docs.z.ai/devpack/tool/claude |
| GLM-5 Overview | z.ai/blog/glm-5 |
| OpenRouter GLM-5 | openrouter.ai/z-ai/glm-5 |
| Z.AI Twitter | @Zai_org |
Z.ai released GLM-5.2 on June 13, 2026, making it available immediately to every GLM Coding Plan subscriber. This is the company's new flagship coding model - and the headline feature is a 1,000,000-token context window that actually works for large codebase navigation.
The model ships with two thinking-effort levels (High and Max), 131,072 output tokens per response, and MIT-licensed open weights arriving within the week. No benchmarks have been published yet - Z.ai shipped first, benchmarks later. Here is what developers need to know to start testing it.
Last updated: June 15, 2026
GLM-5.2 is a step function jump from GLM-5.1 in context capacity. The usable context window expands from 200,000 tokens to 1,000,000 tokens - roughly five times larger. For coding work, this means you can load entire monorepo directories without hitting the context ceiling that forces aggressive summarization.
The output limit also increased to 131,072 tokens per response, which matters for long refactors, multi-file diffs, and migration scripts that need to output complete files.
Z.ai added two thinking-effort levels:
For coding tasks specifically, Z.ai recommends Max effort. The extra thinking time pays off on tasks that benefit from planning and verification passes - the same pattern Anthropic documented with Fable 5's effort levels.
Z.ai offers the GLM Coding Plan in three tiers, billed quarterly:
| Plan | Price | Quarterly | Notes |
|---|---|---|---|
| Lite | ~$10/month | $30/quarter | Entry point for solo developers |
| Pro | ~$30/month | $90/quarter | Higher quotas, recommended for active use |
| Max | ~$80/month | $240/quarter | Highest limits, team-friendly |
Q2 2026 discounts bring these down slightly ($27, $81, $216 per quarter). Earlier promotional pricing around $3/month no longer exists - Z.ai removed first-purchase discounts in February 2026.
The Coding Plan exposes an Anthropic-compatible endpoint. If you have built agents or workflows against Claude's API, they work with a base-URL and API key swap - no code changes required beyond environment variables.
The GLM Coding Plan maps Claude Code's model tiers to GLM models by default:
To use GLM-5.2 instead of the defaults, you need to override the model environment variables.
Add these to your shell config (.bashrc, .zshrc, or equivalent):
export ANTHROPIC_BASE_URL="https://open.z.ai/api/paas/v4/"
export ANTHROPIC_API_KEY="your-glm-coding-plan-key"
export ANTHROPIC_DEFAULT_SONNET_MODEL="glm-5.2[1m]"
export ANTHROPIC_DEFAULT_OPUS_MODEL="glm-5.2[1m]"
export CLAUDE_CODE_AUTO_COMPACT_WINDOW=1000000
The [1m] suffix enables the 1M context variant. Without it, you get the standard context window. The CLAUDE_CODE_AUTO_COMPACT_WINDOW setting tells Claude Code to use the full 1M context before triggering automatic compaction.
Alternatively, add or update these values in ~/.claude/settings.json:
{
"env": {
"ANTHROPIC_BASE_URL": "https://open.z.ai/api/paas/v4/",
"ANTHROPIC_API_KEY": "your-glm-coding-plan-key",
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.5-air",
"ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-5.2[1m]",
"ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-5.2[1m]",
"CLAUDE_CODE_AUTO_COMPACT_WINDOW": "1000000"
}
}
If Claude Code reports that the model with the [1m] suffix does not exist, upgrade to the latest Claude Code version and try again.
After setup, start a new Claude Code session and ask it to identify which model it is running. The response should confirm GLM-5.2. You can also check by running a task that would exceed the standard 200K context - if it works without compaction warnings, the 1M context is active.
Get the weekly deep dive
Tutorials on Claude Code, AI agents, and dev tools - delivered free every week.
From the archive
Jun 15, 2026 • 8 min read
Jun 14, 2026 • 8 min read
Jun 13, 2026 • 8 min read
Jun 13, 2026 • 6 min read
Z.ai shipped GLM-5.2 without published benchmarks. That is not necessarily a red flag - it matches Z.ai's historical pattern of releasing models quickly and letting the community validate them. But it does mean you should test against your actual workloads before committing.
Large codebase navigation. Load a 500K+ token directory into context and ask the model to explain architecture, find specific patterns, or trace a call path. This is where the 1M context should shine - or fail visibly if the attention degrades at scale.
Multi-file refactors. Ask for a refactor that touches 10+ files. Check whether the output maintains consistency across files and respects your existing patterns. GLM-5.1 was competitive with Sonnet 4.5 on this; GLM-5.2 should be stronger.
Long-horizon agentic tasks. Run a multi-step task with tool calls - file reads, writes, searches. Track whether the model stays on task across steps or drifts. Use Max effort for this.
Comparison with your current model. Run the same task through GLM-5.2 and your current default (Fable 5, Opus 4.8, GPT-5.5, whatever you use). Note completion quality, speed, and whether the context window matters for that task.
Direct GLM-5.2 vs Fable 5 vs GPT-5.5 benchmarks do not exist yet. What we know from GLM-5.1 benchmarks as a baseline:
| Model | SWE-bench Pro | Code Arena Elo |
|---|---|---|
| Claude Fable 5 | 80.3% | Not rated |
| GPT-5.5 | 58.6% | Not rated |
| GLM-5.1 | 58.4% | 1530 |
GLM-5.1 was competitive with GPT-5.5 on SWE-bench Pro. GLM-5.2 should improve on that - the question is by how much.
The context window advantage is real and measurable. Fable 5 offers a 1M+ context window as well, but at $10/$50 per million tokens on the API. GLM Coding Plan pricing is substantially lower for comparable context capacity.
Good fit:
Wait or skip:
Z.ai stated that the MIT-licensed open weights release will follow within a week of the Coding Plan launch. That means self-hosting and local inference options are imminent. For developers who want to run GLM-5.2 on their own infrastructure, the timeline is short.
The standalone API is also rolling out - currently the model is only accessible through the Coding Plan's Anthropic-compatible endpoint. Once the API launches, OpenRouter and other aggregators will likely add it to their routing options.
No benchmark release timeline has been announced. Z.ai's pattern is to let community testing generate the numbers rather than publishing self-reported scores.
GLM-5.2 is Z.ai's newest flagship coding model, released June 13, 2026. It features a 1,000,000-token context window, 131,072 output tokens per response, two thinking-effort levels (High and Max), and MIT-licensed open weights arriving soon.
GLM-5.2 is available through the GLM Coding Plan. Pricing is approximately $10/month (Lite), $30/month (Pro), or $80/month (Max), billed quarterly. The standalone API with per-token pricing is still rolling out.
Yes. The GLM Coding Plan exposes an Anthropic-compatible endpoint. Set the ANTHROPIC_BASE_URL to https://open.z.ai/api/paas/v4/, configure your API key, and override the model environment variables to use glm-5.2[1m]. See the setup section above for full details.
No direct benchmarks exist yet. GLM-5.1 scored 58.4% on SWE-bench Pro compared to Fable 5's 80.3%. GLM-5.2 should improve on 5.1 but the magnitude is unknown. The main advantage is cost: GLM Coding Plan pricing is substantially lower than Fable 5 API rates at $10/$50 per million tokens.
Z.ai stated the MIT-licensed open weights will release within a week of the June 13 Coding Plan launch. That puts the expected release around June 20, 2026.
Yes. Because the GLM Coding Plan uses an Anthropic-compatible endpoint, your existing MCP server configurations, skills, and hooks work without modification. You only need to change the base URL and API key.
High effort returns faster responses suitable for routine coding tasks. Max effort runs deeper reasoning passes before returning an answer, recommended for complex refactors and multi-step agentic work. Z.ai recommends Max for coding tasks where accuracy matters more than speed.
Not yet. The standalone API is still rolling out. Currently, access requires a GLM Coding Plan subscription at the Lite, Pro, or Max tier.
Read next
Every major AI coding tool just went through a pricing shift. Here are the exact numbers for Cursor, GitHub Copilot, Claude Code, Devin, and the Anthropic API - verified from live pricing pages on June 15, 2026. Only 7 days until the Fable 5 deadline.
9 min readChoosing a local coding LLM in 2026 means balancing benchmark performance, hardware cost, and the compliance pressure to keep code off third-party servers. Here is what to run and on what hardware.
8 min readOpenCode is the fastest-growing open-source AI coding agent - 160K GitHub stars, 7.5M monthly users, 75+ model providers. Here is how to set it up, configure models, and use it effectively in your workflow.
11 min readTechnical content at the intersection of AI and development. Building with AI agents, Claude Code, and modern dev tools - then showing you exactly how it works.
Open-source terminal agent runtime with approval modes, rollback snapshots, MCP servers, LSP diagnostics, and a headless...
View ToolOpen-source AI pair programming in your terminal. Works with any LLM - Claude, GPT, Gemini, local models. Git-aware ed...
View ToolOpen-source AI code assistant for VS Code and JetBrains. Bring your own model - local or API. Tab autocomplete, chat,...
View ToolOpen-source reasoning models from China. DeepSeek-R1 rivals o1 on math and code benchmarks. V3 for general use. Fully op...
View ToolEvery coding agent in one window. Stop alt-tabbing between Claude, Codex, and Cursor.
View AppScore every coding agent on your own tasks. Catch regressions in CI.
View AppRoute prompts to the right model based on cost, latency, and priority rules.
View AppInstall Ollama and LM Studio, pull your first model, and run AI locally for coding, chat, and automation - with zero cloud dependency.
Getting StartedClickable PR link in the footer with review state color coding.
Claude CodeUse opus, sonnet, haiku, and best to switch models easily.
Claude CodeEvery major AI coding tool just went through a pricing shift. Here are the exact numbers for Cursor, GitHub Copilot, Cla...
Choosing a local coding LLM in 2026 means balancing benchmark performance, hardware cost, and the compliance pressure to...

OpenCode is the fastest-growing open-source AI coding agent - 160K GitHub stars, 7.5M monthly users, 75+ model providers...
Anthropic shipped Fable 5 and a June 22 subscription cliff. OpenAI shipped GPT-5.5 inside Codex plus automations, browse...
Windsurf is now Devin Desktop, owned by Cognition after a turbulent 2025 acquisition saga. If the ownership shuffle has...
Uber burned through its entire 2026 AI tools budget by April. Microsoft is canceling Claude Code licenses company-wide....

New tutorials, open-source projects, and deep dives on coding agents - delivered weekly.