
TL;DR
A data-rich, source-cited comparison of the three open-weights coding models that matter in 2026: GLM-5.2, DeepSeek V4, and Qwen3. Benchmark table, per-token pricing, context windows, self-host footprint, and a clear pick-X-if decision matrix.
Best for
Developers comparing real tool tradeoffs before choosing a stack.
Covers
Verdict, tradeoffs, pricing signals, workflow fit, and related alternatives.
Last updated: June 17, 2026
The interesting fight in coding models is no longer open versus closed. It is open versus open. Three labs now ship weights you can download, self-host, and route to per-token at a fraction of frontier pricing, and all three post real software-engineering benchmark numbers: Z.ai's GLM-5.2, DeepSeek V4, and Alibaba's Qwen3 line. If your question is "which open-weights model should I point my coding agent at," this is the comparison built to answer it.
This piece is part of our model-economics beat. For the single-model deep dives, see our GLM-5.2 cost math, our DeepSeek V4 economics breakdown, and the self-hosting break-even math. For the layer that decides when to use which, read model routing recipes and why the orchestration layer is the next big play.
Every figure below is attributed to a primary or named source, with verification dates. Prices and benchmarks move fast on open weights because anyone can host them - verify against the live vendor page before you commit a production budget.
Each of these is a family, not a single model. To keep the comparison fair, we anchor on the strongest open-weights variant from each lab, because the whole point of this category is downloadable weights: GLM-5.2 for Z.ai, DeepSeek V4 Pro for DeepSeek, and Qwen3.6-35B-A3B for Alibaba, whose top coder, Qwen3.7-Max, is API-only.
The Qwen pick is the single most important caveat in this post: in mid-2026, Qwen's very best coding model is closed. The open-weights Qwen you can actually self-host is a much smaller MoE - which, as the numbers show, punches far above its size.
| | GLM-5.2 | DeepSeek V4 Pro | Qwen3.6-35B-A3B |
|---|---|---|---|
| Vendor | Z.ai (Zhipu) | DeepSeek | Alibaba |
| Released | Jun 16, 2026 | Apr 2026 | Apr 16, 2026 |
| Total params | 753B (MoE) | 1.6T (MoE) | 35B (MoE) |
| Active params | 40B | 49B | 3B |
| Context window | 1M | 1M | large (VRAM-bound when self-hosted) |
| Max output | ~131K | 384K | - |
| License | MIT | MIT | Apache 2.0 |
| Self-host class | multi-GPU server | multi-GPU server | single 24GB GPU |
Sources for this table are in the Sources section. The standout structural facts: DeepSeek V4 Pro is by far the largest (1.6T total), GLM-5.2 sits in the middle, and Qwen3.6-35B-A3B is roughly 20 to 45 times smaller in total parameters and activates only 3B per token - which is why it is the only one of the three that runs on a single consumer-class GPU.
Coding benchmarks for open-weights models are reported by a mix of vendors, aggregators, and third-party evaluators using different scaffolds. Treat them as a cluster, not a leaderboard, and never compare a SWE-bench Verified number against a SWE-bench Pro number - they are different tests, and Pro is the harder one.
| Benchmark | GLM-5.2 | DeepSeek V4 Pro | Qwen3.6-35B-A3B |
|---|---|---|---|
| SWE-bench Verified | not separately reported | ~80.6% (V4 Pro-Max) | 73.4 |
| SWE-bench Pro | 62.1 | 55.4 (unverified scaffold) | not reported |
| Terminal-Bench 2.x | 81.0 (v2.1) | 67.9 (v2.0) | not reported |
| AA Intelligence Index | 51 | 44 | not reported |
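A few honest reads of this table: GLM-5.2 leads on the SWE-bench Pro and Terminal-Bench figures it reports, DeepSeek V4 Pro-Max posts the strongest SWE-bench Verified number, and Qwen3.6-35B-A3B's 73.4 is exceptional for a model with 3B active parameters. The Terminal-Bench scores come from different versions (v2.1 versus v2.0) and the DeepSeek SWE-bench Pro figure is from an unverified scaffold, so neither row is a strict head-to-head.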
The practical takeaway: on raw quality, DeepSeek V4 Pro and GLM-5.2 are frontier-substitute class, while Qwen3.6-35B-A3B is a remarkable efficiency play that trades a few points of capability for a dramatically smaller footprint.
Open-weights pricing is a moving target because the original lab is just one of many hosts. Here are the first-party API list prices, plus a provider median for GLM-5.2 and an OpenRouter reference for the Qwen family, verified June 17, 2026.
| Model | Input ($/MTok) | Cached input ($/MTok) | Output ($/MTok) |
|---|---|---|---|
| GLM-5.2 (Z.ai list) | $1.40 | - | $4.40 |
| GLM-5.2 (provider median) | ~$0.55 | - | ~$1.85 |
| DeepSeek V4 Pro | $0.435 | $0.003625 | $0.87 |
| DeepSeek V4 Flash | $0.14 | $0.0028 | $0.28 |
| Qwen3.6 Plus (API) | $0.325 (35% off) | - | $1.95 (35% off) |
GLM-5.2 list pricing is $1.40 / $4.40, with a provider median closer to $0.55 / $1.85 because the open weights let third parties compete on hosting (Artificial Analysis). DeepSeek's live pricing page now lists V4 Pro at $0.435 / $0.87 with a near-free cache hit of $0.003625, and V4 Flash at $0.14 / $0.28 (api-docs.deepseek.com/quick_start/pricing, verified June 17, 2026).
One important correction worth flagging: a lot of April launch coverage - and our own earlier DeepSeek V4 economics post - quoted V4 Pro at $1.74 / $3.48. The live DeepSeek pricing page as of June 17, 2026 lists $0.435 / $0.87, roughly a quarter of that. The live page is the source of truth; if you are modeling spend, use it, not the launch articles.
Qwen3.6-35B-A3B itself is open weights, so its "price" depends entirely on who hosts it or your own hardware. The Qwen3.6 Plus API line above ($0.325 / $1.95 at a 35% promo, per OpenRouter) is included as a managed-API reference point for the Qwen family, not as the price of the 35B open-weights model specifically.
Per-token rates do not tell you what work costs. Model one agentic coding task: roughly 40,000 input tokens (repo context, files, tool results) and 8,000 output tokens (plan plus diffs), no caching.
| Model (input / output, $/MTok) | Cost per task |
|---|---|
| DeepSeek V4 Pro ($0.435 / $0.87) | ~$0.024 |
| GLM-5.2, Z.ai list ($1.40 / $4.40) | ~$0.091 |
| GLM-5.2, provider median (~$0.55 / ~$1.85) | ~$0.037 |
At list prices, DeepSeek V4 Pro is the cheapest of the high-quality options on this input-heavy profile, and its near-zero cache-hit price makes repeated-context agent loops cheaper still. GLM-5.2 closes most of the gap if you shop the open-weights hosting market for the lower provider-median rate. The caveat that survives every pricing table: cheap per token only becomes cheap per task if the model lands the work in as few attempts as a pricier model, which is exactly why the benchmark cluster above matters.
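To make the arithmetic reproducible, here is a minimal Python sketch of the cost-per-task calculation. The prices and the 40,000 / 8,000 token profile come from the tables above; the `cache_hit` parameter and the 80% value used in the loop are illustrative assumptions, not measured figures, and models with no published cached rate are simply billed at the normal input rate.

```python
# Prices in USD per million tokens, copied from the pricing table above.
PRICES = {
    "DeepSeek V4 Pro": {"input": 0.435, "cached": 0.003625, "output": 0.87},
    "GLM-5.2 (Z.ai list)": {"input": 1.40, "cached": None, "output": 4.40},
    "GLM-5.2 (provider median)": {"input": 0.55, "cached": None, "output": 1.85},
}


def cost_per_task(price, input_tokens=40_000, output_tokens=8_000, cache_hit=0.0):
    """Return USD for one task.

    cache_hit is the fraction of input tokens billed at the cached rate
    (an assumption for illustration; real hit rates depend on the workload).
    """
    cached_rate = price["cached"]
    if cached_rate is None:
        # No cached rate in the table: bill cached tokens at the normal input rate.
        cached_rate = price["input"]
    cached_tokens = input_tokens * cache_hit
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * price["input"]
        + cached_tokens * cached_rate
        + output_tokens * price["output"]
    )
    return total / 1_000_000


for name, price in PRICES.items():
    uncached = cost_per_task(price)
    mostly_cached = cost_per_task(price, cache_hit=0.8)  # hypothetical 80% hit rate
    print(f"{name}: ${uncached:.3f} uncached, ${mostly_cached:.3f} at 80% cache hits")
```

Swap in your own token counts and hit rate before drawing conclusions; the ranking can shift on output-heavy profiles.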
All three flagships are built for repo-scale work.
If your workload is "drop a whole repo in and ask for a coordinated change," any of the three handles the input side. DeepSeek's 384K output ceiling is the one to reach for when the model has to write a lot, not just read a lot.
This is the category that separates "open weights in principle" from "open weights you will actually run."
If "no per-token bill, runs on hardware I already own" is your hard requirement, the decision is effectively made for you: Qwen3.6-35B-A3B is the open-weights coder that fits on a single GPU, and it is the only one of the three in that class.
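A rough back-of-envelope sketch shows why the footprint gap is so wide. It assumes that every parameter of a MoE model has to be resident in memory even though only a few are active per token, and that Q4_K_M averages about 4.85 bits per weight; that bit width is an assumption for illustration rather than a figure from the sources, and the estimate ignores KV cache and runtime overhead.

```python
def weight_memory_gb(total_params_billions, bits_per_weight):
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return total_params_billions * bits_per_weight / 8


# Assumed average bit width for Q4_K_M quantization (not from the article's sources).
Q4_K_M_BITS = 4.85

# Total parameter counts (in billions) from the spec table above.
MODELS = [("Qwen3.6-35B-A3B", 35), ("GLM-5.2", 753), ("DeepSeek V4 Pro", 1600)]

for name, params_b in MODELS:
    gb = weight_memory_gb(params_b, Q4_K_M_BITS)
    fits = "fits" if gb < 24 else "does not fit"
    print(f"{name}: ~{gb:.0f} GB of weights, {fits} on a 24GB card")
```

Under these assumptions the Qwen estimate lands close to the roughly 21GB figure quoted later in this article, while the other two come out at hundreds of gigabytes, which is multi-GPU server territory.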
Pick GLM-5.2 if you want the strongest reported SWE-bench Pro and Terminal-Bench numbers among true open-weights models, you are consuming it via API or a managed host, and you value an MIT license with a wide ecosystem of coding-tool integrations. It is the "best raw coding quality you can also download" pick.
Pick DeepSeek V4 Pro if unit cost is your primary axis and you want frontier-substitute quality. At $0.435 / $0.87 with a near-free cache hit and a 384K output ceiling, it is the cheapest high-quality option for input-heavy, cache-friendly agent loops. Route to V4 Flash ($0.14 / $0.28) for the bounded, high-volume inner-loop steps where you do not need Pro-level reasoning.
Pick Qwen3.6-35B-A3B if you must self-host on modest hardware, want zero per-token cost, or care about latency and data residency. Scoring 73.4 on SWE-bench Verified from a 3B-active model that fits on a 24GB GPU is the best capability-per-footprint deal in open weights right now. Choose the managed Qwen3.7-Max API only if you need Qwen's absolute top coding quality and can accept that it is closed-weights.
Route across all three if you are running real volume. The honest answer for most production setups is not "pick one" but "tier them": cheap open-weights for the easy majority, a frontier-substitute for the hard reasoning, with a failover chain. That is precisely the pattern in our model routing recipes, and the reason the orchestration layer is where the margin is moving.
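The tiering idea can be sketched in a few lines of Python. Everything here is hypothetical: the `Tier` and `route` names, the model labels (which are not real API model IDs), the stub clients, and the `hard` flag, which a real router would derive from a classifier or task metadata. It is a minimal illustration of "preferred tier first, then fail over", not a production router.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Tier:
    name: str                    # illustrative label, not a real API model ID
    call: Callable[[str], str]   # hypothetical client: prompt in, completion out, or raises


def route(prompt: str, hard: bool, cheap: Sequence[Tier], strong: Sequence[Tier]):
    """Try the tier that matches the task first, then fail over down the chain."""
    chain = (list(strong) + list(cheap)) if hard else (list(cheap) + list(strong))
    errors = []
    for tier in chain:
        try:
            return tier.name, tier.call(prompt)
        except Exception as exc:  # a real router would catch narrower errors
            errors.append(f"{tier.name}: {exc}")
    raise RuntimeError("all tiers failed: " + "; ".join(errors))


# Stub clients standing in for real API calls.
def flaky(prompt: str) -> str:
    raise TimeoutError("simulated timeout")


def ok(prompt: str) -> str:
    return f"patch for: {prompt}"


cheap = [Tier("deepseek-v4-flash", flaky), Tier("qwen3.6-35b-a3b", ok)]
strong = [Tier("glm-5.2", ok), Tier("deepseek-v4-pro", ok)]

print(route("rename a variable", hard=False, cheap=cheap, strong=strong))
print(route("refactor the auth module", hard=True, cheap=cheap, strong=strong))
```

In this stubbed run the easy task falls through the simulated timeout to the second cheap tier, and the hard task goes straight to the first strong tier. The ordering inside each list is where the pricing and benchmark tradeoffs above get encoded.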
At first-party API list prices verified June 17, 2026, DeepSeek V4 Pro is the cheapest high-quality option at $0.435 input / $0.87 output per million tokens, with a near-free cache-hit input price of $0.003625. DeepSeek V4 Flash is cheaper still at $0.14 / $0.28 for lighter work. GLM-5.2 lists at $1.40 / $4.40 on Z.ai but has a provider median closer to $0.55 / $1.85 because third parties host the open weights.
Which model is best at coding depends on the benchmark, and you should read the scores as a cluster. GLM-5.2 leads the SWE-bench Pro (62.1) and Terminal-Bench 2.1 (81.0) figures it reports. DeepSeek V4 Pro-Max posts the strongest SWE-bench Verified number of the three (around 80.6%). Qwen3.6-35B-A3B scores 73.4 on SWE-bench Verified, which is exceptional for a model with only 3B active parameters.
Qwen3.6-35B-A3B is the only one of the three that runs on a single consumer-class GPU - about 21GB VRAM at Q4_K_M, fitting a 24GB card. GLM-5.2 (753B) and DeepSeek V4 Pro (1.6T) require multi-GPU server-class deployments, so most teams consume them via API rather than self-host.
Qwen's best coding model is not open weights. As of mid-2026, Alibaba's strongest coding model, Qwen3.7-Max, is API-only on DashScope ($2.50 / $7.50 per million tokens) with no published weights. The best Qwen you can self-host is the smaller Qwen3.6-35B-A3B under Apache 2.0.
GLM-5.2 and DeepSeek V4 are released under the MIT license; Qwen3.6-35B-A3B is under Apache 2.0. All three permit commercial use and self-hosting.
| Source | Link | Used for |
|---|---|---|
| Artificial Analysis: GLM-5.2 | artificialanalysis.ai/models/glm-5-2 | GLM-5.2 pricing, Intelligence Index, params, license |
| llm-stats: GLM-5.2 | llm-stats.com/models/glm-5.2 | GLM-5.2 context, max output, license |
| MarkTechPost: GLM-5.2 launch | marktechpost.com | GLM-5.2 release, context, thinking levels |
| DeepSeek API pricing | api-docs.deepseek.com/quick_start/pricing | DeepSeek V4 Pro/Flash live pricing, context, max output |
| Artificial Analysis: DeepSeek V4 Pro | artificialanalysis.ai/models/deepseek-v4-pro | V4 Pro Intelligence Index |
| morphllm: SWE-bench Pro leaderboard | morphllm.com/swe-bench-pro | SWE-bench Pro context |
| morphllm: DeepSeek V4 | morphllm.com/deepseek-v4 | V4 params, context, benchmarks |
| Qwen3.6-35B-A3B announcement | qwen.ai/blog?id=qwen3.6-35b-a3b | Qwen open-weights release, Apache 2.0 |
| DEV: Qwen3.6 SWE-bench 73.4 | dev.to | Qwen3.6-35B-A3B SWE-bench Verified, 3B active |
| OpenRouter: Qwen3.6 Plus | openrouter.ai/qwen/qwen3.6-plus/benchmarks | Qwen API pricing, context, SWE-bench |
| Will It Run AI: Qwen3.6 VRAM | willitrunai.com/blog/qwen-3-6-vram-requirements | Qwen self-host VRAM footprint |
| codersera: Qwen 3.7 Max | codersera.com | Qwen3.7-Max closed-weights pricing, SWE-Pro |
Figures verified June 17, 2026. Benchmark scores are reported by different evaluators on different scaffolds; treat them as a cluster, not an exact ranking, and re-verify against the live vendor pages before making a production decision.