GLM-5.2 vs DeepSeek V4 vs Qwen3: The Open-Weights Coding Model Showdown (2026)

A data-rich, source-cited comparison of the open-weights coding models that matter in 2026: GLM-5.2, DeepSeek V4, Qwen3, and the new Kimi K3 frontier entrant. Benchmark table, per-token pricing, context windows, self-host footprint, and a clear pick-X-if decision matrix.

Best for

Developers comparing real tool tradeoffs before choosing a stack.

Covers

Verdict, tradeoffs, pricing signals, workflow fit, and related alternatives.

Last updated: July 31, 2026

The interesting fight in coding models is no longer open versus closed. It is open versus open. Three labs now ship weights you can download, self-host, and route to per-token at a fraction of frontier pricing, and all three post real software-engineering benchmark numbers: Z.ai's GLM-5.2, DeepSeek V4, and Alibaba's Qwen3 line. If your question is "which open-weights model should I point my coding agent at," this is the comparison built to answer it.

This piece is part of our model-economics beat. For the single-model deep dives, see our GLM-5.2 cost math, our DeepSeek V4 economics breakdown, and the self-hosting break-even math. For the layer that decides when to use which, read model routing recipes and why the orchestration layer is the next big play.

Every figure below is attributed to a primary or named source, with verification dates. Prices and benchmarks move fast on open weights because anyone can host them - verify against the live vendor page before you commit a production budget.

What Changed on July 31, 2026#

Kimi K3 open weights landed (July 27). Moonshot released the 2.8T-parameter K3 on Hugging Face with 104B active per token, native MXFP4 4-bit weights, and a 1M-token context. It posts Terminal-Bench 2.1 at 88.3 - the top score among open models - and matches or beats closed frontier models on several agent benchmarks. This is the first open release that sits at the frontier, and it changes the category's ceiling (see the K3 weights analysis and every access route with prices).
DeepSeek's naming settled. The July 24 deprecation wave retired the legacy deepseek-chat / deepseek-reasoner names; the live API is now deepseek-v4-pro and deepseek-v4-flash, with V4 Flash 0731 shipping on July 31 as a re-post-trained agent workhorse at the same $0.14/$0.28 rates (DeepSeek change log).
GLM-5.2's price keeps sliding. Community analysis around the K3 release notes GLM-5.2 prices dropped roughly 45% since its June 16 launch. Z.ai's list price is unchanged at $1.40/$4.40, but OpenRouter now routes across 13-plus hosting providers and the cheapest blended hosts sit near $0.72-0.80 per million tokens on a 3:1 input-output mix.
Qwen's frontier stays closed. Qwen3.7-Max remains API-only; the best self-hostable Qwen is still Qwen3.6-35B-A3B. No change in July, which is itself the story: Alibaba has not answered K3 with open weights.

A Note on Which Variant We Compare#

Each of these is a family, not a single model. To keep the comparison fair, we anchor on the strongest open-weights variant from each lab, because the whole point of this category is downloadable weights:

GLM-5.2 - Z.ai's flagship, open weights under MIT.
DeepSeek V4 Pro - the flagship tier (we cover V4 Flash separately), open weights under MIT, now served under the deepseek-v4-pro API name.
Qwen3.6-35B-A3B - Alibaba's open-weights coding MoE under Apache 2.0. Alibaba's larger Qwen3.7-Max is the stronger model on paper but is API-only with no open weights, so it sits outside this category; we flag it where relevant.
Kimi K3 - Moonshot's frontier flagship, weights released July 27, 2026 under a custom license. The new entrant to this category, and currently its capability ceiling.

That last point about Qwen is the single most important caveat in this post: in mid-2026, Qwen's very best coding model is closed. The open-weights Qwen you can actually self-host is a much smaller MoE - which, as the numbers show, punches far above its size. Kimi K3 inverts that story: it is the frontier model that did come out, and the trade-off is a datacenter-sized footprint (see the self-host section).

The Contenders at a Glance#

	GLM-5.2	DeepSeek V4 Pro	Qwen3.6-35B-A3B	Kimi K3
Vendor	Z.ai (Zhipu)	DeepSeek	Alibaba	Moonshot
Released	Jun 16, 2026	Apr 2026	Apr 16, 2026	Jul 27, 2026
Total params	753B (MoE)	1.6T (MoE)	35B (MoE)	2.8T (MoE)
Active params	40B	49B	3B	104B
Context window	1M	1M	large (VRAM-bound when self-hosted)	1M
Max output	~131K	384K	-	not published
License	MIT	MIT	Apache 2.0	Custom (free under thresholds)
Self-host class	multi-GPU server	multi-GPU server	single 24GB GPU	~1.5TB VRAM (datacenter)

Sources for this table are in the Sources section. The standout structural facts: DeepSeek V4 Pro is a large 1.6T total-parameter MoE, GLM-5.2 sits in the middle, Kimi K3 is the largest open release ever at 2.8T total with 104B active, and Qwen3.6-35B-A3B is two orders of magnitude smaller in total parameters and activates only 3B per token - which is why it is the only one of the four that runs on a single consumer-class GPU.

Benchmarks: Read the Cluster, Not the Ranking#

Coding benchmarks for open-weights models are reported by a mix of vendors, aggregators, and third-party evaluators using different scaffolds. Treat them as a cluster, not a leaderboard, and never compare a SWE-bench Verified number against a SWE-bench Pro number - they are different, harder tests.

Benchmark	GLM-5.2	DeepSeek V4 Pro	Qwen3.6-35B-A3B	Kimi K3
SWE-bench Verified	not separately reported	~80.6% (V4 Pro-Max)	73.4	not reported
SWE-bench Pro	62.1	55.4 (unverified scaffold)	not reported	not reported
Terminal-Bench 2.x	81.0 (v2.1)	67.9 (v2.0)	not reported	88.3 (v2.1)
AA Intelligence Index	51	44	not reported	not reported
ProgramBench	not reported	not reported	not reported	77.8
SWE-Marathon	not reported	not reported	not reported	42.0
MCPMark-Verified	not reported	not reported	not reported	94.5

A few honest reads of this table:

Kimi K3 is now the capability ceiling of the open-weights category: Terminal-Bench 2.1 at 88.3 tops every open model, and its agent benchmarks (MCPMark-Verified 94.5, ProgramBench 77.8, SWE-Marathon 42.0) land at or above the closed frontier on most axes. It does not report SWE-bench Verified, so we leave those cells blank. The caveats are its license (free under revenue thresholds, with a commercial agreement required above them) and its footprint.
GLM-5.2 leads the SWE-bench Pro and Terminal-Bench numbers it reports, and its Artificial Analysis Intelligence Index of 51 is described as well above the median for open-weight models of similar size. Note that artificialanalysis.ai does not list a separate SWE-bench Verified figure for GLM-5.2, so we leave that cell blank rather than borrow a number from a less reliable source.
DeepSeek V4 Pro-Max posts the strongest SWE-bench Verified figure of the traditional three (around 80.6%, per third-party trackers), but its SWE-bench Pro and Terminal-Bench numbers come from vendor-style scaffolds and should be treated as indicative. The V4 Flash 0731 update brought its budget sibling within striking distance of these scores at $0.14/$0.28 (see our 0731 analysis).
Qwen3.6-35B-A3B scoring 73.4 on SWE-bench Verified with only 3B active parameters is the most surprising data point in the set. It is not the top score, but on a per-active-parameter and per-watt basis nothing else here is close.

The practical takeaway: on raw quality, Kimi K3 is now the frontier-substitute pick of the category, DeepSeek V4 Pro and GLM-5.2 hold the middle, and Qwen3.6-35B-A3B remains the efficiency play that trades a few points of capability for a dramatically smaller footprint.

From the archive

Mastra npm Supply Chain Attack: 140+ AI Framework Packages Backdoored

Jun 17, 2026 • 7 min read

Microsoft's Work IQ APIs Hit GA: What Agent Builders Actually Get on June 16

Jun 17, 2026 • 7 min read

Model Routing Recipes: Practical Config Patterns to Cut AI Spend

Jun 17, 2026 • 11 min read

Omnigent: Databricks' Meta-Harness for Orchestrating Claude Code, Codex, and Custom Agents

Jun 17, 2026 • 8 min read

Pricing: The Live Page Is the Only Source of Truth#

Open-weights pricing is a moving target because the original lab is just one of many hosts. Here are the official first-party API list prices, verified July 31, 2026.

Model	Input ($/MTok)	Cached input ($/MTok)	Output ($/MTok)
GLM-5.2 (Z.ai list)	$1.40	-	$4.40
GLM-5.2 (provider median)	~$0.55	-	~$1.85
DeepSeek V4 Pro	$0.435	$0.003625	$0.87
DeepSeek V4 Flash	$0.14	$0.0028	$0.28
Kimi K3	$3.00	$0.30	$15.00
Qwen3.6 Plus (API)	$0.325 (35% off)	-	$1.95 (35% off)

GLM-5.2 list pricing is $1.40 / $4.40, with a provider median closer to $0.55 / $1.85 because the open weights let third parties compete on hosting (Artificial Analysis). The slide continues: pricing has dropped roughly 45% since the June 16 launch, and the cheapest blended hosts sit near $0.72-0.80 per million tokens on a 3:1 input-output mix, with Wafer the fastest at over 200 tokens/sec. DeepSeek's live pricing page lists V4 Pro at $0.435 / $0.87 with a near-free cache hit of $0.003625, and V4 Flash at $0.14 / $0.28 - unchanged across the July 24 rename and the July 31 re-post-training (api-docs.deepseek.com/quick_start/pricing).

Kimi K3 lists at $3.00 / $15.00 with a $0.30 cache-read rate across Moonshot, Together, Fireworks, Modal, SiliconFlow, and OpenRouter - priced like a frontier model, because it benchmarks like one. The interesting wrinkle: K3's 2.8T weights are open under a custom license, so expect the same third-party hosting price competition that drove GLM-5.2 down once dedicated providers spin up.

One important correction worth flagging: a lot of April launch coverage - and our own earlier DeepSeek V4 economics post - quoted V4 Pro at $1.74 / $3.48. The live DeepSeek pricing page lists $0.435 / $0.87, roughly a quarter of that. The live page is the source of truth; if you are modeling spend, use it, not the launch articles.

Qwen3.6-35B-A3B itself is open weights, so its "price" depends entirely on who hosts it or your own hardware. The Qwen3.6 Plus API line above ($0.325 / $1.95 at a 35% promo, per OpenRouter) is included as a managed-API reference point for the Qwen family, not as the price of the 35B open-weights model specifically.

What a Real Task Costs#

Per-token rates do not tell you what work costs. Model one agentic coding task: roughly 40,000 input tokens (repo context, files, tool results) and 8,000 output tokens (plan plus diffs), no caching.

Model (first-party list)	Cost per task
DeepSeek V4 Pro ($0.435 / $0.87)	~$0.024
GLM-5.2 (provider median)	~$0.037
GLM-5.2 ($1.40 / $4.40)	~$0.091
Kimi K3 ($3.00 / $15.00)	~$0.24

At list prices, DeepSeek V4 Pro is the cheapest of the high-quality options on this input-heavy profile, and its near-zero cache-hit rate makes repeated-context agent loops cheaper still. GLM-5.2 closes most of the gap if you shop the open-weights hosting market for the lower provider-median rate. Kimi K3 costs about 10x V4 Pro per task - the honest premium for frontier-level agent scores from open weights. The caveat that survives every pricing table: cheap per token only becomes cheap per task if the model lands the work in as few attempts as a pricier model, which is exactly why the benchmark cluster above matters.

Context Windows and Output Limits#

All three flagships are built for repo-scale work.

GLM-5.2: 1M-token context, max output around 128K-131K depending on host (llm-stats.com).
DeepSeek V4 Pro and Flash: 1M-token context, 384K max output - the largest output ceiling of the four, and a genuine differentiator for tasks that emit huge diffs or long structured plans (DeepSeek pricing docs).
Kimi K3: 1,048,576-token context with native multimodal input (MoonViT-V2 vision encoder). Max output is not published in the technical report.
Qwen3 family: the managed Qwen3.6 Plus and Qwen3.7-Max APIs advertise 1M context; the open-weights Qwen3.6-35B-A3B supports long context but useful self-hosted context length is bounded by your VRAM (see below).

If your workload is "drop a whole repo in and ask for a coordinated change," any of the four handles the input side. DeepSeek's 384K output ceiling is the one to reach for when the model has to write a lot, not just read a lot.

Self-Host Footprint: Where Qwen3 Wins Outright#

This is the category that separates "open weights in principle" from "open weights you will actually run."

Kimi K3 (2.8T / 104B active): the datacenter model. Native MXFP4 weights need roughly 1.5TB of VRAM - about 8x B200-class GPUs at the limit, realistically 16x for context and throughput. A 2-bit quant is already on Hugging Face at ~1TB. Individual developers are priced out; this is a cloud-provider and well-funded-lab deployment, and the custom license adds a revenue-based commercial term on top.
GLM-5.2 (753B / 40B active): a multi-GPU server-class deployment. Open weights are on Hugging Face under MIT, but serving a 753B MoE means real GPU capacity and ops investment. The economics only favor self-hosting at sustained high volume - see our break-even math.
DeepSeek V4 Pro (1.6T / 49B active): the heaviest of the traditional three to self-host. The weights are open, but a 1.6T-parameter checkpoint is a serious infrastructure commitment; most teams will consume V4 via the (very cheap) API rather than host it. V4 Flash (284B / 13B active) is the more realistic self-host target in the DeepSeek family.
Qwen3.6-35B-A3B (35B / 3B active): runs at roughly 21GB VRAM at Q4_K_M (about 37GB at Q8), which fits a single 24GB GPU, with reports of it running on as little as 6GB via llama.cpp at reduced speed (Will It Run AI, knightli.com). This is the only model here a solo developer can realistically self-host on a workstation.

If "no per-token bill, runs on hardware I already own" is your hard requirement, the decision is effectively made for you: Qwen3.6-35B-A3B is the open-weights coder that fits on a single GPU, and it is the only one of the four in that class.

The Decision Matrix: Pick X If...#

Pick Kimi K3 if you want the top of the open-weights category, period - Terminal-Bench 2.1 at 88.3 and agent benchmarks that match closed frontier models. You are consuming it via a managed API (Moonshot, Together, Fireworks, Modal, SiliconFlow, or OpenRouter at $3/$15) and your business is under the license's revenue thresholds. Budget for a frontier-model token bill: K3 costs about 10x DeepSeek V4 Pro per task.

Pick GLM-5.2 if you want the strongest reported SWE-bench Pro and Terminal-Bench numbers among the established open-weights models, you are consuming it via API or a managed host, and you value an MIT license with a wide ecosystem of coding-tool integrations. The price slide since launch (roughly 45%, with blended third-party rates near $0.72-0.80 per million tokens) makes it the value pick at the frontier-substitute tier.

Pick DeepSeek V4 Pro if unit cost is your primary axis and you want frontier-substitute quality. At $0.435 / $0.87 with a near-free cache hit and a 384K output ceiling, it is the cheapest high-quality option for input-heavy, cache-friendly agent loops. Route to V4 Flash ($0.14 / $0.28) for the bounded, high-volume inner-loop steps where you do not need Pro-level reasoning - the 0731 re-post-training made it a credible whole-loop agent.

Pick Qwen3.6-35B-A3B if you must self-host on modest hardware, want zero per-token cost, or care about latency and data residency. Scoring 73.4 on SWE-bench Verified from a 3B-active model that fits on a 24GB GPU is the best capability-per-footprint deal in open weights right now. Choose the managed Qwen3.7-Max API only if you need Qwen's absolute top coding quality and can accept that it is closed-weights.

Route across all four if you are running real volume. The honest answer for most production setups is not "pick one" but "tier them": cheap open-weights for the easy majority, a frontier-substitute for the hard reasoning, with a failover chain. That is precisely the pattern in our model routing recipes, and the reason the orchestration layer is where the margin is moving. K3 as the frontier rung, GLM-5.2 or V4 Pro as the middle, V4 Flash or Qwen3.6-35B-A3B as the cheap floor is a coherent July 2026 stack.

Frequently Asked Questions#

Which open-weights coding model is cheapest?#

At first-party API list prices verified July 31, 2026, DeepSeek V4 Pro is the cheapest high-quality option at $0.435 input / $0.87 output per million tokens, with a near-free cache-hit input price of $0.003625. DeepSeek V4 Flash is cheaper still at $0.14 / $0.28 for lighter work. GLM-5.2 lists at $1.40 / $4.40 on Z.ai but has a provider median closer to $0.55 / $1.85 because third parties host the open weights. Kimi K3 is the premium option of the category at $3.00 / $15.00.

Which open-weights model is best on coding benchmarks?#

It depends on the benchmark, and you should read them as a cluster. Kimi K3 is the new category leader: Terminal-Bench 2.1 at 88.3, the top open-model score, plus ProgramBench 77.8 and MCPMark-Verified 94.5. Among the established three, GLM-5.2 leads the SWE-bench Pro (62.1) and Terminal-Bench 2.1 (81.0) figures it reports, DeepSeek V4 Pro-Max posts the strongest SWE-bench Verified number (around 80.6%), and Qwen3.6-35B-A3B scores 73.4 on SWE-bench Verified with only 3B active parameters.

Which one can I actually self-host on my own hardware?#

Qwen3.6-35B-A3B is the only one of the four that runs on a single consumer-class GPU - about 21GB VRAM at Q4_K_M, fitting a 24GB card. GLM-5.2 (753B), DeepSeek V4 Pro (1.6T), and Kimi K3 (2.8T, ~1.5TB VRAM at native MXFP4) require server-class or datacenter deployments, so most teams consume them via API rather than self-host.

Is Kimi K3 open weights?#

Yes, with a license. Moonshot released the 2.8T K3 weights on Hugging Face on July 27, 2026 under a custom Kimi K3 License: free for most use, but a separate commercial agreement is required for model-as-a-service businesses above $20M aggregate revenue over any 12 consecutive months, and prominent "Kimi K3" branding applies at very large scale. The full inference stack (MoonEP expert parallelism, AgentEnv eval environment) is also open sourced.

Is Qwen's best coding model open weights?#

No. As of mid-2026, Alibaba's strongest coding model, Qwen3.7-Max, is API-only on DashScope ($2.50 / $7.50 per million tokens) with no published weights. The best Qwen you can self-host is the smaller Qwen3.6-35B-A3B under Apache 2.0.

What licenses do these use?#

GLM-5.2 and DeepSeek V4 are released under the MIT license; Qwen3.6-35B-A3B is under Apache 2.0; Kimi K3 uses a custom license that is free under revenue thresholds and requires a commercial agreement for large model-as-a-service businesses. All four permit commercial use and self-hosting (K3 with the license caveat).

Continue Reading#

Kimi K3 Weights Land on HuggingFace - the 2.8T open frontier release, benchmarks, and the licensing terms
Where to Access Kimi K3 - every provider route with verified prices
GLM-5.2 Cost Math - why the open-weights price war keeps driving GLM down
DeepSeek V4 Flash 0731 Update - what changed when DeepSeek re-post-trained its budget model
Budget AI Coding Models Compared - V4 Flash vs Luna vs Gemini 3.5 Flash vs Haiku 4.5 at the cheap end of the market
Self-Hosting Open-Weights Models: Break-Even Math - when hosting your own weights beats the API

Official Sources#

Source	Link	Last verified
Artificial Analysis: GLM-5.2	artificialanalysis.ai/models/glm-5-2	July 31, 2026
llm-stats: GLM-5.2	llm-stats.com/models/glm-5.2	July 31, 2026
DeepSeek API pricing	api-docs.deepseek.com/quick_start/pricing	July 31, 2026
DeepSeek change log	api-docs.deepseek.com/updates	July 31, 2026
Hugging Face: Kimi K3	huggingface.co/moonshotai/Kimi-K3	July 31, 2026
Kimi K3 access and pricing guide	kimi.com/blog/kimi-k3	July 31, 2026
OpenRouter: K3	openrouter.ai/moonshotai/kimi-k3-20260715	July 31, 2026
Qwen3.6-35B-A3B announcement	qwen.ai/blog?id=qwen3.6-35b-a3b	July 31, 2026
Will It Run AI: Qwen3.6 VRAM	willitrunai.com/blog/qwen-3-6-vram-requirements	July 31, 2026

Figures verified July 31, 2026. Benchmark scores are reported by different evaluators on different scaffolds; treat them as a cluster, not an exact ranking, and re-verify against the live vendor pages before making a production decision.

Last updated: July 31, 2026

What Changed on July 31, 2026#

Kimi K3 open weights landed (July 27). Moonshot released the 2.8T-parameter K3 on Hugging Face with 104B active per token, native MXFP4 4-bit weights, and a 1M-token context. It posts Terminal-Bench 2.1 at 88.3 - the top score among open models - and matches or beats closed frontier models on several agent benchmarks. This is the first open release that sits at the frontier, and it changes the category's ceiling (see the K3 weights analysis and every access route with prices).
DeepSeek's naming settled. The July 24 deprecation wave retired the legacy deepseek-chat / deepseek-reasoner names; the live API is now deepseek-v4-pro and deepseek-v4-flash, with V4 Flash 0731 shipping on July 31 as a re-post-trained agent workhorse at the same $0.14/$0.28 rates (DeepSeek change log).
GLM-5.2's price keeps sliding. Community analysis around the K3 release notes GLM-5.2 prices dropped roughly 45% since its June 16 launch. Z.ai's list price is unchanged at $1.40/$4.40, but OpenRouter now routes across 13-plus hosting providers and the cheapest blended hosts sit near $0.72-0.80 per million tokens on a 3:1 input-output mix.
Qwen's frontier stays closed. Qwen3.7-Max remains API-only; the best self-hostable Qwen is still Qwen3.6-35B-A3B. No change in July, which is itself the story: Alibaba has not answered K3 with open weights.

A Note on Which Variant We Compare#

GLM-5.2 - Z.ai's flagship, open weights under MIT.
DeepSeek V4 Pro - the flagship tier (we cover V4 Flash separately), open weights under MIT, now served under the deepseek-v4-pro API name.
Qwen3.6-35B-A3B - Alibaba's open-weights coding MoE under Apache 2.0. Alibaba's larger Qwen3.7-Max is the stronger model on paper but is API-only with no open weights, so it sits outside this category; we flag it where relevant.
Kimi K3 - Moonshot's frontier flagship, weights released July 27, 2026 under a custom license. The new entrant to this category, and currently its capability ceiling.

The Contenders at a Glance#

	GLM-5.2	DeepSeek V4 Pro	Qwen3.6-35B-A3B	Kimi K3
Vendor	Z.ai (Zhipu)	DeepSeek	Alibaba	Moonshot
Released	Jun 16, 2026	Apr 2026	Apr 16, 2026	Jul 27, 2026
Total params	753B (MoE)	1.6T (MoE)	35B (MoE)	2.8T (MoE)
Active params	40B	49B	3B	104B
Context window	1M	1M	large (VRAM-bound when self-hosted)	1M
Max output	~131K	384K	-	not published
License	MIT	MIT	Apache 2.0	Custom (free under thresholds)
Self-host class	multi-GPU server	multi-GPU server	single 24GB GPU	~1.5TB VRAM (datacenter)

Benchmarks: Read the Cluster, Not the Ranking#

Benchmark	GLM-5.2	DeepSeek V4 Pro	Qwen3.6-35B-A3B	Kimi K3
SWE-bench Verified	not separately reported	~80.6% (V4 Pro-Max)	73.4	not reported
SWE-bench Pro	62.1	55.4 (unverified scaffold)	not reported	not reported
Terminal-Bench 2.x	81.0 (v2.1)	67.9 (v2.0)	not reported	88.3 (v2.1)
AA Intelligence Index	51	44	not reported	not reported
ProgramBench	not reported	not reported	not reported	77.8
SWE-Marathon	not reported	not reported	not reported	42.0
MCPMark-Verified	not reported	not reported	not reported	94.5

A few honest reads of this table:

Kimi K3 is now the capability ceiling of the open-weights category: Terminal-Bench 2.1 at 88.3 tops every open model, and its agent benchmarks (MCPMark-Verified 94.5, ProgramBench 77.8, SWE-Marathon 42.0) land at or above the closed frontier on most axes. It does not report SWE-bench Verified, so we leave those cells blank. The caveats are its license (free under revenue thresholds, with a commercial agreement required above them) and its footprint.
GLM-5.2 leads the SWE-bench Pro and Terminal-Bench numbers it reports, and its Artificial Analysis Intelligence Index of 51 is described as well above the median for open-weight models of similar size. Note that artificialanalysis.ai does not list a separate SWE-bench Verified figure for GLM-5.2, so we leave that cell blank rather than borrow a number from a less reliable source.
DeepSeek V4 Pro-Max posts the strongest SWE-bench Verified figure of the traditional three (around 80.6%, per third-party trackers), but its SWE-bench Pro and Terminal-Bench numbers come from vendor-style scaffolds and should be treated as indicative. The V4 Flash 0731 update brought its budget sibling within striking distance of these scores at $0.14/$0.28 (see our 0731 analysis).
Qwen3.6-35B-A3B scoring 73.4 on SWE-bench Verified with only 3B active parameters is the most surprising data point in the set. It is not the top score, but on a per-active-parameter and per-watt basis nothing else here is close.

From the archive

Mastra npm Supply Chain Attack: 140+ AI Framework Packages Backdoored

Jun 17, 2026 • 7 min read

Microsoft's Work IQ APIs Hit GA: What Agent Builders Actually Get on June 16

Jun 17, 2026 • 7 min read

Model Routing Recipes: Practical Config Patterns to Cut AI Spend

Jun 17, 2026 • 11 min read

Omnigent: Databricks' Meta-Harness for Orchestrating Claude Code, Codex, and Custom Agents

Jun 17, 2026 • 8 min read

Pricing: The Live Page Is the Only Source of Truth#

Open-weights pricing is a moving target because the original lab is just one of many hosts. Here are the official first-party API list prices, verified July 31, 2026.

Model	Input ($/MTok)	Cached input ($/MTok)	Output ($/MTok)
GLM-5.2 (Z.ai list)	$1.40	-	$4.40
GLM-5.2 (provider median)	~$0.55	-	~$1.85
DeepSeek V4 Pro	$0.435	$0.003625	$0.87
DeepSeek V4 Flash	$0.14	$0.0028	$0.28
Kimi K3	$3.00	$0.30	$15.00
Qwen3.6 Plus (API)	$0.325 (35% off)	-	$1.95 (35% off)