Cohere's North Mini Code: A 30B Open-Weight Coding Model That Runs on One H100

Last updated: June 24, 2026

Cohere has mostly been known for retrieval, embeddings, and enterprise RAG. On June 9, 2026, it moved into coding models with North Mini Code, the company's first agentic coding model. It ships with open weights under Apache 2.0 and is sized to fit on a single H100. For teams that want a capable coding model they can actually self-host, that combination is the headline.

This post sticks to what Cohere has published and what independent coverage has confirmed. Where a number is a vendor claim rather than an independently reproduced benchmark, it is flagged as such.

What North Mini Code Is

North Mini Code is a mixture-of-experts (MoE) model with 30B total parameters and 3B active per token. The MoE design is the whole point: you get the knowledge capacity of a 30B model while each forward pass only pays the compute cost of roughly 3B parameters. That keeps per-token inference cheap. All 30B parameters still have to sit in memory, so the small deployment footprint also leans on the quantized formats listed below.
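
To make the "active per token" idea concrete, here is a toy sketch of top-k expert routing. It is not Cohere's architecture: the expert count (8), the number of routed experts (2), and the scalar "experts" are all hypothetical stand-ins, chosen only to show that a router scores every expert but evaluates just a few of them.

```python
import math

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_layer(x, experts, router_logits, k):
    """Evaluate only the routed experts and mix their outputs by router weight."""
    routed = top_k_route(router_logits, k)
    output = sum(weight * experts[i](x) for i, weight in routed)
    return output, [i for i, _ in routed]

# Toy setup (hypothetical): 8 experts, 2 routed per token.
experts = [lambda x, scale=s: x * scale for s in range(1, 9)]
router_logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.7]

output, used = moe_layer(2.0, experts, router_logits, k=2)
print(f"experts evaluated: {used} of {len(experts)}")

# The ratio from Cohere's published figures: 3B active out of 30B total.
print(f"active share per token: {3e9 / 30e9:.0%}")
```

The six experts that were not selected are never called, which is where the compute saving comes from. They still exist in memory, which is why total parameter count drives the hardware requirement.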

The specs Cohere lists:

30B total / 3B active parameters (MoE)
256K token context window
64K token maximum generation length
Apache 2.0 license (open weights, commercial use permitted)
Minimum hardware: 1x H100 at FP8 or FP4
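
A rough way to sanity-check the single-H100 claim is to estimate how much memory the weights alone need at each precision. The sketch below assumes an 80 GB H100 and idealized byte widths (2 bytes for bf16, 1 for fp8, 0.5 for 4-bit weights). It ignores KV cache, activations, quantization scales, and serving overhead, so treat the result as a floor rather than a deployment plan.

```python
# Idealized storage cost per parameter for each published weight format.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "w4a16": 0.5}

TOTAL_PARAMS = 30e9   # all experts must be resident, not just the 3B active ones
H100_MEMORY_GB = 80   # assumption: the 80 GB H100 variant

def weight_footprint_gb(total_params, bytes_per_param):
    """Memory needed for the weights alone, in decimal gigabytes."""
    return total_params * bytes_per_param / 1e9

for fmt, width in BYTES_PER_PARAM.items():
    gb = weight_footprint_gb(TOTAL_PARAMS, width)
    headroom = H100_MEMORY_GB - gb
    print(f"{fmt}: ~{gb:.0f} GB weights, ~{headroom:.0f} GB left for KV cache and runtime")
```

On these assumptions bf16 weights take about 60 GB, fp8 about 30 GB, and 4-bit about 15 GB. The bf16 weights fit on paper, but they leave little room for long-context KV cache, which is one plausible reason the stated minimum is FP8 or FP4.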

It is positioned as the first model in what Cohere calls its next generation, and it is aimed squarely at agentic software engineering - code generation, sub-agent orchestration, architecture mapping, code review, and terminal work. That places it in the same practical category as local Qwen workflows, GLM-5.2 local deployment, and the broader conversation about the best local coding LLMs.

Where You Can Run It

Cohere made availability broad on day one, which matters more than benchmarks for a model people are meant to actually deploy:

Hugging Face - weights in bf16, fp8, and w4a16 (4-bit) formats
Cohere API - hosted inference
Model Vault - Cohere's managed dedicated inference
OpenCode - usable for free inside the open-source coding harness
OpenRouter - routed access alongside other models

The OpenCode integration is a smart distribution move. OpenCode is currently the most-starred open-source coding agent, so dropping North Mini Code into that harness puts it in front of a large existing developer audience without asking anyone to change tools.

The Benchmark Picture (Read This Carefully)

This is where favorable framing needs to give way to precision.

The single concrete launch number Cohere reports is 33.4 on the Artificial Analysis Coding Index, which the company characterizes as a competitive position among similarly sized models. That is an honest framing - it is not claiming frontier parity, it is claiming it competes in its weight class.

The current public Artificial Analysis model page is a useful second check, but it is not the same metric. When this post was last updated, it listed North Mini Code at 21 on the Artificial Analysis Intelligence Index and around 123 output tokens per second for the measured provider route. That supports the same practical read: this is a fast, efficient open-weight coding model, not a frontier hosted-model replacement.

Cohere's Hugging Face materials now expose more of the benchmark picture than the first version of this post had. The model card lists 67.6 on SWE-Bench Verified and 40.2 on SWE-Bench Pro, with evaluation through the SWE-agent harness. It also documents the broader evaluation suite: Terminal-Bench v2, Terminal-Bench Hard, SciCode, and LiveCodeBench v6, with three seeds averaged at temperature 1.0 and top_p 0.95.
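
If the sampling settings in that description are unfamiliar, the sketch below shows what temperature and top_p control in general. It is a generic illustration of temperature scaling and nucleus (top-p) filtering over a toy logit list, not the code used by Cohere's evaluation harness, and the function names are made up for this post.

```python
import math
import random

def apply_temperature(logits, temperature):
    """Turn logits into probabilities; temperature 1.0 leaves the distribution unchanged."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, renormalized."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}

def sample(logits, temperature=1.0, top_p=0.95, rng=random):
    """Draw one token index using the settings named on the model card."""
    candidates = top_p_filter(apply_temperature(logits, temperature), top_p)
    tokens = list(candidates)
    return rng.choices(tokens, weights=[candidates[t] for t in tokens], k=1)[0]

# With one dominant logit, the unlikely tail is cut off and index 0 is the only candidate.
print(sample([10.0, 0.0, 0.0]))
```

Because sampling at temperature 1.0 is stochastic, a single run can land above or below the model's typical score. That is why averaging over several seeds, as the model card describes, is a reasonable way to report a benchmark number.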

That is useful, but still read it as a benchmark suite, not a universal quality guarantee. Harness choice matters for coding agents, and North Mini Code is clearly optimized for code and terminal work rather than general agentic tasks.

The efficiency claims are clearer, though still vendor-reported. Against Devstral Small 2, Cohere reports:

Up to 2.8x higher output throughput
A ~30% advantage in inter-token latency
Devstral Small 2 keeps a slight lead on time-to-first-token

These are throughput-and-latency claims, not quality claims, and they have not been independently reproduced. They are plausible given the 3B active-parameter design, but worth confirming on your own hardware before you build a latency budget around them.
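
One way to do that confirmation is to timestamp each token as it streams back and compute the three metrics yourself. The helper below is a minimal sketch with hypothetical names. It assumes you record one monotonic timestamp per output token, and it defines throughput end to end, including the wait for the first token, which may differ from how Cohere measured it.

```python
from dataclasses import dataclass

@dataclass
class StreamMetrics:
    ttft_s: float                      # time to first token
    mean_inter_token_latency_s: float  # average gap between consecutive tokens
    output_tokens_per_s: float         # tokens divided by total wall-clock time

def summarize_stream(request_sent_at, token_arrival_times):
    """Compute latency metrics for one streamed response.

    request_sent_at: timestamp (seconds) taken just before sending the request.
    token_arrival_times: ascending timestamps (seconds), one per output token.
    """
    if len(token_arrival_times) < 2:
        raise ValueError("need at least two tokens to measure inter-token latency")
    first, last = token_arrival_times[0], token_arrival_times[-1]
    gaps = len(token_arrival_times) - 1
    return StreamMetrics(
        ttft_s=first - request_sent_at,
        mean_inter_token_latency_s=(last - first) / gaps,
        output_tokens_per_s=len(token_arrival_times) / (last - request_sent_at),
    )

# Synthetic timings for illustration only, not measurements of any model.
metrics = summarize_stream(0.0, [0.5, 0.6, 0.7, 0.8, 0.9])
print(metrics)
```

In practice you would collect the timestamps with `time.perf_counter()` inside your streaming loop. Some servers deliver several tokens per chunk, so check what one chunk means in your serving stack before comparing your numbers with anyone else's.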

Why This Matters

The interesting story here is not "Cohere beat anyone." It is the shape of the offering.

Self-hosting just got more practical for coding. A 30B MoE that runs on one H100 with open weights and a permissive license lands in the same conversation as Devstral, Qwen-class coders, and other locally runnable models. For teams under data-residency or compliance pressure - the exact crowd that cannot send source code to a third-party API - a single-GPU footprint and Apache 2.0 terms remove two of the biggest blockers at once.

256K context is generous for the size class. Agentic coding chews through context fast once you add file trees, diffs, and tool output. A quarter-million-token window on a model this small is a real working advantage for repo-scale tasks.
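
A quick budget check shows how fast that window fills up. The sketch below assumes "256K" and "64K" mean 256,000 and 64,000 tokens and that generated tokens count against the same window. Neither is confirmed here, so take the exact limits from the model card. The token counts in the example are invented for illustration.

```python
CONTEXT_WINDOW = 256_000  # assumption: "256K" read as 256,000 tokens
MAX_GENERATION = 64_000   # assumption: "64K" read as 64,000 tokens

def remaining_prompt_budget(used_tokens,
                            reserve_for_output=MAX_GENERATION,
                            window=CONTEXT_WINDOW):
    """Tokens still free for prompt content after reserving room for the reply.

    used_tokens: mapping of label -> token count already placed in the prompt.
    A negative result means the prompt no longer fits.
    """
    return window - reserve_for_output - sum(used_tokens.values())

# Hypothetical agent session; these counts are made up for illustration.
session = {
    "system_prompt": 2_000,
    "file_tree": 8_000,
    "open_files": 60_000,
    "diffs": 15_000,
    "tool_output": 40_000,
}
print(remaining_prompt_budget(session))  # 67000
```

Even with a full 64K reserved for output, this made-up session still has room to spare. On a 32K or 64K model, the same session would already be forcing truncation or summarization.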

Cohere is signaling a direction. Calling this the "inaugural member" of a next generation suggests more developer-facing models are coming from a company that previously stayed out of the coding race. That is worth watching even if this first model is a mid-tier entry rather than a leader.

Model routing gets more interesting. North Mini Code is exactly the kind of model that belongs in a router: cheap local repo exploration, high-volume edit generation, and privacy-sensitive codebase work can go to the open-weight model, while harder design calls can still route to a frontier model. That is the same cost-control pattern behind model routing recipes, DeepSeek V4 budget coding agents, and the broader AI affordability crisis.
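
Here is a minimal sketch of that routing rule. The task labels, the `Route` names, and the policy itself are hypothetical. A real router would likely also weigh cost, latency, and confidence signals.

```python
from dataclasses import dataclass
from enum import Enum

class Route(Enum):
    LOCAL_OPEN_WEIGHT = "local-open-weight"  # e.g. a self-hosted open-weight model
    FRONTIER_HOSTED = "frontier-hosted"      # e.g. a third-party hosted model

@dataclass(frozen=True)
class Task:
    kind: str                # hypothetical labels, see LOCAL_KINDS below
    privacy_sensitive: bool  # True if the code must not leave your infrastructure

# Hypothetical set of high-volume task kinds that suit the cheaper local model.
LOCAL_KINDS = {"repo_exploration", "edit_generation"}

def choose_route(task):
    """Pick a destination for one task."""
    # Privacy wins over everything else: sensitive code stays local.
    if task.privacy_sensitive:
        return Route.LOCAL_OPEN_WEIGHT
    if task.kind in LOCAL_KINDS:
        return Route.LOCAL_OPEN_WEIGHT
    # Anything else, such as harder design calls, goes to the frontier model.
    return Route.FRONTIER_HOSTED

print(choose_route(Task("edit_generation", privacy_sensitive=False)))
print(choose_route(Task("design_decision", privacy_sensitive=False)))
print(choose_route(Task("design_decision", privacy_sensitive=True)))
```

Checking the privacy flag first is a deliberate choice. It means a hard design question on a restricted codebase still stays local, at some cost in answer quality, rather than leaking source to an external API.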

The Honest Caveats

Coding-specialized, not generalist. The strongest story is code and terminal work. Do not extrapolate that into non-coding agentic workflows without testing.
Efficiency numbers are vendor-reported. The 2.8x throughput and 30% latency figures come from Cohere, not independent labs.
"Competitive among similarly sized models" is not "best." This is a small, efficient, openly licensed model - not a Fable 5 or GPT-5.5 competitor on raw capability. Set expectations accordingly.

Should You Try It?

If you are already self-hosting coding models, North Mini Code is worth a slot in your evaluation harness this week - the Apache 2.0 license, single-H100 footprint, and OpenCode integration make it cheap to test. If you are happy with a hosted frontier model and have no compliance reason to bring inference in-house, there is no urgency here. And if you see a SWE-Bench percentage quoted for it, check which SWE-Bench variant and which harness produced the number before you trust it.

The most useful thing about this release is not the model itself but what it represents: capable, openly licensed coding models that fit on hardware a single team can afford are becoming normal. That trend is good for developers regardless of which vendor's logo is on this particular checkpoint.

FAQ

Is North Mini Code open source?

The weights are released under Apache 2.0, so the practical licensing story is permissive. Cohere and Hugging Face describe it as an open-weight research release rather than a fully open training-data-and-recipe release.

Can North Mini Code replace Claude, GPT, or other frontier coding models?

Not for every task. It is more compelling as a local or routed model for high-volume coding work, repo exploration, and privacy-sensitive workflows. Keep frontier models in the loop for hard architecture decisions, ambiguous product work, and final review.

What hardware do you need to run North Mini Code?

Cohere's launch materials say it can run on a single H100 in FP8 or FP4-style deployments, with Hugging Face variants available in bf16, fp8, and w4a16 formats. Your real footprint depends on serving stack, batch size, quantization, and context length.

What benchmark number should I trust?

Use Cohere's launch numbers and Hugging Face model-card scores as vendor-published context, then use the Artificial Analysis model page as a current third-party snapshot. Do not mix the Coding Index, Intelligence Index, SWE-Bench, and speed numbers as if they measure the same thing.
