
TL;DR
Cohere shipped its first developer-facing model on June 9, 2026. North Mini Code is a 30B mixture-of-experts coding model with 3B active parameters, Apache 2.0 weights, and a deployment footprint of a single H100. Here is what it actually offers and where the open questions are.
Cohere has mostly been known for retrieval, embeddings, and enterprise RAG. On June 9, 2026, it stepped onto coding-model turf for the first time with North Mini Code, an agentic coding model released with open weights under Apache 2.0 and sized to fit on a single H100. For teams that want a capable coding model they can actually self-host, that combination is the headline.
This post sticks to what Cohere has published and what independent coverage has confirmed. Where a number is a vendor claim rather than an independently reproduced benchmark, it is flagged as such.
Published: June 17, 2026
North Mini Code is a mixture-of-experts (MoE) model with 30B total parameters and 3B active per token. The MoE design is the whole point: you get the knowledge capacity of a 30B model while paying the compute cost of only about 3B parameters on each forward pass. That is what keeps inference cheap enough for a small deployment footprint, although all 30B weights still have to be held in memory.
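To make the 30B-total versus 3B-active distinction concrete, here is a back-of-the-envelope sketch. The parameter counts come from Cohere's published figures; the 2-bytes-per-parameter precision and the 80 GB H100 variant are assumptions made for illustration, and KV cache and activation memory are ignored.

```python
# Rough sizing for a 30B-total / 3B-active MoE model.
# Assumptions (not from Cohere): bf16 weights at 2 bytes per parameter
# and an 80 GB H100. KV cache and activations are not counted.

TOTAL_PARAMS = 30e9      # from the article
ACTIVE_PARAMS = 3e9      # from the article
BYTES_PER_PARAM = 2      # assumption: bf16 / fp16
H100_MEMORY_GB = 80      # assumption: 80 GB variant


def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def active_fraction(active: float, total: float) -> float:
    """Share of the parameters used for each token."""
    return active / total


weights_gb = weight_memory_gb(TOTAL_PARAMS, BYTES_PER_PARAM)
fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
headroom_gb = H100_MEMORY_GB - weights_gb

print(f"weights: {weights_gb:.0f} GB")
print(f"active per token: {fraction:.0%}")
print(f"headroom for KV cache etc.: {headroom_gb:.0f} GB")
```

Under these assumptions the weights take about 60 GB, which leaves roughly 20 GB on the card for everything else. A quantized checkpoint would change the picture, so treat this as a sanity check rather than a sizing guide.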
The specs Cohere lists:
It is positioned as the first model in what Cohere calls its next generation, and it is aimed squarely at agentic software engineering - code generation, sub-agent orchestration, architecture mapping, code review, and terminal work.
Cohere made availability broad on day one, which matters more than benchmarks for a model people are meant to actually deploy:
The OpenCode integration is a smart distribution move. OpenCode is currently the most-starred open-source coding agent, so dropping North Mini Code into that harness puts it in front of a large existing developer audience without asking anyone to change tools.
This is where favorable framing needs to give way to precision.
The single concrete, third-party-anchored number Cohere reports is 33.4 on the Artificial Analysis Coding Index, which the company characterizes as "a competitive position among similarly sized models." That is an honest framing - it is not claiming frontier parity, it is claiming it competes in its weight class.
Cohere says it evaluated the model on SWE-Bench Verified, SWE-Bench Pro, and Terminal-Bench v2, plus Terminal-Bench Hard, SciCode, and LiveCodeBench v6, running three seeds and averaging the results. However, individual scores for those benchmarks were not disclosed in the launch materials or in independent coverage available at publication. Treat any specific SWE-Bench percentage you see attributed to North Mini Code with skepticism until Cohere publishes the per-benchmark numbers.
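If you run these benchmarks yourself, the three-seed averaging is easy to reproduce, and reporting the spread alongside the mean shows how much a single run can move. The sketch below is a minimal illustration; `summarize_seeds` is a hypothetical helper, and the scores are placeholder values chosen to be obviously synthetic, not North Mini Code results.

```python
from statistics import mean, stdev


def summarize_seeds(scores):
    """Return (mean, sample standard deviation) for per-seed scores."""
    if len(scores) < 2:
        raise ValueError("need at least two seeds to estimate spread")
    return mean(scores), stdev(scores)


# Placeholder values for illustration only, NOT measured results.
placeholder_scores = [10.0, 20.0, 30.0]

avg, spread = summarize_seeds(placeholder_scores)
print(f"mean={avg:.1f} stdev={spread:.1f}")
```

With only three seeds the standard deviation is a rough estimate, but it is enough to tell whether a one-point gap between two models is meaningful.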
The efficiency claims are clearer, though still vendor-reported. Against Devstral Small 2, Cohere reports:
These are throughput-and-latency claims, not quality claims, and they have not been independently reproduced. They are plausible given the 3B active-parameter design, but worth confirming on your own hardware before you build a latency budget around them.
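One way to check these numbers on your own hardware is to time a streamed generation and record time to first token and tokens per second. The sketch below assumes your serving stack can give you an iterable of tokens as they arrive; `measure_stream` and `fake_stream` are hypothetical names, and `fake_stream` is a stand-in so the example runs without a model server.

```python
import time
from typing import Callable, Iterable


def measure_stream(stream_fn: Callable[[], Iterable[str]]) -> dict:
    """Time one streamed generation.

    stream_fn is a hypothetical callable that yields tokens as they arrive;
    connect it to whatever client your serving stack exposes.
    """
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _ in stream_fn():
        if first_token_at is None:
            first_token_at = time.perf_counter()
        count += 1
    end = time.perf_counter()
    if first_token_at is None:
        raise RuntimeError("stream produced no tokens")
    total = end - start
    return {
        "tokens": count,
        "ttft_s": first_token_at - start,   # time to first token
        "total_s": total,
        "tokens_per_s": count / total if total > 0 else float("inf"),
    }


def fake_stream():
    """Stand-in generator that simulates a slow token stream."""
    for i in range(20):
        time.sleep(0.005)
        yield f"tok{i}"


result = measure_stream(fake_stream)
print(result)
```

Run it several times with prompts that match your real workload, since prompt length and batch size both shift latency, and compare models on the same machine and serving configuration.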
The interesting story here is not that Cohere beat anyone. It is the shape of the offering.
Self-hosting just got more practical for coding. A 30B MoE that runs on one H100 with open weights and a permissive license lands in the same conversation as Devstral, Qwen-class coders, and other locally runnable models. For teams under data-residency or compliance pressure - the exact crowd that cannot send source code to a third-party API - a single-GPU footprint and Apache 2.0 terms remove two of the biggest blockers at once.
256K context is generous for the size class. Agentic coding chews through context fast once you add file trees, diffs, and tool output. A quarter-million-token window on a model this small is a real working advantage for repo-scale tasks.
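To see how quickly an agent session eats into that window, a rough budget calculation helps. This sketch assumes the window is 256,000 tokens and uses a crude four-characters-per-token heuristic rather than the model's actual tokenizer; the function names, the output reserve, and the sample inputs are all made up for illustration.

```python
CONTEXT_WINDOW = 256_000   # assumption: "256K" read as 256,000 tokens
CHARS_PER_TOKEN = 4        # rough heuristic, not the model's tokenizer


def estimate_tokens(text: str) -> int:
    """Crude token estimate: character count divided by 4, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def remaining_budget(parts: dict, reserve_for_output: int = 8_000) -> int:
    """Tokens left after the given context parts and an output reserve."""
    used = sum(estimate_tokens(text) for text in parts.values())
    return CONTEXT_WINDOW - reserve_for_output - used


# Synthetic inputs standing in for real agent context.
parts = {
    "file_tree": "src/\n" * 1_000,      # 5,000 characters
    "diff": "+ line\n" * 10_000,        # 70,000 characters
    "tool_output": "ok\n" * 20_000,     # 60,000 characters
}

for name, text in parts.items():
    print(f"{name}: ~{estimate_tokens(text)} tokens")
print(f"remaining: ~{remaining_budget(parts)} tokens")
```

For real planning, swap the heuristic for the model's own tokenizer, because code and diffs often tokenize less efficiently than prose.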
Cohere is signaling a direction. Calling this the "inaugural member" of a next generation suggests more developer-facing models are coming from a company that previously stayed out of the coding race. That is worth watching even if this first model is a mid-tier entry rather than a leader.
If you are already self-hosting coding models, North Mini Code is worth a slot in your evaluation harness this week - the Apache 2.0 license, single-H100 footprint, and OpenCode integration make it cheap to test. If you are happy with a hosted frontier model and have no compliance reason to bring inference in-house, there is no urgency here. And if you see a specific SWE-Bench percentage quoted for it, ask where the number came from before you trust it.
The most useful thing about this release is not the model itself but what it represents: capable, openly licensed coding models that fit on hardware a single team can afford are becoming normal. That trend is good for developers regardless of which vendor's logo is on this particular checkpoint.