5 items
5 posts
Alex Ellis shares real production experience running local LLMs: $12k hardware investment, 2-3 month ROI, and why treating local models as Opus substitutes misses the point entirely.
Cohere shipped its first developer-facing model on June 9, 2026. North Mini Code is a 30B mixture-of-experts coding model with 3B active parameters, Apache 2.0 weights, and a deployment footprint of a single H100. Here is what it actually offers and where the open questions are.
Open weights are free to download, but inference is not free to run. Here is the honest break-even math on when self-hosting GLM-5.2, DeepSeek V4, or Llama beats paying per-token API prices - GPU rental and ownership costs, real throughput, utilization, the crossover in tokens per month, and the hidden ops bill nobody budgets for.
Choosing a local coding LLM in 2026 means balancing benchmark performance, hardware cost, and the compliance pressure to keep code off third-party servers. Here is what to run and on what hardware.
Claude Code does not have to call Anthropic's API. Here are five working patterns for running it through your own gateway, on your own models, in your own VPC, with full audit logs and cost control.

New tutorials, open-source projects, and deep dives on coding agents - delivered weekly.