7 items
7 posts
A $500M accidental Claude bill and an open-weights model beating GPT-5.5 at one-sixth the cost point to the same conclusion: the margin is moving to the layer that decides when to use which model for what. Here is how routing and orchestration differ, and how to cut your model spend.
DeepSeek V4 Pro lands a 63.5 on SWE-bench Verified at $1.74/$3.48 per million tokens, and Flash runs agent inner loops for cents. Here is the worked cost math, the Flash-vs-Pro split, and a clear guide on when to route to DeepSeek instead of a frontier model.
Z.ai's GLM-5.2 lands as a 753B open-weights coding model that beats GPT-5.5 on SWE-bench Pro for roughly one-sixth the per-token cost. Here is the real cost math, a worked cost-per-task example, and a when-to-use-which decision guide.
DeepSeek V4-Flash costs $0.28 per million output tokens. Fable 5 costs $50. That 178x gap is real - but so is the quality difference. Here is where it matters and where it does not.
Pricing deadlines, infrastructure funding, a banking prompt injection case, and a 4x speed breakthrough - June 10 was one of the densest single days the AI dev tool market has ever produced.
A first-hand visit to DeepSeek HQ reveals something more interesting than benchmark scores: a 300-person company that treats AI as infrastructure, not eschatology - and what that means for API pricing everywhere.
Gemma 4 ships byte-for-byte open weights from Google DeepMind. How developers deploy it locally, fine-tune it, and ship agents on top of it.

New tutorials, open-source projects, and deep dives on coding agents - delivered weekly.