
TL;DR
Kimi K2.7-Code is Moonshot's open-source 1T parameter coding model with 30% fewer reasoning tokens than K2.6. Here's how to set it up with Claude Code, pricing breakdown, and honest benchmark analysis.
| Resource | Link |
|---|---|
| Hugging Face Model | moonshotai/Kimi-K2.7-Code |
| Moonshot API Platform | platform.kimi.ai |
| API Documentation | platform.kimi.ai/docs |
| API Pricing | platform.kimi.ai/docs/pricing |
| Claude Code Integration | platform.kimi.ai/docs/guide/agent-support |
| OpenRouter | openrouter.ai/moonshotai/kimi-k2.7-code |
Last updated: June 14, 2026
Kimi K2.7-Code dropped on Hugging Face on June 12, 2026. It is Moonshot AI's coding-focused variant of the K2 family - a 1 trillion parameter Mixture-of-Experts model with 32 billion active parameters, 384 experts, and a 256K context window. The release comes under a Modified MIT license, making it one of the largest open-weight coding models available.
The headline improvement: K2.7-Code uses roughly 30% fewer reasoning tokens than K2.6 while scoring higher on Moonshot's internal coding benchmarks. For developers running long agent loops, fewer tokens mean lower costs and faster completions.
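To make the token claim concrete, here is a rough back-of-the-envelope sketch. It assumes reasoning tokens are billed at the output rate and that both models cost the same per token; neither is confirmed by Moonshot here. The 10M-token run is a made-up input, not a measurement.

```python
# Illustrative arithmetic only: the token count is hypothetical.
OUTPUT_PRICE_PER_M = 4.00   # USD per 1M output tokens (Moonshot API price quoted in this article)
REASONING_REDUCTION = 0.30  # "roughly 30% fewer reasoning tokens", per Moonshot


def reasoning_cost(tokens: int, price_per_m: float = OUTPUT_PRICE_PER_M) -> float:
    """USD cost of `tokens` reasoning tokens, assuming they bill at the output rate."""
    return tokens / 1_000_000 * price_per_m


def savings(baseline_tokens: int) -> tuple:
    """Return (tokens saved, USD saved) for a K2.6-sized reasoning budget."""
    reduced = round(baseline_tokens * (1 - REASONING_REDUCTION))
    saved_tokens = baseline_tokens - reduced
    return saved_tokens, reasoning_cost(saved_tokens)


if __name__ == "__main__":
    tokens_saved, usd_saved = savings(10_000_000)  # hypothetical 10M-token agent run
    print(f"{tokens_saved:,} tokens saved, about ${usd_saved:.2f}")
```

The dollar figure scales linearly, so the saving only becomes noticeable on long-running or high-volume agent loops.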
K2.7-Code is not a general-purpose update. It is tuned specifically for coding and agentic workflows: code generation, debugging, and agentic tool use.
The weights ship in native INT4 using the quantization-aware training method Moonshot introduced with K2 Thinking. This makes the model more practical to self-host without sacrificing the quality you would get from a full-precision run.
Moonshot reports strong internal numbers:
| Benchmark | K2.7-Code Improvement |
|---|---|
| Kimi Code Bench v2 | +21.8% |
| Program Bench | +11.0% |
| MLS Bench Lite (multi-language) | +31.5% |
On MLS Bench Lite, K2.7-Code scores 35.1 - nearly matching GPT-5.5's 35.5. On MCPMark Verified, it leads Opus 4.8 by about 5 points for tool invocation accuracy.
The honest caveat: These are Moonshot's own benchmarks. No independent third-party SWE-bench or equivalent scores exist for K2.7-Code yet. We cannot make an apples-to-apples comparison against Claude Fable 5 (95.0% SWE-bench Verified) or GPT-5.5 on identical tests. The model likely does not match Fable 5 on raw coding benchmarks - but that is not why you would run it.
The cost structure is where K2.7-Code gets interesting.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Cache Discount |
|---|---|---|---|
| Kimi K2.7-Code (Moonshot API) | $0.95 | $4.00 | $0.19 cached |
| Kimi K2.7-Code (OpenRouter) | $0.75 | $3.50 | - |
| Claude Fable 5 | $10.00 | $50.00 | 0.1x cached |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 0.1x cached |
| GPT-5.5 | $5.00 | $25.00 | - |
K2.7-Code is roughly 4x cheaper than Sonnet 4.6 on output and 12x cheaper than Fable 5. For agent loops that generate substantial output - code reviews, multi-file refactors, documentation generation - this adds up.
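The "4x" and "12x" figures are rounded. A quick sketch of the arithmetic, using only the output prices from the table above (the function name is just for illustration):

```python
# Output prices in USD per 1M tokens, copied from the pricing table in this article.
OUTPUT_PRICES = {
    "Kimi K2.7-Code (Moonshot API)": 4.00,
    "Kimi K2.7-Code (OpenRouter)": 3.50,
    "Claude Fable 5": 50.00,
    "Claude Sonnet 4.6": 15.00,
    "GPT-5.5": 25.00,
}


def price_ratio(model: str, baseline: str = "Kimi K2.7-Code (Moonshot API)") -> float:
    """How many times more expensive `model` is than `baseline` per output token."""
    return OUTPUT_PRICES[model] / OUTPUT_PRICES[baseline]


if __name__ == "__main__":
    for name in ("Claude Sonnet 4.6", "Claude Fable 5", "GPT-5.5"):
        print(f"{name}: {price_ratio(name):.2f}x the Moonshot API output price")
```

Against the Moonshot API price, Sonnet 4.6 comes out at 3.75x and Fable 5 at 12.5x. Passing the OpenRouter entry as `baseline` widens both gaps slightly.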
Moonshot also offers a CLI plan at $19/month for developers who want predictable costs.
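Whether the flat plan beats pay-as-you-go depends on your volume. The sketch below estimates a month of API spend from the Moonshot prices above, including the cached-input rate. The token volumes and cache fraction are hypothetical, and the plan's usage limits are not stated here, so treat the comparison as a rough guide only.

```python
# Moonshot API prices in USD per 1M tokens, as quoted in this article.
INPUT_PRICE = 0.95
CACHED_INPUT_PRICE = 0.19
OUTPUT_PRICE = 4.00
CLI_PLAN_MONTHLY = 19.00  # flat CLI plan price quoted in this article


def api_cost(input_tokens: int, output_tokens: int, cached_fraction: float = 0.0) -> float:
    """Estimated USD cost; `cached_fraction` is the share of input served from cache."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    total = fresh * INPUT_PRICE + cached * CACHED_INPUT_PRICE + output_tokens * OUTPUT_PRICE
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical month: 20M input tokens (half cached) and 3M output tokens.
    monthly = api_cost(20_000_000, 3_000_000, cached_fraction=0.5)
    cheaper = "CLI plan" if monthly > CLI_PLAN_MONTHLY else "pay-as-you-go API"
    print(f"Estimated API spend: ${monthly:.2f}; cheaper option on price alone: {cheaper}")
```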
Claude Code supports model routing through environment variables. To use K2.7-Code:
Get your API key from platform.kimi.ai. Navigate to Console, then API Keys.
Set environment variables:
export ANTHROPIC_BASE_URL="https://api.moonshot.ai/v1"
export ANTHROPIC_AUTH_TOKEN="your_kimi_api_key"
Claude Code functions identically; only the backend model changes. You retain full access to MCP servers, hooks, skills, and the terminal workflow.
For multi-tool setups with Cline or RooCode, the same environment variables apply. The Kimi API is Anthropic-compatible, so any tool that supports Anthropic routing works without code changes.
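If you would rather not export the variables in every shell, a small launcher script can set them for a single session. This is a minimal sketch, not an official integration: `KIMI_API_KEY` is a hypothetical variable name chosen for this example, and it assumes the `claude` executable is on your PATH. The base URL is the one given above.

```python
import os
import subprocess
from typing import Dict, Optional

KIMI_BASE_URL = "https://api.moonshot.ai/v1"  # value from the setup steps in this article


def kimi_env(api_key: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the two routing variables set."""
    env = dict(os.environ if base is None else base)
    env["ANTHROPIC_BASE_URL"] = KIMI_BASE_URL
    env["ANTHROPIC_AUTH_TOKEN"] = api_key
    return env


def launch_claude_code(api_key: str) -> int:
    """Start Claude Code with the Kimi routing variables; returns its exit code."""
    return subprocess.run(["claude"], env=kimi_env(api_key)).returncode


if __name__ == "__main__":
    key = os.environ.get("KIMI_API_KEY")  # hypothetical variable holding your Kimi key
    if not key:
        raise SystemExit("Set KIMI_API_KEY before running this launcher.")
    raise SystemExit(launch_claude_code(key))
```

Because the variables are set only on the child process, your regular shell keeps its default Claude Code configuration.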
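Good fits: cost-sensitive, high-volume agent work such as code reviews, multi-file refactors, and documentation generation.
Less ideal: critical tasks that need maximum accuracy, where Fable 5 remains the stronger choice.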
The 340GB model (INT4 weights) can run on vLLM, SGLang, or Docker Model Runner.
No official GGUF / Ollama / llama.cpp builds exist for K2.7-Code yet. Community GGUFs existed for K2.6 and will likely follow. For now, vLLM or SGLang on a proper GPU server is the self-hosting path.
Hardware requirements: You need substantial VRAM. The INT4 quantization helps, but 1T parameters (even with 32B active) still demand serious hardware - multiple A100s or equivalent.
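For a sense of scale, this sketch computes a lower bound on GPU count from the 340GB weight size alone. The `overhead_fraction` parameter is an assumption added here to reserve room for KV cache and activations. Real deployments also depend on the serving framework's parallelism constraints, so treat the result as a floor, not a sizing recommendation.

```python
import math

WEIGHTS_GB = 340  # INT4 weight size quoted in this article


def min_gpus(vram_per_gpu_gb: float, weights_gb: float = WEIGHTS_GB,
             overhead_fraction: float = 0.0) -> int:
    """Fewest GPUs whose usable VRAM can hold the weights.

    `overhead_fraction` is the assumed share of each GPU reserved for
    KV cache, activations, and runtime buffers.
    """
    if not 0.0 <= overhead_fraction < 1.0:
        raise ValueError("overhead_fraction must be in [0, 1)")
    usable = vram_per_gpu_gb * (1 - overhead_fraction)
    return math.ceil(weights_gb / usable)


if __name__ == "__main__":
    for vram in (40, 80):  # the two A100 memory variants
        print(f"A100 {vram}GB: at least {min_gpus(vram)} GPUs for weights alone")
    print(f"A100 80GB with 20% reserved: at least {min_gpus(80, overhead_fraction=0.2)} GPUs")
```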
| | K2 (July 2025) | K2.6 (March 2026) | K2.7-Code (June 2026) |
|---|---|---|---|
| Context | 128K | 256K | 256K |
| Focus | General | Balanced | Coding + agents |
| Token efficiency | Baseline | Improved | 30% fewer reasoning tokens |
| Tool use (MCPMark) | - | ~74% | 81.1% |
| License | Modified MIT | Modified MIT | Modified MIT |
If you were running K2.6 for coding work, K2.7-Code is a direct upgrade. If you were running K2 original, the jump is substantial on both context and efficiency.
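A realistic workflow for K2.7-Code with Claude Code: route high-volume work to K2.7-Code and reserve a stronger model such as Fable 5 for critical tasks.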
This matches how many teams already work with model routing - cheaper models for volume, expensive models for judgment calls.
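The routing idea can be sketched in a few lines. Everything here is hypothetical: the task labels, the model identifier strings, and the rule itself are illustrations, not part of Claude Code or Moonshot's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str
    reason: str


# Hypothetical task labels. The volume examples mirror the ones named in this
# article (code reviews, multi-file refactors, documentation generation).
VOLUME_TASKS = {"code_review", "refactor", "docs"}

# Hypothetical model identifiers, used only for illustration.
CHEAP_MODEL = "kimi-k2.7-code"
STRONG_MODEL = "claude-fable-5"


def pick_model(task_kind: str) -> Route:
    """Send known volume work to the cheaper model and everything else to the stronger one."""
    if task_kind in VOLUME_TASKS:
        return Route(CHEAP_MODEL, "volume work: optimize for cost")
    return Route(STRONG_MODEL, "judgment call or unknown task: optimize for accuracy")


if __name__ == "__main__":
    for kind in ("docs", "refactor", "architecture_decision"):
        route = pick_model(kind)
        print(f"{kind} -> {route.model} ({route.reason})")
```

Unknown task kinds fall through to the stronger model on purpose: misrouting a critical task usually costs more than overpaying for a routine one.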
What is Kimi K2.7-Code?
Kimi K2.7-Code is Moonshot AI's open-source coding model released June 12, 2026. It is a 1 trillion parameter Mixture-of-Experts model (32B active, 384 experts) with a 256K context window, specifically tuned for code generation, debugging, and agentic tool use. It uses roughly 30% fewer reasoning tokens than K2.6.
How do I use Kimi K2.7-Code with Claude Code?
Set two environment variables: ANTHROPIC_BASE_URL to https://api.moonshot.ai/v1 and ANTHROPIC_AUTH_TOKEN to your Kimi API key from platform.kimi.ai. Start Claude Code normally - it routes requests to Moonshot's API automatically.
Is Kimi K2.7-Code open source?
Yes. It is released under a Modified MIT license. Weights are available on Hugging Face and ModelScope. You can self-host with vLLM, SGLang, or Docker Model Runner, or use it through Moonshot's API or OpenRouter.
How much does Kimi K2.7-Code cost?
Through Moonshot's API: $0.95 per million input tokens, $4.00 per million output tokens, with cached input at $0.19 per million. Through OpenRouter: $0.75 input, $3.50 output. This is roughly 4x cheaper than Claude Sonnet 4.6 on output tokens.
How does Kimi K2.7-Code compare to Claude Fable 5?
Fable 5 scores 95.0% on SWE-bench Verified and is the current leader for raw coding accuracy. K2.7-Code does not have independent SWE-bench scores yet, so direct comparison is not possible. K2.7-Code is about 12x cheaper per output token and is open-source. Use K2.7-Code for cost-sensitive volume work; use Fable 5 for maximum accuracy on critical tasks.
Can I self-host Kimi K2.7-Code?
Yes. The INT4 weights are 340GB. You can run them with vLLM or SGLang on GPU servers. Hardware requirements are substantial - you need multiple high-end GPUs (A100 or equivalent). No official GGUF builds exist yet for consumer hardware.
What changed from K2.6 to K2.7-Code?
K2.7-Code uses approximately 30% fewer reasoning tokens than K2.6 while scoring higher on Moonshot's coding benchmarks. This means faster completions and lower costs for the same quality of output on coding tasks.
Does Kimi K2.7-Code work with MCP servers?
Yes. When routed through Claude Code, you get full access to MCP servers, hooks, skills, and the complete Claude Code feature set. The Kimi API is Anthropic-compatible, so MCP integration works without changes.