Kimi K2.7-Code Developer Guide: The Open-Source Coding Model Worth Running

Q: How do I use Kimi K2.7-Code with Claude Code?

Set two environment variables: `ANTHROPIC_BASE_URL` to `https://api.moonshot.ai/v1` and `ANTHROPIC_AUTH_TOKEN` to your Kimi API key from platform.kimi.ai. Start Claude Code normally - it routes requests to Moonshot's API automatically.

Official Sources#

Resource	Link
Hugging Face Model	moonshotai/Kimi-K2.7-Code
Moonshot API Platform	platform.kimi.ai
API Documentation	platform.kimi.ai/docs
API Pricing	platform.kimi.ai/docs/pricing
Claude Code Integration	platform.kimi.ai/docs/guide/agent-support
OpenRouter	openrouter.ai/moonshotai/kimi-k2.7-code
GitHub Copilot Changelog	github.blog/changelog/2026-07-01-kimi-k2-7-is-now-available-in-github-copilot

Last updated: July 12, 2026

What Changed on July 12#

GitHub Copilot integration is now GA. On July 1, 2026, Kimi K2.7 Code became the first open-weight model available in GitHub Copilot's model picker - a significant milestone for open-weight models in enterprise tooling.
Five-lab coverage. Copilot now routes across OpenAI, Anthropic, Google, Microsoft, and Moonshot AI under a single subscription.
Enterprise controls. Business and Enterprise plans have the model off by default - admins must explicitly enable it. Prompts route through Azure, not Moonshot servers.

Kimi K2.7-Code dropped on Hugging Face on June 12, 2026. It is Moonshot AI's coding-focused variant of the K2 family - a 1 trillion parameter Mixture-of-Experts model with 32 billion active parameters, 384 experts, and a 256K context window. The release comes under a Modified MIT license, making it one of the largest open-weight coding models available.

The headline improvement: K2.7-Code uses roughly 30% fewer reasoning tokens than K2.6 while scoring higher on Moonshot's internal coding benchmarks. For developers running long agent loops, fewer tokens means lower costs and faster completions.

What Changed from K2.6 to K2.7-Code#

K2.7-Code is not a general-purpose update. It is tuned specifically for coding and agentic workflows:

Code generation and debugging - the primary focus of the fine-tuning
Tool use - K2.7-Code scores 81.1% on MCPMark Verified (tool invocation benchmark), ahead of Claude Opus 4.8's 76.4%
Multi-step programming workflows - better at sustaining context across long agent sessions
30% token reduction - uses fewer "thinking" tokens while maintaining or improving output quality

The weights ship in native INT4 using the quantization-aware training method Moonshot introduced with K2 Thinking. This makes the model more practical to self-host without sacrificing the quality you would get from a full-precision run.

Benchmarks - What We Know and Don't Know#

Moonshot reports strong internal numbers:

Benchmark	K2.7-Code Improvement
Kimi Code Bench v2	+21.8%
Program Bench	+11.0%
MLS Bench Lite (multi-language)	+31.5%

On MLS Bench Lite, K2.7-Code scores 35.1 - nearly matching GPT-5.5's 35.5. On MCPMark Verified, it leads Opus 4.8 by about 5 points for tool invocation accuracy.

The honest caveat: These are Moonshot's own benchmarks. No independent third-party SWE-bench or equivalent scores exist for K2.7-Code yet. We cannot make an apples-to-apples comparison against Claude Fable 5 (95.0% SWE-bench Verified) or GPT-5.5 on identical tests. The model likely does not match Fable 5 on raw coding benchmarks - but that is not why you would run it.

Pricing Comparison#

The cost structure is where K2.7-Code gets interesting.

Model	Input (per 1M tokens)	Output (per 1M tokens)	Cached Input
Kimi K2.7-Code (Moonshot API)	$0.95	$4.00	$0.19 per 1M tokens
Kimi K2.7-Code (OpenRouter)	$0.75	$3.50	-
Claude Fable 5	$10.00	$50.00	0.1x input price
Claude Sonnet 4.6	$3.00	$15.00	0.1x input price
GPT-5.5	$5.00	$25.00	-

At Moonshot's API price, K2.7-Code output costs about 3.75x less than Sonnet 4.6 ($4.00 vs. $15.00 per 1M tokens) and 12.5x less than Fable 5 ($4.00 vs. $50.00). For agent loops that generate substantial output - code reviews, multi-file refactors, documentation generation - this adds up.
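To make the ratios concrete, here is a minimal sketch that prices one workload against the table above. The per-token prices are copied from the table; the workload size (20M input tokens, 5M output tokens) is a made-up example, and the calculation ignores cached-input discounts, so treat the totals as rough upper bounds rather than a bill.

```python
# Prices in USD per 1M tokens, copied from the pricing table in this article.
PRICES = {
    "Kimi K2.7-Code (Moonshot API)": {"input": 0.95, "output": 4.00},
    "Kimi K2.7-Code (OpenRouter)": {"input": 0.75, "output": 3.50},
    "Claude Fable 5": {"input": 10.00, "output": 50.00},
    "Claude Sonnet 4.6": {"input": 3.00, "output": 15.00},
    "GPT-5.5": {"input": 5.00, "output": 25.00},
}

BASELINE = "Kimi K2.7-Code (Moonshot API)"


def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload, ignoring any cache discount."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def output_price_ratio(model: str, baseline: str = BASELINE) -> float:
    """How many times more expensive `model` output is than `baseline` output."""
    return PRICES[model]["output"] / PRICES[baseline]["output"]


# Hypothetical workload: 20M input tokens and 5M output tokens.
INPUT_TOKENS = 20_000_000
OUTPUT_TOKENS = 5_000_000

for name in PRICES:
    cost = run_cost(name, INPUT_TOKENS, OUTPUT_TOKENS)
    ratio = output_price_ratio(name)
    print(f"{name:32s} ${cost:8.2f}  output price {ratio:.2f}x baseline")
```

For this workload the Moonshot API comes to about $39, Sonnet 4.6 to about $135, and Fable 5 to about $450. The gap widens as the share of output tokens grows, which is why output-heavy agent loops benefit most.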

Moonshot also offers a CLI plan at $19/month for developers who want predictable costs.

Setting Up with Claude Code#

Claude Code supports model routing through environment variables. To use K2.7-Code:

1. Get your API key from platform.kimi.ai: open Console, then API Keys.
2. Set the environment variables in your terminal:

export ANTHROPIC_BASE_URL="https://api.moonshot.ai/v1"
export ANTHROPIC_AUTH_TOKEN="your_kimi_api_key"

3. Start Claude Code. You will see a message about the API base being overridden - that confirms the routing is active.

Claude Code's interface and workflow stay the same; only the backend model changes. You keep access to MCP servers, hooks, skills, and the terminal workflow.

For multi-tool setups with Cline or RooCode, the same environment variables apply. The Kimi API is Anthropic-compatible, so any tool that supports Anthropic routing works without code changes.
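If you would rather not export the variables shell-wide, one option is a small launcher that sets them only for the Claude Code process. The sketch below is an illustration, not an official integration. It assumes the `claude` CLI is on your PATH, uses the base URL exactly as quoted in this article (check Moonshot's agent-support guide for the current path), and the helper names `kimi_env` and `launch_claude_code` are invented for this example.

```python
import os
import shutil
import subprocess
from typing import Mapping, Optional

# Base URL as quoted in this article; verify against Moonshot's docs before use.
MOONSHOT_BASE_URL = "https://api.moonshot.ai/v1"


def kimi_env(api_key: str, base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with the two routing variables set.

    `base` defaults to the current process environment and is never mutated.
    """
    if not api_key:
        raise ValueError("a Kimi API key is required")
    env = dict(os.environ if base is None else base)
    env["ANTHROPIC_BASE_URL"] = MOONSHOT_BASE_URL
    env["ANTHROPIC_AUTH_TOKEN"] = api_key
    return env


def launch_claude_code(api_key: str) -> int:
    """Start Claude Code with the overrides applied to that process only."""
    exe = shutil.which("claude")
    if exe is None:
        raise FileNotFoundError("the `claude` CLI was not found on PATH")
    return subprocess.call([exe], env=kimi_env(api_key))


# Usage (hypothetical variable name for where you keep the key):
#   launch_claude_code(os.environ["KIMI_API_KEY"])
```

Because the overrides live in the child process's environment, other terminals and tools on the same machine keep talking to whatever backend they were already configured for.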

GitHub Copilot Integration#

On July 1, 2026 - just 19 days after the Hugging Face release - Kimi K2.7 Code became generally available in GitHub Copilot's model picker. This is the first open-weight model from any vendor to appear alongside Claude, GPT, and Gemini in Copilot.

What this means:

Five-lab coverage. Copilot now routes across OpenAI, Anthropic, Google, Microsoft, and Moonshot AI - the only major coding tool with this breadth under a single subscription.
Azure hosting. Prompts route through Microsoft Azure rather than Moonshot's own servers, which matters for enterprise compliance.
Platform availability. Works in VS Code, Visual Studio, JetBrains, Xcode, Eclipse, Copilot CLI, GitHub.com, and GitHub Mobile.
Plan support. Available on Pro, Pro+, and Max plans.

Enterprise considerations:

Business and Enterprise plans have K2.7 Code off by default. GitHub's documentation notes the model may be less aligned than other Copilot models, recommending administrators review it against their security and compliance requirements before enabling. This is standard caution for open-weight models entering enterprise tooling.

The practical implication: if you are already on Copilot, you can now access K2.7's cost efficiency without managing API routing or environment variables. Select it from the model picker and use it for cost-sensitive bulk work while keeping Claude or GPT for high-stakes tasks.

When to Use K2.7-Code#

Good fits:

Cost-sensitive agent loops - If you are running overnight agents or continuous background tasks where token costs dominate
Tool-heavy workflows - The MCPMark scores suggest strong tool invocation accuracy, relevant for MCP-heavy setups
Multi-language projects - The MLS Bench Lite scores show competitive performance across languages
Self-hosting requirements - Open weights under a Modified MIT license mean you can run it on your own infrastructure

Less ideal:

Maximum coding accuracy - Fable 5 and GPT-5.5 likely outperform on complex debugging and architectural decisions
Production systems requiring third-party validation - Until independent benchmarks exist, risk-sensitive deployments may want proven models
Native Claude Code features - Some Claude-specific optimizations (context compaction, prompt caching) may not transfer perfectly through API routing

Self-Hosting Options#

The 340GB model (INT4 weights) can run on:

vLLM - the recommended path for most GPU servers
SGLang - alternative runtime with similar performance
Docker Model Runner - containerized deployment

No official GGUF / Ollama / llama.cpp builds exist for K2.7-Code yet. Community GGUFs existed for K2.6 and will likely follow. For now, vLLM or SGLang on a proper GPU server is the self-hosting path.

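Hardware requirements: You need substantial VRAM. The INT4 quantization helps, but 1T parameters (even with 32B active) still demand serious hardware - multiple A100s or equivalent. All experts must be resident in memory, so the 32B active figure reduces compute per token, not the memory footprint.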
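As a back-of-the-envelope check on "multiple A100s", the sketch below divides the 340GB weight size quoted above by per-GPU memory. The 80GB and 40GB figures are the two A100 memory variants; `overhead_fraction` is a hypothetical knob standing in for KV cache, activations, and runtime buffers, which this article does not quantify.

```python
import math

WEIGHTS_GB = 340  # INT4 weight size quoted in this article


def min_gpus_for_weights(
    weights_gb: float, gpu_memory_gb: float, overhead_fraction: float = 0.0
) -> int:
    """Smallest GPU count whose combined memory holds the weights plus overhead.

    This is a lower bound on capacity only; it says nothing about throughput
    or about which parallelism sizes a given runtime accepts.
    """
    if gpu_memory_gb <= 0:
        raise ValueError("gpu_memory_gb must be positive")
    if overhead_fraction < 0:
        raise ValueError("overhead_fraction must be non-negative")
    required_gb = weights_gb * (1 + overhead_fraction)
    return math.ceil(required_gb / gpu_memory_gb)


print(min_gpus_for_weights(WEIGHTS_GB, 80))        # weights only, A100 80GB
print(min_gpus_for_weights(WEIGHTS_GB, 80, 0.25))  # assumed 25% headroom
print(min_gpus_for_weights(WEIGHTS_GB, 40))        # weights only, A100 40GB
```

Weights alone need at least five 80GB cards, and six with the assumed 25% headroom. In practice you would likely round up to a tensor-parallel size your runtime supports, which on a typical server means a full 8-GPU node.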

Comparison: K2.7-Code vs K2.6 vs K2 (Original)#

	K2 (July 2025)	K2.6 (March 2026)	K2.7-Code (June 2026)
Context	128K	256K	256K
Focus	General	Balanced	Coding + agents
Token efficiency	Baseline	Improved	30% fewer reasoning tokens
Tool use (MCPMark)	-	~74%	81.1%
License	Modified MIT	Modified MIT	Modified MIT

If you were running K2.6 for coding work, K2.7-Code is a direct upgrade. If you were running the original K2, the jump is substantial on both context and efficiency.

Practical Workflow#

A realistic workflow for K2.7-Code with Claude Code:

Development and prototyping - Use K2.7-Code for the bulk of coding work where cost matters
Critical reviews - Route to Fable 5 or Opus 4.8 for architectural decisions, security reviews, or complex debugging
Production agents - K2.7-Code for high-volume, tool-heavy agent loops; frontier models for customer-facing or high-stakes tasks

This matches how many teams already work with model routing - cheaper models for volume, expensive models for judgment calls.
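The split described above can be expressed as a small routing rule. This is a sketch of the idea, not a recommended production router: the task categories mirror the workflow list, while the model identifiers are placeholders to replace with whatever IDs your provider or tool actually exposes.

```python
from enum import Enum


class TaskKind(Enum):
    PROTOTYPING = "prototyping"
    AGENT_LOOP = "agent_loop"
    ARCHITECTURE_REVIEW = "architecture_review"
    SECURITY_REVIEW = "security_review"
    COMPLEX_DEBUGGING = "complex_debugging"
    CUSTOMER_FACING = "customer_facing"


# Placeholder identifiers; substitute the IDs your setup really uses.
BUDGET_MODEL = "kimi-k2.7-code"
FRONTIER_MODEL = "frontier-model"

# Tasks where a wrong answer is expensive go to the frontier model.
HIGH_STAKES = frozenset(
    {
        TaskKind.ARCHITECTURE_REVIEW,
        TaskKind.SECURITY_REVIEW,
        TaskKind.COMPLEX_DEBUGGING,
        TaskKind.CUSTOMER_FACING,
    }
)


def pick_model(kind: TaskKind) -> str:
    """Route judgment-heavy work to the frontier model, volume work to K2.7-Code."""
    return FRONTIER_MODEL if kind in HIGH_STAKES else BUDGET_MODEL


for kind in TaskKind:
    print(f"{kind.value:20s} -> {pick_model(kind)}")
```

Keeping the high-stakes set explicit means that any new task kind defaults to the cheaper model until someone deliberately adds it to the set.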

FAQ#

What is Kimi K2.7-Code?#

Kimi K2.7-Code is Moonshot AI's open-source coding model released June 12, 2026. It is a 1 trillion parameter Mixture-of-Experts model (32B active, 384 experts) with a 256K context window, specifically tuned for code generation, debugging, and agentic tool use. It uses 30% fewer reasoning tokens than K2.6.

How do I use Kimi K2.7-Code with Claude Code?#

Set two environment variables: ANTHROPIC_BASE_URL to https://api.moonshot.ai/v1 and ANTHROPIC_AUTH_TOKEN to your Kimi API key from platform.kimi.ai. Start Claude Code normally - it routes requests to Moonshot's API automatically.

Is Kimi K2.7-Code open source?#

Yes, in the open-weight sense. The weights are released under a Modified MIT license and are available on Hugging Face and ModelScope. You can self-host with vLLM, SGLang, or Docker Model Runner, or use the model through Moonshot's API or OpenRouter.

How much does Kimi K2.7-Code cost?#

Through Moonshot's API: $0.95 per million input tokens, $4.00 per million output tokens, with cached input at $0.19. Through OpenRouter: $0.75 input, $3.50 output. This is roughly 4x cheaper than Claude Sonnet 4.6 on output tokens.

How does K2.7-Code compare to Claude Fable 5?#

Fable 5 scores 95.0% on SWE-bench Verified and is the current leader for raw coding accuracy. K2.7-Code does not have independent SWE-bench scores yet, so direct comparison is not possible. K2.7-Code is about 12x cheaper per output token and has open weights. Use K2.7-Code for cost-sensitive volume work; use Fable 5 for maximum accuracy on critical tasks.

Can I self-host Kimi K2.7-Code?#

Yes. The INT4 weights are 340GB. You can run them with vLLM or SGLang on GPU servers. Hardware requirements are substantial - you need multiple high-end GPUs (A100 or equivalent). No official GGUF builds exist yet for consumer hardware.

What is the token efficiency improvement?#

K2.7-Code uses approximately 30% fewer reasoning tokens than K2.6 while scoring higher on Moonshot's coding benchmarks. This means faster completions and lower costs for the same quality of output on coding tasks.

Does K2.7-Code work with MCP servers?#

Yes. When routed through Claude Code, you keep access to MCP servers, hooks, and skills. The Kimi API is Anthropic-compatible, so MCP integration works without changes, though some Claude-specific optimizations such as context compaction and prompt caching may not transfer perfectly.

Can I use Kimi K2.7 Code in GitHub Copilot?#

Yes. As of July 1, 2026, K2.7 Code is available in the Copilot model picker for Pro, Pro+, and Max plans. Select it from the model dropdown in VS Code, Visual Studio, JetBrains, or other supported editors. Business and Enterprise plans require admin enablement. It is the first open-weight model available in Copilot.

Sources#

Kimi K2.7-Code Hugging Face (accessed June 14, 2026)
Moonshot API Platform (accessed June 14, 2026)
OpenRouter Kimi K2.7-Code (accessed June 14, 2026)
Kimi K2.7-Code Release (accessed June 14, 2026)
Claude Code Integration Guide (accessed June 14, 2026)
