TL;DR
Microsoft unveiled seven in-house MAI models at Build 2026, including MAI-Code-1-Flash now shipping in GitHub Copilot. Here is what the MoE architecture, training data, and Copilot rollout mean for your team's toolchain decisions in H2 2026.
Microsoft spent years telling its AI story through OpenAI. At Build 2026, the company wrote its own chapter.
On June 2, 2026, Microsoft AI published seven in-house MAI models spanning reasoning, coding, image generation, voice synthesis, and transcription - all available through Microsoft Foundry. Two of those models matter most for developers right now: MAI-Thinking-1, a flagship reasoning model in private preview, and MAI-Code-1-Flash, which is already rolling out as the default model in GitHub Copilot for individual VS Code users.
Last updated: June 10, 2026
Microsoft's Build 2026 announcement, written by Mustafa Suleyman and the Microsoft AI team, introduced the complete MAI family.
The framing Microsoft used is notable: they call their development approach a "hill-climbing machine" - a co-designed pipeline meant to improve every component of model development continuously. The stated goal is Humanist Superintelligence, positioning AI as a tool subordinate to human control rather than a replacement for it.
The headline claim from Microsoft's MAI-Thinking-1 announcement is that it "is preferred to Sonnet 4.6 in our blind human side-by-side evaluations." Before accepting that at face value, it helps to understand exactly what was measured.
The evaluation used 1,276 tasks across single-turn and multi-turn conversations, run through professional raters from Surge. The criteria were practical: does the model understand the task, follow instructions, use the right level of detail, write clearly, and respect the user's time. These are reasonable signals for real-world usefulness.
What the evaluation does not tell you: it is a human preference study conducted by Microsoft against a specific version of Sonnet 4.6, not an independent third-party audit. Preference studies measure whether responses feel good to humans, not whether they produce correct code or pass test suites. That distinction matters depending on your use case.
On more objective benchmarks, MAI-Thinking-1 reports 97.0% on AIME 2025 and 94.5% on AIME 2026, with performance described as "toe-to-toe with Claude Opus 4.6 on SWE-Bench Pro." The full numbers are in Table 1 of the announcement.
| Dimension | MAI-Thinking-1 | Claude Sonnet 4.6 |
|---|---|---|
| Architecture | MoE, ~1T total / 35B active | Dense transformer |
| Context window | 256k tokens | 200k tokens |
| Human preference (blind eval) | Preferred (Microsoft study) | Baseline |
| SWE-Bench Pro | Matches Opus 4.6 (per Microsoft) | Not published for direct comparison |
| AIME 2025 | 97.0% | Not published for direct comparison |
| Availability | Private preview, Foundry | Generally available via API |
The context window is legitimately useful: 256k tokens fits approximately a 600-page document. If long-context enterprise workflows are your primary use case, that number is worth tracking.
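As a rough check on that figure, the sketch below derives the tokens-per-page density implied by the two numbers above and uses it to estimate whether a document fits in the window. Reading "256k" as 256,000 tokens is an assumption, and the function names are illustrative rather than part of any Microsoft tooling.

```python
def implied_tokens_per_page(context_tokens: int, pages: int) -> float:
    """Tokens per page implied by a context size and a page count."""
    return context_tokens / pages


def fits_in_context(pages: int, tokens_per_page: float, context_tokens: int) -> bool:
    """True if a document of `pages` pages is estimated to fit in the window."""
    return pages * tokens_per_page <= context_tokens


CONTEXT_TOKENS = 256_000  # assumption: "256k" read as 256,000 tokens
PAGES = 600               # page count quoted in the article

density = implied_tokens_per_page(CONTEXT_TOKENS, PAGES)
print(f"Implied density: {density:.0f} tokens per page")
print("500 pages fit:", fits_in_context(500, density, CONTEXT_TOKENS))
print("700 pages fit:", fits_in_context(700, density, CONTEXT_TOKENS))
```

The implied density is about 427 tokens per page. Real documents vary widely with formatting, language, and tokenizer, so treat the page count as an order-of-magnitude guide.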
MAI-Code-1-Flash is the model developers will actually encounter first, because it is rolling out now to GitHub Copilot individual users in VS Code - both through the explicit model picker and through the Auto picker.
The architecture: 137B total parameters, 5B active, Mixture of Experts. Microsoft describes it as "comparable to Haiku but cheaper," and the benchmarks it published compare it against Claude Haiku 4.5 rather than Sonnet or Opus.
On that comparison, MAI-Code-1-Flash claims a +16-point lead on SWE-Bench Pro (51.2% vs. 35.2% for Haiku 4.5), higher pass rates on all four benchmarks tested (SWE-Bench Verified, SWE-Bench Pro, SWE-Bench Multilingual, Terminal Bench 2), and up to 60% fewer tokens on SWE-Bench Verified.
The 60% token reduction claim connects to the "adaptive solution length control" Microsoft built into the model - it adjusts response depth to task complexity, staying concise for simple completions and spending more reasoning budget on complex changes. That efficiency translates directly to latency and cost for Copilot users on usage-based plans.
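To see how a token reduction feeds into cost, here is a minimal sketch. The baseline token count and the per-million-token price are hypothetical placeholders, not published Copilot or Foundry pricing; only the 60% figure comes from Microsoft's claim, and it is an upper bound ("up to") measured on SWE-Bench Verified.

```python
def tokens_after_reduction(baseline_tokens: float, reduction: float) -> float:
    """Tokens remaining after cutting `reduction` (0..1) of the baseline."""
    return baseline_tokens * (1.0 - reduction)


def output_cost_usd(tokens: float, usd_per_million_tokens: float) -> float:
    """Cost of generating `tokens` tokens at a flat per-million price."""
    return tokens / 1_000_000 * usd_per_million_tokens


BASELINE_TOKENS = 10_000   # hypothetical tokens spent on one task
PRICE_PER_MILLION = 1.00   # hypothetical price, NOT a published rate
REDUCTION = 0.60           # "up to 60% fewer tokens" (best case)

reduced_tokens = tokens_after_reduction(BASELINE_TOKENS, REDUCTION)
baseline_cost = output_cost_usd(BASELINE_TOKENS, PRICE_PER_MILLION)
reduced_cost = output_cost_usd(reduced_tokens, PRICE_PER_MILLION)

print(f"Tokens: {BASELINE_TOKENS} -> {reduced_tokens:.0f}")
print(f"Cost:   ${baseline_cost:.4f} -> ${reduced_cost:.4f}")
```

At an equal per-token price, cost falls in proportion to tokens, so the best case leaves 40% of the baseline spend. Any difference in per-token price between models compounds with that.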
For a deeper look at how model routing decisions play out in Copilot, see /blog/mai-code-1-flash-model-routing.
Both MAI models use a Mixture of Experts (MoE) architecture. This is the key technical fact that gets lost when model sizes are reported as single numbers.
In a dense transformer, every parameter activates for every token. In an MoE model, a router sends each token to only a subset of "expert" sub-networks, so only those experts' parameters do work for that token. MAI-Code-1-Flash has 137B total parameters but only 5B active at inference time. MAI-Thinking-1 has roughly 1 trillion total parameters with 35B active.
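The routing idea can be sketched with a toy top-k gate. This is generic top-k MoE routing, not Microsoft's router: the eight experts, the value of k, the hard-coded router scores, and every function name here are assumptions chosen for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_layer(x, experts, router_logits, k=2):
    """Run only the k selected experts and mix their outputs by gate weight."""
    routed = top_k_route(router_logits, k)
    output = sum(w * experts[i](x) for i, w in routed)
    return output, [i for i, _ in routed]


# Eight toy "experts": expert i simply multiplies its input by i + 1.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
# Hypothetical router scores for one token; a real router computes these.
router_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

output, used = moe_layer(10.0, experts, router_logits, k=2)
print("experts used:", used, "of", len(experts))
print("output:", round(output, 2))
```

Only two of the eight experts run for this token. The other six contribute parameters to the total count but no compute, which is the gap between "total" and "active" parameters.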
The practical consequences:
What MoE makes cheaper: Per-token compute, and with it most of the inference cost and latency, scales with active parameters, not total parameters. A model with 5B active parameters runs closer to the compute cost of a 5B dense model than a 137B one, although all 137B parameters still have to be held in memory. This is why Microsoft can describe MAI-Code-1-Flash as "comparable to Haiku but cheaper" while claiming benchmark performance above it.
What MoE makes harder: Total parameter count still affects what knowledge the model can store. Routing quality matters - if the wrong experts activate, the model degrades. MoE models can also be more sensitive to fine-tuning than dense models.
For everyday Copilot completions - autocomplete, short function generation, inline chat - the 5B active parameter count is the relevant number. For multi-step agentic tasks that require broad reasoning, how well the routing and expert selection work across a complex codebase is an open question that only production usage will answer.
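The active-versus-total gap is easy to quantify from the published parameter counts. The sketch below reads "~1T" as 1,000B (an assumption) and uses the common rule of thumb of roughly 2 FLOPs per active parameter per generated token, which ignores attention and routing overhead; it is an estimate, not a figure from Microsoft.

```python
# Parameter counts in billions, as quoted in the article.
MODELS = {
    "MAI-Code-1-Flash": {"total_b": 137.0, "active_b": 5.0},
    "MAI-Thinking-1": {"total_b": 1000.0, "active_b": 35.0},  # "~1T" read as 1000B
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active_b / total_b


def approx_forward_flops_per_token(active_b: float) -> float:
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per active parameter."""
    return 2.0 * active_b * 1e9


for name, p in MODELS.items():
    frac = active_fraction(p["total_b"], p["active_b"])
    flops = approx_forward_flops_per_token(p["active_b"])
    print(f"{name}: {frac:.1%} active, ~{flops:.1e} FLOPs per token")
```

Both models activate roughly 3.5% of their parameters per token, so by this estimate MAI-Thinking-1 costs about seven times as much compute per token as MAI-Code-1-Flash, not the 7.3x that total size alone would suggest.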
Beyond the models themselves, Microsoft shipped Microsoft Scout - its first "Autopilot" agent built on its OpenClaw architecture, running inside Microsoft Teams.
Scout is described as "always-on" and takes proactive actions: scheduling meetings, prepping materials, surfacing relevant documents before a meeting starts. The key distinction from tools like GitHub Copilot or Claude Code is the workflow orientation. Copilot and Claude Code are reactive - you prompt, they respond. Scout is designed to act without being asked, monitoring your calendar, email, and documents to anticipate needs.
For developers, Scout is not a coding tool. It is a workflow coordination layer. The relevant question is whether always-on agents with write access to your calendar and documents fit your team's risk tolerance, and what the audit trail looks like when Scout takes an action on your behalf.
Microsoft emphasized clean, enterprise-grade, commercially licensed training data across all MAI announcements. The phrase "without distillation from third-party models" appears repeatedly. This was framed as a differentiator.
The MAI-Thinking-1 technical paper tells a more detailed story. As Simon Willison documented after reviewing the paper, the training data includes a large proprietary web crawl:
"After initial page discovery and selection, approximately 1.2 trillion pages are crawled and parsed... this filtering reduces the corpus from 1.2 trillion pages to 794 billion pages."
The paper also describes processing Common Crawl through the same pipeline, resulting in 24.2 billion pages after deduplication.
This is not unique to Microsoft - essentially every large language model trains on web-scale crawl data. But it does qualify the "clean and appropriately licensed" framing. The 794 billion filtered pages come from Microsoft's own crawl of the public web, which carries the same licensing ambiguities as every other model trained on similar data.
What Microsoft does appear to mean by "appropriately licensed" is that they apply content filters, remove AI-generated content domains, and use an explicit block list for adult and piracy-related domains. That is meaningful engineering work, but it is not the same as a model trained exclusively on explicitly licensed datasets.
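For a sense of scale, the figures quoted from the paper can be turned into a retention rate. This is plain arithmetic on the three page counts above; the variable names are illustrative, and nothing here says how much the proprietary crawl and Common Crawl overlap.

```python
# Page counts in billions, as quoted from the technical paper.
CRAWLED_PAGES_B = 1200.0      # 1.2 trillion pages crawled and parsed
FILTERED_PAGES_B = 794.0      # pages remaining after filtering
COMMON_CRAWL_PAGES_B = 24.2   # Common Crawl pages after deduplication

retained = FILTERED_PAGES_B / CRAWLED_PAGES_B
removed = 1.0 - retained
size_ratio = FILTERED_PAGES_B / COMMON_CRAWL_PAGES_B

print(f"Retained after filtering: {retained:.1%}")
print(f"Removed by filtering:     {removed:.1%}")
print(f"Proprietary crawl vs. Common Crawl subset: {size_ratio:.1f}x")
```

Filtering removes roughly a third of the crawl, and the filtered proprietary corpus is about 33 times the size of the processed Common Crawl subset.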
The transition is already underway.
The picture that emerges is a Copilot model picker that now includes both Microsoft's in-house MAI models and third-party models from Anthropic. GPT-5.2 and GPT-5.2-Codex were deprecated in GitHub Copilot on June 5, according to the GitHub changelog.
For Enterprise and Business administrators: the Claude Fable 5 policy is off by default and requires explicit enablement. MAI-Code-1-Flash, as a Microsoft-native model, rolls out without an admin toggle for individual users.
For more on GitHub Copilot's overall model landscape, see /blog/github-copilot-guide.
Neither tool is universally better. The right choice depends on your workflow surface, billing model, and how you weight different tradeoffs.
| Factor | GitHub Copilot + MAI-Code-1-Flash | Claude Code |
|---|---|---|
| Primary surface | IDE (VS Code, JetBrains, Xcode, Eclipse) | Terminal and CLI |
| Model at default tier | MAI-Code-1-Flash (5B active, MoE) | Claude Sonnet 4.5 (configurable) |
| Latency for completions | Low (5B active params) | Moderate (denser models) |
| Agentic task handling | Improving with coding agent | Strong, purpose-built for long-horizon tasks |
| Enterprise admin controls | Granular per-model policies | API key + org-level controls |
| Data retention | Zero by default (Claude Fable 5 requires 30-day retention) | Configurable |
| Pricing model | Per seat + usage-based for premium models | Usage-based via API |
| Training data transparency | Technical paper published | Model cards + published policies |
Use Copilot + MAI-Code-1-Flash if: your team lives in VS Code or JetBrains, your primary need is fast inline completions and short-context chat, you are already on a Copilot plan, and you want the lowest latency for everyday coding tasks.
Use Claude Code if: your workflows involve multi-file agentic tasks from the terminal, you want the flexibility to switch models mid-session, or you are running autonomous coding agents that span repositories and require long-horizon planning.
The case for both: they are not mutually exclusive. Many teams are landing on Copilot for IDE completions and Claude Code for terminal-driven agent runs. The Copilot model picker means you can also pull Fable 5 into Copilot for heavyweight tasks without leaving the IDE.
For a broader comparison of pricing across AI coding tools, start at /blog/github-copilot-coding-agent-cli-2026.
| Resource | Link |
|---|---|
| Build 2026: Seven MAI models announcement | microsoft.ai/news/building-a-hillclimbing-machine-launching-seven-new-mai-models |
| MAI-Thinking-1 announcement | microsoft.ai/news/introducing-mai-thinking-1 |
| MAI-Thinking-1 technical paper | microsoft.ai/wp-content/uploads/2026/06/main_20260602_2.pdf |
| MAI-Code-1-Flash announcement | microsoft.ai/news/introducingmai-code-1-flash |
| MAI-Code-1-Flash model card (PDF) | microsoft.ai/pdf/MAI-Code-1-Flash-Model-Card.PDF |
| Build 2026 live blog | news.microsoft.com/build-2026-live-blog |
| Claude Fable 5 in GitHub Copilot | github.blog/changelog/2026-06-09-claude-fable-5-is-generally-available-for-github-copilot |
| GitHub Copilot model documentation | docs.github.com/copilot/reference/ai-models/supported-models |
| Microsoft Foundry | foundry.microsoft.com |