Leanstral 1.5: Mistral's Open Theorem-Proving Model Hits 100% on miniF2F

Mistral has released Leanstral 1.5, an open-weight model specialized for formal theorem proving in Lean 4. The headline numbers are striking: 100% on the miniF2F benchmark (both validation and test sets), 587 out of 672 problems solved on PutnamBench, and state-of-the-art results on the FATE-H and FATE-X evaluation suites.

The model is licensed Apache 2.0 and weighs in at 119B total parameters with only 6B active - a sparse mixture-of-experts architecture that makes it runnable on consumer hardware while maintaining frontier-level performance on formal proof tasks.

Last updated: July 4, 2026

What Leanstral 1.5 Does

Leanstral operates in two specialized environments designed for Lean 4 development:

Multiturn Proof Environment: The model receives theorem statements, submits proof attempts, receives compiler feedback, and iteratively refines its approach until the proof compiles successfully. This mirrors how human mathematicians work with proof assistants - write, get errors, fix, repeat.

Code Agent Environment: Beyond pure proving, Leanstral can function as a development agent - editing files, running bash commands, and using the Lean language server for real-time inspection of goals and type errors. This is closer to how developers actually interact with Lean in practice.

The practical result is a model that can take a theorem statement and, given sufficient token budget, produce a machine-checked proof without human intervention.

The Benchmark Claims

Mistral's published numbers:

Benchmark	Leanstral 1.5	Notes
miniF2F (validation)	100%	Full saturation
miniF2F (test)	100%	Full saturation
PutnamBench	587/672	At 4M token budget
FATE-H	87%	State-of-the-art
FATE-X	34%	State-of-the-art
FLTEval	Surpasses Claude Opus	At 1/7th the cost

The miniF2F saturation is significant because this benchmark has been a standard evaluation for theorem-proving systems. Reaching 100% means the benchmark is no longer useful for differentiating models on this task - Leanstral has effectively solved it.

PutnamBench measures performance on competition-level mathematics problems. Solving 87% (587/672) with a 4M token budget demonstrates strong test-time scaling - the model gets better with more compute.

What HN Is Saying

The Hacker News discussion generated substantive debate about the practical implications and some skepticism about the marketing claims.

The bug-finding example drew fire. Mistral highlighted that Leanstral found an overflow bug in the datrs/varinteger Rust library - "an edge case that testing and fuzzing would typically miss." Multiple commenters pushed back hard on this characterization. One pointed out that any property-based testing system invented since 1980 would explore boundary values like U64.MAX. Another reproduced the bug in seconds using proptest.

The consensus view: the bug was real, but calling it something "testing would typically miss" overstates the case. Fuzz testing with boundary value exploration would catch this routinely. The value of formal verification is proving absence of bugs, not finding obvious ones that good testing would catch anyway.

An OpenAI employee weighed in. One commenter (disclosing they work at OpenAI) ran GPT-5.5 High on the same varinteger repository and found the identical bug. Their point: this particular bug was not tricky; the repository simply lacked attention. The interesting question is whether Leanstral can prove properties that LLM-based bug finding cannot.

The comparison chart timing is awkward. The article compares Leanstral to models from "half a year ago" - several generations behind in the current pace of releases. Commenters noted this is a familiar pattern in benchmark marketing: compare to a snapshot of competitors rather than current versions.

The size efficiency is genuinely impressive. At 6B active parameters (119B total with sparse activation), Leanstral is dramatically smaller than the models it outperforms on these benchmarks. Several commenters noted this is the real story - not that it beats large models, but that it does so at 1/50th the active parameter count.

European AI labs are finding their niche. Some discussion touched on Mistral's strategy of targeting specialized domains (OCR, theorem proving) where they can achieve frontier performance without competing head-to-head with OpenAI and Anthropic on general capabilities. This is pragmatic: France has strong historical expertise in formal methods (Coq, OCaml ecosystem), and Mistral is leveraging that heritage.

Newsletter

Get the weekly deep dive

Tutorials on Claude Code, AI agents, and dev tools, delivered free every week.

From the archive

Agent Studio: Authoring the Roles, Not Just the Knowledge

Jul 3, 2026 • 9 min read

App Builder: From a Prompt to a Working App You Can Watch Run

Jul 3, 2026 • 8 min read

One Endpoint, Every Capability: A Reference Architecture for Progressive Disclosure

Jul 3, 2026 • 10 min read

Best AI Agent Memory Providers in 2026: Mem0 vs Zep vs Letta vs Cloudflare

Jul 2, 2026 • 11 min read

Real-World Bug Finding

Beyond benchmarks, Mistral claims Leanstral discovered "5 previously unknown bugs across 57 repositories tested." The most interesting was the varinteger overflow:

// On input Std.U64.MAX, the expression (value + 1) overflowed
// Crashes in debug mode, silent corruption in release mode

The bug was filed as datrs/varinteger#8 a week before the Leanstral announcement. The library is small (about 1k downloads/day on crates.io) and hadn't been touched in 8 years - exactly the kind of low-attention code where automated verification adds value.

The broader point: formal verification tools are not primarily about finding bugs that testing misses. They're about proving properties hold for all inputs, which testing fundamentally cannot do. The bug-finding framing is easier to market, but it undersells the actual capability.

How Developers Can Use It

Leanstral 1.5 is available through:

Mistral Vibe - Free API endpoint
Hugging Face - Downloadable weights
OpenATP - An open-source Python package for automated theorem provers that supports Leanstral natively (GitHub)

The practical workflow involves writing Lean 4 code with theorem statements, then using Leanstral to generate proofs. The model integrates with the Lean language server, so it can inspect intermediate proof states and adjust its approach based on type errors.

For developers new to Lean 4, the learning curve is real but manageable. One HN commenter reported going from zero knowledge to productive Lean 4 development in six months, heavily assisted by LLMs (including but not limited to Leanstral). The key insight: you need to understand the axioms and theorem statements you're trying to prove, but the model can handle much of the proof construction machinery.

The Bigger Picture: Verified AI Code

The interesting application is not mathematical theorem proving - it's using Lean 4 as a target for verified code generation.

Several commenters discussed using Lean 4 as:

A metaprogramming framework that lowers to other languages (C++, Rust, Haskell) with provable correspondence
A tool for describing state machines and protocols with formal correctness guarantees
A GPU kernel compiler where tiling and scheduling properties can be formally verified

One commenter reported using Lean 4 bolted to io_uring for systems programming, with benchmarks that outperform nginx on reverse proxy workloads. The combination of a proof-capable language with competitive runtime performance opens possibilities that traditional formal methods tools (slow, academic) could not reach.

The thesis: as LLM-generated code increases, the need for verification increases proportionally. If humans are no longer reviewing every line, machine-checkable correctness proofs become more valuable. Leanstral points toward a workflow where LLMs write code and other LLMs (or the same LLM) prove properties about it.

Limitations and Caveats

Training data uncertainty. The model's performance on specific repositories may reflect training data contamination rather than generalization. This is difficult to rule out.

Benchmark saturation. 100% on miniF2F is impressive, but it means the benchmark is exhausted. Future evaluations will need harder problems.

Practical adoption barriers. Most developers do not write Lean 4. The path from "LLM can prove theorems" to "my production code has machine-checked properties" involves substantial tooling and process changes.

Comparison to non-specialized models. The FLTEval comparison to Claude Opus is interesting, but Opus is a general-purpose model. The more relevant comparison would be to other specialized theorem provers, which the release does not address.

Why This Matters for Developers

Short term: if you work with Lean 4 or are interested in formal verification, Leanstral 1.5 is the best open-weight option available. The Apache 2.0 license means you can integrate it into commercial tooling without restrictions.

Medium term: the combination of small active parameter count and strong performance suggests specialized models will remain competitive against larger general-purpose models for specific domains. This has implications for how teams choose AI tooling - domain-specific may beat one-size-fits-all.

Long term: the vision of LLM-generated code with machine-checked correctness proofs is getting more practical. Leanstral is a step toward workflows where code and proofs are generated together, reducing the gap between "it compiles" and "it's correct."

Sources

FAQ

What is Leanstral 1.5 and what can it do?

Leanstral 1.5 is Mistral's open-weight model for formal theorem proving in Lean 4. It can take theorem statements, generate proofs, interact with the Lean compiler for feedback, and iteratively refine proofs until they pass verification. It achieves 100% on the miniF2F benchmark and state-of-the-art results on FATE evaluations.

How many parameters does Leanstral 1.5 have?

The model has 119B total parameters but only 6B active due to its sparse mixture-of-experts architecture. This makes it runnable on consumer hardware while maintaining strong performance on theorem-proving tasks.

Can Leanstral 1.5 find bugs in code?

The model can identify bugs by attempting to prove properties about code and failing when those properties don't hold. Mistral claims it found 5 previously unknown bugs across 57 repositories. However, HN commenters noted that the highlighted example (an overflow bug) would have been caught by standard property-based testing or fuzzing.

Is Leanstral 1.5 open source?

Yes, the model is released under the Apache 2.0 license. Weights are available on Hugging Face, and the model is accessible through Mistral's Vibe API. This allows commercial use without restrictions.

Last updated: July 4, 2026

What Leanstral 1.5 Does

Leanstral operates in two specialized environments designed for Lean 4 development:

The practical result is a model that can take a theorem statement and, given sufficient token budget, produce a machine-checked proof without human intervention.

The Benchmark Claims

Mistral's published numbers:

Benchmark	Leanstral 1.5	Notes
miniF2F (validation)	100%	Full saturation
miniF2F (test)	100%	Full saturation
PutnamBench	587/672	At 4M token budget
FATE-H	87%	State-of-the-art
FATE-X	34%	State-of-the-art
FLTEval	Surpasses Claude Opus	At 1/7th the cost

What HN Is Saying

The Hacker News discussion generated substantive debate about the practical implications and some skepticism about the marketing claims.

Newsletter

Get the weekly deep dive

Tutorials on Claude Code, AI agents, and dev tools, delivered free every week.

From the archive

Agent Studio: Authoring the Roles, Not Just the Knowledge

Jul 3, 2026 • 9 min read

App Builder: From a Prompt to a Working App You Can Watch Run

Jul 3, 2026 • 8 min read

One Endpoint, Every Capability: A Reference Architecture for Progressive Disclosure

Jul 3, 2026 • 10 min read

Best AI Agent Memory Providers in 2026: Mem0 vs Zep vs Letta vs Cloudflare

Jul 2, 2026 • 11 min read

Real-World Bug Finding

Beyond benchmarks, Mistral claims Leanstral discovered "5 previously unknown bugs across 57 repositories tested." The most interesting was the varinteger overflow:

// On input Std.U64.MAX, the expression (value + 1) overflowed
// Crashes in debug mode, silent corruption in release mode

How Developers Can Use It

Leanstral 1.5 is available through:

Mistral Vibe - Free API endpoint
Hugging Face - Downloadable weights
OpenATP - An open-source Python package for automated theorem provers that supports Leanstral natively (GitHub)

The Bigger Picture: Verified AI Code

The interesting application is not mathematical theorem proving - it's using Lean 4 as a target for verified code generation.

Several commenters discussed using Lean 4 as:

A metaprogramming framework that lowers to other languages (C++, Rust, Haskell) with provable correspondence
A tool for describing state machines and protocols with formal correctness guarantees
A GPU kernel compiler where tiling and scheduling properties can be formally verified

Limitations and Caveats

Training data uncertainty. The model's performance on specific repositories may reflect training data contamination rather than generalization. This is difficult to rule out.

Benchmark saturation. 100% on miniF2F is impressive, but it means the benchmark is exhausted. Future evaluations will need harder problems.

What Leanstral 1.5 Does

The Benchmark Claims

What HN Is Saying

Agent Studio: Authoring the Roles, Not Just the Knowledge

App Builder: From a Prompt to a Working App You Can Watch Run

One Endpoint, Every Capability: A Reference Architecture for Progressive Disclosure

Best AI Agent Memory Providers in 2026: Mem0 vs Zep vs Letta vs Cloudflare

Real-World Bug Finding

How Developers Can Use It

The Bigger Picture: Verified AI Code

Limitations and Caveats

Why This Matters for Developers

Sources

FAQ

What is Leanstral 1.5 and what can it do?

How many parameters does Leanstral 1.5 have?

Can Leanstral 1.5 find bugs in code?

Is Leanstral 1.5 open source?

Godot Bans AI-Authored Code Contributions - What It Means for Open Source

Epic Games Releases Lore: A Version Control System Built for Game Development

Outer Shell: A Graphical Desktop for Your Remote Server via SSH

Related Tools

Jan

DeepSeek-TUI

Aider

Continue.dev

Apps from Developers Digest

Maintainer Dashboard

DD Traces

Migrate

Related Guides

Run AI Models Locally with Ollama and LM Studio

Git Integration - Claude Code

Model Aliases - Claude Code

Related Videos

NVIDIA's NEW Open Source Nemotron Nano 2 VL Model in 5 Minutes

Zed: The Open Source Agentic IDE - Use Claude Code, Codex & Gemini CLI in one place

Introducing FireGeo: The Open Source AI Brand Visibility SaaS Starter Kit

Related Posts

Godot Bans AI-Authored Code Contributions - What It Means for Open Source

Outer Shell: A Graphical Desktop for Your Remote Server via SSH

TikZ Editor Is a WYSIWYG LaTeX Figure Tool Built Almost Entirely by Codex

Epic Games Releases Lore: A Version Control System Built for Game Development

Box3D: Erin Catto Releases an Open Source 3D Physics Engine

Webernetes: Kubernetes Ported to the Browser in TypeScript

Build with the member tools

Get Smarter About AI Dev

What Leanstral 1.5 Does

The Benchmark Claims

What HN Is Saying

Agent Studio: Authoring the Roles, Not Just the Knowledge

App Builder: From a Prompt to a Working App You Can Watch Run

One Endpoint, Every Capability: A Reference Architecture for Progressive Disclosure

Best AI Agent Memory Providers in 2026: Mem0 vs Zep vs Letta vs Cloudflare

Real-World Bug Finding

How Developers Can Use It

The Bigger Picture: Verified AI Code

Limitations and Caveats

Why This Matters for Developers

Sources

FAQ

What is Leanstral 1.5 and what can it do?

How many parameters does Leanstral 1.5 have?

Can Leanstral 1.5 find bugs in code?

Is Leanstral 1.5 open source?

Godot Bans AI-Authored Code Contributions - What It Means for Open Source

Epic Games Releases Lore: A Version Control System Built for Game Development

Outer Shell: A Graphical Desktop for Your Remote Server via SSH

Related Tools

Jan

DeepSeek-TUI

Aider

Continue.dev

Apps from Developers Digest

Maintainer Dashboard

DD Traces

Migrate

Related Guides

Run AI Models Locally with Ollama and LM Studio

Git Integration - Claude Code