Mistral AI released Leanstral on March 16, the first open-source AI agent built specifically for Lean 4 formal verification. The 120-billion parameter mixture-of-experts model proves that AI-generated code meets its specifications, scoring 26.3 on the FLTEval benchmark at pass@2 while costing just $36 per task, 93% less than Claude Sonnet 4.6.

What Happened

Mistral launched Leanstral, an AI agent that generates code alongside mathematical proofs of correctness using the Lean 4 proof language. A developer specifies requirements formally, Leanstral generates the implementation and proof, and the Lean 4 type checker verifies everything automatically. The model integrates with Lean's language server protocol through MCP, giving it direct access to the proof environment.

The release comes under the Apache 2.0 license with three deployment options: integration through Mistral Vibe, a free API endpoint (labs-leanstral-2603), or fully self-hosted using the open weights. The model uses 128 experts with only 6 billion parameters active per token, keeping inference costs low despite the 120 billion total parameter count.

Why It Matters

According to industry surveys, 96% of developers distrust AI-generated code, yet 42% of production code is now written by AI tools. That gap between adoption and trust is a real problem. Leanstral addresses it directly by providing mathematical proof that generated code does what it claims to do, not just test coverage or human review, but formal verification that eliminates entire categories of bugs.

The cost advantage is significant. On the FLTEval benchmark, Leanstral scores 26.3 at $36 per task. Claude Sonnet 4.6 scores 23.7 at $549 per task. Claude Opus 4.6 leads on absolute quality at 39.6, but costs $1,650 per task. For teams that need verified code but cannot justify spending 46x more per task, Leanstral opens formal verification to a much wider range of projects. The fact that it also outperforms larger open-source competitors like GLM5-744B, Kimi-K2.5, and Qwen3.5-397B makes it the strongest open option available.

For the booming AI coding market, Leanstral signals a shift from generating more code faster to generating code you can actually trust.

Key Details

  • Architecture: 120B total parameters, 6B active per token via mixture-of-experts.
  • Benchmark: 26.3 on FLTEval at pass@2, outperforming Claude Sonnet 4.6 (23.7) and all open-source competitors.
  • Cost: $36 per task vs. $549 for Sonnet 4.6 (93% cheaper) and $1,650 for Opus 4.6.
  • License: Apache 2.0, fully open-source weights and code.
  • Deployment: Mistral Vibe integration, free API endpoint (labs-leanstral-2603), or self-hosted.
  • Integration: Connects to Lean's language server protocol through MCP for direct proof environment access.
  • Workflow: Developer writes formal specification, Leanstral generates implementation plus proof, Lean 4 type checker verifies automatically.

What to Do Next

Developers working with AI-generated code should review the Leanstral documentation and consider where formal verification fits their workflow. The free API endpoint lets you test the model without any cost commitment. Teams already using Lean 4 can integrate Leanstral immediately through the language server protocol. For those new to formal verification, the Gigazine overview provides a good introduction to how the proof process works in practice.