Mistral AI released Mistral Small 4, a 119-billion-parameter mixture-of-experts model that unifies reasoning, multimodal vision, and coding capabilities under a single Apache 2.0 license. The model activates only 6 billion parameters per token, delivering strong performance at a fraction of the compute cost of dense models of its size.
What Happened
On March 16, Mistral AI launched Mistral Small 4, replacing three separate models (Magistral for reasoning, Pixtral for vision, and Devstral for coding) with one unified architecture. The model uses 128 experts with 4 active per token, supports a 256K context window, and ships with configurable reasoning effort that lets developers switch between fast responses and deep step-by-step reasoning.
The release came alongside two related announcements: Leanstral, the first open-source code agent that formally verifies implementations using the Lean 4 proof language, and a partnership with NVIDIA as a founding member of the Nemotron Coalition to co-develop open frontier models.
Why It Matters
For creative AI users, Mistral Small 4 offers a capable open-source alternative to proprietary multimodal models. It handles text, images, and code in one deployment, eliminating the need to juggle multiple specialized models. The Apache 2.0 license means creators and developers can run it locally, fine-tune it for specific workflows, and deploy it commercially without restrictions.
The configurable reasoning effort is particularly useful. Setting reasoning_effort="none" gives fast responses for routine tasks, while reasoning_effort="high" enables deep analysis for complex problems. This flexibility means one model can serve both quick creative tasks and detailed technical work.
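As a minimal sketch, this is what toggling the setting in a chat-completions request could look like. The endpoint URL, model identifier, and exact shape of the reasoning_effort field are assumptions for illustration, not a confirmed API contract; check the official Mistral API reference before relying on them.

```python
import json

# Hypothetical request builder for a chat-completions call.
# The endpoint and model id below are assumptions, not confirmed values.
API_URL = "https://api.mistral.ai/v1/chat/completions"  # assumed endpoint

def build_request(prompt: str, effort: str = "none") -> dict:
    """Build a request payload with configurable reasoning effort.

    effort: "none" for fast responses, "high" for deep step-by-step
    reasoning. (The article names only these two levels; others may exist.)
    """
    return {
        "model": "mistral-small-4",          # assumed model identifier
        "reasoning_effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }

# Fast path for a routine task vs. deep analysis for a hard problem.
fast = build_request("Summarize this paragraph.", effort="none")
deep = build_request("Find the bug in this sorting function.", effort="high")
print(json.dumps(fast, indent=2))
```

Because the effort level is just a per-request field, the same deployment can serve both modes without swapping models.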
Key Details
- Architecture: 119B total parameters, 6B active per token (8B including embedding/output layers). Mixture of 128 experts, 4 active per token.
- Performance: 40% reduction in end-to-end latency and 3x more requests per second compared to Mistral Small 3. Outperforms GPT-OSS 120B on LiveCodeBench with 20% shorter outputs.
- Context window: 256K tokens.
- Hardware requirements: Minimum 4x NVIDIA H100, 2x H200, or 1x DGX B200.
- Availability: Hugging Face, Mistral API, NVIDIA NIM (day-0), and compatible with vLLM, llama.cpp, SGLang, and Transformers.
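The sparse-activation idea behind the 128-expert, 4-active design can be illustrated with a toy top-k router. The softmax top-k gating below is the standard mixture-of-experts pattern, not Mistral's actual implementation; expert count and k come from the spec above, everything else is a simplified sketch.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(token_logits, k=4):
    """Select the top-k experts for one token and renormalize their gates.

    Standard softmax top-k gating; the production router may differ.
    """
    probs = softmax(token_logits)
    topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in topk)
    return {i: probs[i] / norm for i in topk}

random.seed(0)
NUM_EXPERTS = 128  # from the architecture details above
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
gates = route(logits, k=4)

# Only 4 of the 128 expert FFNs run for this token, which is why roughly
# 6B of the 119B total parameters are active per forward pass.
print(sorted(gates))
```

The gate weights for the chosen experts sum to 1, so the token's output is a weighted combination of just those four expert networks.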
What to Do Next
Developers can start prototyping immediately through the Mistral API or NVIDIA Build. For local deployment, the model runs on standard multi-GPU setups. Those interested in the broader open-source creative AI wave should note how this release continues the trend of open models closing the gap with proprietary alternatives across reasoning, vision, and code generation tasks.
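For local serving on one of the multi-GPU setups listed above, a vLLM launch might look like the following. The Hugging Face repository name is an assumption (check the model card for the published identifier); the flags are standard vLLM options for sharding a model across GPUs and setting the context length.

```shell
# Hypothetical vLLM launch for a 4x H100 node; the repo id is assumed.
# --tensor-parallel-size shards the weights across the four GPUs.
vllm serve mistralai/Mistral-Small-4 \
    --tensor-parallel-size 4 \
    --max-model-len 262144  # the advertised 256K-token context window
```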