JetBrains released Mellum2-12B-A2.5B-Thinking on June 1, 2026, an Apache 2.0 open-weights reasoning model built specifically for coding, debugging, and agentic developer workflows. The mixture-of-experts architecture has 12B total parameters but activates only 2.5B per token, putting frontier-grade reasoning quality on hardware that can run a single-GPU inference setup.

How to Run It Today

The weights are available immediately under Apache 2.0, which means they can ship in commercial products without license friction. JetBrains shipped a launch blog post on the Hugging Face Hub with an OpenAI-compatible API example. Practical workflow: pull the GGUF quants when community members publish them, wire the model into Continue.dev or your editor of choice, and use the explicit <think> block output as a debuggable trace when the model gets a refactor wrong.

Why It Matters for Developers

Most open-weights coding models in this size class either skip reasoning entirely or bolt on chain-of-thought as a prompting trick. Mellum2-Thinking was trained with reinforcement learning from verifiable rewards on hard math and code, with explicit reasoning blocks that let you inspect the model's logic before the final answer. That makes it useful in two scenarios that closed-source assistants struggle with: air-gapped enterprise environments where weights cannot leave the building, and agentic loops where the orchestrator needs to read intermediate reasoning to decide whether to retry a step. JetBrains AI, the IDE-integrated assistant that ships with the IntelliJ platform, has signaled that future IDE features will lean on these open models rather than only third-party APIs.

Key Details and Benchmarks

The architecture is 64 total experts with 8 activated per token, 131,072 token context window, sliding-window attention layers paired with full attention layers, and bfloat16 precision. The technical report on arXiv documents the full training recipe and ablations. Benchmark scores at launch: 69.9% pass@1 on LiveCodeBench v6, 58.4% on AIME math, 87.0% on GSM-Plus, 86.2% on MMLU-Redux, 57.6% on GPQA Diamond, and 69.4% on BFCL v3 tool use. The release covers the full Mellum2 family: base pretrain, base, instruct, and the thinking variant launched today.

Comparison Context

Against open peers in the 10B-to-15B class, Mellum2-Thinking lands above DeepSeek-Coder, Qwen2.5-Coder, and Code Llama on LiveCodeBench v6, with reasoning and tool-use scores closer to the closed-source frontier than to the open midfield. The Apache 2.0 license is the differentiator that separates it from most Chinese open-weights releases, which use bespoke licenses that complicate downstream commercial use.

What to Do Next

If you ship a developer-facing AI feature, run Mellum2-Thinking against your evaluation set in the next week. Pin the model card now so you catch the GGUF and MLX quantizations as community members publish them.