Cohere released North Mini Code 1.0 on June 9, 2026, an open-weights coding model that runs on modest hardware yet posts frontier-level scores on agentic software benchmarks. The model is a 30-billion-parameter Mixture-of-Experts design that activates only 3 billion parameters per token, ships under the permissive Apache 2.0 license, and is free to download and run locally. It is the first model in Cohere's new North family built specifically for developers.

What Happened

Cohere Labs published North Mini Code 1.0 as open weights on Hugging Face in both BF16 and FP8 formats. Unlike Cohere's enterprise Command line, this release is fully open under Apache 2.0, so you can run it on your own machine, fine-tune it, and ship it inside commercial tools without licensing fees. It is purpose-built for agentic coding, meaning it writes, edits, and debugs code through terminal-based workflows rather than general chat.

Why It Matters

A model that activates just 3 billion parameters yet competes with far larger systems changes what can run on a creator's own laptop. On the independent Artificial Analysis Coding Index, North Mini Code scores 33.4, ahead of GLM-4.7-Flash at 25.9 and close behind Qwen3.6 35B at 35.2, while generating roughly 199 tokens per second. For anyone wiring AI-assisted coding into a creative pipeline, that mix of small footprint, open license, and strong coding scores is the practical win, though the same analysis flags weaker results on non-coding agentic tasks. It lands amid a wave of coding models reshaping editor defaults, from Project Polaris in GitHub Copilot to Cursor's agentic Canvas.

Key Details

The model uses a sparse MoE architecture with 128 experts, eight active per token, and supports context windows up to 128K tokens. Cohere reports 80.2 percent pass@10 on SWE-Bench Verified and 55.1 percent on Terminal-Bench v2, with reinforcement learning adding several points on top of the base supervised scores. Both a standard build and an FP8 quantized version are available, and the FP8 weights cut memory use enough to fit on a single consumer GPU.

What to Do Next

If you already run local models, pull the FP8 build from Hugging Face and connect it to a terminal coding agent such as OpenCode, where it is offered for free. To call it as a hosted endpoint instead, it is live on Cohere's Chat V2 API; check the Cohere release notes for current model identifiers. Start with a real refactor or bug fix in your own repository rather than a toy prompt, since the model is tuned for multi-step agentic edits, then compare its output against whatever coding model you default to today before deciding where it fits.