Cohere released Command A+ on May 21, 2026 as its first frontier-class model under a permissive open license. The 218-billion-parameter sparse Mixture-of-Experts model activates 25B parameters per token, runs on as few as two NVIDIA H100s or a single Blackwell GPU, supports 128K context, and ships under Apache 2.0. Weights are live on Hugging Face in W4A4, FP8, and BF16 precisions.

What This Enables for Self-Hosters

Until Command A+, the strongest Apache-2.0 weights you could run in production were in the Mistral Medium 3.5 class. Command A+ raises that ceiling. It unlocks three workflows today:

  1. Pull the W4A4 quantization from Hugging Face, deploy on a 2x H100 node, and serve agentic workflows with native citations and 48-language coverage. No usage-fee meter, no API rate limit.
  2. Fine-tune on private domain data. Apache 2.0 means you can modify, redistribute, and use the output commercially. RAG pipelines that send sensitive context to a closed API can now move on-prem.
  3. Plug into existing tool-calling stacks. Command A+ scores 85 percent on agent benchmarks, up from 37 percent for its predecessor, which makes it usable for browser-use and code-execution agents that previously needed Claude or GPT-5.

Why It Matters

Cohere has spent years positioning itself as the enterprise-friendly alternative to OpenAI and Anthropic. Open-weights frontier releases were not part of that story until today. VentureBeat reports the model also delivers "lossless quantization" at W4A4, meaning the 4-bit version retains nearly the full benchmark score of the BF16 weights. For teams comparing total cost of ownership against Claude or GPT subscriptions, that combination of license, quantization, and hardware footprint is what changes the math.
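To make that comparison concrete, here is a minimal break-even sketch. Every number in it is a placeholder to replace with your own quotes, not a price from Cohere or any provider, and the function names are hypothetical. It also leaves out engineering time, idle capacity, and whether the node can actually serve the break-even volume.

```python
# Break-even sketch: at what monthly token volume does a self-hosted node
# cost the same as a metered API? All inputs are placeholders.

HOURS_PER_MONTH = 730  # average hours in a month


def monthly_self_host_cost(gpu_hourly_usd: float, gpu_count: int) -> float:
    """Cost of keeping the GPUs reserved for a full month."""
    return gpu_hourly_usd * gpu_count * HOURS_PER_MONTH


def break_even_tokens(self_host_usd: float, api_usd_per_million: float) -> float:
    """Monthly token volume above which self-hosting is cheaper."""
    return self_host_usd / api_usd_per_million * 1_000_000


# Placeholder inputs: 2 GPUs at 3.00 USD/hour versus an API at 10.00 USD
# per million tokens. Neither figure comes from the article.
node_cost = monthly_self_host_cost(gpu_hourly_usd=3.0, gpu_count=2)
volume = break_even_tokens(node_cost, api_usd_per_million=10.0)
print(f"Node cost: {node_cost:.0f} USD/month")
print(f"Break-even volume: {volume / 1e6:.0f}M tokens/month")
```

Below the break-even volume the metered API is cheaper. Above it the fixed node cost wins, provided the hardware can sustain the throughput.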

Key Details

Command A+ scores about 37 on the Artificial Analysis Intelligence Index, putting it in the same bracket as Claude 4.5 Haiku and Mistral Medium 3.5. Its coding score rose from 3 percent to 25 percent versus the prior generation. Language coverage spans 48 languages, and the model accepts images as input. MarkTechPost notes the sparse-MoE architecture is what makes the 218B parameter count fit on commodity inference hardware. Cohere kept native citations from the prior Command A line, which means agent outputs cite their grounded sources in a structured field rather than as a post-hoc summarization step.
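A rough weights-only estimate shows how the hardware claim lines up with the released precisions. This is a sketch under stated assumptions: 80 GB per H100, and no allowance for KV cache, activations, or quantization metadata, so real deployments need headroom beyond these lower bounds. In a Mixture-of-Experts model every expert generally has to be resident in memory, so the footprint follows the 218B total. The 25B active parameters mainly reduce compute per token.

```python
import math

TOTAL_PARAMS = 218e9  # total parameter count, from the article

# Bytes stored per weight at each released precision.
BYTES_PER_WEIGHT = {"W4A4": 0.5, "FP8": 1.0, "BF16": 2.0}

H100_MEMORY_GB = 80  # assumption: the 80 GB H100 variant


def weight_footprint_gb(total_params: float, bytes_per_weight: float) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return total_params * bytes_per_weight / 1e9


def min_gpus(footprint_gb: float, gpu_memory_gb: float) -> int:
    """Lower bound on GPU count: weights only, no runtime overhead."""
    return math.ceil(footprint_gb / gpu_memory_gb)


for name, width in BYTES_PER_WEIGHT.items():
    gb = weight_footprint_gb(TOTAL_PARAMS, width)
    gpus = min_gpus(gb, H100_MEMORY_GB)
    print(f"{name}: {gb:.0f} GB of weights, at least {gpus} x 80 GB GPUs")
```

At 4 bits per weight the model needs about 109 GB, which is why the two-H100 figure applies to the W4A4 weights and not to FP8 or BF16.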

What to Do Next

If you operate an inference cluster, schedule a benchmark run against your current open-weights default. If you build agent products, test whether the 85 percent agent score holds on your eval set. And if you have been waiting for an Apache-2.0 model strong enough to replace a Claude or GPT subscription on a specific workflow, this is the first realistic candidate. Pair it with open-source agent guardrails for a fully on-prem agent stack.
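One way to check whether the 85 percent figure holds on your own tasks is to treat each eval case as pass or fail and see whether the claimed rate falls inside a confidence interval around your observed rate. The sketch below uses a Wilson score interval. The helper names are hypothetical, and it assumes you have already run the model and recorded one boolean per task.

```python
import math


def pass_rate(results: list[bool]) -> float:
    """Fraction of eval tasks that passed."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(results) / len(results)


def wilson_interval(passes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate (z=1.96 is roughly 95 percent)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = passes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


def consistent_with_claim(results: list[bool], claimed: float = 0.85) -> bool:
    """True if the claimed rate lies inside the interval for your results."""
    low, high = wilson_interval(sum(results), len(results))
    return low <= claimed <= high


# Illustrative outcomes only, not measured data.
example = [True] * 85 + [False] * 15
print(pass_rate(example), consistent_with_claim(example))
```

With small eval sets the interval is wide, so a result that looks consistent with the claim may only mean you need more tasks before drawing a conclusion.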