Cohere released Transcribe, a 2-billion-parameter open-source speech recognition model that tops the HuggingFace Open ASR Leaderboard with a 5.42% average word error rate. The model beats OpenAI Whisper Large v3 by 27% and ships under an Apache 2.0 license.
What Happened
Cohere Transcribe uses a Conformer-based encoder-decoder architecture: a large Conformer encoder extracts acoustic representations while a lightweight Transformer decoder handles token generation. At 2 billion parameters, the model is compact enough to run on consumer-grade GPUs for self-hosting.
The model supports 14 languages spanning European (English, French, German, Italian, Spanish, Portuguese, Greek, Dutch, Polish), Asian-Pacific (Chinese, Japanese, Korean, Vietnamese), and Arabic.
Why It Matters
Whisper Large v3 has been the default open-source transcription model since its release. Cohere Transcribe cuts its word error rate from 7.44% to 5.42%, a meaningful jump for production workflows where accuracy compounds across hours of audio. It also edges past ElevenLabs Scribe v2 (5.83%) and Qwen3-ASR-1.7B (5.76%) on the same benchmark.
In human preference evaluations, Transcribe achieved a 61% average win rate across pairwise comparisons measuring accuracy, meaning preservation, named entity recognition, and formatting. For creators transcribing interviews, podcasts, or video narration, better named entity handling means fewer corrections in post.
Key Details
- WER: 5.42% average on HuggingFace Open ASR Leaderboard (rank #1)
- Parameters: 2B (Conformer encoder + Transformer decoder)
- Languages: 14 (EN, FR, DE, IT, ES, PT, EL, NL, PL, ZH, JA, KO, VI, AR)
- License: Apache 2.0 (fully open for commercial use)
- Throughput: Best-in-class RTFx in the 1B+ parameter cohort
Access is available through three channels: open-source download via HuggingFace, free API with rate limits, and Model Vault for dedicated infrastructure with per-hour pricing.
What to Do Next
If you are using Whisper for transcription, benchmark Cohere Transcribe on your own audio. The Apache 2.0 license means no restrictions on commercial deployment. For context on the current speech model landscape, see our coverage of IBM Granite 4.0 Speech, which previously held the top ASR spot. For a wider view on open-source AI progress, our open-source vs closed AI guide tracks the competitive landscape.