Cohere Transcribe: Open Source ASR Leader

Cohere released Transcribe, a 2-billion-parameter open-source speech recognition model that tops the HuggingFace Open ASR Leaderboard with a 5.42% average word error rate (WER). That is about 27% lower, in relative terms, than OpenAI Whisper Large v3's 7.44%. The model ships under an Apache 2.0 license.

What Happened

Cohere Transcribe uses a Conformer-based encoder-decoder architecture: a large Conformer encoder extracts acoustic representations while a lightweight Transformer decoder handles token generation. At 2 billion parameters, the model is compact enough to run on consumer-grade GPUs for self-hosting.
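To put the consumer-GPU claim in rough numbers, here is a back-of-the-envelope sketch of the memory needed just to hold 2 billion weights. The precisions listed are assumptions for illustration (the article does not say which ones the released checkpoint supports), and the estimate ignores activations, decoding state, and framework overhead.

```python
# Rough weight-memory estimate for a 2B-parameter model.
# Assumption: weights only; activations and runtime overhead are not counted.
PARAMS = 2_000_000_000

# Bytes needed to store one parameter at each (assumed) precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(params: int, dtype: str) -> float:
    """Return the weight footprint in gigabytes (1 GB = 10**9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gb(PARAMS, dtype):.1f} GB")
```

At 16-bit precision the weights alone come to about 4 GB, which is why a model of this size is plausible on a single consumer card, provided there is headroom for the audio batch itself.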

The model supports 14 languages: nine European (English, French, German, Italian, Spanish, Portuguese, Greek, Dutch, Polish), four Asia-Pacific (Chinese, Japanese, Korean, Vietnamese), and Arabic.

Why It Matters

Whisper Large v3 has been the default open-source transcription model since its release. Cohere Transcribe posts a 5.42% average word error rate against Whisper's 7.44%, a meaningful drop for production workflows where errors accumulate across hours of audio. It also edges past ElevenLabs Scribe v2 (5.83%) and Qwen3-ASR-1.7B (5.76%) on the same benchmark.
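The 27% headline figure is a relative reduction, not a difference in percentage points. A minimal sketch of the arithmetic, using only the WER values quoted above (the function name is illustrative):

```python
def relative_wer_reduction(baseline_wer: float, candidate_wer: float) -> float:
    """Fraction by which the candidate lowers WER relative to the baseline."""
    return (baseline_wer - candidate_wer) / baseline_wer


TRANSCRIBE_WER = 5.42  # average WER reported for Cohere Transcribe

# Average WERs for the other models, as quoted in the article.
BASELINES = {
    "Whisper Large v3": 7.44,
    "ElevenLabs Scribe v2": 5.83,
    "Qwen3-ASR-1.7B": 5.76,
}

if __name__ == "__main__":
    for name, wer in BASELINES.items():
        reduction = relative_wer_reduction(wer, TRANSCRIBE_WER)
        print(f"vs {name}: {reduction:.1%} relative reduction")
```

Against Whisper the absolute gap is 2.02 percentage points, which works out to roughly 27% of Whisper's error rate. The margins over Scribe v2 and Qwen3-ASR are far smaller, in the single digits.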

In human preference evaluations, Transcribe achieved a 61% average win rate across pairwise comparisons measuring accuracy, meaning preservation, named entity recognition, and formatting. For creators transcribing interviews, podcasts, or video narration, better named entity handling means fewer corrections in post.

Key Details

WER: 5.42% average on HuggingFace Open ASR Leaderboard (rank #1)
Parameters: 2B (Conformer encoder + Transformer decoder)
Languages: 14 (EN, FR, DE, IT, ES, PT, EL, NL, PL, ZH, JA, KO, VI, AR)
License: Apache 2.0 (fully open for commercial use)
Throughput: Best-in-class RTFx (inverse real-time factor: seconds of audio transcribed per second of compute) in the 1B+ parameter cohort
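RTFx is simply audio duration divided by processing time, so higher is faster. A small sketch of how it could be measured locally; `transcribe` here is a hypothetical stand-in for whatever inference call you use, not a Cohere API:

```python
import time
from typing import Callable


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio handled per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def measure_rtfx(
    transcribe: Callable[[str], str],  # hypothetical: takes a file path, returns text
    audio_path: str,
    audio_seconds: float,
) -> float:
    """Time a single transcription call and convert the result to RTFx."""
    start = time.perf_counter()
    transcribe(audio_path)
    elapsed = time.perf_counter() - start
    return rtfx(audio_seconds, elapsed)
```

For example, an hour of audio transcribed in 36 seconds gives `rtfx(3600, 36)`, an RTFx of 100. Numbers measured this way depend heavily on hardware and batch size, so they are only comparable across models run on the same setup.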

The model is available through three channels: an open-source download via HuggingFace, a free rate-limited API, and Model Vault, which offers dedicated infrastructure with per-hour pricing.

What to Do Next

If you are using Whisper for transcription, benchmark Cohere Transcribe on your own audio. The Apache 2.0 license means no restrictions on commercial deployment. For context on the current speech model landscape, see our coverage of IBM Granite 4.0 Speech, which previously held the top ASR spot. For a wider view on open-source AI progress, our open-source vs closed AI guide tracks the competitive landscape.
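For a quick benchmark on your own audio, the core metric is easy to compute yourself. The sketch below scores a model transcript against a hand-corrected reference using word-level edit distance. It is an illustration, not the leaderboard's scoring code: normalization here is just lowercasing and whitespace splitting, so absolute numbers will not match published figures, but the same function applied to two models' output gives a fair side-by-side comparison.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    # Assumption: minimal normalization (lowercase, split on whitespace).
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    whisper_output = "the cat sat on mat"          # made-up example transcript
    transcribe_output = "the cat sat on the mat"   # made-up example transcript
    print(f"Model A WER: {word_error_rate(reference, whisper_output):.2%}")
    print(f"Model B WER: {word_error_rate(reference, transcribe_output):.2%}")
```

Run it over a representative sample of your own recordings, ideally ones with the names, jargon, and accents you actually deal with, since those are where models tend to diverge most.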
