Cohere's Open-Source Transcription Model Just Dethroned Whisper — And It's Apache 2.0
Cohere Labs released Cohere Transcribe, a 2B-parameter open-source speech recognition model that topped Hugging Face's Open ASR Leaderboard with a 5.42% word error rate, beating OpenAI Whisper v3 (7.44%) and ElevenLabs Scribe v2 (5.83%). The model is released under Apache 2.0, supports 14 languages, and hit 269k downloads in its first day. It's available as a free download or through Cohere's managed API.
## A New Default for Open-Source Transcription?
OpenAI's Whisper has been the default open-source transcription backbone since 2022. It's embedded in thousands of production pipelines, developer tools, and AI products. Today, Cohere may have given the community a reason to reconsider.
Cohere Transcribe achieves a 5.42% word error rate (WER) on the Hugging Face Open ASR Leaderboard, compared to 7.44% for Whisper v3 and 5.83% for ElevenLabs Scribe v2. The model uses a Conformer-based encoder-decoder architecture and takes log-Mel spectrograms as input. At 2 billion parameters, it's larger than Whisper large-v3 (about 1.55 billion parameters) but smaller than the very largest competing models.
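For readers who want to check numbers like these against their own recordings, here is a minimal sketch of how a word error rate is computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function names are illustrative, and the normalization (lowercasing and whitespace splitting) is a simplifying assumption. Leaderboards typically apply a more thorough text normalizer, so scores from this sketch won't line up exactly with published figures.

```python
def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    # Simplified normalization (assumption): lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1], len(ref)


def wer(pairs):
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        words += n
    if words == 0:
        raise ValueError("reference transcripts contain no words")
    return errors / words


def relative_reduction(baseline_wer, new_wer):
    """Fraction of the baseline's errors that the new model removes."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    samples = [
        ("the cat sat on the mat", "the cat sat on mat"),
        ("hello world", "hello there world"),
    ]
    print(f"WER on toy samples: {wer(samples):.3f}")
    print(f"vs Whisper v3: {relative_reduction(7.44, 5.42):.1%}")
    print(f"vs Scribe v2:  {relative_reduction(5.83, 5.42):.1%}")
```

Plugging in the leaderboard figures, the gap works out to roughly a 27% relative error reduction against Whisper v3 and about 7% against Scribe v2.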
## The License Matters as Much as the Accuracy
The Apache 2.0 license is the decisive factor for many teams. Whisper is MIT licensed, which is equally permissive, so the license does not set Cohere Transcribe apart from Whisper itself. The contrast is with several competing transcription models that have claimed benchmark wins but ship under non-commercial or otherwise restrictive licenses that complicate commercial deployment. Cohere Transcribe can be deployed commercially, fine-tuned, and redistributed, subject only to Apache 2.0's attribution and notice requirements.
The Cohere managed API adds a free tier option for teams that don't want to run inference themselves, plus a "Model Vault" pricing structure for production commitments. That combination — own the weights or pay by the hour — covers most deployment scenarios.
## What the 269k Downloads Signal
First-day download counts on Hugging Face are a reasonable early signal of genuine developer interest, though they include automated crawlers and CI systems. Still, 269k represents the kind of immediate uptake that suggests this isn't just a metrics win — it's a model developers actually intend to use.
The 14-language coverage is the most notable gap versus Whisper's 99 languages. For applications targeting low-resource languages, or any language outside those 14, Whisper remains the stronger choice. For English-dominant or major-language use cases, Cohere Transcribe is now the more accurate free option by leaderboard WER.
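One way a team might act on that split is to route audio by language and fall back to Whisper when the language isn't covered. The sketch below is hypothetical throughout: the model labels are plain strings rather than real API identifiers, and the supported set holds only English as a placeholder, because the full list of 14 languages isn't given here.

```python
# Placeholder (assumption): only English is listed because this article does
# not enumerate the 14 supported languages. Fill in from the model card.
COHERE_TRANSCRIBE_LANGUAGES = frozenset({"en"})


def choose_model(language_code, supported=COHERE_TRANSCRIBE_LANGUAGES):
    """Pick a model label for an ISO 639-1 language code.

    Returns hypothetical labels, not real model or API identifiers.
    """
    code = language_code.strip().lower()
    return "cohere-transcribe" if code in supported else "whisper"


if __name__ == "__main__":
    for lang in ("en", "sw"):
        print(lang, "->", choose_model(lang))
```

Keeping the supported set as a parameter makes it easy to update once the language list is confirmed, without touching the routing logic.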
## Panel Takes
The Builder
Developer Perspective
“The question for me isn't 'is this better than Whisper on benchmarks' — it's 'does it hold up on my actual audio?' But Apache 2.0, better WER on standard benchmarks, and a managed API free tier is a compelling starting point. I'm running this against my podcast backlog this week.”
The Skeptic
Reality Check
“Whisper's staying power came from real-world robustness — accented speech, noisy environments, inconsistent audio quality. The leaderboard benchmark conditions are cleaner than production reality. And 14 languages versus 99 is a significant regression for any global application. Don't migrate the whole stack on benchmark day.”
The Futurist
Big Picture
“The commoditization of high-accuracy speech recognition continues. When a well-funded lab releases a Whisper-beating open model for free, the floor for acceptable transcription quality rises across the entire ecosystem. Voice interfaces, accessibility tools, and meeting intelligence products all get better by default.”