Cohere's Open-Source Transcription Model Just Dethroned Whisper — And It's Apache 2.0

CohereLabs released Cohere Transcribe, a 2B-parameter open-source speech recognition model that topped Hugging Face's Open ASR Leaderboard with a 5.42% word error rate — beating OpenAI Whisper v3 (7.44%) and ElevenLabs Scribe v2 (5.83%). The model is released under Apache 2.0, supports 14 languages, and hit 269k downloads in its first day. It's available for free download or via Cohere's managed API.


## A New Default for Open-Source Transcription?

OpenAI's Whisper has been the default open-source transcription backbone since 2022. It's embedded in thousands of production pipelines, developer tools, and AI products. Today, Cohere may have given the community a reason to reconsider.

Cohere Transcribe achieves a 5.42% word error rate (WER) on the Hugging Face Open ASR Leaderboard, compared with 7.44% for Whisper v3 and 5.83% for ElevenLabs Scribe v2. The model uses a Conformer-based encoder-decoder architecture and takes log-Mel spectrograms of the audio as input. At 2 billion parameters, it is somewhat larger than Whisper large-v3 (about 1.55 billion parameters) but smaller than the very largest competing models.
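For readers who want to see what those percentages measure, here is a minimal sketch of word error rate: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a model's output, divided by the number of reference words. The function names are illustrative, and the normalization is deliberately crude (lowercasing and whitespace splitting only); the leaderboard applies its own text normalization, so this will not reproduce its exact numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    # Assumption: lowercase + whitespace split stands in for real normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, candidate_wer: float) -> float:
    """Fraction of the baseline's errors that the candidate removes."""
    return (baseline_wer - candidate_wer) / baseline_wer


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over six words.
    print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))
    # Leaderboard figures quoted in the article, in percent.
    print(round(relative_reduction(7.44, 5.42), 3))  # vs. Whisper v3
    print(round(relative_reduction(5.83, 5.42), 3))  # vs. ElevenLabs Scribe v2
```

Read this way, the two-point gap to Whisper v3 is roughly a 27% relative reduction in errors, while the gap to Scribe v2 is closer to 7%.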

## The License Matters as Much as the Accuracy

For many teams, the Apache 2.0 license matters as much as the benchmark result. Whisper's MIT license is similarly permissive, so licensing does not separate the two. The contrast is with several other transcription models that have claimed benchmark wins but ship under non-commercial or otherwise restrictive licenses, which complicates commercial deployment. Cohere Transcribe can be deployed commercially, fine-tuned, and redistributed under Apache 2.0's terms, which mainly require preserving the license and attribution notices.

Cohere's managed API adds a free tier for teams that don't want to run inference themselves, plus a "Model Vault" pricing structure for production commitments. That combination, running the weights yourself or paying by the hour, covers most deployment scenarios.

## What the 269k Downloads Signal

First-day download counts on Hugging Face are a reasonable early signal of genuine developer interest, though they include automated crawlers and CI systems. Still, 269k represents the kind of immediate uptake that suggests this isn't just a metrics win — it's a model developers actually intend to use.

The 14-language coverage is the most notable gap versus Whisper's 99 languages. For applications targeting languages outside those 14, including most low-resource languages, Whisper remains the stronger choice. For English-dominant or major-language use cases, Cohere Transcribe is now the more accurate free option by leaderboard WER.
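One way a team might act on that split is to route audio by language, falling back to Whisper for anything the newer model does not cover. The sketch below is hypothetical throughout: `pick_model`, the returned labels, and the example language set are illustrative placeholders, not identifiers from either project. The article does not list the 14 supported languages, so the set must be filled in from the model card.

```python
from typing import AbstractSet


def pick_model(language_code: str, cohere_languages: AbstractSet[str]) -> str:
    """Return a label for the model to use for a given language tag.

    `cohere_languages` is supplied by the caller; this sketch does not
    know which 14 languages Cohere Transcribe actually supports.
    """
    # Reduce tags such as "en-US" or "EN" to a bare primary subtag.
    primary = language_code.strip().lower().split("-")[0]
    if primary in cohere_languages:
        return "cohere-transcribe"  # hypothetical label
    return "whisper-large-v3"       # hypothetical label, used as the fallback


if __name__ == "__main__":
    # Placeholder set for illustration only; replace with the real list.
    supported = frozenset({"en"})
    print(pick_model("en-US", supported))
    print(pick_model("sw", supported))
```

Keeping the supported set as a parameter means the routing rule can be updated as language coverage changes, without touching the calling code.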

## Panel Takes

The Builder

Developer Perspective

“The question for me isn't 'is this better than Whisper on benchmarks' — it's 'does it hold up on my actual audio?' But Apache 2.0, better WER on standard benchmarks, and a managed API free tier is a compelling starting point. I'm running this against my podcast backlog this week.”

The Skeptic

Reality Check

“Whisper's staying power came from real-world robustness — accented speech, noisy environments, inconsistent audio quality. The leaderboard benchmark conditions are cleaner than production reality. And 14 languages versus 99 is a significant regression for any global application. Don't migrate the whole stack on benchmark day.”

The Futurist

Big Picture

“The commoditization of high-accuracy speech recognition continues. When a well-funded lab releases a Whisper-beating open model for free, the floor for acceptable transcription quality rises across the entire ecosystem. Voice interfaces, accessibility tools, and meeting intelligence products all get better by default.”
