Voxtral 4B TTS
Mistral's open-weights production TTS — 9 languages, 70ms latency, 20 voices
Expert verdict
Ship
3 ships · 1 skip
The Panel's Take
Voxtral 4B TTS is Mistral AI's first dedicated text-to-speech model: a 4-billion-parameter open-weights release aimed at production voice agent deployments. It supports 9 languages (English, French, Spanish, German, Italian, Portuguese, Dutch, Russian, Japanese), ships 20 preset voices, adapts to custom voices from reference audio, and reaches 70ms end-to-end latency at low concurrency. The model outputs 24kHz audio and has first-class deployment support via vLLM, so it slots into existing LLM serving infrastructure. The weights are released under CC BY-NC 4.0: free for research and personal use, with commercial licensing available separately.

Voxtral positions Mistral squarely in voice agent infrastructure, competing with ElevenLabs, Cartesia, and PlayHT for the latency-sensitive realtime voice pipeline market. The 70ms figure is competitive with most commercial APIs, and self-hosting on your own GPU removes the per-character pricing that makes commercial TTS expensive at scale. As voice agents move from experimental to production in 2026, a capable open-weights TTS option shifts the cost calculus from per-character fees toward fixed GPU cost.
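To make the cost argument concrete, here is a minimal break-even sketch. The function name and every number in it are hypothetical placeholders, not figures from Mistral or any TTS vendor. It also assumes a single GPU can keep up with the volume, and it ignores the separate commercial license that production use would require.

```python
def breakeven_chars_per_month(
    gpu_cost_per_hour: float,
    hours_per_month: float,
    api_price_per_million_chars: float,
) -> float:
    """Monthly character volume at which a self-hosted GPU costs the same
    as a per-character TTS API. All inputs are caller-supplied assumptions."""
    if api_price_per_million_chars <= 0:
        raise ValueError("API price must be positive")
    monthly_gpu_cost = gpu_cost_per_hour * hours_per_month
    # Characters the same budget would buy from the per-character API.
    return monthly_gpu_cost * 1_000_000 / api_price_per_million_chars


# Placeholder inputs only: a $2/hour GPU running all month (720 hours)
# against a hypothetical API priced at $50 per million characters.
threshold = breakeven_chars_per_month(2.0, 720.0, 50.0)
print(f"Break-even volume: {threshold:,.0f} characters/month")
```

With these placeholder inputs the break-even point is 28.8 million characters per month. Below that volume the per-character API is cheaper, and above it the fixed GPU cost wins. Substitute your own GPU rate and API price before drawing conclusions.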
The reviews
“First-class vLLM support means you can run this alongside your language model on the same infrastructure. The 70ms latency is production-viable for realtime voice, and avoiding per-character billing is a massive cost win at scale. The non-commercial license is the only real friction for indie founders.”
“CC BY-NC 4.0 is not truly open source — commercial use requires a Mistral license, which means you're still at their pricing mercy eventually. The 9-language coverage is solid but not exceptional. ElevenLabs and Cartesia have years of production hardening; Mistral TTS v1 will have rough edges.”
“Mistral entering TTS signals that the full AI stack — text in, voice out — is becoming commoditized. When every major open-model lab ships voice capabilities, ElevenLabs' moat narrows significantly. The race to own the realtime voice agent pipeline is one of 2026's defining infrastructure battles.”
“20 preset voices plus custom voice adaptation hits the sweet spot for content creators who need consistent branded voices without building from scratch. The 70ms latency means voice-interactive experiences feel natural rather than robotic. This is the kind of tool that makes podcast-style AI content a weekend project.”