Gemini 3.1 Flash TTS
Google's new TTS API: 70 languages, 200+ audio tags, native multi-speaker
Expert verdict
Ship
3-1
The Panel's Take
Gemini 3.1 Flash TTS is Google's new text-to-speech model, launched today on Google AI Studio and Vertex AI. It supports 70+ languages and introduces a natural-language audio tag system with 200+ expressivity controls: developers describe delivery in plain English ("whisper conspiratorially", "warm and unhurried") and the model interprets those instructions at inference time. It also generates multi-speaker dialogue natively from a single prompt, producing a conversation with distinct, consistent voices in one pass rather than one pass per speaker. All audio output is watermarked with Google's SynthID technology for provenance tracking. For developers building voice agents, podcasting tools, or multilingual apps, this is a meaningful upgrade over existing options. The audio tag approach in particular is a novel alternative to prosody markup languages like SSML, and developer reception on X and HN has been strong; Simon Willison called out the expressivity controls as the standout feature.
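To make the single-prompt, multi-speaker idea concrete, here is a minimal sketch that assembles a request body offline; it makes no network call. Several things in it are assumptions, not details confirmed by this review: the square-bracket placement of the delivery notes, the `Line`, `build_transcript`, and `build_request` helpers (hypothetical names), the voice names `Kore` and `Puck`, and the payload field names, which are modeled on the multi-speaker request shape documented for earlier Gemini TTS models. The model identifier is left out because the review does not give one.

```python
import json
from dataclasses import dataclass


@dataclass
class Line:
    speaker: str    # label used in the transcript, e.g. "Host"
    direction: str  # plain-English delivery note; empty string for none
    text: str       # the words to be spoken


def build_transcript(lines):
    """Render each line as 'Speaker: [direction] text'.

    ASSUMPTION: the bracketed placement of the delivery note is for
    illustration only; the review does not document the tag syntax.
    """
    rendered = []
    for line in lines:
        tag = f"[{line.direction}] " if line.direction else ""
        rendered.append(f"{line.speaker}: {tag}{line.text}")
    return "\n".join(rendered)


def build_request(lines, voices):
    """Build one request body that covers every speaker in the dialogue.

    ASSUMPTION: field names follow the request shape of earlier Gemini TTS
    models; whether Gemini 3.1 Flash TTS accepts the same shape is not
    confirmed by the review.
    """
    # Unique speakers, in order of first appearance.
    speakers = list(dict.fromkeys(line.speaker for line in lines))
    missing = [s for s in speakers if s not in voices]
    if missing:
        raise ValueError(f"no voice assigned for: {missing}")
    return {
        "contents": [{"parts": [{"text": build_transcript(lines)}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "multiSpeakerVoiceConfig": {
                    "speakerVoiceConfigs": [
                        {
                            "speaker": s,
                            "voiceConfig": {
                                "prebuiltVoiceConfig": {"voiceName": voices[s]}
                            },
                        }
                        for s in speakers
                    ]
                }
            },
        },
    }


dialogue = [
    Line("Host", "warm and unhurried", "Welcome back to the show."),
    Line("Guest", "whisper conspiratorially", "I have news."),
    Line("Host", "", "Go on."),
]
request = build_request(dialogue, {"Host": "Kore", "Guest": "Puck"})
print(json.dumps(request, indent=2))
```

The point of the sketch is the shape of the work: the whole conversation travels as one transcript with one voice mapping, so the caller does not need to synthesize each speaker separately and stitch the audio together afterwards.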
The reviews
“This replaces ElevenLabs for a lot of use cases — and at Google's pricing it's hard to argue against. The natural-language audio tags are the real unlock: instead of wrestling with SSML prosody markup, you just describe what you want. The multi-speaker output from a single prompt is going to save a ton of orchestration code in voice agent pipelines.”
“It's Google — which means it could be deprecated in 18 months and replaced with Gemini 4 Flash TTS Pro Ultra. The audio tags sound creative but until there's a published spec for all 200+ of them, you're guessing at prompt-engineering your voice model. And SynthID watermarking is only as useful as the detection ecosystem, which is still nascent.”
“Natural-language expressivity control for TTS is a paradigm shift. When the model can interpret 'sound like you're delivering devastating news gently' without explicit prosody markup, we're entering an era where voice synthesis becomes genuinely directorial. The 70-language coverage plus SynthID watermarking points toward a future where synthesized voice is both globally expressive and auditably provenance-tracked.”
“I've been paying for ElevenLabs and manually tweaking prosody to get the right delivery. The audio tag system here could cut that iteration time dramatically — describing the scene and letting the model interpret is so much more intuitive than sliders and SSML. Multi-speaker from a single prompt is going to be huge for podcast generators and explainer video tools.”