We have all heard AI text-to-speech (TTS) evolve from the robotic GPS voices of the past to the surprisingly decent assistants we have today. But Google’s latest update to Gemini’s voice capabilities feels like another generational jump.
This week, Google announced significant enhancements to its Gemini 2.5 Flash and Gemini 2.5 Pro Text-to-Speech models, and they are bringing a level of control and “humanity” to AI audio that we haven’t really heard before.
It’s not just what you say, it’s how you say it
The biggest change here isn’t just audio quality; it’s expressivity and control. The new models follow style prompts much more closely, so you can tell the AI to be “cheerful and optimistic” or “somber and serious” and the delivery actually matches. But the real standout feature is Precision Pacing.
Human beings don’t talk at a constant, metronomic speed. We speed up when we are excited, slow down when we are thoughtful, and pause for dramatic effect. Google’s new models are now context-aware, meaning they automatically adjust speed based on the meaning of the text.
Google shared an example of a mystery novel narrator who starts in a nervous, slow tone and then accelerates into excitement and relief as the door unlocks. The difference between the old model and the new one is staggering: it genuinely sounds like acting, not just reading. Here are the style prompt and script from that example:
Style: “You are a storyteller for a mystery novel. Start with a nervous tone that accelerates into excitement and relief.”
Text: “I tried unlocking the door slowly. I fiddled around nervously… nothing… breathing deeply, I tried a second time. Click, I got in! I can’t believe it, I actually got in!”
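For developers who want to try the same prompt outside the Playground, here is a minimal sketch of how the style direction and script might be packaged into a Gemini API request body. It only builds the JSON and does not call the API. The model ID, the "Kore" voice, and the field names (responseModalities, speechConfig, prebuiltVoiceConfig) reflect the public Gemini API documentation as we understand it and should be treated as assumptions; build_tts_request is a hypothetical helper name, not part of any SDK.

```python
import json

# Assumed model ID; preview model names change, so check the current list.
MODEL = "gemini-2.5-flash-preview-tts"


def build_tts_request(style: str, text: str, voice_name: str = "Kore") -> dict:
    """Build a generateContent request body for single-speaker speech.

    Hypothetical helper: the style direction is plain prompt text placed
    ahead of the script, not a separate API parameter.
    """
    prompt = f"{style.strip()}\n\n{text.strip()}"
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            # Ask for audio output instead of text (assumed field name).
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {
                    "prebuiltVoiceConfig": {"voiceName": voice_name}
                }
            },
        },
    }


style = (
    "You are a storyteller for a mystery novel. Start with a nervous tone "
    "that accelerates into excitement and relief."
)
text = (
    "I tried unlocking the door slowly. I fiddled around nervously... "
    "nothing... breathing deeply, I tried a second time. Click, I got in! "
    "I can't believe it, I actually got in!"
)

body = build_tts_request(style, text)
print(json.dumps(body, indent=2))
```

The body would then be sent to the generateContent endpoint for the chosen model along with an API key. Notice that the pacing changes are requested only in the style sentence; the script itself carries no speed markup.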
Better conversations and consistency
The update also tackles one of the hardest parts of AI audio: multi-speaker scenarios. The new models are better at keeping each character’s voice consistent and at handling the natural “handoff” between speakers in a dialogue. This is huge for developers building podcasts, games, or training modules, where distinct, reliable voices need to interact with each other.
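As a rough illustration of what a multi-speaker request could look like, the sketch below turns a short dialogue into a transcript and assigns one voice per speaker. Everything here is an assumption to verify against the current API reference: the multiSpeakerVoiceConfig and speakerVoiceConfigs field names, the "Kore" and "Puck" voices, and the hypothetical build_dialogue_request helper. The speaker names and dialogue lines are made up for the example.

```python
def build_dialogue_request(lines, voices):
    """Build a request body for a two-speaker dialogue.

    lines:  list of (speaker, utterance) tuples, in speaking order.
    voices: dict mapping each speaker name to a prebuilt voice name.
    Hypothetical helper; field names are assumed from the Gemini API docs.
    """
    # Every speaker in the script needs a voice, or the handoff is undefined.
    missing = {speaker for speaker, _ in lines} - set(voices)
    if missing:
        raise ValueError(f"no voice assigned for: {sorted(missing)}")

    # The transcript labels each line so the model knows who is talking.
    transcript = "\n".join(f"{speaker}: {utterance}" for speaker, utterance in lines)

    return {
        "contents": [{"parts": [{"text": transcript}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "multiSpeakerVoiceConfig": {
                    "speakerVoiceConfigs": [
                        {
                            "speaker": speaker,
                            "voiceConfig": {
                                "prebuiltVoiceConfig": {"voiceName": voice}
                            },
                        }
                        for speaker, voice in voices.items()
                    ]
                }
            },
        },
    }


# Illustrative dialogue; speaker names and lines are invented.
dialogue = [
    ("Host", "Welcome back to the show."),
    ("Guest", "Thanks, it's great to be here."),
]
request = build_dialogue_request(dialogue, {"Host": "Kore", "Guest": "Puck"})
```

Keeping the speaker-to-voice mapping in one dictionary is what makes the voices consistent: the same name always resolves to the same voice, however many lines the character has.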
Available now
These aren’t far-off research projects. The updated Gemini 2.5 Flash TTS (optimized for low latency) and Gemini 2.5 Pro TTS (optimized for quality) are available right now in Google AI Studio. If you are a developer—or just someone who likes “vibe coding” new apps—you can start playing with these new voices in the Playground today.

