Gemini’s new voice upgrade and ‘context-aware’ pacing are blowing my mind

We have all heard AI text-to-speech (TTS) evolve from the robotic GPS voices of the past to the surprisingly decent assistants we have today. But Google’s latest update to Gemini’s voice capabilities feels like another generational jump.

This week, Google announced significant enhancements to its Gemini 2.5 Flash and Gemini 2.5 Pro Text-to-Speech models, and they are bringing a level of control and “humanity” to AI audio that we haven’t really heard before.


It’s not just what you say, it’s how you say it

The biggest change here isn’t just audio quality; it’s expressivity and control. The new models can strictly adhere to style prompts, meaning you can tell the AI to be “cheerful and optimistic” or “somber and serious,” and it actually nails the performance. But the real “mind-blowing” feature is Precision Pacing.

Human beings don’t talk at a constant, metronomic speed. We speed up when we are excited, slow down when we are thoughtful, and pause for dramatic effect. Google’s new models are now context-aware, meaning they automatically adjust speed based on the meaning of the text.


Google shared an example of a mystery novel narrator starting with a nervous, slow tone and then accelerating into excitement and relief as they unlock a door. The difference between the old model and this new one is staggering; it genuinely sounds like acting, not just reading. Below are the style prompt and script from that example:

“Style: You are a storyteller for a mystery novel. Start with a nervous tone that accelerates into excitement and relief”

Text: “I tried unlocking the door slowly. I fiddled around nervously… nothing… breathing deeply, I tried a second time. Click, I got in! I can’t believe it, I actually got in!”
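For developers who want to try the same prompt outside the Playground, here is a minimal sketch of how the style direction and script above might be sent through the google-genai Python SDK. Several things here are assumptions rather than details from Google's announcement: the model ID `gemini-2.5-flash-preview-tts`, the `Kore` voice, the way `build_prompt` joins style and text, and the 24 kHz 16-bit mono output format. The helper names (`build_prompt`, `pcm_to_wav`, `narrate`) are hypothetical.

```python
import io
import wave

STYLE = ("You are a storyteller for a mystery novel. Start with a nervous "
         "tone that accelerates into excitement and relief")
TEXT = ("I tried unlocking the door slowly. I fiddled around nervously… "
        "nothing… breathing deeply, I tried a second time. Click, I got in! "
        "I can’t believe it, I actually got in!")


def build_prompt(style: str, text: str) -> str:
    """Join the style direction and the script into one prompt (assumed layout)."""
    return f"{style.strip()}:\n\n{text.strip()}"


def pcm_to_wav(pcm: bytes, rate: int = 24000) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV container (sample rate is an assumption)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(pcm)
    return buf.getvalue()


def narrate(style: str, text: str,
            model: str = "gemini-2.5-flash-preview-tts") -> bytes:
    """Request speech and return WAV bytes. Needs network access and an API key."""
    # Imported lazily so the helpers above work without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # picks up the API key from the environment
    response = client.models.generate_content(
        model=model,  # assumed model ID; check the current model list
        contents=build_prompt(style, text),
        config=types.GenerateContentConfig(
            response_modalities=["AUDIO"],
            speech_config=types.SpeechConfig(
                voice_config=types.VoiceConfig(
                    prebuilt_voice_config=types.PrebuiltVoiceConfig(
                        voice_name="Kore"  # assumed voice choice
                    )
                )
            ),
        ),
    )
    pcm = response.candidates[0].content.parts[0].inline_data.data
    return pcm_to_wav(pcm)
```

Calling `narrate(STYLE, TEXT)` and writing the returned bytes to a `.wav` file should give you something to compare against Google's sample. The pacing itself is not a parameter here; under this sketch it comes entirely from the style text and the script.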


Better conversations and consistency

The update also tackles one of the hardest parts of AI audio: multi-speaker scenarios. The new models are better at maintaining consistent character voices and handling the natural “handoff” between speakers in a dialogue. This is huge for developers building podcasts, games, or training modules where you need distinct, reliable voices interacting with each other.
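As a rough illustration of what a multi-speaker request could look like, the sketch below formats a dialogue as a labeled transcript and maps each speaker to a voice using the google-genai Python SDK. Treat the specifics as assumptions, not as Google's recommended setup: the model ID `gemini-2.5-pro-preview-tts`, the voice names, the "TTS the following conversation" instruction, and the helper names (`format_dialogue`, `speakers_in`, `speak_dialogue`) are all illustrative.

```python
from typing import Dict, List, Tuple

Turn = Tuple[str, str]  # (speaker name, spoken line)


def format_dialogue(turns: List[Turn]) -> str:
    """Render the turns as a labeled transcript, one turn per line."""
    return "\n".join(f"{speaker}: {line}" for speaker, line in turns)


def speakers_in(turns: List[Turn]) -> List[str]:
    """Distinct speakers, in order of first appearance."""
    return list(dict.fromkeys(speaker for speaker, _ in turns))


def speak_dialogue(turns: List[Turn], voices: Dict[str, str],
                   model: str = "gemini-2.5-pro-preview-tts") -> bytes:
    """Request multi-speaker audio and return raw PCM bytes. Needs an API key."""
    missing = [s for s in speakers_in(turns) if s not in voices]
    if missing:
        raise ValueError(f"no voice assigned for: {', '.join(missing)}")

    # Imported lazily so the helpers above work without the SDK installed.
    from google import genai
    from google.genai import types

    # One voice per speaker, so each character keeps the same voice throughout.
    speaker_configs = [
        types.SpeakerVoiceConfig(
            speaker=name,
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(
                    voice_name=voices[name]
                )
            ),
        )
        for name in speakers_in(turns)
    ]
    client = genai.Client()  # picks up the API key from the environment
    response = client.models.generate_content(
        model=model,  # assumed model ID; check the current model list
        contents="TTS the following conversation:\n" + format_dialogue(turns),
        config=types.GenerateContentConfig(
            response_modalities=["AUDIO"],
            speech_config=types.SpeechConfig(
                multi_speaker_voice_config=types.MultiSpeakerVoiceConfig(
                    speaker_voice_configs=speaker_configs
                )
            ),
        ),
    )
    return response.candidates[0].content.parts[0].inline_data.data
```

A call such as `speak_dialogue([("Host", "Welcome back."), ("Guest", "Glad to be here.")], {"Host": "Kore", "Guest": "Puck"})` would, under these assumptions, return audio with two distinct voices trading lines.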

Try the Voices from History demo to hear more multi-speaker scenarios in action.

Available now

These aren’t far-off research projects. The updated Gemini 2.5 Flash TTS (optimized for low latency) and Gemini 2.5 Pro TTS (optimized for quality) are available right now in Google AI Studio. If you are a developer—or just someone who likes “vibe coding” new apps—you can start playing with these new voices in the Playground today.


About Robby Payne