Gemini’s new voice upgrade and ‘context-aware’ pacing are blowing my mind

We have all heard AI text-to-speech (TTS) evolve from the robotic GPS voices of the past to the surprisingly decent assistants we have today. But Google’s latest update to Gemini’s voice capabilities feels like another generational jump.

This week, Google announced significant enhancements to its Gemini 2.5 Flash and Gemini 2.5 Pro Text-to-Speech models, and they are bringing a level of control and “humanity” to AI audio that we haven’t really heard before.

It’s not just what you say, it’s how you say it

The biggest change here isn’t just audio quality; it’s expressivity and control. The new models can strictly adhere to style prompts, meaning you can tell the AI to be “cheerful and optimistic” or “somber and serious,” and it actually nails the performance. But the real “mind-blowing” feature is Precision Pacing.

Human beings don’t talk at a constant, metronomic speed. We speed up when we are excited, slow down when we are thoughtful, and pause for dramatic effect. Google’s new models are now context-aware, meaning they automatically adjust speed based on the meaning of the text.

Google shared an example of a mystery novel narrator starting with a nervous, slow tone and then accelerating into excitement and relief as they unlock a door. The difference between the old model and this new one is staggering: it genuinely sounds like acting, not just reading. Below are the style prompt and script behind that example:

Style: “You are a storyteller for a mystery novel. Start with a nervous tone that accelerates into excitement and relief”

Text: “I tried unlocking the door slowly. I fiddled around nervously… nothing… breathing deeply, I tried a second time. Click, I got in! I can’t believe it, I actually got in!”
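
If you would rather try this from code than from the Playground, here is a rough sketch of how that style prompt and script could be packaged into a single request body for the Gemini API's generateContent endpoint. Treat the details as assumptions on my part: the model ID and the "Kore" voice name are illustrative and not taken from Google's announcement, and build_tts_request is a hypothetical helper, not part of any SDK. The sketch only builds and prints the JSON; it does not send anything.

```python
import json

# Assumptions (not from the announcement): the model ID and voice name below
# are illustrative. Check Google AI Studio for the names currently available.
MODEL_ID = "gemini-2.5-flash-preview-tts"
VOICE_NAME = "Kore"


def build_tts_request(style: str, text: str, voice: str = VOICE_NAME) -> dict:
    """Hypothetical helper: merge a style instruction and a script into one
    generateContent request body that asks for audio output."""
    # The style direction is written in plain language ahead of the script.
    prompt = f"{style.strip()}\n\n{text.strip()}"
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}}
            },
        },
    }


style = (
    "You are a storyteller for a mystery novel. Start with a nervous tone "
    "that accelerates into excitement and relief"
)
text = (
    "I tried unlocking the door slowly. I fiddled around nervously… nothing… "
    "breathing deeply, I tried a second time. Click, I got in! "
    "I can't believe it, I actually got in!"
)

request_body = build_tts_request(style, text)
print(json.dumps(request_body, indent=2, ensure_ascii=False))
```

Notice there is no speed or pause parameter anywhere in that body. If the pacing really is context-aware, the acceleration from nervous to relieved has to come from the style line and the script themselves.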

Better conversations and consistency

The update also tackles one of the hardest parts of AI audio: Multi-speaker scenarios. The new models are better at maintaining consistent character voices and handling the natural “handoff” between speakers in a dialogue. This is huge for developers building podcasts, games, or training modules where you need distinct, reliable voices interacting with each other.
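
For a sense of what a multi-speaker request might look like, here is a minimal sketch that maps each named speaker in a short dialogue to its own voice. The same caveats apply: the speaker names, the dialogue lines, and the "Kore" and "Puck" voice choices are made up for illustration, and build_multi_speaker_request is a hypothetical helper rather than anything Google ships.

```python
def build_multi_speaker_request(lines, voices):
    """Hypothetical helper: build a generateContent body for a dialogue.

    lines  -- list of (speaker, utterance) tuples, in speaking order
    voices -- dict mapping each speaker name to a prebuilt voice name
    """
    # Every speaker in the script needs a voice, or the character has no
    # consistent sound to hold on to.
    missing = {speaker for speaker, _ in lines} - set(voices)
    if missing:
        raise ValueError(f"No voice assigned for: {sorted(missing)}")

    # The transcript labels each turn with the speaker's name.
    transcript = "\n".join(f"{speaker}: {utterance}" for speaker, utterance in lines)

    return {
        "contents": [{"parts": [{"text": transcript}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "multiSpeakerVoiceConfig": {
                    "speakerVoiceConfigs": [
                        {
                            "speaker": name,
                            "voiceConfig": {
                                "prebuiltVoiceConfig": {"voiceName": voice}
                            },
                        }
                        for name, voice in voices.items()
                    ]
                }
            },
        },
    }


# Illustrative dialogue and voice assignments (assumptions, not Google's demo).
dialogue = [
    ("Host", "Welcome back to the show."),
    ("Guest", "Thanks, it's great to be here!"),
]
body = build_multi_speaker_request(dialogue, {"Host": "Kore", "Guest": "Puck"})
```

Keeping the speaker-to-voice mapping in one place is the point here: the same name in the transcript always resolves to the same voice, which is the consistency the update is supposed to improve.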

Try the Voices from History demo to hear more multi-speaker scenarios in action.

Available now

These aren’t far-off research projects. The updated Gemini 2.5 Flash TTS (optimized for low latency) and Gemini 2.5 Pro TTS (optimized for quality) are available right now in Google AI Studio. If you are a developer—or just someone who likes “vibe coding” new apps—you can start playing with these new voices in the Playground today.

About Robby Payne