Back in 2019, Google updated the speech recognition behind apps like Gboard, making the service much better at understanding mumbles and other poorly pronounced syllables. That update was a huge step forward for accessibility, but it also made Gboard far more capable of picking up everyday language and accurately converting it to text in real time. How Google achieved this is covered in a very technical article here, but what you really need to know is that this change made voice-to-text far better, and the engine that drives it all now lives right on your phone with no need for servers or an internet connection.
That all happened in March of 2019, so we can only assume this speech engine is the same one available to the Google Assistant, and part of the effort that shrank the Assistant’s voice model so it can now live on your phone or in something as small and affordable as the new Nest Mini. With an on-device speech engine, apps like the Google Assistant and Gboard can transcribe voice to text far faster than before, which makes them accessible to a wider swath of potential users.
What I really want to highlight today, however, is the increased ability of these models to pick up on less-than-clear voices. Sure, the speed this new on-device modeling brings to the table is fantastic and I love that aspect of it, but what has become far more valuable to me of late is my phone’s ability to understand me when I’m not speaking so clearly. Many of you, like me, are finding yourselves with a mask on your face way more than ever before. As masks become part of the new normal and we all take a bit more inventory of how often our hands have to touch our phones while out in public, voice-to-text input is becoming a go-to technology for many users.
Just yesterday I had to run into a handful of stores, as I’d been putting off shopping as much as possible and the errands had piled up. With my trusty mask on, I ended up needing to communicate with my wife a handful of times while in a store. With only one hand on my phone, I was able to unlock the device, tap into WhatsApp, open the message, and respond with only my voice. Before last year’s voice model update, that last part wouldn’t have been possible. With a mask over my face, words come out muffled and a tad garbled, and I can guarantee you older speech recognition would have failed me. Yesterday, it didn’t fail once.
Instead, even though my speech was impeded, Gboard transcribed what I said time and time again with accuracy and speed. After doing this several times, it hit me that this feature almost feels custom-made for the pandemic we’re in right now. Without it, I’d have both hands on my phone in public, right where I need to keep my handling of any personal devices to a minimum. Because of this amazing speech recognition tech, many of my replies took only a couple of touches on my device, and I was just blown away by how quick and accurate it all was, even with a mask on my face.
I know Google didn’t design this for a mask-wearing public in the middle of a pandemic, but I’m sure glad the tech is here and available in a situation where the majority of people will see the benefit of it and be able to take advantage.