If you’ve not seen Live Captions on a Pixel phone, you are missing out on some really cool tech in action. Essentially, Live Captions lets your device add real-time captions to any video content playing on your phone, regardless of the service and regardless of whether the content creator chose to add captions. Seeing captions unfold on your screen in real time is pretty spectacular and makes for a better experience consuming video content in a loud, bustling environment like a subway or airplane.
The accessibility improvements can’t be overstated, either, as this feature gives those with hearing disabilities instant access to follow along with literally any video content available to watch on the phone. I’d say this new feature is one of the most underrated things Google demoed when they announced the Pixel 4 and 4 XL. Living in a smaller town, I rarely find myself in places noisy enough to need this feature, but I can confirm that the few times I’ve tried it out, it has worked incredibly well.
Videos, however, are not only consumed on our portable devices. As a matter of fact, I’d wager many people switch over to a larger screen for video watching when possible, and that is where today’s news will have a large impact. Hinted at over in the Chromium Repositories via this commit, it seems the Chrome team is hard at work implementing Live Captions in Chrome. This would likely mean Live Captions will come via the Chrome browser to Chromebooks, Windows PCs, and MacBooks as well. The commit is for SODA (Speech On-Device API), but at the end of the commit message, we see what this is being enabled for:
This CL creates a sandboxed service that hosts the Speech On-Device API (SODA). It contains the components required to launch the service from the renderer process, but the implementation of the service itself is stubbed out. The design document for the feature is located at: go/chrome-live-captions.
On-device speech recognition is obviously crucial to Live Captions working. Watching the feature do its thing on a Pixel phone, you can’t help but be reminded of what speech recognition looks like as you talk to your assistant or respond to a text via voice input. The words on screen lag just a touch behind the speech itself, which lets the vocal processing happen in near real time while the wording of the sentence is revised as the phrase takes shape on screen. It looks exactly like a sentence being virtually typed out in your favorite messaging app when you input your message via voice. Oh, and all of this is happening on the device itself.
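For the curious, that "typed out and then corrected" effect can be sketched in a few lines of Python. To be clear, this is not SODA's API or anything taken from the Chromium commit; the `CaptionBuffer` class and the sample `events` below are made up purely for illustration. The sketch assumes a recognizer that emits interim guesses followed by one final result per phrase, which is a common pattern for streaming speech recognition.

```python
from dataclasses import dataclass, field


@dataclass
class CaptionBuffer:
    """Holds finalized caption text plus the current, still-changing guess."""
    finalized: list[str] = field(default_factory=list)
    interim: str = ""

    def update(self, text: str, is_final: bool) -> None:
        # An interim guess replaces the previous one outright, which is why
        # words appear to be "edited" on screen as more audio arrives.
        if is_final:
            self.finalized.append(text)
            self.interim = ""
        else:
            self.interim = text

    def render(self) -> str:
        # What the caption box would show right now.
        parts = self.finalized + ([self.interim] if self.interim else [])
        return " ".join(parts)


# Hypothetical recognizer output for one spoken phrase: (text, is_final)
events = [
    ("live", False),
    ("live caption", False),
    ("live captions are", False),
    ("Live Captions are coming to Chrome.", True),
]

buffer = CaptionBuffer()
frames = []
for text, is_final in events:
    buffer.update(text, is_final)
    frames.append(buffer.render())
```

Each entry in `frames` is what the caption box would display after one recognizer update, so the early entries show the rough guesses and the last one shows the cleaned-up sentence.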
The fact that this will soon be part of Chrome is awesome, and we’ll be keeping a close eye on it. Since Live Captions is literally listening to the audio from the video you are watching and transcribing it just as it would your own voice, most videos will be caption-ready without any additional steps, provided the audio is clear. It’s a big win for accessibility and a big win for Chrome’s future feature development as well. We’ll be watching out for this to hit the Stable channel, but don’t expect it for a few updates.