2 Sources
[1]
Google announces Gemini 3.5 Live Translate for instant voice-to-voice translation
Google has been chasing real-time translation for years, which it says has been one of its "pioneering machine learning experiments." We've seen numerous demos on stage at Google events in the past, but you needed Google phones, earbuds, or some other specific setup. Last year, Google brought real-time translation to more users in the Translate app, and now it's expanding availability more. With the release of Gemini 3.5 Live Translate, you'll have access to instant translation in more places and with lower latency than ever before. The new AI model is part of the version 3.5 family that launched at I/O. Before today, Google had only rolled out the Flash version, but we're expecting a Pro model to drop in the coming weeks. Gemini 3.5 Live Translate is a speech-to-speech model tuned to automatically detect and translate in more than 70 languages. Google says Gemini 3.5 Live Translate is fast enough to keep up with a normal conversation, following just a few seconds behind the speaker while also matching intonation, pacing, and pitch. In short, the voice sounds more like you than a generic robot. The demos, which are all being recorded under controlled conditions, do sound impressive. You won't have to wait long to verify the model's abilities for yourself, though. Gemini 3.5 Live Translate is rolling out across several parts of the Google ecosystem. Developers can begin building with a public preview in the Gemini Live API or AI Studio. The model processes speech continuously and handles all the multilingual inputs automatically, saving developers from manually configuring settings. It also filters out background noise in busy environments. Select enterprise customers will also get access to the new translation model in Google Meet starting this month in advance of a wider rollout. Google says it's tweaking the Meet interface to bring the live translate feature to the front, too. 
Most notably, 3.5 Live Translate will come to the Google Translate app on both Android and iOS soon. At the tail end of last year, Google began testing Gemini-based live translation in the app with any earbuds (and in the iOS app); previously, you needed to have the company's Pixel Buds with an Android phone. The pending update will expand further with the addition of the latest 3.5 model. Not only can you use any earbuds, you don't need earbuds at all. If you don't have any handy, you can hold the phone up to your ear like you're on a call to hear the spoken translation. However, this "listening mode" only works on Android at this time. The audio streams from Gemini 3.5 Live Translate are intended to sound lifelike even if they don't exactly mimic the user's voice. However, Google is still proceeding cautiously. All Gemini 3.5 Live Translate audio streams will have SynthID watermarks integrated into the waveform data. This will mark the speech as AI-generated, and there is (currently) no way to remove that.
[2]
Google's Gemini 3.5 Live Translate enables realistic real-time translation at the speed of natural conversations
Google LLC's newest artificial intelligence tool promises to bring real-time translation to every smartphone user, enabling more natural and fluid conversations between speakers of different languages. That's according to a new blog post which announced the arrival of Gemini 3.5 Live Translate, which explained that it's the company's most advanced audio model for speech-to-speech translation released to date. Whereas traditional translation tools have always been cumbersome because of the way speech is processed and then translated in turns, Gemini 3.5 Live Translate is much speedier. According to Google, it's able to listen continuously as someone is talking, translate what they're saying and then speak to the other person in their own language. What this means is that non-multilinguals will be able to engage in almost-natural conversations, with only a couple of seconds of delay - similar perhaps, to long-distance telephone calls back in the days of rotary telephones. Google Product Manager Anuda Weerasinghe and Senior Staff Software Engineer Tony Lu said in the co-authored blog post that Gemini 3.5 Live Translate is able to detect which language a person is speaking automatically, so there's no need to set anything up first. It supports more than 70 languages at launch, and that means it can support "thousands" of different language pairings. The company is making it available to developers and enterprises, so we can expect the capability to be integrated with third-party communication platforms in the near future. Of course, it's also being rolled out to everyone directly in the Google Translate application. This isn't Google's first attempt at real-time translation, but earlier efforts have always required specific hardware such as the company's own smartphones and earbuds.
Gemini 3.5 Live Translate is different in that it can work on any smartphone. It's also based on a new architecture that changes how the translation process works. It relies on "continuous stream translation" which means that it doesn't have to wait until one person has finished speaking before it starts generating a response. It results in much more fluid translated conversations. Weerasinghe and Lu said Gemini 3.5 Live Translate is designed for the realities of the real world, meaning it can perform well in noisy environments and handle overlapping voices and informal speech. This means it's suitable for more practical use cases, including customer support calls, classrooms, guided tours, ride-sharing services, live broadcasts and so on, they said. They also emphasized the quality of the model's voices. Rather than the robotic, synthetic voices found on the standard Google Translate app, it tries to preserve the speaker's authenticity by matching their pacing, intonation and emotional tone. As such, the translated speech sounds a lot more natural, enhancing the flow of the conversation. Google's long-term goal with Gemini 3.5 Live Translate is to change the world by enabling people to converse naturally with anyone in the world, regardless of the languages they speak. By the sounds of it, it has a lot of potential to make life easier for travelers and anyone trying to do business with foreign entities.
Google unveiled Gemini 3.5 Live Translate, which it describes as its most advanced speech-to-speech translation model, enabling real-time conversations across more than 70 languages. The model matches the speaker's intonation and pacing while trailing by only a few seconds, and unlike Google's earlier real-time translation tools it does not require specific hardware.
Google has launched Gemini 3.5 Live Translate, marking a significant shift in how the company delivers AI translation capabilities to users worldwide. Unlike previous attempts at real-time translation that required Google phones, earbuds, or other specific hardware setups, this new model is designed to work on any smartphone, with support coming to the Google Translate app on both Android and iOS. The speech-to-speech model automatically detects and translates conversations across more than 70 languages, enabling thousands of different language pairings without manual configuration [2].
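The "thousands of pairings" figure follows from simple counting. The sketch below is only an illustration, assuming exactly 70 languages and that every language can be paired with every other one; the function name `language_pairings` is hypothetical and not part of any Google API.

```python
import math


def language_pairings(num_languages: int) -> tuple[int, int]:
    """Count possible language pairings for a given number of languages.

    Returns (unordered, directed):
      unordered - pairs where A<->B counts once
      directed  - pairs where A->B and B->A count separately
    Assumes every language can be paired with every other language.
    """
    unordered = math.comb(num_languages, 2)
    directed = math.perm(num_languages, 2)
    return unordered, directed


unordered, directed = language_pairings(70)
print(unordered, directed)  # 2415 4830
```

Either way of counting lands in the thousands, and the totals grow if the actual language count is above 70.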
Source: SiliconANGLE
The technology behind Gemini 3.5 Live Translate relies on "continuous stream translation," a new architecture that changes how the translation process works [2]. Rather than waiting until one person finishes speaking before generating a response, the model listens continuously and translates in real time, following just a few seconds behind the speaker. The remaining delay of a couple of seconds is comparable to the lag on old long-distance telephone calls. The model also matches the speaker's intonation and pacing and preserves emotional tone, so the translated speech sounds natural rather than robotic [2].
Gemini 3.5 Live Translate is rolling out across several parts of the Google ecosystem. Developers can start building with a public preview available through the Gemini Live API or AI Studio, where the model processes speech continuously and filters out background noise in busy environments. Select enterprise customers will gain access to the translation model in Google Meet starting this month, ahead of a wider rollout, with Google tweaking the Meet interface to bring the live translate feature to the front. The model is designed to handle real-world conditions including noisy environments, overlapping voices, and informal speech, making it suitable for customer support calls, classrooms, guided tours, ride-sharing services, and live broadcasts [2].
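To make the difference between turn-based and continuous stream translation concrete, here is a toy timing model. It is a sketch under stated assumptions, not Google's implementation: the `Chunk` type, both functions, the chunk durations, and the fixed 2-second processing time are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    duration_s: float  # how long the speaker takes to say this chunk


def turn_based_delays(chunks: list[Chunk], processing_s: float) -> list[float]:
    """Delay per chunk when translation starts only after the full utterance.

    Every chunk's translation becomes available at the same moment:
    the end of the whole utterance plus the processing time.
    """
    ready_at = sum(c.duration_s for c in chunks) + processing_s
    delays = []
    spoken_until = 0.0
    for chunk in chunks:
        spoken_until += chunk.duration_s
        delays.append(ready_at - spoken_until)
    return delays


def streaming_delays(chunks: list[Chunk], processing_s: float) -> list[float]:
    """Delay per chunk when each chunk is translated as soon as it is spoken."""
    return [processing_s for _ in chunks]


utterance = [
    Chunk("Hello,", 2.0),
    Chunk("could you tell me", 3.0),
    Chunk("where the station is?", 4.0),
]

print(turn_based_delays(utterance, processing_s=2.0))  # [9.0, 6.0, 2.0]
print(streaming_delays(utterance, processing_s=2.0))   # [2.0, 2.0, 2.0]
```

In this simplified model the turn-based listener waits longest for the earliest words, while the streaming listener stays a constant couple of seconds behind, which matches the behavior the article describes.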
The pending update to the Google Translate app introduces a "listening mode" feature that eliminates the need for earbuds entirely. Users can simply hold their phone up to their ear like a regular call to hear spoken translations, though this functionality currently works only on Android devices. Despite the lifelike quality of the audio streams, Google is proceeding with caution by integrating SynthID watermarks into all waveform data generated by Gemini 3.5 Live Translate. These watermarks mark the speech as AI-generated and cannot currently be removed, addressing concerns about synthetic voice misuse while maintaining transparency.
The availability of real-time speech-to-speech translation through standard smartphones has significant implications for international businesses and travelers who previously faced language barriers [2]. As third-party communication platforms integrate this capability through the available APIs, we can expect to see the technology embedded in customer service systems, video conferencing tools, and collaboration platforms. The model's ability to automatically detect languages without setup requirements removes friction from cross-border interactions, potentially accelerating business negotiations and customer support resolution times. Google's long-term vision centers on enabling people to converse naturally with anyone worldwide regardless of language differences, a goal that appears increasingly achievable as the technology matures and expands to more use cases.
Summarized by Navi