2 Sources
[1]
Gemini is getting even better at handling natural conversations
Today, Google announced that it's rolling out an update to Gemini 2.5 Flash Native Audio for live voice agents. For this update, Google focused on three key areas. In addition to these improvements, Josh Woodward, VP of Google Labs, Gemini, and AI Studio, shared two other enhancements. One improvement makes it so that Gemini Live will not cut you off mid-sentence if you pause for too long. The other allows you to mute your microphone while Gemini Live is talking, so you don't accidentally interrupt it.
[2]
Improved Gemini audio models for powerful voice interactions
Earlier this week, we introduced greater control over audio generation with an upgrade to our Gemini 2.5 Pro and Flash Text-to-Speech models. But generating expressive speech is only one side of the conversation. Today, we're releasing an updated Gemini 2.5 Flash Native Audio for live voice agents. This update improves the model's ability to handle complex workflows, navigate user instructions, and hold natural conversations. Gemini 2.5 Flash Native Audio is now available across Google products including Google AI Studio, Vertex AI, and has also started rolling out in Gemini Live and Search Live, bringing the naturalness of native audio to Search Live for the first time. This means you can more effectively brainstorm live with Gemini, get real-time help in Search Live, or build the next generation of enterprise-ready customer service agents. Beyond powering helpful agents, native audio unlocks new possibilities for global communication. We're introducing live speech translation, a capability that enables streaming speech-to-speech translation for headphones. It preserves the speaker's intonation, pacing and pitch. This beta experience is rolling out in the Google Translate app starting today.
Google has rolled out a major update to Gemini 2.5 Flash Native Audio that improves how live voice agents handle complex workflows and natural conversations. The update also introduces live speech translation that preserves the speaker's intonation, and a separate change keeps Gemini Live from cutting users off mid-sentence. The updated model is available in Google AI Studio and Vertex AI and is rolling out in Gemini Live and Search Live, while live translation is arriving in beta in the Google Translate app.
Google has released a significant update to Gemini 2.5 Flash Native Audio, focused on how live voice agents interact with users [2]. The update targets three core areas: improved handling of complex workflows, better navigation of user instructions, and a stronger ability to hold natural conversations [1]. This matters for anyone building or using voice-based AI applications, as the improvements address long-standing frustrations with voice assistants that struggle to maintain conversational flow or understand nuanced requests.
Source: Android Authority
The updated model is available in Google AI Studio and Vertex AI and has started rolling out in Gemini Live [2]. Notably, native audio is arriving in Search Live for the first time, so users can brainstorm live with Gemini or get real-time help in Search Live through voice [2]. For enterprises, this update enables the development of more sophisticated customer service agents that can handle complex queries without breaking conversational context.
Josh Woodward, VP of Google Labs, Gemini, and AI Studio, revealed two practical improvements that address common user pain points [1]. Gemini Live will no longer cut users off mid-sentence when they pause for too long, a frequent complaint with voice assistants that interpret natural pauses as the end of a statement. Additionally, users can now mute their microphone while Gemini Live is speaking to avoid accidentally interrupting its response [1].
These refinements signal Google's focus on mimicking human-to-human dialogue patterns, where pauses for thought are natural and speakers take turns without abrupt interruptions. In the short term, this means smoother interactions for users conducting research, brainstorming, or troubleshooting problems by voice. In the long term, these improvements lay the groundwork for voice interfaces that feel less transactional and more collaborative.
Beyond conversational improvements, Google introduced live speech translation as a new capability powered by native audio [2]. This feature enables streaming speech-to-speech translation for headphones while preserving the speaker's intonation, pacing, and pitch [2]. The beta experience is rolling out in the Google Translate app starting today [2].
Preserving vocal characteristics during translation is a notable technical step, since translation systems often strip away emotional context and speaking style. This matters for global communication scenarios where tone and delivery carry as much meaning as the words themselves. The technology could plausibly expand beyond headphones into video conferencing, customer support, and accessibility applications, where maintaining speaker identity aids understanding and trust.
Earlier this week, Google upgraded both its Gemini 2.5 Pro and Flash Text-to-Speech models to provide greater control over audio generation [2]. While generating expressive speech represents one side of voice interactions, the newly updated Gemini 2.5 Flash Native Audio completes the conversational loop by improving listening and response capabilities [2]. Together, these updates position Gemini as a more complete voice interaction platform that handles both input and output with greater nuance.
The combination of improved speech generation and comprehension creates opportunities for developers building enterprise-ready applications through Vertex AI or experimenting with prototypes in Google AI Studio. As voice becomes a primary interface for AI interactions, the ability to maintain context across complex workflows while sounding natural will separate useful tools from frustrating ones.
Summarized by Navi