Google upgrades Gemini audio models to handle natural conversations and live translation

Reviewed by Nidhi Govil

Google has rolled out major updates to Gemini 2.5 Flash Native Audio, improving how live voice agents handle complex workflows and natural conversations. The update introduces live speech translation that preserves speaker intonation, and it stops Gemini Live from cutting users off mid-sentence. These enhancements are now available across Google AI Studio, Vertex AI, and the Google Translate app.

Google Enhances Gemini 2.5 Flash Native Audio for Live Voice Agents

Google has released a significant update to Gemini 2.5 Flash Native Audio, focusing on transforming how live voice agents interact with users [2]. The update targets three core areas: improved handling of complex workflows, better navigation of user instructions, and enhanced ability to hold natural conversations [1]. This matters for anyone building or using voice-based AI applications, as the improvements address long-standing frustrations with voice assistants that struggle to maintain conversational flow or understand nuanced requests.

Source: Android Authority

The enhanced Gemini audio models are now rolling out across multiple Google platforms including Google AI Studio, Vertex AI, and Gemini Live [2]. Notably, the native audio capabilities are arriving in Search Live for the first time, enabling users to brainstorm live with Gemini or get real-time help through voice interactions [2]. For enterprises, this update enables the development of more sophisticated customer service agents that can handle complex queries without breaking conversational context.

Smarter Conversation Flow Prevents Mid-Sentence Interruptions

Josh Woodward, VP of Google Labs, Gemini, and AI Studio, revealed two practical improvements that address common user pain points [1]. Gemini Live will no longer cut users off mid-sentence when they pause for too long, a frequent complaint with existing voice assistants that interpret natural pauses as the end of a statement. Additionally, users can now mute their microphone while Gemini Live is speaking to avoid accidentally interrupting the AI's response [1].

These refinements signal Google's focus on mimicking human-to-human dialogue patterns, where pauses for thought are natural and speakers take turns without abrupt interruptions. The short-term impact means smoother interactions for users conducting research, brainstorming sessions, or troubleshooting problems through voice. Long-term, these improvements lay groundwork for voice interfaces that feel less transactional and more collaborative.
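Voice agents typically decide a speaker has finished after a fixed stretch of silence, so lengthening or adapting that threshold is one straightforward way to stop interpreting thinking pauses as the end of a turn. Google has not published how Gemini Live implements this; the toy sketch below is purely illustrative, and every name and number in it is invented.

```python
from dataclasses import dataclass

# Toy illustration of voice-agent "endpointing": the agent treats the user
# as finished once silence exceeds a threshold. This is NOT Google's
# implementation; all names and values are invented for illustration.

@dataclass
class Endpointer:
    silence_threshold_s: float  # how long a pause must last to end the turn

    def turn_ended(self, silence_s: float) -> bool:
        return silence_s >= self.silence_threshold_s

# A 0.8-second thinking pause in the middle of a sentence:
pause_s = 0.8

aggressive = Endpointer(silence_threshold_s=0.5)  # interprets the pause as "done"
patient = Endpointer(silence_threshold_s=1.5)     # waits the pause out

print(aggressive.turn_ended(pause_s))  # True  -> cuts the user off mid-sentence
print(patient.turn_ended(pause_s))     # False -> lets the user keep talking
```

The trade-off is latency: a patient threshold avoids interruptions but makes the agent slower to reply once the user really is done, which is presumably why tuning this behavior well is worth announcing.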

Live Speech Translation Preserves Speaker Intonation and Pacing

Beyond conversational improvements, Google introduced live speech translation as a new capability powered by native audio technology [2]. This feature enables streaming speech-to-speech translation for headphones while preserving the speaker's intonation, pacing, and pitch [2]. The beta experience is rolling out in the Google Translate app starting today [2].

Preserving vocal characteristics during translation represents a technical leap, as most translation systems strip away emotional context and speaking style. This matters for global communication scenarios where tone and delivery carry as much meaning as the words themselves. Watch for this technology to expand beyond headphones into video conferencing, customer support, and accessibility applications where maintaining speaker identity enhances understanding and trust.

Gemini 2.5 Pro Receives Text-to-Speech Upgrades

Earlier this week, Google upgraded both the Gemini 2.5 Pro and Flash Text-to-Speech models to provide greater control over audio generation [2]. While generating expressive speech represents one side of voice interactions, the newly updated Gemini 2.5 Flash Native Audio completes the conversational loop by improving listening and response capabilities [2]. Together, these updates position Gemini as a more complete voice interaction platform that handles both input and output with greater nuance.

The combination of improved speech generation and comprehension creates opportunities for developers building enterprise-ready applications through Vertex AI or experimenting with prototypes in Google AI Studio. As voice becomes a primary interface for AI interactions, the ability to maintain context across complex workflows while sounding natural will separate useful tools from frustrating ones.
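For developers experimenting in Google AI Studio, requesting native audio output comes down to asking the Live API for an audio response modality in the session setup. The sketch below shows roughly what that setup payload might look like as a plain dict; the model id and field names are assumptions based on the Live API's general JSON shape, so check the current Gemini API documentation before relying on them.

```python
# Hypothetical Live API session setup for a native-audio Gemini session.
# The model id and field names below are assumptions for illustration;
# consult the Gemini API docs in Google AI Studio for current values.

def live_audio_setup(model: str) -> dict:
    return {
        "setup": {
            "model": f"models/{model}",
            "generation_config": {
                # Ask for spoken (native audio) responses rather than text.
                "response_modalities": ["AUDio".upper()],
            },
        }
    }

cfg = live_audio_setup("gemini-2.5-flash-native-audio-preview")  # assumed id
print(cfg["setup"]["generation_config"]["response_modalities"])  # ['AUDIO']
```

In practice this payload would be sent over the Live API's streaming connection (for example via the google-genai SDK) rather than constructed by hand, but the shape conveys the one decision that matters here: audio in, audio out.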

TheOutpost.ai

© 2025 Triveous Technologies Private Limited