2 Sources
[1]
You Can Now Try Out Gemini 2.5's Native Audio Dialog Generation
Google introduced new audio generation capabilities with the Gemini 2.5 models at Google I/O 2025. The Mountain View-based tech giant is now letting developers and individuals test these features on its platform. The two new capabilities are native audio dialog and controllable text-to-speech (TTS) with Gemini 2.5 Flash preview. The former natively generates human-like audio in response to user prompts, while the latter can convert any script into conversational speech. These features are currently not available to developers via application programming interfaces (APIs). In a blog post, the tech giant detailed the features of these two audio generation modes, highlighting how developers can use them to build new experiences for people. Currently, native audio dialog can be tried out in Google AI Studio's stream tab, whereas the TTS feature can be tested in the generate media tab within AI Studio. Native audio dialog with Gemini 2.5 Flash preview is designed for real-time conversations between a human user and the AI. The user can either type a prompt or speak it, and the AI responds verbally. This process generates audio directly, instead of first generating text and then converting it into speech, which brings several advantages. It supports affective dialog: Gemini 2.5 Flash can recognise the emotion in the user's tone of voice, such as when the user sounds scared, angry, or surprised, and respond accordingly. The audio generation feature can also express emotions when speaking, adopt different accents and linguistic styles, and access tools such as Google Search, and it supports more than 24 languages.
The controllable TTS feature offers multi-speaker dialogue generation, can produce emotions and accents while narrating a script, lets users control delivery speed and pronunciation, and supports the same set of more than 24 languages as well as language mixing. Google says these capabilities were assessed for potential risks across the development process. The company used both internal mechanisms and red teaming to find and fix vulnerabilities. It also highlighted that all audio outputs from these models are embedded with SynthID, its watermarking technology.
[2]
Google expands Gemini 2.5 with native voice and TTS tools
At its I/O event, Google unveiled audio dialogue and generation capabilities for Gemini 2.5. These enhancements aim to deliver seamless voice interactions across various products and languages globally. Google has integrated Gemini 2.5 into applications like NotebookLM's Audio Overviews and Project Astra. The model prioritizes real-time audio conversations, enabling AI to interpret and produce speech with natural tone, style, and contextual awareness. Gemini 2.5 offers advanced control over audio generation, allowing users to tailor speech output with precision. Google provides two Gemini 2.5 configurations for audio development, which facilitate audio creation for applications such as podcasts, video games, and public announcements. Google conducted comprehensive risk evaluations during the development of Gemini 2.5's audio features. Safety measures were refined through internal and external testing, including red teaming. All AI-generated audio includes SynthID, Google's watermarking technology, to clearly identify AI-produced content. Google enables developers to utilize Gemini 2.5's audio capabilities via the Gemini API, accessible through Google AI Studio and Vertex AI environments.
Google introduces native audio dialog and controllable text-to-speech features in Gemini 2.5, offering developers new tools for creating immersive AI-powered audio experiences.
Google has introduced new audio generation features in its Gemini 2.5 models, showcased at Google I/O 2025. These capabilities, native audio dialog and controllable text-to-speech, are now available for testing by developers and individuals [1].
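The native audio dialog feature in Gemini 2.5 Flash preview enables real-time conversations between users and AI. It generates audio responses directly, rather than first generating text and then converting it into speech. Key features include affective dialog, in which the model recognises the emotion in the user's tone of voice and responds accordingly; expressive speech in different accents and linguistic styles; access to tools such as Google Search; and support for more than 24 languages [1].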
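Gemini 2.5's controllable TTS feature offers fine-grained control over audio output: multi-speaker dialogue generation, emotions and accents in narration, adjustable delivery speed and pronunciation, and support for more than 24 languages as well as language mixing [1].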
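Google says it assessed these audio features for potential risks throughout development, using internal mechanisms and red teaming to find and fix vulnerabilities. All audio output from the models is embedded with SynthID, Google's watermarking technology, to identify it as AI-generated [1][2].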
The new audio capabilities of Gemini 2.5 have been integrated into Google products such as NotebookLM's Audio Overviews and Project Astra [2].
The features can currently be tested in Google AI Studio: native audio dialog in the stream tab and TTS in the generate media tab [1]. The sources differ on API access. One reports that the features are not yet available to developers via APIs [1], while the other says Google offers the audio capabilities through the Gemini API, accessible through Google AI Studio and Vertex AI [2].
The capabilities are aimed at audio creation in domains such as podcasting, gaming, and public announcements [2].
Summarized by
Navi