2 Sources
[1]
You Can Now Try Out Gemini 2.5's Native Audio Dialog Generation
TTS in Gemini 2.5 Flash allows multi-speaker dialogue generation
Google introduced new audio generation capabilities with the Gemini 2.5 models at Google I/O 2025. The Mountain View-based tech giant is now letting developers and individuals test these features on its platform. The two new capabilities are native audio dialog and controllable text-to-speech (TTS) with Gemini 2.5 Flash preview. The former natively generates human-like audio while responding to user prompts, while the latter converts any script into conversational speech. These features are currently not available to developers via application programming interfaces (APIs).
In a blog post, the tech giant detailed the two audio generation modes, highlighting how developers can use them to build new experiences. Currently, native audio dialog can be tried out in the Stream tab of Google AI Studio, whereas the TTS feature can be tested in AI Studio's Generate Media tab.
Native audio dialog with Gemini 2.5 Flash preview is designed for real-time conversations between a human user and the AI. The user can either type or speak a prompt, and the AI responds verbally. The model generates audio directly instead of first generating text and then converting it into speech, and this approach has several advantages. It supports affective dialog: Gemini 2.5 Flash can recognise the emotion behind the user's words from their tone of voice, for instance when the user sounds scared, angry, or surprised, and respond accordingly. The audio generation feature can also express emotions when speaking, adopt different accents and linguistic styles, access tools such as Google Search, and support more than 24 languages.
The controllable TTS feature offers multi-speaker dialogue generation, can convey emotions and accents while narrating a script, lets users control delivery speed and pronunciation, and supports the same 24-plus languages as well as language mixing. Google says these capabilities were assessed for potential risks throughout the development process. The company used both internal mechanisms and red teaming to find and fix vulnerabilities. It also highlighted that all audio outputs from these models are embedded with SynthID, its watermarking technology.
[2]
Google expands Gemini 2.5 with native voice and TTS tools
At its I/O event, Google unveiled Gemini 2.5, an AI model with advanced audio dialogue and generation capabilities. These enhancements aim to deliver seamless voice interactions across various products and languages globally. Google has integrated Gemini 2.5 into applications like NotebookLM's Audio Overviews and Project Astra. The model prioritizes real-time audio conversations, enabling AI to interpret and produce speech with natural tone, style, and contextual awareness.
Gemini 2.5 offers advanced control over audio generation, allowing users to tailor speech output with precision. Google provides two Gemini 2.5 configurations for audio development, which facilitate audio creation for applications such as podcasts, video games, and public announcements.
Google conducted comprehensive risk evaluations during the development of Gemini 2.5's audio features. Safety measures were refined through internal and external testing, including red teaming. All AI-generated audio includes SynthID, Google's watermarking technology, to clearly identify AI-produced content. Google enables developers to use Gemini 2.5's audio capabilities via the Gemini API, accessible through Google AI Studio and Vertex AI environments.
Google introduces native audio dialog and controllable text-to-speech features in Gemini 2.5, offering developers new tools for creating immersive AI-powered audio experiences.
Google has introduced new audio generation features in its latest Gemini 2.5 model, showcased at the Google I/O 2025 event. These capabilities, now available for testing by developers and individuals, mark a significant advancement in AI-powered audio interactions [1].
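The native audio dialog feature in Gemini 2.5 Flash preview enables real-time conversations between users and AI. It generates audio responses directly, rather than generating text first and then converting it to speech. Key features include affective dialog, which lets the model recognise emotions such as fear, anger, or surprise in the user's tone of voice and respond accordingly; expressive speech with different accents and linguistic styles; access to tools such as Google Search; and support for more than 24 languages [1].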
Source: NDTV Gadgets 360
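Gemini 2.5's controllable TTS feature converts scripts into conversational speech with fine-grained control over the output, including multi-speaker dialogue generation, emotion and accent in narration, control over delivery speed and pronunciation, and support for the same 24-plus languages along with language mixing [1].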
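Google says it assessed these audio features for potential risks throughout development, refining safety measures through internal and external testing, including red teaming. All audio generated by these models is embedded with SynthID, Google's watermarking technology, so it can be identified as AI-produced [1][2].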
The new audio capabilities of Gemini 2.5 have also been integrated into Google products such as NotebookLM's Audio Overviews and Project Astra [2].
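Both features can currently be tested in Google AI Studio: native audio dialog in the Stream tab and controllable TTS in the Generate Media tab [1]. Reports differ on developer access, however. One source states the features are not yet accessible via APIs [1], while the other says developers can already use Gemini 2.5's audio capabilities through the Gemini API, accessible via Google AI Studio and Vertex AI [2].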
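For developers wondering what a multi-speaker TTS request might look like, here is a minimal sketch. It assumes the JSON request shape described in Google's public Gemini API documentation for TTS models such as `gemini-2.5-flash-preview-tts` (a `generateContent` body with `responseModalities` set to `AUDIO` and a `speechConfig.multiSpeakerVoiceConfig`); the helper name `build_multispeaker_tts_request` and the voice choices are hypothetical, and field names should be checked against current documentation. The code only builds the request body and makes no network call.

```python
import json


def build_multispeaker_tts_request(script: str, voices: dict[str, str]) -> dict:
    """Build a hypothetical generateContent request body for multi-speaker TTS.

    Field names are assumed from Google's public Gemini API docs; verify
    them before use. Each speaker name in `voices` should match a speaker
    label used in the script.
    """
    # One entry per speaker, each mapped to a prebuilt voice name.
    speaker_configs = [
        {
            "speaker": speaker,
            "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}},
        }
        for speaker, voice in voices.items()
    ]
    return {
        "contents": [{"parts": [{"text": script}]}],
        "generationConfig": {
            # Ask the model for audio output rather than text.
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "multiSpeakerVoiceConfig": {"speakerVoiceConfigs": speaker_configs}
            },
        },
    }


if __name__ == "__main__":
    script = (
        "TTS the following conversation between Joe and Jane:\n"
        "Joe: How's it going today, Jane?\n"
        "Jane: Not too bad, how about you?"
    )
    body = build_multispeaker_tts_request(script, {"Joe": "Kore", "Jane": "Puck"})
    print(json.dumps(body, indent=2))
```

Keeping the request as plain data makes it easy to inspect or log before sending; the actual HTTP call and API-key handling are deliberately left out.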
This development opens up new possibilities for creating immersive AI-powered experiences across domains including podcasting, gaming, and public communications [2]. As these technologies evolve, they could change how people interact with AI systems and consume audio content.
Summarized by Navi