2 Sources
[1]
Microsoft's Copilot AI text-to-speech gets new, cleaner 'scripted mode'
Copilot Audio Expressions can now read text aloud in a more direct fashion without altering its content. Microsoft AI CEO Mustafa Suleyman recently unveiled, in a social media post, a new feature called "Scripted Mode" in Copilot Labs for turning written scripts into text-to-speech. Copilot Labs is an experimental platform where you can try out Microsoft's newest AI features that are still in development. Specifically, the feature is part of Copilot Audio Expressions, the tool that turns text into spoken audio. Previously, Copilot Audio Expressions had only two modes: Story Mode (which weaves together multiple vocal styles and characters for a storytelling experience) and Emotive Mode (which uses a single, distinct voice that matches a particular mood, with some improvisation). With Scripted Mode, Copilot Audio Expressions can now take text and read it aloud in a fast, direct take, delivering the content verbatim without any riffing or creative changes. You can still select which voice and style to use for the reading. Copilot Audio Expressions is still only available in English, but Microsoft is exploring ways to support more languages.
[2]
Microsoft's Copilot Can Now Turn Your Scripts Into Expressive Voiceovers
Microsoft is adding another artificial intelligence (AI) feature to Copilot, giving it the ability to natively generate audio. On Wednesday, the Redmond-based tech giant announced that Copilot is getting a new audio generation feature: users can hand it a script and it will convert it into an AI voiceover in different styles. Since this is native voice generation, none of the modes sound like typical text-to-speech models. Notably, the company is powering the capability with its homegrown MAI-Voice-1 AI model.
Microsoft's Copilot Can Now Read Aloud Your Scripts
In a post on X (formerly known as Twitter), Mustafa Suleyman, CEO of Microsoft AI, announced the release of Copilot's new audio generation modes. He highlighted that these are powered by the MAI-Voice-1 AI model, which was released at the end of August. Currently, the experience is only available via Copilot Labs when signing in with a personal account.
There are three modes to try out. First is Scripted mode, where the AI chatbot reads out the input verbatim, without adding any unnecessary flair or style. This is best suited to tasks such as formal announcements, document narration, and information presentation. The second mode is dubbed Emotive. Suleyman says it is focused on making the input sound dramatic and flashy: the voice spans a wide range of intonation, pitch, and tone to deliver a performative piece, making it ideal for advertising, marketing, or informal narration. Copilot's final audio generation mode is Story. This is the most versatile format, incorporating multiple voices and characters; the company says it is ideal for storytelling, podcast-like presentations, and analysis-related tasks.
The feature is currently free to use, although Microsoft has not mentioned any rate limits. It is unclear when the feature will arrive in the Copilot mobile and desktop apps.
Notably, at the time of release, Microsoft said MAI-Voice-1 is a speech generation model that natively produces expressive, natural-sounding voice. It can generate a full minute of audio in under a second on a single GPU. The tech giant trained the model on around 15,000 Nvidia GPUs.
Microsoft introduces new AI-powered text-to-speech capabilities in Copilot, offering three distinct modes for different applications. The feature is powered by the MAI-Voice-1 AI model.

Microsoft has unveiled a significant upgrade to its Copilot AI assistant, introducing advanced text-to-speech capabilities that change how users can turn written content into audio. The new feature, part of Copilot Audio Expressions, converts text into expressive, AI-generated voiceovers [1][2].
At the heart of this capability lies Microsoft's homegrown MAI-Voice-1 AI model, released in late August. The model can generate a full minute of audio in under a second on a single GPU [2].
Copilot Audio Expressions now offers three modes, each catering to different user needs:
Scripted Mode: This newly introduced mode reads text verbatim, providing a clean and direct audio rendition without altering the content. It is ideal for formal announcements, document narration, and information presentation [1][2].
Emotive Mode: Designed for dramatic and flashy delivery, this mode employs a wide range of intonation, pitch, and tone. It is particularly suitable for advertising, marketing, or informal narration [2].
Story Mode: The most versatile option, Story Mode incorporates multiple voices and characters, making it well suited to storytelling, podcast-like presentations, and analytical tasks [1][2].
Currently, these audio generation features are accessible through Copilot Labs, Microsoft's experimental platform for testing in-development AI features. Users can try the new capabilities by signing in with a personal account [2].
While the feature is presently free to use, Microsoft has not specified any rate limits. The company is also exploring ways to support more languages beyond English, signaling potential expansion in the future [1].
This advancement in AI-generated audio represents a notable step forward in speech synthesis. By offering distinct modes for audio generation, Microsoft is catering to a wide range of use cases, from professional content creation to personal storytelling.
The ability to generate expressive and natural-sounding voice outputs quickly and efficiently could have far-reaching implications for industries such as media production, education, and accessibility services. As AI continues to evolve, we can expect even more sophisticated and nuanced audio generation capabilities in the future.
Summarized by Navi