Microsoft Enhances Copilot with Advanced AI-Powered Text-to-Speech Features

Reviewed by Nidhi Govil

2 Sources

Microsoft introduces new AI-powered text-to-speech capabilities in Copilot, offering three distinct modes for various applications. The feature, powered by the MAI-Voice-1 AI model, promises to revolutionize audio content creation.

Microsoft Enhances Copilot with New AI-Powered Audio Generation Features

Microsoft has unveiled a significant upgrade to its Copilot AI assistant, introducing advanced text-to-speech capabilities that promise to revolutionize how users interact with written content. The new feature, part of Copilot Audio Expressions, allows for the conversion of text into expressive, AI-generated voiceovers [1][2].

The Power of MAI-Voice-1

At the heart of this innovation lies Microsoft's homegrown MAI-Voice-1 AI model, released in late August. This sophisticated model can generate a full minute of audio in under a second on a single GPU, showcasing the company's commitment to efficient and powerful AI solutions [2].
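To put the speed claim in perspective, the short sketch below works out the arithmetic behind "a full minute of audio in under a second". It uses the common real-time factor (RTF) measure, processing time divided by audio duration. The function and variable names are illustrative assumptions, not part of any Microsoft API, and the one-second figure is treated as an upper bound because the article only says "under a second".

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower means faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Figures taken from the article: one minute of audio, generated in under a second.
AUDIO_SECONDS = 60.0
MAX_PROCESSING_SECONDS = 1.0  # assumption: used as an upper bound only

rtf_upper_bound = real_time_factor(MAX_PROCESSING_SECONDS, AUDIO_SECONDS)
speedup_lower_bound = 1.0 / rtf_upper_bound

print(f"RTF is below {rtf_upper_bound:.4f}")
print(f"Generation is more than {speedup_lower_bound:.0f}x faster than real time")
```

In other words, if the reported figure holds, the model produces audio at least 60 times faster than it takes to play back.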

Three Distinct Modes for Diverse Applications

Copilot Audio Expressions now offers three unique modes, each catering to different user needs:

  1. Scripted Mode: This newly introduced mode reads text verbatim, providing a clean and direct audio rendition without altering the content. It's ideal for formal announcements, document narration, and information presentation [1][2].

  2. Emotive Mode: Designed for dramatic and flashy delivery, this mode employs a wide range of intonation, pitch, and tone. It's particularly suitable for advertising, marketing, or informal narration [2].

  3. Story Mode: The most versatile option, Story Mode incorporates multiple voices and characters, making it perfect for storytelling, podcast-like presentations, and analytical tasks [1][2].
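The three modes and their suggested uses can be summarized as a small lookup, sketched below. This is purely illustrative: the `AudioMode` enum, the `USE_CASES` table, and `suggest_mode` are hypothetical names invented for this example and do not correspond to any Copilot API. The use cases listed are the ones named in the descriptions above.

```python
from enum import Enum


class AudioMode(Enum):
    """Hypothetical labels for the three Copilot Audio Expressions modes."""
    SCRIPTED = "scripted"
    EMOTIVE = "emotive"
    STORY = "story"


# Use cases as described in the article, keyed by mode.
USE_CASES = {
    AudioMode.SCRIPTED: {"formal announcement", "document narration", "information presentation"},
    AudioMode.EMOTIVE: {"advertising", "marketing", "informal narration"},
    AudioMode.STORY: {"storytelling", "podcast-like presentation", "analytical task"},
}


def suggest_mode(use_case: str) -> AudioMode:
    """Return the mode the article associates with a use case."""
    key = use_case.strip().lower()
    for mode, cases in USE_CASES.items():
        if key in cases:
            return mode
    raise ValueError(f"No mode listed for use case: {use_case!r}")


print(suggest_mode("Document narration").name)
print(suggest_mode("marketing").name)
```

A use case not named in the article raises an error rather than guessing, since the source does not say which mode suits other kinds of content.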

Availability and Future Prospects

Currently, these new audio generation features are accessible through Copilot Labs, Microsoft's experimental platform for testing cutting-edge AI features. Users can try out the new capabilities by signing in with a personal account [2].

The feature is currently free to use, and Microsoft has not specified any rate limits. The company is also exploring support for languages beyond English, signaling potential future expansion [1].

Implications for AI-Powered Audio Content Creation

This advancement in AI-generated audio represents a significant step forward in natural language processing and speech synthesis. By offering diverse modes for audio generation, Microsoft is catering to a wide range of use cases, from professional content creation to personal storytelling.

The ability to generate expressive and natural-sounding voice outputs quickly and efficiently could have far-reaching implications for industries such as media production, education, and accessibility services. As AI continues to evolve, we can expect even more sophisticated and nuanced audio generation capabilities in the future.

TheOutpost.ai

© 2025 Triveous Technologies Private Limited