Curated by THEOUTPOST
On Tue, 15 Apr, 4:06 PM UTC
2 Sources
[1]
Deepgram's New Text-to-Speech AI Model Outperforms ElevenLabs and Open AI | AIM Media House
Deepgram's Aura-2 could be a wild card entry for enterprise use cases.
Deepgram, a voice AI platform, on Tuesday launched Aura-2, its next-generation text-to-speech (TTS) model. The company calls it the world's most professional and cost-effective enterprise-grade TTS solution. In blind tests by users specifically for conversational enterprise applications, the model outperformed leading competitors like ElevenLabs, Cartesia, and OpenAI.
Aura-2 is built on top of Deepgram Enterprise Runtime (DER), a custom infrastructure layer for its speech models. It aims to provide domain-specific pronunciation, professional voice quality, and context-aware delivery in the speech it generates. With this, developers can enhance real-time enterprise interactions across various use cases, including customer service, virtual agents, and AI-powered assistants.
Aura-2 can be deployed via cloud or on-premises APIs. Moreover, new users will receive $200 in free credits to try the model's capabilities on the official website.
The company points to a significant gap in enterprise-optimised voice AI, which requires a natural-sounding voice and domain-specific pronunciation. Deepgram's Aura-2 attempts to bridge this gap for business-critical environments.
"In head-to-head comparisons across enterprise scenarios, Deepgram came out on top nearly 60% of the time," the company stated. As per the chart shared, users preferred Aura-2 61.8% of the time, compared to 38.2% for ElevenLabs. Similarly, Aura-2 was preferred 52% of the time, compared to 48% for OpenAI.
When asked about the model's different use cases, Natalie Rutgers, VP of product for Deepgram, told AIM: "While people can use Aura-2 for podcasts and other entertainment use cases, that isn't our focus with this offering. Our customers care about having real-time voices that represent the people you'd hear at your appointments, your pharmacy, and your customer service lines."
Rutgers also mentioned that the model supports English voices, including British and Australian accents, with multilingual support underway.
Deepgram's Aura-2 is also optimised for real-time performance. The company claims fast response times, with a sub-150ms time-to-first-byte. It also claims the lowest pricing compared to ElevenLabs Flash and Cartesia Sonic. Deepgram explains, "At $0.030 per 1,000 characters, it offers substantial savings compared to alternatives like ElevenLabs Turbo ($0.050) and Cartesia Sonic ($0.038)." The company states that usage-based pricing eliminates quality/cost tradeoffs, enabling uniform voice experiences at every touchpoint while maintaining performance and managing costs.
[2]
Deepgram's Aura-2 is a high-performance text-to-speech engine built for business interactions - SiliconANGLE
The speech recognition-focused startup Deepgram Inc. today launched a new text-to-speech model called Aura-2, saying it will be a game-changer for real-time voice applications.
According to the startup, Aura-2 is built for clarity, consistency and low-latency performance, enabling much smoother, more fluid and natural conversations between humans and artificial intelligence-powered chatbots. It can be used in almost any kind of voice application, including customer support agents, AI-powered assistants and more.
Deepgram is known for its highly sophisticated speech recognition engine, which enables humanlike conversations between human and machine. Its software has been widely praised for its responsiveness and its realistic nature, waiting for the most appropriate moment to break into the conversation, as well as its "interruptible" nature, which means humans can interject and it will immediately pause whatever it's saying and reconfigure its response.
Deepgram argues that most existing text-to-speech engines are actually more focused on entertainment, optimized for character voices, storytelling and emotionally expressive delivery, which means they don't always meet the demands of enterprise-grade voice systems. The customer service industry in particular needs more natural-sounding voices that support domain-specific pronunciation, with a professional tone and consistent contextual handling, while being cost-effective and secure. According to Deepgram, that's exactly what Aura-2 provides, delivering quality, context-aware speech and conversational experiences for any enterprise environment.
In a blog post, Deepgram points to a number of capabilities that set Aura-2 apart from other models, the main one being its handling of domain-specific pronunciation. By this, it means the model is designed to converse using highly specific terminology in various industries, fluent in financial jargon, for example, or product names in the chemical industries. This eliminates the need for customers to fine-tune LLMs with extensive pronunciation dictionaries to provide clear communication when speaking about niche topics.
Aura-2 supports more than 40 distinct voices in English, including numerous regional U.S. accents and those from other English-speaking countries. Moreover, each accent will employ "business-appropriate speech" that purposely avoids the overly theatrical tones that are too common with entertainment-focused text-to-speech engines. Customers can choose from various voice personas, ranging from empathetic and charismatic to calm and professional, to ensure their voice apps align with their brand identity.
Deepgram says Aura-2 also excels in terms of its interruption handling, context awareness and "end-of-thought detection," enabling more fluid conversations, even when the human speaker interrupts the AI. The startup says it can intelligently adjust different aspects of its voice, such as pacing, pauses, tone and expression, based on the context of whatever it's discussing, resulting in smoother, more coherent speech overall.
Deepgram Chief Executive Scott Stephenson said the AI chatbot industry has evolved to the point where enterprises don't just require voices that sound real, but voices that can reliably communicate with humanlike precision in professional contexts. "Aura-2 delivers the perfect balance of natural speech and enterprise-grade accuracy, enabling organizations to create voice experiences that truly enhance customer engagement while maintaining operational efficiency," he said.
The company shared a graphic showing that humans strongly prefer interacting with Aura-2 over various other text-to-speech models, including the most advanced models from OpenAI and Microsoft Corp.
Aura-2's responsiveness is another major strength. The company boasts of sub-200-millisecond responses, as well as impressive scalability, handling thousands of concurrent requests to support high-volume deployments in call centers and virtual assistant scenarios. Customers have the option to deploy Aura-2 on-premises or in virtual private cloud environments to ensure full control over their data, which also further reduces latency.
Deepgram highlighted Aura-2's competitiveness in terms of cost, too. It said it's priced at just three cents per 1,000 characters, making it significantly cheaper than other business-focused models such as ElevenLabs Turbo and Cartesia Sonic, which cost five cents and 3.8 cents, respectively. It charges the same rate for all 40-plus voices offered, with a tiered pricing structure for higher-volume deployments.
Lastly, Deepgram explained that Aura-2 is powered by a customized infrastructure layer called Deepgram Enterprise Runtime. This supports additional features such as automated model adaptation, enabling it to learn on the job and improve its performance over time, and model "hot-swapping," where customers can instantly switch among the underlying large language models that power their chat applications, without downtime.
Deepgram is inviting customers to get started with Aura-2 now via its interactive playground. New signups will get a generous $200 worth of free credits, enough to generate around 220 hours of speech, providing ample opportunity to experiment and see how it performs in various voice application scenarios.
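As a rough check on what the free credits cover, the quoted rate can be turned into a character budget. The sketch below is illustrative only: the function name `characters_for_budget` is hypothetical, and it assumes the flat rate of three cents per 1,000 characters quoted above, ignoring the tiered pricing for higher volumes. It does not try to convert characters into hours of audio, since that depends on speaking rate.

```python
from decimal import Decimal

# Assumption: the flat list price quoted in the article (USD per 1,000 characters).
AURA2_USD_PER_1K_CHARS = Decimal("0.030")


def characters_for_budget(budget_usd: Decimal, usd_per_1k_chars: Decimal) -> int:
    """Return how many whole characters a budget covers at a flat per-1,000-character rate."""
    if usd_per_1k_chars <= 0:
        raise ValueError("rate must be positive")
    # Truncate toward zero so the result never exceeds the budget.
    return int(budget_usd / usd_per_1k_chars * 1000)


if __name__ == "__main__":
    chars = characters_for_budget(Decimal("200"), AURA2_USD_PER_1K_CHARS)
    print(f"$200 of credits covers about {chars:,} characters")
```

Using `Decimal` rather than floats keeps the currency arithmetic exact for rates such as 0.030.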
Deepgram launches Aura-2, a new text-to-speech AI model designed for enterprise use, outperforming competitors in blind tests and offering cost-effective, high-quality voice solutions for business applications.
Deepgram, a voice AI platform, has launched Aura-2, its next-generation text-to-speech (TTS) model, positioning it as the world's most professional and cost-effective enterprise-grade TTS solution [1]. The model aims to close what the company describes as a significant gap in enterprise-optimized voice AI, which requires natural-sounding voices and domain-specific pronunciation [1].
In blind tests conducted specifically for conversational enterprise applications, Aura-2 outperformed competitors such as ElevenLabs, Cartesia, and OpenAI [1]. Users preferred Aura-2 61.8% of the time against ElevenLabs (38.2%) and 52% of the time against OpenAI (48%) [1]. Deepgram says the model will be a game-changer for real-time voice applications [2].
Deepgram highlights several features that it says set Aura-2 apart from other TTS models:
Domain-specific pronunciation: The model is designed to handle industry-specific terminology, such as financial jargon or chemical product names, without requiring extensive pronunciation dictionaries [2].
Professional voice quality: Aura-2 supports more than 40 distinct voices in English, including regional U.S. accents and those from other English-speaking countries, all employing "business-appropriate speech" [2].
Context-aware delivery: The model can adjust aspects of its voice, such as pacing, pauses, tone, and expression, based on the context of the conversation [2].
Real-time performance: Deepgram claims a sub-150ms time-to-first-byte [1]; SiliconANGLE reports the figure as sub-200-millisecond responses [2].
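Time-to-first-byte is the delay between sending a request and receiving the first chunk of audio. A minimal sketch of how one might measure it against any streaming source follows. `measure_ttfb` and `fake_tts_stream` are hypothetical names, and the fake stream only simulates latency with a sleep; it is not a call to Deepgram's API.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_ttfb(start_stream: Callable[[], Iterable[bytes]]) -> Tuple[float, bytes]:
    """Return (seconds until the first non-empty chunk, that chunk)."""
    start = time.perf_counter()
    for chunk in start_stream():
        if chunk:  # skip empty keep-alive chunks
            return time.perf_counter() - start, chunk
    raise RuntimeError("stream ended before any audio bytes arrived")


def fake_tts_stream(delay_s: float = 0.05) -> Iterator[bytes]:
    """Stand-in for a streaming TTS response; not a real API call."""
    time.sleep(delay_s)  # simulated network and synthesis latency
    yield b"\x00\x01"
    yield b"\x02\x03"


if __name__ == "__main__":
    ttfb, first_chunk = measure_ttfb(fake_tts_stream)
    print(f"time to first byte: {ttfb * 1000:.0f} ms ({len(first_chunk)} bytes)")
```

In practice, `start_stream` would wrap whatever client call opens the audio stream, so the timer also covers connection setup.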
Deepgram's Aura-2 is tailored for enterprise use cases, including customer service, virtual agents, and AI-powered assistants [1].
Natalie Rutgers, VP of product for Deepgram, said that while Aura-2 can be used for podcasts and other entertainment, the focus is on real-time voices like those people hear at appointments, pharmacies, and customer service lines [1].
Aura-2 can be deployed via cloud or on-premises APIs, offering flexibility for businesses with different security and data control requirements [1][2]. The model is priced at $0.030 per 1,000 characters, which Deepgram compares with ElevenLabs Turbo ($0.050) and Cartesia Sonic ($0.038) [1].
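To make the price gap concrete, the quoted per-1,000-character rates can be applied to a sample workload. This is a sketch under stated assumptions: the rates are the flat figures quoted above with no volume tiers, the function names are hypothetical, and the one-million-character workload is an invented example rather than a figure from Deepgram.

```python
from decimal import Decimal

# Per-1,000-character prices as quoted in the article (USD); assumed flat, no tiers.
USD_PER_1K_CHARS = {
    "Deepgram Aura-2": Decimal("0.030"),
    "ElevenLabs Turbo": Decimal("0.050"),
    "Cartesia Sonic": Decimal("0.038"),
}


def synthesis_cost(num_chars: int, usd_per_1k_chars: Decimal) -> Decimal:
    """Cost in USD of synthesizing num_chars characters at a flat rate."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return Decimal(num_chars) / 1000 * usd_per_1k_chars


def compare_costs(num_chars: int) -> dict[str, Decimal]:
    """Cost of the same workload under each quoted rate."""
    return {name: synthesis_cost(num_chars, rate) for name, rate in USD_PER_1K_CHARS.items()}


if __name__ == "__main__":
    workload_chars = 1_000_000  # hypothetical workload
    for name, cost in compare_costs(workload_chars).items():
        print(f"{name}: ${cost:.2f}")
```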
Aura-2 is built on top of Deepgram Enterprise Runtime (DER), a custom infrastructure layer that supports additional features such as automated model adaptation and model "hot-swapping" [2]. According to Deepgram, adaptation lets the model improve its performance over time, while hot-swapping lets customers switch among the underlying large language models that power their chat applications without downtime [2].
Deepgram CEO Scott Stephenson said enterprises now require voices that not only sound real but can also communicate with humanlike precision in professional contexts [2]. Deepgram is pitching Aura-2's clarity, consistency, and low-latency performance at that requirement [2].
Aura-2 currently supports English voices, including British and Australian accents, and Deepgram has indicated that multilingual support is underway [1]. That expansion could broaden its appeal to global enterprises.
References
[1] Analytics India Magazine | Deepgram's New Text-to-Speech AI Model Outperforms ElevenLabs and Open AI | AIM Media House
[2] SiliconANGLE | Deepgram's Aura-2 is a high-performance text-to-speech engine built for business interactions
The Outpost is a comprehensive collection of curated artificial intelligence software tools that cater to the needs of small business owners, bloggers, artists, musicians, entrepreneurs, marketers, writers, and researchers.
© 2025 TheOutpost.AI All rights reserved