Curated by THEOUTPOST
On Wed, 9 Apr, 12:02 AM UTC
10 Sources
[1]
Amazon unveils a new AI voice model, Nova Sonic | TechCrunch
On Tuesday, Amazon debuted a new generative AI model, Nova Sonic, capable of natively processing voice and generating natural-sounding speech. Amazon claims that Sonic's performance is competitive with frontier voice models from OpenAI and Google on benchmarks measuring speed, speech recognition, and conversational quality. Nova Sonic is Amazon's answer to newer AI voice models such as the model powering ChatGPT's Voice Mode, which feel more natural to speak with than the more rigid models from Amazon Alexa's early days. Recent technological breakthroughs have made legacy models and the digital assistants they underpin, such as Alexa and Apple's Siri, seem incredibly stilted by comparison. Nova Sonic is available through Bedrock, Amazon's developer platform for building enterprise AI applications, via a new bi-directional streaming API. In a press release, Amazon called Nova Sonic "the most cost-efficient" AI voice model on the market, and around 80% less expensive than OpenAI's GPT-4o. Components of Nova Sonic are already powering Alexa+, Amazon's upgraded digital voice assistant, according to Amazon SVP and Head Scientist of AGI Rohit Prasad. In an interview, Prasad told TechCrunch that Nova Sonic builds on Amazon's expertise in "large orchestration systems," the technical scaffolding that makes up Alexa. Compared to rival AI voice models, Nova Sonic excels at routing user requests to different APIs, said Prasad. This capability helps Nova Sonic "know" when it needs to fetch real-time information from the internet, parse a proprietary data source, or take action in an external application -- and use the appropriate tool to do it. During a two-way dialogue, Nova Sonic waits to speak "at the appropriate time," taking into account a speaker's pauses and interruptions, says Amazon. It also generates a text transcript for the user's speech, which developers can use for various applications. 
Nova Sonic is less prone to speech recognition errors than other AI voice models, according to Prasad, meaning the model is relatively good at understanding a user's intent even if they mumble, misspeak, or are in a noisy setting. On a benchmark measuring speech recognition across languages and dialects, Multilingual LibriSpeech, Amazon says Nova Sonic achieved a word error rate (WER) of just 4.2% when averaged across English, French, Italian, German, and Spanish. That means that roughly four out of every 100 words from the model differed from a human transcription in those languages. On another benchmark measuring loud interactions with multiple participants, Augmented Multi Party Interaction, Amazon says Nova Sonic was 46.7% more accurate in terms of WER than OpenAI's GPT-4o-transcribe model. Nova Sonic also has industry-leading speed, with an average perceived latency of 1.09 seconds, according to Amazon. That makes it faster than the GPT-4o model powering OpenAI's Realtime API, which responds in 1.18 seconds, per benchmarking by Artificial Analysis. Prasad says Nova Sonic is a part of Amazon's broader strategy to build AGI (artificial general intelligence), which the company defines as "AI systems that can do anything a human can do on a computer." Moving forward, Prasad says Amazon plans to release more AI models that can understand different modalities, including image, video, and voice, as well as "other sensory data that are relevant if you bring things into the physical world." Amazon's AGI division, which Prasad oversees, seems to be playing a larger role in the company's product strategy these days. Just last week, Amazon launched a preview of Nova Act, a browser-using AI model that appears to be powering elements of Alexa+ and Amazon's Buy for Me feature. Starting with Nova Sonic, Prasad says the company wants to offer more of its internal AI models for developers to build with.
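The word error rate quoted above can be made concrete with a small calculation. The following is a minimal sketch, not Amazon's evaluation code: it assumes plain lowercase, whitespace-based tokenization and computes WER as the word-level edit distance (substitutions, insertions, and deletions) divided by the number of words in the reference transcript. The function name `word_error_rate` is illustrative, and benchmark toolkits usually apply extra text normalization that is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One inserted word against a five-word reference gives a WER of 20%.
print(word_error_rate("book a flight to hawaii",
                      "book a flight to hawaii please"))  # 0.2
```

Under this definition, a WER of 4.2% corresponds to roughly four word-level edits per 100 reference words, which matches the reading given in the article.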
[2]
Amazon plays catchup with new Nova AI models to generate voices and video
Amazon is showing off new AI technology this week, including its take on a more conversational voice model to better compete with things like Gemini Live and OpenAI's Advanced Voice Mode and an update to its model that can generate video. The new Nova Sonic voice model handles real-time speech processing and AI voice generation for conversational applications, Amazon says. Nova Sonic uses a "unified model architecture" that Amazon claims is better than other approaches that interconnect separate models to handle speech recognition, speech-to-text conversion, response generation, and then text-to-audio. Amazon says Nova Sonic can also better detect someone's tone and deliver more natural responses. Nova Sonic is available to try through Amazon's Bedrock developer platform and the company says it can be used to make things like customer service bots or build AI agents for travel, education, healthcare, and a variety of other industries. "Components" of Nova Sonic are already being used in Amazon's new Alexa Plus assistant, Amazon's Rohit Prasad, SVP and head scientist of AGI, told TechCrunch. As for video, Amazon announced Nova Reel 1.1, which the company says provides quality and latency improvements over 1.0. It also can now keep consistent styles across multiple 6-second scenes cut together to a full video of up to two minutes in length.
[3]
Amazon Nova Sonic speech model takes tonal cues
The foundation model supports real-time bi-directional speech
Amazon has introduced a foundation model that claims to grasp not just what you're saying, but how you're saying it - tone, hesitation, and more. Amazon Nova Sonic, the latest member of the Nova family of foundation models first introduced in December 2024, accepts spoken input and responds with real-time speech, while also generating a transcript for developers. Traditionally, voice-based AI apps stitch together three separate models: one for speech recognition, one to generate responses, and one to synthesize speech. Amazon claims Nova Sonic unifies these capabilities into a single model. "This unification enables the model to adapt the generated voice response to the acoustic context (e.g., tone, style) and the spoken input, resulting in more natural dialogue," Amazon said in its announcement. "Nova Sonic even understands the nuances of human conversation, including the speaker's natural pauses and hesitations, waiting to speak until the appropriate time, and gracefully handling barge-ins." The e-souk has posted sample audio of an exchange in which this scenario might come into play. In the recording, an AI travel assistant helping a customer book a trip adopts a reassuring tone after the customer voices concern about the price of the tickets. "Amazon Nova Sonic doesn't just understand what you say," explains Osman Ipek, senior machine learning solutions architect at Amazon, in a video. "It also understands how you say it. So it adapts its responses to mirror your communication style. If you speak with excitement Nova Sonic's response will match with the similar enthusiasm. If you adopt a serious tone it will adjust accordingly by recognizing prosodic elements like pitch and emotion. It creates truly conversational interactions."
Available within Amazon Bedrock via the bidirectional streaming API, Nova Sonic "understands streaming speech in various speaking styles and generates expressive speech responses that dynamically adapt to the prosody of input speech." Essentially, the model can modulate its voice and will pause when interrupted and then resume, which makes for more natural conversational flow. API code can be tied to analytics-based sentiment analysis. But much of the model's tonal variation is expected to be driven by LLM prompts. Nova Sonic models don't provide direct access to voice control parameters. Rather, the user instructs the model on the tone it should take via the system prompt. Nova Sonic supports a context window of 32K tokens for audio and has a default connection limit of eight minutes, which can be renewed to continue longer conversations. It can interface with enterprise systems via Retrieval Augmented Generation (RAG) and it can handle function calling and agent-oriented workflows, in a variety of speaking styles across its set of supported languages - currently just English (American and British). IT consultancy Gartner in April published a report titled, "Market Guide for Conversational AI Solutions." The firm found, "Demand for [conversational AI] capabilities is increasing across numerous use cases, both customer and employee-facing. However, leaders find it challenging to discern solutions that can best meet their requirements in such a rapidly evolving market." Gartner expects the conversational AI market to reach $36 billion in revenue by 2032, up from $8.2 billion in 2023.
[4]
Amazon enters real-time AI voice race with Nova Sonic, a unified voice model that senses emotion
What happens when the AI senses the frustration or joy in your voice? A new speech-to-speech AI model from Amazon, called Nova Sonic, unifies speech recognition and generation to deliver more natural voice interactions -- part of the Seattle tech giant's broader effort to develop human-like intelligence in competition with Google, OpenAI, and others. Among other advances, Amazon says Nova Sonic picks up on tone of voice, adapting to the style and emotions of users. An angry customer on a support call might hear a calm, steady voice in return, while someone sounding excited could get a more upbeat response. "I think of intelligence as inseparable from context," said Rohit Prasad, Amazon's senior vice president of artificial general intelligence, who leads a central team working on the company's most advanced AI technology. "If you're excited about Hawaii, it will be excited about it," he explained, as an example. "If you're not, then it will suggest a separate destination." Nova Sonic will be available to third-party developers through Amazon's Bedrock service. Amazon is already using components of the model internally, in products including its newly released Alexa+ voice assistant. Unlike traditional voice systems that stitch together separate models for speech recognition, language processing, and text-to-speech, Nova Sonic combines all three in a single architecture, according to the company. Amazon says this integration allows the model to preserve the full context of a conversation -- including intonation, pacing, and intent -- making interactions feel more conversational and responsive. It can also take action in the middle of a conversation, like pulling up flight options or checking an account, without breaking the flow of the interaction. Amazon is making Nova Sonic available via a new streaming API built for real-time voice applications. It currently supports English with a few different voices and accents. 
Amazon says it's working on support for more languages. Nova Sonic enters a growing field of voice and multimodal AI models, as companies race to build more human-like digital assistants. OpenAI recently launched GPT-4o, its own real-time speech model, while Google has added conversational voice capabilities to its Gemini assistant. Based on its testing, Amazon says Nova Sonic outperforms these rivals on speed and cost, with lower latency and better pricing. For example, Amazon says Nova Sonic responds in just over a second on average -- faster than both OpenAI's GPT-4o and Google's Gemini Flash 2.0 in tests run by the research firm Artificial Analysis. The company says Nova Sonic is nearly 80% cheaper to use than GPT-4o for real-time voice interactions. Prasad, who was previously Alexa's chief scientist, now oversees Amazon's AGI group, reporting to Amazon CEO Andy Jassy. The long-term goal, Prasad said in an interview, is to create unified models that can handle any kind of input and respond in the most natural way -- delivering the "general" in artificial general intelligence. "I actually think you're merging the powers of the human and machine together," Prasad said of AGI initiatives. "That's why this is so important." He called Nova Sonic "a huge step" in that direction. Companies testing Nova Sonic include ASAPP, for customer service calls; Education First, applying it to language learning tools; and Stats Perform, which is using it to deliver real-time sports insights through voice. Amazon says Nova Sonic is designed to integrate with company systems to access real-time information such as pricing, availability, or schedules. The model can also be used to carry out tasks mid-conversation, including making reservations or offering alternative options. Nova Sonic is the latest addition to Amazon's Nova line of AI models, introduced by Jassy at AWS re:Invent in December, which includes AI for generating and understanding text, images, and video.
It follows Amazon's recent release of a research preview of Nova Act, for building web-based AI agents.
[5]
Move over, Alexa: Amazon launches new realtime voice model Nova Sonic for third-party enterprise development
Amazon is best known as an e-commerce giant and then somewhere perhaps slightly further down the list of notable offerings is its Alexa AI voice assistant product, which just got a big intelligence upgrade last month thanks in part to Amazon investment Anthropic. Now Alexa will have to make space for a new Amazon voice AI sibling: today the company is introducing Amazon Nova Sonic, a new foundation model designed to allow third-party app developers to build realtime, naturalistic, conversational voice interactivity into their products using Amazon's web platform Bedrock. It's available now via a bi-directional streaming application programming interface (API). Obvious use cases include customer support and service, guidance, information retrieval, and entertainment.

A unified approach
Nova Sonic addresses a key challenge in voice AI: the fragmentation of technologies. Traditionally, building voice interfaces required combining separate models for speech recognition, language processing, and speech synthesis, according to Rohit Prasad, SVP and Head Scientist for Artificial General Intelligence (AGI) at Amazon, in a video call interview with VentureBeat yesterday using Amazon's Chime video service. This complexity often results in robotic, unnatural interactions and increased development overhead. Now, Sonic seeks to improve on this state of affairs by combining all three distinct model types into one. Prasad explained the model's core innovation: "Nova Sonic brings together three traditionally separate models -- speech-to-text, text understanding, and text-to-speech -- into one unified system that can model not just the 'what' but also the 'how' of communication." By retaining the acoustic context -- such as tone, cadence, and style -- Nova Sonic helps maintain the nuances of human conversation.

Recognizing the intricacies and quirks of live, two-way audio conversations
One of Nova Sonic's defining capabilities is its ability to handle live, two-way conversations. It recognizes when users pause, hesitate, or interrupt -- common behaviors in human speech -- and responds fluidly while maintaining context. "The real breakthrough here is real-time, interactive, low-latency voice interaction, which means you can interrupt the AI mid-sentence, and it will still maintain context and respond coherently," said Prasad. This feature is especially relevant in scenarios like customer service, where responsiveness and adaptability are critical.

Built-in tool use and workflow integration
Nova Sonic is also designed to integrate seamlessly with other systems. It automatically generates transcripts of spoken input, which can be used to trigger APIs or interact with proprietary tools. This allows companies to build AI agents that can perform tasks such as booking appointments, retrieving live information, or answering complex customer inquiries. "You can use Nova Sonic through Amazon Bedrock and connect it with any tools or proprietary data sources, even visual ones, as long as they're wrapped as callable APIs," said Prasad. This flexibility makes the model suitable for a wide range of industries, from education and travel to enterprise operations and entertainment.

Benchmark performance and industry comparisons
Nova Sonic has been benchmarked against other real-time voice models, including OpenAI's GPT-4o and Google's Gemini Flash 2.0. On the Common Eval data set, it achieved a 69.7% win-rate over Gemini Flash 2.0 and a 51.0% win-rate over GPT-4o for American English single-turn conversations using a masculine voice. Similar gains were seen with feminine and British English voices. Prasad emphasized Nova Sonic's strong performance in its primary language markets: "Nova Sonic is currently best-in-class in U.S. and British English, outperforming even GPT-4o real-time in both conversational naturalness and accuracy." He added, "To the best of our knowledge, only two other models -- GPT-4o real-time and a variant of GPT-4o mini -- come close to what Nova Sonic does in combining speech understanding and generation in real time. This space is still very early and very hard."

Multilingual capabilities and noisy environment handling
In speech recognition, Nova Sonic also excels in multilingual and real-world conditions. It recorded a word error rate (WER) of 4.2% on the Multilingual LibriSpeech benchmark, outperforming GPT-4o Transcribe by over 36% across English, French, German, Italian, and Spanish. In noisy, multi-speaker environments (measured using the AMI benchmark), Nova Sonic showed a 46.7% improvement in WER over GPT-4o Transcribe.

Expressive voices and language expansion
Currently, the model supports multiple expressive voices, both masculine and feminine, in American and British English. Amazon noted that additional accents and languages are in development and will be released in future updates.

Low latency and enterprise-friendly cost
Speed and cost are also part of the appeal. Third-party benchmarking shows Nova Sonic delivers a customer-perceived latency of 1.09 seconds, compared to 1.18 seconds for OpenAI's GPT-4o and 1.41 seconds for Google's Gemini Flash 2.0. From a pricing standpoint, Amazon positions Nova Sonic as an enterprise-ready solution. "We're nearly 80% cheaper than GPT-4o real-time, and that superior price-performance is resonating with enterprises moving from experimentation to deployment," said Prasad.

Early adoption across sectors
According to Amazon, companies across different sectors have already begun using or testing Nova Sonic. ASAPP is applying the technology to optimize contact center workflows, praising its accuracy and natural dialog handling. Education First (EF) uses the model to support language learners with real-time pronunciation feedback, especially for non-native speakers with varied accents. Sports data provider Stats Perform is leveraging Nova Sonic's low latency and simple setup to power rapid, data-rich interactions in its Opta AI Chat platform.

Responsible AI and safety commitment
Alongside performance and cost, Amazon is highlighting its commitment to responsible AI development. The Nova family of models includes built-in safeguards and is supported by AWS AI Service Cards that outline intended use cases, potential limitations, and ethical guidelines. Prasad underscored Amazon's focus on trust and safety: "Trust is paramount for us -- developers can customize personality within limits, but we've put in strong guardrails to prevent voice cloning or unwanted mimicry." He added, "We work extremely hard to eliminate hallucinations and voice drift. The bar we've set for release is high because speech generation must be trustworthy." Amazon Nova Sonic is now generally available through Amazon Bedrock. Developers and enterprises interested in exploring the model can get started by visiting https://aws.amazon.com/nova/.
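The latency and WER comparisons quoted in this article can be put on a common footing with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that "outperforming by over 36%" means a relative reduction in WER, which the article does not state explicitly. The implied competitor WER is therefore a rough lower bound under that assumption, not a reported figure.

```python
def relative_reduction(value: float, baseline: float) -> float:
    """Fraction by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline


def implied_baseline(value: float, reduction: float) -> float:
    """Baseline that `value` would sit `reduction` (0..1) below."""
    return value / (1.0 - reduction)


# Latency figures (seconds) as quoted in the article.
NOVA_SONIC_S = 1.09
GPT_4O_S = 1.18
GEMINI_FLASH_S = 1.41

print(f"vs GPT-4o: {relative_reduction(NOVA_SONIC_S, GPT_4O_S):.1%} lower")
print(f"vs Gemini Flash 2.0: "
      f"{relative_reduction(NOVA_SONIC_S, GEMINI_FLASH_S):.1%} lower")

# ASSUMPTION: the 36% figure is a relative WER reduction from the
# competitor's score down to Nova Sonic's 4.2%.
print(f"implied competitor WER: about {implied_baseline(4.2, 0.36):.1f}%")
```

Read this way, the latency gap to GPT-4o is a little under 8% while the gap to Gemini Flash 2.0 is a little under 23%, so the headline "faster than both" covers two quite different margins.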
[6]
Meet Nova Sonic, Amazon's new AI voice model
AI companies have been working on voice models for a while now, but it seems things really ramped up after OpenAI unveiled ChatGPT Voice Mode. Now, Amazon has just introduced its new "foundation" AI voice model called Nova Sonic. And it really makes Alexa sound like she's living way in the past. According to Amazon, Nova Sonic "unifies speech understanding and speech generation into a single model, to enable more human-like voice conversations in AI applications." With the samples provided, it certainly does seem more human-like than the company's previous iterations of AI voice models. For example, there are proper pauses, tone, and inflections on words depending on where they are and what they mean in a sentence. Amazon has provided some audio samples to listen to. Again, "more human-like" is the key description here. There are still plenty of signs that it's an AI voice, but it also does sound like a big step up from previous AI voice assistants like Alexa. Amazon says that it achieved this by combining multiple models that would traditionally be used, like speech recognition, large language models, and text-to-speech, into one single unified model. According to Amazon, the model not only reproduces the nuances of speech in its output, but also understands those nuances when they appear in a human speaker's input. According to TechCrunch, Nova Sonic is already powering Amazon's next-generation AI voice assistant, Alexa+. Based on recent developments, it does seem like the big AI companies are currently focusing on voice models. So, prepare for competition in that space to heat up. Amazon is already pointing to claims that Nova Sonic is roughly 80 percent cheaper than OpenAI's GPT-4o model and promoting it as "the most cost-efficient." Nova Sonic is currently available to developers through Amazon's enterprise AI developer platform, Bedrock.
[7]
Amazon Rolls Out Nova Sonic and Nova Reel 1.1 for Generative Voice and Video AI
Nova Sonic integrates speech recognition, understanding and generation into one model, removing the need for separate components. Traditional voice systems involve complex pipelines -- converting speech to text, processing through a large language model, and converting the response back to speech. According to Amazon, this approach "fails to preserve crucial acoustic context and nuances." "Nova Sonic takes a new approach," the company said. "It unifies the understanding and generation capabilities into a single model." The result is a voice agent that not only understands user input but also responds with an appropriate tone, pace, and style. The model is available through Amazon Bedrock. It supports applications in customer service, travel, education, healthcare and entertainment. In one example shared by Amazon, a virtual travel assistant shifts its tone in response to a customer's change in emotion -- moving from enthusiastic to reassuring when concerns about cost are raised. Another use case includes an enterprise dashboard assistant that grounds answers in company data and maintains multi-turn dialogue without requiring users to reset context. Nova Sonic also generates transcripts of user speech. This feature allows developers to integrate external APIs and tools, enabling AI agents to perform tasks such as retrieving flight options or accessing internal dashboards. On the other hand, Nova Reel 1.1 enables multi-shot videos up to two minutes in length, with consistent visual style across 6-second segments. It improves on the previous version in terms of generation speed and coherence. Users can choose to provide a single prompt for the entire video or set individual prompts per shot for more control. The model supports use cases like marketing campaigns, product design showcases, and social media content creation. "Nova Reel enhances creative productivity," Amazon said, "while helping to reduce the time and cost of video production using generative AI." 
To get started with Amazon Nova Reel 1.1, users need to visit the Amazon Bedrock console and request access to the model. In the left-hand navigation panel, they should select "Model access" and then locate Amazon Nova Reel in the list of available models. Requesting access here provides permission to use both version 1.0 and 1.1 of the model. Once access is granted, users can begin using Amazon Nova Reel 1.1 through the Amazon Bedrock console, the AWS SDK, or the AWS Command Line Interface (CLI). The releases are part of Amazon's broader Nova model family, introduced at re:Invent 2024, which also includes Nova Micro, Lite, and Pro, models that generate text from different modalities.
[8]
Amazon debuts new Amazon Nova Sonic voice model - SiliconANGLE
Amazon.com Inc. today debuted a new foundation model, Amazon Nova Sonic, that is optimized for voice interactions such as customer support calls. The company says that it's using components of the model to power Alexa+. Introduced in February, the latest iteration of Amazon's voice assistant can automatically perform actions such as ordering takeout and booking flights. When necessary, it's capable of interacting with third-party applications to carry out those tasks. Usually, processing speech involves three steps. First, an application has to use a speech recognition model to transcribe the audio. It then feeds the transcript into a large language model, which generates a text-based response, and a third algorithm subsequently turns the outputted text into speech. Using three different neural networks complicates software development. It can also slow artificial intelligence applications' performance. Data takes time to move from one neural network to another, which adds latency to prompt responses. Amazon says that its new Nova Sonic model simplifies the workflow. It allows companies to replace the three different neural networks usually needed to process speech with one, which eases development. Amazon is also promising performance benefits: Nova Sonic starts responding to user input in 1.09 seconds on average. According to Amazon, that makes it faster than competing products from OpenAI and Google LLC. Nova Sonic adapts the synthetic speech it generates based on user behavior. According to Amazon, the model can switch tones in the middle of a conversation and ask follow-up questions if more information is needed to fulfill a request. When the information that Nova Sonic requires isn't provided during a conversation, it can retrieve data from external systems. The model could, for example, check an inventory management application to determine if a product requested by a customer is in stock. 
Nova Sonic can also perform tasks such as placing orders in the applications with which it's integrated. In the background, the model generates a transcript of the speech it processes. That transcript can be streamed to other artificial intelligence models through an application programming interface. An electronics maker, for example, could send contact center transcripts to an AI application that measures customer sentiment. On launch, Nova Sonic supports English and multiple accents. Additional languages and accents will be added in the future. Developers can access Nova Sonic through Amazon Web Services Inc.'s Amazon Bedrock service, which provides access to hosted foundation models from the company and third-party providers. "We are releasing a new foundation model in Amazon Bedrock that makes it simpler for developers to build voice-powered applications that can complete tasks for customers with higher accuracy, while being more natural, and engaging," said Rohit Prasad, Amazon's senior vice president of artificial general intelligence.
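The latency argument in this article, that handing data from one neural network to the next adds delay, can be sketched numerically. Everything below is a hypothetical illustration: the `Stage` type, the helper functions, and every timing value are assumptions chosen for readability, not measurements of Nova Sonic or of any competing system. The unified stage is also assumed, purely for comparison, to need the same total compute as the three separate models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    compute_s: float  # hypothetical seconds spent inside the model
    handoff_s: float  # hypothetical seconds to pass output to the next stage


def total_latency(stages: list[Stage]) -> float:
    """Stages run one after another, so compute and handoff costs add up."""
    return sum(s.compute_s + s.handoff_s for s in stages)


def handoff_overhead(stages: list[Stage]) -> float:
    """Portion of the latency spent only on moving data between models."""
    return sum(s.handoff_s for s in stages)


# Illustrative numbers only; they are not measurements of any real system.
cascaded = [
    Stage("speech recognition", 0.30, 0.05),
    Stage("language model", 0.50, 0.05),
    Stage("speech synthesis", 0.30, 0.00),
]
unified = [Stage("single speech-to-speech model", 1.10, 0.00)]

print(f"cascaded: {total_latency(cascaded):.2f}s "
      f"({handoff_overhead(cascaded):.2f}s in handoffs)")
print(f"unified:  {total_latency(unified):.2f}s "
      f"({handoff_overhead(unified):.2f}s in handoffs)")
```

With equal compute on both sides, the only difference between the two totals is the handoff time, which is the cost the article attributes to the three-model design.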
[9]
Amazon's New AI Model Comes With Real-Time Voice Conversation Capability
Currently, it only supports the English language in multiple accents
Amazon introduced a new artificial intelligence (AI) model in its flagship Nova family of models on Tuesday. Dubbed Amazon Nova Sonic, it is a voice generation model capable of generating human-like speech. However, it is not a text-to-speech (TTS) tool; instead, it can process voice input in real time and respond to it. The Seattle-based tech giant says developers can use the model to build conversational AI chatbots and similar tools. Notably, the Amazon Nova Sonic AI model also supports function calling and tool use, making it compatible with agentic application development as well. In a blog post, the tech giant announced the release of Amazon Nova Sonic. The company said traditional approaches to voice-enabled applications use a complex pipeline of multiple models, such as speech recognition, speech-to-text conversion, data processing, and TTS models. This often leads to an increase in latency and a failure to preserve linguistic context, the post added. Amazon said its approach with the Nova Sonic model was to unify speech understanding and speech generation components. The AI model is said to be able to process data and generate speech in real time, giving it a conversation-like experience. This unified system also allows the model to better understand the pace and timbre of input speech to contextualise the intent of the user. Additionally, the AI model can understand different speaking styles as well as separate masculine and feminine-sounding voices in different accents. It can also understand when a user misspeaks, mumbles, or pauses while speaking. Amazon says the model can pick up speech even in a noisy setting. In response generation, the company claims the model can be more expressive and human-like, and can adjust its response style to match the context of the conversation. Currently, the AI model only supports the English language.
Amazon said support for more languages will be added soon. The model supports a context window of 32,000 tokens for audio, with an additional window to handle longer conversations. It has a default session limit of eight minutes. To use the Nova Sonic model, developers can head to Amazon Bedrock and find it under the model access option. It can also be accessed via a bidirectional streaming application programming interface (API) that can both process audio input and generate output.
[10]
Amazon Touts AI Milestone, Says New Model Outshines GPT-4o In Real-Time Speech - Amazon.com (NASDAQ:AMZN)
Amazon.com Inc. (AMZN) has released a new speech-based AI model called Amazon Nova Sonic, designed to change real-time voice interactions in AI-powered applications. This system integrates both speech comprehension and voice generation within one unified architecture, removing the need for multiple standalone models to manage each task separately. Nova Sonic streamlines speech processing by replacing the conventional multi-step approach, in which separate systems handle recognition, interpretation, and speech output, with a single, integrated framework. This all-in-one model enables smoother and more lifelike interactions. Accessible via Amazon Bedrock through a bi-directional streaming API, the technology is poised to support diverse sectors, including healthcare, travel, and hospitality. Nova Sonic captures subtle elements of speech, including intonation, rhythm, and pauses, allowing it to respond with a level of sensitivity that closely resembles human dialogue, the company says. It adapts to real-time interruptions, holding off on replies until it's contextually appropriate to speak. This conversational awareness creates a more lifelike and engaging interaction, making it especially effective for roles in customer service and AI-driven assistance. "From the invention of the world's best personal AI assistant with Alexa, to developing AWS services like Connect, Lex, and Polly that are used across a wide range of industries, Amazon has long believed that voice-powered applications can make all of our customers' lives better and easier," said SVP of Amazon Artificial General Intelligence, Rohit Prasad.
In standardized industry evaluations, Nova Sonic outperformed competitors, including OpenAI's GPT-4o (Realtime) and Google's Gemini Flash 2.0, in several categories. Notably, Nova Sonic scored higher win-rates in both masculine and feminine American English voices, as well as British English, when measured against datasets like Common Eval and Multilingual LibriSpeech, according to Amazon. Nova Sonic delivered speech recognition results across five key languages, recording a word error rate of 4.2%, a more than 36% improvement over OpenAI's equivalent offering. It also prevailed under challenging audio conditions, surpassing competitors by roughly 47% in noisy, real-world tests. With an average reply speed just above one second, it also stands out for its affordability, costing nearly 80% less than GPT-4o. In February, Amazon said that around 1,000 generative AI projects are currently underway or already created across its various business divisions, spanning multiple operational areas, from customer service improvements to inventory management. The company is investing about $100 billion in artificial intelligence initiatives this year, aligning with rivals like Alphabet ($75 billion) and Microsoft ($80 billion). The tech companies' chase for AI dominance became significant after the launch of Chinese AI startup DeepSeek's R1, which made waves with its performance and lower costs. Price Action: AMZN shares traded higher by 1.6% at $178.07 at last check on Tuesday.
This content was partially produced with the help of AI tools and was reviewed and published by Benzinga editors.
Amazon introduces Nova Sonic, a unified AI voice model that processes speech in real-time, understands emotional context, and generates natural responses, positioning itself as a competitor to OpenAI and Google in the conversational AI market.
Amazon has unveiled Nova Sonic, a groundbreaking AI voice model that promises to revolutionize conversational AI technology. Announced on Tuesday, Nova Sonic is designed to process voice natively and generate natural-sounding speech, positioning itself as a formidable competitor to voice models from OpenAI and Google [1][4].
Unlike traditional voice systems that combine separate models for speech recognition, language processing, and text-to-speech, Nova Sonic integrates all three functionalities into a single architecture [2][4]. This unified approach allows the model to preserve the full context of a conversation, including intonation, pacing, and intent, resulting in more natural and responsive interactions [4].
Nova Sonic supports real-time, bi-directional speech processing, enabling it to handle live, two-way conversations with remarkable fluidity. The model can recognize when users pause, hesitate, or interrupt, adapting its responses accordingly [3][4].
One of Nova Sonic's standout features is its ability to grasp not just what is being said, but how it's being said. The model can detect a speaker's tone, style, and emotional state, allowing it to adapt its responses to mirror the user's communication style [3][4]. For instance, if a user expresses excitement about a topic, Nova Sonic can match that enthusiasm in its reply [4].
Amazon claims that Nova Sonic outperforms its rivals in speed and cost-effectiveness. The model reportedly responds in just over a second on average, faster than both OpenAI's GPT-4o and Google's Gemini Flash 2.0 [4]. On the Common Eval dataset, Nova Sonic achieved a 69% win rate over Gemini Flash 2.0 and a 51% win rate over GPT-4o for American English single-turn conversations [5].
In multilingual speech recognition, Nova Sonic recorded a word error rate (WER) of 4.2% on the Multilingual LibriSpeech benchmark, outperforming GPT-4o Transcribe by over 36% across English, French, German, Italian, and Spanish [1][5].
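For readers unfamiliar with the metric, the sketch below shows the standard way word error rate is computed: the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. It also works through the arithmetic behind the comparison, under the assumption (not stated in the sources) that "36%" means a relative reduction in WER; on that reading, the competing model's WER would be at least about 4.2 / (1 - 0.36) ≈ 6.56%. The function names are illustrative, and this is not Amazon's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))

    # Assumption: the 36% figure is a relative WER reduction.
    implied_baseline = 4.2 / (1 - 0.36)
    print(round(implied_baseline, 2))
    print(round(relative_reduction(implied_baseline, 4.2), 2))
```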
Nova Sonic is available through Amazon's Bedrock developer platform via a new bi-directional streaming API [1][2]. The model can integrate with enterprise systems through Retrieval Augmented Generation (RAG) and supports function calling and agent-oriented workflows [3].
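Function calling generally works by having the model emit a structured request naming a tool and its arguments, which the application executes before returning the result to the model. The sketch below shows that dispatch step only. Everything in it is hypothetical: the `tool` decorator, the `get_order_status` handler, and the shape of the request dictionary are illustrative assumptions and do not reflect the actual Bedrock event format.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry mapping tool names to application functions.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Register a function under the name the model would use to request it."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return register


@tool("get_order_status")
def get_order_status(order_id: str) -> dict:
    # Stand-in for a lookup against an enterprise system.
    return {"order_id": order_id, "status": "shipped"}


def dispatch(tool_request: dict) -> str:
    """Run the requested tool and return a JSON string to hand back to the model.

    Assumed request shape: {"name": str, "arguments": JSON-encoded str}.
    """
    name = tool_request["name"]
    args = json.loads(tool_request.get("arguments", "{}"))
    handler = TOOLS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps(handler(**args))


if __name__ == "__main__":
    request = {"name": "get_order_status", "arguments": '{"order_id": "A1"}'}
    print(dispatch(request))
```

Keeping the registry separate from the dispatch logic means new tools can be added without touching the code that talks to the model.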
Amazon envisions Nova Sonic being used across various industries, including customer service, education, healthcare, and entertainment [2][4]. Companies already testing or implementing Nova Sonic include ASAPP for customer service calls, Education First for language learning tools, and Stats Perform for delivering real-time sports insights [4][5].
Nova Sonic is part of Amazon's broader strategy to develop artificial general intelligence (AGI). Rohit Prasad, Amazon's SVP and Head Scientist of AGI, stated that the company plans to release more AI models capable of understanding different modalities, including image, video, and voice [1][4].
As the conversational AI market continues to grow, with Gartner projecting revenues to reach $36 billion by 2032, Nova Sonic represents a significant step forward in Amazon's quest to create more human-like digital assistants and maintain its competitive edge in the rapidly evolving AI landscape [3][4].
Reference
[3]
[4]
Amazon launches its Nova family of AI models, offering text, image, and video generation capabilities. The move positions Amazon as a strong competitor in the enterprise AI market, challenging Microsoft, Google, and OpenAI.
29 Sources
Amazon introduces Nova, a family of AI foundation models, aiming to compete with OpenAI and Google in generative AI capabilities while emphasizing responsible AI practices and cost-efficiency.
2 Sources
Amazon is developing a new AI reasoning model called Nova, set to launch in June 2025. The model aims to compete with offerings from OpenAI, Google, and Anthropic, focusing on cost-efficiency and advanced reasoning capabilities.
5 Sources
Amazon introduces Nova Act, an AI agent capable of controlling web browsers and performing autonomous tasks, positioning the company in direct competition with OpenAI and Anthropic in the AI agent race.
18 Sources
OpenAI introduces new AI models for speech-to-text and text-to-speech, offering improved accuracy, customization, and potential for building AI agents with voice capabilities.
7 Sources
The Outpost is a comprehensive collection of curated artificial intelligence software tools that cater to the needs of small business owners, bloggers, artists, musicians, entrepreneurs, marketers, writers, and researchers.
© 2025 TheOutpost.AI All rights reserved