3 Sources
[1]
Hume's new EVI 3 model lets you customize AI voices - how to try it
The company is betting that the future of AI will belong to models that can speak and emote in humanlike voices. Hume AI is launching EVI 3, the third iteration of its Empathic Voice Interface (EVI) model, which can interact with users in a huge variety of humanlike voices.

Like ChatGPT's voice mode, EVI 3 comes with an assortment of preprogrammed AI voices. These are listed by personality and character descriptions, including "Old Knocks Comedian," "Seasoned Life Coach," "Wise Wizard," and "Dungeon Master," as well as the company's namesake, the 18th-century philosopher David Hume. Crucially, the model also comes with a feature that allows users to build their own AI voices from scratch. Rather than adjusting a long list of specific attributes, as you might when building a Bitmoji or a video game character, you simply describe the characteristics of your desired voice in natural language, and the model does the rest.

The launch reflects a broader effort among AI companies to build more personable and engaging models by training them to exhibit distinct "personalities." Anthropic's Claude was trained to be thoughtful and open-minded, for example, while xAI's Grok is supposed to be edgier, with a sense of humor.

Hume describes itself on its website as working "to ensure that artificial intelligence is built to serve human goals and emotional well-being." That mission statement is reminiscent of those of some of the preeminent AI developers (OpenAI, for example, aims "to ensure that artificial general intelligence...benefits all of humanity"). But whereas the bigger players are mainly oriented around building ever bigger and more powerful models, Hume seems primarily focused on fine-tuning the believability of its models, so that they can verbally communicate in a way that not only sounds, but feels, real -- down to the little pauses between words and the occasional "umm" peppered into sentences.

The results are impressive. The first time I demoed the model, I asked it to generate a character that spoke in a world-weary but witty working-class British accent -- à la Michael Caine -- and who was a staunch Flat-Earther. When the voice was ready, I asked it why it thought the government and scientists were lying about the shape of the Earth, and it immediately launched into a passionate tirade about why the real logical fallacy was believing an official narrative when all of the direct evidence of one's senses pointed to the opposite conclusion (i.e., that the Earth is a flat disc). The voice was lyrical and full of energy, as if we were chatting in some Olde English pub.

In a company blog post published Thursday, Hume wrote that the launch of EVI 3 marks the next step in its mission to "achieve a voice AI experience that can be fully personalized" by the end of this year. "We believe this is an essential step toward voice being the primary way people want to interact with AI."

In 1950, the mathematician Alan Turing proposed his famous test for assessing machine intelligence. The "Imitation Game," as he called it -- now known as the Turing Test -- envisioned a human interviewing another human and a machine, both hidden behind a partition. If the interlocutor couldn't tell which responses were coming from the human and which from the machine, the latter had passed the test and could be considered true artificial intelligence. Seventy-five years later, we have AI tools that can not only write but actually speak in a way that seems convincingly human. Many of the latest voice-equipped AI models have none of the mechanical monotone or emotional vacancy characteristic of earlier automated voices, like the ones that greet you when you call your bank.
They instead exhibit a broad range of tenors and personalities. Building them has effectively become a subfield of AI research in its own right, sparked by a competition among tech companies to create more personable and engaging software.

The question of how the average person will interact with AI in the future has been a growing preoccupation across Silicon Valley in recent years, as companies search for viable successors to chatbots like ChatGPT. OpenAI recently announced a plan to buy io, a company founded by former Apple executive Jony Ive (the designer of the iPhone), with long-term plans to build hardware centered on AI. The company Humane attempted something similar with its AI Pin before that product flopped. Hume is banking on the idea that the future of AI will belong to models that can speak with users in humanlike voices.

When developing EVI 3, Hume compared its performance to some of the most capable AI voice assistant models currently available, including GPT-4o and Gemini Live, across a few key benchmarks. According to the company blog post, EVI 3 outperformed its competitors in "emotion/style modulation," or adjusting its emotional tone over the course of a conversation. It also outperformed GPT-4o in "emotion understanding" -- the ability to recognize and interpret the emotional tenor of users' voices. Finally, early testing showed that it has lower latency than both GPT-4o and Gemini Live, though it was outscored there by the chatbot from AI company Sesame.

You can try EVI 3 today through a demo and Hume's iOS app. Hume hasn't announced pricing for the model just yet. An API is slated for release in the coming weeks. The model currently specializes in English but will become proficient in other major languages, including French and Spanish, as it continues to be trained and after it's generally released, according to the company blog post.
[2]
I test AI every day -- the best AI voice generator I've ever used just dropped
You've probably heard an AI-generated voice by now. They're usually pretty recognizable: lacking in emotion and styling, and still fairly robotic. With its latest update, Hume is changing that. EVI 3, the latest version of the company's leading AI voice generator, has just landed. While it has an array of pre-made voices to try out, the thing that really sets this model above the rest is its customization options. Hume gave me early access to the tool, letting me try an array of custom voices and generate my own down to the smallest details. Here's how it went.

Once you're in the Hume system, you get a few options. You can choose from a selection of preset voices, but that's not what we're here for. The new feature on offer is the ability to design a voice. Do this and you'll enter a conversation with an AI, which asks you questions about the voice you want to generate. You can either take control, pitching the exact style of voice you want, or let Hume suggest options for a more guided approach.

I tried a variety of requests here, some pretty simple, some overly complicated and specific. Impressively, most of the voices fit my descriptions closely. The first voice I asked for, for example, was raspy and low-energy, with a villainous nature -- the kind of bad guy you'd see in a fantasy film, ruling over some evil kingdom. Incredibly specific, but a few minutes later, I had a voice fitting that exact description chatting away to me. Not only was the tone matched, but so were the vocal mannerisms, mocking me and being sarcastic. Next came a British game show host, complete with an old-fashioned accent, lots of energy, and an overly positive nature. Again, both accent and tone matched the request surprisingly well. I went on to try an array of other voices, ranging from pirate-like to plain American accents, all of which Hume pulled off.

This isn't to say it was perfect throughout. Sometimes the voices would clip, revealing a slightly robotic undertone or a slip in the accent. It's also still apparent that you're talking to an AI, though only slightly; the voices sound more human than robot. You can also only design these custom voices through a back-and-forth voice conversation with Hume's AI. It would be great to be able to generate them from text prompts as well, especially for users who can't speak. The voice-only flow also means it takes roughly twice as long to create a voice, since you have to work through a long conversation to get a result. It's a small but noticeable drawback.

Compared to text, video, and image generation, AI voice generation hasn't seen the same push. Companies like ElevenLabs have been at the forefront, and the likes of Google and OpenAI have made progress in the field. This, however, is the best attempt I've seen in terms of customization of the voices, not to mention their tone and personality. Hume claims that in a blind comparison against OpenAI's GPT-4o, EVI 3 was rated higher, on average, on empathy, expressiveness, naturalness, interruption quality, response speed, and audio quality. The company also claims the model outperformed GPT-4o, Gemini, and Sesame (a popular AI voice system) when study participants rated how well it acted out a wide range of emotions and styles. For now, that puts Hume in a great place in the market. But AI moves fast: while Hume currently stands out as a leader in AI voice generation, especially in terms of creative expression, it will have to keep the updates coming to stay ahead.
[3]
Emotive voice AI startup Hume launches new EVI 3 model with rapid custom voice creation
New York-based AI startup Hume has unveiled its latest Empathic Voice Interface (EVI) conversational AI model, EVI 3 (pronounced "Evee" Three, like the Pokémon character), targeting everything from powering customer support systems and health coaching to immersive storytelling and virtual companionship.

EVI 3 lets users create their own voices by talking to the model (it's voice-to-voice/speech-to-speech), and aims to set a new standard for naturalness, expressiveness, and "empathy," according to Hume -- that is, how users perceive the model's understanding of their emotions and its ability to mirror or adjust its own responses in tone and word choice. Designed for businesses, developers, and creators, EVI 3 expands on Hume's previous voice models with more sophisticated customization, faster responses, and enhanced emotional understanding.

Individual users can interact with it today through Hume's live demo on its website and iOS app, but developer access through Hume's proprietary application programming interface (API) is expected in "the coming weeks," according to a company blog post. At that point, developers will be able to embed EVI 3 into their own customer service systems, creative projects, or virtual assistants -- for a price (see below).

My own usage of the demo allowed me to create a new, custom synthetic voice in seconds based on qualities I described to it -- a mix of warm and confident, with a masculine tone. Speaking with it felt more natural and easy than with other AI models, and certainly more so than the stock voices of legacy tech leaders such as Apple (Siri) and Amazon (Alexa).

What developers and businesses should know about EVI 3

Hume's EVI 3 is built for a range of uses -- from customer service and in-app interactions to content creation in audiobooks and gaming.
It allows users to specify precise personality traits, vocal qualities, emotional tone, and conversation topics. This means it can produce anything from a warm, empathetic guide to a quirky, mischievous narrator -- down to requests like "a squeaky mouse whispering urgently in a French accent about its scheme to steal cheese from the kitchen."

EVI 3's core strength lies in its ability to integrate emotional intelligence directly into voice-based experiences. Unlike traditional chatbots or voice assistants that rely heavily on scripted or text-based interactions, EVI 3 adapts to how people naturally speak -- picking up on pitch, prosody, pauses, and vocal bursts to create more engaging, humanlike conversations.

However, one big feature Hume's models currently lack -- and which is offered by rivals both open source and proprietary, such as ElevenLabs -- is voice cloning: the rapid replication of a user's or another person's voice, such as a company CEO's. Hume has indicated it will add such a capability to its Octave text-to-speech model, where it is noted as "coming soon" on Hume's website, and prior reporting by yours truly found that it will allow users to replicate voices from as little as five seconds of audio. Hume has stated it's prioritizing safeguards and ethical considerations before making this feature broadly available. For now, cloning is not available in EVI itself, with Hume emphasizing flexible voice customization instead.

Internal benchmarks show users prefer EVI 3 to OpenAI's GPT-4o voice model

According to Hume's own tests with 1,720 users, EVI 3 was preferred over OpenAI's GPT-4o in every category evaluated: naturalness, expressiveness, empathy, interruption handling, response speed, audio quality, voice emotion/style modulation on request, and emotion understanding on request (the "on request" capabilities fall under instruction following).
It also usually bested Google's Gemini model family and Sesame, the AI voice startup from former Oculus co-founder Brendan Iribe. It also boasts lower latency (~300 milliseconds), robust multilingual support (English and Spanish, with more languages coming), and effectively unlimited custom voices.

Pricing and developer access

Hume offers flexible, usage-based pricing across its EVI, Octave TTS, and Expression Measurement APIs. While EVI 3's specific API pricing has not been announced yet (marked as TBA), the pattern suggests it will be usage-based, with enterprise discounts available for large deployments. For reference, EVI 2 is priced at $0.072 per minute -- roughly 30% lower than its predecessor, EVI 1 ($0.102/minute). For creators and developers working on text-to-speech projects, Hume's Octave TTS plans range from a free tier (10,000 characters of speech, ~10 minutes of audio) to enterprise-level plans.

For developers working on real-time voice interactions or emotional analysis, Hume also offers a Pay as You Go plan with $20 in free credits and no upfront commitment. High-volume enterprise customers can opt for a dedicated Enterprise plan featuring dataset licenses, on-premises solutions, custom integrations, and advanced support.

Hume's history of emotive AI voice models

Founded in 2021 by Alan Cowen, a former researcher at Google DeepMind, Hume aims to bridge the gap between human emotional nuance and AI interaction. The company trained its models on an expansive dataset drawn from hundreds of thousands of participants worldwide -- capturing not just speech and text, but also vocal bursts and facial expressions. "Emotional intelligence includes the ability to infer intentions and preferences from behavior. That's the very core of what AI interfaces are trying to achieve," Cowen told VentureBeat.
Hume's mission is to make AI interfaces more responsive, humanlike, and ultimately more useful -- whether that's helping a customer navigate an app or narrating a story with just the right blend of drama and humor. In early 2024, the company launched EVI 2, which offered 40% lower latency and 30% lower pricing than EVI 1, alongside new features like dynamic voice customization and in-conversation style prompts. February 2025 saw the debut of Octave, a text-to-speech engine for content creators capable of adjusting emotions at the sentence level via text prompts. With EVI 3 now available for hands-on exploration and full API access just around the corner, Hume hopes to let developers and creators reimagine what's possible with voice AI.
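Hume hasn't published how EVI 3 extracts the pitch, pause, and prosody cues described above, so the following is only a toy illustration of frame-level speech features in general, not Hume's pipeline. It synthesizes a short "utterance" with NumPy and derives two crude cues per 25 ms frame: RMS energy (to spot pauses) and a zero-crossing pitch estimate:

```python
import numpy as np

SR = 16_000   # sample rate in Hz
FRAME = 400   # 25 ms analysis frames

def tone(freq_hz, dur_s):
    t = np.arange(int(SR * dur_s)) / SR
    return np.sin(2 * np.pi * freq_hz * t)

# Synthetic "utterance": a low tone, a pause, then a higher tone.
audio = np.concatenate([tone(220, 0.4), np.zeros(int(SR * 0.2)), tone(330, 0.4)])

def frame_features(x):
    """Per-frame RMS energy and a zero-crossing pitch estimate."""
    feats = []
    for i in range(0, len(x) - FRAME + 1, FRAME):
        frame = x[i:i + FRAME]
        rms = np.sqrt(np.mean(frame ** 2))
        # A sine at f Hz crosses zero about 2f times per second.
        crossings = np.sum(np.abs(np.diff(np.sign(frame))) > 0)
        f0 = crossings * SR / (2 * FRAME)
        feats.append((rms, f0))
    return feats

feats = frame_features(audio)
voiced = [f0 for rms, f0 in feats if rms > 0.05]   # energy floor marks pauses
pauses = sum(1 for rms, _ in feats if rms <= 0.05)
print(f"{len(voiced)} voiced frames, {pauses} pause frames")
print(f"estimated pitch range: {min(voiced):.0f}-{max(voiced):.0f} Hz")
```

Real systems rely on far richer features (spectral envelopes, learned embeddings), but even this crude pass separates the pause from the voiced segments and tracks the pitch jump between them.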
Hume AI launches EVI 3, its latest Empathic Voice Interface model, offering extensive customization of AI-generated voices and, per the company's internal benchmarks, outperforming competitors in naturalness and expressiveness.
Hume AI, a New York-based startup, has unveiled EVI 3, the latest iteration of its Empathic Voice Interface (EVI) model, marking a significant advancement in AI voice generation technology 1. This new model offers unprecedented customization capabilities, allowing users to create unique AI voices through natural language descriptions.
EVI 3's standout feature is its ability to generate custom voices from user descriptions. Unlike traditional character-creation interfaces, users can simply describe the desired voice characteristics in natural language, and the model will create a matching voice 2. Alongside preset personas such as "Old Knocks Comedian" and "Wise Wizard," this approach lets users conjure voices with specific accents or personalities on demand.
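The contrast with attribute-by-attribute character builders can be made concrete. As a purely illustrative sketch (the `describe_voice` helper below is my own invention, not Hume's API; the demo simply accepts free-form prose), a handful of slider-style attributes collapse into the single sentence you would speak or type to the model:

```python
def describe_voice(accent, energy, persona, quirks=()):
    """Fold slider-style attributes into one natural-language description."""
    desc = f"A {energy}, {accent} voice in the style of {persona}"
    if quirks:
        desc += ", who " + " and ".join(quirks)
    return desc + "."

# Roughly the request from the ZDNET demo described in source [1]:
prompt = describe_voice(
    accent="working-class British",
    energy="world-weary but witty",
    persona="Michael Caine",
    quirks=["is a staunch Flat-Earther"],
)
print(prompt)
```

The point is that the description itself, not a schema of attributes, is the interface; the model parses the prose on its own.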
According to Hume's internal benchmarks, EVI 3 outperforms competitors like OpenAI's GPT-4o and Google's Gemini in several key areas, including naturalness, expressiveness, empathy, interruption handling, response speed, audio quality, and emotion understanding.
The model currently specializes in English but is expected to expand to other languages, including French and Spanish, in the near future 1.
EVI 3 is designed for a wide range of applications, including customer support, health coaching, immersive storytelling, and virtual companionship.
The model targets businesses, developers, and creators, offering them tools to embed sophisticated voice AI into their products and services.
Currently, individual users can interact with EVI 3 through Hume's live demo on its website and iOS app. Developer access through an API is expected in the coming weeks 3. While specific pricing for EVI 3 has not been announced, it is expected to follow a usage-based model similar to its predecessors, with potential enterprise discounts for large-scale deployments.
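While EVI 3's rates are still to be announced, the per-minute pricing of earlier versions reported above (EVI 1 at $0.102/minute, EVI 2 at $0.072/minute) shows how usage-based billing scales. A back-of-envelope sketch, assuming only those published rates:

```python
# Published per-minute rates for earlier EVI versions (USD); EVI 3 is TBA.
EVI_RATES = {"EVI 1": 0.102, "EVI 2": 0.072}

def monthly_cost(version, minutes_per_day, days=30):
    """Estimated monthly bill for steady daily usage, in USD."""
    return round(EVI_RATES[version] * minutes_per_day * days, 2)

for version in EVI_RATES:
    print(f"{version}: ${monthly_cost(version, minutes_per_day=100)}/month at 100 min/day")

# Sanity check on the quoted ~30% price drop from EVI 1 to EVI 2:
drop = 1 - EVI_RATES["EVI 2"] / EVI_RATES["EVI 1"]
print(f"EVI 2 is about {drop:.0%} cheaper than EVI 1")
```

At 100 minutes a day, the EVI 2 rate works out to a little over $200 a month; whether EVI 3 lands above or below that figure is the open question for developers.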
Hume's approach with EVI 3 reflects a growing trend in AI development towards more personable and engaging models. By focusing on fine-tuning the believability of its models, Hume aims to create AI voices that not only sound real but also feel real in their communication style 1.
Alan Cowen, founder of Hume and former researcher at Google DeepMind, emphasizes the importance of emotional intelligence in AI interfaces: "Emotional intelligence includes the ability to infer intentions and preferences from behavior. That's the very core of what AI interfaces are trying to achieve" 3.
As the field of AI voice generation continues to evolve rapidly, Hume's EVI 3 represents a significant step forward in creating more natural, expressive, and customizable AI voices. This development could potentially reshape how users interact with AI in various applications, from customer service to entertainment and beyond.