3 Sources
[1]
Text-to-speech with feeling - this new AI model does everything but shed a tear
ElevenLabs' 'most expressive' v3 model can speak with a huge range of emotions in more than 70 languages. Try it for yourself.

Not so long ago, generative AI could only communicate with human users via text. Now it is increasingly being given the power of speech, and that ability is improving by the day. On Thursday, AI voice platform ElevenLabs introduced v3, described on the company's website as "the most expressive text-to-speech model ever." The new model can exhibit a wide range of emotions and subtle communicative quirks, such as sighs, laughter, and whispering, making its speech more humanlike than the company's previous models.

In a demo shared on X, v3 generated the voices of two characters, one male and one female, having a lighthearted conversation about their newfound ability to speak in more humanlike voices. There is certainly none of the Alexa-esque flatness of tone, but the v3-generated voices tend to be almost excessively animated, to the point that their laughter is more creepy than charming; take a listen yourself. The model can also speak more than 70 languages, up from its predecessor v2's limit of 29. It is available now in public alpha, and its price has been cut by 80% until the end of this month.

AI-generated voice has become a major focus of innovation as tech developers look toward the future of human-machine interaction. Automated assistants like Siri and Alexa have long been able to speak, of course, but as anyone who routinely uses these systems can attest, their voices are mechanical, with a narrow range of emotional cadence and tone. They are useful for handling quick and easy tasks, like playing a song or setting an alarm, but they don't make great conversation partners.
Some of the latest text-to-speech (TTS) AI tools, on the other hand, have been engineered to speak in voices that are maximally realistic and engaging. Users can prompt v3, for example, to speak in voices that are easily customizable through "audio tags." Think of these as stylistic filters that modify the output and can be inserted directly into text prompts: "Excited," "Loudly," "Sings," "Laughing," "Angry," and so on.

ElevenLabs isn't the only company racing to build more lifelike TTS models, which big tech companies are pitching as a more intuitive and accessible way to interact with AI. In late May, ElevenLabs competitor Hume AI unveiled its Empathic Voice Interface (EVI) 3 model, which lets users generate custom voices by describing them in natural language. Similarly nuanced conversational abilities are also now on offer through Google's Gemini 2.5 Pro and Flash models.
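The audio-tag mechanism described above amounts to embedding bracketed cues directly in the prompt text. A minimal sketch of composing such a prompt, assuming tag names like "excited" and "whispers" (the exact set of supported tags is defined by ElevenLabs and may differ):

```python
# Sketch: composing a v3-style prompt with inline audio tags.
# The tag names below are illustrative assumptions; the supported set
# is defined by ElevenLabs and may differ.

def tagged(tag: str, text: str) -> str:
    """Prefix a clause with a bracketed audio tag, e.g. '[excited] Hi!'."""
    return f"[{tag}] {text}"

prompt = " ".join([
    tagged("excited", "We just shipped the new release!"),
    tagged("whispers", "Don't tell anyone yet."),
    tagged("laughs", "Okay, maybe tell a few people."),
])
print(prompt)
```

The tags travel with the text itself, so no separate styling parameter is needed; the model treats them as delivery cues rather than words to speak aloud.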
[2]
ElevenLabs Launches Eleven v3 (alpha) : New Expressive Text to Speech Model
ElevenLabs has launched Eleven v3 (alpha), a new Text to Speech model designed to deliver highly expressive and realistic speech generation. This version introduces advanced features such as multi-speaker dialogue, inline audio tags for emotional and tonal control, and support for over 70 languages. While it requires more prompt engineering than previous models, it offers significant improvements in expressiveness and naturalness, making it well suited to applications in media, audiobooks, and creative projects. A real-time version is under development, and API access will be available soon.

At the core of Eleven v3 is its ability to produce highly expressive and lifelike speech, giving users greater control over tone, emotion, and delivery through features such as multi-speaker dialogue and inline audio tags. These features make Eleven v3 particularly valuable for applications such as storytelling, audiobooks, media production, and interactive entertainment; by allowing more natural and expressive speech, the model enhances the user experience across a variety of platforms.

Eleven v3 also addresses the growing demand for multilingual support with compatibility across more than 70 languages, ensuring that speech output maintains natural stress, cadence, and contextual accuracy in diverse linguistic settings. By supporting a wide range of languages and accents, it fosters inclusive communication and helps bridge language gaps, making it a valuable tool for global accessibility.

Although Eleven v3 currently requires more prompt engineering than its predecessors, a real-time version is under development. This future iteration is expected to serve applications that demand instantaneous speech synthesis, such as live voiceovers and conversational AI systems. The model also offers robust API integration, allowing developers to incorporate its features into existing workflows and platforms.
This flexibility makes Eleven v3 a versatile tool across multiple sectors. The combination of real-time capabilities and developer-friendly integration means it can meet the diverse needs of professionals in both creative and functional domains, and its enhanced expressiveness and realism open up a wide range of applications. By offering emotional depth, linguistic versatility, and technical precision, Eleven v3 has the potential to transform how industries approach voice generation and communication.

Eleven v3 is currently available on the ElevenLabs platform, with an 80% discount on the ElevenLabs app offered until the end of June. API access and Studio support are expected to roll out soon, with early access available through direct sales contact. For applications requiring real-time speech synthesis, ElevenLabs recommends using v2.5 Turbo or Flash until the real-time version of v3 becomes available.

Eleven v3 was designed to address the limitations of earlier models, particularly in expressiveness and naturalness. By enabling lifelike and responsive speech, it meets the needs of professionals in industries such as film, gaming, education, and accessibility. As demand for realistic AI voice generation continues to grow, Eleven v3 represents a significant advancement in TTS technology: its combination of emotional nuance, multilingual support, and developer-friendly integration positions it as a valuable tool for both creative and functional applications.
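The multi-speaker dialogue feature described above pairs naturally with audio tags. A hedged sketch of how a dialogue script might be assembled as plain text (the "Speaker: [tag] text" layout here is a hypothetical convention for illustration, not the documented v3 syntax):

```python
# Sketch: building a multi-speaker script with per-line audio tags.
# The "Speaker: [tag] text" layout is an assumed convention for
# illustration; consult the official docs for the real v3 format.

def dialogue_script(turns):
    """Render (speaker, tag, text) turns as a labeled script, one line each."""
    lines = []
    for speaker, tag, text in turns:
        prefix = f"[{tag}] " if tag else ""
        lines.append(f"{speaker}: {prefix}{text}")
    return "\n".join(lines)

script = dialogue_script([
    ("Anna", "curious", "Did you hear the new model can sigh?"),
    ("Ben", "laughs", "Sigh, laugh, even whisper, apparently."),
    ("Anna", "", "Now that I have to hear."),
])
print(script)
```

Keeping the whole exchange in one generation, rather than stitching together per-speaker clips, is what lets the model carry emotional context across turns.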
[3]
ElevenLabs Unveils Eleven v3 (Alpha) Text to Speech Model
Eleven v3 brings expressive control to voice generation, enabling true performances instead of simple readings. It can shift emotion, modulate delivery, and move fluidly between characters in a single generation. For the first time, AI speech can follow the rhythm and emotional nuance of human conversation, across more than 70 languages.

"Eleven v3 is the most expressive Text to Speech model ever - offering full control over emotions, delivery, and nonverbal cues. With audio tags, you can prompt it to whisper, laugh, change accents, or even sing. You can control the pacing, emotion, and style to match any script. And with our global mission, we are happy to extend the model with support for over 70 languages. This release is the result of the vision and leadership of my co-founder Piotr, and the incredible research team he's built. Creating a good product is hard - creating an entirely new paradigm is almost impossible. I, and all of us at ElevenLabs, feel lucky to witness the magic this team brings to life - and with this release, we're excited to push the frontier once again." -- Mati Staniszewski, Co-Founder & CEO, ElevenLabs
ElevenLabs has launched Eleven v3 (alpha), a groundbreaking text-to-speech model that offers unprecedented expressiveness and realism in AI-generated voices across multiple languages.
ElevenLabs, a leading AI voice platform, has unveiled its latest innovation in text-to-speech technology: Eleven v3 (alpha). Described as "the most expressive text-to-speech model ever," this new release represents a significant leap forward in the realm of AI-generated speech [1].
Eleven v3 sets itself apart from previous models and competitors by offering an extensive range of emotional expressions and subtle communicative nuances. The model can incorporate sighs, laughter, and whispers, making its speech output remarkably humanlike [1]. This level of expressiveness is achieved through the use of "audio tags," which act as stylistic filters that can be inserted directly into text prompts to modify the output [1].
One of the most notable advancements in Eleven v3 is its support for over 70 languages, a significant increase from its predecessor's 29 languages [1][2]. This expanded language capability ensures that speech output maintains natural stress, cadence, and contextual accuracy across diverse linguistic settings, fostering inclusive communication and bridging language gaps [2].
Eleven v3 introduces several innovative features that enhance its versatility, including multi-speaker dialogue and inline audio tags for emotional and tonal control [2]. These capabilities make Eleven v3 particularly valuable for applications in storytelling, audiobooks, media production, and interactive entertainment [2].
The launch of Eleven v3 comes at a time when AI-generated voice has become a major focus of innovation in human-machine interaction. Tech giants like Google are also developing similar technologies, as seen with the Gemini 2.5 Pro and Flash models [1].
ElevenLabs is currently working on a real-time version of v3, which is expected to cater to applications requiring instantaneous speech synthesis, such as live voiceovers and conversational AI systems [2]. The company is also planning to offer robust API integration, allowing developers to incorporate Eleven v3's features into existing workflows and platforms [2].
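The API integration mentioned above would presumably build on ElevenLabs' existing REST endpoint, which POSTs text to /v1/text-to-speech/{voice_id} with an xi-api-key header. Since v3 API access was still rolling out at the time of writing, the "eleven_v3" model ID and the voice ID below are assumptions; this sketch only assembles the request, leaving the HTTP call to any client:

```python
# Hedged sketch: assembling a request for the ElevenLabs TTS endpoint.
# The model_id "eleven_v3" and the voice ID are placeholder assumptions;
# v3 API access had not fully rolled out when this was written.
import json

API_BASE = "https://api.elevenlabs.io/v1"

def build_tts_request(api_key: str, voice_id: str, text: str,
                      model_id: str = "eleven_v3"):
    """Return (url, headers, body) for a text-to-speech POST."""
    url = f"{API_BASE}/text-to-speech/{voice_id}"
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    body = json.dumps({"text": text, "model_id": model_id})
    return url, headers, body

url, headers, body = build_tts_request(
    "YOUR_API_KEY", "VOICE_ID", "[excited] Hello from Eleven v3!"
)
# The call itself would then be made with any HTTP client, e.g.:
# audio_bytes = requests.post(url, headers=headers, data=body).content
```

Until the v3 model ID is live, ElevenLabs recommends the v2.5 Turbo or Flash models for real-time use, which this same request shape supports by swapping model_id.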
Eleven v3 is currently available in public alpha on the ElevenLabs platform. To encourage adoption, the company is offering an 80% discount on the ElevenLabs app until the end of June [2][3]. API access and Studio support are expected to roll out soon, with early access available through direct sales contact [2].
As the demand for realistic AI voice generation continues to grow, Eleven v3 represents a significant advancement in text-to-speech technology. Its combination of emotional nuance, multilingual support, and developer-friendly integration positions it as a valuable tool for both creative and functional applications across various industries.