Curated by THEOUTPOST
On Tue, 3 Dec, 4:01 PM UTC
2 Sources
[1]
This AI Tool Will Let You Customise Voices for AI Systems
Hume, a New York-based artificial intelligence (AI) firm, unveiled a new tool on Monday that will allow users to customise AI voices. Dubbed Voice Control, the new feature is aimed at helping developers integrate these voices into their chatbots and other AI-based applications. Instead of offering a large range of voices, the company offers granular control over 10 different dimensions of voices. By selecting the desired parameters in each of the dimensions, users can generate unique voices for their apps.
The company detailed the new AI tool in a blog post. Hume stated that it is trying to solve the problem of enterprises finding the right AI voice to match their brand identity. With this feature, developers can customise different aspects of how a voice is perceived and create a more assertive, relaxed, or buoyant voice for AI-based applications.
Hume's Voice Control is currently available in beta, but it can be accessed by anyone registered on the platform. Gadgets 360 staff members were able to access the tool and test the feature. There are 10 different dimensions developers can adjust: gender, assertiveness, buoyancy, confidence, enthusiasm, nasality, relaxedness, smoothness, tepidity, and tightness.
Instead of adding a prompt-based customisation, the company has added a slider that goes from -100 to +100 for each of the dimensions. The company stated that this approach was taken to eliminate the vagueness associated with the textual description of a voice and to offer granular control over the voice. In our testing, we found that changing any of the ten dimensions makes an audible difference to the AI voice and that the tool was able to disentangle the different dimensions correctly.
The AI firm claimed that this was achieved by developing a new "unsupervised approach" which preserves most characteristics of each base voice when specific parameters are varied. Notably, Hume did not detail the source of the procured data.
After creating an AI voice, developers will have to deploy it to the application by configuring Hume's Empathic Voice Interface (EVI) AI model. While the company did not specify, the EVI-2 model was likely used for this experimental feature. In the future, Hume plans to expand the range of base voices, introduce additional interpretable dimensions, enhance the preservation of voice characteristics under extreme modifications, and develop advanced tools to analyse and visualise voice characteristics.
[2]
Hume launches Voice Control allowing users and developers to make custom AI voices
Hume AI, the startup specializing in emotionally intelligent voice interfaces, has launched Voice Control, an experimental feature that empowers developers and users to create custom AI voices through precise modulation of vocal characteristics, with no coding, AI prompt engineering, or sound design skills required. This release builds on the foundation laid by the company's earlier Empathic Voice Interface 2 (EVI 2), which introduced advanced capabilities in naturalness, emotional responsiveness, and customization. Both EVI 2 and Voice Control avoid the risks of voice cloning, a practice that Hume co-founder Alan Cowen has stated carries ethical and practical challenges. Instead, Hume focuses on providing tools for creating unique, expressive voices that align with user needs, such as customer service chatbots, digital assistants, tutors, guides, or accessibility features.
Moving beyond preset AI voices toward custom bespoke solutions
Voice Control offers developers the ability to adjust voices along 10 distinct dimensions, including:
"Masculine/Feminine: The vocalization of gender, ranging between more masculine and more feminine.
Confidence: The assuredness of the voice, ranging between shy and confident.
Enthusiasm: The excitement within the voice, ranging between calm and enthusiastic.
Nasality: The openness of the voice, ranging between clear and nasal.
Relaxedness: The stress within the voice, ranging between tense and relaxed.
Smoothness: The texture of the voice, ranging between smooth and staccato.
Tepidity: The liveliness behind the voice, ranging between tepid and vigorous.
Tightness: The containment of the voice, ranging between tight and breathy."
This no-code tool allows users to fine-tune voice attributes in real time through virtual onscreen sliders.
It's currently available in Hume's virtual playground, which requires a free user sign-up to access. The release addresses a key pain point in the AI industry: the reliance on preset voices, which often fail to meet the specific needs of brands or applications, or the risks associated with voice cloning. This focus on customization aligns with Hume's broader goal of developing emotionally nuanced voice AI. The company's efforts to advance voice AI were highlighted in September 2024 with the launch of EVI 2, which the company described as a significant upgrade to its predecessor. EVI 2 improved latency by 40%, reduced costs by 30%, and expanded voice modulation features, offering developers a safer alternative to voice cloning.
Sliders > text prompts
Hume's research-driven approach plays a central role in its product development. The company, co-founded by former Google DeepMinder Alan Cowen, utilizes a proprietary model based on cross-cultural voice recordings paired with emotional survey data. This methodology, rooted in emotion science, forms the backbone of both EVI 2 and the newly launched Voice Control. Voice Control extends these principles by addressing the granular, often ineffable ways humans perceive voices. The tool's slider-based interface reflects common perceptual qualities of voice, such as buoyancy or assertiveness, without attempting to oversimplify these attributes through text-based prompts.
Developer tools
Voice Control is immediately available in beta and integrates with Hume's Empathic Voice Interface (EVI), making it accessible for a wide range of applications. Developers can select a base voice, adjust its characteristics, and preview the results in real time. This process ensures reproducibility and stability across sessions, key features for real-time applications like customer service bots or virtual assistants. EVI 2's influence is evident in Voice Control's capabilities.
The earlier model introduced features like in-conversation prompts and multilingual capabilities, which have broadened the scope of voice AI applications. For example, EVI 2 supports sub-second response times, enabling natural and immediate conversations. It also allows dynamic adjustments to speaking style during interactions, making it a versatile tool for businesses.
Differentiating in a competitive market
Hume's focus on voice customization and emotional intelligence positions it as a strong competitor in the voice AI space, even against well-funded rivals such as OpenAI with its Advanced Voice Mode and ElevenLabs, both of which offer libraries of pre-set voices. Hume continues to build on its innovative approach to voice AI. Plans for expanding Voice Control include introducing additional modifiable dimensions, refining voice quality under extreme adjustments, and increasing the range of base voices available. With the launch of Voice Control, Hume reinforces its position as a leader in voice AI innovation, offering tools that prioritize customization, emotional intelligence, and real-time adaptability. Developers can access Voice Control today via Hume's platform, marking another step forward in the evolution of AI-driven voice solutions.
Hume AI launches Voice Control, an innovative tool allowing users to create custom AI voices by adjusting 10 distinct vocal dimensions, offering a new level of personalization in voice AI technology.
Hume AI, a New York-based artificial intelligence firm, has unveiled an innovative tool called Voice Control, marking a significant advancement in the realm of AI-generated voices. This experimental feature, launched on Monday, allows users and developers to create custom AI voices without the need for coding, AI prompt engineering, or sound design skills [1][2].
Voice Control offers a unique approach to voice customization by providing granular control over 10 different dimensions of voice characteristics. Users can adjust parameters such as gender, assertiveness, buoyancy, confidence, enthusiasm, nasality, relaxedness, smoothness, tepidity, and tightness [1]. This level of customization is achieved through a slider-based interface that ranges from -100 to +100 for each metric, allowing for precise fine-tuning of vocal attributes [1][2].
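As a rough illustration of what such a slider configuration amounts to, the sketch below models the ten dimensions and the -100 to +100 range reported in [1]. It is not Hume's API: the `VoiceSettings` class, the `set_slider` and `as_dict` methods, the base voice name, and the choice of 0 as the default for untouched sliders are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# The ten dimensions named in the coverage [1]; the identifiers are illustrative.
DIMENSIONS = (
    "gender", "assertiveness", "buoyancy", "confidence", "enthusiasm",
    "nasality", "relaxedness", "smoothness", "tepidity", "tightness",
)

# Slider range reported in [1].
SLIDER_MIN, SLIDER_MAX = -100, 100


@dataclass
class VoiceSettings:
    """Hypothetical container for slider values (not part of any Hume SDK)."""

    base_voice: str
    sliders: dict = field(default_factory=dict)

    def set_slider(self, dimension: str, value: int) -> None:
        """Store one slider value after checking the name and the range."""
        if dimension not in DIMENSIONS:
            raise KeyError(f"unknown dimension: {dimension}")
        if not SLIDER_MIN <= value <= SLIDER_MAX:
            raise ValueError(
                f"{dimension} must be between {SLIDER_MIN} and {SLIDER_MAX}"
            )
        self.sliders[dimension] = value

    def as_dict(self) -> dict:
        """Return all ten dimensions; untouched sliders default to 0 (assumed)."""
        return {name: self.sliders.get(name, 0) for name in DIMENSIONS}


if __name__ == "__main__":
    voice = VoiceSettings(base_voice="example-base-voice")  # hypothetical name
    voice.set_slider("assertiveness", 40)
    voice.set_slider("relaxedness", -25)
    print(voice.as_dict())
```

Keeping every dimension as an explicit bounded number, rather than a free-text description, is what makes a configuration like this easy to validate and to reproduce across sessions.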
The introduction of Voice Control addresses a significant pain point in the AI industry: the reliance on preset voices that often fail to meet specific brand or application needs. By offering this level of customization, Hume AI aims to provide a safer and more flexible alternative to voice cloning, a practice that has raised ethical and practical concerns [2].
Voice Control is currently available in beta and can be accessed by anyone registered on Hume's platform. The tool integrates with Hume's Empathic Voice Interface (EVI) AI model; the company did not specify which version, though EVI 2 was likely used for this experimental feature [1][2]. Once a custom voice is created, developers deploy it by configuring EVI, which allows the voice to be used in applications ranging from customer service chatbots to digital assistants and accessibility features.
Hume's approach is rooted in emotion science and utilizes a proprietary model based on cross-cultural voice recordings paired with emotional survey data [2]. The company claims to have developed a new "unsupervised approach" that preserves most characteristics of each base voice when specific parameters are varied [1]. In hands-on testing reported in [1], changing any of the ten dimensions made an audible difference to the voice, and the tool was able to disentangle the dimensions correctly.
Looking ahead, Hume plans to expand the range of base voices, introduce additional interpretable dimensions, improve the preservation of voice characteristics under extreme modifications, and develop advanced tools for analyzing and visualizing voice characteristics [1]. These developments could open new possibilities for personalized and emotionally intelligent voice interfaces across various industries.
Hume's focus on voice customization and emotional intelligence positions it as a strong competitor in the voice AI space. While companies like OpenAI and ElevenLabs offer libraries of pre-set voices, Hume's approach to granular customization sets it apart in the market [2]. This innovative tool could have far-reaching implications for industries relying on AI-driven voice solutions, from customer service to entertainment and beyond.
As Voice Control enters the market, it represents a significant step forward in the evolution of AI-driven voice technology, offering unprecedented levels of customization and control to developers and users alike.
The Outpost is a comprehensive collection of curated artificial intelligence software tools that cater to the needs of small business owners, bloggers, artists, musicians, entrepreneurs, marketers, writers, and researchers.
© 2024 TheOutpost.AI All rights reserved