Curated by THEOUTPOST
On Fri, 14 Mar, 4:01 PM UTC
2 Sources
[1]
Sesame, the startup behind the viral virtual assistant Maya, open-sources its base AI model | TechCrunch
Sesame, the AI company behind the impressively realistic voice assistant Maya, has released the base AI model powering Maya, as it recently promised. The model, which is 1 billion parameters in size ("parameters" referring to individual components of the model), is available under an Apache 2.0 license, meaning it can be used commercially with few restrictions.

Called CSM-1B, the model generates "RVQ audio codes" from text and audio inputs, according to Sesame's description on the AI dev platform Hugging Face. RVQ refers to "residual vector quantization," a technique for encoding audio into discrete tokens called codes. RVQ is used in a number of recent AI audio technologies, including Google's SoundStream and Meta's Encodec. CSM-1B uses a model from Meta's Llama family as its backbone, paired with an audio "decoder" component. A fine-tuned variant of CSM powers Maya, Sesame says.

"The model open-sourced here is a base generation model," Sesame writes in CSM-1B's Hugging Face and GitHub repositories. "It is capable of producing a variety of voices, but it has not been fine-tuned on any specific voice [...] The model has some capacity for non-English languages due to data contamination in the training data, but it likely won't do well."

It's unclear what data Sesame used to train CSM-1B; the company didn't say. It's also worth noting that the model has no real safeguards to speak of. It's an "honor system" situation: Sesame merely urges developers and users not to use the model to mimic a person's voice without their consent, create misleading content like fake news, or engage in "harmful" or "malicious" activities. I tried the demo on Hugging Face, and cloning my voice took less than a minute.
From there, it was easy to generate speech to my heart's desire, including on controversial topics like the election and Russian propaganda.

Sesame, co-founded by Oculus co-creator Brendan Iribe, went viral in late February for its assistant tech, which comes close to clearing the uncanny valley. Maya and Sesame's other assistant, Miles, take breaths and speak with disfluencies, and can be interrupted while speaking, much like OpenAI's Voice Mode. Sesame has raised an undisclosed amount of capital from Andreessen Horowitz, Spark Capital, and Matrix Partners. In addition to building voice assistant tech, the company says it's prototyping AI glasses "designed to be worn all day" that'll be equipped with its custom models.
[2]
You can now try the AI that made Maya go viral
AI company Sesame has released its base model, CSM-1B, which powers the viral virtual assistant Maya. The model, consisting of 1 billion parameters, is available under an Apache 2.0 license that allows commercial use with minimal restrictions. You may remember the company from its impressive AI conversation demo.

According to Sesame's description on the AI development platform Hugging Face, CSM-1B generates "RVQ audio codes" from text and audio inputs. RVQ, or "residual vector quantization," encodes audio into discrete tokens. The technique is used in several modern AI audio technologies, including Google's SoundStream and Meta's Encodec. CSM-1B uses a backbone from Meta's Llama family along with an audio decoder component. Sesame notes that a fine-tuned variant of the CSM model powers Maya.

In its repositories, Sesame states, "The model open-sourced here is a base generation model. It is capable of producing a variety of voices, but it has not been fine-tuned on any specific voice." The model has limited capability in non-English languages due to data contamination in the training set.

Sesame has not disclosed the specific data used to train CSM-1B. The company has acknowledged a lack of safeguards for the model, relying instead on an honor system: it urges developers not to use the model to imitate a person's voice without their consent, generate misleading content such as fake news, or participate in "harmful" or "malicious" activities. A demo on Hugging Face showed that voice cloning could be achieved in under a minute, enabling the generation of speech on contentious topics such as elections and Russian propaganda. Consumer Reports recently cautioned that many widely used AI voice cloning tools lack "meaningful" safeguards against fraud or abuse.
Co-founded by Brendan Iribe, an Oculus co-creator, Sesame gained attention in late February 2025 for its assistant technologies, which approach uncanny realism. Maya and Sesame's other assistant, Miles, incorporate human-like breathing and speech disfluencies, and can be interrupted while speaking, similar to OpenAI's Voice Mode. Sesame has secured an undisclosed amount of funding from Andreessen Horowitz, Spark Capital, and Matrix Partners. The company is also prototyping AI glasses "designed to be worn all day," which will feature its custom voice models.
Sesame, the startup behind the viral virtual assistant Maya, has released its base AI model CSM-1B for public use. While this move promotes innovation, it also raises ethical concerns about potential misuse of voice cloning technology.
Sesame, the AI company behind the viral virtual assistant Maya, has open-sourced its base AI model, CSM-1B, under an Apache 2.0 license [1]. This 1-billion-parameter model, which powers Maya, is now available for commercial use with minimal restrictions. The model generates "RVQ audio codes" from text and audio inputs, utilizing residual vector quantization (RVQ), a technique also used in Google's SoundStream and Meta's Encodec [2].
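The RVQ idea is simple to sketch: each quantization stage snaps the current residual to its nearest codebook entry, and each later stage quantizes whatever error the earlier stages left behind, so a short tuple of integer codes stands in for a continuous vector. The NumPy example below is an illustrative toy only; the 2-D codebooks and input vector are made up for the demonstration, while real systems such as CSM-1B learn far larger codebooks over neural audio features.

```python
import numpy as np

def rvq_encode(vector, codebooks):
    """Residual vector quantization: at each stage, pick the nearest
    codebook entry to the current residual, then subtract it."""
    residual = np.asarray(vector, dtype=float)
    codes = []
    for cb in codebooks:
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        codes.append(idx)
        residual = residual - cb[idx]
    return codes

def rvq_decode(codes, codebooks):
    """Reconstruct by summing the selected entry from each codebook."""
    return sum(cb[i] for cb, i in zip(codebooks, codes))

# Toy example: a coarse first codebook and a fine second one.
codebooks = [
    np.array([[1.0, 0.0], [0.0, 1.0]]),
    np.array([[0.1, 0.0], [0.0, 0.1], [-0.1, 0.0], [0.0, -0.1]]),
]
x = np.array([0.9, 0.05])
codes = rvq_encode(x, codebooks)    # codes -> [0, 2]
x_hat = rvq_decode(codes, codebooks)  # x_hat -> [0.9, 0.0]
```

Because each stage only has to encode the previous stage's leftover error, adding stages shrinks the reconstruction error while the representation stays a compact list of integer indices, which is what makes these "audio codes" convenient tokens for a language-model-style backbone to predict.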
CSM-1B uses a model from Meta's Llama family as its backbone, paired with an audio "decoder" component. While capable of producing various voices, it has not been fine-tuned on any specific voice. The model has some capacity for non-English languages due to data contamination in the training data, but its performance in these languages may be limited [1].
The release of CSM-1B has raised significant ethical concerns due to its lack of built-in safeguards. Sesame relies on an "honor system," urging developers and users not to misuse the technology for voice imitation without consent, creation of misleading content, or engagement in harmful activities [1]. This approach has been met with skepticism, especially in light of a recent Consumer Reports warning about the lack of meaningful safeguards in many AI voice cloning tools [2].
A demo on Hugging Face showcased the model's ability to clone voices in less than a minute, allowing for the generation of speech on various topics, including controversial ones like elections and Russian propaganda [1]. This ease of use has sparked discussions about the potential for misuse in creating deepfakes or spreading misinformation.
Sesame, co-founded by Oculus co-creator Brendan Iribe, gained attention in late February 2025 for its impressively realistic assistant technology. The company's virtual assistants, Maya and Miles, feature human-like breathing patterns and speech disfluencies, and can be interrupted while speaking, similar to OpenAI's Voice Mode [2].
Having secured funding from prominent investors like Andreessen Horowitz, Spark Capital, and Matrix Partners, Sesame is not only focusing on voice assistant technology but also venturing into hardware. The company is currently prototyping AI glasses designed for all-day wear, which will incorporate its custom voice models [1][2].
The release of CSM-1B represents a significant step in the democratization of advanced AI voice technology. While it opens up new possibilities for innovation and development in the field, it also highlights the pressing need for robust ethical guidelines and safeguards in AI development. The balance between open-source accessibility and responsible use of AI technology remains a critical challenge for the industry to address.
© 2025 TheOutpost.AI All rights reserved