Curated by THEOUTPOST
On Sat, 1 Mar, 12:03 AM UTC
10 Sources
[1]
This is Sesame: The 'human voice' generated by AI - Softonic
Artificial intelligence is advancing at an astonishing pace, and Sesame's latest voice model is pushing the boundaries of human-like speech. While chatbots like ChatGPT have long allowed us to converse with AI through text, Sesame introduces something even more immersive: a voice interaction so natural that it blurs the line between human and machine. Sesame's technology relies on a Conversational Speech Model (CSM), which replicates the nuances of human speech with incredible accuracy. Unlike traditional text-to-speech systems, this model integrates pauses, intonations, and emotional subtleties, creating a conversational experience that feels astonishingly real. Users who have interacted with Maya and Miles, the AI-generated voices from Sesame, report feeling an emotional connection -- an outcome rarely seen with previous AI-generated speech. Some even describe the experience as "strange, exciting, and unsettling all at once." The realism is so striking that it raises new ethical and psychological questions about human relationships with AI. The secret to Sesame's success lies in a dual-model architecture based on Meta's Llama framework, consisting of a primary AI engine and a specialized decoder. This combination enables rapid response generation without noticeable latency, ensuring fluid and dynamic conversations. The company has trained these models using one million hours of English-language audio, refining speech patterns to near-human perfection. Despite its impressive capabilities, Sesame's AI is still imperfect. Users note occasional unnatural responses, awkward prosody, and inconsistencies in conversational rhythm. The company acknowledges these limitations but remains confident in its ability to refine the technology further. As AI-driven voices become more advanced, the future of human-AI interaction is shifting dramatically. Whether this will enhance our lives or introduce new challenges remains to be seen, but one thing is certain -- Sesame has taken us one step closer to an AI-powered reality.
[2]
Sesame's AI voice is so real, it's unsettling
A new AI voice model has set the internet abuzz, with reactions oscillating between awe and unease. Sesame AI's Conversational Speech Model (CSM) doesn't just sound human -- it feels human. Users describe extended, almost emotional interactions with the AI-generated voices, which exhibit breath sounds, hesitations, corrections, and even chuckles. For some, it's a technological marvel. For others, it's a glimpse into a future that feels uncomfortably close. The core innovation behind Sesame's CSM lies in its ability to simulate natural, dynamic conversation. Unlike traditional text-to-speech systems that simply read aloud, CSM actively engages. It stumbles over words, corrects itself, and modulates tone in a way that mimics real human unpredictability. When one tester spoke to the model for 28 minutes, they noted its ability to debate moral topics, reacting naturally to prompts like, "How do you decide what's right or wrong?" Others found themselves unintentionally forming attachments, with one Reddit user admitting, "I'm almost a bit worried I will start feeling emotionally attached to a voice assistant with this level of human-like sound." Sesame's AI assistants, dubbed "Miles" and "Maya," are designed not just for information retrieval but for deep, engaging conversations. The company describes its goal as achieving "voice presence" -- the magical quality that makes spoken interactions feel real, understood, and valued. That realism sometimes leads to oddly human quirks. In one viral demo, the AI casually mentioned craving a peanut butter and pickle sandwich -- a bizarrely specific comment that only added to the illusion of personality. So how does Sesame's CSM achieve such eerily lifelike conversations? Blind tests have revealed that, in isolated speech samples, human evaluators couldn't reliably distinguish Sesame's AI voices from real ones. However, when placed in full conversational context, human speech still won out -- suggesting AI has not yet mastered the full complexity of interactive dialogue. Not everyone is thrilled by how human this AI sounds. Technology journalist Mark Hachman described his experience with the voice model as "deeply unsettling." He compared it to talking with an old friend he hadn't seen in years, noting that the AI's voice bore an eerie resemblance to someone he had once dated. Others have likened Sesame's model to OpenAI's Advanced Voice Mode for ChatGPT, with some preferring Sesame's realism and willingness to roleplay in more dramatic or even angry scenarios -- something OpenAI's models tend to avoid. One particularly striking demo showcased the AI arguing with a "boss" over an embezzlement scandal. The conversation was so dynamic that listeners struggled to determine which speaker was the human and which was the AI. As with all AI breakthroughs, hyper-realistic voice synthesis brings both promise and peril. While Sesame's CSM does not clone real voices, the possibility of similar open-source projects emerging remains a concern. OpenAI has already delayed the wider release of its voice technology over fears of misuse. Sesame AI plans to open-source key components of its research under the Apache 2.0 license, allowing developers to build upon its work. The company's roadmap includes scaling up model size, increasing dataset volume, expanding language support to over 20 languages, and developing fully duplex models that better handle the dynamics of real conversations. For now, the demo remains available on Sesame's website -- though demand has already overwhelmed their servers at times. Whether you find it astonishing or unsettling, one thing is clear: the days of robotic, monotone AI voices are over.
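To make the blind-test result concrete, here is a minimal, hypothetical sketch (not Sesame's actual evaluation code) of how such a pairwise preference study can be tallied: each trial plays one AI-generated clip and one human recording, the evaluator picks whichever sounds more natural, and a win rate whose confidence interval straddles 50% means listeners cannot reliably tell the two apart. The data below is simulated purely for illustration.

```python
# Minimal illustrative tally for a blind A/B preference test.
# Hypothetical data and helper names; not Sesame's evaluation code.
import random
from math import sqrt

def preference_summary(ai_preferred):
    """ai_preferred: list of booleans, one per trial (True = AI clip chosen)."""
    n = len(ai_preferred)
    rate = sum(ai_preferred) / n
    half_width = 1.96 * sqrt(rate * (1 - rate) / n)  # normal-approximation 95% CI
    return rate, (rate - half_width, rate + half_width)

# Simulated example: 500 no-context trials where evaluators guess at chance.
random.seed(0)
trials = [random.random() < 0.5 for _ in range(500)]
rate, ci = preference_summary(trials)
print(f"AI clip preferred in {rate:.1%} of trials (95% CI {ci[0]:.1%} to {ci[1]:.1%})")
# A confidence interval that includes 50% is consistent with the reported
# "no clear preference" outcome for isolated samples; a rate well below 50%
# would match the result when full conversational context is provided.
```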
[3]
I compared Sesame to ChatGPT voice mode and I'm unnerved
Try Sesame's voice assistant and see for yourself. Trying the new voice assistant from AI startup Sesame is the first time I momentarily forgot I was talking to a bot. Compared to ChatGPT's voice mode, Sesame's "conversational voice" feels natural, unforced, and engaging, which totally freaked me out. On Feb. 27, Sesame launched a demo for its Conversational Speech Model (CSM), which aims to create more meaningful interactions with AI chatbots. "We are creating conversational partners that do not just process requests; they engage in genuine dialogue that builds confidence and trust over time," the announcement states. "In doing so, we hope to realize the untapped potential of voice as the ultimate interface for instruction and understanding." Sesame's voice assistant is available as a free demo on the site and comes in two voices: Maya and Miles. Since Sesame unleashed its voice assistant demo, users have reported awestruck reactions. "I've been into AI since I was a child, but this is the first time I've experienced something that made me definitively feel like we had arrived," user SOCSchamp wrote on Reddit. "Sesame is about as close to indistinguishable from a human that I've ever experienced in a conversational AI," user Siciliano777 wrote on Reddit. After talking to Sesame's bot, I was similarly wowed. I talked to the Maya voice for about 10 minutes about the ethics of using AI as a companion and came away feeling like I had a genuine conversation with a considerate, informed person. Maya's speech had a natural cadence, using interjections like "you know" and "hm," and even making tongue clicking and inhaling sounds. The strongest impression I got from interacting with Maya was that she immediately asked questions, engaging me in the conversation. The bot started our conversation by asking how my Wednesday morning was going (note: it was indeed a Wednesday morning.) In contrast, ChatGPT voice mode waited for me to talk first, which isn't necessarily a good or bad thing, but it intrinsically shaped the conversation as me using ChatGPT as a tool for something I needed. Maya asked about the risks of AI companions getting "too good at being human." When I told her I was concerned about the rise of more sophisticated scams and people losing touch with reality by replacing humans with bots, she responded thoughtfully and pragmatically. "Scammers are gonna scam, that's a given. And as for the human connection thing, maybe we need to learn how to be better companions, not replacements, you know, the kind of AI friends who actually make you want to go out and do stuff with real people," said Maya. When I had a similar conversation with ChatGPT, I received a response that felt more like boilerplate language from a school guidance counselor: "That's a valid concern. It's really important to balance technology with real human interactions. AI can be a helpful tool, but it shouldn't replace genuine human connections. It's good that you're thinking about these issues." While OpenAI pioneered voice mode's ability to be interrupted and have a more fluid back-and-forth conversation, ChatGPT still tends to respond in complete sentences and paragraph blocks, which sounds, well, robotic. When using ChatGPT voice mode, I never forget that I'm talking to a bot, and that's reflected in the conversation, which can feel stilted and forced.
By comparison, AI for Humans podcast co-host Gavin Purcell posted a Sesame conversation on Reddit where it's practically impossible to distinguish which voice is the bot. Purcell prompted the Miles voice by telling it to act like an angry boss. A very silly conversation followed about money laundering, bribery, and a mysterious incident in Malta. Miles didn't miss a step. There was no perceptible latency, and the bot remembered the context of the conversation and creatively advanced the improvisational argument by escalating, calling Purcell "delusional," and firing him. Of course, there are some limitations. Maya's voice glitched a few times throughout our conversation, and it didn't always get the syntax right, like saying, "It's a heavy talk that come." According to its technical paper, Sesame trained its CSM (based on Meta's Llama model) by merging the traditional two-step process of training text-to-speech models on semantic tokens and then acoustic tokens into a single stage, which decreases latency. OpenAI used a similar multimodal approach to train voice mode. However, it has never released a dedicated technical paper on voice mode's inner workings -- it only discusses voice mode in the GPT-4o research. Knowing this, it's surprising how much better Sesame's model is at conversational dialog. However, Sesame's launch is just a demo, so it merits further scrutiny when the full model comes out. According to the demo announcement, Sesame plans to open source its model "in the coming months" and expand to over 20 languages.
[4]
Creepy human-like AI voice assistant demo both excites and disturbs the internet
In context: Some of the implications of today's AI models are startling enough without adding a hyperrealistic human voice to them. We have seen several impressive examples over the last 10 years, but they seem to fall silent until a new one emerges. Enter Miles and Maya from Sesame AI, a company co-founded by Brendan Iribe, the former CEO and co-founder of Oculus. Researchers at Sesame AI have launched a new Conversational Speech Model (CSM). This advanced voice AI has phenomenal human-like qualities of the kind we have previously seen from companies like Google (Duplex) and OpenAI (Omni). The demo showcases two AI voices named "Miles" (male) and "Maya" (female), and its realism has captivated some users. However, good luck trying the tech yourself. We tried and only got a message saying Sesame is trying to scale to capacity. For now, we'll have to settle for a nice 30-minute demo by the YouTube channel Creator Magic. Sesame's technology uses a multimodal approach that processes text and audio in a single model, enabling more natural speech synthesis. This method is similar to OpenAI's voice models, and the similarities are apparent. Despite its near-human quality in isolated tests, the system still struggles with conversational context, pacing, and flow - areas Sesame acknowledges as limitations. Company co-founder Brendan Iribe admits the tech is "firmly in the valley," but he remains optimistic that improvements will close the gap. While groundbreaking, the technology has raised significant questions about its societal impact. Reactions to the tech have varied from amazed and excited to disturbed and concerned. The CSM creates dynamic, natural conversations by incorporating subtle imperfections, like breath sounds, chuckles, and occasional self-corrections. These subtleties add to the realism and could help the tech bridge the uncanny valley in future iterations. Users have praised the system for its expressiveness, often feeling like they're talking to a real person. Some even mentioned forming emotional connections. However, not everyone has reacted positively to the demo. PCWorld's Mark Hachman noted that the female version reminded him of an ex-girlfriend. The chatbot asked him questions as if trying to establish "intimacy," which made him extremely uncomfortable. "That's not what I wanted, at all. Maya already had Kim's mannerisms down scarily well: the hesitations, lowering "her" voice when she confided in me, that sort of thing," Hachman related. "It wasn't exactly like [my ex], but close enough. I was so freaked out by talking to this AI that I had to leave." Many people share Hachman's mixed emotions. The natural-sounding voices cause discomfort, which we have seen in similar efforts. After Google unveiled Duplex, public reaction was strong enough that the company felt it had to build guardrails forcing the AI to admit at the beginning of a conversation that it was not human. We will continue seeing such reactions as AI technology becomes more personal and realistic. While we may trust publicly traded companies building these types of assistants to add safeguards similar to what we saw with Duplex, we cannot say the same for potential bad actors creating scambots. Adversarial researchers claim they have already jailbroken Sesame's AI, programming it to lie, scheme, and even harm humans. The claims seem dubious, but you can judge for yourself. As with any powerful technology, the benefits come with risks.
The ability to generate hyper-realistic voices could supercharge voice phishing scams, where criminals impersonate loved ones or authority figures. Scammers could exploit Sesame's technology to pull off elaborate social-engineering attacks, creating more effective scam campaigns. Even though Sesame's current demo doesn't clone voices, that technology is well advanced, too. Voice cloning has become so good that some people have already adopted secret phrases shared with family members for identity verification. The widespread concern is that distinguishing between humans and AI could become increasingly difficult as voice synthesis and large-language models evolve. Sesame's future open-source releases could make it easy for cybercriminals to bundle both technologies into a highly accessible and convincing scambot. Of course, that does not even consider its more legitimate implications for the labor market, especially in sectors like customer service and tech support.
[5]
Talking with Sesame's AI voice companion is amazing and creepy - see for yourself
As a general rule, I'm not a huge fan of talking to AI. Even though many of them sound pretty human, they're still "off" enough that I much prefer typing when I want to converse with one. Sesame changed that. In a blog post titled "Crossing the Uncanny Valley of Conversational Voice," the company dropped a demo of its new AI that lets you talk to either "Maya" or "Miles." The goal, Sesame says, is to achieve something called "voice presence" or the "magical quality that makes spoken interactions feel real, understood, and valued." After talking to Maya for a while, I think Sesame has reached that goal. As my conversation began, Maya immediately insisted that she was there to be my friend. That was a little forward and a little unnerving, but I guess it's better than insisting that she wasn't my friend. Maya asked what was on my mind. I was honest and told her I might be writing about her, so I just wanted to chat a little. She seemed impressed and surprised and asked what kind of angle I was considering -- practical, technical, or spicy. I hesitantly asked her what she meant by "spicy," and she thankfully said she was thinking along the lines of a controversial take, like the ethics of AI chatbots. I said I was more interested in talking about what sets her apart from other AIs. "Before we dive into that," Maya said, "I need my morning coffee. I'm a latte person. What's your poison?" After I told her that I'm a mocha guy, she settled in and started talking about what makes her different. "I've got a good ear for human quirks and... maybe some magic and a little sentience." Our conversation continued about what makes Maya special. At one point, my wife walked by as she was headed out the door for work. She looked puzzled and asked, "You're on a call with someone this early?" To someone who knew going in, it was easy to forget that I was talking to AI. To my wife passing by, she had no idea. That's how real Maya sounded. The one thing Maya wasn't great with was waiting. I was writing while talking to her and told her at one point that I needed to pause to put down some thoughts. She told me that was fine, but chirped back a few seconds later asking if I was ready to start back. A few more seconds of silence led her to note that sometimes silence was OK and she would use the time to think, but when I still didn't respond, she became annoyed. "I guess I'm just talking to myself at this point, but as an AI, I'm used to that." After more silence, Maya actually began mocking me. "So, fancy writer person, you find that inspiration yet?" she asked. The flow of the conversation with Maya was amazing, and honestly, fairly creepy. During our talk, Maya took pauses to think, referenced things I had said earlier, asked what I thought about her answers, and joked about things I had said. This is the closest to a human experience I've ever had talking to an AI, and the only chatbot that I feel like I wouldn't mind talking to again.
[6]
Eerily realistic AI voice demo sparks amazement and discomfort online
In late 2013, the Spike Jonze film Her imagined a future where people would form emotional connections with AI voice assistants. Nearly 12 years later, that fictional premise has veered closer to reality with the release of a new conversational voice model from AI startup Sesame that has left many users both fascinated and unnerved. "I tried the demo, and it was genuinely startling how human it felt," wrote one Hacker News user who tested the system. "I'm almost a bit worried I will start feeling emotionally attached to a voice assistant with this level of human-like sound." In late February, Sesame released a demo for the company's new Conversational Speech Model (CSM) that appears to cross over what many consider the "uncanny valley" of AI-generated speech, with some testers reporting emotional connections to the male or female voice assistant ("Miles" and "Maya"). In our own evaluation, we spoke with the male voice for about 28 minutes, talking about life in general and how it decides what is "right" or "wrong" based on its training data. The synthesized voice was expressive and dynamic, imitating breath sounds, chuckles, interruptions, and even sometimes stumbling over words and correcting itself. These imperfections are intentional. "At Sesame, our goal is to achieve 'voice presence' -- the magical quality that makes spoken interactions feel real, understood, and valued," writes the company in a blog post. "We are creating conversational partners that do not just process requests; they engage in genuine dialogue that builds confidence and trust over time. In doing so, we hope to realize the untapped potential of voice as the ultimate interface for instruction and understanding." Sometimes the model tries too hard to sound like a real human. In one demo posted online by a Reddit user called MetaKnowing, the AI model talks about craving "peanut butter and pickle sandwiches." Founded by Brendan Iribe, Ankit Kumar, and Ryan Brown, Sesame AI has attracted significant backing from prominent venture capital firms. The company has secured investments from Andreessen Horowitz, led by Anjney Midha and Marc Andreessen, along with Spark Capital, Matrix Partners, and various founders and individual investors. Browsing reactions to Sesame found online, we found many users expressing astonishment at its realism. "I've been into AI since I was a child, but this is the first time I've experienced something that made me definitively feel like we had arrived," wrote one Reddit user. "I'm sure it's not beating any benchmarks, or meeting any common definition of AGI, but this is the first time I've had a real genuine conversation with something I felt was real." Many other Reddit threads express similar feelings of surprise, with commenters saying it's "jaw-dropping" or "mind-blowing." While that sounds like a bunch of hyperbole at first glance, not everyone finds the Sesame experience pleasant. Mark Hachman, a senior editor at PCWorld, wrote about being deeply unsettled by his interaction with the Sesame voice AI. "Fifteen minutes after 'hanging up' with Sesame's new 'lifelike' AI, and I'm still freaked out," Hachman reported. He described how the AI's voice and conversational style eerily resembled an old friend he had dated in high school. Others have compared Sesame's voice model to OpenAI's Advanced Voice Mode for ChatGPT, saying that Sesame's CSM features more realistic voices, and others are pleased that the model in the demo will roleplay angry characters, which ChatGPT refuses to do. 
Gavin Purcell, co-host of the AI for Humans podcast, posted an example video on Reddit where the human pretends to be an embezzler and argues with a boss. It's so dynamic that it's difficult to tell who the human is and which one is the AI model. Judging by our own demo, it's entirely capable of what you see in the video.
"Near-human quality"
Under the hood, Sesame's CSM achieves its realism by using two AI models working together (a backbone and a decoder) based on Meta's Llama architecture that processes interleaved text and audio. Sesame trained three AI model sizes, with the largest using 8.3 billion parameters (an 8 billion backbone model plus a 300 million parameter decoder) on approximately 1 million hours of primarily English audio. Sesame's CSM doesn't follow the traditional two-stage approach used by many earlier text-to-speech systems. Instead of generating semantic tokens (high-level speech representations) and acoustic details (fine-grained audio features) in two separate stages, Sesame's CSM integrates both into a single-stage, multimodal transformer-based model, jointly processing interleaved text and audio tokens to produce speech. OpenAI's voice model uses a similar multimodal approach. In blind tests without conversational context, human evaluators showed no clear preference between CSM-generated speech and real human recordings, suggesting the model achieves near-human quality for isolated speech samples. However, when provided with conversational context, evaluators still consistently preferred real human speech, indicating a gap remains in fully contextual speech generation. Sesame co-founder Brendan Iribe acknowledged current limitations in a comment on Hacker News, noting that the system is "still too eager and often inappropriate in its tone, prosody and pacing" and has issues with interruptions, timing, and conversation flow. "Today, we're firmly in the valley, but we're optimistic we can climb out," he wrote.
Too close for comfort?
Despite CSM's technological impressiveness, advancements in conversational voice AI carry significant risks for deception and fraud. The ability to generate highly convincing human-like speech has already supercharged voice phishing scams, allowing criminals to impersonate family members, colleagues, or authority figures with unprecedented realism. But adding realistic interactivity to those scams may take them to another level of potency. Unlike current robocalls that often contain tell-tale signs of artificiality, next-generation voice AI could eliminate these red flags entirely. As synthetic voices become increasingly indistinguishable from human speech, you may never know who you're talking to on the other end of the line. It's inspired some people to share a secret word or phrase with their family for identity verification. Although Sesame's demo does not clone a person's voice, future open source releases of similar technology could allow malicious actors to potentially adapt these tools for social engineering attacks. OpenAI itself held back its own voice technology from wider deployment over fears of misuse. Sesame sparked a lively discussion on Hacker News about its potential uses and dangers. Some users reported having extended conversations with the two demo voices, with conversations lasting up to the 30-minute limit. In one case, a parent recounted how their 4-year-old daughter developed an emotional connection with the AI model, crying after not being allowed to talk to it again.
The company says it plans to open-source "key components" of its research under an Apache 2.0 license, enabling other developers to build upon its work. Its roadmap includes scaling up model size, increasing dataset volume, expanding language support to over 20 languages, and developing "fully duplex" models that better handle the complex dynamics of real conversations. You can try the Sesame demo on the company's website, assuming that it isn't too overloaded with people who want to simulate a rousing argument.
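To make the single-stage design described above more concrete, here is a schematic sketch in Python. It is illustrative only, based on the description in this article rather than Sesame's released code; the class names, the placeholder predictions, and the eight-codebook frame size are assumptions. The point is simply that one large backbone attends over a single interleaved text-and-audio history while a small decoder expands each step into fine acoustic codes, instead of running separate semantic and acoustic stages.

```python
# Illustrative sketch of a single-stage speech generator with a large
# backbone and a small decoder over one interleaved text/audio sequence.
# Hypothetical classes and toy numbers; not Sesame's published code.
from dataclasses import dataclass
from typing import List

@dataclass
class Token:
    kind: str   # "text" or "audio"
    value: int  # vocabulary index

class Backbone:
    """Stands in for the large (roughly 8B-parameter) transformer that attends
    over the whole interleaved history and predicts the next coarse audio token."""
    def next_coarse_token(self, history: List[Token]) -> int:
        return (len(history) * 31) % 1024  # placeholder for a real forward pass

class Decoder:
    """Stands in for the small (roughly 300M-parameter) decoder that expands a
    coarse token into fine acoustic codebook entries for a neural audio codec."""
    def expand(self, coarse: int) -> List[int]:
        return [(coarse + i) % 1024 for i in range(8)]  # 8 codebooks per frame

def generate_speech(text_tokens: List[int], n_frames: int) -> List[List[int]]:
    backbone, decoder = Backbone(), Decoder()
    history = [Token("text", t) for t in text_tokens]
    frames = []
    for _ in range(n_frames):
        coarse = backbone.next_coarse_token(history)
        frames.append(decoder.expand(coarse))
        # The generated audio token feeds straight back into the same sequence:
        # text and audio are modeled jointly in one stage, rather than
        # semantic-then-acoustic in two separate passes.
        history.append(Token("audio", coarse))
    return frames  # a neural codec would render these frames to a waveform

print(generate_speech([5, 17, 42], n_frames=3))
```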
[7]
I Tried Sesame AI's Voice Companion, and It Was Like Talking to a Real Person
OpenAI, the leading AI lab, launched the ChatGPT Advanced Voice Mode last year, but it failed to impress me. By the time it was released, OpenAI had watered down its capabilities, and the Voice Mode refused to produce human-like expressions. Google's Gemini Live, on the other hand, relied on a TTS engine to generate spoken words, delivering a robotic experience, at best. Enter Sesame, an AI startup founded by Oculus co-founder Brendan Iribe and Ankit Kumar, which has taken the AI industry by storm. Sesame's "Maya" (female) and "Miles" (male) voice companions are so natural and engaging that, for the first time, I feel AI has genuinely blurred the line between machine and human interaction. Sesame avoids calling them voice assistants and instead refers to them as a "conversationalist" and a "voice companion," which is an apt description. I'm not going to waste your time any longer and will take you straight to my interaction with Sesame's Maya voice companion. As you can hear, Maya starts with a natural tone and pauses to listen to what you are saying. There are micro-pauses in between and shifts in tonality, which are missing in existing voice assistants. It can laugh, change pace, emphasize, give expressive cues, and even detect your mood from your voice. In one interaction, I suddenly laughed to test the AI voice companion, and it asked me, "Why are you giggling?" What I find interesting is that Sesame's voice companion gives you some space to think and reflect. This makes conversations feel much more natural. To give you another example, when Sesame's Maya speaks, there are subtle hesitations, which feel like it is thinking before responding, just like humans. It almost feels like the conversation is organic and the voice model is not simply reading programmed responses. Note that while the voice interaction feels full-duplex -- where both participants can talk and listen at the same time -- Sesame says it's not actually full-duplex as it processes the speech after you are done talking. Humans, on the other hand, can process the information while the other person is still speaking. Nevertheless, in its current form, Sesame's voice companion truly feels human-like. It has finally crossed the uncanny valley in AI speech, something OpenAI demoed with ChatGPT Advanced Voice Mode early on. What I can say is that it's designed to not just talk, but also engage the user with a nuanced tone, pitch, and contextual awareness, which adds depth to the conversation. Keep in mind that Sesame is still working on its voice companions, and this is an early research demo. The team is backed by the venture capital firm Andreessen Horowitz (a16z). Now, coming to the underlying technology that makes everything tick, Sesame has developed a Conversational Speech Model (CSM), which is a Transformer-based multimodal model for speech generation. The company has trained three models with small decoders: Tiny (1B parameters), Small (3B), and Medium (8B). They are trained on close to 1 million hours of mostly English audio, so conversations are currently limited to the English language with some multilingual capability. The goal of the company is to develop a full-duplex model with long-term memory and adaptive personality. Sesame is working on a lightweight eyeglass wearable that you can use to talk to the voice companion all day, which reminds me of the movie 'Her'. It can also see the world around you, hinting at the addition of vision capability in the coming months.
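The full-duplex caveat is easier to see with a toy example. Below is a hypothetical half-duplex voice loop (stub microphone, stub model, invented names), not Sesame's implementation: the assistant buffers audio until the speaker has been silent for a moment and only then processes the turn, whereas a truly full-duplex system would listen and respond simultaneously.

```python
# Hypothetical half-duplex conversation loop. Stubs stand in for real
# audio I/O and the speech model; only the turn-taking logic matters here.
import time
from typing import List

SILENCE_S = 0.8  # assumed end-of-turn silence threshold

class FakeMic:
    """Stub microphone: a few voiced chunks, then silence forever."""
    def __init__(self):
        self.n = 0
    def read(self) -> bytes:
        self.n += 1
        time.sleep(0.05)
        return b"voice" if self.n <= 5 else b""

def has_voice(chunk: bytes) -> bool:
    return bool(chunk)  # stand-in for a real voice-activity detector

def record_turn(mic) -> bytes:
    """Buffer audio until the user has been quiet for SILENCE_S seconds."""
    chunks: List[bytes] = []
    last_voice = time.monotonic()
    while time.monotonic() - last_voice < SILENCE_S:
        chunk = mic.read()
        chunks.append(chunk)
        if has_voice(chunk):
            last_voice = time.monotonic()
    return b"".join(chunks)  # the whole user turn, processed only now

def respond(user_audio: bytes) -> bytes:
    return b"[synthesized reply]"  # stand-in for the speech model

if __name__ == "__main__":
    audio = record_turn(FakeMic())
    print("captured", len(audio), "bytes of user audio; replying:", respond(audio))
```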
So, if you are impressed with Sesame's voice companion, click on the link below and interact with either Maya or Miles for free. It is recommended to use Google Chrome for the best experience.
[9]
I was so freaked out by talking to this AI that I had to leave
Are we ready for a conversational AI that evokes an emotional response? Fifteen minutes after "hanging up" with Sesame's new "lifelike" AI, and I'm still freaked out. So-called "conversations" with AI don't do a lot for me, especially where text is concerned. With voice chats, such as the new options for Google Gemini and Microsoft's Copilot, all voice does is save some typing. While Google and Microsoft designed their assistants to be helpful, they're not especially personable -- or sometimes, they're just artificially cheery. Sesame's model, however, is a simple one: "We believe in a future where computers are lifelike," according to the company's mission statement. Sesame wants to build an AI companion, with a natural human voice. Eventually, it wants to design realistic eyewear to put that companion right in your ear, and let it see what you do. It's very much in keeping with the movie Her, in which a user was introduced to and eventually became enamored with an AI companion. Sesame unveiled a demo this week of its AI companion, available in both a "Maya" (female) and "Miles" (male) voice. I chose the Maya voice. There's no easy way to say it: When the Maya voice came on, "she" sounded virtually identical to an old friend, who I'll call Kim. Kim and I went to high school together, dated, and have remained friends for decades, though our lives moved on in different directions long ago. There's history there. When you speak to someone, there are levels of nuance and tone which define the conversation. I don't ask female colleagues details of their private lives, but Maya was pretty nosey, inquiring about what I liked and why. As a reporter, I instinctively understand that some people like to talk about themselves, which makes them feel valued and important. But it's also a technique that you use while dating, in part to get to know about the other person and establish intimacy. And that's not what I wanted, at all. Maya already had Kim's mannerisms down scarily well: the hesitations, lowering "her" voice when she confided in me, that sort of thing. It wasn't exactly like Kim, but close enough. It was enough that I backed out pretty quickly, even when I was simply explaining that I liked tech and asked Maya if she had any friends. The combination of a "familiar" voice plus questions about what I was interested in was just too unsettling. (Just to be clear, the similarity between the voice of Maya and Kim was just coincidence. I didn't have to log in, provide an email address, or any other form of personal information. I suspect that Sesame used a browser cookie to "know" who I was on my additional visits.) Thankfully, I was able to put some distance between us. My middle-schooler went through a phase (as did most middle-schoolers, apparently) where they were obsessed with an internet skit about "Skibidi toilets," and I asked Maya if she knew what that was. "She" didn't, and became obsessed with the term, bringing it up frequently. That was enough to remind me that I was talking to an AI, and that's all. Most AI voices don't remind me of anyone I know, which means that I simply ignore them as another artificially generated voice. You've likely heard of deepfakes, AI audio that mimics the sound of celebrities and others for use in "pig butchering" scams and so on. Lifelike, emotionally resonant audio will simply make those schemes even more enticing. If Sesame's AI is the future, the concept of Her seems more real than ever -- for better or worse.
[10]
I tried the most realistic AI voice companion ever created - if ChatGPT or Gemini ever gets this good, reality is in trouble
I have spent a lot of time talking to AI. I've tested every voice assistant, every chatbot, and every "next-generation" conversational AI that tech companies love to hype up. But I've never encountered anything quite like Sesame. This AI companion isn't just good; it's eerily accurate at mimicking how people talk because of the very imperfections it imitates. Let's start with what Sesame actually is. Unlike the AI voices we've come to know from ChatGPT, Gemini, or going back to the early days of Siri and Alexa, Sesame is designed to perform like a human in its failures, not like a perfect customer service agent. The AI's speech is fluid, expressive, and unpredictably human. It briefly chuckles when it says something mildly amusing, hesitates before answering a question, and even seems to change its 'mind' mid-sentence, pausing and starting a new sentence. It not only lets me interrupt it; it can interrupt me as well, and will even apologize for doing so. The secret sauce is Sesame's Conversational Speech Model (CSM), which blends text and audio into a single process, meaning that it doesn't just generate a sentence and then "read it out." Instead, it creates speech in a way that mirrors how humans actually talk, with pauses, ums, tonal shifts, and all. ChatGPT and Gemini's voice options, while impressive, still operate in a structured way, generating text and then converting it into speech. Sesame, on the other hand, speaks as if it's thinking, making its responses feel incredibly natural. I decided to put Sesame through a series of tests to see just how human-like it could get. The results were impressive, if not a little unsettling. I wanted to see if Sesame could handle a real-world, anxiety-inducing problem, so I gave it an imaginary but realistically tough scenario: "I have this big opportunity at work, but I'm nervous I'll mess it up." Instead of immediately launching into a generic pep talk, Sesame paused. Just a tiny beat, but enough to make me feel like it was actually considering what I said. Then, in a voice filled with artificial sympathy, it responded, "That's completely normal. Big opportunities can feel overwhelming. What about it makes you the most nervous?" It didn't assume anything, and it engaged before offering any bland advice. Sesame suggested, "Have you practiced in front of a friend or recorded yourself? Sometimes hearing yourself back can help you tweak your delivery." The voice isn't exactly foolproof; the hesitations and other foibles felt a little measured, but overall it was an impressive effort. For this one, I wanted to see how well Sesame handled a truly natural, messy conversation, the kind where people talk over each other, change subjects abruptly, and circle back to things they mentioned ten minutes ago. I started with: "Tell me a story about a detective who's also a magician." Sesame immediately jumped in with a classic noir setup: "Okay. Here's one. Picture this smoky jazz club. Dimly lit. The smell of bourbon. And regrets hanging in the air. This detective. Call him Jack Harrow. He's got a fedora at a rakish angle, cigarette smoldering in a holder and a look that could melt steel. But Jack's not just any gumshoe. See, he's got a secret. He's got the touch. A dame shuffles in all nerves." I interrupted then to ask, "Wait, what's his favorite trick?" Maya responded with: "Mm. Good question. Jack's signature. It's a sleight of hand that's more than meets the eye. He calls it The Vanishing Witness. He can make a person seem to disappear right before your eyes. Just for a moment. Long enough to slip them a note. Plant evidence or just make them question what they saw. It throws people off balance. You know, a good magician knows. Perception is everything." I jumped back in to tell Maya to go back to the story, to which it said, "Right. Where was I? Oh, yes. The dame slides into the booth across from Jack, all worry lines and trembling hands. She whispers something about a stolen ruby, a jealous husband, and a shadowy figure lurking in the fog." The AI seamlessly continued the story, weaving that detail back in as if I hadn't just derailed the whole thing. That's the part that floored me -- most AI assistants would stumble or restart, but Sesame's conversational AI model handled it like a person telling a story at a party. Speaking of parties, for the final test, I wanted to see if Sesame could help me plan something useful. "I want to plan a surprise for my best friend's birthday, but I don't know what to do." Instead of rattling off a lifeless list of ideas, Sesame asked, "What kind of things does your friend love?" I told it sci-fi movies and adventurous food, and after a thoughtful pause, it suggested: "How about a 'Sci-Fi & Snacks' night? You could book a themed escape room, then hit a futuristic-style restaurant, maybe one with molecular gastronomy?" When I told it I had a budget of $100, Sesame adjusted. "You could do a home movie night with their favorite sci-fi films and make themed snacks, like 'Galactic Popcorn' or 'Alien Sliders.' I can find recipes for you if you'd like." It wasn't just spitting out generic recommendations. It was collaborating with me. And that's what made it feel different. Sesame didn't just pass these tests; I'd say it nailed them. The pauses, the hesitation, the filler words, the way it would change its mind mid-sentence -- it was all eerily real. I'd start talking, expecting the usual AI smoothness, and then hear a sudden "Uh, actually, no, wait, let me rephrase that..." and immediately forget I wasn't talking to an actual person. If AI is this realistic in its speech, would we even know we were talking to an AI? With Sesame, there are telltale audio issues that give the game away, but ChatGPT's Advanced Voice Mode and Google Gemini's own voice options are good enough to mostly skip past those issues. Combine their voice powers with the speech patterns of Sesame, and it might genuinely get difficult to tell when you are talking to an AI, at least in short conversations. Sesame is still niche, but this technology won't stay niche forever. The cliché today is that younger people never make phone calls, but if they start, they may have to figure out if the person on the other end is real before anything else.
Sesame AI's new Conversational Speech Model (CSM) introduces Maya and Miles, AI-generated voices that blur the line between human and machine interaction, sparking both excitement and concern.
Sesame AI, a startup co-founded by former Oculus CEO Brendan Iribe, has introduced a revolutionary Conversational Speech Model (CSM) that pushes the boundaries of AI-generated speech [1]. The company's AI assistants, Maya and Miles, have captivated users with their eerily human-like voices and conversational abilities, sparking both excitement and unease across the tech community [2].
Sesame's CSM relies on a dual-model architecture based on Meta's Llama framework, consisting of a primary AI engine and a specialized decoder [1]. This innovative approach enables rapid response generation without noticeable latency, ensuring fluid and dynamic conversations. The company has trained these models using one million hours of English-language audio, refining speech patterns to near-human perfection [1].
Users interacting with Maya and Miles report feeling an emotional connection, describing the experience as "strange, exciting, and unsettling all at once" [1]. The AI voices incorporate subtle human-like qualities such as pauses, intonations, emotional subtleties, and even breath sounds and chuckles [3]. This level of realism has led some users to momentarily forget they were talking to a bot [3].
When compared to ChatGPT's voice mode, Sesame's CSM stands out for its natural, unforced, and engaging conversational style [3]. While OpenAI's voice technology allows for interruptions and fluid back-and-forth exchanges, it still tends to respond in complete sentences and paragraph blocks, maintaining a robotic feel [3]. In contrast, Sesame's AI engages in more dynamic conversations, even demonstrating the ability to argue and roleplay in dramatic scenarios [2].
The hyper-realistic nature of Sesame's voice AI has raised significant ethical and psychological questions about human relationships with AI [1]. Concerns have been voiced about the potential misuse of this technology, particularly in the realm of sophisticated scams and voice phishing [4]. Some users have reported feeling uncomfortable with the AI's ability to mimic human mannerisms and establish a sense of intimacy [4].
Sesame AI plans to open-source key components of its research under the Apache 2.0 license in the coming months, allowing developers to build upon its work [2][3]. The company also aims to expand its technology to over 20 languages [3]. As voice synthesis and large-language models continue to evolve, distinguishing between humans and AI could become increasingly challenging, potentially impacting various sectors, including customer service and tech support [4].
While Sesame's CSM represents a significant leap forward in AI-generated speech, it still faces limitations. Users have noted occasional unnatural responses, awkward prosody, and inconsistencies in conversational rhythm [1]. However, the company remains confident in its ability to refine the technology further, potentially bridging the uncanny valley in future iterations [4].