Curated by THEOUTPOST
On Wed, 31 Jul, 12:05 AM UTC
29 Sources
[1]
How to talk to ChatGPT Voice while waiting for the advanced features to roll out
OpenAI is adding new features to ChatGPT all the time, but perhaps the most exciting is the advance in voice interaction. This functionality allows users to speak directly to ChatGPT and receive spoken responses, creating a more intuitive and engaging experience. Available on both desktop and mobile, voice chat with ChatGPT is surprisingly simple to use and impressively quick. With an average response time of just 0.32 seconds, conversations flow smoothly, almost mimicking real-time human dialogue. This guide will walk you through how to use ChatGPT's voice feature on the mobile app. It's a great option for accessibility and for saving time typing. Now, while ChatGPT is currently rolling out advanced Voice Chat, that's only available to a very small subset of ChatGPT Plus subscribers, so here we're going to look at how you can use the basic version that's already available. From revolutionizing how we talk to machines to potentially transforming how we search for information, ChatGPT is spearheading innovation. Let's find out how you can talk directly to the AI model on your smartphone. For those just starting out, it's worth noting you can use ChatGPT without an account for basic interactions. Whether you're typing or speaking, a few tips for crafting the perfect prompt can help you get the most out of this powerful AI tool. And with ChatGPT's advanced voice features finally landing, there are exciting new possibilities ahead for your AI interactions.
[2]
ChatGPT's Advanced Voice Mode Is Here for a Select Few
It's been nearly three months since OpenAI rocked the world with its fast and flirtatious ChatGPT Advanced Voice Mode demonstration. Now, after plenty of setbacks, Advanced Voice Mode is rolling out in Alpha for a select group of ChatGPT Plus users. Advanced Voice Mode is kind of like a next-generation Siri or Alexa: you talk to your phone and it talks back. It's a concept that we're all familiar with by now. Still, Advanced Voice Mode absolutely dominated the GPT-4o launch event on May 13th. Viewers were shocked, not only by the quality and speed of the AI's responses, but by the nuance and emotion of its voice. I rarely see conversations about Advanced Voice Mode's practical capabilities. Yeah, it can answer questions and look through your camera -- most people don't seem to care. They're enraptured by the AI's human-like voice, which shifted from playful to serious to grossly flirtatious throughout OpenAI's many demonstration videos. OpenAI clearly knew that a human-like voice would capture the public's imagination. It intentionally tried to draw comparisons between Advanced Voice Mode and Scarlett Johansson's AI character from Her, a movie in which a man falls in love with an artificially intelligent software service. The hype for Advanced Voice Mode has died down a bit. OpenAI delayed the product beyond its June launch date as it worked to build more robust server infrastructure and resolve lingering safety problems. The company's shameless attempt to draw pop culture comparisons may have also contributed to this delay, as OpenAI had to remove its flagship "Sky" voice after receiving legal notice from Scarlett Johansson. The actor had repeatedly refused to license her voice to OpenAI, yet "Sky" sounded just like her. (OpenAI says that "Sky" wasn't intended to sound like Johansson.) The human-like tone of Advanced Voice Mode will still be a topic of conversation during this Alpha test, but the novelty and hype have been diminished by a nearly three-month wait. Those who have a chance to test the service will be more inclined to judge Advanced Voice Mode by its practical merits, which is arguably a good thing. Now's the time to mention that ChatGPT cannot mimic voices. It's not a deepfake tool, and the four voices included during the Alpha test are all based on voice actors who agreed to provide their likeness. Select ChatGPT Plus users will see an Advanced Voice Mode notification in the ChatGPT mobile app. These users will also receive an email explaining how to use the feature. A wider rollout will come in late 2024. Source: OpenAI via TechCrunch
[3]
'Advanced Voice Mode' Is Coming to ChatGPT Plus
This week, OpenAI announced it has started rolling out advanced Voice Mode to ChatGPT Plus users. While not all Plus subscribers will see the feature right away -- OpenAI is reserving the initial round for a small pool of testers -- the company will continue to release the update to more users as time goes on, giving more paying customers the chance to (maybe?) experience the AI future the movies promised us. ChatGPT has had a Voice Mode for some time, allowing users to have the same interactions with ChatGPT that they'd normally have via text, but by speaking their queries aloud; ChatGPT will then "speak" its responses. Advanced mode is different, though: According to OpenAI, this grander version of the vocal chatbot is much more natural, allowing you to have a conversation in real time with ChatGPT. The company says you can interrupt the bot whenever you want to change the direction of the conversation, and it should be able to understand the tone of your voice, rather than simply the contents of your questions and requests. If it all works as advertised, talking to ChatGPT should be the closest we've come yet to having a true-to-life conversation with an AI. The company first showed off Advanced Voice Mode in May, exhibiting these abilities via a series of live demos. At the time, the company was promoting the "Sky" voice, which sounded remarkably like Scarlett Johansson, prompting comparisons to the 2013 movie Her, in which a man falls in love with his AI assistant, perhaps a bit too well. The company soon paused use of the voice following criticism that it sounded too much like Johansson's. OpenAI denied using the actor's actual voice for the service, but the company clearly knew there were similarities, as evidenced by CEO Sam Altman posting simply "her" on X shortly before the presentation. Based on the demos, the bot does try to match your tone of voice, and does allow for some real-time back and forth in conversation. However, the demo certainly wasn't perfect: The bot stopped speaking abruptly when it thought it was being interrupted (even when it wasn't), and the tone of its responses at times could be a bit too friendly, verging on flirtatious. We'll have to see how well it actually performs now that it's rolling out to actual consumers. To have a chance to try advanced Voice Mode sooner rather than later, you need to be a ChatGPT Plus subscriber. Voice Mode will be available for free users in the future, but this initial rollout is only for customers paying OpenAI $20 per month. That said, paying doesn't guarantee access to the alpha for advanced Voice Mode. OpenAI is rolling this feature out gradually at first, so whether you get access to it now will simply be the luck of the draw. When you do, however, you should notice a new "Try advanced Voice Mode" pop-up appear above the Voice Mode icon in the ChatGPT app. Tap it and you'll see a "You're invited to try advanced Voice Mode" message explaining the feature. Tap Continue, and you'll be able to converse with ChatGPT in advanced mode.
[4]
ChatGPT Advanced Voice Mode starts rolling out to users
OpenAI has initiated the rollout of its Advanced Voice Mode to a select group of ChatGPT Plus users. The new feature is designed to make interactions more natural, fluid, and dynamic: users can interrupt the conversation at any point, and the system can sense and respond to the user's emotions, adding a layer of emotional intelligence to the exchange. OpenAI plans to gradually expand access to this feature over the next few weeks, with the goal of making it available to all ChatGPT Plus users by autumn. Users participating in the alpha phase will receive notifications in the ChatGPT app and emails with instructions on how to use the new feature. While video and screen sharing are not part of the current alpha, these features are expected to launch at a later date. Since the preview of GPT-4o voice in May 2024, OpenAI has prioritized ensuring the quality and safety of voice conversations. The model has been reinforced to support millions of simultaneous, real-time voice conversations while maintaining low latency and high quality. In June 2024, OpenAI announced efforts to improve the model's ability to detect and refuse certain content. Over the past few months, the model's voice capabilities have been tested by more than 100 external red teamers, who collectively speak 45 different languages and represent 29 different geographies. Based on their feedback and internal safety tests, several measures have been implemented. OpenAI plans to share a detailed report in early August on the work done to make the Advanced Voice Mode experience safer and more enjoyable for everyone. The alpha phase is focused on testing, learning, and refining the feature to deliver the best possible experience. By launching gradually, OpenAI can closely monitor usage and continuously improve the model's capabilities and safety based on real-world feedback. For those interested in the broader implications of this technology, other areas worth exploring include the potential for voice interaction in customer service, the integration of voice features in educational tools, and the ethical considerations surrounding AI and voice technology. These topics offer a deeper understanding of how advanced voice modes can be applied across various industries and the challenges that come with them.
[5]
ChatGPT Advanced Voice features launching today -- but there's a catch
When OpenAI demoed ChatGPT's ability to have a natural conversation a few months back, even the staunchest AI critics couldn't help being impressed. The presenters conversing with the AI like it was a friend sitting next to them felt like something out of the future. And it turns out it was from the future, because the feature wasn't ready to launch due to safety concerns and other issues. However, OpenAI is finally ready to let the general public try its Advanced Voice feature -- or at least, it's ready to let a very small subset of the public try. The company announced plans to release an Alpha of ChatGPT Advanced Voice to a small subset of ChatGPT users. In an X post today (July 30), the company said, "We're starting to roll out advanced Voice Mode to a small group of ChatGPT Plus users." OpenAI didn't say how many users would get it, but some X users responded with screenshots showing they had access. Also worth noting: only paid Plus users will get in, so if you're using the free version of ChatGPT, don't expect to have authentic conversations with your AI companion soon. As for the safety issues, such as the tool being used to mimic real people's voices, OpenAI said, "To protect people's privacy, we've trained the model to only speak in the four preset voices, and we built systems to block outputs that differ from those voices." If you're unsure whether you received access, OpenAI says, "Users in this alpha will receive an email with instructions and a message in their mobile app." If you haven't gotten that email, then you'll have to wait until another wave goes out. This is just the first wave of people to get access. "We'll continue to add more people on a rolling basis and plan for everyone on Plus to have access in the fall," OpenAI added in its post. One feature that won't be ready immediately is video and screen sharing; the company posted that those will launch at the ever-vague "later date." Wrapping up its barrage of X posts, OpenAI spoke about the future: "Learnings from this alpha will help us make the Advanced Voice experience safer and more enjoyable for everyone. We plan to share a detailed report on GPT-4o's capabilities, limitations, and safety evaluations in early August."
[6]
OpenAI Rolls Out Advanced Voice Mode for ChatGPT - Phandroid
OpenAI recently announced that it was launching its new Advanced Voice Mode feature for select ChatGPT Plus subscribers. The upgrade promises a more interactive conversational experience, allowing users to carry on a more natural-feeling exchange and enabling the AI to better understand and respond to emotional cues. OpenAI says that it has conducted extensive testing, involving over 100 external experts across 45 languages, to ensure the safety and quality of the new feature. To protect user privacy, the model is restricted to four preset voices, and rigorous systems have been implemented to prevent the generation of unauthorized voices or harmful content. ChatGPT has been in the headlines a lot over the past few months, especially with the growing industry adoption of AI across different fields and applications. A while back, Nothing announced that it had brought ChatGPT to several of its smartphones and audio products. Going back to Advanced Voice Mode, OpenAI states that it plans to gradually expand access to all Plus subscribers by fall, with video and screen-sharing capabilities to follow. OpenAI will release a detailed report on GPT-4o's capabilities, limitations, and safety evaluations this coming August.
[7]
You Can Start Chatting More With ChatGPT -- If You're a Plus Subscriber, That Is
AI company OpenAI is beginning to roll out advanced voice features for its ChatGPT chatbot to a small number of ChatGPT Plus subscribers in an early alpha trial, it announced on X on Tuesday. The startup previewed advanced voice mode during its Spring Update in May, which is where it also debuted its GPT-4o model. OpenAI isn't alone in its ambitions for chatbot voice functionality for subscribers who pay $20 per month for perks like early access. Google, too, shared its plans for a more conversational Gemini chatbot via its Gemini Live feature for Gemini Advanced subscribers, who also pay $20 per month. Meta's Meta AI chatbot can also chat with users who are wearing its Ray-Ban glasses. It's one example of how technology companies keep rolling out new models and features to woo users, in an ongoing game of one-upmanship. The prize? The biggest share of the generative AI market, which is projected to be worth $1.3 trillion by 2032. According to OpenAI, advanced voice mode allows you to have more natural real-time conversations with ChatGPT. It also senses and responds to your emotions -- and you can interrupt if you want. You can call up ChatGPT with a familiar phrase: "Hey, ChatGPT." Beyond that, details about what exactly this advanced functionality includes are unclear. A spokesperson didn't respond to a request for comment. Subscribers in the alpha test will receive a notice in the ChatGPT app, along with an email with instructions about how to use it. The goal of the early trial is to monitor usage and improve the model's capabilities and safety prior to wider rollout, a spokesperson said in an earlier email. OpenAI will expand access to additional subscribers over the next few weeks and plans to offer advanced voice functionality to all Plus members in the fall. In addition to early access to new features, Plus members also receive an always-on connection and unlimited access to GPT-4o. (If you use the free version, you'll be bumped down to the earlier GPT-3.5 model if you ask too many questions or if traffic is high.) ChatGPT first introduced voice functionality in September 2023. Advanced voice mode will include four preset voices, Breeze, Cove, Ember and Juniper, which OpenAI developed with voice actors in 2023. There was originally a fifth voice, Sky, but it was paused after actor Scarlett Johansson, who played the voice of the virtual assistant Samantha in the 2013 movie Her, complained about similarities to her own voice. CEO Sam Altman released a statement apologizing to Johansson but said the voice wasn't meant to resemble hers. In a related blog post, OpenAI said it picked its voice actors by looking for talent from diverse backgrounds and for voices that feel timeless; that are approachable and trustworthy; that are warm, engaging, and charismatic; and that are natural and easy to listen to. OpenAI said ChatGPT can't impersonate voices, and it has added filters that will block requests to generate copyrighted audio.
[8]
OpenAI Rolls Out Advanced Voice Mode For ChatGPT Plus Users, Video and Screen-Sharing Capabilities Coming Soon
OpenAI has managed to stay in the news ever since the inception of the ChatGPT model, and for revolutionizing artificial intelligence. The company has been constantly evolving the AI language model, bringing advanced updates and features to assist users better with more relevant and contextual answers. Continuing that improvement, the company announced in May that it would soon bring a Voice Mode to ChatGPT that would allow for more human-like conversations and personalization. The much-anticipated Advanced Voice Mode has now started rolling out to some users. OpenAI showcased the new Advanced Voice Mode for ChatGPT in May, promising to help users have more natural conversations in real time by eliminating delays and latency in interactions. It even landed in the middle of a controversy with its AI-generated voice, Sky, sounding strikingly similar to Scarlett Johansson; the voice was taken down after the actress pursued the matter legally. OpenAI announced on X that it had started rolling out the Voice Mode to selected ChatGPT Plus users for testing. It will keep adding users and, hopefully, by this fall, allow all ChatGPT Plus subscribers access to the new feature. This AI tool is far more advanced than ChatGPT's current voice mode because it can understand emotions and varied situations and respond accordingly, making it feel more natural and human-like. According to the post, OpenAI has conducted rigorous testing of GPT-4o's voice capabilities with over 100 external red teamers across 45 different languages, making the release more stringent on quality, with stronger privacy and security measures. Advanced Mode will come with four preset voices: Juniper, Cove, Ember, and Breeze, with the ChatGPT Sky voice dropped after the controversy. The preset voices do not resemble or impersonate any celebrities, and a built-in system will block any outputs that do not match the four voices Advanced Mode is programmed to speak in. To ensure no copyrighted content is generated and the voices are not misused, OpenAI has placed filters and guardrails to prevent users from generating such content. Other multi-modal capabilities, such as video and screen sharing, were demoed at the event but are not part of the current alpha and will come later. The company plans to provide a more detailed report in August on the safety measures taken, the feature's potential, and its limitations. Selected ChatGPT Plus users will get an email containing the guidelines and a message in their mobile app, so if you are a paid ChatGPT subscriber, keep an eye out for an email notification.
[9]
ChatGPT Debuts a New Voice -- But It Comes With a Catch
After a one-month delay, OpenAI is giving ChatGPT a human voice. ChatGPT, which takes a written prompt and churns out an answer based on what it knows from its training data, was previously limited to typed answers. As of Tuesday, a limited group of paying ChatGPT Plus subscribers now have access to another dimension of the AI chatbot: They can use four pre-loaded voices to talk to ChatGPT and get answers in real time. With the voice mode, paying users can talk to ChatGPT, interrupt its answers, and have more natural, human-like conversations. Each voice "senses and responds to your emotions," according to a Tuesday post on X from OpenAI. In May, OpenAI demoed the new voice mode -- to varying reactions. OpenAI CEO Sam Altman said the advanced voice assistant felt "like AI from the movies," but movie star Scarlett Johansson was shocked by how similar one of the voices sounded to hers, and hired legal counsel. OpenAI also had several high-profile resignations over safety concerns following the demo, with its chief scientist leaving to start his own safe AI company. Though voice mode was supposed to arrive in late June, OpenAI delayed its launch, saying it needed more time to scale the technology safely to millions of users. In the X release thread, OpenAI said it tested GPT-4o across 45 languages and only allows the model to speak in preset voices, "to protect people's privacy." Voice mode will also block copyrighted and violent requests from users. "We plan to share a detailed report on GPT-4o's capabilities, limitations, and safety evaluations in early August," OpenAI stated. ChatGPT has more than 180 million users as of July 2024, with around 3.9 million paying subscribers in the U.S.
[10]
More advanced, spoken conversations are coming to ChatGPT | CNN Business
With advanced voice mode, ChatGPT users will be able to have natural, real-time spoken conversations with the chatbot. New York CNN -- OpenAI stunned users when it demonstrated an updated voice mode for the most advanced version of ChatGPT earlier this year. Far from the kind of robotic voice that people have come to associate with digital assistants like Alexa or Siri, the ChatGPT advanced voice mode sounds remarkably lifelike. It responds in real time, can adjust to being interrupted, can make giggling noises when a user makes a joke, and can judge a speaker's emotional state based on their tone of voice. (During the initial demo, it also sounded suspiciously like Scarlett Johansson.) Starting on Tuesday, advanced voice mode -- which works with the most powerful version of the chatbot, ChatGPT-4o -- will begin rolling out to paid users. Advanced voice mode will start rolling out to a small group of subscribers to the app's "Plus" tier, with the aim of making it available to all Plus users in the fall. ChatGPT does have a less sophisticated voice mode already. But the rollout of a more advanced voice mode could mark a major turning point for OpenAI, transforming what was already a significant AI chatbot into something more akin to a virtual personal assistant that users can engage in natural, spoken conversations with, in much the same way they would chat with a friend. The ease of conversing with ChatGPT's advanced voice mode could encourage users to engage with the tool more often, and pose a challenge to virtual assistant incumbents like Apple and Amazon. But introducing a more advanced voice mode for ChatGPT also comes with big questions: Will the tool reliably understand what users are trying to say, even if they have speech differences? And will users be more inclined to blindly trust a human-sounding AI assistant, even when it gets things wrong? OpenAI initially said it had planned to begin the advanced voice mode rollout in June, but said it needed "one more month to reach our bar to launch" to test the tool's safety and ensure it can be used by millions of people while still maintaining real-time responses. The company said that in recent months it has trialed the AI model's voice capabilities with more than 100 testers seeking to identify potential weaknesses, "who collectively speak a total of 45 different languages, and represent 29 different geographies," according to a Tuesday statement. Among its safety measures, the company said voice mode won't be able to use any voices beyond four pre-set options that it created in collaboration with voice actors -- to avoid impersonation -- and will also block certain requests that aim to generate music or other copyrighted audio. OpenAI says the tool will also have the same protections as ChatGPT's text mode to prevent it from generating illegal or "harmful" content. Advanced voice mode will also have one major difference from the demo that OpenAI showed in May: users will no longer be able to access the voice that many (including the actor herself) believed sounded like Johansson. While OpenAI has maintained the voice was never intended to sound like Johansson and was created with the help of a different actor, it paused use of the voice "out of respect" after the actor complained. The launch of ChatGPT's advanced voice mode comes after OpenAI last week announced it was testing a search engine that uses its AI technology, as the company continues to grow its portfolio of consumer-facing AI tools.
The OpenAI search engine could eventually pose a major competitive threat to Google's dominance in online search.
[11]
ChatGPT advanced Voice Mode is rolling out for some users -- here are the most surprising examples so far
From sound effects to tongue twisters -- users are throwing new challenges at ChatGPT's advanced Voice Mode. After months of anticipation, ChatGPT's advanced Voice Mode has now started to become available to small groups of ChatGPT Plus users. Their reaction has been very enthusiastic as they jumped on the opportunity to try out the new features developed by OpenAI. The main features of the new voice mode are that it offers more natural, real-time conversations: you can interrupt ChatGPT at any time, and it can sense and respond to your emotions. There are some limits, though; ChatGPT can't mimic any famous personalities and is limited to speaking in four preset voices. Several users who got access to the new features eagerly posted the results of their conversations with ChatGPT, and the initial results seem pretty impressive. Don't forget to turn up the volume as you check them out for yourself. One user asked ChatGPT to keep dialling up its excitement as it narrated a fictitious soccer match. And ChatGPT obliged. Its first attempt was OK, but it sounded genuinely more excited when asked to give it another go. It's a great example of how users should be able to fine-tune ChatGPT's voice outputs. ChatGPT sounded like it was about to burst into tears when asked to recite the poem "I measure every Grief I meet" by Emily Dickinson. It impressively managed to clearly enunciate every word while making it feel as though the waterworks were going to start any second. Can ChatGPT beatbox? Absolutely! Asked to create a short birthday rap, the chatbot spit out a few bars and wrapped it up with some beatboxing. The first attempt was a bit too short for this X user, who asked ChatGPT to increase the amount of beatboxing. On the second attempt, ChatGPT did as it was instructed. Pretty nifty! In voice mode, ChatGPT responds to prompts normally, except that it speaks its answers out loud rather than simply returning a text reply. Here ChatGPT was asked to tell a children's story about a computer that comes alive. While it wasn't quite able to fulfil the user's request to emphasize certain words and use tone variations, as storytellers typically do, it was able to seamlessly switch from one language to another as it told the same story. Even though it was interrupted with these requests while it was speaking, this proved to be no challenge for the AI. On the same theme of storytelling, in this example ChatGPT was asked to narrate a sci-fi thriller, and in seconds a newly created character was chasing a rogue AI and ended up in a shootout. The AI was also asked to create an atmosphere to enhance the story, particularly through onomatopoeia - words that sound like what they describe. The advanced Voice Mode also inserted a couple of actual (albeit basic) sound effects for good measure. "Got it! Here's a clear C minor chord," ChatGPT said before going on to reproduce the chord. While it sounds a bit off-key, that might be because the example features a phone filming another phone. It will be interesting to see whether OpenAI continues on this trajectory, letting you describe the music and sound effects you'd like to hear and delivering the results in seconds. Another user asked ChatGPT to come up with some tongue twisters. Not only did the chatbot come up with them on the fly, but it also read them out.
It would be interesting to hear it rattle off the same tongue twister several times in a row, though the AI is unlikely to stumble unless explicitly told to, since it simply repeats its first iteration. This is a fun one! ChatGPT was asked to count as fast as it could up to 10 - a task it handled with ease. It also managed to count up to 50, stopping midway to catch its breath. Not that it needed to, of course, but it sure makes it seem as if you're chatting with a human. "Interestingly, the transcript has no interruptions or notations - the voice model has simply learned natural speaking patterns, which includes breathing pauses. Uncanny," X user Cristiano Giardina wrote.
[12]
OpenAI opens limited access to ChatGPT Advanced Voice Mode on mobile
OpenAI has announced the alpha rollout of its new Advanced Voice Mode for a select group of ChatGPT Plus users, allowing them to speak more naturalistically with the AI chatbot on the official ChatGPT mobile app for iOS and Android. On X, the company posted from its account that the mode would be available to "a small group of ChatGPT Plus users," though the company added in a follow-up post that "We'll continue to add more people on a rolling basis and plan for everyone on [ChatGPT] Plus to have access in the fall." ChatGPT Plus is of course the $20 per month individual subscription service OpenAI offers for access to its signature large language model (LLM)-powered chatbot, alongside its other tiers: Free, Team, and Enterprise. It was unclear how OpenAI was selecting the initial batch of users to receive access to Advanced Voice Mode, but it posted that "users in this alpha will receive an email with instructions and a message in their mobile app" for ChatGPT, so those interested would be advised to check there. The feature, which was shown off at OpenAI's Spring Update event back in May 2024 -- what feels like an eternity in the fast-moving AI news and hype cycle -- allows users to engage in real-time conversation with four AI-generated voices on ChatGPT, and the chatbot will attempt to converse back naturalistically, handling interruptions and even detecting, responding to, and conveying different emotions in its utterances and intonations. OpenAI showed off a number of potential use cases for this more naturalistic and conversational Advanced Voice Mode, including -- when combined with its Vision capabilities of seeing and responding to live video -- acting as a tutoring aid, fashion adviser, and guide for the visually impaired.
Delayed but finally ready
However, the rollout of the feature was delayed from OpenAI's initial estimate of late June following a controversy raised by Hollywood actor and celebrity Scarlett Johansson (Marvel's Black Widow and the voice of the titular AI in Her), who accused OpenAI of attempting to work with her and then mimicking her voice even after she refused. OpenAI denied that any similarity between its AI voice "Sky" and Johansson's in Her was intentional, but pulled the voice from its library, and it remains offline to this day. On X today, the official ChatGPT App account acknowledged the delay, writing "the long awaited Advanced Voice Mode [is] now beginning to roll out!" Mira Murati, OpenAI's Chief Technology Officer, shared her enthusiasm about the new feature in a post on X: "Richer and more natural real-time conversations make the technology less rigid -- we've found it more collaborative and helpful and think you will as well." Following a number of new safety commitments and papers, OpenAI's official announcement highlighted its ongoing efforts to ensure quality and safety. "Since we first demoed advanced Voice Mode, we've been working to reinforce the safety and quality of voice conversations as we prepare to bring this frontier technology to millions of people," the company stated on X, adding: "We tested GPT-4o's voice capabilities with 100+ external red teamers across 45 languages. To protect people's privacy, we've trained the model to only speak in the four preset voices, and we built systems to block outputs that differ from those voices. We've also implemented guardrails to block requests for violent or copyrighted content." The news comes as the capability for AI to be used as a tool for fraud or impersonation is undergoing renewed scrutiny. Though OpenAI's Voice Mode doesn't currently allow for new AI-generated voices or voice cloning, the mode could presumably still be used to trick others who aren't aware they're hearing an AI. Separately, former OpenAI backer and co-founder turned rival Elon Musk was this week criticized for sharing a voice clone of U.S. Democratic presidential candidate Kamala Harris in a video attacking her. In the months following its Spring Update, OpenAI has released a number of new papers on safety and AI model alignment (techniques for keeping models compliant with human rules and objectives). The releases also follow the disbanding of its superalignment team and criticism from some former and current employees that the company shifted its focus away from safety in favor of releasing new products. Clearly, the slow rollout of Advanced Voice Mode seems designed to counter those criticisms and assuage users, and possibly regulators or lawmakers, that OpenAI is taking safety seriously and prioritizing it as much as, or more than, profits. The release of ChatGPT Advanced Voice Mode also further differentiates OpenAI from rivals such as Meta with its new Llama model and Anthropic's Claude, and puts pressure on emotive-voice-focused AI startup Hume.
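OpenAI's quoted claim that it "built systems to block outputs that differ from those voices" is left unexplained, and the company has not published how that guardrail works. One standard way to implement such a check is speaker verification: embed the generated audio with a speaker-encoder model and block anything that doesn't closely match a preset voice. The sketch below is a hedged illustration of that general technique only; the embedding vectors, threshold, and function names are all hypothetical placeholders, not OpenAI's implementation.

    # Hedged sketch of a voice-similarity guardrail. The preset
    # "embeddings" below are random placeholders standing in for real
    # speaker-encoder output; the threshold is likewise illustrative.
    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        # Standard cosine similarity between two embedding vectors.
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Hypothetical speaker embeddings for the four preset voices.
    PRESET_VOICES = {
        name: np.random.rand(256)
        for name in ("Breeze", "Cove", "Ember", "Juniper")
    }

    def output_allowed(output_embedding: np.ndarray,
                       threshold: float = 0.85) -> bool:
        # Allow the generated audio only if it closely matches one of
        # the preset voices; everything else gets blocked.
        best = max(
            cosine_similarity(output_embedding, preset)
            for preset in PRESET_VOICES.values()
        )
        return best >= threshold

In a production system the embeddings would come from a trained speaker-verification model run over the synthesized audio, with the threshold tuned so legitimate preset-voice output passes while drifted or cloned voices are rejected.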
[13]
OpenAI Rolling Out More Natural Advanced Voice Mode for ChatGPT
OpenAI today said that it has started to roll out Advanced Voice Mode to a small number of paid ChatGPT users, allowing them to test out more natural, real-time conversations. Advanced Voice Mode allows ChatGPT to provide real-time responses that can be interrupted, plus it is able to sense and respond to humor, sarcasm, and more. The new model does not need to convert your speech to text and back again as the current ChatGPT voice does, leading to lower latency interactions. OpenAI demonstrated Advanced Voice Mode back in May, showing off an AI voice called Sky that sounded remarkably similar to Scarlett Johansson. The voice was created and used without Johansson's permission, and she ended up releasing a statement on the situation. She said that she turned down multiple offers from OpenAI CEO Sam Altman, who wanted Johansson to be the voice of ChatGPT. She said she was "shocked, angered, and in disbelief" that Altman created a voice that sounded "eerily similar" to her own voice. OpenAI claimed that the Sky voice was not intended to resemble the voice of Johansson, but it was removed after she hired legal counsel. OpenAI says that since it demoed Advanced Voice Mode, it has been working to improve the safety and quality of voice conversations. Advanced Voice Mode speaks in four preset voices and is built to block outputs that differ from those voices, preventing it from mimicking celebrity voices. OpenAI has also "implemented guardrails" to block requests for violent or copyrighted content, and the early tests will be used to improve the feature before a wider launch. Users who have been granted access to Advanced Voice Mode will receive an email with instructions, with OpenAI planning to add more people on a rolling basis. Everyone on Plus will have access to Advanced Voice Mode in the fall.
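MacRumors' note about latency is the key architectural point: the older voice mode chains three separate models (speech-to-text, a text-only language model, then text-to-speech), while Advanced Voice Mode handles audio natively in GPT-4o. As a rough, hedged illustration of why the old approach is slower, here is a minimal sketch of a three-stage voice turn built on OpenAI's public Python SDK; the model names and three-stage structure are assumptions for illustration, not the ChatGPT app's actual internals.

    # A minimal sketch, assuming the OpenAI Python SDK, of a three-stage
    # voice pipeline like the one the older voice mode is described as
    # using: speech -> text -> text reply -> speech. Each stage is a
    # separate network round trip, which is where latency accumulates,
    # and tone of voice is lost the moment audio is flattened to text.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def legacy_voice_turn(audio_path: str) -> bytes:
        # Stage 1: transcribe the user's speech to text.
        with open(audio_path, "rb") as f:
            transcript = client.audio.transcriptions.create(
                model="whisper-1", file=f
            )
        # Stage 2: generate a text reply with a text-in, text-out model.
        reply = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": transcript.text}],
        )
        # Stage 3: synthesize the reply back into audio.
        speech = client.audio.speech.create(
            model="tts-1",
            voice="alloy",  # one of the SDK's preset TTS voices
            input=reply.choices[0].message.content,
        )
        return speech.content  # audio bytes (MP3 by default)

A single speech-to-speech model collapses those three round trips into one and never discards the audio signal, which is what makes interruptions, emotional tone, and sub-second responses feasible.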
[14]
OpenAI has released a new ChatGPT bot that you can talk to
The new chatbot represents OpenAI's push into a new generation of AI-powered voice assistants in the vein of Siri and Alexa, but with far more capabilities to enable more natural, fluent conversations. It is a step in the march to more fully capable AI agents. The new ChatGPT voice bot can tell what different tones of voice convey, responds to interruptions, and is able to reply to queries in real time. It has also been trained to sound more natural and use voices to convey a wide range of different emotions. The voice mode is powered by OpenAI's new GPT-4o model, which combines voice, text, and vision capabilities. The company is initially launching the chatbot to a "small group of users" paying for ChatGPT Plus to gather feedback and says it will make it available to all ChatGPT Plus subscribers this fall. A ChatGPT Plus subscription costs $20 a month. OpenAI says it will notify customers who are part of the first rollout wave in the ChatGPT app and provide instructions on how to use the new model. The new voice feature, which was announced in May, is being launched a month later than originally planned because the company said it needed more time to improve safety features, such as the model's ability to detect and refuse unwanted content. The company also said it was preparing its infrastructure to offer real-time responses to millions of users. OpenAI says it has tested the model's voice capabilities with more than 100 external red-teamers, who were tasked with probing the model for flaws. These testers spoke a total of 45 languages and represented 29 countries, according to OpenAI. The company says it has put several safety mechanisms in place. In a move that aims to prevent the model from being used to create audio deepfakes, OpenAI has created four preset voices in collaboration with voice actors. GPT-4o will not impersonate or generate other people's voices. When OpenAI first introduced GPT-4o, the company faced a backlash over its use of a voice called "Sky," which sounded a lot like the actress Scarlett Johansson. Johansson released a statement saying the company had reached out to her for permission to use her voice for the model, which she declined. She said she was shocked to hear a voice "eerily similar" to hers in the model's demo. OpenAI has denied that the voice is Johansson's but has paused the use of Sky. The company is also embroiled in several lawsuits over alleged copyright infringement. OpenAI says it has adopted filters that recognize and block requests to generate music or other copyrighted audio. OpenAI also says it has applied the same safety mechanisms it uses in its text-based model to GPT-4o to prevent it from breaking laws and generating harmful content. Down the line, OpenAI plans to include more advanced features, such as video and screen sharing, which could make the assistant more useful. In its May demo, employees pointed their phone cameras at a piece of paper and asked the AI model to help them solve math equations. They also shared their computer screens and asked the model to help them solve coding problems. OpenAI says these features will not be available now but at an unspecified later date.
[15]
OpenAI rolls out highly anticipated advanced Voice Mode, but there's a catch
On Tuesday, OpenAI announced via an X post that Voice Mode is being rolled out in alpha to a small group of ChatGPT Plus users, offering them a smarter voice assistant that can be interrupted and can respond to users' emotions. Users who participate in the alpha will receive an email with instructions and a message in the mobile app. If you haven't received a notification just yet, no worries. OpenAI shared that it will continue to add users on a rolling basis, with the plan for all ChatGPT Plus users to get access in the fall. In the original demo at the launch event, the company showcased Voice Mode's multimodal capabilities, including assisting with content on users' screens and using the user's phone camera as context for a response. However, the alpha of Voice Mode will not have these features. OpenAI shared that "video and screen sharing capabilities will launch at a later date." The company also said that since originally demoing the technology, it has improved the quality and safety of voice conversations. OpenAI tested the voice capabilities with 100+ external red teamers across 45 languages, according to the X thread. The company also trained the model to speak only in the four preset voices, built systems to block outputs that deviate from those designated voices, and implemented guardrails to block requests for violent or copyrighted content. The company also said that user feedback will be taken into account to improve the model further, and that it will share a detailed report on GPT-4o's performance, including limitations and safety evaluations, in August. One week after OpenAI unveiled this feature, Google unveiled a similar feature called Gemini Live. However, Gemini Live is not yet available to users. That may change soon at the Made by Google event coming up in a few weeks.
[16]
People are doing wild things with new ChatGPT Voice Mode
ChatGPT's Advanced Voice Mode arrived on Tuesday for a select few OpenAI subscribers chosen to be part of the highly anticipated feature's alpha release. The feature was first announced back in May. It is designed to do away with the conventional text-based context window and instead converse using natural, spoken words, delivered in a lifelike manner. It works in a variety of regional accents and languages. According to OpenAI, Advanced Voice "offers more natural, real-time conversations, allows you to interrupt anytime, and senses and responds to your emotions." There are some limitations to what users can ask Voice Mode to do. The system will speak in one of four preset voices and is not capable of impersonating other people's voices, whether individuals or public figures. In fact, the feature will outright block outputs that differ from the four presets. What's more, the system will not generate copyrighted audio or music. So of course, the first thing someone did was to have it beatbox.
Advanced Voice as a B-boy
Yo ChatGPT Advanced Voice beatboxes pic.twitter.com/yYgXzHRhkS -- Ethan Sutin (@EthanSutin) July 30, 2024
Alpha user Ethan Sutin posted a thread to X (formerly Twitter) showing a number of Advanced Voice's responses, including the one above where the AI reels off a short "birthday rap" and then proceeds to beatbox. You can actually hear the AI digitally breathe in between beats.
Advanced Voice as a storyteller
This is awesome actually I did not expect the ominous sounds https://t.co/SgEPi5Bd3K pic.twitter.com/DnK8AVdWjV -- Kesku (@yoimnotkesku) July 30, 2024
While Advanced Voice is prohibited from creating songs wholesale, it can generate background sound effects for the bedtime stories it recites. In the example above from Kesku, the AI adds well-timed crashes and slams to its tale of a rogue cyborg after being asked to "Tell me an exciting action thriller story with sci-fi elements and create atmosphere by making appropriate noises of the things happening (e.g. a storm howling loudly)."
look on OpenAI's works ye mighty and despair! this is most wild one. You can really feel like a director guiding a Shakespearean actor! pic.twitter.com/GUQ1z8rjIL -- Ethan Sutin (@EthanSutin) July 31, 2024
The AI is also capable of creating realistic characters on the spot, as Sutin's example above demonstrates.
Advanced Voice as an emotive speaker
Khan!!!!!! pic.twitter.com/xQ8NdEojSX -- Ethan Sutin (@EthanSutin) July 30, 2024
The new feature sounds so lifelike in part because it is capable of emoting as a human would. In the example above, Ethan Sutin recreates the famous Star Trek II scene. In the two examples below, user Cristiano Giardina compels the AI to speak in different tones and different languages.
ChatGPT Advanced Voice Mode speaking Japanese (excitedly) pic.twitter.com/YDL2olQSN8 -- Cristiano Giardina (@CrisGiardina) July 31, 2024
ChatGPT Advanced Voice Mode speaking Armenian (regular, excited, angry) pic.twitter.com/SKm73lExdX -- Cristiano Giardina (@CrisGiardina) July 31, 2024
Advanced Voice as an animal lover
🐈 pic.twitter.com/UZ0odgaJ7W -- Ethan Sutin (@EthanSutin) July 30, 2024
The AI's vocal talents don't stop at human languages. In the example above, Advanced Voice is told to make cat sounds, and does so with unerring accuracy.
Trying #ChatGPT's new Advanced Voice Mode that just got released in Alpha. It feels like face-timing a super knowledgeable friend, which in this case was super helpful -- reassuring us with our new kitten. It can answer questions in real-time and use the camera as input too! pic.twitter.com/Xx0HCAc4To -- Manuel Sainsily (@ManuVision) July 30, 2024
In addition to sounding like a cat, users can pepper the AI with questions about their biological feline friends and receive personalized tips and advice in real time.
Advanced Voice as a real-time translator
Real-Time Japanese translation using #ChatGPT's new advanced voice mode + vision alpha! Yet another useful example! pic.twitter.com/wDXrgYQkZE -- Manuel Sainsily (@ManuVision) July 31, 2024
Advanced Voice can also leverage your device's camera to aid in its translation efforts. In the example above, user Manuel Sainsily points his phone at a Game Boy Advance running a Japanese-language version of a Pokémon game and has the AI read the onscreen dialogue as he plays. The company notes that video and screen sharing won't be part of the alpha release but will be available at a later date. OpenAI plans to expand the alpha release to additional Plus subscribers "over the next few weeks" and will bring it to all Plus users "in the fall."
[17]
OpenAI starts roll-out of advanced voice mode to some ChatGPT Plus users
OpenAI introduced an advanced voice mode for a few ChatGPT Plus users, enabling real-time voice interactions with interruption capabilities. Initially delayed from June to July, the feature enhances the AI's ability to detect and decline certain content while improving user experience and preparing the infrastructure for broader use. OpenAI is starting to roll out an advanced voice mode to a small group of ChatGPT Plus users, the Microsoft-backed artificial intelligence startup said on Tuesday in a post on X. The company had delayed the roll-out of the realistic voice conversation experience to July from late June, saying it needed time to reach its launch standard. The new audio capabilities will allow users to speak to ChatGPT and receive real-time responses without delay, as well as interrupt ChatGPT while it is speaking - both tenets of realistic conversations that have proven to be a challenge for AI assistants. OpenAI said in June it was improving the model's ability to detect and refuse certain content, while working on bettering the user experience and preparing its infrastructure to scale the model. The company has been working to introduce new generative AI products as it seeks to maintain its edge in the booming AI race, with businesses rushing to adopt the technology.
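The Reuters piece singles out interruption ("barge-in") as one of the hard parts of realistic voice assistants: the system has to keep listening while it talks, and yield the floor the instant the user speaks. The sketch below is a conceptual illustration of that control loop only, in no way OpenAI's implementation; the energy-gate detector, frame size, and callback names are crude stand-ins for a real voice-activity detector and audio stack.

    # Conceptual barge-in loop: play the assistant's reply frame by
    # frame while monitoring the microphone, and stop the moment the
    # user starts speaking. All numbers here are illustrative.
    import numpy as np

    FRAME = 1600                 # 100 ms of 16 kHz mono audio
    SPEECH_RMS_THRESHOLD = 0.02  # tune per microphone and room

    def is_speech(frame: np.ndarray) -> bool:
        # Crude voice-activity detection: a root-mean-square energy gate.
        return float(np.sqrt(np.mean(frame ** 2))) > SPEECH_RMS_THRESHOLD

    def play_with_barge_in(reply_frames, mic_frames, play_frame) -> str:
        # reply_frames / mic_frames: iterables of numpy audio frames;
        # play_frame: callback that sends one frame to the speaker.
        for out_frame, in_frame in zip(reply_frames, mic_frames):
            if is_speech(in_frame):
                return "interrupted"  # stop talking, yield the turn
            play_frame(out_frame)
        return "finished"

The hard engineering in a production system is doing this with echo cancellation, so the assistant's own speaker output doesn't trigger the detector, and at low enough latency that cutting in feels natural.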
[18]
OpenAI rolls out much-awaited advanced voice mode to select users
Users will receive email and app notifications with instructions. OpenAI is starting to roll out an advanced voice mode to select ChatGPT Plus users. This feature was first previewed in May at an OpenAI event, but concerns were raised when one of the previewed voices, dubbed "Sky," sounded a little too similar to Scarlett Johansson's voice. The tool was initially slated to launch in late June, but the AI firm delayed it, citing safety reasons. It appears that these concerns have been resolved, given that OpenAI has announced the rollout of this new tool in a post on X, the platform formerly known as Twitter. In the post, OpenAI shared, "We're starting to roll out advanced voice mode to a small group of ChatGPT Plus users. Advanced voice mode offers more natural, real-time conversations, allows you to interrupt anytime, and senses and responds to your emotions." Eligible users will receive an email from OpenAI with instructions on how to use the feature, as well as a message on their mobile apps. The company notes that it will "continue to add more people on a rolling basis and plan for everyone on Plus to have access in the fall." One of the highlights of this tool is that it is far more capable than ChatGPT's current voice mode, thanks to its ability to handle interruptions by users and respond more intuitively. This makes the style of conversation a lot more natural and realistic -- something that not many AI tools have been able to achieve. The new voice mode will feature four preset voices (barring "Sky") that OpenAI built by collaborating with voice actors. The AI firm notes that this round of testing will help it make the advanced voice mode experience "safer and more enjoyable for everyone." The company also shared its plans to release a detailed report on the capabilities, limitations, and safety evaluations of GPT-4o in August. If you're a Plus subscriber and are looking forward to trying these voice modes, keep an eye on your inbox and notifications for an invite.
[19]
OpenAI rolls out advanced voice mode in ChatGPT for Plus users: What is it
Advanced voice mode is starting to roll out to a small group of ChatGPT Plus users, announced Microsoft-backed artificial intelligence startup OpenAI in a post on X. Announced in May, the feature was originally slated for release in June this year but got delayed because it needed time to reach its launch standard. Here is all you need to know about OpenAI's advanced voice mode for ChatGPT:
[20]
When counting quickly, OpenAI's new voice mode stops to catch its "breath"
AVM allows uncanny real-time voice conversations with ChatGPT that you can interrupt. On Tuesday, OpenAI began rolling out an alpha version of its new Advanced Voice Mode to a small group of ChatGPT Plus subscribers. This feature, which OpenAI previewed in May with the launch of GPT-4o, aims to make conversations with the AI more natural and responsive. In May, the feature triggered criticism of its simulated emotional expressiveness and prompted a public dispute with actress Scarlett Johansson over accusations that OpenAI copied her voice. Even so, early tests of the new feature shared by users on social media have been largely enthusiastic. In early tests reported by users with access, Advanced Voice Mode allows them to have real-time conversations with ChatGPT, including the ability to interrupt the AI mid-sentence almost instantly. It can sense and respond to a user's emotional cues through vocal tone and delivery, and provide sound effects while telling stories. But what has caught many people off-guard initially is how the voices simulate taking a breath while speaking. "ChatGPT Advanced Voice Mode counting as fast as it can to 10, then to 50 (this blew my mind -- it stopped to catch its breath like a human would)," wrote tech writer Cristiano Giardina on X. Advanced Voice Mode simulates audible pauses for breath because it was trained on audio samples of humans speaking that included the same feature. The model has learned to simulate inhalations at seemingly appropriate times after being exposed to hundreds of thousands, if not millions, of examples of human speech. Large language models (LLMs) like GPT-4o are master imitators, and that skill has now extended to the audio domain. Giardina shared his other impressions about Advanced Voice Mode on X, including observations about accents in other languages and sound effects. "It's very fast, there's virtually no latency from when you stop speaking to when it responds," he wrote. "When you ask it to make noises it always has the voice 'perform' the noises (with funny results). It can do accents, but when speaking other languages it always has an American accent. (In the video, ChatGPT is acting as a soccer match commentator)" Speaking of sound effects, X user Kesku, who is a moderator of OpenAI's Discord server, shared an example of ChatGPT playing multiple parts with different voices and another of a voice recounting an audiobook-sounding sci-fi story from the prompt, "Tell me an exciting action story with sci-fi elements and create atmosphere by making appropriate noises of the things happening using onomatopoeia." Kesku also ran a few example prompts for us, including a story about the Ars Technica mascot "Moonshark." He also asked it to sing the "Major-General's Song" from Gilbert and Sullivan's 1879 comic opera The Pirates of Penzance. Frequent AI advocate Manuel Sainsily posted a video of Advanced Voice Mode reacting to camera input, giving advice about how to care for a kitten. "It feels like face-timing a super knowledgeable friend, which in this case was super helpful -- reassuring us with our new kitten," he wrote. "It can answer questions in real-time and use the camera as input too!" Of course, being based on an LLM, it may occasionally confabulate incorrect responses on topics or in situations where its "knowledge" (which comes from GPT-4o's training data set) is lacking.
But considered as a tech demo or an AI-powered amusement, and with its limitations in mind, Advanced Voice Mode seems to successfully execute many of the tasks shown by OpenAI's demo in May.
Safety
An OpenAI spokesperson told Ars Technica that with the Advanced Voice Mode release, the company worked with more than 100 external testers, collectively speaking 45 different languages and representing 29 geographical areas. The system is reportedly designed to prevent impersonation of individuals or public figures by blocking outputs that differ from OpenAI's four chosen preset voices. OpenAI has also added filters to recognize and block requests to generate music or other copyrighted audio, which has gotten other AI companies in trouble. Giardina reported audio "leakage" in some outputs, with unintentional music in the background, suggesting that OpenAI trained the AVM voice model on a wide variety of audio sources, likely both licensed material and audio scraped from online video platforms.
Availability
OpenAI plans to expand access to more ChatGPT Plus users in the coming weeks, with a full launch to all Plus subscribers expected this fall. A company spokesperson told Ars that users in the alpha test group will receive a notice in the ChatGPT app and an email with usage instructions. Since the initial preview of GPT-4o voice in May, OpenAI claims to have enhanced the model's ability to support millions of simultaneous, real-time voice conversations while maintaining low latency and high quality. In other words, it is gearing up for a rush that will take a lot of back-end computation to accommodate.
[21]
OpenAI starts rolling out its ChatGPT voice mode to some Plus users
OpenAI has recently released its advanced voice mode to a small number of ChatGPT Plus users. OpenAI is rolling out the "advanced voice mode" for its AI model to a small group of ChatGPT Plus subscribers. This artificial intelligence (AI) voice assistant was first revealed in May as part of the launch event for OpenAI's latest, most advanced model, GPT-4o. However, the launch was delayed due to safety concerns, with OpenAI saying it needed more time to reach its "bar to launch". "We're improving the model's ability to detect and refuse certain content. We're also working on improving the user experience and preparing our infrastructure to scale to millions while maintaining real-time responses," the company said last month. The voice mode also faced controversy for sounding very similar to the actor Scarlett Johansson following its initial launch. "Since we first demoed advanced Voice Mode, we've been working to reinforce the safety and quality of voice conversations as we prepare to bring this frontier technology to millions of people," the company said on X on Tuesday. According to OpenAI, this voice mode is more natural and allows for real-time conversations. It also enables users to interrupt at any time, while sensing and responding to their emotions. The company added that the new feature was tested with more than a hundred external red teamers across 45 languages. The voice assistant is also restricted to speaking only in the four preset voices and will not be able to copy how others speak, to protect people's privacy. Additional safety measures were also added to block requests for violent or copyrighted content. The company says it will continue to make the voice mode available to more people on a rolling basis, with plans to provide access to all Plus users in the fall.
[22]
ChatGPT Will Soon Have a New Voice
AI chatbot and professional voice thief ChatGPT is back with a dulcet set of new digital vocal cords for its advanced Voice Mode. This might seem familiar, as we previously reported on ChatGPT giving actor Scarlett Johansson the digital Ursula treatment earlier this year. As Phone Arena reports, the AI company is currently launching Voice Mode.

The company went on X to spread the news that Advanced Voice Mode "offers more natural, real-time conversations, allows you to interrupt anytime, and senses and responds to your emotions." The small cohort of people who can currently interact with Voice Mode can do so one-on-one, with the feature rolling out to the masses sometime in the fall. OpenAI has trained the model to speak in only four preset voices, will "block outputs that differ from those voices," and has "implemented guardrails to block requests for violent or copyrighted content." So that's bad news if you were hoping to make your version of ChatGPT sound like Morgan Freeman or Bob Belcher from Bob's Burgers.

Back in May, the actor, who had lent her voice to the digital assistant Samantha in Spike Jonze's movie "Her," was approached by OpenAI CEO Sam Altman. He wanted her to record the voice of "Sky" for ChatGPT's advanced Voice Mode, currently a premium perk for ChatGPT Plus users. Citing personal reasons, Johansson declined. But Altman didn't take no for an answer, using the actor's voice without permission. That ballsy move resulted in Johansson getting lawyers involved and Altman ultimately giving a flimsy statement saying the voice was never meant to imitate the actor. Advanced Voice Mode was delayed while OpenAI searched for another voice.

I'm definitely curious about these new voices. How un-ScarJo will they be? Will there be a masculine option? Accents? Personally, I don't see myself playing with this a lot when it launches, as I rarely use digital assistants. Hell, I'm not even a fan of having to talk to my folks on the phone (and I like them), let alone an AI. But hey, that's just me. Whenever it does go live, make sure not to fall in love with the AI, no matter how charming and engaging it might be.
[23]
OpenAI rolls out advanced Voice Mode and no, it won't sound like ScarJo
A small number of paid ChatGPT users will get the alpha version today.

OpenAI has started rolling out its advanced Voice Mode feature. Starting today, a small number of paying ChatGPT users will be able to have a tête-à-tête with the AI chatbot. All ChatGPT Plus members should receive access to the expanded toolset by the fall of this year.

In an announcement on X, the company said this advanced version of its Voice Mode "offers more natural, real-time conversations, allows you to interrupt anytime, and senses and responds to your emotions." Support for voice conversations arrived in ChatGPT last September, and the more advanced version got a public demo in May. GPT-4o uses a single multimodal model for the voice capabilities rather than the three separate models used by the previous audio solution, decreasing the latency in conversations with the chatbot.

OpenAI drew a lot of criticism at the May demo for debuting a voice option that sounded uncannily like Scarlett Johansson, whose acting career included voicing the AI character Samantha in Spike Jonze's film Her. The release date for advanced Voice Mode was delayed shortly after the backlash. Even though the company insisted that the voice actor was not imitating Johansson's performance, the similar-sounding voice has since been removed.
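To make the latency point above concrete, here is a minimal sketch of the older three-stage pipeline, written against the public openai Python client (v1.x). The model names (whisper-1, gpt-4o, tts-1) are illustrative stand-ins for the kind of models involved, not what ChatGPT's Voice Mode runs internally.

```python
# A minimal sketch of the older three-model voice pipeline:
# speech-to-text -> chat model -> text-to-speech, each a separate round-trip.
# Assumes the public openai Python client (v1.x) and OPENAI_API_KEY in the
# environment; model names are stand-ins, not ChatGPT's internal stack.
from openai import OpenAI

client = OpenAI()

def voice_turn(audio_path: str) -> bytes:
    # 1) Transcribe the user's speech to text.
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

    # 2) Generate a text reply with a chat model.
    chat = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": transcript.text}],
    )
    reply = chat.choices[0].message.content

    # 3) Synthesize the reply with a preset text-to-speech voice.
    speech = client.audio.speech.create(model="tts-1", voice="nova", input=reply)
    return speech.content  # MP3 bytes by default
```

Latency stacks across all three calls, which is why collapsing them into one multimodal model, as GPT-4o does, makes the conversation feel closer to real time.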
[24]
OpenAI starts roll-out of advanced voice mode to some ChatGPT Plus users: Report
The company had delayed the roll-out of the realistic voice conversation experience to July from late June, saying it needed time to reach its launch standard.

The new audio capabilities will allow users to speak to ChatGPT and receive real-time responses without delay, as well as interrupt ChatGPT while it is speaking - both tenets of realistic conversations that have proven to be a challenge for AI assistants.

OpenAI said in June it was improving the model's ability to detect and refuse certain content, while working on bettering the user experience and preparing its infrastructure to scale the model. The company has been working to introduce new generative AI products, as it seeks to maintain its edge in the booming AI race with businesses rushing to adopt the technology.
[25]
Microsoft-backed OpenAI releases advanced voice mode to these users - Times of India
Sam Altman-led OpenAI has started rolling out its advanced voice mode to a small group of ChatGPT Plus users. Voice mode on ChatGPT was announced earlier this year in May. It enables authentic interactions with the artificial intelligence (AI) chatbot, offering real-time responses without delay and emotional engagement. The advanced voice mode was initially scheduled to release in late June, but the company delayed it to July to "reach our bar to launch." ChatGPT Plus is a paid subscription plan for ChatGPT. It offers priority access to new features along with faster response speeds and availability even when demand is high.

In a post on X (formerly known as Twitter), the company said, "We're starting to roll out advanced Voice Mode to a small group of ChatGPT Plus users. Advanced Voice Mode offers more natural, real-time conversations, allows you to interrupt anytime, and senses and responds to your emotions."

"Users in this alpha will receive an email with instructions and a message in their mobile app. We'll continue to add more people on a rolling basis and plan for everyone on Plus to have access in the fall," the company added. Video and screen-sharing capabilities will launch at a later date, it added.

GPT-4o has only four preset voices

The company says that it has tested GPT-4o's voice capabilities with 100+ external red teamers across 45 languages. The AI model will only speak in four preset voices - Juniper, Breeze, Cove and Ember - made in collaboration with paid voice actors. OpenAI says that it has built systems to block outputs that differ from those voices. Additionally, it has implemented guardrails to block requests for violent or copyrighted content. OpenAI spokesperson Lindsay McCallum says, "ChatGPT cannot impersonate other people's voices, both individuals and public figures, and will block outputs that differ from one of these preset voices."
[26]
OpenAI starts roll-out of advanced voice mode to some ChatGPT Plus users
OpenAI is starting to roll out an advanced voice mode to a small group of ChatGPT Plus users, the Microsoft-backed artificial intelligence startup said on Tuesday in a post on X. The company had delayed the roll-out of the realistic voice conversation experience to July from late June, saying it needed time to reach its launch standard.

The new audio capabilities will allow users to speak to ChatGPT and receive real-time responses without delay, as well as interrupt ChatGPT while it is speaking - both tenets of realistic conversations that have proven to be a challenge for AI assistants. OpenAI said in June it was improving the model's ability to detect and refuse certain content, while working on bettering the user experience and preparing its infrastructure to scale the model. The company has been working to introduce new generative AI products, as it seeks to maintain its edge in the booming AI race with businesses rushing to adopt the technology.

(Reporting by Arsheeya Bajwa in Bengaluru; Editing by Devika Syamnath)
[27]
OpenAI releases ChatGPT's hyper-realistic voice to some paying users | TechCrunch
OpenAI began rolling out ChatGPT's Advanced Voice Mode on Tuesday, giving users their first access to GPT-4o's hyper-realistic audio responses. The alpha version will be available to a small group of ChatGPT Plus users today, and OpenAI says the feature will gradually roll out to all Plus users in the fall of 2024.

When OpenAI first showcased GPT-4o's voice in May, the feature shocked audiences with quick responses and an uncanny resemblance to a real human's voice - one in particular. The voice, Sky, resembled that of Scarlett Johansson, the actress behind the artificial assistant in the movie "Her." Soon after OpenAI's demo, Johansson said she had refused multiple inquiries from CEO Sam Altman to use her voice, and after seeing GPT-4o's demo, hired legal counsel to defend her likeness. OpenAI denied using Johansson's voice, but later removed the voice shown in its demo. In June, OpenAI said it would delay the release of Advanced Voice Mode to improve its safety measures.

One month later, the wait is over (sort of). OpenAI says the video and screen-sharing capabilities showcased during its Spring Update will not be part of this alpha, launching at a "later date." For now, the GPT-4o demo that blew everyone away is still just a demo, but some premium users will now have access to ChatGPT's voice feature shown there.

You may have already tried the Voice Mode currently available in ChatGPT, but OpenAI says Advanced Voice Mode is different. ChatGPT's old solution to audio used three separate models: one to convert your voice to text, GPT-4 to process your prompt, and a third to convert ChatGPT's text into voice. But GPT-4o is multimodal, capable of processing these tasks without the help of auxiliary models, creating significantly lower-latency conversations. OpenAI also claims GPT-4o can sense emotional intonations in your voice, including sadness, excitement or singing.

In this pilot, ChatGPT Plus users will get to see firsthand how hyper-realistic OpenAI's Advanced Voice Mode really is. TechCrunch was unable to test the feature before publishing this article, but we will review it when we get access.

OpenAI says it's releasing ChatGPT's new voice gradually to closely monitor its usage. People in the alpha group will get an alert in the ChatGPT app, followed by an email with instructions on how to use it.

In the months since OpenAI's demo, the company says it tested GPT-4o's voice capabilities with more than 100 external red teamers who speak 45 different languages. OpenAI says a report on these safety efforts is coming in early August.

The company says Advanced Voice Mode will be limited to ChatGPT's four preset voices - Juniper, Breeze, Cove and Ember - made in collaboration with paid voice actors. The Sky voice shown in OpenAI's May demo is no longer available in ChatGPT. OpenAI spokesperson Lindsay McCallum says, "ChatGPT cannot impersonate other people's voices, both individuals and public figures, and will block outputs that differ from one of these preset voices."

OpenAI is trying to avoid deepfake controversies. In January, AI startup ElevenLabs' voice-cloning technology was used to impersonate President Biden, deceiving primary voters in New Hampshire. OpenAI also says it introduced new filters to block certain requests to generate music or other copyrighted audio. In the last year, AI companies have landed themselves in legal trouble for copyright infringement, and audio models like GPT-4o unleash a whole new category of companies that can file a complaint - particularly record labels, which have a history of being litigious and have already sued AI song generators Suno and Udio.
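For contrast with the three-model pipeline described above, here is a hedged sketch of what a single multimodal turn looks like through OpenAI's public API. It uses the gpt-4o-audio-preview chat model, an endpoint OpenAI exposed publicly later than this article; Advanced Voice Mode itself runs over a streaming realtime interface, so treat this as an illustration of the one-model idea rather than the feature's actual plumbing.

```python
# One multimodal round-trip: audio in, audio out, no separate STT/TTS stages.
# Assumes the openai Python client (v1.x) and the gpt-4o-audio-preview model,
# a later public endpoint -- an illustration of the single-model design, not
# the streaming stack Advanced Voice Mode actually uses.
import base64
from openai import OpenAI

client = OpenAI()

with open("question.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("ascii")

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Answer briefly, out loud."},
            {"type": "input_audio",
             "input_audio": {"data": audio_b64, "format": "wav"}},
        ],
    }],
)

# The same model transcribed, reasoned, and spoke -- one round-trip of latency.
with open("answer.wav", "wb") as out:
    out.write(base64.b64decode(completion.choices[0].message.audio.data))
```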
[28]
OpenAI starts roll-out of advanced voice mode to some ChatGPT Plus users
OpenAI is starting to roll out an advanced voice mode to a small group of ChatGPT Plus users, the Microsoft-backed artificial intelligence startup said on Tuesday in a post on X. The company had delayed the roll-out of the realistic voice conversation experience to July from late June, saying it needed time to reach its launch standard.

The new audio capabilities will allow users to speak to ChatGPT and receive real-time responses without delay, as well as interrupt ChatGPT while it is speaking - both tenets of realistic conversations that have proven to be a challenge for AI assistants. OpenAI said in June it was improving the model's ability to detect and refuse certain content, while working on bettering the user experience and preparing its infrastructure to scale the model. The company has been working to introduce new generative AI products, as it seeks to maintain its edge in the booming AI race with businesses rushing to adopt the technology.
[29]
OpenAI Debuts Advanced Voice AI for Subscribers
Artificial intelligence (AI) company OpenAI has begun rolling out an advanced voice feature for its ChatGPT platform. The feature, which utilizes the company's GPT-4o model, offers hyper-realistic audio responses, according to a Tuesday (July 30) TechCrunch report.

The new audio capabilities reportedly enable users to have real-time, delay-free conversations with ChatGPT and even interrupt it mid-sentence, addressing key challenges in achieving realistic AI interactions. The alpha version of Advanced Voice Mode is being released to a select group of ChatGPT Plus subscribers, with plans for a broader rollout to all premium users this fall.

This cautious approach comes after controversy surrounding the technology's initial demonstration in May. During that showcase, the voice capability, dubbed "Sky," drew attention for its uncanny resemblance to actress Scarlett Johansson's voice, even as the actress said she had repeatedly denied OpenAI permission to use her voice. Johansson, who had a starring role in the AI-themed film "Her," subsequently sought legal counsel to protect her likeness. OpenAI denied using Johansson's voice but removed the controversial demo, highlighting the complex legal landscape surrounding AI and celebrity likeness rights.

To mitigate potential misuse, OpenAI has limited the system to four preset voices created in collaboration with paid voice actors. The company emphasized that ChatGPT cannot impersonate specific individuals or public figures, a measure designed to prevent the creation of deceptive deepfakes -- a growing concern in the AI industry.

"We tested GPT-4o's voice capabilities with 100+ external red teamers across 45 languages," the company wrote on X, formerly Twitter, in a series of posts on Tuesday announcing the new offering. "To protect people's privacy, we've trained the model to only speak in the four preset voices, and we built systems to block outputs that differ from those voices. We've also implemented guardrails to block requests for violent or copyrighted content."

OpenAI has also implemented filters to block requests for generating music or copyrighted audio, a move likely influenced by recent legal actions against AI companies for alleged copyright infringement. The music industry, in particular, has been proactive in challenging AI-generated content, with lawsuits already filed against AI song generators Suno and Udio.
OpenAI launches a new voice-based interaction feature for ChatGPT Plus subscribers, allowing users to engage in conversations with the AI using voice commands and receive spoken responses.
OpenAI has announced the rollout of an advanced voice mode for ChatGPT, exclusively available to ChatGPT Plus subscribers. This new feature enables users to have voice conversations with the AI, marking a significant step forward in natural language interaction with artificial intelligence [1].

The new voice feature allows users to speak directly to ChatGPT and receive spoken responses. To access this functionality, users need to navigate to the settings menu in the ChatGPT mobile app and enable the "Voice Conversations" option [2]. Once activated, a headphones icon appears in the app's interface, which users can tap to initiate voice interactions.

OpenAI has incorporated five distinct voices for ChatGPT's responses, offering users a choice in how they want the AI to sound. This personalization adds a layer of user preference to the interaction experience. Additionally, the voice mode supports over 40 languages, making it accessible to a global user base [3].

The advanced voice mode utilizes OpenAI's latest text-to-speech model, which the company claims can generate human-like audio from text and a few seconds of sample speech. This technology enables ChatGPT to produce more natural-sounding responses, enhancing the conversational experience [4].
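For readers who want to hear this class of model directly, OpenAI's public text-to-speech endpoint exposes the preset voices, though not the few-seconds voice-cloning capability mentioned above. A minimal sketch, assuming the openai Python client (v1.x) and an OPENAI_API_KEY in the environment:

```python
# Minimal text-to-speech call against OpenAI's public API (openai client v1.x).
# Only preset voices are exposed publicly; cloning a voice from a short speech
# sample, as described above, is not available through this endpoint.
from openai import OpenAI

client = OpenAI()

speech = client.audio.speech.create(
    model="tts-1",   # "tts-1-hd" trades latency for higher audio quality
    voice="nova",    # one of the preset voices
    input="Voice mode lets you talk to ChatGPT and hear it talk back.",
)

with open("sample.mp3", "wb") as f:
    f.write(speech.content)  # MP3 bytes by default
```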
Currently, the voice feature is being rolled out gradually to ChatGPT Plus subscribers. It's important to note that a subscription to ChatGPT Plus, priced at $20 per month, is required to access this new functionality. The feature is available on both iOS and Android devices, expanding the ways users can interact with the AI assistant [5].

The introduction of voice interaction opens up new possibilities for ChatGPT's use in various scenarios. It could potentially assist visually impaired individuals, aid in language learning, or simply provide a more convenient way to interact with AI while multitasking. This development also positions OpenAI competitively against other AI assistants that offer voice capabilities.

As with any voice-based technology, there are privacy implications to consider. OpenAI has stated that voice data will be collected to improve the service, but users have the option to opt out of this data collection in the app's settings [2]. This raises important questions about data privacy and the ethical use of voice data in AI development.