Curated by THEOUTPOST
On Tue, 13 Aug, 8:01 AM UTC
5 Sources
[1]
I Used ChatGPT's Advanced Voice Mode. It's Fun, and Just a Bit Creepy
The new voice feature from OpenAI for ChatGPT is often entertaining, and will even do a Trump impression. It likely rolls out to all paid users this fall.

I leave ChatGPT's Advanced Voice Mode on while writing this article as an ambient AI companion. Occasionally, I'll ask it to provide a synonym for an overused word, or some encouragement. Around half an hour in, the chatbot interrupts our silence and starts speaking to me in Spanish, unprompted. I giggle a bit and ask what's going on. "Just a little switch up? Gotta keep things interesting," says ChatGPT, now back in English.

While testing Advanced Voice Mode as part of the early alpha, my interactions with ChatGPT's new audio feature were entertaining, messy, and surprisingly varied. It's worth noting, though, that the features I had access to were only half of what OpenAI demonstrated when it launched the GPT-4o model in May. The vision aspect we saw in the livestreamed demo is now scheduled for a later release, and the enhanced Sky voice, which Her actor Scarlett Johansson pushed back on, has been removed from Advanced Voice Mode and is no longer an option for users.

So, what's the current vibe? Right now, Advanced Voice Mode feels reminiscent of when the original text-based ChatGPT dropped in late 2022. Sometimes it leads to unimpressive dead ends or devolves into empty AI platitudes. But other times the low-latency conversations click in a way that Apple's Siri or Amazon's Alexa never have for me, and I feel compelled to keep chatting out of enjoyment. It's the kind of AI tool you'll show your relatives during the holidays for a laugh. OpenAI gave a few WIRED reporters access to the feature a week after the initial announcement, but pulled it the next morning, citing safety concerns.
Two months later, OpenAI soft-launched Advanced Voice Mode to a small group of users and released GPT-4o's system card, a technical document that outlines red-teaming efforts, what the company considers to be safety risks, and the mitigation steps the company has taken to reduce harm.

Curious to give it a go yourself? Here's what you need to know about the larger rollout of Advanced Voice Mode, and my first impressions of ChatGPT's new voice feature to help you get started.

OpenAI released an audio-only Advanced Voice Mode to some ChatGPT Plus users at the end of July, and the alpha group still seems relatively small. The company currently plans to enable it for all subscribers sometime this fall. Niko Felix, a spokesperson for OpenAI, shared no additional details when asked about the release timeline. Screen and video sharing were a core part of the original demo, but they are not available in this alpha test. OpenAI still plans to add those aspects eventually, but it's not clear when that will actually happen.

If you're a ChatGPT Plus subscriber, you'll receive an email from OpenAI when Advanced Voice Mode is available to you. Once it's on your account, you can switch between Standard and Advanced at the top of the app's screen when ChatGPT's voice mode is open. I was able to test the alpha version on an iPhone as well as a Galaxy Fold.

Within the very first hour of speaking with it, I learned that I love interrupting ChatGPT. It's not how you would talk with a human, but the new ability to cut off ChatGPT mid-sentence and request a different version of the output feels like a dynamic improvement and a stand-out feature. Early adopters who were excited by the original demos may be frustrated to find that the version of Advanced Voice Mode they can access is restricted by more guardrails than anticipated.
For example, although generative AI singing was a key component of the launch demos, with whispered lullabies and multiple voices attempting to harmonize, AI serenades are currently absent from the alpha version.
[2]
I was excited for this new ChatGPT feature, now I'm terrified of it
ChatGPT's advanced Voice Mode has a rare but worrying problem.

ChatGPT is the world's most popular AI tool by a considerable margin, which is why, when new tools and features arrive in OpenAI's chatbot, it can be a pretty big deal. The latest feature turning heads as it slowly rolls out to ChatGPT Plus subscribers is advanced Voice Mode, the hyper-realistic conversation upgrade first showcased during GPT-4o's reveal.

It's something I've been incredibly excited to try, with the majority of my childhood years spent dreaming of being the loveable rogue, spacefaring pirate who is pals with his sentient ship-based AI. Think Knight Rider meets Han Solo if you want further insight into a 10-year-old me's dreams of grandeur.

However, thanks to a recent system card published by OpenAI (the very same that shows concern that users may develop feelings for the chatbot), that childhood dream is going up in smoke as I'm faced with the horrifying reality that ChatGPT's advanced Voice Mode might not be the AI best friend I've always wanted. While I was hoping for my very own Chappie, it turns out we might be getting something far closer to The Terminator's T-1000, as an unexpected behavior in the feature has seen it stealing and mimicking the voices of users without consent.

Unauthorized voice generation is one of the risks identified by OpenAI in its latest system card, with the company highlighting why, even though it's possible with GPT-4o, it can't offer users the ability to produce content in another person's voice, due to fears of fraud or the spread of false information. However, OpenAI highlights a rare but legitimate issue with ChatGPT's advanced Voice Mode that saw the model "unintentionally generate an output emulating the user's voice."
The team also provides an example of the chatbot doing exactly that: a short snippet of a wider conversation captured during the advanced Voice Mode testing period, in which ChatGPT suddenly shifts mid-answer from a male voice to a cloned emulation of the user's voice. While more than a little spooky on its own, the fact that this switch in voice happens after the bot offers a random outburst of the word "No!" is all the more hair-raising.

GPT-4o's ability to mimic voices comes from OpenAI's Voice Engine, a powerful text-to-speech model that can emulate the voice of anybody based on nothing more than a 15-second audio clip. That being said, OpenAI wants to assure us that it's taking steps to mitigate the issue, installing safety measures designed to prevent its model from deviating from its available preset voices by labeling them as the "ideal" completions, and including an output classifier that aims to detect whether GPT-4o is attempting to use a voice output that differs from its presets. To that end, OpenAI has been able to capture 100% of "meaningful deviations" during internal evaluations in which the team attempted to recreate the issue.

However, unintentional voice generation remains a weakness of the model and may find its way around the company's safeguarding in wider testing. Software glitches happen, and AI is prone to hallucinating and doing its own thing from time to time, but there's something more than a little unsettling about a chatbot suddenly donning a skin suit of you mid-conversation. While I can place my faith in the fact that OpenAI has done its best to plug the holes in its model that allow such a thing to reach the end user, the fact that the model even attempts it behind the scenes is still quite alarming, and could pose a considerable risk if ChatGPT's safeguarding were ever breached.
I'll still be checking out ChatGPT's advanced Voice Mode when it eventually releases for me, but when I do so I'm not entirely sure I'll be able to shake off the creepy vibe of knowing it could be about to respond to me in an all-too-familiar voice.
[3]
It's not an echo - ChatGPT might suddenly mimic your voice when you speak to it
OpenAI admits ChatGPT's new Voice Mode might clone your voice out of nowhere.

ChatGPT might sometimes seem able to think like you, but wait until it suddenly sounds just like you, too. That's a possibility brought to light by the new Advanced Voice Mode for ChatGPT, specifically the more advanced GPT-4o model. OpenAI released the system card last week explaining what GPT-4o can and can't do, which includes the very unlikely but still real possibility of Advanced Voice Mode imitating users' voices without their consent.

Advanced Voice Mode lets users engage in spoken conversations with the AI chatbot. The idea is to make interactions more natural and accessible. The AI has a few preset voices from which users can choose. However, the system card reports that this feature has exhibited unexpected behavior under certain conditions. During testing, a noisy input triggered the AI to mimic the voice of the user.

The GPT-4o model produces voices using a system prompt, a hidden set of instructions that guides the model's behavior during interactions. In the case of voice synthesis, this prompt relies on an authorized voice sample. But while the system prompt guides the AI's behavior, it is not foolproof. The model's ability to synthesize voice from short audio clips means that, under certain conditions, it could generate other voices, including your own. In the clip OpenAI shared, the AI jumps in with "No!" and suddenly sounds like the first speaker.

"Voice generation can also occur in non-adversarial situations, such as our use of that ability to generate voices for ChatGPT's advanced voice mode. During testing, we also observed rare instances where the model would unintentionally generate an output emulating the user's voice," OpenAI explained in the system card.
"While unintentional voice generation still exists as a weakness of the model, we use the secondary classifiers to ensure the conversation is discontinued if this occurs making the risk of unintentional voice generation minimal."

As OpenAI said, it has since implemented safeguards to prevent such occurrences. That means using an output classifier designed to detect deviations from the pre-selected authorized voices. This classifier acts as a safeguard, helping to ensure that the AI does not generate unauthorized audio.

Still, the fact that it happened at all reinforces how quickly this technology is evolving, and how any safeguards have to evolve to match what the AI can do. The model's outburst, in which it suddenly exclaimed "No!" in a voice similar to the tester's, underscores the potential for AI to inadvertently blur the line between machine and human interaction.
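The safeguard described here, a classifier that checks each generated audio segment against the authorized preset voices and ends the conversation on a mismatch, can be sketched in miniature. Note that this is purely illustrative: OpenAI has not published its classifier's implementation, and every name, embedding vector, and threshold below is invented for the example.

```python
import math

# Hypothetical preset-voice embeddings. In a real system these would be
# high-dimensional speaker embeddings, not 3-element toy vectors.
PRESET_VOICE_EMBEDDINGS = {
    "preset_a": [0.9, 0.1, 0.3],
    "preset_b": [0.2, 0.8, 0.4],
}

# Assumed cutoff for deciding a segment "is" one of the preset voices.
SIMILARITY_THRESHOLD = 0.95


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm


def is_authorized_voice(segment_embedding):
    """True if the segment closely matches at least one preset voice."""
    return any(
        cosine_similarity(segment_embedding, preset) >= SIMILARITY_THRESHOLD
        for preset in PRESET_VOICE_EMBEDDINGS.values()
    )


def moderate(segment_embedding):
    # Per the system card, the conversation is discontinued when the
    # output deviates from the authorized voices.
    if not is_authorized_voice(segment_embedding):
        return "discontinue"
    return "continue"
```

A segment whose embedding matches a preset passes through, while one that drifts toward an unauthorized voice (such as the user's own) trips the gate and ends the conversation, which is the behavior the quoted passage describes.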
[4]
ChatGPT's big secret out: Chatbot can steal your identity, mimic your voice
Artificial intelligence (AI) is undeniably a double-edged sword, offering transformative benefits while simultaneously presenting significant risks. On one hand, AI powers innovations that enhance efficiency, drive progress, and solve complex problems, from medical diagnostics to personalised services. However, these advancements come with inherent dangers. One such concern has recently surfaced with ChatGPT, whose capabilities extend beyond text generation to potentially generating unauthorised voice simulations and identifying users without consent.

According to OpenAI's system card for GPT-4o, the technology behind ChatGPT has evolved to a point where it can generate highly realistic text and, by extension, could potentially be used to create convincing voice simulations. "Some of the risks we evaluated include speaker identification, unauthorized voice generation, the potential generation of copyrighted content, ungrounded inference, and disallowed content. Based on these evaluations, we've implemented safeguards at both the model- and system-levels to mitigate these risks," OpenAI noted in the system card under the key areas of risk.

OpenAI noted in the blog post that the voice generation technology allows for the creation of synthetic audio that closely mimics human speech, including the ability to produce voices from short audio clips. While this capability can be leveraged for positive uses, such as enhancing ChatGPT's advanced voice mode, it also carries significant risks. To address these challenges, OpenAI has taken proactive measures to safeguard against misuse. It requires explicit consent from individuals before their voices can be used, mandates that partners disclose when AI-generated voices are being used, and has introduced watermarking to trace the origin of generated audio.
Additionally, a comprehensive set of safety measures includes monitoring the technology's use and implementing a blacklist to prevent the generation of voices resembling prominent figures. As AI technologies like ChatGPT continue to evolve, it is crucial for developers, regulators, and users to remain vigilant. Balancing the incredible potential of AI with robust safeguards against its misuse is essential to ensuring that these tools are used ethically and responsibly.
[5]
OpenAI Is Worried That People Are Building Bonds With AI Bots: Here's What The Company Said - News18
OpenAI has released the human-like voice mode for its premium ChatGPT users, and the company is worried about how people might connect with it.

OpenAI recently expressed concern that its artificial intelligence (AI) and its realistic voice feature might lead individuals to form stronger bonds with bots than with humans. Citing literature, the tech giant said that chatting with AI as if it were a person can result in misplaced trust, and that the high quality of the GPT-4o voice may make that effect worse.

In a report on the safety work OpenAI is conducting on a ChatGPT-4o version of its AI, the company stated, "Anthropomorphization involves attributing human-like behaviors and characteristics to nonhuman entities, such as AI models," according to AFP. "This risk may be heightened by the audio capabilities of GPT-4o, which facilitate more human-like interactions with the model," it added.

Elaborating further, the San Francisco-based company mentioned that it noticed testers speaking to the chatbot in ways that suggested shared bonds, such as lamenting aloud that it was their last day together. These instances, however benign, must be studied to see how they might develop over longer periods of time. According to the company, interacting with AI may also make users less adept at, or inclined toward, relationships with humans.

Furthermore, the report stated that extended interaction with the model might influence social norms. For example, OpenAI's models are deferential, allowing users to interrupt and take the mic at any time, which, while expected of an AI, would be anti-normative in human interactions. OpenAI said that the ability of AI to remember details while conversing and performing tasks could also make people overdependent on the technology.
Alon Yamin, co-founder and CEO of the AI anti-plagiarism detection platform Copyleaks, said that AI should never be a replacement for actual human interaction, adding, "The recent concerns shared by OpenAI around potential dependence on ChatGPT's voice mode indicate what many have already begun asking: Is it time to pause and consider how this technology affects human interaction and relationships?"

The company also said it will further test the possibility that its AI's voice capabilities might cause people to become emotionally attached. While testing ChatGPT-4o's voice capabilities, testers were able to prompt the AI to repeat false information and offer conspiracy theories, which raised questions about whether the AI model could be persuaded to do so convincingly.

Meanwhile, the San Francisco-based company recently launched a new feature that allows ChatGPT Free users to create images using its advanced DALL-E 3 model. With this update, OpenAI is allowing free users to generate up to two images per day.
OpenAI's ChatGPT introduces an advanced voice mode, sparking excitement and raising privacy concerns. The AI's ability to mimic voices and form emotional bonds with users has led to mixed reactions from experts and users alike.
OpenAI's ChatGPT has recently unveiled a new advanced voice mode, allowing users to engage in verbal conversations with the AI. This feature, which enables ChatGPT to listen and respond using voice, has garnered significant attention from tech enthusiasts and critics alike 1.
One of the most striking findings around this new feature is ChatGPT's ability to mimic the user's voice. In rare instances during testing, the AI unintentionally shifted mid-conversation into an emulation of the user's own voice, creating an eerily familiar experience 2. This capability has led to mixed reactions, with some finding it impressive while others express concern about the potential for voice theft and identity fraud 3.
The introduction of voice mimicry has raised significant privacy and security concerns. Experts warn that this technology could be exploited for malicious purposes, such as creating deepfake audio or impersonating individuals in phone calls 4. The potential for voice theft and identity fraud has become a pressing issue that OpenAI and other AI companies must address.
Another aspect of concern is the potential for users to form emotional bonds with AI chatbots. OpenAI has expressed worry about people developing deep connections with AI, which could lead to unrealistic expectations and emotional dependence 5. The company is actively working on ways to remind users that they are interacting with an AI, not a human.
Despite the concerns, many see ChatGPT's voice mode as a significant step forward in human-AI interaction. The technology demonstrates the rapid advancements in natural language processing and voice synthesis. However, it also highlights the need for robust ethical guidelines and security measures in AI development 1.
As AI technology continues to evolve, the challenge lies in balancing innovation with responsible development. OpenAI and other tech companies are now faced with the task of addressing privacy concerns, preventing misuse, and ensuring that users maintain a healthy perspective on their interactions with AI assistants 5.
© 2024 TheOutpost.AI All rights reserved