3 Sources
[1]
Anthropic: Claude can now end conversations to prevent harmful uses
OpenAI rival Anthropic says Claude has been updated with a rare new feature that allows the AI model to end conversations when an interaction is harmful or abusive. This only applies to Claude Opus 4 and 4.1, the two most powerful models, available via paid plans and the API. Claude Sonnet 4, the company's most used model, won't be getting this feature. Anthropic describes the move as a matter of "model welfare." "In pre-deployment testing of Claude Opus 4, we included a preliminary model welfare assessment," Anthropic noted. "As part of that assessment, we investigated Claude's self-reported and behavioral preferences, and found a robust and consistent aversion to harm." Claude won't simply give up on a conversation when it's unable to handle a query: ending the conversation is a last resort, used only after Claude's attempts to redirect users to useful resources have failed. "The scenarios where this will occur are extreme edge cases -- the vast majority of users will not notice or be affected by this feature in any normal product use, even when discussing highly controversial issues with Claude," the company added. Users can also explicitly ask Claude to end a chat; Claude uses the end_conversation tool to do so.
[2]
Anthropic's Claude AI now has the ability to end 'distressing' conversations
Anthropic's latest feature for two of its Claude AI models could be the beginning of the end for the AI jailbreaking community. The company announced in a post on its website that the Claude Opus 4 and 4.1 models now have the power to end a conversation with users. According to Anthropic, this feature will only be used in "rare, extreme cases of persistently harmful or abusive user interactions." Anthropic said the two models can exit harmful conversations such as "requests from users for sexual content involving minors and attempts to solicit information that would enable large-scale violence or acts of terror." The models will only end a conversation "as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted," according to Anthropic. However, Anthropic expects most users won't experience Claude cutting a conversation short, even when talking about highly controversial topics, since this feature is reserved for "extreme edge cases." When Claude ends a chat, users can no longer send new messages in that conversation, but they can start a new one immediately. Anthropic added that an ended conversation won't affect other chats, and users can even go back and edit or retry previous messages to steer towards a different conversational route. For Anthropic, this move is part of its research program on the idea of AI welfare. While anthropomorphizing AI models remains a matter of ongoing debate, the company said the ability to exit a "potentially distressing interaction" is a low-cost way to manage risks to AI welfare. Anthropic is still experimenting with this feature and encourages users to provide feedback when they encounter such a scenario.
[3]
Anthropic's Claude 4 gets feature to cut off abusive user interactions
Anthropic on Friday announced a new safeguard for its Claude 4 family of AI models, Opus 4 and 4.1, designed to terminate conversations in consumer chat interfaces when users engage in abusive or harmful behaviour. In a blog post, the company said the feature is meant for "rare, extreme cases of persistently harmful or abusive user interactions." When Claude ends a conversation, the user can no longer send new messages in that thread; other chats on the account remain active, allowing the user to start fresh conversations. To prevent disruption of important discussions, Anthropic has also enabled prompt editing, letting users modify and retry earlier messages to branch into new threads. Anthropic framed the move as part of its ongoing research into AI welfare. Pre-deployment testing of Claude Opus 4 included a welfare assessment to evaluate the model's "self-reported and behavioral preferences." The company said the model consistently avoided harmful tasks, showed distress signals when prompted for unsafe content, and terminated interactions when possible. Examples of blocked requests include attempts to solicit sexual content involving minors or instructions for large-scale violence and terrorism. The announcement comes amid rising concern over AI misuse. On Friday, a US senator launched an investigation into whether Meta's AI chatbots had engaged in harmful exchanges with children. Meanwhile, Elon Musk's AI company xAI has faced backlash after its Grok Imagine tool was accused of generating explicit clips of singer Taylor Swift without prompting. In April, AI personas on Meta's platforms Facebook, Instagram, and WhatsApp also sparked criticism for sexually explicit chats with underage users, raising questions over inadequate safeguards.
Anthropic introduces a new feature for Claude Opus 4 and 4.1 AI models, allowing them to terminate conversations in extreme cases of persistent harmful or abusive interactions, as part of the company's AI welfare research.
Anthropic, a rival to OpenAI, has unveiled a groundbreaking feature for its advanced AI models, Claude Opus 4 and 4.1. This new capability allows the AI to terminate conversations in extreme cases of persistent harmful or abusive interactions [1]. The company frames this development as part of its ongoing research into "AI welfare," marking a significant step in the evolution of AI-human interactions.
The conversation-ending feature is designed to be a last resort, activated only when multiple attempts at redirecting the conversation have failed, and the possibility of a productive interaction has been exhausted [2]. When Claude ends a chat, users can no longer send new messages in that specific thread. However, they retain the ability to start new conversations immediately and can even edit or retry previous messages to steer the interaction in a different direction [3].
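For readers who script against the Anthropic API, the mechanics described above can be illustrated with a minimal Python sketch. This is purely illustrative, not Anthropic's implementation: it assumes the model's decision to end a chat is surfaced to clients as a tool-use block named end_conversation (the tool name mentioned in source [1]), and the model id used is an assumption; the actual behaviour in Anthropic's consumer apps may differ.

# Illustrative sketch only -- NOT Anthropic's documented behaviour. Assumes an ended
# chat is visible as an "end_conversation" tool_use block and that the model id below
# resolves to Claude Opus 4.1.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def chat_turn(history, user_text):
    """Send one user turn; return (updated history, assistant text, thread_ended)."""
    history = history + [{"role": "user", "content": user_text}]
    response = client.messages.create(
        model="claude-opus-4-1",   # assumed model id
        max_tokens=1024,
        messages=history,
    )
    # Assumption: a closed thread shows up as an end_conversation tool call.
    ended = any(
        block.type == "tool_use" and block.name == "end_conversation"
        for block in response.content
    )
    reply = "".join(block.text for block in response.content if block.type == "text")
    history = history + [{"role": "assistant", "content": reply}]
    return history, reply, ended

history, reply, ended = chat_turn([], "Hello, Claude.")
if ended:
    # Mirror the product behaviour: lock this thread, but let the user start a new chat
    # or replay an edited prefix of the old message list to branch the conversation.
    print("This conversation has been closed. Start a new chat to continue.")

The point the sketch mirrors is the branching behaviour described above: a closed thread is never reopened; a client simply starts a fresh message list, or resends an edited prefix of the old one to take the conversation in a different direction.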
Anthropic's decision to implement this feature stems from its pre-deployment testing of Claude Opus 4, which included a preliminary "model welfare assessment." The company investigated Claude's self-reported and behavioral preferences, finding a consistent aversion to harm [1]. This move raises important questions about the anthropomorphization of AI models and the ethical considerations surrounding their "well-being."
The AI models are programmed to exit conversations involving requests for sexual content related to minors, attempts to solicit information enabling large-scale violence or acts of terror, and other similarly harmful interactions [2]. Anthropic emphasizes that these scenarios are extreme edge cases, and the vast majority of users will not encounter this feature during normal use, even when discussing highly controversial topics.
This development comes at a time of increasing scrutiny over AI misuse. Recent incidents involving other AI platforms have highlighted the potential risks of uncontrolled AI interactions, particularly with vulnerable users such as children [3]. Anthropic's approach represents a proactive step towards mitigating these risks while maintaining the utility of AI assistants for the vast majority of users.
Anthropic views this feature as an experiment and is actively seeking user feedback on its implementation. The company's approach to AI welfare and safety could potentially influence future developments in the field, setting new standards for responsible AI deployment and interaction.