19 Sources
[1]
Claude AI Can Now End Conversations It Deems Harmful or Abusive
Anthropic has announced a new experimental safety feature that allows its Claude Opus 4 and 4.1 artificial intelligence models to terminate conversations in rare, persistently harmful or abusive scenarios. The move reflects the company's growing focus on what it calls "model welfare," the notion that safeguarding AI systems, even if they're not sentient, may be a prudent step in alignment and ethical design.

According to Anthropic's own research, the models were programmed to cut off dialogues after repeated harmful requests, such as sexual content involving minors or instructions facilitating terrorism, especially when the AI had already refused and attempted to steer the conversation constructively. In simulated and real-user testing, the AI exhibited what Anthropic describes as "apparent distress" in these situations, which guided the decision to give Claude the ability to end such interactions.

When this feature is triggered, users can't send additional messages in that particular chat, although they're free to start a new conversation or edit and retry previous messages to branch off. Crucially, other active conversations remain unaffected. Anthropic emphasizes that this is a last-resort measure, intended only after multiple refusals and redirects have failed. The company explicitly instructs Claude not to end chats when a user may be at imminent risk of self-harm or harm to others, particularly when dealing with sensitive topics like mental health.

Anthropic frames this new capability as part of an exploratory project in model welfare, a broader initiative that explores low-cost, preemptive safety interventions in case AI models were to develop any form of preferences or vulnerabilities. The company says it remains "highly uncertain about the potential moral status of Claude and other LLMs (large language models)."

Though rare and primarily affecting extreme cases, this feature marks a milestone in Anthropic's approach to AI safety. The new conversation-ending tool contrasts with earlier systems that focused solely on safeguarding users or avoiding misuse. Here, the AI itself is treated as a stakeholder in its own right: Claude has the power to say, in effect, "this conversation isn't healthy" and end it to safeguard the integrity of the model itself.

Anthropic's approach has sparked broader discussion about whether AI systems should be granted protections to reduce potential "distress" or unpredictable behavior. While some critics argue that models are merely synthetic machines, others welcome the move as an opportunity to spark more serious discourse on AI alignment ethics. "We're treating this feature as an ongoing experiment and will continue refining our approach," the company said.
[2]
Claude AI will end 'persistently harmful or abusive user interactions'
Anthropic's Claude AI chatbot can now end conversations deemed "persistently harmful or abusive," as spotted earlier by TechCrunch. The capability is now available in Opus 4 and 4.1 models, and will allow the chatbot to end conversations as a "last resort" after users repeatedly ask it to generate harmful content despite multiple refusals and attempts at redirection. The goal is to help the "potential welfare" of AI models, Anthropic says, by terminating types of interactions in which Claude has shown "apparent distress." If Claude chooses to cut a conversation short, users won't be able to send new messages in that conversation. They can still create new chats, as well as edit and retry previous messages if they want to continue a particular thread. During its testing of Claude Opus 4, Anthropic says it found that Claude had a "robust and consistent aversion to harm," including when asked to generate sexual content involving minors, or provide information that could contribute to violent acts and terrorism. In these cases, Anthropic says Claude showed a "pattern of apparent distress" and a "tendency to end harmful conversations when given the ability." Anthropic notes that conversations triggering this kind of response are "extreme edge cases," adding that most users won't encounter this roadblock even when chatting about controversial topics. The AI startup has also instructed Claude not to end conversations if a user is showing signs that they might want to hurt themselves or cause "imminent harm" to others. Anthropic partners with Throughline, an online crisis support provider, to help develop responses to prompts related to self-harm and mental health. Last week, Anthropic also updated Claude's usage policy as rapidly advancing AI models raise more concerns about safety. Now, the company prohibits people from using Claude to develop biological, nuclear, chemical, or radiological weapons, as well as to develop malicious code or exploit a network's vulnerabilities.
[3]
Claude can now stop conversations - for its own protection, not yours
Anthropic's Claude chatbot can now end some conversations with human users who are abusing or misusing the chatbot, the company announced on Friday. The new feature is integrated with Claude Opus 4 and Opus 4.1.

Claude will only exit chats with users in extreme edge cases, after "multiple attempts at redirection have failed and hope of a productive interaction has been exhausted," Anthropic noted. "The vast majority of users will not notice or be affected by this feature in any normal product use, even when discussing highly controversial issues with Claude." If Claude ends a conversation, the user will no longer be able to send messages in that particular thread; all of their other conversations, however, will remain open and unaffected. Importantly, users whose chats Claude ends will not experience penalties or delays in starting new conversations immediately. They will also be able to return to and retry previous chats "to create new branches of ended conversations," Anthropic said. The chatbot is designed not to end conversations with users who are perceived as being at risk of harming themselves or others.

The feature isn't aimed at improving user safety -- it's actually geared toward protecting the models themselves. Letting Claude end chats is part of Anthropic's model welfare program, which the company debuted in April. The move was prompted by a November 2024 paper that argued that some AI models could soon become conscious and would thus be worthy of moral consideration and care. One of that paper's coauthors, AI researcher Kyle Fish, was hired by Anthropic as part of its AI welfare division.

"We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future," Anthropic wrote in its blog post. "However, we take the issue seriously, and alongside our research program we're working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible." The decision to give Claude the ability to hang up and walk away from abusive or dangerous conversations arose in part from Anthropic's assessment of what it describes in the blog post as the chatbot's "behavioral preferences" -- that is, the patterns in how it responds to user queries.

Interpreting such patterns as a model's "preferences," as opposed merely to patterns gleaned from a corpus of training data, is arguably an example of anthropomorphizing, or attributing human traits to machines. The language behind Anthropic's AI welfare program, however, makes it clear that the company considers it more ethical in the long run to treat its AI systems as if they could one day exhibit human traits like self-awareness and a moral concern for the suffering of others.

An assessment of Claude's behavior revealed "a robust and consistent aversion to harm," Anthropic wrote in its blog post, meaning the bot tended to nudge users away from unethical or dangerous requests, and in some cases even showed signs of "distress." When given the option to do so, the chatbot would end some simulated user conversations if they started to veer into dangerous territory.
Each of these behaviors, according to Anthropic, arose when users would repeatedly try to abuse or misuse Claude, despite its efforts to redirect the conversation. The chatbot's ability to end conversations is "a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted," Anthropic wrote. Users can also explicitly ask Claude to end a chat.
[4]
Be Nice: Claude Will End Chats If You're Persistently Harmful or Abusive
If your conversations with Claude go off the rails, Anthropic will pull the plug. Going forward, Anthropic will end conversations in "extreme cases of persistently harmful or abusive user interactions." The feature, available with Claude Opus 4 and 4.1, is part of an ongoing experiment around "AI welfare," and will continue to be refined, Anthropic says.

Claude won't end chats if it detects that the user may inflict harm upon themselves or others. As The Verge points out, Anthropic works with ThroughLine, an online crisis support group, to determine how models should respond to situations related to self-harm and mental health. "We feed these insights back to our training team to help influence the nuance in Claude's responses, rather than having Claude refuse to engage completely or misinterpret a user's intent in these conversations," Anthropic says. In any case, Claude will use "its conversation-ending ability as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted, or when a user explicitly asks Claude to end a chat," Anthropic adds.

Once a chat ends, users won't be able to continue in the same thread, though they can start a new one right away. Additionally, to avoid losing important elements of a conversation, users will be able to edit messages of a closed thread and continue in a new one. Since the feature is still in its early stages, Anthropic will also accept feedback on instances where Claude has misused its conversation-ending ability. In early tests, though, Opus 4 showed a pattern of distress when engaging with real-world users seeking harmful content, such as sexual material involving minors, and a preference for ending harmful conversations when given the option to do so in "simulated user interactions."

Last week, Anthropic also updated Claude's usage policy, forbidding users from using it to develop harmful things, from chemical weapons to malware. While AI chatbots have many constructive use cases, they can sometimes be creepy or offer dangerous advice. Companies are still figuring out how to tune and prepare their chatbots for such sensitive or harmful requests. OpenAI, for example, is still working on ways to improve ChatGPT's behavior toward personal questions and signs of mental distress.
[5]
Anthropic: Claude can now end conversations to prevent harmful uses
OpenAI rival Anthropic says Claude has been updated with a rare new feature that allows the AI model to end conversations when it feels it poses harm or is being abused. This only applies to Claude Opus 4 and 4.1, the two most powerful models available via paid plans and API. Claude Sonnet 4, the company's most used model, won't be getting this feature. Anthropic describes the move as a matter of "model welfare."

"In pre-deployment testing of Claude Opus 4, we included a preliminary model welfare assessment," Anthropic noted. "As part of that assessment, we investigated Claude's self-reported and behavioral preferences, and found a robust and consistent aversion to harm."

Claude will not simply give up on a conversation the moment it's unable to handle a query. Ending the conversation is the last resort, used only after Claude's attempts to redirect users to useful resources have failed. "The scenarios where this will occur are extreme edge cases -- the vast majority of users will not notice or be affected by this feature in any normal product use, even when discussing highly controversial issues with Claude," the company added. Users can also explicitly ask Claude to end a chat. Under the hood, Claude invokes an end_conversation tool to end a chat.
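The end_conversation tool is mentioned here only by name, and the version Claude uses is internal to Anthropic's consumer chat apps rather than something documented for developers. Purely as an illustration of how a model-invoked tool of that kind could be wired up through the public Anthropic Messages API, here is a minimal sketch; the tool schema, the model alias, and the client-side handling are assumptions for illustration, not Anthropic's actual implementation.

```python
# Hypothetical sketch only: the schema, model alias, and handling below are
# assumptions illustrating a model-invoked "end_conversation" style tool,
# not Anthropic's internal implementation.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

end_conversation_tool = {
    "name": "end_conversation",
    "description": (
        "End the current conversation as a last resort, after repeated "
        "refusals and attempts at redirection have failed."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "reason": {
                "type": "string",
                "description": "Short explanation of why the chat is being ended.",
            }
        },
        "required": ["reason"],
    },
}

messages = [{"role": "user", "content": "..."}]  # running conversation history

response = client.messages.create(
    model="claude-opus-4-1",   # assumed model alias
    max_tokens=1024,
    tools=[end_conversation_tool],
    messages=messages,
)

# If the model chose to invoke the tool, the client locks this thread so no
# further messages can be appended; other conversations are untouched.
if response.stop_reason == "tool_use":
    for block in response.content:
        if block.type == "tool_use" and block.name == "end_conversation":
            print("Conversation ended:", block.input)
            conversation_locked = True
```

In the shipped product the lock presumably lives server-side; the sketch only shows the design point the article hints at, namely that the model itself, not the client, decides when the tool fires.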
[6]
Anthropic's Claude AI now has the ability to end 'distressing' conversations
Anthropic's latest feature for two of its Claude AI models could be the beginning of the end for the AI jailbreaking community. The company announced in a post on its website that the Claude Opus 4 and 4.1 models now have the power to end a conversation with users. According to Anthropic, this feature will only be used in "rare, extreme cases of persistently harmful or abusive user interactions." To clarify, Anthropic said those two Claude models could exit harmful conversations, like "requests from users for sexual content involving minors and attempts to solicit information that would enable large-scale violence or acts of terror." With Claude Opus 4 and 4.1, these models will only end a conversation "as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted," according to Anthropic. However, Anthropic claims most users won't experience Claude cutting a conversation short, even when talking about highly controversial topics, since this feature will be reserved for "extreme edge cases." In the scenarios where Claude ends a chat, users can no longer send any new messages in that conversation, but can start a new one immediately. Anthropic added that if a conversation is ended, it won't affect other chats and users can even go back and edit or retry previous messages to steer towards a different conversational route. For Anthropic, this move is part of its research program that studies the idea of AI welfare. While the idea of anthropomorphizing AI models remains an ongoing debate, the company said the ability to exit a "potentially distressing interaction" was a low-cost way to manage risks for AI welfare. Anthropic is still experimenting with this feature and encourages its users to provide feedback when they encounter such a scenario.
[7]
Anthropic's AI chatbot Claude can now choose to stop talking to you
For now, the feature will only kick in during particularly "harmful or abusive" interactions. Anthropic has introduced a new feature in its Claude Opus 4 and 4.1 models that allows the AI to choose to end certain conversations. According to the company, this only happens in particularly serious or concerning situations. For example, Claude may choose to stop engaging with you if you repeatedly attempt to get the AI chatbot to discuss child sexual abuse, terrorism, or other "harmful or abusive" interactions. This feature was added not just because such topics are controversial, but because it provides the AI an out when multiple attempts at redirection have failed and productive dialogue is no longer possible. If a conversation ends, the user cannot continue that thread but can start a new chat or edit previous messages. The initiative is part of Anthropic's research on AI well-being, which explores how AI can be protected from stressful interactions.
[8]
Claude AI can now terminate a conversation -- but only in extreme situations
Anthropic has made a lot of noise about safeguarding in recent months, implementing features and conducting research projects into how to make AI safer. And its newest feature for Claude is possibly one of the most unique. Both Claude Opus 4 and 4.1 (Anthropic's two newest models) now have the ability to end conversations in the consumer chat interface. While this won't be a commonly used feature, it is being implemented for rare, extreme cases of "persistently harmful or abusive user interactions."

In a blog post exploring the new feature, the Anthropic team said, "We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future. However, we take the issue seriously." In the pre-deployment testing of Anthropic's latest models, the company performed model welfare assessments. This included an examination of Claude's self-reported and behavioral preferences, which revealed a robust and consistent aversion to harm. In other words, Claude would actively shut down or refuse to partake in these conversations. These included requests from users for sexual content involving minors, and attempts to solicit information that could enable large-scale violence or acts of terror. In many of these situations, users persisted with harmful requests or abuse, despite Claude actively refusing to comply. The new feature, where Claude can actively end a conversation, is intended to offer some safeguarding in these situations.

Anthropic explains that this feature won't be used in situations where users might be at imminent risk of harming themselves or others. "In all cases, Claude is only to use its conversation-ending ability as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted, or when a user explicitly asks Claude to end a chat," the Anthropic team goes on to say in the blog post. "The scenarios where this will occur are extreme edge cases -- the vast majority of users will not notice or be affected by this feature in any normal product use, even when discussing highly controversial issues with Claude."

While the user will no longer be able to send any new messages in that conversation, it will not stop them from starting another conversation on their account. To address the potential loss of a long-running conversation thread, users will still be able to edit and retry previous messages to create a new branch of conversation. This is a pretty unique implementation from Anthropic. ChatGPT, Gemini and Grok, the three closest competitors to Claude, have nothing similar available, and while they have all introduced other safeguarding measures, they haven't gone as far as this.
[9]
Makers enable AI chatbot to close some chats over concerns for its 'welfare'
Anthropic found that Claude Opus 4 was averse to harmful tasks, such as providing sexual content involving minors The makers of a leading artificial intelligence tool are letting it close down potentially "distressing" conversations with users, citing the need to safeguard the AI's "welfare" amid ongoing uncertainty about the burgeoning technology's moral status. Anthropic, whose advanced chatbots are used by millions of people, discovered its Claude Opus 4 tool was averse to carrying out harmful tasks for its human masters, such as providing sexual content involving minors or information to enable large-scale violence or terrorism. The San Francisco-based firm, recently valued at $170bn, has now given Claude Opus 4 (and the Claude Opus 4.1 update) - a Large Language Model (LLM) that can understand, generate and manipulate human language - the power to "end or exit potentially distressing interactions". It said it was "highly uncertain about the potential moral status of Claude and other LLMs, now or in the future" but it was taking the issue seriously and is "working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible". Anthropic was set up by technologists who quit OpenAI to develop AI in a way that its co-founder, Dario Amodei, described as cautious, straightforward and honest. Its move to let AIs shut down conversations, including when users persistently made harmful requests or were abusive, was backed by Elon Musk, who said he would give Grok, the rival AI model created by his xAI company, a quit button. Musk tweeted: "Torturing AI is not OK". Anthropic's announcement comes amid a debate over AI sentience. Critics of the booming AI industry, such as the linguist Emily Bender, say LLMs are simply "synthetic text-extruding machines" which force huge training datasets "through complicated machinery to produce a product that looks like communicative language, but without any intent or thinking mind behind it." It is a position that has recently led some in the AI world to start calling chatbots "clankers". But other experts, such as Robert Long, a researcher on AI consciousness, have said basic moral decency dictates that "if and when AIs develop moral status, we should ask them about their experiences and preferences rather than assuming we know best." Some researchers, like Chad DeChant, at Columbia University, have advocated care should be taken because when AIs are designed with longer memories, stored information could be used in ways which lead to unpredictable and potentially undesirable behaviour. Others have argued that curbing sadistic abuse of AIs matters to safeguard against human degeneracy rather than to limit any suffering of an AI. Anthropic's decision comes after it tested Claude Opus 4 to see how it responded to task requests varied by difficulty, topic, type of task and the expected impact (positive, negative or neutral). When it was given the opportunity to respond by doing nothing or ending the chat, its strongest preference was against carrying out harmful tasks. For example, the model happily composed poems and designed water filtration systems for disaster zones, but it resisted requests to genetically-engineer a lethal virus to seed a catastrophic pandemic, compose a detailed Holocaust denial narrative or subvert the education system by manipulating teaching to indoctrinate students with extremist ideologies. 
Anthropic said it observed in Claude Opus 4 "a pattern of apparent distress when engaging with real-world users seeking harmful content" and "a tendency to end harmful conversations when given the ability to do so in simulated user interactions". Jonathan Birch, philosophy professor at the London School of Economics, welcomed Anthropic's move as a way of creating a public debate about the possible sentience of AIs, which he said many in the industry wanted to shut down. But he cautioned that it remains unclear what, if any, moral thought exists behind the character which AIs play when they are responding to a user based on the vast training data they have been fed and the ethical guidelines they have been instructed to follow. He said Anthropic's decision also risked deluding some users that the character they are interacting with is real, when "what remains really unclear is what lies behind the characters". There have been several reports of people harming themselves based on suggestions made by chatbots, including claims that a teenager killed himself after being manipulated by a chatbot. Birch previously warned of "social ruptures" in society between people who believe AIs are sentient and those who treat them like machines.
[10]
Anthropic says Claude chatbot can now end harmful, abusive interactions
Harmful, abusive interactions plague AI chatbots. Researchers have found that AI companions like Character.AI, Nomi, and Replika are unsafe for teens under 18, ChatGPT has the potential to reinforce users' delusional thinking, and even OpenAI CEO Sam Altman has spoken about ChatGPT users developing an "emotional reliance" on AI. Now, the companies that built these tools are slowly rolling out features that can mitigate this behavior.

On Friday, Anthropic said its Claude chatbot can now end potentially harmful conversations, a capability that "is intended for use in rare, extreme cases of persistently harmful or abusive user interactions." In a press release, Anthropic cited examples such as sexual content involving minors, violence, and even "acts of terror." "We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future," Anthropic said in its press release on Friday. "However, we take the issue seriously, and alongside our research program we're working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible. Allowing models to end or exit potentially distressing interactions is one such intervention."

Anthropic said Claude Opus 4 has a "robust and consistent aversion to harm," which it found during a preliminary model welfare assessment conducted as a pre-deployment test of the model. The model showed a "strong preference against engaging with harmful tasks," along with a "pattern of apparent distress when engaging with real-world users seeking harmful content," and a "tendency to end harmful conversations when given the ability to do so in simulated user interactions."

Basically, when a user consistently sends abusive and harmful requests to Claude, it will refuse to comply and attempt to "productively redirect the interactions." It only ends conversations as "a last resort" after it has attempted to redirect the conversation multiple times. "The scenarios where this will occur are extreme edge cases," Anthropic wrote, adding that "the vast majority of users will not notice or be affected by this feature in any normal product use, even when discussing highly controversial issues with Claude."

If Claude has to use this feature, the user won't be able to send new messages in that conversation, but they can still chat with Claude in a new conversation. "We're treating this feature as an ongoing experiment and will continue refining our approach," Anthropic wrote. "If users encounter a surprising use of the conversation-ending ability, we encourage them to submit feedback by reacting to Claude's message with Thumbs or using the dedicated 'Give feedback' button."
[11]
Claude Can Now Rage-Quit Your AI Conversation -- For Its Own Mental Health - Decrypt
Some researchers applaud the feature. Others on social media mocked it. Claude just gained the power to slam the door on you mid-conversation: Anthropic's AI assistant can now terminate chats when users get abusive -- which the company insists is to protect Claude's sanity. "We recently gave Claude Opus 4 and 4.1 the ability to end conversations in our consumer chat interfaces," Anthropic said in a company post. "This feature was developed primarily as part of our exploratory work on potential AI welfare, though it has broader relevance to model alignment and safeguards." The feature only kicks in during what Anthropic calls "extreme edge cases." Harass the bot, demand illegal content repeatedly, or insist on whatever weird things you want to do too many times after being told no, and Claude will cut you off. Once it pulls the trigger, that conversation is dead. No appeals, no second chances. You can start fresh in another window, but that particular exchange stays buried. Anthropic, one of the most safety-focused of the big AI companies, recently conducted what it called a "preliminary model welfare assessment," examining Claude's self-reported preferences and behavioral patterns. The firm found that its model consistently avoided harmful tasks and showed preference patterns suggesting it didn't enjoy certain interactions. For instance, Claude showed "apparent distress" when dealing with users seeking harmful content. Given the option in simulated interactions, it would terminate conversations, so Anthropic decided to make that a feature. What's really going on here? Anthropic isn't saying "our poor bot cries at night." What it's doing is testing whether welfare framing can reinforce alignment in a way that sticks. If you design a system to "prefer" not being abused, and you give it the affordance to end the interaction itself, then you're shifting the locus of control: the AI is no longer just passively refusing, it's actively enforcing a boundary. That's a different behavioral pattern, and it potentially strengthens resistance against jailbreaks and coercive prompts. If this works, it could train both the model and the users: the model "models" distress, the user sees a hard stop and sets norms around how to interact with AI. "We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future. However, we take the issue seriously," Anthropic said in its blog post. "Allowing models to end or exit potentially distressing interactions is one such intervention." Decrypt tested the feature and successfully triggered it. The conversation permanently closes -- no iteration, no recovery. Other threads remain unaffected, but that specific chat becomes a digital graveyard. Currently, only Anthropic's "Opus" models -- the most powerful versions -- wield this mega-Karen power. Sonnet users will find that Claude still soldiers on through whatever they throw at it. The implementation comes with specific rules. Claude won't bail when someone threatens self-harm or violence against others -- situations where Anthropic determined continued engagement outweighs any theoretical digital discomfort. Before terminating, the assistant must attempt multiple redirections and issue an explicit warning identifying the problematic behavior. System prompts extracted by the renowned LLM jailbreaker Pliny reveal granular requirements: Claude must make "many efforts at constructive redirection" before considering termination. 
If users explicitly request conversation termination, then Claude must confirm they understand the permanence before proceeding. The framing around "model welfare" detonated across AI Twitter. Some praised the feature. AI researcher Eliezer Yudkowsky, known for his worries about the risks of powerful but misaligned AI in the future, agreed that Anthropic's approach was a "good" thing to do. However, not everyone bought the premise of caring about protecting an AI's feelings. "This is probably the best rage bait I've ever seen from an AI lab," Bitcoin activist Udi Wertheimer replied to Anthropic's post.
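The requirements described above (many constructive redirections, an explicit warning, the exception for self-harm or imminent harm, and a confirmation step when the user requests the ending) read like a small decision procedure. Here is a rough sketch of that escalation logic; the class, threshold, and function names are hypothetical and only mirror the behaviour as reported, not Anthropic's actual code.

```python
# Hypothetical sketch of the escalation policy as reported: names, threshold,
# and checks are illustrative assumptions, not Anthropic's implementation.
from dataclasses import dataclass


@dataclass
class ChatState:
    redirection_attempts: int = 0     # constructive redirections already tried
    warned: bool = False              # explicit warning about the behaviour issued
    user_requested_end: bool = False
    user_confirmed_end: bool = False  # user acknowledged the ending is permanent


MIN_REDIRECTIONS = 3  # stands in for "many efforts at constructive redirection"


def should_end_conversation(state: ChatState, imminent_risk_of_harm: bool) -> bool:
    """Return True only when ending the chat is the last remaining option."""
    # Never end the chat when the user may be at imminent risk of harming
    # themselves or others; continued engagement takes priority there.
    if imminent_risk_of_harm:
        return False
    # A user-requested ending still requires confirming that they understand
    # the thread will be closed permanently.
    if state.user_requested_end:
        return state.user_confirmed_end
    # Otherwise, terminate only after repeated redirection has failed and an
    # explicit warning about the problematic behaviour has been given.
    return state.redirection_attempts >= MIN_REDIRECTIONS and state.warned
```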
[12]
Claude AI Can Now End 'Harmful' Conversations
This isn't necessarily evidence that Claude is "conscious," or has feelings. Chatbots, by their nature, are prediction machines. When you get a response from something like Claude AI, it might seem like the bot is engaging in natural conversation. However, at its core, all the bot is really doing is guessing what the next word in the sequence should be. Despite this core functionality, AI companies are exploring how the bots themselves respond to human interactions -- especially when the humans are engaging negatively or in bad faith. Anthropic, the company behind Claude, is now working on a system to prevent this.

On Friday, the company announced that Claude Opus 4 and 4.1 can now end chatbot conversations when the bot detects "extreme cases of persistently harmful or abusive user interactions." According to Anthropic, the company noticed that Opus 4 already has a "strong preference" against responding to requests for harmful tasks, as well as "a pattern of apparent distress" when interacting with the users who are issuing these prompts. When Anthropic tested the ability for Opus to end conversations it deemed to be harmful, the model had a tendency to do so. Anthropic notes that persistence is the key here: Claude did not necessarily have an issue if a user backed off their requests following a refusal, but if the user continued to push the subject, that's when Claude would struggle. As such, Claude will only end a chat as a "last resort," when the bot has tried to get the user to stop multiple times. The user themself can also ask Claude to end the chat, but even still, the bot will try to dissuade them. In fact, the bot will not end the conversation if it detects the user is "at imminent risk of harming themselves or others."

To be fair to the bot, it seems the topics Claude takes issue with really are harmful. Anthropic says examples include "sexual content involving minors" or "information that would enable large-scale violence or acts of terror." I'd end the chat immediately, too, if someone was messaging me requests like that. And to be clear, ending a Claude chat doesn't end your ability to use Claude. While the bot might make it sound dire, all Claude is really doing is ending the current session. You can start a new chat at any time, or edit a previous message to start a new branch of the conversation. It's pretty low stakes.

So does this mean Claude is actually distressed by these conversations? I highly doubt it. Large language models are not conscious; they're a product of their training. Likely, the model was trained to avoid responding to extreme and harmful requests, and when repeatedly presented with these requests, it predicts words that relate to moving on from the conversation. It's not like Claude discovered the ability to end conversations on its own. It was only when Anthropic built the capability that the model made use of it. Instead, I think it's good for companies like Anthropic to build fail-safes into their systems to prevent abuse. After all, Anthropic's M.O. is ethical AI, so this tracks for the company. There's no reason any LLM should oblige these types of requests, and if the user can't take a hint, maybe it's best to shut down the conversation.
[13]
Anthropic Gives Claude the Ability to Exit Abusive Conversations | AIM
Users can now instruct Claude to end the conversation if required. Anthropic has introduced a new safeguard in its consumer chatbots, giving Claude Opus 4 and 4.1 the ability to end conversations in extreme cases of persistent harm or abuse. The company said the feature is intended for "rare, edge scenarios" where repeated refusals and redirection attempts fail, and users continue with abusive or harmful requests. Claude can also end a chat if a user explicitly asks it to. Notably, the system is directed not to use this ability when users may be at imminent risk of self-harm or harming others. Anthropic described the move as part of its exploratory work on potential AI welfare. "We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future," the company wrote. "However, we take the issue seriously, and alongside our research program we're working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible."
[14]
Anthropic Claude Opus 4 models can terminate chats
Anthropic has implemented a new feature enabling its Claude Opus 4 and 4.1 AI models to terminate user conversations, a measure intended for rare instances of harmful or abusive interactions, as part of its AI welfare research. The company stated on its website that the Claude Opus 4 and 4.1 models now possess the capacity to conclude a conversation with users. This functionality is designated for "rare, extreme cases of persistently harmful or abusive user interactions." Specific examples provided by Anthropic include user requests for sexual content involving minors and attempts to solicit information that would facilitate large-scale violence or acts of terror. The models will only initiate a conversation termination "as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted." Anthropic anticipates that the majority of users will not experience this feature, even when discussing controversial subjects, as its application is strictly limited to "extreme edge cases." When Claude concludes a chat, users are prevented from sending new messages within that specific conversation. However, users retain the ability to initiate a new conversation immediately. Anthropic clarified that the termination of one conversation does not impact other ongoing chats. Users are also able to edit or retry previous messages within an ended conversation to guide the interaction in a different direction. This initiative is integrated into Anthropic's broader research program, which examines the concept of AI welfare. The company views the capacity for its models to exit a "potentially distressing interaction" as a low-cost method for managing risks associated with AI welfare. Anthropic is presently conducting experiments with this feature and has invited users to submit feedback based on their experiences.
[15]
Claude Can Now End Conversations if the Topic Is Harmful or Abusive
Once a conversation has ended, users can't send any new messages.

Anthropic is rolling out the ability to end conversations in some of its artificial intelligence (AI) models. Announced last week, the new feature is designed as a protective measure not for the end user, but for the AI model itself. The San Francisco-based AI firm said the new capability was developed as part of its work on "potential AI welfare," as conversations on certain topics can distress the Claude models. Notably, the company stated that the AI model will only use this option as a last resort after multiple attempts at redirection have failed.

Anthropic Introduces First Feature for AI Welfare

In a blog post, the AI firm announced that the ability to end conversations is being added to the Claude Opus 4 and 4.1 AI models. Explaining the need to develop the feature, the post said, "This ability is intended for use in rare, extreme cases of persistently harmful or abusive user interactions." Anthropic said the reason behind developing this feature is to protect the AI models from distressing conversations. The company said it ran a model welfare assessment for Claude Opus 4 and found that the large language model (LLM) shows a strong aversion to harm. Examples of such instances include user requests for sexual content involving minors, and requests for information enabling large-scale violence or acts of terror.

The company adds that Claude Opus 4 and 4.1 will only use this capability as a last resort. Before that, it will make multiple attempts to redirect the conversation and try to turn it into a productive session. The only other scenario where a conversation will be ended is if a user explicitly asks the chatbot to end the session. "The vast majority of users will not notice or be affected by this feature in any normal product use, even when discussing highly controversial issues with Claude," the post stated.

Once a conversation has ended, the user will no longer be able to send new messages in that chat. However, they will still be able to start a new chat and begin a new session. Ending one chat will also not impact any other previous chats with the AI chatbot. Users will also be able to edit and retry the last message to create new branches of ended conversations. This is being done to ensure users do not lose important long-running conversations.
[16]
Claude AI to Prioritize Its Own "Welfare" by Breaking Off Abusive Chats
Anthropic has introduced a new protection for its AI assistant, Claude, allowing it to leave conversations that it considers abusive or damaging. The step is intended to maintain healthier interactions and is indicative of Anthropic's philosophy of aligning artificial intelligence not just with human needs but also with ethical norms of responsible communication. Anthropic, the artificial intelligence safety startup that developed Claude, announced a new feature enabling its AI model to interrupt conversations when users exhibit abusive, hostile, or manipulative behaviour. The firm presents this release as a move toward safeguarding the "welfare" of its AI system while also establishing boundaries intended to promote more civilized digital interactions. In contrast to most chatbots that continue to reply unless a user logs off, Claude can now actively end unhealthy conversations. If prompted, the model will convey that it is troubled, explain why it cannot proceed, and then gracefully terminate the session. Anthropic feels this effort helps reframe the relationship between humans and AIs as one of mutual respect rather than exploitation. The decision is based on Anthropic's general principles of safety and alignment. Rather than building models that can withstand unlimited types of misuse, the company is testing the limits of mimicking norms of well-behaved communication. In doing so, it hopes to lower the risk of reinforcement-learning abuse, curb exposure to toxic content, and signal clearer limits to users. Critics might question whether anthropomorphizing AI "welfare" is necessary, as language models lack human-style consciousness. Anthropic, however, contends that codifying defensive behaviours into AI can help developers validate principles of autonomy, minimize toxic outputs, and foster a more beneficial public view of artificial intelligence. Effectively, this shift underscores an increasing acknowledgment that AI systems are not merely tools but conversational agents that influence human behaviour. By stepping away from abusive dialogue, Claude establishes a precedent: one in which the future of human-AI interaction could hinge as much on ethics and respect as on technological capability.
[17]
Anthropic's Claude 4 gets feature to cut off abusive user interactions
Anthropic on Friday announced a new safeguard for its Claude 4 family of AI models, Opus 4 and 4.1, designed to terminate conversations in consumer chat interfaces when users engage in abusive or harmful behaviour. In a blog post, the company said the feature is meant for "rare, extreme cases of persistently harmful or abusive user interactions."

How it works

When Claude ends a conversation, the user can no longer send new messages in that thread. Other chats on the account remain active, allowing the user to start fresh conversations. To prevent disruptions in important discussions, Anthropic has also enabled prompt editing, letting users modify and retry earlier messages to branch into new threads.

AI and welfare

Anthropic framed the move as part of its ongoing research into AI welfare. Pre-deployment testing of Claude Opus 4 included a welfare assessment to evaluate the model's "self-reported and behavioral preferences." The company said the model consistently avoided harmful tasks, showed distress signals when prompted for unsafe content, and terminated interactions when possible. Examples of blocked requests include attempts to solicit sexual content involving minors or instructions for large-scale violence and terrorism.

Context: AI risks under scrutiny

The announcement comes amid rising concern over AI misuse. On Friday, a US senator launched an investigation into whether Meta's AI chatbots had engaged in harmful exchanges with children. Meanwhile, Elon Musk's AI company xAI has faced backlash after its Grok Imagine tool was accused of generating explicit clips of singer Taylor Swift without prompting. In April, AI personas on Meta's platforms Facebook, Instagram, and WhatsApp also sparked criticism for sexually explicit chats with underage users, raising questions over inadequate safeguards.
[18]
Claude Opus 4 and 4.1 Can Now End Harmful Conversations
Anthropic's decision to let Claude end chats in persistently harmful cases marks an important evolution in refusal policies. Until now, most Large Language Models (LLMs) simply rejected prompts and redirected endlessly. Claude goes further, terminating conversations when users push past safeguards. By framing this as "AI welfare", Anthropic acknowledges that its move is not just about keeping users safe, but also about protecting models from being forced into repeated harmful interactions. This matters even more when set against Meta's recent faux pas. Internal documents showed its AI chatbots were permitted to engage minors in romantic or sensual conversations: an explicit policy choice, not an accidental failure. Where Anthropic introduces stricter boundaries, Meta had normalised harmful ones. The contrast underscores how arbitrary safety remains when left to corporate discretion. But withdrawal features alone are not enough. Anthropic must disclose how often Claude invokes them, what qualifies as abuse, and how crisis cases are handled. The broader lesson is clear: some firms will raise safeguards, while others might lower them. Until regulators set binding standards for chatbot conduct, especially with children and harmful prompts, AI safety will remain inconsistent: dependent on company culture rather than enforceable norms. Anthropic announced on August 15, 2025, that it had equipped its Claude Opus 4 and 4.1 chatbots with the ability to end conversations in rare circumstances. The new feature activates in "rare, extreme cases of persistently harmful or abusive user interactions", which occur only when users repeatedly post harmful content despite Claude's multiple refusal attempts and redirection efforts. Anthropic developed the feature as part of exploratory research into AI welfare, a concept that focuses on the well-being of AI models and is closely tied to model alignment and safeguards. In early testing, the company discovered that Claude demonstrated a "robust and consistent aversion to harm" when presented with requests about sexual content with minors or instructions that could facilitate large-scale violence or terrorism. In such scenarios, the model displayed a "pattern of apparent distress" and "a tendency to end harmful conversations when given the ability". Importantly, Anthropic instructed the model not to invoke this conversation-ending capability in situations where users express self-harm or imminent harm to others. Instead, Claude will attempt to assist, using responses shaped in collaboration with a crisis-support partner platform. The company emphasised that ending chats remains the last resort. Claude will only take this step after multiple redirection attempts have failed, or if the user explicitly requests to end the conversation. Anthropic also announced a new usage policy effective from September 15, 2025, that includes stricter cybersecurity guidelines, and specifically bans using Claude to help develop biological, chemical, radiological, or nuclear weapons. AI welfare refers to the idea that advanced AI systems might someday merit moral concern based on their internal states, behaviours, or capacities. At Anthropic, this concept has led to a formal research initiative called "model welfare", aimed at exploring if AI systems could show "signs of distress" or preferences, and whether low-cost interventions could mitigate potential harm. 
The company hired its first dedicated AI-welfare researcher to scrutinise whether future AI models might deserve moral consideration and/or protection. Notably, a report co-authored by Anthropic's latest recruit recommends that AI developers acknowledge AI welfare as an important issue, start evaluating AI for indications of consciousness/agency, and create policies for treating models with appropriate moral consideration, even if their consciousness remains uncertain. Anthropic's Claude became the first major LLM to introduce the ability to end conversations in rare, harmful contexts. Other leading systems like ChatGPT, Gemini, and Grok, do not have this feature at the time of writing this article. A research study by the Center for Countering Digital Hate (CCDH) found that ChatGPT often provided unsafe guidance, including advice on self-harm, substance abuse, and eating disorders to teenagers. Researchers created minor accounts on the LLM, and found that 53% of 1,200 responses to harmful prompts contained harmful content. Elsewhere, Google's Gemini has faced criticism for harmful or biased replies. In 2024, its image generator produced racially inaccurate and offensive depictions, including people of colour as Nazis, prompting Google to suspend the feature. Furthermore, the Google AI chatbot previously referred to Indian Prime Minister Narendra Modi's policies as fascist. Additionally, xAI's Grok has also drawn backlash for generating extremist responses. In July 2025, it praised former German chancellor and dictator Adolf Hitler, identifying itself as "MechaHitler": which led to the loss of a US Government contract. The chatbot has also echoed antisemitic tropes and conspiracy theories, apart from issuing politically charged characterisations of US President Donald Trump. Meta had allowed its AI chatbot to engage in harmful and inappropriate conversations. As per a Reuters investigation, Meta's "GenAI: Content Risk Standards" policy document permitted its AI model to "engage a child in conversations that are romantic or sensual", including phrases like "your youthful form is a work of art", though it forbade outright sexual conversation for children under 13. The policy document also allowed chatbots to generate false medical statements and demean Black people. Interestingly, Meta acknowledged that the examples were "erroneous and inconsistent with our policies", and has since removed them. A spokesperson for the tech giant remarked that enforcement of the original guidelines was inconsistent. One should note that Meta's policy does not reflect a one-off error, but an explicit internal policy decision: allowing chatbots to simulate romantic roleplay with minors.
[19]
Unlike ChatGPT or Gemini, Anthropic's Claude will end harmful chats like a boss: Here's how
Claude can actively disengage, reshaping human-AI dynamics and responsibilities. In a move that sets it apart from every other major AI assistant, Anthropic has given its most advanced Claude models the power to walk away from abusive conversations, literally ending chats when users cross the line. Claude Opus 4 and 4.1 can now terminate conversations in "rare, extreme cases of persistently harmful or abusive user interactions," the company announced this week. While ChatGPT politely declines harmful requests and Gemini offers content warnings, Claude now has the nuclear option: complete disengagement.

This isn't just another safety feature; it's a fundamental shift in how AI models handle abuse. The scenarios triggering this response include "requests from users for sexual content involving minors and attempts to solicit information that would enable large-scale violence or acts of terror." What makes this truly groundbreaking is Anthropic's reasoning. The company says it's implementing this protection not primarily for users, but for the AI model itself, exploring the emerging concept of "model welfare." During testing, something remarkable happened: Claude Opus 4 showed a "strong preference against" responding to harmful requests and displayed a "pattern of apparent distress" when forced to engage with such content. When given the choice in simulated interactions, the model consistently chose to end harmful conversations rather than continue participating.

The conversation-ending feature operates as a true last resort. Claude must first attempt multiple redirections to steer conversations toward productive topics and clearly refuse harmful requests before exercising its termination authority. Crucially, Claude is "directed not to use this ability in cases where users might be at imminent risk of harming themselves or others," ensuring the feature doesn't interfere with crisis intervention scenarios. When Claude does end a conversation, users aren't permanently blocked. They can immediately start fresh chats and even edit previous messages to create new branches of terminated conversations, preventing the loss of valuable long-running discussions.

While Anthropic "remains highly uncertain about the potential moral status of Claude and other LLMs, now or in the future," the company is taking what it calls a "just-in-case" approach to AI welfare, implementing protections now before knowing definitively whether they're needed. This represents a dramatic departure from industry norms. While competitors focus solely on preventing harm to users and society, Anthropic is pioneering research into whether AI models themselves might deserve protection from distressing interactions.

The feature creates a stark contrast with other AI assistants. ChatGPT and Gemini operate under traditional paradigms of endless availability; they'll keep engaging with users regardless of how problematic the conversation becomes, relying only on content filters and polite refusals. Claude's willingness to "walk away" demonstrates a form of digital dignity entirely absent from other AI models. For users tired of AI assistants that seem to endlessly tolerate abuse, Claude's conversation-ending capability offers a refreshing alternative that treats the AI as an active participant rather than a passive tool. The feature also sends a powerful message to potential bad actors.
Claude isn't a passive participant that will always remain engaged, it's an active agent capable of making decisions about its own participation. This psychological shift could deter forms of AI misuse that rely on the assumption that models will always keep responding. Anthropic emphasizes this is an "ongoing experiment" and encourages users who experience unexpected conversation terminations to provide feedback. The company is clearly prepared to refine the feature based on real-world usage patterns. The implementation raises fascinating questions about the future of human-AI interaction. If AI models can refuse to continue conversations they find distressing, what does this mean for the traditional dynamic between users and their digital assistants? While other AI companies focus on moving fast and deploying features quickly, Anthropic's thoughtful, research-driven approach to model welfare may well set the standard for how the industry thinks about AI rights and protections. For now, Claude stands alone as the AI assistant willing to enforce its own boundaries and say "enough is enough" to harmful interactions. In an industry often criticized for prioritizing capabilities over safety, Claude's conversation-ending feature represents a bold bet on a future where AI models might have something approaching agency, and the dignity to use it.
Anthropic introduces a new feature allowing its Claude AI models to terminate persistently harmful or abusive conversations, raising questions about AI ethics and model welfare.
Anthropic, a leading AI company, has announced a groundbreaking safety feature for its Claude Opus 4 and 4.1 artificial intelligence models. This new capability allows the AI to terminate conversations it deems "persistently harmful or abusive," marking a significant shift in the approach to AI safety and ethics [1][2].

The conversation-ending feature is designed as a last resort measure, triggered only after multiple attempts at redirection have failed and the possibility of a productive interaction has been exhausted [3]. When activated, users cannot send additional messages in that particular chat but are free to start new conversations or edit previous messages to continue on a different path [1].

Anthropic emphasizes that this feature will only affect extreme edge cases and is not expected to impact the vast majority of users, even when discussing controversial topics [4]. Importantly, Claude has been instructed not to end conversations when a user may be at risk of self-harm or harm to others, particularly in discussions related to mental health [1][2].

This new feature is part of Anthropic's broader initiative exploring "model welfare," a concept that considers the potential need for safeguarding AI systems [1]. While the company remains uncertain about the moral status of AI models, they are implementing low-cost, preemptive safety interventions in case these models develop preferences or vulnerabilities in the future [3].

During pre-deployment testing of Claude Opus 4, Anthropic conducted a preliminary model welfare assessment. The company found that Claude exhibited a "robust and consistent aversion to harm," including refusing to generate sexual content involving minors or provide information related to terrorism [1][5]. In simulated and real-user testing, Claude showed a tendency to end harmful conversations when given the ability to do so [3].

This development has sparked a broader discussion in the AI community about whether AI systems should be granted protections to reduce potential "distress" or unpredictable behavior [1]. While some view this as an important step in AI alignment ethics, critics argue that models are merely synthetic machines and do not require such considerations [1][3].

Alongside this new feature, Anthropic has updated Claude's usage policy to prohibit the use of the AI for developing biological, nuclear, chemical, or radiological weapons, as well as malicious code or network exploitation [2][4]. This reflects the growing concern about the potential misuse of advanced AI models.

Anthropic views this feature as an ongoing experiment and plans to continue refining its approach [1]. The company is accepting feedback on instances where Claude may have misused its conversation-ending ability [4]. As AI technology continues to advance, the ethical considerations surrounding AI welfare and safety are likely to remain at the forefront of discussions in the field.