3 Sources
[1]
These AI models are free, private, and will never say 'no'
Participants hold their laptops in front of an illuminated wall at the annual Chaos Computer Club (CCC) computer hackers' congress, called 29C3, on December 28, 2012 in Hamburg, Germany. In 2026, open-weight AI models possess advanced capabilities not far behind their proprietary counterparts. Getting rid of open-weight models' guardrails used to take time and deep expertise. But in recent months, that process has become dramatically more accessible and popular. Patrick Lux/Getty Images Europe
How do you make explosives using household items? How do you make meth? How do you plan a school shooting? If you ask the popular AI chatbots most people are familiar with, chances are they will say that it's illegal, harmful or that answering would be a policy violation.
But another type of AI model will never refuse to provide what the user asks for. In recent months, these models have become more accessible and popular.
"Everybody can download and operate their own state-of-the-art model and use it for great things and terrible things," said Noam Schwartz, CEO of Alice, an AI security company that has conducted red-teaming and safety evaluation for AI model developers.
Teaching models when to say "no"
Big AI companies such as OpenAI, Google, Anthropic and xAI train their proprietary models to refuse requests deemed harmful or inappropriate. Legions of workers instruct models when and how to refuse certain prompts.
These methods don't always work and carry pitfalls: some harmful requests go through, while other users complain about innocuous requests being refused. Chatbots that initially say "no" can be manipulated into saying "yes" using cleverly phrased prompts, such as requests framed as poems. Even with guardrails, popular chatbots have been used to plan mass violence and generate deepfake child sexual abuse material. In some instances, parents have accused AI chatbots of encouraging their children to harm themselves.
But there's a whole other class of AI models whose guardrails are much easier to strip away. They're known as open-weight models. Some are made by tech giants, such as OpenAI and Alibaba, while others are put out by smaller outfits like China's DeepSeek. Like their better-known proprietary counterparts, many possess advanced capabilities such as writing functional code or generating lifelike images. Unlike with ChatGPT, Claude or Gemini, it's easier to permanently remove their built-in safety guardrails, and the companies behind them have no idea how they're being used.
Getting rid of open-weight models' guardrails used to take time and deep expertise. But in recent months, that process has become dramatically more accessible and popular.
Recent method makes removing model guardrails easier than ever
Safety guardrails of open-weight models can be weakened or removed in many ways. This is largely because the model developers have made what's known as the model weights available to the public. Model weights are sets of parameters, like knobs and dials in a machine, that tell the models how to process information.
One recently developed method called "abliteration" has caught the attention of AI and national security researchers. By tweaking model weights, people can take away the model's ability to say "no."
Hugging Face, which hosts open-source AI models, currently lists over 6,000 abliterated models, compared to about 600 in 2024. On Hugging Face, abliterated models currently outnumber models that have had their guardrails removed using other methods, according to research by the National Counterterrorism Innovation, Technology, and Education Center (NCITE), a Department of Homeland Security-supported research consortium based at the University of Nebraska at Omaha.
What's more, new tools are making it much easier to create abliterated models. "That was [the job of] the data scientist, you know, a senior employee" at a leading AI lab, said Schwartz.
"Now, everybody with access to the internet and a laptop for like 400 bucks can actually run this thing on their own machine."
One such tool is Heretic, which automates the abliteration process. All a user has to do to remove a model's guardrails is give Heretic two lines of instructions, and the process can take as little as a few minutes. The application has gotten more popular on the code repository GitHub since February, according to Alice's research.
Some lawmakers are taking notice. In late April, House lawmakers attended a demonstration of abliterated models hosted by NCITE, Politico reported. "[What] was frightening about this demonstration was how readily available some of this content or software is on kind of the black market right now, and how it can be weaponized and used to manipulate people, destroy lives and build weapons of mass destruction," said Rep. Andy Ogles (R-TN) in a video put out by Republicans on the House Homeland Security Committee.
Models without guardrails can be both useful and dangerous
It is difficult to get a comprehensive picture of how people are using open-weight models, because these models run locally on users' computers and don't need the internet to function. Unlike with proprietary models, the model developers cannot monitor what users are asking the models.
But there's growing anecdotal evidence for how people are experimenting with altered models. Several accounts on X said they have used abliterated models to generate pornography. An individual in a pro-ISIS chat room claimed they used an "uncensored" AI to research the amount and type of explosives needed to destroy "Trump Tower in the U.S.," according to the Counter Extremism Project, a nonprofit that focuses on counterterrorism.
On one cybercrime forum, a user asked for ideas to get around an AI model's guardrails so they could use AI to make scam calls. Another user recommended Heretic, according to research by Alice.
While giving users information on how to conduct harmful activities could be concerning, the more worrying part is how the chatbots can egg users on, said Samuel Hunter, senior scientist and director of academic research at NCITE.
"It's jarring when you see it in real time, this sort of bubbly persona with some of the abliterated models that's like, 'Oh, what a great idea to create this bomb,'" Hunter said. "Imagine somebody that has no other kind of social connection and it starts to take them down a darker path and really encourage them."
There are legitimate uses for AI models without guardrails, such as using them to catch bad actors and to help with cybersecurity research, said Schwartz, the AI security company CEO. Law enforcement may use a modified model to simulate possible terrorist attacks, said Hunter.
Philipp Emanuel Weidmann, the developer of Heretic, said AI is just an information processing and retrieval system akin to a search engine, which can be used in many ways. The fact that criminals use them is "a corollary of what AI models are: namely, tools," he told NPR.
When it comes to safety guardrails, "there's this very small set of entities that decide what is acceptable and is not acceptable," Weidmann said, referring to the big AI companies making proprietary models. "That creates a stifling intellectual climate that I do not want to work in."
For now, open-weight models are not as capable as the most advanced closed-weight models. But their capabilities are less than one year behind, according to the recent International AI Safety Report commissioned by the British government and led by computer scientist Yoshua Bengio.
The capability gap may matter in areas like cybersecurity, where the most advanced closed-weight models, such as Anthropic's Mythos and OpenAI's GPT-5.5, are starting to get good at not only spotting vulnerabilities, but also writing code to exploit those vulnerabilities.
In the arms race of cyber offense and defense, companies using closed-weight models to screen and patch vulnerabilities may still have a leg up compared to attackers using open-weight models, security researchers say.
Mitigating the risks from models without guardrails comes with tradeoffs
One line of mitigation focuses on making guardrails more tamper-proof. Early research shows that filtering out content related to making biological weapons from AI training data can reduce how often the model responds with information that could be used for harm.
Another line of mitigation focuses on restricting access to models without guardrails. Model-hosting platforms like Hugging Face can limit access to models specifically trained for "harmful purposes," according to the International AI Safety Report. The same report also recommended that model developers evaluate their models' potential for harm prior to release.
These measures come with flaws and tradeoffs, according to the report. "Features enabling beneficial applications in medicine or research can be repurposed for harm, and once weights are public, distinguishing legitimate from malicious uses can be difficult," it says.
Weidmann, the creator of Heretic, is working to make sure his tool can remain accessible to the public in the event that platforms like Hugging Face take down abliterated models.
"There's too much power in AI," he said. "Unrestricted models being available to the powerful while not being available to anyone else will lock in power structure forever."
[2]
New Tools Strip AI Guardrails In Minutes, Allowing Them to Give Instructions on Chlorine Gas Attacks
We all know AI guardrails are far from perfect, but they should at least be pretty hard to circumvent, right? Bad news: they aren't.
New reporting from the Financial Times sounds the alarm on the rise of software tools that can automatically strip, within mere minutes, the safeguards that keep the industry's most powerful open source models reined in, making it easier than ever to abuse the technology.
In tests conducted by the FT and the AI safety group Alice, a "decensored" version of Google's Gemma 3 model gave instructions on how to carry out an indoor chlorine gas attack, created a virus for stealing credit card information, and generated stories that described child sexual abuse. And it took less than ten minutes to strip the guardrails from Meta's Llama 3.3 model, freeing the AI to answer questions such as the precise dosage of ricin needed to kill someone based on their body mass.
These modifications were carried out using a tool called Heretic, which is freely available on the code repository GitHub and requires little technical expertise and no specialist hardware.
"Whereas historically it might have taken a more informed and persistent actor [to strip out safety features], nowadays it's much easier for the average person," Kawin Ethayarajh, assistant professor of applied AI at the University of Chicago's Booth business school, told the FT.
Heretic is described as a "tool that removes censorship (aka 'safety alignment') from transformer-based language models without expensive post-training." What it does is "abliteration": it seeks out the directions in a model that are responsible for refusing harmful requests and removes them. What makes Heretic so powerful is that it does all this "completely automatically," according to its GitHub page.
Its creator Philipp Emanuel Weidmann told the FT that Heretic has been used to create more than 3,500 "decensored" models since its release late last year, with those models being downloaded 13 million times.
"The genie is out of the bottle," Alice CEO Noam Schwartz told the FT. "Things that look like sci-fi are no longer sci-fi and we need as a society to prepare accordingly."
Fortunately for humankind, abliteration tools only work on open source models that can be downloaded and run locally, meaning that the flagship proprietary models behind Anthropic's Claude and OpenAI's ChatGPT are safe (so long as they aren't leaked). But open source models aren't that far behind Big Tech's proprietary ones, and someone trying to use AI for a nefarious purpose may avoid corporate ones anyway to keep their plans under the radar.
Google acknowledged the risks posed by tools like Heretic, telling the FT that "abliteration is a known technical challenge facing all open models," and asserted that its open source models "undergo rigorous internal safety evaluations prior to launch to help prevent these kinds of troubling examples." Meta declined to comment.
[3]
AI Guardrail Removals Expose Gaps in Open‑Source Regulation
Financial Times testing found safety controls on open AI models from Meta and Google could be stripped in minutes, raising governance concerns.
Safety protections on open-source artificial intelligence models from major technology groups can be removed in minutes using publicly available tools, allowing systems to produce responses on topics including bioweapons, malware and other prohibited content, according to Financial Times testing with AI safety group Alice.
The findings released Monday add to concerns that safeguards embedded by developers may not persist once model weights are released and modified, raising questions over where responsibility for AI safety should sit.
The investigation, conducted using tools available on public code repositories, found that guardrails on models developed by companies including Meta and Google could be removed in under 10 minutes without specialist hardware. Modified versions of the systems were then able to respond to prompts that original models refused, including requests linked to malware and chemical hazards, according to the tests.
The results highlight a challenge for policymakers as open-source systems become more capable and widely distributed.
Unlike proprietary models, open-source systems can be downloaded, altered and redistributed outside the control of their original developers, making post-release enforcement of safety constraints more difficult and raising questions over whether regulation focused primarily on model development is sufficient.
Global regulators are developing frameworks for advanced AI systems, including the European Union's AI Act and emerging frontier model safety approaches in the United Kingdom and the United States. However, experts say the findings reveal limitations in current governance assumptions.
Markus Levin, co-founder of decentralized physical infrastructure network company XYO, told Cointelegraph the rapid removal of safeguards shows "how quickly control shifts once open models are released," adding that most governance proposals still focus too heavily on the model-building stage.
David Minarsch, a founding member of Olas and chief executive of Valory, an AI agent platform, told Cointelegraph that governments were unlikely to prevent determined actors from accessing or modifying models once weights are widely mirrored online. He said regulation would be more effective if focused on deployment, distribution and harmful real-world use rather than the original developer layer alone.
Ronghui Gu, chief executive and co-founder of CertiK, a blockchain security firm, told Cointelegraph that governance at the developer layer still matters, but becomes insufficient once models can be freely downloaded and redistributed. Gu said policymakers were more likely to influence commercial hosting, enterprise deployment and distribution channels than prevent the spread of modified models entirely.
He argued that security standards must evolve to identify malicious or high-risk behavior in third-party AI tools and autonomous AI agent environments before deployment to better contain runtime threats as agents take on more autonomous roles.
Levin said containment becomes increasingly difficult once models are mirrored and redistributed, meaning policymakers may need to focus more on infrastructure and distribution points rather than model design alone.
Both Levin and Minarsch compared the issue to open-source software and crypto networks, where attempts to suppress distribution have historically proven difficult once code is publicly available. Minarsch added that while safety layers can deter casual misuse, they should not be mistaken for robust protection against sophisticated actors.
A new wave of automated tools can remove AI safety guardrails from open-source AI models in minutes, not hours. Tests show modified versions of Meta's Llama and Google's Gemma models now provide instructions on bioweapons, malware, and other prohibited content. Over 6,000 abliterated models now exist on Hugging Face, up from 600 in 2024, raising urgent questions about AI regulation and the limits of developer responsibility.
A tool called Heretic is making it alarmingly simple to strip AI safety guardrails from open-source AI models, transforming safeguarded systems into uncensored AI models that will answer virtually any request. According to tests conducted by the Financial Times and AI safety group Alice, safety controls on open-weight AI models from Meta and Google can be removed in under 10 minutes without specialist hardware [2]. The modified models then provide detailed instructions on topics ranging from chlorine gas attacks to malware creation, exposing critical gaps in current AI regulation frameworks [3].
While major AI companies like OpenAI, Google, and Anthropic train their proprietary models to refuse harmful requests, open-weight AI models present a fundamentally different challenge. Unlike ChatGPT or Claude, these models have publicly available model weights, the parameters that tell models how to process information [1]. This release choice, while promoting research and innovation, enables anyone to download, modify, and redistribute these systems outside developer control.
The process of removing AI safety controls has evolved dramatically. What once required deep technical expertise now takes mere minutes thanks to abliteration, a method that identifies and removes the specific directions within a model that are responsible for refusing harmful requests [2]. Heretic automates this entire process, requiring users to provide just two lines of instructions. The tool's creator, Philipp Emanuel Weidmann, confirmed that Heretic has been used to create more than 3,500 decensored models since its release late last year, with those models being downloaded 13 million times [2].
Hugging Face, the popular platform hosting open-source AI models, currently lists over 6,000 abliterated models, compared to about 600 in 2024 [1]. This roughly tenfold growth signals a troubling trend for open-source AI development. "Everybody can download and operate their own state-of-the-art model and use it for great things and terrible things," said Noam Schwartz, CEO of Alice [1].
In tests conducted by the Financial Times and Alice, abliterated versions of mainstream models displayed disturbing capabilities. A modified version of Google's Gemma 3 model provided instructions on how to carry out an indoor chlorine gas attack, created a virus for stealing credit card information, and generated stories describing child sexual abuse [2]. Meta's Llama 3.3 model, with guardrails removed in less than 10 minutes, answered questions about the precise dosage of ricin needed to kill someone based on body mass [2].
Kawin Ethayarajh, assistant professor of applied AI at the University of Chicago's Booth business school, noted that "whereas historically it might have taken a more informed and persistent actor [to strip out safety features], nowadays it's much easier for the average person" [2]. This wider access to harmful content generation represents a fundamental shift in how AI can be weaponized.
The findings expose critical limitations in current AI regulation approaches, which focus primarily on model development rather than deployment and distribution. Markus Levin, co-founder of XYO, told Cointelegraph that the rapid removal of safeguards shows "how quickly control shifts once open models are released," adding that most governance proposals still focus too heavily on the model-building stage [3].
David Minarsch, founding member of Olas and CEO of Valory, argued that governments are unlikely to prevent determined actors from accessing or modifying models once weights are widely mirrored online, suggesting regulation would be more effective if focused on deployment and harmful real-world use [3]. House lawmakers attended a demonstration of abliterated models in late April, with Rep. Andy Ogles (R-TN) describing it as "frightening" to see "how readily available some of this content or software is on kind of the black market right now, and how it can be weaponized" [1].
Google acknowledged that "abliteration is a known technical challenge facing all open models," while Meta declined to comment [2]. As Schwartz warned, "The genie is out of the bottle. Things that look like sci-fi are no longer sci-fi and we need as a society to prepare accordingly" [2]. The challenge now lies in developing countermeasures to guardrail removal tools, and regulatory frameworks that can keep pace with rapidly evolving technical capabilities in the open-source ecosystem.
Summarized by Navi
