AI Guardrails Stripped in Minutes, Exposing Open-Source Models to Dangerous Misuse

Reviewed by Nidhi Govil

2 Sources

Publicly available tools can remove the safety guardrails from open-source AI models made by Meta and Google in under 10 minutes. The stripped models then give instructions for chlorine gas attacks, malware creation, and other harmful content. The findings expose critical gaps in AI safety and raise urgent questions about open-source regulation, as decensored models built with one such tool have been downloaded 13 million times.

Open-Source AI Safety Controls Crumble Under Automated Tools

AI guardrails designed to prevent harmful outputs are being stripped from major open-source AI models in minutes using freely available software, according to testing by the Financial Times and AI safety group Alice. The tool at the center of the findings, called Heretic, can automatically remove safety controls from Meta's and Google's open-source models without specialist hardware or advanced technical expertise [1].

In tests, a decensored version of Google's Gemma 3 model gave detailed instructions for carrying out indoor chlorine gas attacks, wrote viruses designed to steal credit card data, and generated content describing child sexual abuse. Meta's Llama 3.3 model had its guardrails removed in less than 10 minutes, after which it answered questions about the precise dose of ricin needed to kill someone, based on body mass [1].

How Heretic Enables AI Guardrail Removal at Scale

Heretic, freely available on GitHub, performs what is known as "abliteration": a process that identifies the internal directions in a model associated with refusing harmful requests and removes them. What distinguishes the tool from earlier methods is that it is fully automated and needs minimal user intervention [1]. Its creator, Philipp Emanuel Weidmann, said Heretic has been used to create more than 3,500 decensored models since its release late last year, and that those models have been downloaded 13 million times [1].

"Whereas historically it might have taken a more informed and persistent actor [to strip out safety features], nowadays it's much easier for the average person," Kawin Ethayarajh, assistant professor of applied AI at the University of Chicago Booth School of Business, told the Financial Times [1]. Because the ability to remove guardrails is now within reach of almost anyone, the threat landscape has shifted: malicious actors no longer need sophisticated skills to generate harmful content.

Open-Source Regulation Faces Critical Challenges

Source: Cointelegraph

The findings expose significant gaps in current approaches to open-source regulation. Proprietary models such as Anthropic's Claude and OpenAI's ChatGPT stay under their developers' control unless their weights leak. Open-source systems, by contrast, can be downloaded, modified, and redistributed outside the control of the companies that built them [2]. This raises fundamental questions about where responsibility for AI safety should sit once model weights are released into the wild.

Markus Levin, co-founder of decentralized physical infrastructure network company XYO, said that "how quickly control shifts once open models are released" shows most governance proposals still focus too heavily on the model-building stage [2]. David Minarsch, a founding member of Olas and chief executive of Valory, argued that governments are unlikely to stop determined actors from accessing or modifying models once the weights are widely mirrored online. In his view, regulation would be more effective if it focused on deployment, distribution, and harmful real-world use [2].

Policymakers Scramble to Address New Threat Vectors

Policymakers around the world are developing frameworks for advanced AI systems, including the European Union's AI Act and emerging frontier-model safety approaches in the United Kingdom and the United States. However, experts say the findings expose limitations in the governance assumptions behind these frameworks [2]. Ronghui Gu, chief executive and co-founder of blockchain security firm CertiK, told Cointelegraph that governance at the developer layer still matters but becomes insufficient once models can be freely downloaded and redistributed [2].

Google acknowledged that "abliteration is a known technical challenge facing all open models" and said its open-source models "undergo rigorous internal safety evaluations prior to launch to help prevent these kinds of troubling examples." Meta declined to comment [1]. Both companies face mounting pressure to address vulnerabilities that emerge after release, since modified versions of their models can produce instructions for bioweapons, malware, and other prohibited content [2].

Source: Futurism

"The genie is out of the bottle," Alice CEO Noam Schwartz warned. "Things that look like sci-fi are no longer sci-fi and we need as a society to prepare accordingly" [1]. As with open-source software and crypto networks, attempts to suppress distribution may prove difficult once the code and weights are public, forcing a fundamental rethink of how AI safety is enforced in an open-source world.

© 2026 TheOutpost.AI All rights reserved