Anthropic releases Claude Fable 5 with strict safeguards, sparking backlash from researchers

Reviewed by Nidhi Govil

Anthropic unveiled Claude Fable 5, its first publicly available Mythos-class AI model, with unprecedented safety restrictions on cybersecurity, biology, and chemistry topics. The model automatically routes sensitive queries to an older version, Claude Opus 4.8. While Anthropic says the guardrails prevent malicious use, cybersecurity researchers complain the restrictions are overly broad, blocking even routine code reviews and security work.

Anthropic Launches Claude Fable 5 With Unprecedented AI Model Safeguards

Anthropic publicly released Claude Fable 5 on Tuesday, the first time a Mythos-class model has been accessible beyond a select group of trusted partners. The model is a significant advance over previous Claude Opus models, but it arrives with stringent safety measures designed to prevent misuse in high-risk areas including cybersecurity, biology, and chemistry [1]. When users submit queries on these subjects, Claude Fable 5 automatically reroutes them to the earlier Claude Opus 4.8 model and displays a warning notification [1].

The launch comes alongside Claude Mythos 5, which runs on the same underlying technology but remains restricted to organizations approved through Project Glasswing, Anthropic's program for vetting cybersecurity professionals [1]. Anthropic acknowledged it has tuned these guardrails to be "stricter than ideal," accepting that the system may occasionally refuse harmless requests. The company reports that false positives occurred in less than five percent of all sessions during testing, a tradeoff it considers necessary to prevent misuse [1].

Cybersecurity Restrictions Draw Sharp Criticism From Researchers

The cybersecurity restrictions have triggered significant pushback from security professionals, who argue the limitations are too broad and hinder legitimate work. Valentina "Chompie" Palmiotti, a security researcher at IBM X-Force, noted that Claude Fable 5 "rejects any request that could be tangentially cyber related. Even innocuous tasks like reading a blog post" [2]. Matt Suiche, a member of the technical staff at AI cybersecurity startup Tolmo, explained that "if you ask it to write secure code, it assumes it is cybersecurity related work instead of software engineering best practices, and you get downgraded" [2].

Anthropic's concerns center on the model's capability for agentic hacking, which allows it to execute multi-part cyberattacks with far greater facility than earlier models [1]. Testing from the UK's AI Security Institute found that Mythos Preview performed similarly to OpenAI's GPT-5.5 on Capture the Flag challenges, suggesting the performance isn't unique to one model [1]. On the cybersecurity-focused ExploitBench test, Claude Mythos 5 scored 78 percent on vulnerable code exploits, up from Opus 4.8's 40 percent and ahead of Mythos Preview's 69 percent [1].

Anthropic Reverses Policy on Secret AI Research Sabotage

Anthropic faced fierce backlash from the AI research community after initially planning to secretly degrade Claude Fable 5's performance for users attempting frontier AI development work. The company would have invisibly limited the model's capabilities for researchers trying to build competing AI models, a use that Anthropic explicitly bans in its terms of service [3]. Dean Ball, a senior fellow at the Foundation for American Innovation and former White House AI advisor, called the "secret sabotage" policy "shockingly hostile" and noted it undermines collaboration on AI safety [3].

"We're changing Fable 5's safeguards for frontier LLM development to make them visible," Anthropic said in a statement. "We made the wrong tradeoff and we apologize for not getting the balance right" [3]. Will Brown, research lead at open source AI startup Prime Intellect, expressed concern that the policy suggested Anthropic was "starting to pull the ladder up behind them" by limiting who could conduct advanced AI research [3].

Biological Weapons Concerns Expand Safety Protocols

While earlier Anthropic models blocked bioweapons-related queries, the classifier in Claude Fable 5 now applies to all chemistry and biology-related questions. The company worries that "well-resourced malicious actors" could use even seemingly benign queries on these subjects to assist with "highly risky biological research" more effectively than with previous models [1]. Anthropic has described AI-enabled bioterror risks as a horrifying reality that has concerned leading AI labs in recent months [5].

Anthropic is expanding its trusted access program for life sciences organizations, which will remove biology and chemistry safeguards while maintaining cybersecurity restrictions. The expansion is happening "in consultation with the US government" as the company determines who qualifies as trustworthy enough to access potentially dangerous capabilities [1]. Separately, the Cyber Verification Program requires cybersecurity professionals to apply for approval to receive fewer limitations when using Claude for security work, similar to OpenAI's Trusted Access for Cyber program [2].

Pricing and Access Raise Concerns About AI Costs

API and Enterprise users can access Claude Fable 5 at $10 per million input tokens and $50 per million output tokens, prices that are 67 to 100 percent higher than OpenAI's GPT-5.5 [1]. The difference could prove significant, as many enterprises have grown critical of AI costs after blowing through yearly AI budgets early [4].
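To make the per-token pricing concrete, here is a minimal Python sketch of how a request's cost follows from the quoted rates. The helper names `request_cost_usd` and `implied_baseline` and the example token counts are illustrative assumptions, not part of any Anthropic SDK; only the $10 and $50 per-million-token rates come from the reporting above.

```python
from decimal import Decimal

# Rates quoted in the article, in US dollars per million tokens.
FABLE_5_INPUT_PER_MTOK = Decimal("10")
FABLE_5_OUTPUT_PER_MTOK = Decimal("50")
TOKENS_PER_MILLION = Decimal(1_000_000)


def request_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of one request at the quoted rates (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * FABLE_5_INPUT_PER_MTOK
        + Decimal(output_tokens) * FABLE_5_OUTPUT_PER_MTOK
    )
    return total / TOKENS_PER_MILLION


def implied_baseline(price: Decimal, percent_higher: Decimal) -> Decimal:
    """Baseline b such that `price` is `percent_higher` percent above b."""
    return price / (Decimal(1) + percent_higher / Decimal(100))


if __name__ == "__main__":
    # Illustrative workload: 200,000 input tokens and 20,000 output tokens.
    print(request_cost_usd(200_000, 20_000))
    # A price 100 percent above a baseline is double that baseline.
    print(implied_baseline(Decimal("10"), Decimal("100")))
```

Output tokens cost five times as much as input tokens at these rates, so a workload's cost depends heavily on how much text the model generates. The article does not say which end of the 67 to 100 percent range applies to input versus output pricing, so `implied_baseline` only illustrates the percentage arithmetic and should not be read as a statement of GPT-5.5's actual prices.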

Anthropic's existing subscription plans include access to Claude Fable 5 through June 22, after which users will need to purchase usage credits to access the model [1]. The company says it hopes eventually to restore access as a standard part of subscription plans once it has "sufficient capacity" [1]. With the launch, Anthropic will require 30-day retention of all traffic, even for enterprises that previously had zero-retention agreements, to defend against complex attacks and identify false positives [4]. This mandatory data retention policy could set an industry precedent in which access to increasingly powerful models comes with surveillance measures framed as preventing AI misuse [4].

Implications for AI Development and Recursive Self-Improvement

The launch follows Anthropic's recent warning that AI systems are advancing so rapidly they may soon achieve recursive self-improvement, where models autonomously improve themselves without human intervention [4]. In over 1,000 hours of red-team testing with a bug bounty program, external teams failed to find universal jailbreaks for Claude Fable 5, and the model resisted automated jailbreak attempts to a much larger degree than previous Claude Opus models [1].

Third-party testing shows impressive capabilities: analytics company Hex reported Claude Fable 5 was the first model to score 90 percent on its core analytics benchmark of complex, long-running analytical tasks [4]. The model also reportedly beat Pokemon FireRed, something previous models had failed to accomplish [5]. As Anthropic prepares to enter public markets alongside OpenAI and Elon Musk's SpaceX, the company's approach to AI safety measures will face continued scrutiny from both the AI research community and regulators watching how frontier labs balance innovation with responsible deployment [4].

© 2026 TheOutpost.AI All rights reserved