AI models are doubling cybersecurity capabilities every 4 months, outpacing all predictions

Reviewed by Nidhi Govil

The UK AI Security Institute reports that frontier AI models like Anthropic Mythos and OpenAI GPT-5.5 are advancing autonomous cyber capabilities faster than anticipated, with estimated doubling times shrinking from 8 months to just 4 months. Palo Alto Networks discovered 75 vulnerabilities in one month using these models, roughly eight times its usual rate, and warns that organizations have only 3-5 months before AI-driven cyberattacks become the new norm.

AI Models Accelerate Beyond Expectations

The autonomous cyber capabilities of frontier AI models are advancing at a pace that continues to surprise researchers and security professionals. The UK AI Security Institute (AISI) has documented a dramatic acceleration in how quickly AI models in cybersecurity can complete tasks that previously required human experts [1]. In November 2025, AISI estimated that the cyber capabilities of frontier models would double every 8 months. By February 2026, that projection had shrunk to 4.7 months. The release of Anthropic Mythos Preview and OpenAI GPT-5.5 has compressed this timeline even further, with current estimates suggesting a doubling period closer to 4 months [5].

Source: Axios
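To put those doubling times in perspective, the back-of-the-envelope sketch below compounds them over a year, using the 16-minute human-expert horizon quoted for Claude Sonnet 4.5 in the next paragraph as a starting point. It is purely illustrative arithmetic, not an AISI calculation.

```python
# Illustrative arithmetic only: how the quoted doubling times compound.
# The 16-minute starting horizon is the Claude Sonnet 4.5 figure cited below;
# nothing here is AISI data or methodology.

def horizon_after(months: float, start_minutes: float, doubling_months: float) -> float:
    """Task horizon (in human-expert minutes) after `months` of compounding growth."""
    return start_minutes * 2 ** (months / doubling_months)

for doubling_months in (8.0, 4.7, 4.0):
    print(f"doubling every {doubling_months} months -> "
          f"~{horizon_after(12, 16, doubling_months):.0f} human-minutes after one year")
```

At an 8-month doubling time the horizon grows to roughly 45 minutes within a year; at 4 months it reaches about 128 minutes, which is why small changes in the doubling estimate matter so much.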

AISI's time window benchmark measures how much work an AI can accomplish compared to a human cybersecurity professional. For instance, Claude Sonnet 4.5 can complete what a human expert would finish in 16 minutes about 80 percent of the time, given a budget of 2.5 million tokens [1]. The latest Mythos Preview checkpoint solved a 32-step simulated corporate network attack called "The Last Ones" in six of 10 attempts and completed a previously unsolved seven-step industrial control system attack called "Cooling Tower" in three of 10 attempts. These achievements represent significant leaps over earlier models such as Opus 4.6, which completed at most 22 of the 32 steps on The Last Ones in February 2026 [1].
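For readers who want to see how a figure like "16 minutes at 80 percent" could be produced, the sketch below computes a simple time horizon from per-task results. The TaskResult fields, the cumulative bucketing, and the 0.8 threshold are assumptions for illustration, not AISI's actual implementation.

```python
# A minimal sketch of a "time horizon at 80% success" metric; an assumed
# construction for illustration, not AISI's benchmark implementation.

from dataclasses import dataclass

@dataclass
class TaskResult:
    human_minutes: float   # how long a human expert needs for the task
    solved: bool           # whether the model solved it within its token budget

def time_horizon(results: list[TaskResult], target_rate: float = 0.8) -> float:
    """Longest human-time cutoff at which the cumulative solve rate stays >= target_rate."""
    horizon = 0.0
    for cutoff in sorted({r.human_minutes for r in results}):
        bucket = [r for r in results if r.human_minutes <= cutoff]
        rate = sum(r.solved for r in bucket) / len(bucket)
        if rate >= target_rate:
            horizon = cutoff
    return horizon
```

Under this construction, a model with a 16-minute horizon solves at least 80 percent of the tasks that a human expert would finish in 16 minutes or less.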

Palo Alto Networks Uncovers 8x More Vulnerabilities

Palo Alto Networks has provided concrete evidence of these advancing capabilities through real-world testing. Over the past month, the company scanned more than 130 products for software flaws using both Mythos and OpenAI's cyber-focused models, uncovering 75 legitimate vulnerabilities that have since been patched [4]. This represents an eightfold increase over its usual monthly discovery rate of 5-10 vulnerabilities. Chief Product Officer Lee Klarich emphasized that many vulnerabilities stood out because the models were able to identify ways to chain multiple flaws together into working exploit paths, something earlier AI systems struggled to accomplish [4].

The models demonstrated particular proficiency in understanding the logic of how applications work and then identifying how attackers might exploit combinations of weaknesses. During internal testing, Palo Alto Networks found the models generated working exploits more than 70 percent of the time [4]. In several cases, individual software flaws might not have warranted disclosure on their own but became high-severity vulnerabilities when combined. Klarich warned that organizations now have just a three-to-five-month window before AI-driven exploits become the new norm [2]. Concerns about AI-driven cyberattacks have escalated to White House meetings with bank leaders and technology giants [2].
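The chaining point is easiest to see with a toy example. In the sketch below, the individual scores, the 7.0 disclosure bar, and the end-to-end chain impact are all invented numbers for illustration; they are not Palo Alto Networks' scoring.

```python
# Toy illustration of chaining: flaws that individually fall below a
# disclosure threshold can still form a high-severity exploit path.
# All scores here are invented for the example.

DISCLOSURE_THRESHOLD = 7.0  # e.g. "high" severity on a 0-10 scale

individual_flaws = {
    "information leak in an error page": 3.1,
    "predictable session token": 4.4,
    "missing auth check on an internal API": 5.2,
}

# No single flaw clears the bar on its own...
print([name for name, score in individual_flaws.items()
       if score >= DISCLOSURE_THRESHOLD])          # -> []

# ...but a chain that uses the leak to reach the internal API and forge a
# session is assessed on its end-to-end impact, not its weakest link.
chain_impact = 8.6
print(chain_impact >= DISCLOSURE_THRESHOLD)        # -> True
```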

Human Expertise Remains Critical Despite AI Advances

While AI in cybersecurity is advancing rapidly, early adopters have consistently found that the models perform best when paired with experienced security researchers who can validate findings and distinguish exploitable vulnerabilities from noise. XBOW, an AI-powered penetration testing startup, noted that Mythos is "extremely powerful for source code audits" but "good, but less powerful, at validating exploits," and that the model could be "too literal and conservative," sometimes overstating the practical significance of its findings [3].

Palo Alto Networks experienced an average false positive rate of roughly 30 percent across its products, though that rate dropped as the company trained the model on the environment it was searching [3]. The company spent significant time building what Klarich described as an "AI-scanning harness" to feed the models threat intelligence, context, and operational guardrails [4]. Daniel Stenberg, the lead developer of the open-source project Curl, reported that Mythos found one low-severity bug in its code alongside several false positives and another issue Curl ultimately considered insignificant [3].
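Taken together with the 75 confirmed findings above, a 30 percent false positive rate implies a substantial validation workload, as the rough calculation below shows. The assumption that the rate is measured against all reported findings is ours, not something the company specified.

```python
# Rough triage arithmetic from the article's figures (75 confirmed findings,
# ~30% false positive rate). Defining the rate as
# false positives / all reported findings is an assumption for illustration.

confirmed = 75
false_positive_rate = 0.30

total_reported = confirmed / (1 - false_positive_rate)  # findings analysts had to review
false_positives = total_reported - confirmed

print(f"~{total_reported:.0f} findings reviewed, ~{false_positives:.0f} of them noise")
# -> ~107 findings reviewed, ~32 of them noise
```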

Cisco released "Foundry Security Spec," an open-source blueprint for how organizations should think about using advanced AI models. Within the spec documents, Cisco warned that "a frontier model produces fluent, confident, plausible vulnerability claims that are wrong at a rate that makes unreviewed output worthless" [3]. Instead of simply instructing models to be more careful, Cisco researchers found better results when they instructed systems to make claims "checkable" and then explicitly verify their own findings.
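A minimal sketch of what "checkable" claims can look like in practice is shown below. The Claim structure and the triage loop are hypothetical, assumed for illustration rather than taken from Cisco's spec; the idea is simply that every claim carries its own runnable evidence, which is executed before the claim reaches a human.

```python
# Hypothetical sketch of the "make claims checkable, then verify" pattern.
# The field names and the triage function are assumptions for illustration,
# not Cisco's Foundry Security Spec.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Claim:
    summary: str               # e.g. "auth bypass on an admin endpoint"
    check: Callable[[], bool]  # runnable evidence: a reproducer, failing test, or probe

def triage(claims: list[Claim]) -> list[Claim]:
    """Keep only claims whose own evidence actually reproduces."""
    return [claim for claim in claims if claim.check()]
```

The point of the pattern is that an unverifiable claim never reaches a reviewer, which is closer to Cisco's framing than simply asking the model to be more careful.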

Implications for Defenders and Attackers

Microsoft reported that its new agentic security system, which runs on several frontier and distilled models, found 16 new vulnerabilities in the Windows networking and authentication stack. The company warned that AI tools are likely to increase the overall volume of discovered software vulnerabilities over time, creating additional pressure on defenders to triage and patch flaws more quickly [3]. Third-party testing suggests that OpenAI GPT-5.5-Cyber is just as powerful as Mythos at finding bugs and writing exploits [3].

Klarich noted that adversarial hackers won't have the same learning curve when using these tools. "Understanding how attacks work and how you would exploit software and other things like that is the expertise of attackers," he told Axios [3]. Palo Alto Networks is urging organizations to take a four-pronged approach: build the ability to find and patch vulnerabilities before attackers can exploit them, reduce internet-facing exposure, deploy automated detection and prevention tools, and integrate AI and automation into security operations centers so defenders can respond at machine speed [4].

AISI acknowledges that testing was conducted under conservative conditions, with models limited to 2.5 million tokens per task. Without these limits, success rates rise so sharply that time frames become difficult to pin down, suggesting the published statistics may significantly underestimate the power of these models [5]. The AI Security Institute plans to introduce more advanced tests, including new cyber ranges and active cyber defenses. The White House is actively debating proposals for testing and restricting advanced AI models with powerful cybersecurity capabilities before wider deployment [4].
