Microsoft's MDASH AI system discovers 16 Windows vulnerabilities using 100+ specialized agents

Reviewed by Nidhi Govil


Microsoft unveiled MDASH, a multi-model agentic system that orchestrates over 100 specialized AI agents to find software vulnerabilities at enterprise scale. The system discovered 16 Windows flaws, including four critical remote code execution bugs, and topped the CyberGym benchmark with an 88.45% score, surpassing Anthropic's Mythos and OpenAI's GPT-5.5.

Microsoft's MDASH Redefines AI Cybersecurity at Production Scale

Microsoft has introduced MDASH, a multi-model agentic system that marks a significant shift in how AI cybersecurity tools discover and validate software vulnerabilities. Short for multi-model agentic scanning harness, MDASH orchestrates more than 100 specialized AI agents across multiple frontier and distilled models to autonomously identify, debate, and prove exploitable defects in complex codebases such as Windows. The system has already demonstrated production-grade capability by uncovering 16 Windows vulnerabilities fixed in this month's Patch Tuesday release, including four critical remote code execution flaws in components such as the Windows kernel TCP/IP stack and the IKEv2 service.

Source: Hacker News


The strategic implication is clear: AI vulnerability discovery has crossed from research curiosity into enterprise-scale defense, with Microsoft telling customers to expect larger Patch Tuesday updates going forward as AI accelerates the pace of finding flaws. Unlike single-model approaches from competitors, Microsoft's MDASH achieves what the company calls a "durable advantage" by efficiently leveraging multiple models rather than relying on any single one.

MDASH Outperforms Anthropic and OpenAI on CyberGym Benchmark

Source: GeekWire


Microsoft's MDASH scored 88.45% on the CyberGym benchmark, a test developed by UC Berkeley researchers that measures how well AI systems can reproduce real-world vulnerabilities across 1,507 tasks drawn from 188 open-source software projects. This performance placed it ahead of Anthropic's cybersecurity-focused Claude Mythos Preview, which scored 83.1%, and OpenAI's GPT-5.5, which achieved 81.8%. The British government's AISI, which evaluates AI models, confirmed that both Mythos and GPT-5.5 showed progress well above previous trends on cybersecurity testing, while separately, XBOW released data suggesting frontier models have taken a major step forward in vulnerability discovery.

The benchmark results highlight a key architectural difference: while Mythos is a single AI model running inside an agent framework, MDASH operates as a structured pipeline that ingests codebases and produces validated findings through specialized agent roles at each stage. Microsoft emphasizes that no single model excels at every stage, which is why the AI-powered security system runs a configurable panel of models.

How the Multi-Model Agentic System Discovers and Validates Flaws

Source: SiliconANGLE


MDASH operates through a structured pipeline of prepare, scan, validate, dedup, and prove stages, with specialized AI agents constructed from past CVEs and their patches. The system starts by analyzing source code to build a threat model and attack surface, then runs specialized "auditor" agents over candidate code paths to flag potential issues. A second set of "debater" agents validates the findings, with disagreement between models serving as a signal that increases a finding's credibility.
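The auditor/debater flow described above can be sketched as a toy two-stage pipeline. The class and function names below are illustrative assumptions for the sketch, not MDASH's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """A candidate issue flagged on a code path, with votes from each stage."""
    path: str
    issue: str
    auditor_votes: int = 0
    debater_votes: int = 0

def audit(code_paths, auditors):
    """Stage one: each 'auditor' model reviews every candidate code path
    and flags potential issues; identical flags are merged into one finding."""
    findings = {}
    for path in code_paths:
        for auditor in auditors:
            issue = auditor(path)
            if issue:
                f = findings.setdefault((path, issue), Finding(path, issue))
                f.auditor_votes += 1
    return list(findings.values())

def debate(findings, debaters):
    """Stage two: an independent panel re-examines each finding; only
    findings at least one debater can defend survive to the prove stage."""
    for f in findings:
        f.debater_votes = sum(1 for d in debaters if d(f))
    return [f for f in findings if f.debater_votes > 0]
```

In a real system the auditors and debaters would be LLM calls with different underlying models; here they are plain callables so the staging logic is visible.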

State-of-the-art models handle heavy reasoning, distilled models act as cost-effective validators for high-volume passes, and a separate frontier model provides an independent counterpoint. Domain plugins inject context that foundation models cannot infer on their own, including kernel calling conventions, lock invariants, and interprocess communication trust boundaries.
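A configurable panel of this shape, with domain plugins feeding extra context into prompts, might look like the following sketch. The role names, model identifiers, and plugin keys are invented for illustration and are not MDASH's real schema:

```python
# Hypothetical panel configuration: one heavy reasoner, cheap distilled
# validators for high-volume passes, and a second frontier model as an
# independent counterpoint.
PANEL = {
    "reasoner":     {"model": "frontier-a"},
    "validators":   [{"model": "distilled-1"}, {"model": "distilled-2"}],
    "counterpoint": {"model": "frontier-b"},
}

# Domain plugins supply facts a foundation model cannot infer from source
# alone (the article names calling conventions, lock invariants, and IPC
# trust boundaries as examples).
DOMAIN_PLUGINS = {
    "kernel": ["calling-conventions", "lock-invariants"],
    "ipc":    ["trust-boundaries"],
}

def prompt_context(component):
    """Collect plugin-supplied facts to prepend to a model's prompt for
    the given component; unknown components get no extra context."""
    return DOMAIN_PLUGINS.get(component, [])
```

The point of the separation is that the panel and the plugins can be swapped per target codebase without touching the pipeline stages.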

Critical Windows Vulnerabilities Discovered Include Remote Code Execution Flaws

Among the 16 Windows vulnerabilities discovered by MDASH, four were rated critical, including CVE-2026-33827, a remote unauthenticated use-after-free in tcpip.sys triggered by crafted IPv4 packets, and CVE-2026-33824, a double-free in the IKEv2 service reachable over UDP port 500 that yields code execution as LocalSystem. The flaws spanned the Windows TCP/IP stack, the IKEEXT IPsec service, HTTP.sys, Netlogon, DNS resolution, and the Telnet client, comprising ten kernel-mode and six user-mode vulnerabilities, most reachable from a network position without credentials.

These were not simple bugs that a single-pass scanner would typically surface. The tcpip.sys flaw involved a reference-counted Path object whose ownership was dropped before later reuse, with three independent concurrent free paths in play, while the IKEEXT double-free spanned six source files. In internal testing, MDASH identified all 21 planted vulnerabilities in a private test driver called StorageDrive with zero false positives, and recorded 96% recall on clfs.sys and 100% on tcpip.sys against five years of confirmed Microsoft Security Response Center cases.
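The internal-testing figures are standard detection metrics. A minimal sketch of how recall and false positives are computed, using toy case IDs rather than the real MSRC data:

```python
def recall(found, known):
    """Fraction of known vulnerabilities the scanner rediscovered:
    true positives divided by all known cases."""
    return len(found & known) / len(known)

def false_positives(found, known):
    """Reported findings with no matching known vulnerability."""
    return len(found - known)

# Toy run mirroring the article's shape: 21 planted bugs, all found,
# nothing spurious reported.
known = {f"CASE-{i}" for i in range(21)}
found = set(known)
assert recall(found, known) == 1.0        # 21/21, i.e. 100% recall
assert false_positives(found, known) == 0
```

A 96% recall figure, by the same arithmetic, means the scanner rediscovered 96 of every 100 historically confirmed cases.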

The Arms Race Between AI Defense and Offensive Hacking Tools

The introduction of MDASH highlights growing concerns about AI's dual-use nature in cybersecurity: the same capabilities that let defenders find software vulnerabilities can let attackers discover them for exploitation. Microsoft acknowledges that MDASH "can approximate professional offensive researchers," which is why the company is limiting access to a private preview for select enterprise customers who apply.

Microsoft's security engineering teams have been using MDASH internally, alongside a small set of customers, as part of a limited private preview. The company, which has faced persistent criticism over security lapses, is betting that multiple models can discover vulnerabilities at a pace individual models can't match. Taesoo Kim, vice president of agentic security at Microsoft, stated that "the durable advantage lies in the agentic system around the model rather than any single model itself." Several members of the Autonomous Code Security team came from Team Atlanta, which won first place in the $20 million DARPA AI Cyber Challenge by building an autonomous cyber-reasoning system.


TheOutpost.ai

© 2026 TheOutpost.AI All rights reserved