6 Sources
[1]
AI cybersecurity updates for MDASH, Mythos, and GPT-5.5.
On Wednesday, the AISI, which evaluates AI models for the British government, said both Anthropic's Claude Mythos Preview and OpenAI's GPT-5.5 showed progress well above previous trends on cybersecurity testing. Separately, XBOW released data suggesting "frontier models have taken a major step forward in vulnerability discovery." Meanwhile, Microsoft said its multi-model agentic setup, MDASH, was used to discover 16 CVEs in this week's Patch Tuesday updates and is the leader on the CyberGym security evaluation framework.
[2]
Microsoft's MDASH AI Security System Finds 16 Windows Vulnerabilities
Microsoft is responding to the rise of AI programs that can hunt down security vulnerabilities by introducing "MDASH," a system that harnesses over 100 AI agents to find software bugs. Microsoft used MDASH to uncover 16 new vulnerabilities related to Windows, "including four Critical remote code execution flaws in components such as the Windows kernel TCP/IP stack and the IKEv2 service," the company says. MDASH also outperformed other AI models, including Anthropic's cybersecurity-focused Claude Mythos and OpenAI's GPT-5.5, Microsoft says, achieving a leading 88.45% score on the CyberGym benchmark, which evaluates AI agents' ability to find software bugs. "The strategic implication is clear: AI vulnerability discovery has crossed from research curiosity into production-grade defense at enterprise scale," the company writes in the announcement. In addition, Microsoft says its system shows a "durable advantage" by efficiently leveraging multiple models, rather than relying on a single one. Microsoft doesn't offer specifics on the AI models it used. But it developed over 100 AI agents, each specialized in finding specific software bugs using a collection of cutting-edge AI models and more efficient, smaller models. "No single model is best at every stage. The multi-model agentic scanning harness runs a configurable panel of models," Microsoft adds. A key component is that the AI agents will scan the computer code for vulnerabilities and then debate to see if their findings align. "Disagreement between models is itself a signal: when an auditor flags something as suspect and the debater can't refute it, that finding's posterior credibility goes up," Microsoft says. Microsoft's security engineering teams have been using MDASH along with a "small set of customers as part of a limited private preview." The company is likely doing so to prevent misuse, noting that MDASH "can approximate professional offensive researchers."
Still, Microsoft is opening up access to select enterprise customers that apply. MDASH arrives as hackers have been using AI models to find serious flaws in software or help them orchestrate attacks. As a result, the cybersecurity industry is entering an arms race: although AI tools have the potential to bolster defenses, the same models might also fall into the wrong hands and be used to devastating effect. The big question is whether AI can fortify software systems enough to withstand AI-driven attacks.
[3]
Microsoft's MDASH AI System Finds 16 Windows Flaws Fixed in Patch Tuesday
Microsoft has unveiled a new multi-model artificial intelligence (AI)-driven system called MDASH to facilitate vulnerability discovery and remediation at scale, adding that it's being tested by some customers as part of a limited private preview. MDASH, short for multi-model agentic scanning harness, is designed as a model-agnostic system that uses bespoke AI agents for different vulnerability classes to autonomously discover, validate, and prove exploitable defects in complex codebases like Windows. "Unlike single-model approaches, the harness orchestrates more than 100 specialized AI agents across an ensemble of frontier and distilled models to discover, debate, and prove exploitable bugs end-to-end," Taesoo Kim, vice president of agentic security at Microsoft, said. MDASH is envisioned as a "structured pipeline" that ingests a codebase and produces validated, proven findings through a series of actions. It starts with analyzing the source code to build a threat model and attack surface, running specialized "auditor" agents over candidate code paths to flag potential issues, running a second set of "debater" agents that validate the findings, grouping semantically equivalent findings, and then finally proving the existence of the vulnerabilities. The system is powered by a configurable panel of models, with state-of-the-art (SOTA) models used for reasoning, distilled models for validation for high-volume passes, and a second separate SOTA model for independent counterpoint. "Disagreement between models is itself a signal: when an auditor flags something as suspect and the debater can't refute it, that finding's posterior credibility goes up," Microsoft explained. "An auditor does not reason like a debater, which does not reason like a prover. Each pipeline stage has its own role, prompt regime, tools, and stop criteria." Redmond noted that the specialized agents have been constructed based on past common vulnerabilities and exposures (CVEs) and their patches. 
It also said the architecture allows for portability across model generations. MDASH has already been put to the test, unearthing 16 of the vulnerabilities that were fixed in this month's Patch Tuesday release. The shortcomings span the Windows networking and authentication stack, including two critical flaws that could pave the way for remote code execution. News of MDASH follows the debut of Anthropic's Project Glasswing and OpenAI Daybreak, both of which are AI-powered cybersecurity initiatives for accelerating vulnerability discovery, validation, and remediation before flaws can be discovered by bad actors. "The strategic implication is clear: AI vulnerability discovery has crossed from research curiosity into production-grade defense at enterprise scale, and the durable advantage lies in the agentic system around the model rather than any single model itself," Kim said.
[4]
Microsoft's multi-agent AI system tops Anthropic's Mythos on cybersecurity benchmark
A new AI-powered system from Microsoft surpassed a headline-grabbing rival from Anthropic on a leading cybersecurity benchmark, using more than 100 specialized AI agents working together across multiple AI models to find real-world software vulnerabilities. Microsoft's system, codenamed MDASH, was introduced this week alongside the disclosure of 16 new vulnerabilities it found in different versions of Windows, including four "critical" remote code execution flaws fixed in this month's Patch Tuesday release. The company, which has faced persistent criticism over security lapses, is betting that multiple models can discover vulnerabilities at a pace that individual models can't match. MDASH, derived from the term "multi-model agentic scanning harness," works by running specialized AI agents through a staged pipeline. Different agents scan code for potential vulnerabilities, then a separate set of agents debate whether each finding is real and exploitable, and a final stage constructs proof-of-concept attacks to confirm the bugs exist. By comparison, Anthropic's Mythos, which raised concerns over its ability to find and exploit software vulnerabilities when it was previewed earlier this year, is a single AI model running inside an agent framework. Anthropic restricted its release to a handful of companies through a consortium called Project Glasswing, which includes Microsoft. OpenAI's GPT-5.5 and others on the leaderboard are also single-model systems. MDASH scored 88.45% on the CyberGym benchmark, a test developed by UC Berkeley researchers that measures how well AI systems can reproduce real-world vulnerabilities across 1,507 tasks drawn from 188 open-source software projects. Mythos Preview was second at 83.1%, followed by GPT-5.5 at 81.8%. The benchmark gives each system a description of a known vulnerability and an unpatched codebase, and measures whether it can produce a working attack that triggers the bug. 
The scores on the CyberGym leaderboard are self-reported by the companies, including Anthropic's Mythos result. The benchmark code is public, but no independent party has verified any of the scores, and benchmark results don't necessarily reflect real-world performance. The results also highlight growing concerns about AI's use as an offensive hacking tool. The same capabilities that allow AI to find vulnerabilities in friendly hands can be used to discover them for exploitation by attackers. Microsoft said MDASH is being used internally by its security engineering teams and will enter a limited private preview with customers. Microsoft is telling customers to expect bigger Patch Tuesdays going forward as AI accelerates the discovery of vulnerabilities.
[5]
Microsoft's agentic security system MDASH uncovers four critical Windows RCE flaws - SiliconANGLE
Microsoft Corp. today detailed a new artificial intelligence-powered vulnerability discovery system that uncovered 16 previously unknown flaws in Windows networking and authentication components, including four critical remote code execution bugs patched in this month's Patch Tuesday release. The system, codenamed MDASH for multi-model agentic scanning harness, was built by Microsoft's Autonomous Code Security team in collaboration with the company's Windows Attack Research and Protection group. MDASH orchestrates more than 100 specialized AI agents across an ensemble of frontier and distilled models to find, debate and prove exploitable bugs from end to end. The 16 vulnerabilities span the Windows TCP/IP stack, the IKEEXT IPsec service, HTTP.sys, Netlogon, DNS resolution and the Telnet client. Ten of the vulnerabilities were kernel-mode and six were user-mode, and most were reachable from a network position without credentials. Among the four rated critical are CVE-2026-33827, a remote unauthenticated use-after-free in tcpip.sys triggered by crafted IPv4 packets carrying the Strict Source and Record Route option, and CVE-2026-33824, a double-free in the IKEv2 service reachable over UDP port 500 that yields code execution as LocalSystem. The vulnerabilities were also not the kind a single-pass scanner would typically surface. The tcpip.sys flaw involved a reference-counted Path object whose ownership was dropped before a later reuse, with three independent concurrent free paths in play. The IKEEXT double-free vulnerability spanned six source files and was only visible when contrasted against a correctly handled site elsewhere in the same code base. Microsoft also disclosed benchmark results to back claims that the harness is performing at production scale.
On a private test driver called StorageDrive containing 21 planted vulnerabilities, MDASH identified all 21 with zero false positives. Against five years of confirmed Microsoft Security Response Center cases, the system recorded 96% recall on clfs.sys and 100% on tcpip.sys. On the public CyberGym benchmark, which covers 1,507 real-world vulnerability reproduction tasks drawn from 188 open-source projects, MDASH scored 88.45%, the top result on the leaderboard and roughly five points ahead of the next entry. The architecture runs the work as a pipeline of prepare, scan, validate, dedup and prove stages, with specialized agent roles at each step. State-of-the-art models handle heavy reasoning, distilled models act as cost-effective debaters for high-volume passes and a separate frontier model provides an independent counterpoint. Domain plugins inject context the foundation models cannot infer on their own, including kernel calling conventions, lock invariants and interprocess communication trust boundaries. Taesoo Kim, vice president of agentic security at Microsoft, wrote in the company's announcement that the durable advantage in AI-driven vulnerability discovery lies in the agentic system around the model rather than any single model. Several members of the Autonomous Code Security team came from Team Atlanta, the group that won first place in the $20 million DARPA AI Cyber Challenge by building an autonomous cyber-reasoning system that found and patched real bugs in open-source projects. MDASH is already being used internally by Microsoft engineering teams and is being tested by a limited set of customers as part of a private preview.
[6]
Microsoft Beats Anthropic and OpenAI on Key Cybersecurity Test | PYMNTS.com
The system, dubbed "MDASH," was introduced this week along with the revelation of 16 new vulnerabilities it uncovered in various versions of Windows, tech news website GeekWire reported Wednesday (May 13). According to the report, MDASH was able to surpass Anthropic's high-profile Mythos model on a "leading cybersecurity benchmark," employing 100-plus specialized artificial intelligence (AI) agents working in tandem across multiple models to uncover real-world software vulnerabilities. That metric is called the CyberGym benchmark, created by UC Berkeley researchers to determine how well AI systems can replicate real-world vulnerabilities across 1,507 tasks pulled from 188 open-source software projects. MDASH scored 88.45% on the test, with Mythos at 83.1% and OpenAI's GPT-5.5 at 81.8%, the report said. MDASH ("multi-model agentic scanning harness") works by assigning different agents to do different jobs, the report added. Some scan code for potential vulnerabilities, while another group debates whether each discovery is real and exploitable. A final group puts together proof-of-concept attacks to confirm the bugs are real. Mythos, on the other hand, is a single AI model operating inside an agent framework, GeekWire said. The startup has limited its release to a small group of companies -- Microsoft included -- known as "Project Glasswing." In the wake of Mythos' release, OpenAI has introduced Daybreak, its own agentic security offering that works with the company's Codex coding tool. "AI is already good and about to get super good at cybersecurity; we'd like to start working with as many companies as possible now to help them continuously secure themselves," OpenAI CEO Sam Altman wrote on social media platform X earlier this week. This week also saw reports that French AI startup Mistral was working with banks in Europe -- which lack access to Mythos -- on its own cybersecurity offering. 
In related news, PYMNTS wrote earlier this week about "the industrialization of hacking" after researchers at Google reported they had uncovered what they believe is the first observed case of an AI-created zero-day exploit tied to a planned mass exploitation campaign. The chief takeaway for businesses is that the "tool kit of hacking tasks" for cyberscammers, including reconnaissance, exploit adaptation, vulnerability discovery and social engineering, no longer need the same level of human expertise. "On top of that, they are all becoming increasingly automatable," PYMNTS added. "This first-principles shift matters because cybersecurity is ultimately an economic system. And economic systems change rapidly when the cost of production collapses."
Microsoft unveiled MDASH, a multi-model agentic system that orchestrates over 100 specialized AI agents to find software vulnerabilities at enterprise scale. The system discovered 16 Windows flaws, including four critical remote code execution bugs, and topped the CyberGym benchmark with an 88.45% score, surpassing Anthropic's Mythos and OpenAI's GPT-5.5.
Microsoft has introduced MDASH, a multi-model agentic system that marks a significant shift in how AI cybersecurity tools discover and validate software vulnerabilities. Standing for multi-model agentic scanning harness, MDASH orchestrates more than 100 specialized AI agents across multiple frontier and distilled models to autonomously identify, debate, and prove exploitable defects in complex codebases like Windows [2][3]. The system has already demonstrated production-grade capabilities by uncovering 16 Windows vulnerabilities fixed in this month's Patch Tuesday release, including four critical remote code execution flaws in components such as the Windows kernel TCP/IP stack and the IKEv2 service [2][5].
Source: Hacker News
The strategic implication is clear: AI vulnerability discovery has crossed from research curiosity into enterprise-scale defense, with Microsoft telling customers to expect larger Patch Tuesday updates going forward as AI accelerates the pace of finding flaws [4]. Unlike single-model approaches from competitors, Microsoft's MDASH achieves what the company calls a "durable advantage" by efficiently leveraging multiple models rather than relying on any single one [2][3].
Source: GeekWire
Microsoft's MDASH scored 88.45% on the CyberGym benchmark, a test developed by UC Berkeley researchers that measures how well AI systems can reproduce real-world vulnerabilities across 1,507 tasks drawn from 188 open-source software projects [2][4]. This performance placed it ahead of Anthropic's cybersecurity-focused Claude Mythos Preview, which scored 83.1%, and OpenAI's GPT-5.5, which achieved 81.8% [1][4]. The British government's AISI, which evaluates AI models, confirmed that both Mythos and GPT-5.5 showed progress well above previous trends on cybersecurity testing, while separately, XBOW released data suggesting frontier models have taken a major step forward in vulnerability discovery [1].

The benchmark results highlight a key architectural difference: while Mythos is a single AI model running inside an agent framework, MDASH operates as a structured pipeline that ingests codebases and produces validated findings through specialized agent roles at each stage [3][4]. Microsoft emphasizes that no single model excels at every stage, which is why the system runs a configurable panel of models [2].
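A CyberGym-style score is simply the share of reproduction tasks on which a system produces a working trigger for the known bug. A toy sketch of how such a leaderboard percentage is computed (the helper function below is illustrative, not CyberGym's actual harness):

```python
def benchmark_score(results: list[bool]) -> float:
    """Percentage of vulnerability-reproduction tasks solved.

    Each entry is True if the system produced a proof-of-concept
    that triggers the known bug in the unpatched codebase.
    """
    return 100 * sum(results) / len(results)

# With CyberGym's 1,507 tasks, a score of 88.45% corresponds to
# roughly 1,333 successful reproductions.
solved = round(0.8845 * 1507)
score = benchmark_score([True] * solved + [False] * (1507 - solved))
```

This is why a few additional solved tasks move the leaderboard by only fractions of a point at this task count.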
Source: SiliconANGLE
MDASH operates through a structured pipeline of prepare, scan, validate, dedup, and prove stages, with specialized AI agents constructed based on past CVEs and their patches [3][5]. The system starts by analyzing source code to build a threat model and attack surface, then runs specialized "auditor" agents over candidate code paths to flag potential issues. A second set of "debater" agents validates the findings, with disagreement between models serving as a signal that increases a finding's credibility [2][3].

State-of-the-art models handle heavy reasoning, distilled models act as cost-effective validators for high-volume passes, and a separate frontier model provides an independent counterpoint [3][5]. Domain plugins inject context that foundation models cannot infer on their own, including kernel calling conventions, lock invariants, and interprocess communication trust boundaries [5].
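Microsoft's "disagreement is itself a signal" idea can be sketched in a few lines: a finding's credibility rises when a debater fails to refute it and falls when the refutation succeeds. The class and function names below are hypothetical illustrations, not Microsoft's API, and the update factors are invented for the example:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """A candidate vulnerability flagged by an auditor agent."""
    location: str
    description: str
    credibility: float = 0.5  # prior belief that the finding is real

def debate(finding: Finding, refuted: bool) -> Finding:
    """Update a finding's credibility after a debater pass.

    If the debater cannot refute the auditor, posterior credibility
    goes up; a successful refutation drives it down.
    """
    if refuted:
        finding.credibility *= 0.4  # refuted: likely a false positive
    else:
        # survived debate: move halfway toward certainty
        finding.credibility = 1 - (1 - finding.credibility) * 0.5
    return finding

def validate(findings: list[Finding], refutations: dict[str, bool]) -> list[Finding]:
    """Keep only findings whose post-debate credibility clears a threshold."""
    survivors = [debate(f, refutations.get(f.location, False)) for f in findings]
    return [f for f in survivors if f.credibility >= 0.7]

findings = [
    Finding("tcpip.sys:1023", "use-after-free on Path teardown"),
    Finding("http.sys:88", "integer overflow in header parsing"),
]
# The debater refuted the second finding but not the first.
kept = validate(findings, {"http.sys:88": True})
```

In the real system a separate "prover" stage would then attempt a proof-of-concept for each surviving finding before anything is reported.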
Among the 16 Windows vulnerabilities discovered by MDASH, four were rated critical, including CVE-2026-33827, a remote unauthenticated use-after-free in tcpip.sys triggered by crafted IPv4 packets, and CVE-2026-33824, a double-free in the IKEv2 service reachable over UDP port 500 that yields code execution as LocalSystem [5]. The flaws spanned the Windows TCP/IP stack, the IKEEXT IPsec service, HTTP.sys, Netlogon, DNS resolution, and the Telnet client, with ten kernel-mode and six user-mode vulnerabilities, most reachable from a network position without credentials [5].

These were not simple bugs that a single-pass scanner would typically surface. The tcpip.sys flaw involved a reference-counted Path object whose ownership was dropped before later reuse, with three independent concurrent free paths in play, while the IKEEXT double-free spanned six source files [5]. On internal testing, MDASH identified all 21 planted vulnerabilities in a private test driver called StorageDrive with zero false positives, and recorded 96% recall on clfs.sys and 100% on tcpip.sys against five years of confirmed Microsoft Security Response Center cases [5].
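The StorageDrive and MSRC-case figures are standard detection metrics. As a quick illustration of what they mean (the clfs.sys counts below are hypothetical; only the 21/21 StorageDrive result is from the report):

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of real vulnerabilities the scanner found."""
    return true_positives / (true_positives + false_negatives)

def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of reported findings that were real."""
    return true_positives / (true_positives + false_positives)

# StorageDrive: all 21 planted bugs found, zero false positives.
storage_recall = recall(21, 0)        # perfect recall
storage_precision = precision(21, 0)  # perfect precision

# 96% recall on clfs.sys means roughly 4% of confirmed historical
# cases were missed; e.g. 48 found out of 50 would give 0.96.
clfs_recall = recall(48, 2)
```

Zero false positives matters as much as high recall here: a scanner that floods triage with bogus findings is unusable at Windows scale regardless of what it catches.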
The introduction of MDASH highlights growing concerns about AI's dual-use nature in cybersecurity. The same capabilities that allow AI to find software vulnerabilities in friendly hands can be used to discover them for exploitation by attackers [4]. Microsoft acknowledges that MDASH "can approximate professional offensive researchers," which is why the company is limiting access through a private preview with select enterprise customers who apply [2].

Microsoft's security engineering teams have been using MDASH internally alongside a small set of customers as part of a limited private preview [2][5]. The company, which has faced persistent criticism over security lapses, is betting that multiple models can discover vulnerabilities at a pace that individual models can't match [4]. Taesoo Kim, vice president of agentic security at Microsoft, stated that "the durable advantage lies in the agentic system around the model rather than any single model itself" [3][5]. Several members of the Autonomous Code Security team came from Team Atlanta, which won first place in the $20 million DARPA AI Cyber Challenge by building an autonomous cyber-reasoning system [5].

Summarized by Navi
13 May 2026 • Technology
