7 Sources
[1]
Mozilla says 271 vulnerabilities found by Mythos have "almost no false positives"
The disbelief was palpable when Mozilla's CTO last month declared that AI-assisted vulnerability detection meant "zero-days are numbered" and "defenders finally have a chance to win, decisively." After all, it looked like part of an all-too familiar pattern: Cherry pick a handful of impressive AI-achieved results, leave out any of the fine print that might paint a more nuanced picture, and let the hype train roll on. Mindful of the skepticism, Mozilla on Thursday provided a behind-the-scenes look into its use of Anthropic Mythos -- an AI model for identifying software vulnerabilities -- to ferret out 271 Firefox security flaws over two months. In a post, Mozilla engineers said the finally ready-for-prime-time breakthrough they achieved was primarily the result of two things: (1) improvement in the models themselves and (2) Mozilla's development of a custom "harness" that supported Mythos as it analyzed Firefox source code. "Almost no false positives" The engineers said their earlier brushes with AI-assisted vulnerability detection were fraught with "unwanted slop." Typically, someone would prompt a model to analyze a block of code. The model would then produce plausible-reading bug reports, and often at unprecedented scales. Invariably, however, when human developers further investigated, they'd find a large percentage of the details had been hallucinated. The humans would then need to invest significant work handling the vulnerability reports the old-fashioned way. Mozilla's work with Mythos was different, Mozilla Distinguished Engineer Brian Grinstead said in an interview. The biggest differentiating factor was use of an agent harness, a piece of code that wraps around an LLM to guide it through a series of specific tasks. For such a harness to be useful, it requires significant resources to customize it to the project-specific semantics, tooling, and processes it will be used for. Grinstead described the harness his team built as "the code that drives the LLM in order to accomplish a goal. It gives the model instructions (e.g., 'find a bug in this file'), provides it tools (e.g., allowing it to read/write files and evaluate test cases), then runs it in a loop until completion." The harness gave Mythos access to the same tools and pipeline human Mozilla developers use, including the special Firefox build they use for testing. He elaborated: With these harnesses, so long as you can define a deterministic and clear success signal or task verification signal, you can just keep telling it to keep working. In our case when we're looking for memory safety issues we have our sanitizer build of Firefox and if you make it crash you win. We point that agent off to a source file and say: "we know there's an issue in this file, please go find it." It will craft test cases. We have our existing fuzzing systems and tools to be able to run those tests. It will say: "I think there's an issue here if I craft the HTML exactly so." It sends it off to a tool, the tool says yes or no. If the tool says yes then there's some additional verification. The additional verification comes in the form of a second LLM that grades the output from the first LLM. A high score gives developers the same confidence they have when viewing reports generated through more traditional discovery methods. "In terms of the bugs coming out on the other side, there are almost no false positives," he said. Thursday's behind-the-scenes view includes the unhiding of full Bugzilla reports for 12 of the 271 vulnerabilities Mozilla discovered using Mythos and to a lesser extent Claude Opus 4.6. The test cases -- meaning the HTML or other code that triggers an unsafe memory condition -- are provided in each one and meet the same criteria Mozilla requires for all bugs to be considered security vulnerabilities in Firefox. At least one researcher said Thursday that a cursory look at the reports showed they were "pretty impressive." Unlike previous vulnerability disclosure slop, Grinstead said, the details provided by its harness-guided Mythos analysis, and confirmed by the second LLM, and ultimately included in the reports, provide a level of confidence his team didn't have before. "That's the key thing that has unlocked our ability to operate at the scale we've been operating at now," he said. "It gives the engineer a crank they can pull that says: 'yep this has the problem,' and then you can iterate on the code and know clearly when you've fixed it and eventually land the test case in the tree such that you don't regress it." As noted earlier, Mozilla's characterization of AI-assisted vulnerability discovery as a game changer has been greeted with massive and vocal amounts of skepticism in many quarters. Critics initially scoffed when Mozilla didn't obtain CVE designations for any of the 271 vulnerabilities. Like many developers, however, Mozilla doesn't obtain CVE listings for internally discovered security bugs. Instead they are bundled into a single patch. Normally Bugzilla reports detailing these "rollups" are hidden for several months after being fixed to protect those who are slow to patch. Now that Mozilla has revealed a dozen of them, the same critics will surely claim they too were cherry picked and conceal less accurate results. The critics are right to keep pushing back. Hype is a key method for inflating the already high puffed-up valuations of AI companies. Given the extensive praise Mozilla has given to Mythos, it's easy for even more trusting people to wonder: What's it getting in return? Far from settling the debate, Thursday's elaborations are likely to only further stoke the controversy. To hear Grinstead tell it, however, the motivation is simple. "People are a bit burned from the last year of these slop commits so we felt it was important to show some of our work, open up some of the bugs, and talk about it in a little more detail as a way to hopefully spur some action or continue the conversation," he said. "There's no sort of marketing angle here. Our team has completely bought in on this approach. We are trying to get a message out about this technique in general and not any specific model provider, company, or anything like that."
[2]
How Anthropic's Mythos has rewritten Firefox's approach to cybersecurity | TechCrunch
When Anthropic unveiled its new Mythos model in April, it also delivered a stern warning to anyone developing software. The model was so powerful at sniffing out software vulnerabilities, the lab claimed, that it had discovered thousands of high-severity bugs that would need to be fixed before it could be made public. Now, security researchers for Mozilla's Firefox browser are providing a closer look at what that process has looked like in practice, and what Mythos' powers mean for software security at large. In a post published on Thursday, Mozilla said Mythos has unearthed a wealth of high-severity bugs, including some that had lain dormant in the code for more than a decade. That's a significant improvement from what AI security tools were capable of even six months ago. Until now, AI bug-finding tools have come with severe drawbacks, often inundating security teams with low quality reports and false positives. But Mozilla's researchers say the latest generation of tools have turned a corner, particularly now that agentic systems can assess their own work and filter out bad results. "It is difficult to overstate how much this dynamic changed for us over a few short months," the researchers wrote. "First, the models got a lot more capable. Second, we dramatically improved our techniques for harnessing these models." The results are striking: In April 2026, Firefox shipped 423 bug fixes, compared to just 31 exactly a year earlier. The researchers have also published details on 12 of the bugs, which range from a pair of unusual sandbox vulnerabilities, to a 15-year-old error in how the browser parses an HTML element. "These things are actually just suddenly very good," Brian Grinstead, a distinguished engineer at Mozilla, told TechCrunch. "We see that on our own internal scanning, we see that on external bug reports, and we see that in all sorts of signals across the industry." The fact that the system helped reveal vulnerabilities in Firefox's "sandbox" system is particularly impressive, given how intricate an attack that exploits it needs to be. To find sandbox vulnerabilities, the model must write a compromised patch for the browser, then attack the most secure part of the software with the new code implemented. Finding and demonstrating the bug is a delicate, multi-step process, requiring both creativity and close attention. To put this into context, Mozilla's bug bounty program pays researchers who can find a bug in Firefox's sandbox up to $20,000 -- the highest reward available. Despite the top-dollar bounty, however, Grinstead says Mythos is finding more sandbox issues than human researchers ever did. "We do get them," he told TechCrunch, "but not at the volume that we are able to find with this technique." Notably, the Firefox team still isn't using AI to fix the bugs, despite well-documented progress in AI coding tools. The team does ask AI to code up patches for each bug, but the resulting code usually can't be deployed directly, and instead serves as a model for a human engineer. "For the bugs we're talking about in this post, every single one is one engineer writing a patch and one engineer reviewing it," Grinstead says. "We have not found it to be automatable." It's still not clear how AI's emerging capabilities will change the broader balance of power in cybersecurity. One month since Mythos was previewed, most of the bugs discovered likely haven't been patched, which makes it hard to capture the full scope of their impact. Anthropic has been scrupulous about following responsible disclosure norms, but it's likely bad actors are using similar techniques behind the scenes, even if the models they're using aren't quite as good. Speaking at a recent event, Anthropic CEO Dario Amodei was optimistic that the new tools would ultimately favor defenders. "If we handle this right, we could be in a better position than we started, because we fixed all these bugs. There are only so many bugs to find," Amodei said. "So I think there's a better world on the other side of this." Having dealt with the gritty details, Grinstead has a more measured view: "It's useful for both attackers and defenders, but having the tool available shifts the advantage a little bit to defense. Realistically, nobody knows the answer to this yet."
[3]
Mythos found 271 Firefox flaws - none a human couldn't spot
Mozilla CTO says AI means developers finally have a chance to get on top of security The Mozilla has revealed it tested Anthropic's bug-finding "Mythos" AI model and feels the results it experienced represent a watershed moment for software defenders. The FOSS outfit on Tuesday reminded readers that it used Anthropic's Opus 4.6 model to look for bugs in Firefox 148 and found 22 bugs. Mythos found 271 vulnerabilities in Firefox 150. Mozilla CTO Bobby Holley expressed mixed feelings about that result, which he described as giving the Firefox team "vertigo" as they confronted the need to fix so many flaws. "For a hardened target, just one such bug would have been red-alert in 2025, and so many at once makes you stop to wonder whether it's even possible to keep up," he wrote. He also thinks the huge haul of bugs Mythos identified represent "light at the end of the tunnel" for security teams. "Our work isn't finished, but we've turned the corner and can glimpse a future much better than just keeping up," he wrote, then turned on Bold text and declared "Defenders finally have a chance to win, decisively. " He offered that prediction because he feels "Until now, the industry has largely fought security to a draw" while acknowledging it's all-but impossible to eliminate all exploits. "Instead, we aimed to make them so expensive that only actors with functionally unlimited budgets can afford them, and that the cost of burning such an expensive asset disincentivizes those actors against casual use," he wrote. Mythos changes the game, he feels, by improving on the fuzzing tools Mozilla uses to find bugs without human intervention. "Elite security researchers find bugs that fuzzers can't largely by reasoning through the source code," he wrote. "This is effective, but time-consuming and bottlenecked on scarce human expertise. "Computers were completely incapable of doing this a few months ago, and now they excel at it. We have many years of experience picking apart the work of the world's best security researchers, and Mythos Preview is every bit as capable. So far we've found no category or complexity of vulnerability that humans can find that this model can't." The CTO thinks Mythos' abilities "can feel terrifying in the immediate term, but it's ultimately great news for defenders." "A gap between machine-discoverable and human-discoverable bugs favors the attacker, who can concentrate many months of costly human effort to find a single bug. Closing this gap erodes the attacker's long-term advantage by making all discoveries cheap." He then hit CTRL-B again, and busted out CTRL-I too, to note "Encouragingly, we also haven't seen any bugs that couldn't have been found by an elite human researcher. " The CTO also poured cold water on those who assert "future AI models will unearth entirely new forms of vulnerabilities that defy our current comprehension." He doesn't think that will happen, because "Software like Firefox is designed in a modular way for humans to be able to reason about its correctness. It is complex, but not arbitrarily complex." "The defects are finite, and we are entering a world where we can finally find them all." ®
[4]
Anthropic's bug-hunting Mythos was greatest marketing stunt ever, says cURL creator
cURL developer Daniel Stenberg has seen Anthropic's Mythos, a model the AI biz has suggested is too capable at finding security holes to release publicly, scan his popular open source project. But after the system turned up just a single vulnerability, he concluded the hype around Mythos was "primarily marketing" rather than a major AI security breakthrough. Stenberg explained in a Monday blog post that he was promised access to Anthropic's Mythos model - sort of - through the AI biz's Project Glasswing program. Part of Glasswing involves giving high-profile open source projects access via the Linux Foundation, but while Stenberg signed up to try Mythos, he said he never actually received direct access to the model. Instead, someone else with access ran Mythos against curl's codebase and later sent him a report. "It's not that I would have a lot of time to explore lots of different prompts and doing deep dive adventures anyway," Stenberg explained. "Getting the tool to generate a first proper scan and analysis would be great, whoever did it." That scan, which analyzed curl's git repository at a recent master-branch commit, was sent back to him earlier this month, and it found just five things that it claimed were "confirmed security vulnerabilities" in cURL. Saying he had expected an extensive list of vulnerabilities, Stenberg wrote that the report "felt like nothing," and that feeling was further validated by a review of Mythos' findings. "Once my curl security team fellows and I had poked on this short list for a number of hours and dug into the details, we had trimmed the list down and were left with one confirmed vulnerability," Stenberg said, bringing us back to the aforementioned number. As for the other four, three turned out to be false positives that pointed out cURL shortcomings already noted in API documentation, while the team deemed the fourth to be just a simple bug. "The single confirmed vulnerability is going to end up a severity low CVE planned to get published in sync with our pending next curl release 8.21.0 in late June," the cURL meister noted. "The flaw is not going to make anyone grasp for breath." That said, Mythos did find several other non-security bugs that Stenberg said the team is working on fixing, and he notes that their description and explanation were well done. Mythos can do good work, in other words, but it's not a ground-breaking, game-changing AI model like Anthropic has claimed. "My personal conclusion can however not end up with anything else than that the big hype around this model so far was primarily marketing," Stenberg said in the blog post. "I see no evidence that this setup finds issues to any particular higher or more advanced degree than the other tools have done before Mythos." To say cURL has become widely used in its nearly three decades of existence would be an understatement. Its wide reach has meant that its team has been running it through all sorts of static code analyzers and fuzz testing it since well before the dawn of the AI age. With AI's rise, the cURL team has adapted, meaning Mythos is hardly the first AI to get its fingers on cURL's codebase. "These tools and the analyses they have done have triggered somewhere between two and three hundred bugfixes merged in curl through-out the recent 8-10 months or so," Stenberg said of tools like AISLE, Zeropath, and OpenAI Codex Security that've tested cURL code. "A bunch of the findings these AI tools reported were confirmed vulnerabilities and have been published as CVEs. Probably a dozen or more." Stenberg's experience with AI testing cURL, in other words, makes it a great candidate to see how effective Mythos can really be at finding more than the average AI. As Stenberg noted elsewhere in his blog post, Mythos isn't doing anything particularly novel when it comes to security discoveries: It might be a bit better at finding things than previous models, but "it is not better to a degree that seems to make a significant dent in code analyzing," the cURL author noted. Stenberg isn't an AI doomer when it comes to its ability to improve software design, though. Yes, he may have closed the cURL bug bounty earlier this year due to an influx of sloppy, useless bug reports, but he also noted a few months prior to the bounty closure that some security researchers assisted by AI have made valuable reports. "AI powered code analyzers are significantly better at finding security flaws and mistakes in source code than any traditional code analyzers did in the past," Stenberg said, adding an important qualifier for the Mythos moment: "All modern AI models are good at this now." Both older AI models and security-focused tools like Mythos have a common limitation, as far as Stenberg is concerned: They're only as good at finding security vulnerabilities as the humans who programmed them. "AI tools find the usual and established kind of errors we already know about. It just finds new instances of them," Stenberg said. "We have not seen any AI so far report a vulnerability that would somehow be of a novel kind or something totally new." As for Mythos, Stenberg remains unimpressed, calling it "an amazingly successful marketing stunt for sure" in his blog post. In an email to The Register, Stenberg admitted that it'd be possible for AI models to actually discover new, novel types of vulnerabilities, but he's still not convinced that they can go beyond what humans are capable of finding, given that they're limited by our understanding of how software vulnerabilities work. At the end of the day, Stenberg explained, when we talk about security, we're only talking about code. "Source code is text and it feels like maybe we already know about most ways we can do security problems in it," he pondered in his email. In other words, like the valuable AI-assisted reports made to the cURL bug bounty program before its closure due to a flood of AI garbage, making valuable use of systems like Mythos is going to require humans to get creative. Sorry, no foisting your critical thinking onto a bot. "Human researchers have always used tools when they look for security problems," Stenberg told us. "Adding AIs to the mix gives the humans even more powerful tools to use, more ways to find problems. I expect that many security bugs going forward will be found by humans coming up with new ways and angles of prompting the AIs." Stenberg said that he hopes he'll actually get his hands on Mythos so he can experiment with its capabilities, but he doesn't seem to be holding out hope the promised access will materialize. "I have been promised access and for all I know I will eventually get it," Stenberg told us. "I just don't know when." ®
[5]
Mozilla boasts Mythos boosted Firefox bug cull
Yet it remains unclear if Anthropic's uber model was effective, or if better model middleware is what makes the difference Mozilla fixed 423 Firefox security bugs in April, a repair rate more than five times higher than the 76 fixes issued in March and almost 20 times higher than its 21.5 monthly average last year. The browser maker previously said Anthropic's ballyhooed Mythos Preview model found 271 of these in Firefox 150. Now, a trio of technical types has come forward to provide a bit more detail about what Mythos (and its less storied sibling Opus 4.6) actually found. But they also highlight something that may matter more than the model: the agentic harness - the middleware mediating between AI and the end user. Brian Grinstead, Firefox distinguished engineer, Christian Holler, Firefox tech lead, and Frederik Braun, head of the Firefox security team, observe that over the past few months, AI-generated security reports have gone from slop to rather more tasty. They attribute the transformation to better models and development of better ways of harnessing those models - steering them in a way that increases the ratio of signal to noise. But they also appear to be aware that there's some skepticism in the security community about Mythos. So they've decided to publicize selected wins in an effort to encourage others to jump aboard the AI bug remediation train. "Ordinarily we keep detailed bug reports private for several months after shipping fixes and issuing security advisories, largely as a precaution to protect any users who, for whatever reason, were slow to update to the latest version of Firefox," they said. "Given the extraordinary level of interest in this topic and the urgency of action needed throughout the software ecosystem, we've made the calculated decision to unhide a small sample of the reports behind the fixes we recently shipped." The post links to a dozen Firefox bugs with varying degrees of severity. The list includes, for example, a 20-year-old heap use-after-free bug (high severity) that a web page could trigger using the XSLTProcessor DOM API without any user interaction. Many of these bugs are sandbox escapes, they note, which are difficult to find using techniques like fuzzing. AI analysis, they say, helps provide broader security coverage. And they add that it has helped validate prior browser hardening work designed to prevent prototype pollution attacks - audit logs showed AI models making unsuccessful exploitation attempts using this technique. Following Anthropic's announcement of Project Glasswing - a program for companies to gain early access to Mythos because it's touted as too dangerous for public release - security experts expressed skepticism. For example, Davi Ottenheimer, president of security consultancy flyingpenguin, wrote in an April 13 blog post, "The supposedly huge Anthropic 'step change' appears to be little more than a rounding error. The threat narrative so far appears to be ALL marketing and no real results. The Glasswing consortium is regulatory capture dressed up poorly as restraint." He subsequently ran a test in which he strapped Anthropic's lesser models Sonnet 4.6 and Haiku 4.5 into a harness called Wirken with an auditing skill called Lyrik. The result was eight findings in two minutes at a cost of about $0.75, Ottenheimer claims, noting that two of the eight matched bugs Mythos had identified. Other security folk have also reported that bug hunting and exploit development can be quite productive with off-the-shelf models like Opus 4.6, which among other virtues costs about 5x less than Mythos. In an email to The Register, Ottenheimer said, "There's a fundamental philosophical failure in the Mozilla post. A reading and a measurement are not the same thing. I don't see a measurement, but they seem to want us to believe we're looking at one. "When they give us the 'behind the scenes math' it's circular, a trick. 'Mythos found 271 bugs' is what Mythos found, not what other tools could not find against the same code. Why leave it as an assumption if it can be proven?" Ottenheimer said Mozilla advocates that every project adopt a similar approach without proving the merits of that approach. "It's like saying if you don't drink Coca-Cola, you can't run a mile under six minutes, because that's what a guy sponsored by Coca-Cola just did," he said. "The bar moves on rhetoric, marketing, not proper evidence. That is the capture crew again." He notes that the merits of Mythos might be more convincing if Mozilla had reported they couldn't do this work without Mythos. And since they're not saying that, he suggests, it's worth asking why there's no transparent comparison of Mythos to other models. He points to Mozilla's admission that Opus 4.6 was already identifying "an impressive amount of previously unknown vulnerabilities." "Mozilla never quantifies what Opus 4.6 [did] before saying what Mythos added," he said. "So 271 attributed to Mythos doesn't fit the analysis. And there's a deeper reveal when they say 'we dramatically improved our techniques for harnessing these models.' The improvement may be entirely in the harness, not as much in the model. This maps to my own experience. A nail gun has advantages over the hammer, yet without being in the right hands the outputs are as bad or worse." ®
[6]
Claude Mythos Just Flagged 271 Hidden Vulnerabilities in Firefox
Mozilla's Claude Mythos AI experiment has unveiled a striking new chapter in software vulnerability detection. By employing advanced AI to analyze the Firefox 150 codebase, the project identified 271 vulnerabilities in a single release cycle, an extraordinary leap from the 22 issues found in a prior evaluation. This effort, as highlighted by Nate Jones, underscores the limitations of traditional manual code reviews, which often fail to catch critical flaws due to human oversight. The findings not only emphasize the precision of AI-driven analysis but also challenge long-standing assumptions about the reliability of human-written code in making sure software security. Dive into this analysis to understand how AI is reshaping vulnerability detection and its broader implications for software development. You'll gain insight into the role of agentic pipelines in streamlining workflows, the importance of maintaining clean codebases for effective AI integration and the evolving responsibilities of engineers in an AI-driven landscape. These shifts signal a pivotal moment for the industry, offering a glimpse into how organizations can adapt to enhance security, efficiency and collaboration in their development practices. AI tools like Anthropic's Mythos are reshaping how vulnerabilities are detected, offering capabilities that far surpass traditional methods. Unlike manual code reviews, which depend on human scrutiny and are prone to oversight, AI systematically analyzes vast amounts of code with unmatched precision. For example, in Firefox 150, Mythos identified critical flaws across multiple modules, revealing weaknesses in human oversight and the limitations of conventional security practices. This development underscores AI's potential to become an indispensable asset in software development, allowing organizations to address vulnerabilities more efficiently and at scale. Beyond efficiency, AI's ability to detect vulnerabilities early in the development cycle reduces the risks associated with deploying flawed software. By identifying issues before they escalate, AI-driven tools can help organizations save time, reduce costs and maintain user trust. As these tools evolve, their integration into development workflows will likely become a standard practice, making sure that software systems remain secure in an increasingly complex digital landscape. For decades, human-written code has been regarded as the cornerstone of software development, trusted for its reliability and intent. However, the Mythos experiment has revealed the inherent vulnerabilities in this approach. Even the most experienced developers can overlook critical flaws, as demonstrated by AI's exhaustive analysis. This finding challenges the long-held assumption that human-written code is inherently secure, reframing it as a potential risk factor. This shift in perspective has significant implications for the software industry. Organizations must now reevaluate their reliance on human-written code and consider how AI can complement traditional development practices. By doing so, they can address the limitations of manual coding and enhance the overall security and quality of their software systems. The integration of AI into development workflows represents not just a technological advancement but also a cultural shift in how software is created and maintained. Deep dive into the latest in Anthropic's Mythos AI by exploring our other resources and articles. The rise of AI in software development is driving a fundamental transformation in engineering roles and practices. As AI takes on tasks such as vulnerability detection and code review, developers are being freed from repetitive, low-level responsibilities. This shift allows them to focus on higher-level tasks, including defining system intent, designing architectures and setting boundaries for AI-driven processes. To adapt to this new reality, engineers must develop new skills and embrace a mindset of continuous learning. They will need to understand how to collaborate with AI tools effectively, making sure that these systems align with organizational goals and deliver accurate results. This transformation is not just about adopting new technologies but also about redefining the role of engineers in an AI-driven world. The future of software development lies in agentic pipelines -- automated workflows that seamlessly integrate AI tools like Mythos. These pipelines emphasize modularity, allowing organizations to incorporate advanced AI systems into their existing development processes without disrupting operations. By automating repetitive and error-prone tasks, agentic pipelines can enhance efficiency, reduce human error and improve overall software quality. Key benefits of agentic pipelines include: As organizations adopt agentic pipelines, they will be better positioned to innovate while maintaining high standards of security and reliability. This approach represents a significant step forward in the evolution of software engineering, allowing teams to focus on creativity and strategic decision-making. Maintaining clean, readable code is essential for maximizing the effectiveness of AI tools. Poor code hygiene not only hampers AI's ability to detect vulnerabilities but also increases technical debt, which can lead to significant security risks over time. Organizations must prioritize practices such as modular design, consistent formatting and thorough documentation to ensure that their codebases are optimized for AI analysis. By fostering a culture of code hygiene, organizations can unlock the full potential of AI-driven tools like Mythos. Clean codebases enable these tools to operate more efficiently, delivering accurate results and reducing the likelihood of false positives. This focus on code quality is not just a best practice but a necessity in an era where AI plays an increasingly central role in software development. The integration of AI into software development is reshaping engineering culture, requiring teams to adapt to new roles and responsibilities. Senior engineers, in particular, will need to focus on high-level abstractions, system design and intent, while delegating low-level tasks to AI tools. Clear specifications and standards will become critical to ensure that AI systems align with organizational objectives and operate effectively. This cultural shift will also demand a commitment to continuous learning and adaptability. Engineers must stay informed about the latest advancements in AI and understand how to use these technologies to achieve their goals. By fostering a collaborative environment where humans and AI work together, organizations can create a culture of innovation and resilience. To thrive in an AI-driven future, organizations must take proactive steps to integrate AI into their development workflows. Key actions include: By embracing these changes, companies can position themselves as leaders in the evolving landscape of software development. The integration of AI is not just a technological shift but a strategic imperative for organizations seeking to remain competitive in an increasingly automated world. Disclosure: Some of our articles include affiliate links. If you buy something through one of these links, Geeky Gadgets may earn an affiliate commission. Learn about our Disclosure Policy.
[7]
Claude Mythos found decade old Firefox bugs that years of fuzzing missed
There's a 15-year-old bug hiding in Firefox's <legend> element - one of the most boring tags in HTML. It survived over a decade of fuzzing, manual audits, and security research. Claude Mythos found it in days. Also read: Genesis AI's human-sized robotic hands can cook, play piano, and solve a Rubik's cube That's the lead story from the thorough post-mortem by Mozilla detailing how it leveraged Claude Mythos Preview from Anthropic to discover and patch 271 security flaws in Firefox, leading to the most heavily patched version in the browser's history. Firefox 150 was released with all 271 security flaws patched; some extra patches were later rolled into Firefox 149.0.2 and 150.0.1. In April alone, Mozilla fixed 423 security flaws in total, a feat that would have been inconceivable just six months ago. A good example of how AI is solving problems that fuzzing can't is the <legend> bug, where a collision of three entirely independent Firefox behaviors causes a use-after-free vulnerability where the browser frees up memory still in use. Each behavior, on its own, appears innocent enough, but when all three come together, something nasty happens. As such, an automated fuzzer wouldn't have picked up the bug, but Claude Mythos found it because it was capable of making sense of all three behaviors coming together. Also read: Why OpenAI, AMD, NVIDIA, Intel, Broadcom, and Microsoft all agreed on one networking protocol That combinatorial reasoning is the core difference. Earlier attempts at AI-assisted security audits using models like GPT-4 and Claude Sonnet 3.5 produced too many false positives to be useful at scale; it's cheap to prompt a model to find a "problem" in code, but slow and expensive for engineers to chase down dead ends. What changed is the introduction of agentic harnesses: systems that don't just flag suspicious code but actually build and run reproducible test cases to confirm whether a bug is real. If it can't be reproduced, it gets dismissed automatically. That filter is what makes the pipeline scalable. Mozilla took a novel approach in building their own harness on top of existing fuzzing framework, distributed across numerous ephemeral VMs targeting specific codebase files. The bugs they discovered are anything but easy. In particular, they found several sandbox escape exploits which assume the compromise of a content process and an attack on the higher-level parent process. Bugs like these have proven notoriously difficult to detect with conventional techniques. A 20-year old vulnerability with a buggy implementation of XSLT, an NaN value capable of functioning as a pointer for a JavaScript object in cross-process communication, and a race condition allowing exploitation through the WebTransport protocol via massive flooding of certificate hashes - these are just some of the vulnerabilities that require multi-step reasoning to detect. Out of the 271 bugs found by Claude Mythos in Firefox version 150, a staggering 180 have been flagged as sec-high, and 80 have received sec-moderate labels. Mozilla also plans on integrating such automated scanning into their CI pipeline to immediately analyze any changes for potential vulnerabilities. The takeaway for the entire software industry? The bugs have already been found; the time to act is now.
Share
Copy Link
Anthropic Mythos identified 271 security vulnerabilities in Firefox with almost no false positives, helping Mozilla ship 423 bug fixes in April 2026. But cURL creator Daniel Stenberg questions the hype after the AI model found just one confirmed vulnerability in his widely-tested codebase, calling it primarily a marketing stunt.
Mozilla has revealed detailed findings from its use of Anthropic Mythos, an AI model designed for finding security flaws, which identified 271 Firefox security vulnerabilities over two months
1
. The discovery helped Mozilla ship 423 Firefox bug fixes in April 2026, compared to just 31 exactly a year earlier2
. This represents more than five times the 76 fixes issued in March and almost 20 times higher than the 21.5 monthly average from last year5
. Mozilla Distinguished Engineer Brian Grinstead emphasized that "in terms of the bugs coming out on the other side, there are almost no false positives"1
.
Source: Geeky Gadgets
The team published details on 12 of the bugs, ranging from unusual sandbox issues to a 15-year-old error in HTML element parsing
2
. Mozilla CTO Bobby Holley declared that "defenders finally have a chance to win, decisively" and suggested zero-days are numbered3
. The bug-hunting AI proved particularly effective at identifying sandbox vulnerabilities, which Mozilla's bug bounty program pays up to $20,000 for researchers to find2
.Mozilla engineers attribute their success to two factors: improvements in the AI model itself and their development of a custom agent harness that guided Mythos through Firefox source code analysis
1
. The harness wraps around the large language model to guide it through specific tasks, providing instructions and tools that mirror what human Mozilla developers use, including specialized Firefox builds for testing1
.Grinstead explained that earlier attempts at AI-assisted vulnerability detection produced "unwanted slop" with plausible-sounding bug reports that often contained hallucinated details requiring significant human verification
1
. The new approach uses a second LLM to grade output from the first, providing developers with the same confidence level as traditional discovery methods1
. For memory safety issues, the system leverages Mozilla's sanitizer build of Firefox, where successfully crashing the browser confirms a vulnerability1
.
Source: The Register
Not everyone shares Mozilla's enthusiasm about Anthropic Mythos. Daniel Stenberg, creator of the widely-used cURL project, concluded that the hype around the AI model was "primarily marketing" after it found just one confirmed vulnerability in his codebase
4
. Mythos initially flagged five potential security vulnerabilities in cURL, but after hours of investigation by Stenberg's security team, only one was confirmed as a low-severity issue planned for CVE publication4
.Stenberg noted that cURL has undergone extensive testing with AI-powered code analyzers over recent months, with tools like AISLE, Zeropath, and OpenAI Codex Security triggering between 200 and 300 bugfixes in the past 8-10 months
4
. He acknowledged that AI tools have improved at finding security flaws compared to traditional analyzers, but emphasized that "all modern AI models are good at this now" and that Mythos doesn't represent a significant advancement4
.Related Stories
Security consultant Davi Ottenheimer raised concerns about Mozilla's methodology, noting that the organization never quantified what Opus 4.6 accomplished before attributing results to Mythos
5
. He demonstrated that Anthropic's lesser models Sonnet 4.6 and Haiku 4.5, when equipped with a harness called Wirken, produced eight findings in two minutes at approximately $0.75, with two matching bugs Mythos had identified5
.
Source: The Register
Ottenheimer criticized Mozilla for not providing transparent comparisons between Mythos and other models, stating there's "a fundamental philosophical failure" in treating readings as measurements without proper evidence
5
. Mozilla acknowledged that Opus 4.6 was already identifying "an impressive amount of previously unknown vulnerabilities" before Mythos deployment5
.The debate centers on whether Anthropic Mythos represents a genuine breakthrough in software security or if effective middleware makes any capable AI model sufficient for finding security flaws. Mozilla's Brian Grinstead acknowledged uncertainty about the broader implications, stating "it's useful for both attackers and defenders, but having the tool available shifts the advantage a little bit to defense"
2
. Anthropic CEO Dario Amodei suggested that fixing discovered bugs could leave defenders in a better position since "there are only so many bugs to find"2
.Holley noted that Mythos hasn't discovered any bugs that elite human researchers couldn't find, countering speculation about AI uncovering entirely new vulnerability categories
3
. Mozilla still relies on human engineers to write and review patches for every bug, as AI-generated fixes cannot be deployed directly2
. The question remains whether bad actors using similar techniques with less capable models pose an immediate threat, and whether the industry's focus should shift toward developing better harnesses rather than pursuing more advanced AI models for high-severity software vulnerabilities detection.Summarized by
Navi
[3]
[5]
14 May 2026•Technology

13 May 2026•Technology

06 Mar 2026•Technology

1
Policy and Regulation

2
Science and Research

3
Technology
