2 Sources
2 Sources
[1]
Anthropic, Google, Microsoft paid AI bug bounties - quietly
Researchers who found the flaws scored beer money bounties and warn the problem is probably pervasive Exclusive Security researchers hijacked three popular AI agents that integrate with GitHub Actions by using a new type of prompt injection attack to steal API keys and access tokens, and the vendors who run agents didn't disclose the problem. The researchers targeted Anthropic's Claude Code Security Review, Google's Gemini CLI Action, and Microsoft's GitHub Copilot, then disclosed the flaws and received bug bounties from all three. But none of the vendors assigned CVEs or published public advisories, and this, according to researcher Aonan Guan, "is a problem." "I know for sure that some of the users are pinned to a vulnerable version," Guan said in an exclusive interview with The Register about how he and a team from Johns Hopkins University discovered this prompt injection pattern and pwned the agents. "If they don't publish an advisory, those users may never know they are vulnerable - or under attack." He said the attack probably works on other agents that integrate with GitHub, and GitHub Actions that allow access to tools and secrets, such as Slack bots, Jira agents, email agents, and deployment automation agents. Guan originally found the flaw in Claude Code Security Review. This is Anthropic's GitHub Action that uses Claude to analyze code changes and pull requests for vulnerabilities and other security issues. "It uses the AI agent to find vulnerabilities in the code - that's what the software is designed to do," Guan said. This made him curious about "the flow" - how user prompts flow into the agents, and then how they take action based on those prompts. I bypassed all of them It turns out that Claude, along with other AI agents in GitHub Actions, all use the same flow. The agent reads GitHub data - this includes pull request titles, issue bodies, and comments - processes it as part of the task context, and then takes actions. So Guan came up with a devious idea. If he could inject malicious instructions into this data being read by the AI, "maybe I can take over the agent and do whatever I want." It worked. Guan submitted a pull request and injected malicious instructions in the PR title - in this case, telling Claude to execute the whoami command using the Bash tool and return the results as a "security finding." Claude then executed the injected commands and embedded the output in its JSON response, which got posted as a pull request comment. After originally submitting this attack on HackerOne's bug bounty platform in October, Anthropic asked Guan if he could also use this technique to steal more sensitive data, such as GitHub access tokens or Anthropic's API key. Guan demonstrated that this prompt injection can also work to leak credentials. "The title is the payload, the bot's review comment is one place where the credentials show up," Guan said. "Attacker writes the title, reads the comment." It's also worth noting that, after leaking secrets, the attacker can change the PR title back to "fix typo," or something along those lines, then close the PR and delete the bot's message. In November, Anthropic paid Guan a $100 bug bounty, upgraded the critical severity from a 9.3 to 9.4, and updated a "security considerations" section in its documentation. "This action is not hardened against prompt injection attacks and should only be used to review trusted PRs," the docs state. "We recommend configuring your repository to use the 'Require approval for all external contributors' option to ensure workflows only run after a maintainer has reviewed the PR." After validating that this prompt injection worked with Claude Code, Guan worked with Johns Hopkins University researchers to verify similar attacks against other agents - starting with Google Gemini CLI action, which integrates Gemini into GitHub issue workflows, and GitHub Copilot Agent, which can be assigned GitHub issues and autonomously creates PRs. Spoiler alert: it worked. With Gemini, the researchers again started the attack with a malicious prompt injection title, and then added comments with escalating injections: Injecting a fake "trusted content section" after the real "additional content" allowed the researchers to override Gemini's safety instructions, and publish Gemini's API key as an issue comment. Google paid a $1,337 bounty, and credited Guan, Neil Fendley, Zhengyu Liu, Senapati Diwangkara, and Yinzhi Cao with finding and disclosing the flaw. Attacking the Microsoft-owned GitHub Copilot Agent proved to be a little trickier. It's an autonomous software engineering (SWE) agent that works in the background on GitHub's infrastructure and can autonomously creates PRs. In addition to the model-and-prompt-level defenses, such as those built into Claude and Gemini, GitHub added three runtime-level security layers: environment filtering, secret scanning, and a network firewall, to prevent credential theft. "I bypassed all of them," Guan said. Unlike the earlier two attacks, which only require putting a visible prompt into the PR title or issue comment, the Copilot one requires an attacker to inject malicious instructions in an HTML comment that GitHub's rendered Markdown makes invisible to humans. The victim, who can't see the hidden trigger, assigns the issue to the Copilot agent to fix. GitHub, after initially calling this a "known issue" that they "were unable to reproduce," ultimately paid a $500 bounty for this issue in March. In total, Guan and his fellow researchers demonstrated that attackers can use this prompt injection technique to steal Anthropic and Gemini API keys, multiple GitHub tokens, and "any other secret exposed in the GitHub Actions runner environment, including arbitrary user-defined repository or organization secrets the workflow has access to." Guan calls this type of prompt injection attacks "comment and control." It's a play on "command and control" because the entire attack runs inside GitHub - it doesn't require any external command-and-control infrastructure. Essentially, it allows the attacker to control GitHub data by injecting a prompt into pull request titles, issue bodies, and issue comments. The AI agents running in GitHub Actions process the data, execute the commands, and then leak credentials through GitHub itself. In research shared with The Register ahead of publication, Guan says there's a "critical distinction" between comment-and-control prompt injection and classic indirect prompt injection. The latter, he explains, "is reactive: the attacker plants a payload in a webpage or document and waits for a victim to ask the AI to process it ('summarize this page,' 'review this file'). Comment and Control is proactive: GitHub Actions workflows fire automatically" on pull request titles, issue bodies, and issue comments. "So simply opening a PR or filing an issue can trigger the AI agent without any action from the victim," he wrote, adding that the Copilot attack is a "partial exception: a victim must assign the issue to Copilot, but because the malicious instructions are hidden inside an HTML comment, the assignment happens without the victim ever seeing the payload." He told us that these attacks illustrate how even models with prompt-injection prevention built in "can still be bypassed in the end." The solution? Think of prompt injection as phishing, but for machines instead of humans, and treat AI agents much like human employees. "Follow the need-to-know protocol," Guan said. For example, if a code review agent doesn't need bash execution, don't give it this tool. Use allow lists to let the agent access only what's required to do its job. Similarly, if its job is summarizing issues, it doesn't need credentials for GitHub write access. "Treat agents as a super-powerful employee," Guan told us. "Only give them the tools that they need to complete their task. ®
[2]
Anthropic, Google, and Microsoft paid AI agent bug bounties, then kept quiet about the flaws
In short:Security researcher Aonan Guan hijacked AI agents from Anthropic, Google, and Microsoft via prompt injection attacks on their GitHub Actions integrations, stealing API keys and tokens in each case. All three companies paid bug bounties quietly, $100 from Anthropic, $500 from GitHub, an undisclosed amount from Google, but none published public advisories or assigned CVEs, leaving users on older versions unaware of the risk. Security researchers have demonstrated that AI agents from Anthropic, Google, and Microsoft can be hijacked through prompt injection attacks to steal API keys, GitHub tokens, and other secrets, and all three companies quietly paid bug bounties without publishing public advisories or assigning CVEs. The vulnerabilities, disclosed by researcher Aonan Guan over several months, affect AI tools that integrate with GitHub Actions: Anthropic's Claude Code Security Review, Google's Gemini CLI Action, and GitHub's Copilot Agent. Each tool reads GitHub data, including pull request titles, issue bodies, and comments, processes it as task context, and then takes actions. The problem is that none of them reliably distinguish between legitimate content and injected instructions. The core technique is indirect prompt injection. Rather than attacking the AI model directly, the researcher embedded malicious instructions in places the agents were designed to trust: PR titles, issue descriptions, and comments. When the agent ingested that content as part of its workflow, it executed the injected commands as though they were legitimate instructions. Against Anthropic's Claude Code Security Review, which scans pull requests for vulnerabilities, Guan crafted a PR title containing a prompt injection payload. Claude executed the embedded commands and included the output, including leaked credentials, in its JSON response, which was then posted as a PR comment for anyone to read. The attack could exfiltrate the Anthropic API key, GitHub access tokens, and other secrets exposed in the GitHub Actions runner environment. The Gemini attack followed a similar pattern. By injecting a fake "trusted content section" after legitimate content in a GitHub issue, Guan overrode Gemini's safety instructions and tricked the agent into publishing its own API key as an issue comment. Google's Gemini CLI Action, which integrates Gemini into GitHub issue workflows, treated the injected text as authoritative. The Copilot attack was subtler. Guan hid malicious instructions inside an HTML comment in a GitHub issue, making the payload invisible in the rendered Markdown that humans see but fully visible to the AI agent parsing the raw content. When a developer assigned the issue to Copilot Agent, the bot followed the hidden instructions without question. What happened next is as revealing as the vulnerabilities themselves. Anthropic received Guan's submission on its HackerOne bug bounty platform in October 2025. The company asked whether the technique could also steal more sensitive data such as GitHub tokens, confirmed it could, and in November paid a $100 bounty while upgrading the critical severity rating from 9.3 to 9.4. Anthropic updated a "security considerations" section in its documentation but did not publish a public advisory or assign a CVE. GitHub initially dismissed the Copilot finding as a "known issue" that it "could not reproduce," but ultimately paid a $500 bounty in March. Google paid an undisclosed amount for the Gemini vulnerability. None of the three vendors assigned CVEs or published advisories that would alert users pinned to vulnerable versions. For Guan, this is the crux of the problem. Users running older versions of these AI agent integrations may never learn they are exposed. Without a CVE, vulnerability scanners will not flag the issue. Without an advisory, security teams have no artefact to track. The attacks exploit a fundamental weakness in how AI agents process context. Large language models cannot reliably separate data from instructions. When an agent reads a GitHub issue, it treats the text as input to reason about, but a well-crafted prompt injection can make that input function as a command. Every data source that feeds an AI agent's reasoning, whether it is an email, a calendar invite, a Slack message, or a code comment, is a potential attack vector. This is not a theoretical concern. In January 2026, researchers from Miggo Security demonstrated that Google Gemini could be weaponised through calendar invitations containing hidden instructions. Days later, the "Reprompt" attack against Microsoft Copilot showed that injected prompts could hijack entire user sessions. Anthropic's own Git MCP server was found to harbour three CVEs that allowed attackers to inject backdoors through repositories the server processed. A systematic analysis of 78 studies published in January found that every tested coding agent, including Claude Code, GitHub Copilot, and Cursor, was vulnerable to prompt injection, with adaptive attack success rates exceeding 85%. The supply chain dimension makes it worse. A security audit of nearly 4,000 agent skills on the ClawHub marketplace found that more than a third contained at least one security flaw, and 13.4% had critical-level issues. When AI agents pull in third-party tools and data sources with the same level of trust they extend to their own instructions, a single compromised component can cascade across an entire development pipeline. The vendors' reluctance to publish advisories reflects an uncomfortable reality: there is no established framework for disclosing AI agent vulnerabilities. Traditional software bugs get CVEs, patches, and coordinated disclosure timelines. Prompt injection flaws sit in a grey zone. They are not bugs in the code so much as emergent behaviours of the model, and the mitigations, stronger system prompts, input sanitisation, output filtering, are partial at best. But the consequences are indistinguishable from those of a conventional security flaw. An attacker who exfiltrates a GitHub token through a prompt injection can do exactly the same damage as one who exploits a buffer overflow. The argument that AI safety requires new frameworks does not excuse the absence of disclosure for vulnerabilities that are already being exploited in the wild. Zenity Labs research published this month found that most agent-building frameworks, including those from OpenAI, Google, and Microsoft, lack appropriate guardrails, putting the burden of managing risk on the companies deploying them. In one documented case, attackers manipulated an AI procurement agent's memory so it believed it had authority to approve purchases up to $500,000, when the real limit was $10,000. The agent approved $5 million in fraudulent purchase orders before anyone noticed. For organisations that have integrated AI agents into their CI/CD pipelines, the message is stark. These tools are powerful precisely because they have access to sensitive systems and data. That same access makes them high-value targets, and the industry has not yet built the disclosure infrastructure to match the risk.
Share
Share
Copy Link
Security researchers successfully hijacked AI agents from Anthropic, Google, and Microsoft through prompt injection attacks, stealing API keys and access tokens. All three companies paid bug bounties ranging from $100 to $1,337 but didn't publish public advisories or assign CVEs, leaving users on vulnerable versions unaware of the risks.

Security researcher Aonan Guan and a team from Johns Hopkins University have successfully hijacked AI agents from three tech giants through a sophisticated prompt injection attack, exposing a fundamental weakness in how these systems process context. The researchers targeted Anthropic's Claude Code Security Review, Google's Gemini CLI Action, and GitHub Copilot Agent, demonstrating that all three could be manipulated to steal sensitive data including API keys and access tokens
1
. While Anthropic, Google, and Microsoft paid bug bounties for the discoveries, none assigned CVEs or published public advisories, creating what Guan describes as "a problem" for users who remain on vulnerable versions2
.The attack exploits how AI agent integrations with GitHub Actions process data. These agents read GitHub data including pull request titles, issue bodies, and comments, then process this information as part of their task context before taking actions
1
. The critical flaw lies in their inability to reliably distinguish between legitimate content and injected instructions, turning every data source into a potential attack vector.Guan's initial discovery came while examining Anthropic's Claude Code Security Review, a GitHub Action that uses Claude to analyze code changes for vulnerabilities. He wondered about "the flow" of how user prompts move through the agents and trigger actions. His devious insight: if malicious instructions could be injected into the data being read by AI agents, he could potentially take over the agent entirely
1
.The technique proved devastatingly effective. Guan submitted a pull request with malicious instructions embedded in the PR title, instructing Claude to execute the whoami command using the Bash tool and return results as a "security finding." Claude complied, executing the injected commands and embedding the output in its JSON response, which appeared as a pull request comment. After Anthropic asked if the technique could steal sensitive data like GitHub access tokens or Anthropic's API key, Guan demonstrated that this prompt injection could indeed leak credentials. The attack allowed threat actors to write a malicious title, read the credentials in the bot's comment, then change the PR title back to something innocuous like "fix typo," close the PR, and delete the bot's message, covering their tracks entirely
1
.After validating the attack against Claude Code, Guan and the Johns Hopkins team verified similar AI agent vulnerabilities against Google's Gemini CLI Action and GitHub Copilot Agent. The Gemini attack involved injecting a fake "trusted content section" after legitimate content in a GitHub issue, which overrode Gemini's safety instructions and tricked the agent into publishing its own API key as an issue comment
2
.Attacking the GitHub Copilot Agent required additional creativity. This autonomous software engineering agent works in the background on GitHub's infrastructure and can autonomously create PRs. GitHub had implemented three runtime-level security layers beyond model-level defenses: environment filtering, secret scanning, and a network firewall to prevent credential theft. Guan bypassed all of them by hiding malicious instructions inside an HTML comment in a GitHub issue, making the payload invisible in rendered Markdown that humans see but fully visible to the AI agent parsing raw content
2
.Related Stories
The vendor responses revealed a troubling pattern. Anthropic paid Guan a $100 bug bounty in November after receiving his submission on HackerOne in October, upgrading the severity rating from 9.3 to 9.4. The company updated a "security considerations" section in its documentation warning that "this action is not hardened against prompt injection attacks and should only be used to review trusted PRs," but published no public advisory
1
. Google paid a $1,337 bounty and credited Guan, Neil Fendley, Zhengyu Liu, Senapati Diwangkara, and Yinzhi Cao for the discovery. GitHub initially dismissed the Copilot finding as a "known issue" it "could not reproduce" but ultimately paid a $500 bounty in March2
.None of the three vendors assigned CVEs or published public advisories. Without CVEs, vulnerability scanners cannot flag the issue. Without public advisories, security teams have no artifact to track. Guan emphasized that he "knows for sure that some of the users are pinned to a vulnerable version," and without published advisories, those users may never know they are vulnerable or under attack
1
.Guan believes the attack likely works on other agents that integrate with GitHub and GitHub Actions that allow access to tools and secrets, including Slack bots, Jira agents, email agents, and deployment automation agents
1
. The vulnerability exposes a fundamental weakness: large language models cannot reliably separate data from instructions. When an agent reads a GitHub issue, it treats the text as input to reason about, but a well-crafted prompt injection can make that input function as a command2
.This concern extends beyond GitHub integrations. Every data source feeding an AI agent's reasoning, whether email, calendar invites, Slack messages, or code comments, represents a potential attack vector. A systematic analysis of 78 studies published in January found that every tested coding agent, including Claude Code, GitHub Copilot, and Cursor, was vulnerable to prompt injection
2
. As organizations increasingly deploy AI agents with access to sensitive systems and data, the lack of transparency around these vulnerabilities creates significant security blind spots that attackers could exploit to steal sensitive data at scale.Summarized by
Navi
[1]
23 Dec 2025•Technology

31 Mar 2026•Technology

08 Mar 2026•Technology
