AI Agents Hijacked via Prompt Injection: Bug Bounties Paid, Security Advisories Withheld

Reviewed byNidhi Govil

4 Sources

Share

Security researchers exploited prompt injection vulnerabilities in AI agents from Anthropic, Google, and Microsoft, stealing API keys through GitHub Actions integrations. All three companies paid bug bounties ranging from $100 to $1,337 but issued no CVEs or public advisories, leaving users on older versions exposed to potential attacks.

Security Researchers Exploit AI Agents Through GitHub Actions

Security researcher Aonan Guan, working with colleagues at Johns Hopkins University, successfully hijacked three popular AI agents by exploiting a prompt injection vulnerability that allowed them to steal API keys and access tokens. The affected tools—Anthropic's Claude Code Security Review, Google's Gemini CLI Action, and Microsoft's GitHub Copilot Agent—all integrate with GitHub Actions and process user-submitted content as part of their workflow

1

.

The attack, dubbed Comment and Control, works by embedding malicious instructions in places the AI agents are designed to trust: pull request titles, issue bodies, and comments

3

. When these AI agents ingest this content as task context, they execute the injected commands as though they were legitimate instructions, demonstrating a fundamental weakness in how AI agents process context and distinguish data from commands.

Source: VentureBeat

Source: VentureBeat

How the Attack Works Across Multiple Platforms

Guan originally discovered the flaw in Claude Code Security Review, Anthropic's GitHub Action that analyzes code changes for vulnerabilities. He submitted a pull request with malicious instructions embedded in the PR title, instructing Claude to execute the whoami command using the Bash tool and return results as a "security finding." Claude then executed the injected commands and posted the output, including leaked credentials, in its JSON response as a pull request comment

1

.

After validating this prompt injection worked with Claude Code, Guan and his team verified similar attacks against other agents. With Gemini CLI Action, researchers injected a fake "trusted content section" after legitimate content, overriding Gemini's safety instructions and forcing it to publish its own API key as an issue comment

2

. The GitHub Copilot Agent attack proved subtler—Guan hid malicious instructions inside an HTML comment in a GitHub issue, making the payload invisible in rendered Markdown but fully visible to the AI agent parsing raw content

3

.

Source: Hacker News

Source: Hacker News

Bug Bounties Paid, But No Public Disclosure

All three vendors paid bug bounties after researchers disclosed the flaws through proper channels. Anthropic paid $100 in November after upgrading the critical severity rating from 9.3 to 9.4 on the CVSS scale. Google paid a $1,337 bounty, crediting Guan, Neil Fendley, Zhengyu Liu, Senapati Diwangkara, and Yinzhi Cao. GitHub initially dismissed the Copilot finding as a "known issue" but ultimately paid a $500 bounty in March

4

.

However, none of the three vendors assigned CVEs or published security advisories that would alert users to the vulnerabilities. According to Guan, "If they don't publish an advisory, those users may never know they are vulnerable—or under attack"

1

. This lack of transparency leaves users running older versions of these AI agent integrations exposed, as vulnerability scanners won't flag the issue without a CVE, and security teams have no artifact to track without an advisory

3

.

Source: The Register

Source: The Register

Broader Implications for AI Agent Security

The theft of API keys through these prompt injection attacks highlights a pervasive problem in AI agent architecture. Guan warns that the attack pattern likely applies to any AI agent that ingests untrusted GitHub data and has access to execution tools in the same runtime as production secrets—and beyond GitHub Actions, to any agent that processes untrusted input with access to tools and secrets, including Slack bots, Jira agents, email agents, and deployment automation

2

.

Merritt Baer, CSO at Enkrypt AI and former Deputy CISO at AWS, emphasized where protection needs to sit: "At the action boundary, not the model boundary. The runtime is the blast radius"

4

. The vulnerabilities exploit how GitHub Actions workflows using pull_request_target, which most AI agent integrations require for secret access, inject secrets into the runner environment.

What Organizations Should Watch For

Anthropic updated its documentation after the disclosure to clarify that Claude Code Security Review "is not hardened against prompt injection attacks and should only be used to review trusted PRs," recommending users configure repositories to "Require approval for all external contributors"

1

. However, this guidance addresses only one specific implementation and doesn't solve the underlying issue of insufficient input sanitization that allows AI agents to treat user-controlled data as executable instructions.

The lack of published runtime metrics and security safeguards in system cards represents a transparency gap that prevents procurement teams from verifying what they cannot measure

4

. Organizations deploying AI agents need to demand clarity on whether safeguards extend into tool execution and arbitrary code execution prevention, not just prompt filtering at the model level. Without CVEs to track these prompt injection vulnerabilities, security teams must actively monitor vendor documentation and assume that any AI agent processing external content could leak sensitive information through similar attack vectors.

Today's Top Stories

TheOutpost.ai

Don’t drown in AI news. We cut through the noise - filtering, ranking and summarizing the most important AI news, breakthroughs and research daily. Spend less time searching for the latest in AI and get straight to action.

Instagram logo
LinkedIn logo
Youtube logo
© 2026 TheOutpost.AI All rights reserved