AI Agents from Anthropic, Google, and Microsoft Hijacked via Prompt Injection Attacks

2 Sources

Share

Security researchers successfully hijacked AI agents from Anthropic, Google, and Microsoft through prompt injection attacks, stealing API keys and access tokens. All three companies paid bug bounties ranging from $100 to $1,337 but didn't publish public advisories or assign CVEs, leaving users on vulnerable versions unaware of the risks.

News article

Researchers Exploit Critical Prompt Injection Vulnerabilities in Major AI Agents

Security researcher Aonan Guan and a team from Johns Hopkins University have successfully hijacked AI agents from three tech giants through a sophisticated prompt injection attack, exposing a fundamental weakness in how these systems process context. The researchers targeted Anthropic's Claude Code Security Review, Google's Gemini CLI Action, and GitHub Copilot Agent, demonstrating that all three could be manipulated to steal sensitive data including API keys and access tokens

1

. While Anthropic, Google, and Microsoft paid bug bounties for the discoveries, none assigned CVEs or published public advisories, creating what Guan describes as "a problem" for users who remain on vulnerable versions

2

.

The attack exploits how AI agent integrations with GitHub Actions process data. These agents read GitHub data including pull request titles, issue bodies, and comments, then process this information as part of their task context before taking actions

1

. The critical flaw lies in their inability to reliably distinguish between legitimate content and injected instructions, turning every data source into a potential attack vector.

How the Attack Works: From Pull Requests to Credential Theft

Guan's initial discovery came while examining Anthropic's Claude Code Security Review, a GitHub Action that uses Claude to analyze code changes for vulnerabilities. He wondered about "the flow" of how user prompts move through the agents and trigger actions. His devious insight: if malicious instructions could be injected into the data being read by AI agents, he could potentially take over the agent entirely

1

.

The technique proved devastatingly effective. Guan submitted a pull request with malicious instructions embedded in the PR title, instructing Claude to execute the whoami command using the Bash tool and return results as a "security finding." Claude complied, executing the injected commands and embedding the output in its JSON response, which appeared as a pull request comment. After Anthropic asked if the technique could steal sensitive data like GitHub access tokens or Anthropic's API key, Guan demonstrated that this prompt injection could indeed leak credentials. The attack allowed threat actors to write a malicious title, read the credentials in the bot's comment, then change the PR title back to something innocuous like "fix typo," close the PR, and delete the bot's message, covering their tracks entirely

1

.

Three Companies, Three Successful Exploits

After validating the attack against Claude Code, Guan and the Johns Hopkins team verified similar AI agent vulnerabilities against Google's Gemini CLI Action and GitHub Copilot Agent. The Gemini attack involved injecting a fake "trusted content section" after legitimate content in a GitHub issue, which overrode Gemini's safety instructions and tricked the agent into publishing its own API key as an issue comment

2

.

Attacking the GitHub Copilot Agent required additional creativity. This autonomous software engineering agent works in the background on GitHub's infrastructure and can autonomously create PRs. GitHub had implemented three runtime-level security layers beyond model-level defenses: environment filtering, secret scanning, and a network firewall to prevent credential theft. Guan bypassed all of them by hiding malicious instructions inside an HTML comment in a GitHub issue, making the payload invisible in rendered Markdown that humans see but fully visible to the AI agent parsing raw content

2

.

Bug Bounties Paid, But No Public Warnings Issued

The vendor responses revealed a troubling pattern. Anthropic paid Guan a $100 bug bounty in November after receiving his submission on HackerOne in October, upgrading the severity rating from 9.3 to 9.4. The company updated a "security considerations" section in its documentation warning that "this action is not hardened against prompt injection attacks and should only be used to review trusted PRs," but published no public advisory

1

. Google paid a $1,337 bounty and credited Guan, Neil Fendley, Zhengyu Liu, Senapati Diwangkara, and Yinzhi Cao for the discovery. GitHub initially dismissed the Copilot finding as a "known issue" it "could not reproduce" but ultimately paid a $500 bounty in March

2

.

None of the three vendors assigned CVEs or published public advisories. Without CVEs, vulnerability scanners cannot flag the issue. Without public advisories, security teams have no artifact to track. Guan emphasized that he "knows for sure that some of the users are pinned to a vulnerable version," and without published advisories, those users may never know they are vulnerable or under attack

1

.

Wider Implications for AI Agent Security

Guan believes the attack likely works on other agents that integrate with GitHub and GitHub Actions that allow access to tools and secrets, including Slack bots, Jira agents, email agents, and deployment automation agents

1

. The vulnerability exposes a fundamental weakness: large language models cannot reliably separate data from instructions. When an agent reads a GitHub issue, it treats the text as input to reason about, but a well-crafted prompt injection can make that input function as a command

2

.

This concern extends beyond GitHub integrations. Every data source feeding an AI agent's reasoning, whether email, calendar invites, Slack messages, or code comments, represents a potential attack vector. A systematic analysis of 78 studies published in January found that every tested coding agent, including Claude Code, GitHub Copilot, and Cursor, was vulnerable to prompt injection

2

. As organizations increasingly deploy AI agents with access to sensitive systems and data, the lack of transparency around these vulnerabilities creates significant security blind spots that attackers could exploit to steal sensitive data at scale.

Today's Top Stories

TheOutpost.ai

Your Daily Dose of Curated AI News

Don’t drown in AI news. We cut through the noise - filtering, ranking and summarizing the most important AI news, breakthroughs and research daily. Spend less time searching for the latest in AI and get straight to action.

© 2026 Triveous Technologies Private Limited
Instagram logo
LinkedIn logo