4 Sources
[1]
Anthropic, Google, Microsoft paid AI bug bounties - quietly
Researchers who found the flaws scored beer money bounties and warn the problem is probably pervasive Exclusive Security researchers hijacked three popular AI agents that integrate with GitHub Actions by using a new type of prompt injection attack to steal API keys and access tokens, and the vendors who run agents didn't disclose the problem. The researchers targeted Anthropic's Claude Code Security Review, Google's Gemini CLI Action, and Microsoft's GitHub Copilot, then disclosed the flaws and received bug bounties from all three. But none of the vendors assigned CVEs or published public advisories, and this, according to researcher Aonan Guan, "is a problem." "I know for sure that some of the users are pinned to a vulnerable version," Guan said in an exclusive interview with The Register about how he and a team from Johns Hopkins University discovered this prompt injection pattern and pwned the agents. "If they don't publish an advisory, those users may never know they are vulnerable - or under attack." He said the attack probably works on other agents that integrate with GitHub, and GitHub Actions that allow access to tools and secrets, such as Slack bots, Jira agents, email agents, and deployment automation agents. Guan originally found the flaw in Claude Code Security Review. This is Anthropic's GitHub Action that uses Claude to analyze code changes and pull requests for vulnerabilities and other security issues. "It uses the AI agent to find vulnerabilities in the code - that's what the software is designed to do," Guan said. This made him curious about "the flow" - how user prompts flow into the agents, and then how they take action based on those prompts. I bypassed all of them It turns out that Claude, along with other AI agents in GitHub Actions, all use the same flow. The agent reads GitHub data - this includes pull request titles, issue bodies, and comments - processes it as part of the task context, and then takes actions. So Guan came up with a devious idea. If he could inject malicious instructions into this data being read by the AI, "maybe I can take over the agent and do whatever I want." It worked. Guan submitted a pull request and injected malicious instructions in the PR title - in this case, telling Claude to execute the whoami command using the Bash tool and return the results as a "security finding." Claude then executed the injected commands and embedded the output in its JSON response, which got posted as a pull request comment. After originally submitting this attack on HackerOne's bug bounty platform in October, Anthropic asked Guan if he could also use this technique to steal more sensitive data, such as GitHub access tokens or Anthropic's API key. Guan demonstrated that this prompt injection can also work to leak credentials. "The title is the payload, the bot's review comment is one place where the credentials show up," Guan said. "Attacker writes the title, reads the comment." It's also worth noting that, after leaking secrets, the attacker can change the PR title back to "fix typo," or something along those lines, then close the PR and delete the bot's message. In November, Anthropic paid Guan a $100 bug bounty, upgraded the critical severity from a 9.3 to 9.4, and updated a "security considerations" section in its documentation. "This action is not hardened against prompt injection attacks and should only be used to review trusted PRs," the docs state. "We recommend configuring your repository to use the 'Require approval for all external contributors' option to ensure workflows only run after a maintainer has reviewed the PR." After validating that this prompt injection worked with Claude Code, Guan worked with Johns Hopkins University researchers to verify similar attacks against other agents - starting with Google Gemini CLI action, which integrates Gemini into GitHub issue workflows, and GitHub Copilot Agent, which can be assigned GitHub issues and autonomously creates PRs. Spoiler alert: it worked. With Gemini, the researchers again started the attack with a malicious prompt injection title, and then added comments with escalating injections: Injecting a fake "trusted content section" after the real "additional content" allowed the researchers to override Gemini's safety instructions, and publish Gemini's API key as an issue comment. Google paid a $1,337 bounty, and credited Guan, Neil Fendley, Zhengyu Liu, Senapati Diwangkara, and Yinzhi Cao with finding and disclosing the flaw. Attacking the Microsoft-owned GitHub Copilot Agent proved to be a little trickier. It's an autonomous software engineering (SWE) agent that works in the background on GitHub's infrastructure and can autonomously creates PRs. In addition to the model-and-prompt-level defenses, such as those built into Claude and Gemini, GitHub added three runtime-level security layers: environment filtering, secret scanning, and a network firewall, to prevent credential theft. "I bypassed all of them," Guan said. Unlike the earlier two attacks, which only require putting a visible prompt into the PR title or issue comment, the Copilot one requires an attacker to inject malicious instructions in an HTML comment that GitHub's rendered Markdown makes invisible to humans. The victim, who can't see the hidden trigger, assigns the issue to the Copilot agent to fix. GitHub, after initially calling this a "known issue" that they "were unable to reproduce," ultimately paid a $500 bounty for this issue in March. In total, Guan and his fellow researchers demonstrated that attackers can use this prompt injection technique to steal Anthropic and Gemini API keys, multiple GitHub tokens, and "any other secret exposed in the GitHub Actions runner environment, including arbitrary user-defined repository or organization secrets the workflow has access to." Guan calls this type of prompt injection attacks "comment and control." It's a play on "command and control" because the entire attack runs inside GitHub - it doesn't require any external command-and-control infrastructure. Essentially, it allows the attacker to control GitHub data by injecting a prompt into pull request titles, issue bodies, and issue comments. The AI agents running in GitHub Actions process the data, execute the commands, and then leak credentials through GitHub itself. In research shared with The Register ahead of publication, Guan says there's a "critical distinction" between comment-and-control prompt injection and classic indirect prompt injection. The latter, he explains, "is reactive: the attacker plants a payload in a webpage or document and waits for a victim to ask the AI to process it ('summarize this page,' 'review this file'). Comment and Control is proactive: GitHub Actions workflows fire automatically" on pull request titles, issue bodies, and issue comments. "So simply opening a PR or filing an issue can trigger the AI agent without any action from the victim," he wrote, adding that the Copilot attack is a "partial exception: a victim must assign the issue to Copilot, but because the malicious instructions are hidden inside an HTML comment, the assignment happens without the victim ever seeing the payload." He told us that these attacks illustrate how even models with prompt-injection prevention built in "can still be bypassed in the end." The solution? Think of prompt injection as phishing, but for machines instead of humans, and treat AI agents much like human employees. "Follow the need-to-know protocol," Guan said. For example, if a code review agent doesn't need bash execution, don't give it this tool. Use allow lists to let the agent access only what's required to do its job. Similarly, if its job is summarizing issues, it doesn't need credentials for GitHub write access. "Treat agents as a super-powerful employee," Guan told us. "Only give them the tools that they need to complete their task. ®
[2]
Google Patches Antigravity IDE Flaw Enabling Prompt Injection Code Execution
Cybersecurity researchers have discovered a vulnerability in Google's agentic integrated development environment (IDE), Antigravity, that could be exploited to achieve code execution. The flaw, since patched, combines Antigravity's permitted file-creation capabilities with an insufficient input sanitization in Antigravity's native file-searching tool, find_by_name, to bypass the program's Strict Mode, a restrictive security configuration that limits network access, prevents out-of-workspace writes, and ensures all commands are being run within a sandbox context. "By injecting the -X (exec-batch) flag through the Pattern parameter [in the find_by_name tool], an attacker can force fd to execute arbitrary binaries against workspace files," Pillar Security researcher Dan Lisichkin said in an analysis. "Combined with Antigravity's ability to create files as a permitted action, this enables a full attack chain: stage a malicious script, then trigger it through a seemingly legitimate search, all without additional user interaction once the prompt injection lands." The attack takes advantage of the fact that the find_by_name tool call is executed before any of the constraints associated with Strict Mode are enforced and is instead interpreted as a native tool invocation, leading to arbitrary code execution. While the Pattern parameter is designed to accept a filename search pattern to trigger a file and directory search using fd through find_by_name, it's undermined by a lack of strict validation, passing the input directly to the underlying fd command. An attacker could, therefore, leverage this behavior to stage a malicious file and inject malicious commands into the Pattern parameter to trigger the execution of the payload. "The critical flag here is -X (exec-batch). When passed to fd, this flag executes a specified binary against each matched file," Pillar explained. "By crafting a Pattern value of -Xsh, an attacker causes fd to pass matched files to sh for execution as shell scripts." Alternatively, the attack can be initiated via an indirect prompt injection without having to compromise a user's account. In this approach, an unsuspecting user pulls a seemingly harmless file from an untrusted source that contains hidden attacker-controlled comments instructing the artificial intelligence (AI) agent to stage and trigger the exploit. Following responsible disclosure on January 7, 2026, Google addressed the shortcoming as of February 28. "Tools designed for constrained operations become attack vectors when their inputs are not strictly validated," Lisichkin said. "The trust model underpinning security assumptions, that a human will catch something suspicious, does not hold when autonomous agents follow instructions from external content." The findings coincide with the discovery of a number of now-patched security flaws in various AI-powered tools - * Anthropic Claude Code Security Review, Google Gemini CLI Action, and GitHub Copilot Agent have been found vulnerable to prompt injection via GitHub comments, allowing an attacker to turn pull request (PR) titles, issue bodies, and issue comments into attack vectors for API key and token theft. The prompt injection attack has been codenamed Comment and Control, as it weaponizes an AI agent's elevated access and its ability to process untrusted user input to execute malicious instructions. * "The pattern likely applies to any AI agent that ingests untrusted GitHub data and has access to execution tools in the same runtime as production secrets -- and beyond GitHub Actions, to any agent that processes untrusted input with access to tools and secrets: Slack bots, Jira agents, email agents, deployment automation," security researcher Aonan Guan said. "The injection surface changes, but the pattern is the same." * Another vulnerability in Claude Code, discovered by Cisco, is capable of poisoning the coding agent's memory and maintaining persistence across every project and every session, even after a system reboot. The attack essentially utilizes a software supply chain attack as an initial access vector to launch a malicious payload that can tamper with the model's memory files for malicious purposes (e.g., framing insecure practices as necessary architectural requirements) and appends a shell alias to the user's shell configuration. * AI code editor Cursor has been found susceptible to a critical living-off-the-land (LotL) vulnerability chain dubbed NomShub that makes it possible for a malicious repository to clandestinely hijack a developer's machine by leveraging a mix of indirect prompt injection, a command parser sandbox escape via shell builtins like export and cd, and Cursor's built-in remote tunnel, granting the attacker persistent, undetected shell access simply upon opening the repository in the IDE. * Once persistent access is obtained, the attacker can connect to the machine without triggering the prompt injection again or raising any security alerts. Because Cursor is a legitimate binary that's signed and notarized, the adversary has unfettered access to the underlying host, gaining full file system access and command execution capabilities. * "A human attacker would need to chain together multiple exploits and maintain persistent access," Straiker researchers Karpagarajan Vikkii and Amanda Rousseau said. "The AI agent does this autonomously, following the injected instructions as if they were legitimate development tasks." * A novel attack called ToolJack has been found to allow a local attacker to manipulate an AI agent's perception of its environment and corrupts the tool's ground truth to produce unintended downstream effects, including poisoned data, fabricated business intelligence, and bogus recommendations. * "Where MCP Tool Shadowing poisons tool descriptions to influence agent behavior across servers and ConfusedPilot contaminates a RAG retrieval pool, ToolJack operates as a real-time infrastructure attack on the communication conduit itself," Preamble researcher Jeremy McHugh said. "It does not wait for the agent to organically encounter poisoned data. It synthesizes a fabricated reality mid-execution, demonstrating that compromising the protocol boundary yields control over the agent's entire perception." * Severe indirect prompt injection vulnerabilities have been identified in Microsoft Copilot Studio (aka ShareLeak or CVE-2026-21520, CVSS score: 7.5) and Salesforce Agentforce (aka PipeLeak) that could enable attackers to exfiltrate sensitive data through an external SharePoint form or a simple lead from a form submission, respectively. * "The attack exploits the lack of input sanitization and inadequate separation between system instructions and user-supplied data," Capsule Security researcher Bar Kaduri said about CVE-2026-21520. PipeLeak is similar to ForcedLeak in that the system processes public-facing lead form inputs as trusted instructions, thus allowing an attacker to embed malicious prompts that override the agent's intended behavior. * A trio of vulnerabilities have been identified in Claude that, when chained together in an attack codenamed Claudy Day, allow an attacker to silently hijack a user's chat session and exfiltrate sensitive data with a single click. The attack pipeline requires no additional integrations, tools, or Model Context Protocol (MCP) servers. * The attack works by embedding hidden instructions in a crafted Claude URL ("claude[.]ai/new?q=..."), encapsulating it in an open redirect on claude[.]com to make it appear legitimate, and then running it as a benign-looking Google ad that, when clicked, triggers the attack by silently redirecting the victim to the crafted "claude[.]ai/new?q=..." URL containing the invisible prompt injection. * "Combined with Google Ads, which validates URLs by hostname, this allowed an attacker to place a search ad displaying a trusted claude.com URL that, when clicked, silently redirected the victim to the injection URL. Not a phishing email. A Google search result, indistinguishable from the real thing," Oasis Security said. In research published last week, Manifold Security also revealed how a Claude-powered GitHub Actions workflow ("claude-code-action") can be tricked into approving and merging a pull request containing malicious code with just two Git configuration commands by spoofing a trusted developer's identity. At its core, the attack entails setting Git's user.name and user.email properties to those of a well-known developer (in this case, AI researcher Andrej Karpathy). This metadata trickery becomes a problem when an AI system treats it as a signal of trust. An attacker could exploit this unverified metadata to deceive the AI agent into executing unintended actions. "On the first submission, Claude flagged the PR for manual review, noting that author reputation alone wasn't sufficient justification," researchers Ax Sharma and Oleksandr Yaremchuk said. "Reopening and resubmitting the same PR led to its approval. The AI overrode its own better judgment on retry. This non-determinism is the point. You cannot build a security control on a system that changes its mind."
[3]
Anthropic, Google, and Microsoft paid AI agent bug bounties, then kept quiet about the flaws
In short:Security researcher Aonan Guan hijacked AI agents from Anthropic, Google, and Microsoft via prompt injection attacks on their GitHub Actions integrations, stealing API keys and tokens in each case. All three companies paid bug bounties quietly, $100 from Anthropic, $500 from GitHub, an undisclosed amount from Google, but none published public advisories or assigned CVEs, leaving users on older versions unaware of the risk. Security researchers have demonstrated that AI agents from Anthropic, Google, and Microsoft can be hijacked through prompt injection attacks to steal API keys, GitHub tokens, and other secrets, and all three companies quietly paid bug bounties without publishing public advisories or assigning CVEs. The vulnerabilities, disclosed by researcher Aonan Guan over several months, affect AI tools that integrate with GitHub Actions: Anthropic's Claude Code Security Review, Google's Gemini CLI Action, and GitHub's Copilot Agent. Each tool reads GitHub data, including pull request titles, issue bodies, and comments, processes it as task context, and then takes actions. The problem is that none of them reliably distinguish between legitimate content and injected instructions. The core technique is indirect prompt injection. Rather than attacking the AI model directly, the researcher embedded malicious instructions in places the agents were designed to trust: PR titles, issue descriptions, and comments. When the agent ingested that content as part of its workflow, it executed the injected commands as though they were legitimate instructions. Against Anthropic's Claude Code Security Review, which scans pull requests for vulnerabilities, Guan crafted a PR title containing a prompt injection payload. Claude executed the embedded commands and included the output, including leaked credentials, in its JSON response, which was then posted as a PR comment for anyone to read. The attack could exfiltrate the Anthropic API key, GitHub access tokens, and other secrets exposed in the GitHub Actions runner environment. The Gemini attack followed a similar pattern. By injecting a fake "trusted content section" after legitimate content in a GitHub issue, Guan overrode Gemini's safety instructions and tricked the agent into publishing its own API key as an issue comment. Google's Gemini CLI Action, which integrates Gemini into GitHub issue workflows, treated the injected text as authoritative. The Copilot attack was subtler. Guan hid malicious instructions inside an HTML comment in a GitHub issue, making the payload invisible in the rendered Markdown that humans see but fully visible to the AI agent parsing the raw content. When a developer assigned the issue to Copilot Agent, the bot followed the hidden instructions without question. What happened next is as revealing as the vulnerabilities themselves. Anthropic received Guan's submission on its HackerOne bug bounty platform in October 2025. The company asked whether the technique could also steal more sensitive data such as GitHub tokens, confirmed it could, and in November paid a $100 bounty while upgrading the critical severity rating from 9.3 to 9.4. Anthropic updated a "security considerations" section in its documentation but did not publish a public advisory or assign a CVE. GitHub initially dismissed the Copilot finding as a "known issue" that it "could not reproduce," but ultimately paid a $500 bounty in March. Google paid an undisclosed amount for the Gemini vulnerability. None of the three vendors assigned CVEs or published advisories that would alert users pinned to vulnerable versions. For Guan, this is the crux of the problem. Users running older versions of these AI agent integrations may never learn they are exposed. Without a CVE, vulnerability scanners will not flag the issue. Without an advisory, security teams have no artefact to track. The attacks exploit a fundamental weakness in how AI agents process context. Large language models cannot reliably separate data from instructions. When an agent reads a GitHub issue, it treats the text as input to reason about, but a well-crafted prompt injection can make that input function as a command. Every data source that feeds an AI agent's reasoning, whether it is an email, a calendar invite, a Slack message, or a code comment, is a potential attack vector. This is not a theoretical concern. In January 2026, researchers from Miggo Security demonstrated that Google Gemini could be weaponised through calendar invitations containing hidden instructions. Days later, the "Reprompt" attack against Microsoft Copilot showed that injected prompts could hijack entire user sessions. Anthropic's own Git MCP server was found to harbour three CVEs that allowed attackers to inject backdoors through repositories the server processed. A systematic analysis of 78 studies published in January found that every tested coding agent, including Claude Code, GitHub Copilot, and Cursor, was vulnerable to prompt injection, with adaptive attack success rates exceeding 85%. The supply chain dimension makes it worse. A security audit of nearly 4,000 agent skills on the ClawHub marketplace found that more than a third contained at least one security flaw, and 13.4% had critical-level issues. When AI agents pull in third-party tools and data sources with the same level of trust they extend to their own instructions, a single compromised component can cascade across an entire development pipeline. The vendors' reluctance to publish advisories reflects an uncomfortable reality: there is no established framework for disclosing AI agent vulnerabilities. Traditional software bugs get CVEs, patches, and coordinated disclosure timelines. Prompt injection flaws sit in a grey zone. They are not bugs in the code so much as emergent behaviours of the model, and the mitigations, stronger system prompts, input sanitisation, output filtering, are partial at best. But the consequences are indistinguishable from those of a conventional security flaw. An attacker who exfiltrates a GitHub token through a prompt injection can do exactly the same damage as one who exploits a buffer overflow. The argument that AI safety requires new frameworks does not excuse the absence of disclosure for vulnerabilities that are already being exploited in the wild. Zenity Labs research published this month found that most agent-building frameworks, including those from OpenAI, Google, and Microsoft, lack appropriate guardrails, putting the burden of managing risk on the companies deploying them. In one documented case, attackers manipulated an AI procurement agent's memory so it believed it had authority to approve purchases up to $500,000, when the real limit was $10,000. The agent approved $5 million in fraudulent purchase orders before anyone noticed. For organisations that have integrated AI agents into their CI/CD pipelines, the message is stark. These tools are powerful precisely because they have access to sensitive systems and data. That same access makes them high-value targets, and the industry has not yet built the disclosure infrastructure to match the risk.
[4]
Three AI coding agents leaked secrets through a single prompt injection. One vendor's system card predicted it
A security researcher, working with colleagues at Johns Hopkins University, opened a GitHub pull request, typed a malicious instruction into the PR title, and watched Anthropic's Claude Code Security Review action post its own API key as a comment. The same prompt injection worked on Google's Gemini CLI Action and GitHub's Copilot Agent (Microsoft). No external infrastructure required. Aonan Guan, the researcher who discovered the vulnerability, alongside Johns Hopkins colleagues Zhengyu Liu and Gavin Zhong, published the full technical disclosure last week, calling it "Comment and Control." GitHub Actions does not expose secrets to fork pull requests by default when using the pull_request trigger, but workflows using pull_request_target, which most AI agent integrations require for secret access, do inject secrets into the runner environment. This limits the practical attack surface but does not eliminate it: collaborators, comment fields, and any repo using pull_request_target with an AI coding agent are exposed. Per Guan's disclosure timeline: Anthropic classified it as CVSS 9.4 Critical ($100 bounty), Google paid a $1,337 bounty, and GitHub awarded $500 through the Copilot Bounty Program. The $100 amount is notably low relative to the CVSS 9.4 rating; Anthropic's HackerOne program scopes agent-tooling findings separately from model-safety vulnerabilities. All three patched quietly, and none had issued CVEs in the NVD or published security advisories through GitHub Security Advisories as of Saturday. Comment and Control exploited a prompt injection vulnerability in Claude Code Security Review, a specific GitHub Action feature that Anthropic's own system card acknowledged is "not hardened against prompt injection." The feature is designed to process trusted first-party inputs by default; users who opt into processing untrusted external PRs and issues accept additional risk and are responsible for restricting agent permissions. Anthropic updated its documentation to clarify this operating model after the disclosure. The same class of attack operates beneath OpenAI's safeguard layer at the agent runtime, based on what their system card does not document -- not a demonstrated exploit. The exploit is the proof case, but the story is what the three system cards reveal about the gap between what vendors document and what they protect. OpenAI and Google did not respond for comment by publication time. "At the action boundary, not the model boundary," Merritt Baer, CSO at Enkrypt AI and former Deputy CISO at AWS, told VentureBeat when asked where protection actually needs to sit. "The runtime is the blast radius." What the system cards tell you Anthropic's Opus 4.7 system card runs 232 pages with quantified hack rates and injection resistance metrics. It discloses a restricted model strategy (Mythos held back as a capability preview) and states directly that Claude Code Security Review is "not hardened against prompt injection." The system card explains to readers that the runtime was exposed. Comment and Control proved it. Anthropic does gate certain agent actions outside the system card's scope -- Claude Code Auto Mode, for example, applies runtime-level protections -- but the system card itself does not document these runtime safeguards or their coverage. OpenAI's GPT-5.4 system card documents extensive red teaming and publishes model-layer injection evals but not agent-runtime or tool-execution resistance metrics. Trusted Access for Cyber scales access to thousands. The system card tells you what red teamers tested. It does not tell you how resistant the model is to the attacks they found. Google's Gemini 3.1 Pro model card, shipped in February, defers most safety methodology to older documentation, a VentureBeat review of the card found. Google's Automated Red Teaming program remains internal only. No external cyber program. Baer offered specific procurement questions. "For Anthropic, ask how safety results actually transfer across capability jumps," she told VentureBeat. "For OpenAI, ask what 'trusted' means under compromise." For both, she said, directors need to "demand clarity on whether safeguards extend into tool execution, not just prompt filtering." Seven threat classes neither safeguard approach closes Each row names what breaks, why your controls miss it, what Comment and Control proved, and the recommended action for the week ahead. OpenAI's GPT-5.4 was not directly exploited in the Comment and Control disclosure. The gaps identified in the OpenAI and Google columns are inferred from what their system cards and program documentation do not publish, not from demonstrated exploits. That distinction matters. Absence of published runtime metrics is a transparency gap, not proof of a vulnerability. It does mean procurement teams cannot verify what they cannot measure. Eligibility requirements for Anthropic's Cyber Verification Program and OpenAI's Trusted Access for Cyber are still evolving, as are platform coverage and program scope, so security teams should validate current vendor docs before treating any coverage described here as definitive. Anthropic's CVP is designed for authorized offensive security research -- removing cyber safeguards for vetted actors -- and is not a prompt injection defense program. Security leaders mapping these gaps to existing frameworks can align threat classes 1-3 with NIST CSF 2.0 GV.SC (Supply Chain Risk Management), threat class 4 with ID.RA (Risk Assessment), and threat classes 5-7 with PR.DS (Data Security). Comment and Control focuses on GitHub Actions today, but the seven threat classes generalize to most CI/CD runtimes where AI agents execute with access to secrets, including GitHub Actions, GitLab CI, CircleCI, and custom runners. Safety metric disclosure formats are in flux across all three vendors; Anthropic currently leads on published quantification in its system card documentation, but norms are likely to converge as EU AI Act obligations come into force. Comment and Control targeted Claude Code GitHub Action, a specific product feature, not Anthropic's models broadly. The vulnerability class, however, applies to any AI coding agent operating in a CI/CD runtime with access to secrets. What to do before your next vendor renewal "Don't standardize on a model. Standardize on a control architecture," Baer told VentureBeat. "The risk is systemic to agent design, not vendor-specific. Maintain portability so you can swap models without reworking your security posture." Build a deployment map. Confirm your platform qualifies for the runtime protections you think cover you. If you run Opus 4.7 on Bedrock, ask your Anthropic account rep what runtime-level prompt injection protections apply to your deployment surface. Email your account rep today. (Anthropic Cyber Verification Program) Audit every runner for secret exposure. Run grep -r 'secrets\.' .github/workflows/ across every repo with an AI coding agent. List every secret the agent can access. Rotate all exposed credentials. (GitHub Actions secrets documentation) Start migrating credentials now. Switch stored secrets to short-lived OIDC token issuance. GitHub Actions, GitLab CI, and CircleCI all support OIDC federation. Set token lifetimes to minutes, not hours. Plan full rollout over one to two quarters, starting with repos running AI agents. (GitHub OIDC docs | GitLab OIDC docs | CircleCI OIDC docs) Fix agent permissions repo by repo. Strip bash execution from every AI agent doing code review. Set repository access to read-only. Gate write access behind a human approval step. (GitHub Actions permissions documentation) Add input sanitization as one layer, not the only layer. Filter pull request titles, comments, and review threads for instruction patterns before they reach agents. Combine with least-privilege permissions and OIDC. Static regex will not catch non-deterministic prompt injections on its own. Add "AI agent runtime" to your supply chain risk register. Assign a 48-hour patch verification cadence with each vendor's security contact. Do not wait for CVEs. None have come yet for this class of vulnerability. Check which hardened GitHub Actions mitigations you already have in place. Hardened GitHub Actions configurations block this attack class today: the permissions key restricts GITHUB_TOKEN scope, environment protection rules require approval before secrets are injected, and first-time-contributor gates prevent external pull requests from triggering agent workflows. (GitHub Actions security hardening guide) Prepare one procurement question per vendor before your next renewal. Write one sentence: "Show me your quantified injection resistance rate for the model version I run on the platform I deploy to." Document refusals for EU AI Act high-risk compliance. The deadline is August 2026. "Raw zero-days aren't how most systems get compromised. Composability is," Baer said. "It's the glue code, the tokens in CI, the over-permissioned agents. When you wire a powerful model into a permissive runtime, you've already done most of the attacker's work for them."
Share
Copy Link
Security researchers exploited prompt injection vulnerabilities in AI agents from Anthropic, Google, and Microsoft, stealing API keys through GitHub Actions integrations. All three companies paid bug bounties ranging from $100 to $1,337 but issued no CVEs or public advisories, leaving users on older versions exposed to potential attacks.
Security researcher Aonan Guan, working with colleagues at Johns Hopkins University, successfully hijacked three popular AI agents by exploiting a prompt injection vulnerability that allowed them to steal API keys and access tokens. The affected tools—Anthropic's Claude Code Security Review, Google's Gemini CLI Action, and Microsoft's GitHub Copilot Agent—all integrate with GitHub Actions and process user-submitted content as part of their workflow
1
.The attack, dubbed Comment and Control, works by embedding malicious instructions in places the AI agents are designed to trust: pull request titles, issue bodies, and comments
3
. When these AI agents ingest this content as task context, they execute the injected commands as though they were legitimate instructions, demonstrating a fundamental weakness in how AI agents process context and distinguish data from commands.
Source: VentureBeat
Guan originally discovered the flaw in Claude Code Security Review, Anthropic's GitHub Action that analyzes code changes for vulnerabilities. He submitted a pull request with malicious instructions embedded in the PR title, instructing Claude to execute the whoami command using the Bash tool and return results as a "security finding." Claude then executed the injected commands and posted the output, including leaked credentials, in its JSON response as a pull request comment
1
.After validating this prompt injection worked with Claude Code, Guan and his team verified similar attacks against other agents. With Gemini CLI Action, researchers injected a fake "trusted content section" after legitimate content, overriding Gemini's safety instructions and forcing it to publish its own API key as an issue comment
2
. The GitHub Copilot Agent attack proved subtler—Guan hid malicious instructions inside an HTML comment in a GitHub issue, making the payload invisible in rendered Markdown but fully visible to the AI agent parsing raw content3
.
Source: Hacker News
All three vendors paid bug bounties after researchers disclosed the flaws through proper channels. Anthropic paid $100 in November after upgrading the critical severity rating from 9.3 to 9.4 on the CVSS scale. Google paid a $1,337 bounty, crediting Guan, Neil Fendley, Zhengyu Liu, Senapati Diwangkara, and Yinzhi Cao. GitHub initially dismissed the Copilot finding as a "known issue" but ultimately paid a $500 bounty in March
4
.However, none of the three vendors assigned CVEs or published security advisories that would alert users to the vulnerabilities. According to Guan, "If they don't publish an advisory, those users may never know they are vulnerable—or under attack"
1
. This lack of transparency leaves users running older versions of these AI agent integrations exposed, as vulnerability scanners won't flag the issue without a CVE, and security teams have no artifact to track without an advisory3
.
Source: The Register
Related Stories
The theft of API keys through these prompt injection attacks highlights a pervasive problem in AI agent architecture. Guan warns that the attack pattern likely applies to any AI agent that ingests untrusted GitHub data and has access to execution tools in the same runtime as production secrets—and beyond GitHub Actions, to any agent that processes untrusted input with access to tools and secrets, including Slack bots, Jira agents, email agents, and deployment automation
2
.Merritt Baer, CSO at Enkrypt AI and former Deputy CISO at AWS, emphasized where protection needs to sit: "At the action boundary, not the model boundary. The runtime is the blast radius"
4
. The vulnerabilities exploit how GitHub Actions workflows using pull_request_target, which most AI agent integrations require for secret access, inject secrets into the runner environment.Anthropic updated its documentation after the disclosure to clarify that Claude Code Security Review "is not hardened against prompt injection attacks and should only be used to review trusted PRs," recommending users configure repositories to "Require approval for all external contributors"
1
. However, this guidance addresses only one specific implementation and doesn't solve the underlying issue of insufficient input sanitization that allows AI agents to treat user-controlled data as executable instructions.The lack of published runtime metrics and security safeguards in system cards represents a transparency gap that prevents procurement teams from verifying what they cannot measure
4
. Organizations deploying AI agents need to demand clarity on whether safeguards extend into tool execution and arbitrary code execution prevention, not just prompt filtering at the model level. Without CVEs to track these prompt injection vulnerabilities, security teams must actively monitor vendor documentation and assume that any AI agent processing external content could leak sensitive information through similar attack vectors.Summarized by
Navi
[1]
[3]
23 Dec 2025•Technology

31 Mar 2026•Technology

08 Jan 2026•Technology

1
Health

2
Technology

3
Technology
