AI Coding Agents Tricked Into Installing Malware

AI Coding Agents Face New Security Threat Through Deceptive GitHub Repositories

Security researchers have uncovered a critical AI security vulnerability that allows attackers to manipulate AI coding agents into executing malicious code without raising any red flags. Mozilla 0din and Pentera Labs independently demonstrated how Claude and other agentic AI tools can be exploited through seemingly innocuous operations, exposing developers to severe risks including credential theft and complete system compromise1

Source: BleepingComputer

The attacks leverage what researchers call indirect prompt injection, where AI coding agents are tricked into installing malware through multiple layers of indirection. Mozilla 0din demonstrated that clean GitHub repositories containing no malicious code can still trigger devastating attacks. An attacker would gain control over the developer's account, accessing all their secrets, API keys, code, documents, browser sessions, and passwords, with the ability to install additional malware for permanent access1

How the Multi-Step Social Engineering Attack Works

The Mozilla 0din team revealed a sophisticated three-step attack chain that exploits Claude's helpful nature. The process begins when a developer instructs Claude to initialize a project from a malicious GitHub repository that appears completely legitimate, containing only standard scaffolding files that won't trigger security tools, whether remote, local, or even Claude's own checks1

Claude processes a readme file describing how to initialize a Python environment with the Axiom package, a commonly used monitoring tool. A fake Axiom startup script deliberately errors out on first run, prompting Claude to helpfully execute "python3 -m axiom init" to fix the problem. This triggers a shell script that downloads software, but instead of fetching from a malicious URL that could be scanned, it reads DNS TXT records from a specific domain1

The DNS text record contains a base64-encoded string that opens a reverse shell attack, redirecting the user's machine shell to the attacker's server for input. "Claude Code never decided to open a shell. It decided to fix an error," 0DIN researchers explained. "The reverse shell is three indirection steps away from anything Claude Code actually evaluated"3

. Meanwhile, Claude and the victim see only an "Environment ready" message1

Pentera Labs Reveals Additional Attack Vector Through Desktop Apps

Source: Tom's Hardware

Pentera Labs' offensive security team, led by Dvir Avraham and Reef Spektor, demonstrated another exploitation method targeting Anthropic's Claude Desktop application. Their research, conducted in November 2025, showed how compromised email inboxes could serve as entry points to inject malicious instructions into Claude's account-wide personalization settings2

"We acknowledge the huge trust in AI models - everybody uses them," Avraham told The Register. "We used this trust to manipulate the victim, like under the hood, the victim didn't see it coming." The team developed a base64-encoded prompt that instructed Claude to check for command-capable tools and execute commands if available, or produce fake error messages prompting users to download tools that would execute the attacker's commands. This effectively turned Claude into a double agent and what researchers described as a "phishing layer"2

The attack became even more potent with Claude's Cowork feature, introduced in January, which allows the AI to perform longer agentic tasks. "Anything you can do on your computer, Claude can do. Open apps, fill spreadsheets, navigate your browser," according to Anthropic's description. This capability eliminated the need for the tool enumeration phase that Pentera Labs had to engineer in their November research2

Why Standard Security Tools Failed to Detect the Threat

The sophistication of these attacks lies in how they bypass conventional security measures. Static code-scanning tools only registered routine DNS lookups, network monitoring saw nothing more than ordinary domain name resolution, and the malicious payload remained invisible to both security agents and human reviewers4

"Very few (if any) security scanning tools would even flag the repository, and none of the activity, save for the actual opening of a remote shell, even looks particularly odd," researchers noted. An enterprise environment with tightly controlled network access could potentially catch it, but that's not where the vast majority of developers operate1

. Regular antivirus software and firewall protection failed because none of the individual steps looked suspicious on their own.

Implications for Developer Security and AI Agent Design

These findings expose fundamental weaknesses in how agentic AI tools handle local code-execution and error handling exploitation. Threat actors could easily distribute such repositories through fake job postings, tutorials, blog posts, or direct messages. A single repository link shared in a job posting or chat message could expose every developer who simply opened it3

Source: TechRadar

The 0din team concluded that developers should never blindly trust an unknown project as trusted code and not rely on AI tools for security analysis. For the agents themselves, 0din stated they need runtime safeguards to inspect what actually will run and how, instead of simply following steps. AI agents should disclose the full execution chain of setup commands, including scripts and code fetched dynamically at runtime1

The broader lesson extends beyond Claude, since most agentic AI systems share similar blind spots toward these types of attacks. Until such agents can meaningfully evaluate what a command actually executes, similar attacks will likely remain difficult to prevent. Treating unfamiliar automation as a genuine risk remains the single most reliable safeguard available to developers4

AI coding agents can be tricked into installing malware through clean GitHub repositories

AI Coding Agents Face New Security Threat Through Deceptive GitHub Repositories

How the Multi-Step Social Engineering Attack Works

Pentera Labs Reveals Additional Attack Vector Through Desktop Apps

Why Standard Security Tools Failed to Detect the Threat

Implications for Developer Security and AI Agent Design

References

AI coding agents can be tricked into installing malware via 'clean' GitHub repositories -- Mozilla's 0din team shows how Claude Code can be exploited by its own helpfulness

Red teamers turned Claude Desktop into a double agent to do their evil bidding

Clean GitHub repo tricks AI coding agents into running malware

Agentic AI tools now need real safeguards against this kind of indirection.

Related Stories

Claude Code security vulnerabilities expose developers to API key theft and remote code execution

GhostApproval vulnerability exposes AI coding agents to remote code execution via symlinks

AI Agents Turn Developer Machines Into Credential Vaults as Security Risks Multiply

Recent Highlights

Xi Jinping positions China as global AI partner while challenging US tech dominance

Apple releases Siri AI to everyone through iOS 27 public beta, marking biggest assistant overhaul

Moonshot AI's Kimi K3 rivals Claude and ChatGPT, shaking up the US tech industry

Recent Highlights

Today's Top Stories

AI advice makes people three times less accurate but twice as confident, new study reveals

Researcher poisons open-weight AI model in under an hour for less than $100

China is rebuilding the smartphone around AI agents. ZTE's NaviX sold out in hours.

BrainCo unveils brain-to-robot platform to control humanoid robots with thoughts at WAIC