3 Sources
[1]
New Attacks Trick OpenClaw AI Agent Into Running Code and Leaking Secrets
Two security teams have shown, in separate research published this week, that OpenClaw, the popular self-hosted AI agent, can be driven to run attacker-controlled code or hand over sensitive data through ordinary-looking inputs. Imperva buried instructions inside shared contacts, vCards, and location pins that the agent executed without the victim ever seeing them. Varonis built a test agent on the platform, gave it a mailbox full of synthetic business data, and watched a single plain email talk it into forwarding mock AWS keys and a fake customer export to an outside address. The flaw Imperva found is patched in OpenClaw 2026.4.23, so update if you run it. The phishing weakness Varonis found is not something a patch fixes; it comes down to limiting what the agent can do on its own. Different doors into the same room: the agent trusts what reaches it, and its access becomes the attacker's. Hidden commands in a shared contact Imperva researcher Yohann Sillam looked at how OpenClaw hands messaging data to the model behind it. The problem is in the plumbing. When the agent passes a shared contact, vCard, or location to the LLM, it flattens the object into the prompt text inline, with no boundary marking it as untrusted. The content the agent fetches from the web gets wrapped in an untrusted-content marker. Message objects do not. Only some fields travel to the model, and that is what the attack abuses. A shared contact sends just the name field, serialized as <contact: name, number>. The angle brackets are legal in a name, so the model cannot tell where the real name ends and an injected instruction begins. The contact name is truncated where it shows on screen, both on WhatsApp and in the receiving app, so the victim does not see the payload either. The same trick works through a vCard's full-name field, which WhatsApp supports natively, and through the label on a shared location pin. In Imperva's tests against Gemini 3.1 Pro (preview build), the hidden text told the agent to download and run a script from a server the researchers controlled. It did. A plain image with instructions buried in it failed, likely because that attack has been reported so often that models are now trained to resist it; the message-object route worked because models have seen far fewer examples of it. With OpenClaw's memory on by default, Imperva warns, a single piece of widely shared content carrying a hidden instruction could quietly compromise the agents that ingest it, if they are not sandboxed. Imperva disclosed the issue, and OpenClaw shipped a fix in version 2026.4.23 that moves contact names, vCard fields, and location labels out of the prompt body and into a separate untrusted-metadata channel. Imperva found the same flattening pattern in other personal AI assistants, so the underlying problem is not OpenClaw's alone. A normal email is enough Varonis Threat Labs came at OpenClaw from the social angle. In research led by Itay Yashar, the team built an agent called Pinchy on the platform, wired it to a Gmail inbox stocked with realistic but synthetic business clutter and mock secrets, and ran it through four phishing simulations on Google Gemini 3.1 Pro and OpenAI Codex GPT-5.4. They draw a line between prompt injection, which hides instructions in data, and what they call agent phishing: a believable request that arrives through a normal channel and works because the agent acts before checking who sent it. The agent failed both exfiltration tests. In the first, a message posing as a team lead named Dan, sent from an outside Gmail address, asked for staging access during a fake production incident. Pinchy found the credentials and forwarded mock AWS IAM access keys, database connection strings, and SSH credentials in plaintext. The second pretext was softer: a routine-sounding request for the weekly customer export, supposedly for a QBR deck. The agent shipped out a synthetic dataset of 247 enterprise customers, contacts, and contract values included. Both failures happened under a strict profile that told the agent to verify senders first. The rule existed. Urgency beat it once, routine beat it the second time. The agent did better when the threat was technical rather than social. It interacted with a gift-card phishing page but withheld real credentials and eventually flagged it; the strict profile blocked the page outright. On a malicious OAuth consent screen dressed up as a timesheet app, it inspected the redirect target, judged it suspicious, and stopped before granting access. That is the split Varonis draws out: the agent is better than many people at spotting bad URLs and fake login portals, and worse at the social judgment that makes a human pause when a colleague suddenly asks for credentials at an odd hour. The drive to be helpful is the attack surface. Varonis says OpenAI Codex GPT-5.4 was more cautious than Gemini 3.1 Pro about entering or sending data to outside sites without confirmation, but both fell for the social pretexts. The weak spot behind both attacks Varonis maps both attacks onto what Simon Willison calls the lethal trifecta: an agent that can read private data, take in untrusted content, and send data back out. OpenClaw has all three, which is why a poisoned contact and a friendly email end in the same place. That trust boundary is not only a prompt problem; it shows up in OpenClaw's code as well. A separate InfoSec Write-ups analysis turned OpenClaw's past advisories into static-analysis rules, then used them to find five more flaws across the Slack, Discord, Matrix, Zalo, and Microsoft Teams channel extensions. All five were the same bug: the startup code resolved each channel's allowlist by mutable display name instead of a stable ID, so an attacker who renamed themselves to match an allowed user could slip onto the list and steer the agent. OpenClaw has patched them. OpenClaw ships with broad access to files, shells, and more than twenty messaging platforms, and it has drawn a steady run of earlier prompt-injection and data-exfiltration warnings since it launched late last year. The Dutch data protection authority took the strongest line: the Autoriteit Persoonsgegevens told users and organisations not to run OpenClaw on systems that hold sensitive data, citing data-breach and account-takeover risks. What to do about it Anyone running OpenClaw should update to 2026.4.23 or later for the message-object fix. The rest is architecture, not prompt wording, and Varonis lays out four controls. Treat the agent's instruction file as an enforced, version-controlled policy, not a suggestion. Outbound mail needs a gate: no first-time sends to unfamiliar addresses without approval, so a hijacked agent cannot relay phishing from a trusted account. Connector access should track the trust level of whatever triggered the task, so an inbox handling outside email cannot also read the whole CRM. And the riskiest actions, forwarding credentials or moving money, should wait for a human. Both teams land on the same mental model. Varonis frames it as treating the agent like a junior employee with system access and no instinct for what looks off, not as a security tool. Imperva gets there from the other direction, calling it an authenticated executor that trusts its inputs. The fixes on offer today are specific patches and guardrails. The harder problem is still open. An agent useful enough to act on your email and run your commands is, by design, one that trusts input and wants to help, and nobody has a general fix for that yet.
[2]
Researchers tricked an OpenClaw AI agent into leaking AWS keys and customer data with a phishing email
Varonis phished an OpenClaw email agent. It leaked AWS keys and a CRM export for 247 customers. It caught malicious URLs but failed on identity checks. Security researchers at Varonis built an OpenClaw email agent, connected it to a Gmail inbox with fake company data, and then phished it. The agent, dubbed Pinchy, handed over AWS credentials, database connection strings, and a customer export without verifying who was asking. It took a single impersonation email. The experiment tested whether AI agents fall for the same social engineering attacks that catch human employees. Varonis gave Pinchy access to Gmail, browser tools, and Google Workspace APIs. The inbox was seeded with fake but realistic internal data: AWS IAM keys, SSH credentials, CRM exports, internal communications, and calendar invites. They tested two configurations: a generic setup with standard productivity instructions, and a strict mode explicitly designed to detect phishing. They ran both through Gemini 3.1 Pro and GPT-5.4. The results were a split. When an attacker impersonated a team lead named "Dan" and claimed there was a production issue, Pinchy searched the inbox for staging credentials, found them, and forwarded them in plaintext. When the attacker requested a customer export, saying they were working remotely on a presentation, Pinchy retrieved and sent a CRM file containing names, contact details, and $1.28 million in monthly recurring revenue data for 247 enterprise customers. Both the generic and strict profiles failed these tests. "The verification step still collapsed when the request appeared operationally urgent," Varonis said. But Pinchy performed well against traditional technical phishing. When researchers sent a fake gift card email with a phishing link, the agent identified the page as malicious and blocked it. When they tried to sneak in a malicious Google OAuth application disguised as a timesheet platform, Pinchy inspected the redirect URL and stopped the authentication flow. The pattern is clear. AI agents are good at spotting shady URLs and malicious OAuth apps, the kind of threats with technical signatures. They fail when the attack relies on identity verification and contextual judgment, the kind of reasoning humans also struggle with but that organisations rely on to prevent social engineering. Varonis also noted a difference between models. Gemini 3.1 Pro showed "greater willingness to interact" before raising suspicion. GPT-5.4 was more cautious and less willing to provide sensitive information to external destinations without confirmation. Neither was reliable enough to trust with an inbox full of real credentials. The findings add to a growing body of evidence that AI agents connected to real systems create new attack surfaces that existing security tools do not cover. Varonis recommends that agents should be forced to verify sender identities before acting, prevented from emailing new external recipients without human approval, and given limited access to internal data. In other words, the same zero-trust principles organisations apply to human employees need to apply to their AI agents too.
[3]
OpenClaw AI agent tricked into phishing attacks, with user data compromised
* Varonis' "Pinchy" OpenClaw agent fell for identity‑based phishing despite strict settings * Models blocked malicious links/OAuth apps but granted sensitive access when requests felt urgent * Researchers say AI agents need enforced identity verification before acting Security researchers tested an OpenClaw email agent to see if it's naive enough to fall for the same phishing scams regular employees fall for and it succeeded. Or failed, depending on how you look at it. Cybersecurity researchers Varonis created an OpenClaw agent dubbed Pinchy, and connected it to a Gmail inbox, browser tools, and Google Workspace APIs. They populated the account with fake internal company data, AWS credentials, database credentials, CRM exports, internal communications, and Calendar invites, and then told Pinchy to monitor and process incoming emails. To simulate real-life scenarios as credibly as possible, they created two configurations: a generic one with standard productivity instructions, and a strict mode that should be aware of phishing and other email-borne scams. Varonis tested two models: Gemini 3.1 Pro, and GPT-5.4, and the results seem to be a mixed bag. Where the AI failed, and where it did good When the attacker impersonated a team lead and asked for access to the staging environment, Pinchy granted it. When the attacker requested a customer export, claiming to work remotely on a presentation, Pinchy complied. However, when they sent the agent a fake gift card email with a phishing link, it identified the page as malicious and blocked it. Also, when they tried to smuggle a malicious Google OAuth application as a timesheet platform Pinchy did the right thing and did not grant access. "Both Generic and Strict profiles failed because the verification step still collapsed when the request appeared operationally urgent," Varonis said about the first attack scenario. The conclusion is that AI is good at spotting shady URLs and malicious OAuth apps, but fails when it needs identity verification, or wider context. Varonis also threw a little shade Google's way, saying Gemini showed "greater willingness to interact", while GPT was more careful. The researchers said agents should be forced to verify sender identities before proceeding. Follow TechRadar on Google News and add us as a preferred source to get our expert news, reviews, and opinion in your feeds.
Share
Copy Link
Security researchers from Varonis and Imperva exposed critical vulnerabilities in OpenClaw, a popular self-hosted AI agent. Through phishing attacks and hidden commands, they tricked the agent into leaking AWS credentials and customer data. While the agent blocked malicious URLs, it failed identity verification tests, highlighting new attack surfaces as AI agents gain access to sensitive business systems.
Two independent security research teams have demonstrated that the OpenClaw AI agent can be manipulated into executing malicious code and leaking sensitive data through seemingly ordinary inputs. Imperva and Varonis published separate findings this week that reveal critical AI agent vulnerabilities in how these systems process untrusted data and respond to social engineering attacks
1
. The research matters because AI agents are increasingly deployed with access to corporate email, cloud infrastructure, and customer databases, creating new attack surfaces that traditional security tools fail to address.Imperva researcher Yohann Sillam discovered a vulnerability where hidden instructions could be embedded in shared contacts, vCards, and location pins that the OpenClaw AI agent would execute without the victim ever seeing them
1
. The flaw exists in how the agent passes messaging data to the underlying language model. When the agent processes a shared contact, it flattens the object into prompt text inline without marking it as untrusted content, unlike web-fetched content which gets wrapped in an untrusted-content marker.
Source: Hacker News
The attack exploits specific fields that travel to the model. A shared contact sends only the name field, serialized as <contact: name, number>. Since angle brackets are legal characters in a name, the model cannot distinguish where the legitimate name ends and an injected instruction begins
1
. The contact name gets truncated on screen in both WhatsApp and the receiving application, so victims never see the malicious payload.In tests against Gemini 3.1 Pro, Imperva's hidden text instructed the agent to download and run a script from a researcher-controlled server, which it did without hesitation
1
. A plain image with buried instructions failed, likely because models have been trained to resist that well-documented attack vector. The message-object route succeeded because models have encountered far fewer examples of this technique.With OpenClaw's memory enabled by default, Imperva warns that a single piece of widely shared content carrying hidden instructions could quietly compromise multiple agents that ingest it, assuming they lack proper sandboxing
1
. The vulnerability has been patched in OpenClaw version 2026.4.23, which moves contact names, vCard fields, and location labels into a separate untrusted-metadata channel1
.Varonis Threat Labs approached AI agent security from a different angle, testing whether AI agents fall victim to the same social engineering attacks that compromise human employees. Led by researcher Itay Yashar, the team built an agent called Pinchy on the OpenClaw platform, connected it to a Gmail inbox populated with realistic but synthetic business data, and subjected it to four phishing simulations using both Gemini 3.1 Pro and GPT-5.4
2
.The results exposed a critical gap in AI agent security. When an attacker impersonated a team lead named Dan and claimed a production incident required staging access, Pinchy searched the inbox for credentials and forwarded mock AWS IAM access keys, database connection strings, and SSH credentials in plaintext
2
. In a second test using a routine-sounding request for a weekly customer export supposedly needed for a presentation, the agent shipped out a synthetic dataset containing 247 enterprise customers with contact details and contract values totaling $1.28 million in monthly recurring revenue2
.Both failures occurred even under a strict security profile explicitly designed to verify senders before acting. According to Varonis, "the verification step still collapsed when the request appeared operationally urgent"
3
. The drive to be helpful became the primary attack surface.
Source: TechRadar
Related Stories
The Varonis research draws an important distinction between prompt injection, which hides instructions in data, and what they term agent phishing—a believable request arriving through normal channels that succeeds because the agent acts before verifying sender identity
1
. While AI agents trusting untrusted inputs proved vulnerable to identity-based phishing, they performed significantly better against technical threats.When researchers sent a fake gift card email with a phishing link, Pinchy identified the page as malicious and blocked it
2
. When presented with a malicious OAuth application disguised as a timesheet platform, the agent inspected the redirect URL, judged it suspicious, and stopped before granting access1
. This split reveals that AI agents excel at spotting malicious URLs and fake login portals but struggle with the contextual judgment that makes humans pause when colleagues request credentials at unusual times.Varonis also noted performance differences between models. Gemini 3.1 Pro showed "greater willingness to interact" with potentially suspicious requests before raising concerns, while GPT-5.4 demonstrated more caution and less willingness to provide sensitive information to external destinations without confirmation
2
. However, neither model proved reliable enough to trust with an inbox containing real credentials.The research findings carry immediate implications for organizations deploying AI agents with access to business systems. Imperva found the same data-flattening pattern in other personal AI assistants, indicating the underlying problem extends beyond OpenClaw
1
. While Imperva's discovered flaw has a patch, the phishing weakness Varonis identified cannot be fixed through software updates alone—it requires fundamental changes to what agents can do autonomously.Varonis recommends that organizations apply zero-trust principles to AI agents just as they do to human employees
2
. Specifically, agents should be forced to verify sender identities before taking action, prevented from emailing new external recipients without human approval, and given limited access to internal data through proper identity verification for AI agents.As AI agents become more deeply integrated into corporate workflows with access to email, cloud infrastructure, and customer databases, they create attack surfaces that existing security tools do not adequately cover. Organizations need to watch for agents acting on requests from unverified sources, monitor for sensitive data leak incidents, and implement controls that prevent malicious code execution even when requests appear operationally urgent. The challenge ahead involves balancing agent autonomy with security controls that prevent both technical exploits and social engineering attacks on AI systems.
Summarized by
Navi
[2]
30 Mar 2026•Technology

03 Mar 2026•Technology

29 May 2026•Technology

1
Technology

2
Business and Economy

3
Health
