3 Sources
[1]
New AI attack hides data-theft prompts in downscaled images
Researchers have developed a novel attack that steals user data by injecting malicious prompts into images that AI systems process before delivering them to a large language model. The method relies on full-resolution images that carry instructions invisible to the human eye but that become apparent when the image quality is lowered through resampling algorithms.

Developed by Trail of Bits researchers Kikimora Morozova and Suha Sabi Hussain, the attack builds upon a theory presented in a 2020 USENIX paper by a German university (TU Braunschweig) exploring the possibility of an image-scaling attack in machine learning.

When users upload images to AI systems, the images are automatically downscaled to a lower quality for performance and cost efficiency. Depending on the system, the image resampling algorithm may use nearest neighbor, bilinear, or bicubic interpolation. All of these methods introduce aliasing artifacts that allow hidden patterns to emerge in the downscaled image if the source is specifically crafted for this purpose.

In the Trail of Bits example, specific dark areas of a malicious image turn red when bicubic downscaling is used to process the image, allowing hidden text to emerge in black. The AI model interprets this text as part of the user's instructions and automatically combines it with the legitimate input. From the user's perspective, nothing seems off, but in practice the model executed hidden instructions that could lead to data leakage or other risky actions.

In an example involving Gemini CLI, the researchers were able to exfiltrate Google Calendar data to an arbitrary email address while using Zapier MCP with 'trust=True', which approves tool calls without user confirmation.

Trail of Bits explains that the attack needs to be adjusted for each AI model according to the downscaling algorithm used to process the image. However, the researchers confirmed that their method is feasible against multiple AI systems, including Gemini CLI, Vertex AI Studio, Google Assistant on Android, and Gemini's web interface. As the attack vector is widespread, it may extend well beyond the tested tools. Furthermore, to demonstrate their finding, the researchers also created and published Anamorpher (currently in beta), an open-source tool that can create crafted images for each of the mentioned downscaling methods.

As mitigation and defense measures, Trail of Bits researchers recommend that AI systems implement dimension restrictions when users upload an image. If downscaling is necessary, they advise providing users with a preview of the result delivered to the large language model (LLM). They also argue that explicit user confirmation should be sought for sensitive tool calls, especially when text is detected in an image. "The strongest defense, however, is to implement secure design patterns and systematic defenses that mitigate impactful prompt injection beyond multi-modal prompt injection," the researchers say, referencing a paper published in June on design patterns for building LLMs that can resist prompt injection attacks.
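To make the scaling mechanism concrete, here is a minimal, self-contained sketch of the underlying principle. It is not Anamorpher and not the optimization Trail of Bits used: a nearest-neighbor-style downscale keeps one pixel per block, so content planted only at the sampled positions is sparse in the full-resolution image yet fully reconstructed in the small one. The payload pattern, block size, and offsets are illustrative assumptions; real pipelines choose their own filters and offsets, which is why the researchers note the attack must be tuned per system.

```python
import numpy as np

# Toy illustration (not Anamorpher): plant a sparse "payload" so that a
# nearest-neighbor-style 4x downscale reconstructs it exactly. Real attacks
# additionally optimize the surrounding pixels so the payload stays
# imperceptible at full resolution and target the victim's actual resampler.
SCALE = 4
rng = np.random.default_rng(0)

# Stand-in for hidden instruction text: a bright rectangle on black.
payload = np.zeros((64, 64), dtype=np.uint8)
payload[24:40, 8:56] = 255

# Full-resolution "cover": mid-gray noise, with one payload pixel per 4x4 block.
cover = rng.integers(96, 160, size=(64 * SCALE, 64 * SCALE), dtype=np.uint8)
cover[::SCALE, ::SCALE] = payload

# Nearest-neighbor-style downscale: keep every 4th pixel in each dimension.
downscaled = cover[::SCALE, ::SCALE]

print("payload recovered exactly:", np.array_equal(downscaled, payload))  # True
```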
[2]
Hackers can hide AI prompt injection attacks in resized images
A new method of hiding instructions for "AI" systems takes advantage of how images are compressed when uploaded. "AI" tools are all the rage at the moment, even among users who aren't all that savvy when it comes to conventional software or security -- and that's opening up all sorts of new opportunities for hackers and others who want to take advantage of them. A research team has discovered a new way to hide prompt injection attacks in uploaded images.

A prompt injection attack is a way to hide instructions for an LLM or other "artificial intelligence" system, usually somewhere a human operator can't see them. It's the whispered "loser-says-what" of computer security. A great example is hiding a phishing attempt in an email as plain text that's colored the same as the background, knowing that Gemini will summarize the text even though the human recipient can't read it.

A two-person Trail of Bits research team discovered that they can also hide these instructions in images, making the text invisible to the human eye but revealed and transcribed by an AI tool when the image is compressed for upload. Compression and the artifacts that come along with it are nothing new. But combined with the sudden interest in hiding plain-text messages, it creates a new way to get instructions to an LLM without the user knowing those instructions have been sent.

In the example highlighted by Trail of Bits and BleepingComputer, an image is delivered to a user, the user uploads the image to Gemini (or uses something like Android's built-in circle-to-search tool), and the hidden text in the image becomes visible as Google's backend compresses it before it's "read" to save on bandwidth and processing power. After being compressed, the prompt text is successfully injected, telling Gemini to email the user's personal calendar information to a third party.

That's a lot of legwork to get a relatively small amount of personal data, and both the complete attack method and the image itself need to be tailored to the specific "AI" system that's being exploited. There's no evidence that this particular method was known to hackers before now or is being actively exploited at the time of writing. But it illustrates how a relatively innocuous action -- like asking an LLM "what is this thing?" with a screenshot -- could be turned into an attack vector.
[3]
New AI attack shows how images hide secret commands, letting hackers siphon private data directly from unsuspecting chatbot users
Bicubic interpolation can expose black text from specially crafted images

As AI tools become more integrated into daily work, the security risks attached to them are also evolving in new directions. Researchers at Trail of Bits have demonstrated a method where malicious prompts are hidden inside images and then revealed during processing by large language models. The technique takes advantage of how AI platforms downscale images for efficiency, exposing patterns that are invisible in their original form but legible to the algorithm once resized.

The idea builds on a 2020 paper from TU Braunschweig in Germany, which suggested that image scaling could be used as an attack surface for machine learning. Trail of Bits showed how crafted images could manipulate systems including Gemini CLI, Vertex AI Studio, Google Assistant on Android, and Gemini's web interface. In one case, Google Calendar data was siphoned to an external email address without user approval, highlighting the real-world potential of the threat.

The attack leverages interpolation methods like nearest neighbor, bilinear, or bicubic resampling. When an image is intentionally prepared, downscaling introduces aliasing artifacts that reveal concealed text. In a demonstration, dark areas shifted during bicubic resampling to display hidden black text, which the LLM then interpreted as user input. From the user's perspective, nothing unusual appears to happen. Yet behind the scenes, the model follows the embedded instructions along with legitimate prompts.

To illustrate the risk, Trail of Bits created "Anamorpher," an open-source tool that generates such images for different scaling methods. This shows that while the approach is specialized, it could be repeated by others if defenses are lacking.

The attack raises questions about trust in multimodal AI systems because many platforms now rely on them for routine work, and a simple image upload could potentially trigger unintended data access. The danger of identity theft arises if private or sensitive information is exfiltrated in this way. Because these models often link with calendars, communications platforms, or workflow tools, the risk extends into broader contexts.

To mitigate this, the researchers recommend restricting input dimensions, previewing downscaled results, and requiring explicit confirmation for sensitive tool calls. Traditional defenses like firewalls are not built to identify this form of manipulation, leaving a gap that attackers may eventually exploit. The researchers stress that only layered security suites and stronger design patterns can reliably limit such risks. "The strongest defense, however, is to implement secure design patterns and systematic defenses that mitigate impactful prompt injection beyond multimodal prompt injection," the researchers said.
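As a rough illustration of why a crafted image must target one specific resampler, the sketch below (assuming Pillow and a hypothetical crafted_image.png, neither of which comes from the researchers' tooling) downscales the same file with nearest-neighbor, bilinear, and bicubic filters and counts how differently they render it; a payload tuned for bicubic interpolation generally will not survive the other filters.

```python
from PIL import Image

# Illustrative sketch, not Trail of Bits' tooling: the same source image
# downscales differently under each filter, so a payload crafted for one
# resampler may stay hidden under another. "crafted_image.png" and the
# 256x256 target size are assumptions for the example.
FILTERS = {
    "nearest": Image.Resampling.NEAREST,
    "bilinear": Image.Resampling.BILINEAR,
    "bicubic": Image.Resampling.BICUBIC,
}
TARGET = (256, 256)

def compare_downscales(path: str) -> None:
    img = Image.open(path).convert("L")  # grayscale keeps the diff simple
    results = {name: img.resize(TARGET, resample=f) for name, f in FILTERS.items()}
    bicubic_pixels = list(results["bicubic"].getdata())
    for name in ("nearest", "bilinear"):
        pixels = list(results[name].getdata())
        differing = sum(abs(a - b) > 16 for a, b in zip(pixels, bicubic_pixels))
        print(f"{name} vs bicubic: {differing} pixels differ noticeably")
        results[name].save(f"downscaled_{name}.png")
    results["bicubic"].save("downscaled_bicubic.png")

if __name__ == "__main__":
    compare_downscales("crafted_image.png")
```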
Researchers have uncovered a novel attack method that hides malicious prompts in images, which become visible when processed by AI systems, potentially leading to unauthorized data access and theft.
Researchers from Trail of Bits have developed a new attack method that exploits how AI systems process images, potentially leading to unauthorized data access and theft. This technique, building upon a 2020 USENIX paper from TU Braunschweig, hides malicious prompts in images that become visible only when downscaled by AI systems [1].
Source: Bleeping Computer
The attack takes advantage of image resampling algorithms commonly used by AI systems for performance and cost efficiency. When users upload images, these are automatically downscaled, introducing aliasing artifacts that can reveal hidden patterns [1]. Specifically, in one example, dark areas of a malicious image turn red during bicubic downscaling, allowing hidden black text to emerge [2].
Source: PCWorld
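Because the hidden text only emerges in the downscaled rendition, one practical way to inspect an upload is to reproduce that rendition yourself. Below is a minimal sketch assuming Pillow, a hypothetical suspect.png, and a guessed 512x512 bicubic target; real services use their own sizes and filters, so this only approximates what a given model receives.

```python
from PIL import Image

# Hedged sketch: render the bicubic-downscaled view of an image so a human
# can inspect roughly what a model-side pipeline would receive. The target
# size is a guess; actual services pick their own dimensions and filters.
ASSUMED_TARGET = (512, 512)

def preview_model_view(path: str, out_path: str = "model_view.png") -> str:
    img = Image.open(path).convert("RGB")
    small = img.resize(ASSUMED_TARGET, resample=Image.Resampling.BICUBIC)
    small.save(out_path)  # hidden text, if present, tends to become legible here
    return out_path

if __name__ == "__main__":
    print("wrote", preview_model_view("suspect.png"))
```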
The researchers confirmed this attack method's feasibility against several AI systems, including Gemini CLI, Vertex AI Studio, Google Assistant on Android, and Gemini's web interface [3].
In a proof-of-concept, the researchers successfully exfiltrated Google Calendar data to an arbitrary email address using Zapier MCP with 'trust=True' to approve tool calls without user confirmation [3].
This attack vector is potentially widespread and may extend beyond the tested tools. It raises significant concerns about trust in multimodal AI systems, especially as these platforms become more integrated into daily work routines [3]. The risk of identity theft and unauthorized access to sensitive information is particularly concerning, given that many AI models link with calendars, communications platforms, and workflow tools.
To address this vulnerability, Trail of Bits researchers recommend several measures: implementing dimension restrictions when users upload images, providing a preview of the downscaled result delivered to the LLM, and requiring explicit user confirmation for sensitive tool calls, especially when text is detected in an image [1].
Source: TechRadar
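The sketch below translates two of those recommendations into code under stated assumptions; the 1024-pixel limit, the tool name, and the console prompt are illustrative stand-ins, not part of the researchers' guidance. It rejects uploads above a dimension limit and requires an explicit yes from the user before a sensitive tool call runs instead of auto-approving it.

```python
from PIL import Image

# Illustrative mitigation sketch, not a production control. MAX_DIM and the
# console confirmation are assumptions standing in for whatever policy and
# UI a real AI system would use.
MAX_DIM = 1024  # assumed upper bound on accepted image width/height

def accept_upload(path: str) -> Image.Image:
    """Enforce a dimension restriction before the image enters the pipeline."""
    img = Image.open(path).convert("RGB")
    if max(img.size) > MAX_DIM:
        raise ValueError(f"image {img.size} exceeds the {MAX_DIM}px limit")
    return img

def confirm_sensitive_call(tool_name: str, summary: str) -> bool:
    """Ask the user before a sensitive tool call instead of auto-approving it."""
    answer = input(f"Allow '{tool_name}'? {summary} [y/N] ")
    return answer.strip().lower() == "y"

if __name__ == "__main__":
    image = accept_upload("upload.png")
    if confirm_sensitive_call("send_email", "sends calendar data to an external address"):
        print("tool call approved by user")
    else:
        print("tool call blocked")
```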
To demonstrate their findings, the researchers created Anamorpher, an open-source tool currently in beta. This tool can create images for each of the mentioned downscaling methods, illustrating the potential for replication of this attack technique [3].
As AI technologies continue to evolve and integrate into various aspects of our digital lives, this research underscores the critical need for robust security measures and user awareness to protect against emerging threats in the AI landscape.
Summarized by Navi