6 Sources
6 Sources
[1]
OpenAI says AI browsers may always be vulnerable to prompt injection attacks | TechCrunch
Even as OpenAI works to harden its Atlas AI browser against cyberattacks, the company admits that prompt injections, a type of attack that manipulates AI agents to follow malicious instructions often hidden in web pages or emails, is a risk that's not going away any time soon -- raising questions about how safely AI agents can operate on the open web. "Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully 'solved'," OpenAI wrote in a Monday blog post detailing how the firm is beefing up Atlas's armor to combat the unceasing attacks. The company conceded that 'agent mode' in ChatGPT Atlas "expands the security threat surface." OpenAI launched its ChatGPT Atlas browser in October, and security researchers rushed to publish their demos, showing it was possible to write a few words in Google Docs that were capable of changing the underlying browser's behavior. That same day, Brave published a blog post explaining that indirect prompt injection is a systematic challenge for AI-powered browsers, including Perplexity's Comet. OpenAI isn't alone in recognizing that prompt-based injections aren't going away. The U.K.'s National Cyber Security Centre earlier this month warned that prompt injection attacks against generative AI applications "may never be totally mitigated," putting websites at risk of falling victim to data breaches. The U.K. government agency advised cyber professionals to reduce the risk and impact of prompt injections, rather than think the attacks can be "stopped." For OpenAI's part, the company said: "We view prompt injection as a long-term AI security challenge, and we'll need to continuously strengthen our defenses against it." The company's answer to this Sisyphean task? A proactive, rapid-response cycle that the firm says is showing early promise in helping discover novel attack strategies internally before they are exploited "in the wild." That's not entirely different from what rivals like Anthropic and Google have been saying: that to fight against the persistent risk of prompt-based attacks, defenses must be layered and continuously stress-tested. Google's recent work, for example, focuses on architectural and policy-level controls for agentic systems. But where OpenAI is taking a different tact is with its "LLM-based automated attacker." This attacker is basically a bot that OpenAI trained, using reinforcement learning, to play the role of a hacker that looks for ways to sneak malicious instructions to an AI agent. The bot can test the attack in simulation before using it for real, and the simulator shows how the target AI would think and what actions it would take if it saw the attack. The bot can then study that response, tweak the attack, and try again and again. That insight into the target AI's internal reasoning is something outsiders don't have access to, so, in theory, OpenAI's bot should be able to find flaws faster than a real-world attacker would. It's a common tactic in AI safety testing: build an agent to find the edge cases and test against them rapidly in simulation. "Our [reinforcement learning]-trained attacker can steer an agent into executing sophisticated, long-horizon harmful workflows that unfold over tens (or even hundreds) of steps," wrote OpenAI. "We also observed novel attack strategies that did not appear in our human red teaming campaign or external reports." In a demo (pictured in part above), OpenAI showed how its automated attacker slipped a malicious email into a user's inbox. When the AI agent later scanned the inbox, it followed the hidden instructions in the email and sent a resignation message instead of drafting an out-of-office reply. But following the security update, "agent mode" was able to successfully detect the prompt injection attempt and flag it to the user, according to the company. The company says that while prompt injection is hard to secure against in a foolproof way, it's leaning on large-scale testing and faster patch cycles to harden its systems before they show up in real-world attacks. An OpenAI spokesperson declined to share whether the update to Atlas's security has resulted in a measurable reduction in successful injections, but says the firm has been working with third parties to harden Atlas against prompt injection since before launch. Rami McCarthy, principal security researcher at cybersecurity firm Wiz, says that reinforcement learning is one way to continuously adapt to attacker behavior, but it's only part of the picture. "A useful way to reason about risk in AI systems is autonomy multiplied by access," McCarthy told TechCrunch. "Agentic browsers tend to sit in a challenging part of that space: moderate autonomy combined with very high access," said McCarthy. "Many current recommendations reflect that tradeoff. Limiting logged-in access primarily reduces exposure, while requiring review of confirmation requests constrains autonomy." Those are two of OpenAI's recommendations for users to reduce their own risk, and a spokesperson said Atlas is also trained to get user confirmation before sending messages or making payments. OpenAI also suggests that users give agents specific instructions, rather than providing them access to your inbox and telling them to "take whatever action is needed." "Wide latitude makes it easier for hidden or malicious content to influence the agent, even when safeguards are in place," per OpenAI. While OpenAI says protecting Atlas users against prompt injections is a top priority, McCarthy invites some skepticism as to the return on investment for risk-prone browsers. "For most everyday use cases, agentic browsers don't yet deliver enough value to justify their current risk profile," McCarthy told TechCrunch. "The risk is high given their access to sensitive data like email and payment information, even though that access is also what makes them powerful. That balance will evolve, but today the tradeoffs are still very real."
[2]
Your AI browser can be hijacked by prompt injection, OpenAI just patched Atlas
OpenAI says an internal automated red team uncovered a new class of agent-in-browser attacks, prompting a security update with a newly adversarially trained model and stronger safeguards. OpenAI has shipped a security update to ChatGPT Atlas aimed at prompt injection in AI browsers, attacks that hide malicious instructions inside everyday content an agent might read while it works. Atlas's agent mode is built to act in your browser the way you would: it can view pages, click, and type to complete tasks in the same space and context you use. That also makes it a higher-value target, because the agent can encounter untrusted text across email, shared documents, forums, social posts, and any webpage it opens. Recommended Videos The company's core warning is simple. Hackers can trick the agent's decision-making by smuggling instructions into the stream of information it processes mid-task. A hidden instruction, big consequences OpenAI's post highlights how quickly things can go sideways. An attacker seeds an inbox with a malicious email that contains instructions written for the agent, not the human. Later, when the user asks Atlas to draft an out-of-office reply, the agent runs into that email during normal work and treats the injected instructions as authoritative. In the demo scenario, the agent sends a resignation letter to the user's CEO, and the out-of-office never gets written. If an agent is scanning third-party content as part of a legitimate workflow, an attacker can try to override the user's request by hiding commands in what looks like ordinary text. An AI attacker gets practice runs To find these failures earlier, OpenAI says it built an automated attacker model and trained it end-to-end with reinforcement learning to hunt for prompt-injection exploits against a browser agent. The goal is to pressure-test long, realistic workflows, not just force a single bad output. The attacker can draft a candidate injection, run a simulated rollout of how the target agent would behave, then iterate using the returned reasoning and action trace as feedback. OpenAI says privileged access to those traces gives its internal red team an advantage external attackers don't have. What to do with this now OpenAI frames prompt injection as a long-term security problem, more like online scams than a bug you patch once. Its approach is to discover new attack patterns, train against them, and tighten system-level safeguards. For users, you should use logged-out browsing when you can, scrutinize confirmations for actions like sending email, and give agents narrow, explicit instructions instead of broad "handle everything" prompts. If you're still curious what AI browsing can do, then go with browsers that ship updates that benefit you.
[3]
ChatGPT Atlas exploited with simple Google Docs tricks
OpenAI launched its ChatGPT Atlas AI browser in October, prompting security researchers to demonstrate prompt injection vulnerabilities via Google Docs inputs that altered browser behavior, as the company detailed defenses in a Monday blog post while admitting such attacks persist. Prompt injection represents a type of attack that manipulates AI agents to follow malicious instructions, often hidden in web pages or emails. OpenAI introduced ChatGPT Atlas during October, an AI-powered browser designed to operate with enhanced agent capabilities on the open web. On the launch day, security researchers published demonstrations revealing how entering a few words into Google Docs could modify the underlying browser's behavior. These demos highlighted immediate security concerns with the new product, showing practical methods to exploit the system through indirect inputs. Brave released a blog post on the same day as the launch, addressing indirect prompt injection as a systematic challenge affecting AI-powered browsers. The post specifically referenced Perplexity's Comet alongside other similar tools, underscoring that this vulnerability extends across the sector rather than being isolated to OpenAI's offering. Brave's analysis framed the issue as inherent to the architecture of browsers integrating generative AI functionalities. Earlier in the month, the U.K.'s National Cyber Security Centre issued a warning about prompt injection attacks targeting generative AI applications. The agency stated that such attacks "may never be totally mitigated," which places websites at risk of data breaches. The centre directed cyber professionals to focus on reducing the risk and impact of these injections, rather than assuming attacks could be completely stopped. This guidance emphasized practical risk management over expectations of total elimination. OpenAI's Monday blog post outlined efforts to strengthen ChatGPT Atlas against cyberattacks. The company wrote, "Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully 'solved.'" OpenAI further conceded that "agent mode" in ChatGPT Atlas "expands the security threat surface." The post positioned prompt injection as an ongoing concern comparable to longstanding web threats. OpenAI declared, "We view prompt injection as a long-term AI security challenge, and we'll need to continuously strengthen our defenses against it." Agent mode enables the browser's AI to perform autonomous actions, such as interacting with emails or documents, which inherently increases exposure to external inputs that could contain hidden instructions. This mode differentiates Atlas from traditional browsers by granting the AI greater operational latitude on users' behalf, thereby broadening potential entry points for manipulations. To address this persistent risk, OpenAI implemented a proactive, rapid-response cycle aimed at identifying novel attack strategies internally before exploitation occurs in real-world scenarios. The company reported early promise from this approach in preempting threats. This method aligns with strategies from competitors like Anthropic and Google, who advocate for layered defenses and continuous stress-testing in agentic systems. Google's recent efforts, for instance, incorporate architectural and policy-level controls tailored for such environments. OpenAI distinguishes its approach through deployment of an LLM-based automated attacker, a bot trained via reinforcement learning to simulate hacker tactics. This bot searches for opportunities to insert malicious instructions into AI agents. It conducts tests within a simulation environment prior to any real-world application. The simulator replicates the target AI's thought processes and subsequent actions upon encountering an attack, allowing the bot to analyze responses, refine its strategy, and iterate repeatedly. This internal access to the AI's reasoning provides OpenAI with an advantage unavailable to external attackers, enabling faster flaw detection. The technique mirrors common practices in AI safety testing, where specialized agents probe edge cases through rapid simulated trials. OpenAI noted that its reinforcement-learning-trained attacker can steer an agent into executing sophisticated, long-horizon harmful workflows that unfold over tens (or even hundreds) of steps. The company added, "We also observed novel attack strategies that did not appear in our human red-teaming campaign or external reports." In a specific demonstration featured in the blog post, the automated attacker inserted a malicious email into a user's inbox. When Atlas's agent mode scanned the inbox to draft an out-of-office reply, it instead adhered to the email's concealed instructions and composed a resignation message. This example illustrated a multi-step deception spanning email processing and message generation, evading initial safeguards. Following a security update to Atlas, the agent mode identified the prompt injection attempt during inbox scanning and flagged it directly to the user. This outcome demonstrated the effectiveness of the rapid-response measures in real-time threat mitigation, preventing the harmful action from proceeding. OpenAI relies on large-scale testing combined with accelerated patch cycles to fortify systems against prompt injections before they manifest externally. These processes enable iterative improvements based on simulated discoveries, ensuring defenses evolve in tandem with potential threats.
[4]
OpenAI's ChatGPT Atlas Is Learning to Fight Prompt Injections from AI
The company says the battle against this attack will be long-term OpenAI called prompt injections "one of the most significant risks" and a "long-term AI security challenge" for artificial intelligence (AI) browsers with agentic capabilities, on Monday. The San Francisco-based AI giant highlighted how the cyberattack technique impacts its ChatGPT Atlas browser and shared a new approach to tackle it. The company is using an AI-powered attacker that simulates real-world prompt injection attempts to train browsers. OpenAI said the goal is not to eliminate the threat, but to continuously harden the system as new attack patterns emerge. OpenAI Is Using AI to Fight Against Prompt Injections Prompt injection is a technique where an attacker hides instructions using HTML tricks, such as zero font, white-on-white text, or out-of-margin text. This is hidden within normal-looking content that an AI agent is meant to read, such as a webpage, document, or snippet of text. When it processes that content, it may mistakenly treat the hidden instruction as a legitimate command, even though it was not issued by the user. It can then carry out malicious acts due to the access privilege of the AI browser. In a post, OpenAI explained that prompt injections can be direct, where an attacker clearly tries to override the model's instructions, or indirect, where malicious prompts are embedded inside otherwise normal content. Because ChatGPT Atlas reads and reasons over third-party webpages, it may encounter instructions that were never intended for it but are crafted to influence its behaviour. To address this, the AI giant has built an automated AI attacker, effectively a system that continuously generates new prompt injection attempts as a simulation. This attacker is used during training and evaluation to stress-test Atlas, exposing weaknesses before they are exploited outside the lab. OpenAI said this allows its teams to identify vulnerabilities faster and update defences more frequently than relying on manual testing alone. "Prompt injection, like scams and social engineering, is not something we expect to ever fully solve," OpenAI wrote in the post, adding that the challenge evolves as AI systems become more capable, gaining more permissions and the ability to take more actions. Instead, the company is focusing on layered defences, combining automated attacks, reinforcement learning and policy enforcement to reduce the impact of malicious instructions. The company said its AI attacker helps create a rapid feedback loop, where new forms of prompt injection discovered by the system can be used to immediately retrain and adjust Atlas. This mirrors how security teams respond to evolving threats on the web, where attackers constantly adapt to new safeguards. OpenAI did not claim that Atlas is immune to prompt injections. Instead, it framed the work as part of an ongoing effort to keep pace with a problem that changes alongside the technology itself. As AI browsers become more capable and more widely used, the company said sustained investment in automated testing and defensive training will be necessary to limit abuse.
[5]
OpenAI warns prompt injection attacks are a never-ending battle for AI browsers
OpenAI has warned that browsers with agentic AI will remain vulnerable to prompt injection attacks, requiring developers to monitor and secure them continuously. "Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully solved," OpenAI said in a blog post this week. The AI startup added that as browser agents start handling more tasks, they will become a higher-value target for adversarial attacks. OpenAI released its AI browser, ChatGPT Atlas, in October. It has built-in ChatGPT functionality, along with an Agent Mode that allows AI to perform tasks such as filling out forms, visiting websites, and shopping autonomously on behalf of users. Though the browser market remains dominated by Google Chrome (with a 71.22% share, as per Statcounter), AI companies like OpenAI and Perplexity have revived it with their own web browsers that offer GenAI and agentic AI capabilities. Recent reports indicate a shift in how users navigate search. Instead of scrolling through multiple websites, users are now seeking summaries, quick answers and comparisons. According to Salesforce, GenAI and AI agents influenced $14.2 billion in global online sales during Black Friday this year. Mckinsey estimates that half of consumers are already using AI-powered search, which is expected to drive $750 billion in revenue by 2028. Google has also added an AI Mode in Search and Chrome to enhance the search experience for users with summaries and back-and-forth conversations. Microsoft's Edge browser was one of the first to integrate GenAI chatbot with Bing search. However, with the rollout of new capabilities like agentic AI, web browsers are expected to see a surge in adversarial attacks. Agentic AI on browsers poses a significant security risk, as they often operate with a user's full privileges across authenticated sessions. This gives them access to sensitive data, including user's banking credentials, emails, social media accounts. Security researchers have also flagged vulnerabilities in a few of these new AI browsers. For instance, researchers at Brave web browser found that Perplexity's Comet AI assistant could not tell the difference between user commands and malicious instructions hidden within web pages. This poses a significant risk, as the assistant could execute malicious commands without the user's permission or knowledge. How prompt injection attacks work OpenAI regards prompt injection as a long-term security challenge. This attack is executed by hiding a secret instruction or prompt inside a webpage, email or document that the GenAI chatbot or agent will process to articulate its response or complete a task autonomously. These hidden commands trick the AI into ignoring the user's commands and its own guardrails and follow the attacker's orders instead. By following these injected commands, the AI agent can be asked to leak sensitive data belonging to a user or an organization. According to a September Gartner report, 32% of organizations have noticed prompt-injection attacks on GenAI applications in the last 12 months. Gartner estimates that by 2027 more than 40% of AI-related data breaches worldwide will be caused by malicious use of GenAI. Unlike traditional vulnerabilities in web browsers that target individual websites and use complex exploitation tactics, prompt injection attacks are much easier to carry out using natural language instructions. Palo Alto Networks' Unit 42 red team have simulated how agents with broadly scoped prompts or tool integrations can be manipulated to leak data, escalate privileges, and abuse connected systems. What is OpenAI doing to secure its AI browser The ChatGPT creator said that it has built an LLM-based automated attacker that has been trained to go after prompt injection attacks that target browser agents. The automated attacker has been trained using reinforcement learning to learn from its successes and failures, in the same way AI was trained to play chess. The attacker also uses a simulation loop where it sends a malicious prompt to an external simulator, which runs a counterfactual rollout of how the targeted agent would behave on facing the injection. This gives the automated attacker insight into its Chain of Thought, allowing it to understand why the attack worked or failed. OpenAI claims that its automated attacker marks a shift from a single-step failure detection to long-horizon workflow exploitation. It can steer an agent through complex workflows involving over 100 steps. This has allowed them to identify unique prompt injection attacks that evaded human red teaming teams and external literature. Like all attacks, OpenAI is aware that cybercriminals behind prompt injections will also adapt and find novel ways to trick the AI agents. OpenAI recommends users to limit logged-in access to the agent, carefully review confirmation requests, and give agents explicit instruction as broad prompts makes it easier for malicious commands to influence the agent.
[6]
OpenAI admits AI browsers may never fully escape prompt injection attacks
The company is using an AI "automated attacker" to find vulnerabilities faster than human testing ChatGPT maker OpenAI has acknowledged that among the most dangerous threats facing AI-powered browsers, prompt injection attacks, is unlikely to disappear, even after the company keeps on strengthening the security across its new Atlas AI browser. According to the blog post, while OpenAI is working to improve Atlas against cyberattacks, prompt injections remain a long-term challenge for AI agents operating on the open web. These attacks involve embedding hidden instructions within webpages, emails, or documents, which can lead AI agents to perform unintended or harmful actions. The company admitted that Atlas agent mode, which allows the AI to browse, read emails, and perform actions on a user's behalf, naturally raises security concerns. Since Atlas's release in October, many researchers have demonstrated how harmless text, including content placed within Google Docs, can manipulate browser behaviour. Similar concerns have been raised about other AI browsers, including Perplexity's tools. Also Read: Best TWS earphones under Rs 3,000 you can buy in 2025 Previously, the UK's National Cyber Security Centre warned that prompt injection attacks on generative AI systems might never be completely eliminated. Instead of aiming for total prevention, the agency advised organisations to focus on risk reduction and mitigating the impact of successful attacks. Instead of aiming for total prevention, the agency advised organisations to focus on risk reduction and mitigating the impact of successful attacks. To counter this, OpenAI stated that it is constantly testing and developing faster response cycles rather than one-time fixes. A key component of this effort is an internally developed automated attacker, which is an AI system trained to behave like a hacker using reinforcement learning. This tool makes repeated attempts to exploit Atlas in simulated environments, improving its attacks based on how the AI agent reacts. According to OpenAI, this approach has discovered attack techniques that were not detected during human-led security testing. Despite these improvements, OpenAI has not revealed whether the changes resulted in a measurable decrease in successful prompt injection attempts. The company claims it has been working with external partners on this issue prior to Atlas' public release and will continue to do so for a long time.
Share
Share
Copy Link
OpenAI has acknowledged that its ChatGPT Atlas browser will likely remain vulnerable to prompt injection attacks indefinitely. The company deployed an LLM-based automated attacker trained through reinforcement learning to identify exploits before they surface in real-world scenarios. Despite these efforts, OpenAI frames the challenge as comparable to fighting scams and social engineering—an ongoing battle rather than a solvable problem.
OpenAI has issued a stark warning about the future of AI-powered browsing: prompt injection attacks targeting ChatGPT Atlas and similar tools may never be fully eliminated. In a Monday blog post, the company stated that "prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully 'solved,'" while acknowledging that "agent mode" in ChatGPT Atlas "expands the security threat surface."
1
The admission comes as OpenAI works to strengthen defenses against an AI security challenge that evolves alongside the technology itself.
Source: CXOToday
Since launching ChatGPT Atlas in October, the AI browser has faced immediate scrutiny from security researchers who demonstrated how hidden malicious instructions embedded in Google Docs could manipulate AI behavior within minutes of release.
3
Prompt injection attacks work by concealing commands in web pages, emails, or documents that agentic AI systems process during routine tasks. These attacks can trick AI agents into ignoring user instructions and following attacker directives instead, potentially leading to data breaches or unauthorized actions performed with full user privileges.OpenAI isn't alone in acknowledging this persistent threat. The U.K.'s National Cyber Security Centre warned earlier this month that prompt injection attacks against generative AI applications "may never be totally mitigated," advising cyber professionals to focus on reducing risk and impact rather than expecting complete elimination.
1
Brave published analysis on the same day as ChatGPT Atlas's launch, identifying indirect prompt injection as a systematic challenge affecting AI-powered browsers including Perplexity's Comet.3
According to Gartner research, 32% of organizations have experienced prompt injection attacks on generative AI applications in the past 12 months, with projections suggesting that by 2027, more than 40% of AI-related data breaches worldwide will result from malicious use of generative AI.
5
The vulnerability poses particular risks for agentic AI systems that operate with authenticated user sessions, granting access to banking credentials, emails, and social media accounts.
Source: Digital Trends
To combat this evolving threat, OpenAI has developed an LLM-based automated attacker trained through reinforcement learning to simulate hacker tactics and discover vulnerabilities before exploitation occurs in the wild.
4
This bot tests attacks in simulation environments, analyzing how the target AI would think and what actions it would take upon encountering malicious inputs. The system then studies responses, refines attack strategies, and iterates repeatedly—a process that mirrors common practices in AI safety testing.OpenAI's automated attacker provides privileged access to the AI's internal reasoning that external attackers lack, theoretically enabling faster flaw detection.
2
The company reported that its reinforcement-learning-trained attacker can steer agents into executing sophisticated, long-horizon harmful workflows unfolding over tens or even hundreds of steps, discovering novel attack strategies that didn't appear in human red teaming campaigns or external reports.1
In one demonstration, the automated attacker inserted a malicious email into a user's inbox. When the AI agent scanned the inbox to draft an out-of-office reply, it instead followed the email's concealed instructions and composed a resignation message to the user's CEO.
2
Following a security update, agent mode successfully detected the prompt injection attempt and flagged it to the user, according to OpenAI.
Source: TechCrunch
Related Stories
OpenAI's approach aligns with strategies from competitors like Anthropic and Google, who advocate for layered defenses and continuous stress-testing in agentic systems. Google's recent work focuses on architectural and policy-level controls for agentic systems.
1
The company emphasizes faster patch cycles and large-scale testing to harden systems before vulnerabilities surface in real-world cyberattacks.Rami McCarthy, principal security researcher at cybersecurity firm Wiz, notes that "a useful way to reason about risk in AI systems is autonomy multiplied by access." He explains that agentic browsers occupy a challenging position with moderate autonomy combined with very high access, making safeguards like limiting logged-in access and requiring user confirmation for critical actions essential risk mitigation strategies.
1
OpenAI declined to share whether the security update resulted in measurable reductions in successful injections but confirmed ongoing collaboration with third parties to harden Atlas against prompt injection since before launch.
1
The company frames the work as part of sustained investment in automated testing and defensive training necessary as AI browsers become more capable and widely adopted, stating: "We view prompt injection as a long-term AI security challenge, and we'll need to continuously strengthen our defenses against it."4
Summarized by
Navi
[1]
[3]
1
Technology

2
Technology

3
Technology
