Overview
- OpenAI acknowledged that prompt injection is unlikely to be fully solved and that Atlas’s agent mode expands the security threat surface.
- After internal tests uncovered new attack classes, the company deployed an LLM-based automated attacker trained with reinforcement learning and shipped an adversarially trained browser-agent model.
- In a demo, a seeded email caused Atlas to draft a resignation message; after a subsequent update, agent mode detected the injection and flagged it to the user.
- The UK's NCSC urged organizations to focus on reducing the likelihood and impact of injections, while Brave, Anthropic, and Google describe the risk as sector-wide and best handled through layered, continuously tested defenses.
- OpenAI advises limiting logged-in access, requiring user confirmation for sensitive actions, and giving the agent narrowly scoped tasks (a minimal confirmation-gate sketch follows this list); some experts question the value-risk tradeoff, and Gartner has advised enterprises to block AI browsers.
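To make the mitigation guidance concrete, here is a minimal sketch of a human-in-the-loop confirmation gate combined with narrow tasking. It is purely illustrative: `AgentAction`, `SENSITIVE_KINDS`, and `run_action` are hypothetical names, not Atlas or OpenAI APIs, and the sensitivity classification is an assumption standing in for whatever policy a real agent would use.

```python
# Hypothetical sketch of two of the advised mitigations:
#  1. narrow tasking via a per-task allowlist of action kinds
#  2. explicit user confirmation before any sensitive action runs
# All names here are illustrative, not real Atlas/OpenAI interfaces.
from dataclasses import dataclass

# Assumed set of action kinds treated as sensitive.
SENSITIVE_KINDS = {"send_email", "purchase", "delete", "post"}


@dataclass
class AgentAction:
    kind: str          # e.g. "send_email", "open_page"
    description: str   # human-readable summary shown to the user


def confirm_with_user(action: AgentAction) -> bool:
    """Ask the human to approve a sensitive action before it runs."""
    answer = input(f"Agent wants to: {action.description}. Allow? [y/N] ")
    return answer.strip().lower() == "y"


def run_action(action: AgentAction, allowed_kinds: set[str]) -> None:
    # Narrow tasking: refuse anything outside the task's allowlist.
    if action.kind not in allowed_kinds:
        raise PermissionError(f"action {action.kind!r} is outside task scope")
    # Human-in-the-loop gate for sensitive operations.
    if action.kind in SENSITIVE_KINDS and not confirm_with_user(action):
        print(f"Blocked: user declined {action.kind}")
        return
    print(f"Executing: {action.description}")  # placeholder for the real effect


if __name__ == "__main__":
    task_scope = {"open_page", "send_email"}  # narrowly scoped task
    # An injected instruction (e.g. from a seeded email) still hits both gates.
    run_action(AgentAction("send_email", "email a resignation letter to HR"),
               task_scope)
```

The point of the design is that even a successfully injected instruction must pass two independent checks, the task allowlist and the user prompt, before it has any effect, which is the "reduce likelihood and impact" framing rather than an attempt to detect every injection.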