Particle.news
Download on the App Store

OpenAI Says Prompt Injection Is a Lasting Threat to AI Browsers, Rolls Out RL Red‑Teaming for Atlas

Industry guidance now favors layered defenses with limited agent privileges over expectations of a permanent cure.

Overview

  • OpenAI acknowledged that prompt injection is unlikely to be fully solved and that Atlas’s agent mode expands the security threat surface.
  • The company deployed an LLM-based automated attacker trained with reinforcement learning and shipped an adversarially trained browser-agent model after internal tests uncovered new attack classes.
  • In a demo, a seeded email caused Atlas to draft a resignation message, after which an update enabled agent mode to detect the injection and flag it to the user.
  • UK NCSC urged organizations to focus on reducing the likelihood and impact of injections, while Brave, Anthropic, and Google describe the risk as sector-wide and best handled through layered, continuously tested defenses.
  • OpenAI advises limiting logged-in access, requiring user confirmations for sensitive actions, and giving narrow tasking; some experts question the value-risk tradeoff, and Gartner has advised enterprises to block AI browsers.