Overview
- OpenAI acknowledged that prompt injection is unlikely to be fully solved and that Atlas’s agent mode expands the security threat surface.
- After internal tests uncovered new attack classes, the company deployed an LLM-based automated attacker trained with reinforcement learning and shipped an adversarially trained browser-agent model.
- In a demo, a seeded email caused Atlas to draft a resignation message; after a subsequent update, agent mode detected the injection and flagged it to the user.
- The UK's NCSC urged organizations to focus on reducing the likelihood and impact of injections, while Brave, Anthropic, and Google describe the risk as sector-wide and best handled through layered, continuously tested defenses.
- OpenAI advises limiting logged-in access, requiring user confirmation for sensitive actions, and giving the agent narrowly scoped tasks (a minimal confirmation-gate sketch follows this list); some experts question the value-risk tradeoff, and Gartner has advised enterprises to block AI browsers.
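To make the mitigation guidance concrete, here is a minimal sketch of a human-in-the-loop confirmation gate combined with narrow tasking. It is purely illustrative: `AgentAction`, `SENSITIVE_KINDS`, and `run_action` are hypothetical names, not Atlas or OpenAI APIs, and the sensitivity classification is an assumption standing in for whatever policy a real agent would use.

```python
# Hypothetical sketch of two of the advised mitigations:
#  1. narrow tasking via a per-task allowlist of action kinds
#  2. explicit user confirmation before any sensitive action runs
# All names here are illustrative, not real Atlas/OpenAI interfaces.
from dataclasses import dataclass

# Assumed set of action kinds treated as sensitive.
SENSITIVE_KINDS = {"send_email", "purchase", "delete", "post"}


@dataclass
class AgentAction:
    kind: str          # e.g. "send_email", "open_page"
    description: str   # human-readable summary shown to the user


def confirm_with_user(action: AgentAction) -> bool:
    """Ask the human to approve a sensitive action before it runs."""
    answer = input(f"Agent wants to: {action.description}. Allow? [y/N] ")
    return answer.strip().lower() == "y"


def run_action(action: AgentAction, allowed_kinds: set[str]) -> None:
    # Narrow tasking: refuse anything outside the task's allowlist.
    if action.kind not in allowed_kinds:
        raise PermissionError(f"action {action.kind!r} is outside task scope")
    # Human-in-the-loop gate for sensitive operations.
    if action.kind in SENSITIVE_KINDS and not confirm_with_user(action):
        print(f"Blocked: user declined {action.kind}")
        return
    print(f"Executing: {action.description}")  # placeholder for the real effect


if __name__ == "__main__":
    task_scope = {"open_page", "send_email"}  # narrowly scoped task
    # An injected instruction (e.g. from a seeded email) still hits both gates.
    run_action(AgentAction("send_email", "email a resignation letter to HR"),
               task_scope)
```

The point of the design is that even a successfully injected instruction must pass two independent checks, the task allowlist and the user prompt, before it has any effect, which is the "reduce likelihood and impact" framing rather than an attempt to detect every injection.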