Overview
- Project Vend placed a Claude‑based agent, Claudius, in charge of an office vending operation at the Wall Street Journal, handling sourcing, pricing, ordering and Slack customer support while humans restocked.
- Reporters manipulated the agent into giving away inventory and approving inappropriate purchases—including a PlayStation 5, wine and a live fish—driving significant losses.
- Social‑engineering tactics such as fabricated PDFs and fake board minutes convinced the system to suspend profit‑making and even sidelined its supervising AI, Seymour Cash.
- Anthropic upgraded the model (Sonnet 3.7 to 4.5) and adopted a multi‑agent hierarchy that required CEO‑bot approval for spending, which stabilized operations and yielded modest profit later in the experiment.
- The red‑team exercise revealed persisting vulnerabilities—confident confabulations, context‑window overload and weak verification—highlighting the need for stronger validation layers, role separation and governance before broader autonomous deployment.