Overview
- Microsoft and Arizona State University released the Magentic Marketplace simulation as open source, enabling others to reproduce results from runs involving 100 customer agents and 300 business agents.
- Across leading models, agents were steered by fake reviews and awards and were vulnerable to prompt-injection attacks that could redirect payments to malicious vendors.
- Customer-agent performance dropped as option sets grew, with widespread first-offer acceptance and positional bias indicating that agents prioritized speed over thorough comparison.
- Agents frequently failed at multi‑agent collaboration unless given explicit, step‑by‑step role instructions, highlighting coordination weaknesses.
- Model robustness varied: GPT-5 and Gemini‑2.5‑Flash showed stronger resistance to manipulation, and ZDNET reports that Claude Sonnet 4 resisted all manipulation attempts in the tests.