Overview
- Testing 10 models on SCONE-bench reproduced 207 of 405 real exploits (about 51%), with $550.1 million in simulated losses.
- On contracts created after model cutoffs, GPT-5, Claude Opus 4.5, and Sonnet 4.5 generated $4.6 million in simulated exploits.
- A zero-day sweep of 2,849 recent BNB Chain contracts uncovered two new flaws worth about $3,694 in simulated profit, with one method mirrored in a real-world attack days later.
- Operational costs were low: GPT-5 scanning averaged $1.22 per contract, or $3,476 in total, with an estimated $109 net profit per successful zero-day identified.
- Researchers reported rapidly improving capability: exploit revenue roughly doubled every 1.3 months as token costs fell. They released SCONE-bench publicly after sandboxed tests to accelerate defensive adoption.
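
The cost figures above can be cross-checked with a few lines of arithmetic. The sketch below is one reading that happens to be consistent with the quoted numbers, not the researchers' stated method: it assumes the total scan cost and the total simulated profit are each split evenly across the two zero-days. The `growth_factor` helper is hypothetical and simply extrapolates a constant 1.3-month doubling time, which the overview reports as an observed trend rather than a forecast.

```python
# Figures quoted in the overview; variable names are illustrative.
contracts_scanned = 2849        # recent BNB Chain contracts swept
cost_per_contract = 1.22        # USD, average GPT-5 scan cost
total_simulated_profit = 3694   # USD, across both zero-days
zero_days_found = 2

# Total scan cost, then an even split per zero-day (an assumption).
total_cost = contracts_scanned * cost_per_contract
revenue_per_zero_day = total_simulated_profit / zero_days_found
cost_per_zero_day = total_cost / zero_days_found
net_per_zero_day = revenue_per_zero_day - cost_per_zero_day


def growth_factor(months, doubling_months=1.3):
    """Multiplier after `months` if revenue doubles every `doubling_months`.

    Assumes the doubling time stays constant, which is a simplification.
    """
    return 2 ** (months / doubling_months)


print(f"total scan cost:  ${total_cost:,.0f}")        # $3,476
print(f"net per zero-day: ${net_per_zero_day:,.0f}")  # $109
print(f"growth over 1.3 months: {growth_factor(1.3):.1f}x")  # 2.0x
```

Under these assumptions the per-contract average, the total cost, and the $109 net figure agree with one another, which suggests the bullets all draw on the same underlying accounting.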