Overview
- Anthropic’s Claude Opus 4.1 outperformed humans on 47.6% of evaluated tasks, while OpenAI’s GPT-5 high scored 38.8%, under the GDPval blind-testing protocol.
- Retail showed the highest task vulnerability at 56%, followed by wholesale at 53% and selected public-sector functions at 52%; information and media roles were more resistant, at up to 39%.
- Roles with the greatest exposure included counter clerks at 81%, sales managers at 79%, shipping and inventory staff at 76%, editors at 75%, and software developers at 70%.
- OpenAI stresses the results compare performance on specific tasks and do not predict immediate job losses, framing AI as a tool that could augment work quality and productivity.
- Complementary data point to actionable responses: the WEF reports that 83% of firms plan some automation while 69% expect new roles, and IBM links generative AI training to roughly 20% higher productivity.