Overview
- Researchers used the GDPval benchmark (2025) to pit advanced models against professionals in nine major industries, with judges blinded to whether outputs came from humans or AI.
- The highest AI win rates were 81% for counter and rental clerks, 79% for sales managers, 76% for shipping and receiving clerks, and 75% for editors.
- Performance varied by model: OpenAI’s GPT-5-high averaged 48.8% wins, Anthropic’s Claude Opus 4.1 47.6%, and GPT-4o 12.4%.
- By sector, AI beat professionals on retail tasks 56% of the time, on wholesale tasks 53%, and in certain government roles 52%, while its win rate in information-sector roles topped out at 39%.
- OpenAI said top models are approaching expert-level quality but emphasized that most jobs involve more than written tasks; CEO Sam Altman, meanwhile, warned that customer-support jobs are likely to be lost.