Particle.news

OpenAI Study Finds Frontier AI Beating Humans on Tasks Across 44 Occupations

A blinded task comparison reports high win rates in retail and sales roles, with OpenAI stressing the results do not equate to full job replacement.

Overview

  • Researchers used GDPval 2025 to pit advanced models against professionals in nine major industries, with judges blinded to whether outputs came from humans or AI.
  • AI win rates peaked at 81% for counter and rental clerks, 79% for sales managers, 76% for shipping and receiving clerks, and 75% for editors.
  • Performance varied by model, with OpenAI’s GPT-5 high averaging 48.8% wins, Anthropic’s Claude Opus 4.1 at 47.6%, and GPT-4o at 12.4%.
  • By sector, AI outputs beat human work 56% of the time on retail tasks, 53% in wholesale, and 52% in certain government roles, while win rates in information-sector roles topped out at 39%.
  • OpenAI said top models are approaching expert-level quality but emphasized that most jobs involve more than written tasks. CEO Sam Altman warned of likely job losses in customer support.