Particle News: OpenAI’s GDPval Benchmark Finds Claude and GPT-5 Near Expert Quality on Real-World Tasks

Overview

GDPval tests 1,320 realistic, file-based assignments across 44 occupations in nine top GDP industries using tasks authored by professionals averaging 14 years of experience.
Blind human graders evaluated outputs from multiple models and human experts, with an AI autograder released as an experimental research tool to supplement—not replace—human scoring.
TechCrunch reports GPT-5-high matched or beat experts 40.6% of the time and Claude Opus 4.1 49%, with OpenAI noting Claude’s edge in aesthetics and GPT-5’s strength in accuracy.
OpenAI says models completed tasks roughly 100× faster and 100× cheaper than experts, a figure limited to inference time and billing that excludes oversight, iteration, and integration costs.
Performance has advanced sharply since GPT-4o, and OpenAI plans to release subsets of GDPval tasks and expand the benchmark to interactive and context-rich workflows while cautioning against reading the results as evidence of imminent broad job replacement.