Particle News: OpenAI Unveils GDPval Benchmark Showing GPT-5 and Claude Near Expert Quality

Overview

OpenAI’s GDPval tests models on 1,320 tasks spanning 44 occupations drawn from the nine industries that contribute most to U.S. GDP.
In blind reviews by experienced professionals, GPT-5-high's deliverables were rated better than or on par with those of industry experts 40.6% of the time, while Claude Opus 4.1 reached about 49%.
OpenAI reports Claude excelled at aesthetics and formatting, whereas GPT-5 showed strengths in accuracy and domain-specific knowledge.
The study compared multiple frontier models, including GPT-4o, o4-mini, o3, GPT-5, Claude Opus 4.1, Gemini 2.5 Pro, and Grok 4.
OpenAI claims the models can complete these tasks roughly 100x faster and 100x cheaper than experts, though that figure reflects inference time and cost only. It also cautions that GDPval covers only one-off, file-based tasks. OpenAI has released an experimental autograder and plans to expand the benchmark to interactive, context-rich work.