Science ❯ Computer Science ❯ AI Research ❯ Cognitive Science
OpenAI urges benchmark reforms that penalize confident mistakes to curb incentive-driven errors.