Science ❯ Computer Science ❯ AI Research
Human-AI Interaction
OpenAI urges benchmark reforms that penalize confident mistakes to curb incentive-driven errors.