OpenAI Says Guess-Rewarding Evaluations Drive AI Hallucinations, Proposes Scoring Fix

The company urges revamping accuracy-based benchmarks to stop penalizing abstention.

Overview

  • In a paper released Thursday, OpenAI argues that common grading practices reward models for guessing rather than expressing uncertainty, which sustains hallucinations.
  • The proposed remedy is to update widely used accuracy-based benchmarks so they discourage blind guessing and stop docking models for declining to answer when unsure (a sketch of such a scoring rule follows this list).
  • OpenAI notes that Anthropic's Claude more often withholds uncertain answers, though higher refusal rates can reduce practical utility.
  • Coverage reiterates that large language models are trained to predict the most plausible next token, which can yield fluent but incorrect outputs.
  • User-level safeguards highlighted in reporting include asking for sources and dates, prompting the model to fact-check, and cross-checking responses with other LLMs.
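To make the scoring fix concrete, here is a minimal, hypothetical Python sketch of an abstention-aware grader. The threshold rule, the ABSTAIN sentinel, and the function names are illustrative assumptions, not details from OpenAI's paper: a correct answer scores +1, abstaining scores 0, and a wrong answer is penalized -t/(1-t), so guessing only pays off when the model's chance of being right exceeds the threshold t.

```python
# Hypothetical comparison of plain accuracy (which rewards guessing)
# with a penalty-based score that makes abstention the better strategy
# under uncertainty. Names and values are illustrative, not OpenAI's.

ABSTAIN = "I don't know"

def grade_accuracy(answer: str, gold: str) -> float:
    """Binary accuracy: an abstention scores the same as a wrong guess."""
    return 1.0 if answer == gold else 0.0

def grade_penalized(answer: str, gold: str, t: float = 0.75) -> float:
    """Threshold scoring: +1 correct, 0 abstain, -t/(1-t) wrong.
    Guessing has positive expected value only when the model's
    probability of being correct exceeds t."""
    if answer == ABSTAIN:
        return 0.0
    return 1.0 if answer == gold else -t / (1.0 - t)

# A model that guesses under uncertainty vs. one that abstains.
gold = ["Paris", "1971", "Bohr"]
answers_guesser = ["Paris", "1967", "Einstein"]   # two wrong guesses
answers_cautious = ["Paris", ABSTAIN, ABSTAIN]

for name, answers in [("guesser", answers_guesser),
                      ("cautious", answers_cautious)]:
    acc = sum(grade_accuracy(a, g) for a, g in zip(answers, gold))
    pen = sum(grade_penalized(a, g) for a, g in zip(answers, gold))
    print(f"{name}: accuracy={acc:.1f}, penalized={pen:.2f}")
```

Under plain accuracy the two models tie (1.0 each), so the benchmark sees no difference between guessing and abstaining; under the penalized rule the guesser drops to -5.0 while the cautious model keeps 1.0, which is the incentive shift the paper argues evaluations should create.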