Overview
- In a paper released Thursday, OpenAI argues that common grading practices teach models to guess rather than express uncertainty, thereby sustaining hallucinations.
- The proposed remedy is to update widely used accuracy-based scoreboards so they discourage guessing and no longer dock models for refusing to answer when unsure; a sketch of such a scoring rule follows this list.
- OpenAI notes Anthropic's Claude more often withholds uncertain answers, though higher refusal rates can reduce practical utility.
- Coverage reiterates that large language models optimize next-token prediction for plausibility, which can yield fluent but incorrect outputs (a toy decoding example appears below).
- User-level safeguards highlighted in reporting include asking for sources and dates, prompting the model to fact-check, and cross-checking responses with other LLMs (sketched in the final example below).
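
To make the scoreboard remedy concrete, here is a minimal sketch of a guess-discouraging scoring rule in the spirit the paper describes. The `EvalItem` record, the `penalized_score` function, and the default penalty are assumptions for illustration, not OpenAI's published grader.

```python
from dataclasses import dataclass

@dataclass
class EvalItem:
    answer: str | None  # None means the model abstained ("I don't know")
    correct: bool       # graded against the reference answer

def penalized_score(items: list[EvalItem], wrong_penalty: float = 1.0) -> float:
    """Mean score: +1 for correct, -wrong_penalty for incorrect, 0 for abstaining.

    Under this rule, guessing has positive expected value only when the
    model's confidence exceeds wrong_penalty / (1 + wrong_penalty), so
    abstaining becomes the rational choice on low-confidence questions.
    """
    total = 0.0
    for item in items:
        if item.answer is None:
            continue  # abstention scores 0 rather than counting as wrong
        total += 1.0 if item.correct else -wrong_penalty
    return total / len(items)
```

With `wrong_penalty = 1.0`, a model must be more than 50% confident before guessing beats abstaining; plain accuracy is the degenerate case `wrong_penalty = 0`, where guessing always dominates and abstentions are pure loss.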
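The next-token point can be seen in a toy example: greedy decoding emits whichever continuation the model scores as most probable, with no notion of truth. The vocabulary and logits below are invented for illustration, not taken from any real model.

```python
import math

def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Invented logits for continuations of "The capital of Australia is ...".
# If co-occurrence in training data favors "Sydney", greedy decoding
# confidently emits a fluent but wrong answer.
vocab = ["Canberra", "Sydney", "Melbourne"]
logits = [2.0, 2.3, 1.1]  # made-up numbers

probs = softmax(logits)
best = vocab[probs.index(max(probs))]
print(dict(zip(vocab, [round(p, 3) for p in probs])), "->", best)
```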
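And for the cross-checking habit in the last bullet, a hedged sketch: pose the same question to several models and treat disagreement as a signal to verify manually. The `models` callables are hypothetical placeholders, not real SDK calls; wire in whichever clients you actually use.

```python
from collections import Counter
from typing import Callable

def cross_check(question: str,
                models: dict[str, Callable[[str], str]]) -> tuple[str, bool]:
    """Query each model; return the majority answer and whether all agreed."""
    answers = {name: ask(question) for name, ask in models.items()}
    counts = Counter(a.strip().lower() for a in answers.values())
    majority, _ = counts.most_common(1)[0]
    unanimous = len(counts) == 1
    return majority, unanimous

# Hypothetical usage -- replace the stand-ins with real client calls:
# verdict, agreed = cross_check(
#     "When was the first transatlantic telegraph cable completed?",
#     {"model_a": ask_model_a, "model_b": ask_model_b},
# )
```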