Particle News: OpenAI’s New AI Models Deliver Advanced Reasoning but Struggle with Accuracy

Overview

OpenAI's o3 and o4-mini models outperform predecessors in tasks like coding, math, and multimodal reasoning, setting new benchmarks for advanced AI capabilities.
Internal and third-party tests reveal concerning hallucination rates: on OpenAI's PersonQA benchmark, o3 fabricated information in 33% of responses and o4-mini in 48%, more than double the rates of previous models.
The regression in accuracy reverses years of progress, and OpenAI acknowledges that further research is needed to understand the causes behind the increased hallucinations.
Experts suggest reinforcement learning techniques and limited world knowledge in smaller models may contribute to the elevated fabrication rates.
OpenAI is exploring real-time web search integration as a potential solution to ground responses in verifiable facts and reduce hallucinations.