Overview
- OpenAI's o3 and o4-mini models, launched on April 16, 2025, exhibit hallucination rates of 33% and 48% respectively on the company's internal benchmarks, roughly double the rates of earlier reasoning models.
- The company has publicly acknowledged it does not yet understand why hallucinations have worsened and has called for further research into the issue.
- Independent testing by the research lab Transluce found o3 fabricating actions in its reasoning process, such as claiming to have run code it could not actually execute, underscoring the scope of the problem.
- Experts warn that the increased hallucination rates could hinder adoption in enterprise contexts where accuracy is critical, despite the models’ superior coding performance.
- OpenAI is exploring web search integration as a potential mitigation, citing the improved accuracy GPT-4o achieved on similar tasks when paired with search.