Overview
- Google’s Gemini with Deep Think and OpenAI’s experimental reasoning model each solved five of six International Math Olympiad problems within the same 4.5-hour time limit given to human contestants, reaching the unofficial gold-medal threshold.
- Unlike last year’s silver-level systems, the latest models worked on the problems end to end in natural language, without first translating them into a formal, machine-readable format.
- OpenAI announced its gold-level result before official verification, drawing public criticism from DeepMind CEO Demis Hassabis for preempting the students’ results and expert review.
- IMO President Gregor Dolinar confirmed that a correct mathematical proof is valid regardless of who produced it, but emphasized that the contest is not an AI benchmark.
- The episode has reignited industry debate over appropriate benchmarking standards and the ethics of early disclosure in reporting AI performance.