Google and OpenAI Claim Unofficial Gold at Math Olympiad, Prompting Benchmarking Debate

IMO organizers stress that AI’s gold-level performance is unofficial, with no formal endorsement or benchmark status.

[Image: OpenAI and Gemini master the math competition]

Overview

  • Google’s Gemini with Deep Think and an experimental OpenAI reasoning model each solved five of six International Math Olympiad problems within the standard 4.5-hour session limit, reaching the unofficial gold-medal threshold.
  • The latest models worked on the problems directly in natural language, without the machine-readable preprocessing earlier systems required, improving on last year’s silver-level results.
  • OpenAI announced its gold-level result ahead of official validation, drawing public criticism from Google DeepMind CEO Demis Hassabis for preempting the student awards and expert review.
  • IMO President Gregor Dolinar confirmed that correct mathematical proofs are valid regardless of authorship but emphasized the contest is not an AI benchmark.
  • The episode has reignited industry debate over appropriate benchmarking standards and the ethics of early disclosure in AI performance reporting.