Overview
- The model solved five of the six IMO problems under standard exam conditions, without internet access or external tools, scoring 35 of 42 points (7 points per problem), a gold-medal score.
- Three former IMO medalists independently graded its natural-language proofs and unanimously validated the model’s performance.
- OpenAI credits its general-purpose reinforcement learning and test-time compute scaling for the achievement, distinguishing the model from task-specific systems like DeepMind’s AlphaGeometry.
- Sam Altman and Alexander Wei said the model will not be released publicly for several months to ensure responsible deployment.
- Critics, including Gary Marcus, have questioned the lack of official IMO verification and called for transparency about the model’s training data, utility, and cost per problem.