OpenAI's latest AI model has achieved gold medal-level performance at the 2025 International Mathematical Olympiad (IMO). According to the company, the system worked under the same exam conditions as human contestants and solved five of the six problems, scoring 35 out of 42 points.
The IMO is widely regarded as the most prestigious and difficult mathematics competition for high school students worldwide. Only about 10 percent of this year's participants earned a gold medal, and many future Fields Medalists first distinguished themselves at the competition. Contestants get two 4.5-hour sessions to work on six problems, without internet access or any other aids.
AI's Mathematical Progress and Challenges
AI models have generally not been known for complex mathematical reasoning, which demands sustained logical thinking, but recent results show significant progress. Gemini 2.5 Pro and OpenAI's o3 scored 86.7 percent and 88.9 percent, respectively, on the American Invitational Mathematics Examination (AIME), setting new marks for AI systems. By comparison, OpenAI's earlier o1 model scored 83 percent on the 2024 AIME, a qualifying exam on the path to the IMO, while Grok 4 has reportedly achieved a perfect score on AIME problems.
"IMO problems require sustained creative thinking beyond previous benchmarks," stated OpenAI researcher Alexander Wei on X following the announcement of the unpublicized model. His colleague Noam Brown noted that just one year prior, AI labs were using basic math benchmarks like the GSM8K test for elementary school level problems.
OpenAI CEO Sam Altman emphasized that the experimental model is "a large language model for mathematical reasoning rather than a specialized formal math system" like AlphaGeometry, calling the result a sign of steady progress toward the company's general intelligence goals.
Manon Bischoff, an editor at the German edition of Scientific American, predicted in January 2024 that it would take "years" for AI to compete at IMO level. At the time, she was reporting on AlphaGeometry, a math-specialized model from Google DeepMind capable of solving 54 percent of the geometry problems posed at the past 25 IMO competitions. By February 2025, its second-generation version solved 84 percent of these problems.
Controversy Over OpenAI's IMO Gold Claims
Not all experts accept OpenAI's reported result. Google DeepMind researcher Thang Luong and Mikhail Samin pointed out that the model was not evaluated under official IMO guidelines, so its gold medal claim cannot be independently confirmed. Wei responded on X that "three former IMO medalists independently assessed the model's proofs" and reached "consensus agreement" on the scoring.
OpenAI's credibility on mathematical evaluations has been questioned before. In April, the independent research group Epoch AI found that the released o3 model solved only about 10 percent of the problems on FrontierMath, a benchmark of advanced research-level mathematics, far below the more than 25 percent accuracy OpenAI had claimed in December 2024.
For now, the experimental model's IMO performance cannot be verified independently, because the system is not publicly available. Wei confirmed that OpenAI has no plans to release a model with this level of mathematical capability in the coming months, and it will likely not be part of the upcoming GPT-5, which is expected soon.