In a world-first, AI systems developed by Alphabet’s Google DeepMind and OpenAI have achieved gold-medal scores at the prestigious International Mathematical Olympiad (IMO), marking a transformative moment for artificial intelligence’s mathematical reasoning capabilities.
Held on Australia’s Sunshine Coast, the 66th IMO drew 630 top high-school mathematicians from around the world. This year, however, it wasn’t just human competitors making headlines. For the first time, AI models crossed the gold-medal scoring threshold, solving five of the six competition problems, tasks widely regarded as the pinnacle of pre-university mathematics.
Natural Language Reasoning Redefines AI’s Approach to Maths
Unlike earlier AI models that relied on rigid formal languages and heavy computational methods, both Google and OpenAI deployed general-purpose reasoning models that processed mathematical concepts using natural language. This breakthrough approach allowed Google’s Gemini Deep Think and OpenAI’s experimental reasoning models to tackle problems much like human contestants—within the official 4.5-hour competition timeframe.
Google DeepMind collaborated formally with the IMO, which certified its results; OpenAI instead had its solutions graded independently by former IMO medallists. Both companies’ models reached gold-medal scores, though only Google’s results were officially validated by the IMO.
Expensive Compute and New Frontiers
OpenAI’s result came from an experimental model that scaled up “test-time compute”, allowing the AI to reason for extended periods and to run multiple reasoning pathways in parallel. Researcher Noam Brown described the approach as “very expensive” in terms of computing resources but declined to disclose exact figures.
This milestone signifies AI’s rapidly advancing potential in solving hard reasoning problems across fields. Junehyuk Jung, a mathematics professor at Brown University and visiting researcher at Google DeepMind, stated that collaboration between AI and human mathematicians on unsolved frontier problems could become a reality within a year.
From Maths to Physics: AI’s Expanding Research Potential
Google researchers, too, expressed optimism about applying these models to complex challenges beyond mathematics, such as physics and theoretical research. Google’s Gemini Deep Think, a general-purpose model first unveiled at the company’s developer conference in May, achieved its gold-medal score using purely natural-language reasoning.
Notably, of the 630 student contestants, only 67 (about 11%) achieved gold-medal scores. Google’s AI matched this top-tier human performance, while OpenAI’s experimental model, though not officially entered, met the same threshold.
IMO Officially Acknowledges AI Participation
In a notable change, the IMO Board formally coordinated with AI developers this year. The board certified Google’s results and encouraged public publication of AI performance data after the competition’s conclusion to ensure student achievements remained the central focus.
“We respected the IMO Board’s original request that all AI labs share their results only after the official results had been verified and students had received their rightful recognition,” said Google DeepMind CEO Demis Hassabis.
OpenAI, however, announced its gold-level performance shortly after the competition’s closing ceremony, citing permission from an IMO board member.
With the IMO board president confirming that cooperating companies could publish results from Monday onwards, both Google’s and OpenAI’s achievements are now publicly acknowledged, underscoring a significant moment in the evolution of AI reasoning capabilities.
With inputs from Reuters