38 Sources
[1]
DeepMind and OpenAI models solve maths problems at level of top students
Google DeepMind announced on 21 July that its software had cracked a set of maths problems at the level of the world's top high-school students, achieving a gold-medal score on questions from the International Mathematical Olympiad. At first sight, this marked only a marginal improvement over the previous year's performance. The company's system had performed in the upper range of silver medal standard at the 2024 Olympiad, while this year it was evaluated in the lower range for a human gold medallist. But the grades this year hide a "big paradigm shift," says Thang Luong, a computer scientist at DeepMind in Mountain View, California. The company achieved its previous feats using two artificial intelligence (AI) tools specifically designed to carry out rigorous logical steps in mathematical proofs and calculations, called AlphaGeometry and AlphaProof. The process required human experts to first translate the problems' statements into something similar to a programming language, and then to translate the AI's solutions back into English. "This year, everything is natural language, end to end," says Luong. The team employed a large language model (LLM) called DeepThink, which is based on its Gemini system but with some additional developments that made it better and faster at producing mathematical arguments, such as handling multiple chains of thought in parallel. "For a long time, I didn't think we could go that far with LLMs," Luong adds. DeepThink scored 35 out of 42 points on the 6 problems that had been given to participants in this year's Olympiad. Under an agreement with the organizers, the computer's solutions were marked by the same judges who evaluated the human participants. Separately, ChatGPT creator OpenAI, based in San Francisco, California, had its own LLM solve the same Mathematical Olympiad problems at gold medal level, but had its solutions evaluated independently. For years, many AI researchers have fallen into one of two camps. Until 2012, the leading approach was to code the rules of logical thinking into the machine by hand. Since then, neural networks -- which train automatically by learning from vast troves of data -- have made a series of sensational breakthroughs, and tools such as OpenAI's ChatGPT have now entered mainstream use. Gary Marcus, a neuroscientist at New York University (NYU) in New York City, called the results by DeepMind and OpenAI "Awfully impressive." Marcus is an advocate of the 'coding logic by hand' approach -- also known as neurosymbolic AI -- and a frequent critic of what he sees as hype surrounding LLMs. Still, writing on Substack with NYU computer scientist Ernest Davis, he commented that "to be able to solve math problems at the level of the top 67 high school students in the world is to have really good math problem solving chops". It remains to be seen whether LLM superiority on IMO problems is here to stay, or if neurosymbolic AI will claw its way back to the top. "At this point the two camps still keep developing," says Luong, who works on both approaches. "They could converge together." His team has already experimented with using LLMs to automate the translation of mathematical statements from natural language into the formal system that AlphaGeometry can read. Systems such as AlphaProof also have the advantage that they can certify the correctness of their proofs, while proofs written by LLMs have to be checked by humans, the way human-written maths papers are.
Many mathematicians have been working on translating human-written proofs into a machine-readable language to have computers check their correctness. Mathematician Kevin Buzzard of Imperial College London wrote on the social media platform Zulip that maths Olympiad success does not necessarily mean that a young mathematician is ready to do advanced research. By the same token, he added, it is an "open question" whether these systems' gold medal performances will translate into them being able to tackle complex research questions. Ken Ono, a mathematician at the University of Virginia in Charlottesville, agrees. "I view AI as valuable research partners, providing quick access to scientific literature and data summaries, as well as offering effective strategies for surprisingly difficult problems," he says. But he adds that "these tests and benchmarks aren't aligned with what theoretical mathematicians do". DeepMind says it will later allow some researchers to work with a version of DeepThink. "Very soon we can have AI collaborating with mathematicians," says Luong.
[2]
Gemini Deep Think learns math, wins gold medal at International Math Olympiad
The students participating in the annual International Math Olympiad (IMO) represent some of the most talented young computational minds in the world. This year, they faced down a newly enhanced array of powerful AI models, including Google's Gemini Deep Think. The company says it put its model to the test using the same rules as human participants, and it improved on an already solid showing from last year. Google says its specially tuned math AI got five of the six questions correct, which is good enough for gold medal status. And unlike OpenAI, Google played by the rules set forth by the IMO. The Google DeepMind team participated in last year's IMO competition using an AI composed of the AlphaProof and AlphaGeometry 2 models. This setup was able to get four of the six questions correct, earning silver medal status -- only half of the human participants earn any medal at all. In 2025, Google DeepMind was among a group of companies that worked with the IMO to have their models officially graded and certified by the coordinators. Google came prepared with a new model for the occasion. Gemini Deep Think was announced earlier this year as a more analytical take on simulated reasoning models. Rather than going down one linear line of "thought," Deep Think runs multiple reasoning processes in parallel, integrating and comparing the results before giving a final answer. According to Thang Luong, DeepMind senior scientist and head of the IMO team, this is a paradigm shift from last year's effort. In 2024, an expert had to translate the natural language questions into "domain specific language." At the end of the process, said expert would have to interpret the output. Deep Think, however, is natural language, end to end, and was not specifically designed to do math.
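Several of these pieces describe Deep Think's "parallel thinking" as sampling multiple chains of thought at once and comparing them before answering. As a rough illustration only, here is a minimal best-of-N sketch in Python; generate_chain and score_chain are hypothetical mocks introduced for this example, not Google's implementation, which has not been published.

import random

# Hypothetical stand-ins: a real system would sample a reasoning chain from the
# model and score it with a verifier or reward model. Both are mocked here.
def generate_chain(problem, seed):
    rng = random.Random(seed)
    answer = rng.choice(["candidate A", "candidate B", "candidate C"])
    return {"reasoning": f"chain {seed} for {problem}", "answer": answer}

def score_chain(chain):
    # A verifier would check each step of the argument; here we fake a confidence.
    return random.random()

def parallel_think(problem, n_chains=8):
    """Sample several independent chains of thought, then keep the best-scoring one."""
    chains = [generate_chain(problem, seed) for seed in range(n_chains)]
    best = max(chains, key=score_chain)
    return best["answer"]

print(parallel_think("IMO 2025, Problem 1"))

The design point is simply that exploring several lines of reasoning and then selecting among them trades extra compute for reliability, rather than committing to a single linear chain of thought.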
[3]
OpenAI jumps gun on International Math Olympiad gold medal announcement
On Saturday, OpenAI researcher Alexander Wei announced that a new AI language model the company is researching has achieved gold medal-level performance on the International Mathematical Olympiad (IMO), matching a standard that fewer than 9 percent of human contestants reach each year. The announcement came despite an embargo request from IMO organizers asking AI companies to wait until July 28 to share their results. The experimental model reportedly tackled the contest's six proof-based problems under the same constraints as human competitors: 4.5 hours per session, with no Internet access or calculators allowed. However, several sources with inside knowledge of the process say that since OpenAI self-graded its IMO results, the legitimacy of the company's claim may be in question. OpenAI plans to publish the proofs and grading rubrics for public review. According to OpenAI, its achievement marks a departure from previous AI attempts at mathematical Olympiad problems, which relied on specialized theorem-proving systems that often exceeded human time limits. OpenAI says its model processed problems as plain text and generated natural-language proofs, operating like a standard language model rather than a purpose-built mathematical system. The announcement follows Google's July 2024 claim that its AlphaProof and AlphaGeometry 2 models earned a silver medal equivalent at the IMO -- though Google's systems required up to three days per problem rather than the 4.5-hour human time limit and needed human assistance to translate problems into formal mathematical language. "Math is a proving ground for reasoning -- structured, rigorous, and hard to fake," the company wrote in a statement sent to Ars Technica. "This shows that scalable, general-purpose methods can now outperform hand-tuned systems in tasks long seen as out of reach." While the company confirmed that its next major AI model, GPT-5, is "coming soon," it clarified that this current model is experimental. "The techniques will carry forward, but nothing with this level of capability will be released for a while," OpenAI says. It's likely that OpenAI needed to devote a great deal of computational resources (which means high cost) for this particular experiment, and that level of computation won't be typical of consumer-facing AI models in the near future. Surprising results for a general-purpose AI model OpenAI says that the research team behind the experimental AI model, led by Alex Wei with support from Sheryl Hsu and Noam Brown, hadn't initially planned to enter the competition but decided to evaluate their work after observing promising results in testing. "This wasn't a system built for math. It's the same kind of LLM we train for language, coding, and science -- solving full proof-based problems under standard IMO constraints: 4.5 hours, no internet, no calculators," OpenAI said in a statement. OpenAI received problems that were freshly written by the IMO organizer and shared with several AI companies simultaneously. To validate the results, each solution reportedly underwent blind grading by a panel of three former IMO medalists organized by OpenAI, with unanimous consensus required for acceptance. However, in addition to the controversy over self-grading the results, OpenAI also annoyed the IMO community because its Saturday announcement appears to have violated the embargo agreement with the International Mathematical Olympiad. 
Harmonic, another AI company that participated in the competition, revealed in an X post on July 20 that "the IMO Board has asked us, along with the other leading AI companies that participated, to hold on releasing our results until Jul 28th." The early announcement has prompted Google DeepMind, which had prepared its own IMO results for the agreed-upon date, to move up its own IMO-related announcement to later today. Harmonic plans to share its results as originally scheduled on July 28. In response to the controversy, OpenAI research scientist Noam Brown posted on X, "We weren't in touch with IMO. I spoke with one organizer before the post to let him know. He requested we wait until after the closing ceremony ends to respect the kids, and we did." However, an IMO coordinator told X user Mikhail Samin that OpenAI actually announced before the closing ceremony, contradicting Brown's claim. The coordinator called OpenAI's actions "rude and inappropriate," noting that OpenAI "wasn't one of the AI companies that cooperated with the IMO on testing their models." Hard math since 1959 The International Mathematical Olympiad, running since 1959, represents one of the most challenging tests of mathematical reasoning. More than 100 countries send six participants each, with contestants facing six proof-based problems across two 4.5-hour sessions. The problems typically require deep mathematical insight and creativity rather than raw computational power. You can see the exact problems in the 2025 Olympiad posted online. For example, problem one asks students to imagine a triangular grid of dots (like a triangular pegboard) and figure out how to cover all the dots using exactly n straight lines. The twist is that some lines are called "sunny" -- these are the lines that don't run horizontally, vertically, or diagonally at a 45º angle. The challenge is to prove that no matter how big your triangle is, you can only ever create patterns with exactly 0, 1, or 3 sunny lines -- never 2, never 4, never any other number. The timing of the OpenAI results surprised some prediction markets, which had assigned around an 18 percent probability to any AI system winning IMO gold by 2025. However, depending on what Google says this afternoon (and what others like Harmonic may release on July 28), OpenAI may not be the only AI company to have achieved these unexpected results.
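The informal description of problem one above can be restated more precisely. The following LaTeX paraphrase is reconstructed from the article's wording (a line is "sunny" unless it runs horizontally, vertically, or along a 45-degree diagonal) and is not the official problem text.

\textbf{Problem 1 (paraphrase).} Call a line in the plane \emph{sunny} if it is not parallel
to the $x$-axis, the $y$-axis, or the line $x + y = 0$. For a given integer $n \ge 3$,
determine all nonnegative integers $k$ such that there exist $n$ distinct lines with the
properties that every point $(a, b)$ with positive integers $a, b$ satisfying $a + b \le n + 1$
lies on at least one of the lines, and exactly $k$ of the $n$ lines are sunny.
(Per the article, the answer is $k \in \{0, 1, 3\}$.)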
[4]
OpenAI and Google outdo the mathletes, but not each other | TechCrunch
AI models from OpenAI and Google DeepMind achieved gold medal scores in the 2025 International Math Olympiad (IMO), one of the world's oldest and most challenging high school level math competitions, the companies independently announced in recent days. The result underscores just how fast AI systems are advancing, and yet, how evenly matched Google and OpenAI seem to be in the AI race. AI companies are competing fiercely for the public perception of being ahead in the AI race: an intangible battle of "vibes" that can have big implications for securing top AI talent. A lot of AI researchers come from backgrounds in competitive math, so benchmarks like IMO mean more than others. Last year, Google scored a silver medal at IMO using a "formal" system, meaning it required humans to translate problems into a machine-readable format. This year, both OpenAI and Google entered "informal" systems into the competition, which were able to ingest questions and generate proof-based answers in natural language. Both companies claim their AI models scored higher than most high school students and Google's AI model from last year, without requiring any human-machine translation. In interviews with TechCrunch, researchers behind OpenAI and Google's IMO efforts claimed that these gold medal performances represent breakthroughs around AI reasoning models in non-verifiable domains. While AI reasoning models tend to do well on questions with straightforward answers, such as math or coding tasks, these systems struggle on tasks with more ambiguous solutions, such as buying a great chair or helping with complex research. However, Google is raising questions around how OpenAI conducted and announced its gold medal IMO performance. After all, if you're going to enter AI models into a math contest for high schoolers, you might as well argue like teenagers. Shortly after OpenAI announced its feat on Saturday morning, Google DeepMind's CEO and researchers took to social media to slam OpenAI for announcing its gold-medal performance prematurely -- shortly after IMO announced which high schoolers had won the competition on Friday night -- and for not having their model's test officially evaluated by IMO. Thang Luong, a Google DeepMind senior researcher and lead for the IMO project, told TechCrunch that Google waited to announce its IMO results to respect the students participating in the competition. Luong said that Google has been working with IMO's organizers since last year in preparation for the test and wanted to have the IMO president's blessing and official grading before announcing its official results, which it did on Monday morning. "The IMO organizers have their grading guideline," Luong said. "So any evaluation that's not based on that guideline could not make any claim about gold-medal level [performance]." Noam Brown, a senior OpenAI researcher who worked on the IMO model, told TechCrunch that IMO reached out to OpenAI a few months ago about participating in a formal math competition, but the ChatGPT-maker declined because it was working on natural language systems that it thought were more worth pursuing. Brown says OpenAI didn't know IMO was conducting an informal test with Google. OpenAI says it hired third-party evaluators -- three former IMO medalists who understood the grading system -- to grade its AI model's performance. After OpenAI learned of its gold medal score, Brown said the company reached out to IMO, which then told the company to wait to announce until after IMO's Friday night award ceremony.
IMO did not respond to TechCrunch's request for comment. Google isn't necessarily wrong here -- it did go through a more official, rigorous process to achieve its gold medal score -- but the debate may miss the bigger picture: AI models from several leading AI labs are improving quickly. Countries from around the world sent their brightest students to compete at IMO this year, and just a few percent of them scored as well as OpenAI and Google's AI models did. While OpenAI used to have a significant lead over the industry, it certainly feels as though the race is more closely matched than any company would like to admit. OpenAI is expected to release GPT-5 in the coming months, and the company certainly hopes to give off the impression that it still leads the AI industry.
[5]
DeepMind and OpenAI claim gold in International Mathematical Olympiad
Two AI models have achieved gold medal standard for the first time in a prestigious competition for young mathematicians - and their developers claim these AIs could soon crack tough scientific problems. Experimental AI models from Google DeepMind and OpenAI have achieved a gold-level performance in the International Mathematical Olympiad (IMO) for the first time. The companies are hailing the moment as an important milestone for AIs that might one day solve hard scientific or mathematical problems, but mathematicians are more cautious because details of the models' results and how they work haven't been made public. The IMO, one of the world's most prestigious competitions for young mathematicians, has long been seen by AI researchers as a litmus test for mathematical reasoning that AI systems tend to struggle with. After last year's competition held in Bath, UK, Google DeepMind announced that AI systems it had developed, called AlphaProof and AlphaGeometry, had together achieved a silver medal-level performance, but its entries weren't graded by the competition's official markers. Before this year's contest, which was held in Queensland, Australia, companies including Google, Huawei and TikTok-owner ByteDance, as well as academic researchers, approached the organisers to ask whether they could have their AI models' performance officially graded, says Gregor Dolinar, the IMO's president. The IMO agreed, with the proviso that the companies waited to announce their results until 28 July, when the IMO's full closing ceremonies had been completed. OpenAI also asked if it could participate in the competition, but after it was informed about the official scheme, it didn't respond or register an entry, says Dolinar. On 19 July, OpenAI announced that a new AI it had developed had achieved a gold medal score marked by three former IMO medallists separate from the official competition. The AI answered five out of six questions correctly in the same 4.5-hour time limit as the contestants, OpenAI said. Two days later, Google DeepMind also announced that its AI system, called Gemini Deep Think, had achieved gold with the same score and time limits. Dolinar confirmed that this result was given by the IMO's official markers. Unlike Google's AlphaProof and AlphaGeometry systems, which were crafted especially for the competition and worked with questions and answers written in a computer programming language called Lean, both Google and OpenAI's models this year worked entirely in natural language. Working in Lean meant the AI's output could be instantly checked for correctness, but it is harder for non-experts to read. Thang Luong at Google, who worked on Gemini Deep Think, says the natural language approach could produce more understandable answers, as well as being applicable to generally useful AI systems. Luong says the ability to verify solutions in a large language model has been made possible thanks to progress with reinforcement learning, a training method in which an AI is taught what success looks like and is left to figure out the rules and how to succeed solely through trial and error. This method was key to Google's previous success with its game-playing AIs, such as AlphaZero. Google's model also considers multiple solutions at once, in a mode called parallel thinking, as well as being trained on a dataset of maths problems specifically useful for the IMO, says Luong.
OpenAI has released few details on its system, apart from that it also uses reinforcement learning and "experimental research methods". "The progress is promising, but not performed in a controlled scientific fashion, and so I will not be able to assess it at this stage," says Terence Tao at the University of California, Los Angeles. "Perhaps once the companies involved release some papers with more data, and hopefully enough access to the model for others to replicate the results, one can say something more definitive, but, for now, we largely have to trust the companies themselves for the claimed results." Geordie Williamson at the University of Sydney in Australia agrees. "I think it is remarkable that this is where we're at. It is frustrating how little detail outsiders are provided with regarding internals," says Williamson. While systems working in natural language could be useful for non-mathematicians, it could also present a problem if models produce long proofs that are hard to check, says Joseph Myers, one of the organisers of this year's IMO. "If AIs are ever to produce solutions to significant unsolved problems that might plausibly be correct but might also have a few subtle but fatal errors hidden accidentally, or potentially deliberately from a misaligned AI, having those AIs also generate a formal proof is key to having confidence in the correctness of a long AI output before attempting to read it." Both companies say that, in the coming months, they will offer these systems for testing to mathematicians at first, before releasing them to the wider public. The models could soon help with harder scientific research problems, says Junehyuk Jung at Google, who worked on Gemini Deep Think. "There are going to be many, many unsolved problems within reach," he says.
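Both this article and the Nature piece contrast natural-language proofs, which humans must check, with proofs written in Lean, which a proof assistant verifies mechanically. As a toy illustration of that difference (unrelated to the IMO problems themselves), here is a minimal Lean 4 snippet: if either proof were wrong, the file simply would not compile.

-- Lean's kernel checks these proofs mechanically; a false claim would fail to compile.
theorem two_add_three : 2 + 3 = 5 := rfl

-- Commutativity of natural-number addition, discharged by a standard library lemma.
theorem add_comm_example (a b : Nat) : a + b = b + a := Nat.add_comm a b

This machine-checkable guarantee is what Joseph Myers is pointing at above: a formal proof can be trusted before anyone reads it, whereas a long natural-language proof has to be audited line by line.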
[6]
OpenAI wins gold at prestigious math competition - why that matters more than you think
OpenAI has achieved a new milestone in the race to build AI models that can reason their way through complex math problems. On Saturday, the company announced that one of its models achieved gold medal-level performance on the International Math Olympiad (IMO), widely regarded as the most prestigious and difficult math competition in the world. Critically, the winning model wasn't designed specifically to solve IMO problems, in the way that earlier systems like DeepMind's AlphaGo -- which famously beat the world's leading Go player in 2016 -- were trained on a massive dataset within a very narrow, task-specific domain. Rather, the winner was a general-purpose reasoning model, designed to think through problems methodically using natural language. "This is an LLM doing math and not a specific formal math system," OpenAI wrote in its X post. "It's part of our main push towards general intelligence." (Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems. Ziff Davis also owns DownDetector.) Not much is known at this point about the identity of the model that was used. Alexander Wei, a researcher at OpenAI who led the IMO research, called it "an experimental reasoning LLM" in an X post, which included an illustration of a strawberry wreathed in a gold medal, suggesting it's built atop the company's o1 family of reasoning models, which debuted in September. "To be clear: We're releasing GPT-5 soon, but the model we used at IMO is a separate experimental model," OpenAI added on X. "It uses new research techniques that will show up in future models -- but we don't plan to release a model with this level of capability for many months." The IMO, which began in 1959, attracts hundreds of contestants from more than 100 countries each year. Contestants must provide proof-based responses to a total of six questions over the course of two days. Those proofs are assessed by former IMO gold medalists, with unanimous consensus required for each final score. Fewer than 9% of participants achieve gold. According to Wei, OpenAI's experimental model solved five out of the six problems and earned 35 out of 42 possible points (about 83%), enough for a gold medal. Each proof comprised hundreds of lines of text, representing the individual steps the model took to work through its reasoning process. In keeping with the competition's prohibition against the use of calculators or other external tools, OpenAI's model had no access to the internet; it was purely reasoning through each of the problems step-by-step. The "model thinks for a long time," Noam Brown, another OpenAI researcher involved in the research project, wrote in an X post. "o1 thought for seconds. Deep Research for minutes. This one thinks for hours. Importantly, it's also more efficient with its thinking." Analysts had previously estimated that there was only an 18% chance that an AI system would win gold in the IMO by 2025, according to OpenAI. For all of its impressive abilities, AI has long struggled with simple arithmetic and basic math word problems -- tasks that one might think should be relatively straightforward for advanced algorithms.
But unlike more narrow logical puzzles, math requires a level of abstract reasoning and conceptual juggling that has been beyond the reach of most AI systems. That's been changing, however, at an extraordinarily rapid pace. A little over a year ago, AI models were still being assessed using grade school-level math benchmarks like the GSM8K. Reasoning models like o1 and DeepSeek's R1 quickly excelled, first acing high school-level benchmarks like AIME and then advancing to the university level and beyond. A capacity for high-level mathematics has become the gold standard for reasoning models, since even a small amount of hallucination or corner-cutting can very quickly and clearly ruin a model's output. Such slips are easier to get away with when generating other kinds of responses -- help with a written essay, for example -- since those are very often open to various kinds of interpretation. OpenAI's IMO gold medal shows that a scalable, general-purpose reasoning approach can surpass domain-specific models in tasks that have long been believed to be beyond the reach of current AI systems. As it turns out, you don't need to build hyperfocused, AlphaGo-like models trained to do nothing but math; it's enough to train a model to parse language and carefully reason through its thought process, and given enough time, it can compete on par with world-class human mathematicians. According to Brown, the current pace of innovation happening throughout the AI industry suggests that its mathematical and reasoning prowess will only grow from here. "I fully expect the trend to continue," he wrote on X. "Importantly, I think we're close to AI substantially contributing to scientific discovery."
[7]
Human teens beat AI at an international math competition
For the first time ever, AI models achieved prestigious gold-level scores at the International Mathematics Olympiad, one of the world's premier math competitions. Their success is an undeniable bragging right for the technology's biggest supporters. But as it stands, Google and OpenAI's most cutting-edge, experimental AI programs still can't beat an extremely smart teenager. It may seem ironic, but complex mathematics is still one of AI's biggest hurdles. There are many analyses into why this remains such an issue, but generally speaking, it has to do with how the technology works. After receiving a prompt, AI like ChatGPT and Google Gemini break the words and letters down into "tokens," then parse and predict an appropriate response. To an AI, an answer is just the most likely string of tokens. Humans, however, process them as words, sentences, and complete thoughts. Given these parameters, AI doesn't possess the "logic" capabilities required to handle complex mathematical prompts. This is largely because math prompts usually don't have multiple possible answers -- only a single, correct solution. Today, a pocket calculator will invariably give you the objectively true answer to multiplying 4596 by 4859 (22,331,964). Meanwhile, ChatGPT might still offer you an answer of 22,325,364. Since 1959, the International Mathematical Olympiad (IMO) has served as one of the world's premier events for young -- human -- math whizzes. Many mathematicians would need longer than their allotted time to answer just one of the IMO's problems -- and most people wouldn't be able to solve any of them. Australia most recently hosted the 66th annual IMO competition in Queensland, where 641 teenagers from 112 countries met on July 15 to tackle six questions in under 4.5 hours. This time, however, they had some additional competition: a pair of experimental AI reasoning models from Google and OpenAI. The bots fared well. Both companies have since announced that their programs scored high enough on this year's IMO test to earn gold medals. Each AI managed to solve 5 of the 6 problems within the time limit, earning 35 out of the maximum 42 possible points. This year, only about 10 percent of human entrants received a gold-level score. It marked a major improvement from Google's last showing at IMO. In 2024, a version of its DeepMind reasoning AI reached a silver-medal score after solving four of six problems, although it required 2-3 days of computation instead of the 4.5-hour time limit. According to IMO president Gregor Dolinar, one of the most striking points of this year's results wasn't just the AI programs' calculations, but the ways in which they explained their "thought" process to arrive at each answer. "Their solutions were astonishing in many respects. IMO graders found them to be clear, precise and most of them easy to follow," Dolinar said via Google's announcement. There's at least one last IMO milestone for both companies: a perfect score. This year, five teens pulled off that accomplishment. And even if Google or OpenAI ties humans at the IMO in the coming years, the victory may still require context. As AFP noted, IMO organizers couldn't confirm how much computing power was required by either AI model, or if there was any additional human oversight during the calculations. And while AI's latest technological leap forward is impressive, it still likely required disconcertingly massive amounts of energy and water.
Companies like Google, OpenAI, and Microsoft are all investing heavily in data center projects to support their AI projects -- all of which need power sources. In some cases, that may even include expanding the use of fossil fuels. Watchdogs previously estimated that at this rate, the AI industry may consume as much energy as Argentina, if not multiple nations combined. That's a problem that neither AI nor its makers have yet solved.
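The Popular Science piece above contrasts a calculator's exact arithmetic with an LLM's token-by-token prediction. The Python sketch below checks the article's multiplication and uses a deliberately crude mock "predictor" to show how plausible-looking but wrong digits can appear; real model errors are statistical and far subtler than this caricature.

import random

# Exact arithmetic: Python, like a pocket calculator, returns the one correct answer.
print(4596 * 4859)  # 22331964, the figure quoted in the article

# A caricature of next-token prediction: digits are "guessed" with occasional noise,
# so the output can be close to, but not exactly, the true product (e.g. 22325364).
def mock_llm_multiply(a, b, error_rate=0.3):
    digits = list(str(a * b))
    for i in range(len(digits)):
        if random.random() < error_rate:
            digits[i] = str(random.randint(0, 9))
    return int("".join(digits))

print(mock_llm_multiply(4596, 4859))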
[8]
Humans beat AI at international math contest despite gold-level AI scores
Humans beat generative AI models made by Google and OpenAI at a top international mathematics competition, despite the programs reaching gold-level scores for the first time. Neither model scored full marks -- unlike five young people at the International Mathematical Olympiad (IMO), a prestigious annual competition where participants must be under 20 years old. Google said Monday that an advanced version of its Gemini chatbot had solved five out of the six math problems set at the IMO, held in Australia's Queensland this month. "We can confirm that Google DeepMind has reached the much-desired milestone, earning 35 out of a possible 42 points -- a gold medal score," the US tech giant cited IMO president Gregor Dolinar as saying. "Their solutions were astonishing in many respects. IMO graders found them to be clear, precise and most of them easy to follow." Around 10% of human contestants won gold-level medals, and five received perfect scores of 42 points. US ChatGPT maker OpenAI said that its experimental reasoning model had scored a gold-level 35 points on the test. The result "achieved a longstanding grand challenge in AI" at "the world's most prestigious math competition," OpenAI researcher Alexander Wei wrote on social media. "We evaluated our models on the 2025 IMO problems under the same rules as human contestants," he said. "For each problem, three former IMO medalists independently graded the model's submitted proof." Google achieved a silver-medal score at last year's IMO in the British city of Bath, solving four of the six problems. That took two to three days of computation -- far longer than this year, when its Gemini model solved the problems within the 4.5-hour time limit, it said. The IMO said tech companies had "privately tested closed-source AI models on this year's problems," the same ones faced by 641 competing students from 112 countries. "It is very exciting to see progress in the mathematical capabilities of AI models," said IMO president Dolinar. Contest organizers could not verify how much computing power had been used by the AI models or whether there had been human involvement, he cautioned.
[9]
Google and OpenAI's AI models win milestone gold at global math competition
July 21 (Reuters) - Alphabet's (GOOGL.O) Google and OpenAI said their artificial-intelligence models won gold medals at a global mathematics competition, signaling a breakthrough in math capabilities in the race to build powerful systems that can rival human intelligence. The results marked the first time that AI systems crossed the gold-medal scoring threshold at the International Mathematical Olympiad for high-school students. Both companies' models solved five out of six problems, achieving the result using general-purpose "reasoning" models that processed mathematical concepts using natural language, in contrast to the previous approaches used by AI firms. The achievement suggests AI is less than a year away from being used by mathematicians to crack unsolved research problems at the frontier of the field, according to Junehyuk Jung, a math professor at Brown University and visiting researcher in Google's DeepMind AI unit. "I think the moment we can solve hard reasoning problems in natural language will enable the potential for collaboration between AI and mathematicians," Jung told Reuters. OpenAI's breakthrough was achieved with a new experimental model centered on massively scaling up "test-time compute." This was done by both allowing the model to "think" for longer periods and deploying parallel computing power to run numerous lines of reasoning simultaneously, according to Noam Brown, researcher at OpenAI. Brown declined to say how much in computing power it cost OpenAI, but called it "very expensive." To OpenAI researchers, it is another clear sign that AI models can command extensive reasoning capabilities that could expand into other areas beyond math. The optimism is shared by Google researchers, who believe AI models' capabilities can apply to research quandaries in other fields such as physics, said Jung, who won an IMO gold medal as a student in 2003. Of the 630 students participating in the 66th IMO on the Sunshine Coast in Queensland, Australia, 67 contestants, or about 11%, achieved gold-medal scores. Google's DeepMind AI unit last year achieved a silver medal score using AI systems specialized for math. This year, Google used a general-purpose model called Gemini Deep Think, a version of which was previously unveiled at its annual developer conference in May. Unlike previous AI attempts that relied on formal languages and lengthy computation, Google's approach this year operated entirely in natural language and solved the problems within the official 4.5-hour time limit, the company said in a blog post. OpenAI, which has its own set of reasoning models, similarly built an experimental version for the competition, according to a post by researcher Alexander Wei on social media platform X. He noted that the company does not plan to release anything with this level of math capability for several months. This year marked the first time the competition coordinated officially with some AI developers, who have for years used prominent math competitions like IMO to test model capabilities. IMO judges certified the results of those companies, including Google, and asked them to publish results on July 28. "We respected the IMO Board's original request that all AI labs share their results only after the official results had been verified by independent experts and the students had rightly received the acclamation they deserved," Google DeepMind CEO Demis Hassabis said on X on Monday.
OpenAI, which published its results on Saturday and first claimed gold-medal status, said in an interview that it had permission from an IMO board member to do so after the closing ceremony on Saturday. The competition on Monday allowed cooperating companies to publish results, Gregor Dolinar, president of IMO's board, told Reuters.
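Reuters describes OpenAI's approach as scaling up "test-time compute": letting the model think longer and running many lines of reasoning in parallel. One generic, much simpler version of that idea is self-consistency voting, sketched below in Python; sample_answer is a mock introduced for this example, and nothing here reflects OpenAI's actual, unpublished method.

import random
from collections import Counter

# Mock sampler: a real system would draw a full reasoning chain from the model,
# paying for it in time and hardware. More samples = more test-time compute.
def sample_answer(problem):
    return random.choices(["answer A", "answer B", "answer C"], weights=[5, 3, 2])[0]

def solve_with_budget(problem, n_samples):
    """Majority vote over independently sampled attempts (self-consistency)."""
    votes = Counter(sample_answer(problem) for _ in range(n_samples))
    answer, count = votes.most_common(1)[0]
    return answer, count / n_samples

for budget in (1, 8, 64):
    print(budget, solve_with_budget("some problem", budget))

As the sampling budget grows, the vote stabilizes on the most consistent answer, which is one concrete way extra compute at inference time can buy reliability.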
[10]
Google clinches milestone gold at global math competition, while OpenAI also claims win
July 21 (Reuters) - Alphabet's (GOOGL.O) Google and OpenAI said their artificial-intelligence models won gold medals at a global mathematics competition, signaling a breakthrough in math capabilities in the race to build systems that can rival human intelligence. The results marked the first time that AI systems crossed the gold-medal scoring threshold at the International Mathematical Olympiad (IMO) for high-school students. Both companies' models solved five out of six problems, achieving the result using general-purpose "reasoning" models that processed mathematical concepts using natural language, in contrast to the previous approaches used by AI firms. While Google DeepMind worked with the IMO to have their models graded and certified by the committee, OpenAI did not officially enter the competition. The startup revealed their models have achieved a gold medal-worthy score on this year's questions on Saturday, citing grades by three external IMO medalists. The achievement suggests AI is less than a year away from being used by mathematicians to crack unsolved research problems at the frontier of the field, according to Junehyuk Jung, a math professor at Brown University and visiting researcher in Google's DeepMind AI unit. "I think the moment we can solve hard reasoning problems in natural language will enable the potential for collaboration between AI and mathematicians," Jung told Reuters. OpenAI's breakthrough was achieved with a new experimental model centered on massively scaling up "test-time compute." This was done by both allowing the model to "think" for longer periods and deploying parallel computing power to run numerous lines of reasoning simultaneously, according to Noam Brown, researcher at OpenAI. Brown declined to say how much in computing power it cost OpenAI, but called it "very expensive." To OpenAI researchers, it is another clear sign that AI models can command extensive reasoning capabilities that could expand into other areas beyond math. The optimism is shared by Google researchers, who believe AI models' capabilities can apply to research quandaries in other fields such as physics, said Jung, who won an IMO gold medal as a student in 2003. Of the 630 students participating in the 66th IMO on the Sunshine Coast in Queensland, Australia, 67 contestants, or about 11%, achieved gold-medal scores. Google's DeepMind AI unit last year achieved a silver medal score using AI systems specialized for math. This year, Google used a general-purpose model called Gemini Deep Think, a version of which was previously unveiled at its annual developer conference in May. Unlike previous AI attempts that relied on formal languages and lengthy computation, Google's approach this year operated entirely in natural language and solved the problems within the official 4.5-hour time limit, the company said in a blog post. OpenAI, which has its own set of reasoning models, similarly built an experimental version for the competition, according to a post by researcher Alexander Wei on social media platform X. He noted that the company does not plan to release anything with this level of math capability for several months. This year marked the first time the competition coordinated officially with some AI developers, who have for years used prominent math competitions like IMO to test model capabilities. IMO judges certified the results of those companies, including Google, and asked them to publish results on July 28.
"We respected the IMO Board's original request that all AI labs share their results only after the official results had been verified by independent experts and the students had rightly received the acclamation they deserved," Google DeepMind CEO Demis Hassabis said on X on Monday. OpenAI, which published its results on Saturday and first claimed gold-medal status, said in an interview that it had permission from an IMO board member to do so after the closing ceremony on Saturday. The competition on Monday allowed cooperating companies to publish results, Gregor Dolinar, president of IMO's board, told Reuters. Reporting by Kenrick Cai and Anna Tong in San Francisco and Jaspreet Singh in Bengaluru, and Krystal Hu in New York; Editing by Matthew Lewis and Stephen Coates Our Standards: The Thomson Reuters Trust Principles., opens new tab * Suggested Topics: * Disrupted Kenrick Cai Thomson Reuters Kenrick Cai is a correspondent for Reuters based in San Francisco. He covers Google, its parent company Alphabet and artificial intelligence. Cai joined Reuters in 2024. He previously worked at Forbes magazine, where he was a staff writer covering venture capital and startups. He received a Best in Business award from the Society for Advancing Business Editing and Writing in 2023. He is a graduate of Duke University.
[11]
OpenAI's experimental model achieved gold at the International Math Olympiad
OpenAI has achieved "gold medal-level performance" at the International Math Olympiad, notching another important milestone for AI's fast-paced growth. Alexander Wei, a research scientist at OpenAI working on LLMs and reasoning, posted on X that an experimental research model delivered on this "longstanding grand challenge in AI." According to Wei, an unreleased model from OpenAI was able to solve five out of six problems at one of the world's longest-standing and prestigious math competitions, earning 35 out of 42 points total. The International Math Olympiad (IMO) sees countries send up to six students to solve extremely difficult algebra and pre-calculus problems. These exercises are seemingly simple but usually require some creativity to score the highest marks on each problem. For this year's competition, only 67 of the 630 total contestants received gold medals, or roughly 10 percent. AI is often tasked with tackling complex datasets and repetitive actions, but it usually falls short when it comes to solving problems that require more creativity or complex decision-making. However, with the latest IMO competition, OpenAI says its model was able to handle complicated math problems with human-like reasoning. "By doing so, we've obtained a model that can craft intricate, watertight arguments at the level of human mathematicians," Wei wrote on X. Wei and Sam Altman, CEO of OpenAI, both added that the company doesn't expect to release anything with this level of math capability for several months. That means the upcoming GPT-5 will likely be an improvement from its predecessor, but it won't feature that same impressive capability to compete in the IMO.
[12]
Google A.I. System Wins Gold In International Math Olympiad
An artificial intelligence system built by Google DeepMind, the tech giant's primary artificial intelligence lab, has achieved "gold medal" status in the annual International Mathematical Olympiad, a premier math competition for high school students. It was the first time a machine -- which solved five of the six problems at the 2025 competition, held in Australia this month -- reached that level of success, Google said in a blog post on Monday. The news is another sign that leading companies are continuing to improve their A.I. systems in areas like math, science and computer coding. This kind of technology could accelerate the research of mathematicians and scientists and streamline the work of experienced computer programmers. Two days before Google revealed its feat, an OpenAI researcher said in a social media post that the start-up had built technology that achieved a similar score on this year's questions, though it did not officially enter the competition. Both systems were chatbots that received and responded to the questions much like humans. Other A.I. systems have participated in the International Mathematical Olympiad, or I.M.O., but they could answer questions only after human experts translated them into a computer programming language built for solving math problems. "We solved these problems fully in natural language," Thang Luong, a senior staff research scientist at Google DeepMind, said in an interview. "That means there was no human intervention -- at all." After OpenAI started the A.I. boom with the release of ChatGPT in late 2022, the leading chatbots could answer questions, write poetry, summarize news articles, even write a little computer code. But they often struggled with math. Over the past two years, companies like Google and OpenAI have built A.I. systems better suited to mathematics, including complex problems that the average person cannot solve. Last year, Google DeepMind unveiled two systems that were designed for math, AlphaGeometry and AlphaProof. Competing in the I.M.O., these systems achieved "silver medal" performance, solving four of the competition's six problems. It was the first time a machine reached silver medal status. Other companies, including a start-up called Harmonic, have built similar systems. But systems like AlphaProof and Harmonic are not chatbots. They can answer questions only after mathematicians translate the questions into Lean, a computer programming language designed for solving math problems. This year, Google entered the I.M.O. with a chatbot that could read and respond to questions in English. This system is not yet available to the public. Called Gemini Deep Think, the technology is what scientists call a "reasoning" system. This kind of system is designed to reason through tasks involving math, science and computer programming. Unlike previous chatbots, this technology can spend time thinking through complex problems before settling on an answer. Other companies, like OpenAI, Anthropic and China's DeepSeek, offer similar technologies. Like other chatbots, a reasoning system initially learns its skills by analyzing enormous amounts of text culled from across the internet. Then it learns additional behavior through extensive trial and error in a process called reinforcement learning. A reasoning system can be expensive, because it spends additional time thinking about a response. Google said Deep Think had spent the same amount of time with the I.M.O. as human participants did: four and a half hours. 
But the company declined to say how much money, processing power or electricity had been used to complete the test. In December, an OpenAI system surpassed human performance on a closely watched reasoning test called ARC-AGI. But the company ran afoul of competition rules because it spent nearly $1.5 million in electricity and computing costs to complete the test, according to pricing estimates.
[13]
Google and OpenAI Chatbots Claim Gold at International Math Olympiad
Artificial intelligence models developed by Google's DeepMind team and OpenAI have a new accolade they can add to their list of achievements: they have defeated some high schoolers in math. Both companies have claimed to achieve a gold medal at this year's International Mathematical Olympiad (IMO), one of the toughest competitions for high school students looking to prove their mathematical prowess. The Olympiad invites top students from across the world to participate in an exam that requires them to solve a number of complex, multi-step math problems. The students take two four-and-a-half-hour exams across two days, tasked with solving a total of six questions, with point values assigned for completing different parts of the problems. Models from DeepMind and OpenAI both solved five out of the six problems perfectly, scoring a total of 35 out of 42 possible points, which was enough for gold. A total of 67 human participants of the 630 taking part also took home the honor of gold. There's one little tidbit that doesn't really have anything to do with the results, just the behavior of the companies. DeepMind was invited to participate in the IMO and announced its gold on Monday in a blog post, following the organization's release of the official results for student participants. According to Implicator.ai, OpenAI didn't actually enter the IMO. Instead, it took the problems, which are made public so others can take a crack at solving them, and tackled them on its own. OpenAI announced it had a gold-level performance, which can't actually be verified by the IMO because it didn't participate. Also, the company announced its score over the weekend instead of waiting for Monday (when the official scores are posted) against the wishes of the IMO, which asked for companies not to steal the spotlight from students. The models used to solve these problems participated in the exam the same way the students did. They were given 4.5 hours for each exam and were not allowed to use any external tools or access the internet. Notably, it seems both companies used general-purpose AI rather than specialized models, which previously fared much better than the do-it-all models. A noteworthy fact about these companies' claims to the top spot: Neither model that achieved gold (or, you know, a self-administered gold) is publicly available. In fact, public models did a pretty terrible job at the task. Researchers ran the questions through Gemini 2.5 Pro, Grok-4, and OpenAI o4, and none of them were able to score higher than 13 points, which is short of the 19 needed to take home a bronze medal. There is still plenty of skepticism about the results, and the fact that publicly available models did so poorly suggests there's a gap between the tools that we have access to and what a more finely-tuned model can do, which rightfully should result in questions as to why those smarter models can't be scaled or made widely available. But there are still two important takeaways here: Lab models are getting better at reasoning problems, and OpenAI is run by a bunch of lames who couldn't wait to steal glory from some teenagers.
[14]
OpenAI, Google models beat high school math elites in global contest
Artificial intelligence models developed by Google's DeepMind team and OpenAI soaked themselves in a new form of glory recently. They have managed to beat some of the brightest high-school minds at mathematics, achieving a gold medal at the International Mathematical Olympiad 2025. The IMO 2025 is regarded as one of the toughest competitions for high schoolers worldwide to prove their mettle with numbers and equations. The gold medal scoring by the AI models marks a significant breakthrough for AI technology and shows how the technology is redefining its limits daily.
[15]
Google continues to tease Gemini Deep Think with mathematics win
The Gemini Deep Think mode announced at I/O 2025 is not here yet, but Google today is highlighting how it achieved a gold-medal level performance in a math, or maths, competition. The International Mathematical Olympiad ("IMO") is the world's most prestigious competition for young mathematicians, and has been held annually since 1959. Each country taking part is represented by six elite, pre-university mathematicians who compete to solve six exceptionally difficult problems in algebra, combinatorics, geometry, and number theory. Medals are awarded to the top half of contestants, with approximately 8% receiving a prestigious gold medal. IMO 2025 was held last week, and an "advanced version of Gemini Deep Think solved five out of the six IMO problems perfectly, earning 35 total points, and achieving gold-medal level performance." Google shared the solutions here (PDF). To make the most of the reasoning capabilities of Deep Think, we additionally trained this version of Gemini on novel reinforcement learning techniques that can leverage more multi-step reasoning, problem-solving and theorem-proving data. We also provided Gemini with access to a curated corpus of high-quality solutions to mathematics problems, and added some general hints and tips on how to approach IMO problems to its instructions. Back in May, Google explicitly said Gemini 2.5 Pro was the underlying model. Today's blog post just says "advanced version" or "advanced Gemini." This competition is a good challenge for an AI system's advanced mathematical problem-solving and reasoning capabilities. In 2024, Google DeepMind scored a silver ("solving four out of the six problems and scoring 28 points") using AlphaGeometry and AlphaProof with 2-3 days of computation. However, the problems had to first be translated from natural language into domain-specific languages. This year, Gemini "operated end-to-end in natural language, producing rigorous mathematical proofs directly from the official problem descriptions - all within the 4.5-hour competition time limit." Deep Think is an "enhanced reasoning mode" that uses the "latest research techniques," like parallel thinking. This setup enables the model to simultaneously explore and combine multiple possible solutions before giving a final answer, rather than pursuing a single, linear chain of thought. Google says it will make "a version of this Deep Think model available to a set of trusted testers, including mathematicians." It will come to Google AI Ultra after that, but it's unclear when it will actually launch to subscribers of the $250 per month tier.
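Google's blog post (quoted above) credits reinforcement learning on multi-step reasoning and theorem-proving data. The Python snippet below shows only the textbook trial-and-error core of RL on a toy bandit problem, where the agent sees nothing but a success signal; it is a generic illustration and is not meant to resemble the training DeepMind describes.

import random

# Toy bandit: the agent never sees the hidden success probabilities, only rewards.
true_success_prob = [0.1, 0.3, 0.8, 0.2, 0.5]   # hidden from the agent
value_estimate = [0.0] * len(true_success_prob)
counts = [0] * len(true_success_prob)
epsilon = 0.1  # fraction of attempts spent exploring at random

for step in range(10_000):
    if random.random() < epsilon:
        action = random.randrange(len(true_success_prob))                            # explore
    else:
        action = max(range(len(value_estimate)), key=lambda a: value_estimate[a])    # exploit
    reward = 1.0 if random.random() < true_success_prob[action] else 0.0
    counts[action] += 1
    # Incremental average: each estimate converges toward its true success rate.
    value_estimate[action] += (reward - value_estimate[action]) / counts[action]

print([round(v, 2) for v in value_estimate])  # the best action (index 2) stands out

The point is the shape of the learning signal: the agent is told only what success looks like (the reward) and improves purely by trial and error, which is the same idea, at vastly smaller scale, as the RL described in the article.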
[16]
OpenAI and Google DeepMind race for math gold
The intrigue: Google DeepMind's model results were officially certified by the IMO, but OpenAI released their own results first, highlighting the speed and urgency in the race to build the best model for math and reasoning. The big picture: OpenAI didn't enter the competition, but evaluated its model on the 2025 IMO problems, after seeing the model's performance on related tasks. * Researcher Alexander Wei shared OpenAI's results on X on July 19. * The model abided by the same rules as human contestants, including two 4.5-hour exam sessions using no internet or other tools. * Google announced today that an advanced version of Gemini Deep Think solved five out of the six IMO problems perfectly, earning 35 total points. * That's the same score OpenAI announced. Stunning stat: Only 67 of the 630 contestants (roughly 10%) received gold medals this year. * The IMO is an elite math competition for high school students, drawing participants from over 100 countries. It was held in Australia this year. Between the lines: The results from both companies show how far general-purpose models have progressed in solving advanced math problems and delivering the answers in natural language proofs. * Models that previously beat humans in Go, Poker and other competitions were trained specifically for those games. * The new high-performing models are general purpose models, the same ones they train for language, coding and science. * The results show that these models can perform better than those that have been hand-tuned for specific tasks. Why it matters: AI models are unusually difficult to benchmark because of the speed at which the tech is moving and the lack of a standard benchmarking system. Zoom out: Both models are experimental and won't be released to the public "for a while," OpenAI says. "Many months," according to a post on X from OpenAI CEO Sam Altman. * The model that competed in the IMO is "actually very close to the main Gemini model that we have been offering to people," Google DeepMind senior staff research scientist Thang Luong told Axios. * "So we are very confident that we can bring [the model] into the hands of our trusted testers very soon, especially the mathematicians," Luong says. * "We hope that this will empower mathematicians so they can crack harder and harder problems." What they're saying: "When we first started OpenAI, this was a dream but not one that felt very realistic to us," OpenAI CEO Sam Altman said in a post on X. * "It is a significant marker of how far AI has come over the past decade." * "Our leap from silver to gold medal-standard in just one year shows a remarkable pace of progress in AI," Google wrote on its blog. Yes, but: Both Google and OpenAI praised the high school students participating in the Olympiad and were careful not to frame the competition as a bots vs. humans cage match. * The purpose of the IMO is to promote the "beauty of mathematics" to high school students and to encourage them to go into the field, Junehyuk Jung, associate professor at Brown University and visiting researcher at Google DeepMind, told Axios. * Jung was a participant in the IMO 22 years ago. Google waited for the IMO to officially certify the competition results rather than release its results over the weekend out of respect for the students in the competition, Luong said.
[17]
Google DeepMind makes AI history with gold medal win at world's toughest math competition
Google DeepMind announced Monday that an advanced version of its Gemini artificial intelligence model has officially achieved gold medal-level performance at the International Mathematical Olympiad, solving five of six exceptionally difficult problems and earning recognition as the first AI system to receive official gold-level grading from competition organizers. The victory advances the field of AI reasoning and puts Google ahead in the intensifying battle between tech giants building next-generation artificial intelligence. More importantly, it demonstrates that AI can now tackle complex mathematical problems using natural language understanding rather than requiring specialized programming languages. "Official results are in -- Gemini achieved gold-medal level in the International Mathematical Olympiad!" Demis Hassabis, CEO of Google DeepMind, wrote on social media platform X Monday morning. "An advanced version was able to solve 5 out of 6 problems. Incredible progress." The International Mathematical Olympiad, held annually since 1959, is widely considered the world's most prestigious mathematics competition for pre-university students. Each participating country sends six elite young mathematicians to compete in solving six exceptionally challenging problems spanning algebra, combinatorics, geometry, and number theory. Only about 8% of human participants typically earn gold medals. How Google DeepMind's Gemini Deep Think cracked math's toughest problems Google's latest success far exceeds its 2024 performance, when the company's combined AlphaProof and AlphaGeometry systems earned silver medal status by solving four of six problems. That earlier system required human experts to first translate natural language problems into domain-specific programming languages and then interpret the AI's mathematical output. This year's breakthrough came through Gemini Deep Think, an enhanced reasoning system that employs what researchers call "parallel thinking." Unlike traditional AI models that follow a single chain of reasoning, Deep Think simultaneously explores multiple possible solutions before arriving at a final answer. "Our model operated end-to-end in natural language, producing rigorous mathematical proofs directly from the official problem descriptions," Hassabis explained in a follow-up post on the social media site X, emphasizing that the system completed its work within the competition's standard 4.5-hour time limit. The model achieved 35 out of a possible 42 points, comfortably exceeding the gold medal threshold. According to IMO President Prof. Dr. Gregor Dolinar, the solutions were "astonishing in many respects" and found to be "clear, precise and most of them easy to follow" by competition graders. OpenAI faces backlash for bypassing official competition rules The announcement comes amid growing tension in the AI industry over competitive practices and transparency. Google DeepMind's measured approach to releasing its results has drawn praise from the AI community, particularly in contrast to rival OpenAI's handling of similar achievements.
"We didn't announce on Friday because we respected the IMO Board's original request that all AI labs share their results only after the official results had been verified by independent experts & the students had rightly received the acclamation they deserved," Hassabis wrote, appearing to reference OpenAI's earlier announcement of its own olympiad performance. Social media users were quick to note the distinction. "You see? OpenAI ignored the IMO request. Shame. No class. Straight up disrespect," wrote one user. "Google DeepMind acted with integrity, aligned with humanity." The criticism stems from OpenAI's decision to announce its own mathematical olympiad results without participating in the official IMO evaluation process. Instead, OpenAI had a panel of former IMO participants grade its AI's performance, a approach that some in the community view as lacking credibility. "OpenAI is quite possibly the worst company on the planet right now," wrote one critic, while others suggested the company needs to "take things seriously" and "be more credible." Inside the training methods that powered Gemini's mathematical mastery Google DeepMind's success appears to stem from novel training techniques that go beyond traditional approaches. The team used advanced reinforcement learning methods designed to leverage multi-step reasoning, problem-solving, and theorem-proving data. The model was also provided access to a curated collection of high-quality mathematical solutions and received specific guidance on approaching IMO-style problems. The technical achievement impressed AI researchers who noted its broader implications. "Not just solving math... but understanding language-described problems and applying abstract logic to novel cases," wrote AI observer Elyss Wren. "This isn't rote memory -- this is emergent cognition in motion." Ethan Mollick, a professor at the Wharton School who studies AI, emphasized the significance of using a general-purpose model rather than specialized tools. "Increasing evidence of the ability of LLMs to generalize to novel problem solving," he wrote, highlighting how this differs from previous approaches that required specialized mathematical software. The model demonstrated particularly impressive reasoning in one problem where many human competitors applied graduate-level mathematical concepts. According to DeepMind researcher Junehyuk Jung, Gemini "made a brilliant observation and used only elementary number theory to create a self-contained proof," finding a more elegant solution than many human participants. What Google DeepMind's victory means for the $200 billion AI race The breakthrough comes at a critical moment in the AI industry, where companies are racing to demonstrate superior reasoning capabilities. The success has immediate practical implications: Google plans to make a version of this Deep Think model available to mathematicians for testing before rolling it out to Google AI Ultra subscribers, who pay $250 monthly for access to the company's most advanced AI models. The timing also highlights the intensifying competition between major AI laboratories. While Google celebrated its methodical, officially-verified approach, the controversy surrounding OpenAI's announcement reflects broader tensions about transparency and credibility in AI development. This competitive dynamic extends beyond just mathematical reasoning. Recent weeks have seen various AI companies announce breakthrough capabilities, though not all have been received positively. 
Elon Musk's xAI recently launched Grok 4, which the company claimed was the "smartest AI in the world," though leaderboard scores showed it trailing behind models from Google and OpenAI. Additionally, Grok has faced criticism for controversial features including sexualized AI companions and episodes of generating antisemitic content. The dawn of AI that thinks like humans -- with real-world consequences The mathematical olympiad victory goes beyond competitive bragging rights. Gemini's performance demonstrates that AI systems can now match human-level reasoning in complex tasks requiring creativity, abstract thinking, and the ability to synthesize insights across multiple domains. "This is a significant advance over last year's breakthrough result," the DeepMind team noted in their technical announcement. The progression from requiring specialized formal languages to operating entirely in natural language suggests that AI systems are becoming more intuitive and accessible. For businesses, this development signals that AI may soon tackle complex analytical problems across various industries without requiring specialized programming or domain expertise. The ability to reason through intricate challenges using everyday language could democratize sophisticated analytical capabilities across organizations. However, questions persist about whether these reasoning capabilities will translate effectively to messier real-world challenges. The mathematical olympiad provides well-defined problems with clear success criteria -- a far cry from the ambiguous, multifaceted decisions that define most business and scientific endeavors. Google DeepMind plans to return to next year's competition "in search of a perfect score." The company believes AI systems combining natural language fluency with rigorous reasoning "will become invaluable tools for mathematicians, scientists, engineers, and researchers, helping us advance human knowledge on the path to AGI." But perhaps the most telling detail emerged from the competition itself: when faced with the contest's most difficult problem, Gemini started from an incorrect hypothesis and never recovered. Only five human students solved that problem correctly. In the end, it seems, even gold medal-winning AI still has something to learn from teenage mathematicians.
[18]
OpenAI claims gold medal performance at prestigious math competition, drama ensues
OpenAI announced its unreleased reasoning model won the gold at the International Mathematical Olympiad (IMO), igniting fierce drama in the world of competitive math. While most high schoolers blissfully enjoy a break from school and homework, top math students from around the world brought their A-game to the IMO, considered the most prestigious math competition. AI labs also competed with their LLMs, and an unreleased model from OpenAI achieved a high-enough score to earn a gold medal, according to researcher Alexander Wei who shared the news on X. The OpenAI model got five out of the six problems correct, earning a gold medal-worthy score of 35 out of 42 points. "For each problem, three former IMO medalists independently graded the model's submitted proof, with scores finalized after unanimous consensus," according to Wei. The problems are algebra and pre-calculus challenges that require creative thinking on the competitor's part. So for LLMs to be able to reason their way through long, complex proofs is an impressive achievement. However, the timing of the announcement is being criticized for overshadowing the human competitors' results. The IMO reportedly asked the AI labs officially working with the organization on verifying the results to wait a week before making any announcements, to avoid stealing the kids' thunder. That's according to an X post from Mikhail Samin, who runs the AI Governance and Safety Institute nonprofit. OpenAI said it didn't formally cooperate with the IMO to verify its results and instead worked with individual mathematicians to independently verify its scores, and so it wasn't beholden to any kind of agreement. Mashable sent a direct message to Samin on X for comment. But the gossip is that this rubbed organizers the wrong way; they thought it was "rude" and "inappropriate" for OpenAI to do this. This is all hearsay, based on rumors from Samin, who also posted a screenshot of a similar comment from someone named Joseph Myers, presumably the two-time IMO gold medalist. Mashable contacted Myers for comment, but he has not publicly confirmed the authenticity of the screenshot. In response, OpenAI researcher Noam Brown said they posted the results after the IMO closing ceremony, honoring an IMO organizer's request. Brown also said OpenAI wasn't in touch with IMO, suggesting they didn't make any agreements about announcing the results later. Meanwhile, Google DeepMind reportedly did cooperate with the IMO, and announced this afternoon that an "advanced version of Gemini with Deep Think officially achieve[d] gold-medal standard at the International Mathematical Olympiad." According to the announcement, DeepMind's model was "officially graded and certified by IMO coordinators using the same criteria as for student solutions." Read into that statement as much or as little as you want, but the timing is hardly coincidental. Others may follow the Real Housewives, but the proper decorum of elite math competitions is the high drama we live for.
[19]
OpenAI and Google win at the world's most prestigious math competition
Artificial intelligence (AI) models were put to the test this weekend to find out who was the best so-called mathlete at the world's most prestigious competition in Australia. Google's DeepMind and OpenAI, which makes ChatGPT, say they both achieved a gold medal-level performance at this year's International Mathematical Olympiad (IMO), though only Google had actually entered the competition. The IMO confirmed DeepMind's results, whereas OpenAI evaluated its model on the 2025 IMO problems and self-published its results before official verification. Alex Wei, a research scientist at OpenAI working on large language models (LLMs) and reasoning, announced the results on his X account. An advanced version of DeepMind's Gemini Deep Think solved five out of the six IMO problems perfectly, earning 35 total points and achieving gold-medal level performance. OpenAI's model also solved five out of the six IMO problems and had the same score. Both models show how far AI has come since the technology took off with the launch of ChatGPT in November 2022. The math test in itself is very hard and only about 10 per cent of the 630 competitors received a gold medal this year. Participants from more than 100 countries entered the competition, which is aimed at elite high-school students. Those under the age of 20 can apply. "When we first started OpenAI, this was a dream but not one that felt very realistic to us; it is a significant marker of how far AI has come over the past decade," OpenAI CEO Sam Altman wrote on X in reference to the math competition. He added that the company will "soon" release a new version, GPT-5, but that it doesn't plan "to release a model with IMO gold level of capability for many months". Meanwhile, Google wrote in a blog post: "It is a significant marker of how far AI has come over the past decade". The company participated in the competition last year and won a silver medal. "Our leap from silver to gold medal-standard in just one year shows a remarkable pace of progress in AI," Google said. However, both companies celebrated the human participants and avoided framing the competition as a man versus machine challenge. Wei called them "some of the brightest young minds of the future" and said that OpenAI employs some former IMO competitors.
[20]
Humans triumph over AI at annual math Olympiad, but the machines are catching up
Sydney -- Humans beat generative AI models made by Google and OpenAI at a top international mathematics competition, but the programs reached gold-level scores for the first time, and the rate at which they are improving may be cause for some human introspection. Neither of the AI models scored full marks -- unlike five young people at the International Mathematical Olympiad (IMO), a prestigious annual competition where participants must be under 20 years old. Google said Monday that an advanced version of its Gemini chatbot had solved five out of the six math problems set at the IMO, held in Australia's Queensland this month. "We can confirm that Google DeepMind has reached the much-desired milestone, earning 35 out of a possible 42 points - a gold medal score," the U.S. tech giant cited IMO president Gregor Dolinar as saying. "Their solutions were astonishing in many respects. IMO graders found them to be clear, precise and most of them easy to follow." Around 10% of human contestants won gold-level medals, and five received perfect scores of 42 points. U.S. ChatGPT maker OpenAI said its experimental reasoning model had also scored a gold-level 35 points on the test. The result "achieved a longstanding grand challenge in AI" at "the world's most prestigious math competition," OpenAI researcher Alexander Wei said in a social media post. "We evaluated our models on the 2025 IMO problems under the same rules as human contestants," he said. "For each problem, three former IMO medalists independently graded the model's submitted proof." Google achieved a silver-medal score at last year's IMO in the city of Bath, in southwest England, solving four of the six problems. That took two to three days of computation -- far longer than this year, when its Gemini model solved the problems within the 4.5-hour time limit, it said. The IMO said tech companies had "privately tested closed-source AI models on this year's problems," the same ones faced by 641 competing students from 112 countries. "It is very exciting to see progress in the mathematical capabilities of AI models," said IMO president Dolinar. Contest organizers could not verify how much computing power had been used by the AI models or whether there had been human involvement, he noted. In an interview with CBS' 60 Minutes earlier this year, one of Google's leading AI researchers predicted that within just five to 10 years, computers would be made that have human-level cognitive abilities -- a landmark known as "artificial general intelligence." Google DeepMind CEO Demis Hassabis predicted that AI technology was on track to understand the world in nuanced ways, and to not only solve important problems, but even to develop a sense of imagination, within a decade, thanks to an increase in investment. "It's moving incredibly fast," Hassabis said. "I think we are on some kind of exponential curve of improvement. Of course, the success of the field in the last few years has attracted even more attention, more resources, more talent. So that's adding to the, to this exponential progress."
[21]
OpenAI's Reasoning Model Wins Gold at 2025 IMO, GPT-5 Coming Soon | AIM
An experimental large language model (LLM) developed by OpenAI has achieved gold medal-level performance at the 2025 International Math Olympiad (IMO), a milestone in AI reasoning capabilities. Announcing the result on X, OpenAI researcher Alexander Wei said, "Our latest experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world's most prestigious math competition -- the International Math Olympiad." The model was evaluated under the same conditions as human contestants, including two 4.5-hour sessions, no access to tools or internet, and writing detailed proofs based on official IMO problems. The AI successfully solved 5 out of 6 problems, earning 35 out of 42 possible points. Three former IMO medalists graded each solution independently, with final scores based on unanimous agreement. IMO problems are widely regarded as some of the most difficult in competitive mathematics, requiring extended periods of creative reasoning. Wei contextualised the achievement by noting the progression of reasoning benchmarks: "We've now progressed from GSM8K (~0.1 min for top humans) → MATH benchmark (~1 min) → AIME (~10 mins) → IMO (~100 mins)." He added that IMO problems "demand a new level of sustained creative thinking" and that the model's performance demonstrates progress in "general-purpose reinforcement learning and test-time compute scaling." The model is not being released to the public in the near term. "The IMO gold LLM is an experimental research model. We don't plan to release anything with this level of math capability for several months," Wei clarified. While OpenAI plans to release GPT-5 soon, the IMO-capable system is part of a separate research track. "We are releasing GPT-5 soon, and we're excited for you to try it," said Wei. Meanwhile, Yuchen Jin, co-founder of Hyperbolic Labs, also suggested on X that the launch of GPT-5 may be imminent. According to Jin, GPT-5 will not be a single model but a system of multiple specialised models, with a router that dynamically switches between models optimised for reasoning, non-reasoning, and tool use. He added that this architecture is likely why OpenAI CEO Sam Altman previously spoke about "fixing model naming," as users would no longer need to select a specific model, with prompts automatically routed to the most suitable one. Jin also noted that GPT-6 is already in training. "I just hope they're not delaying it for more safety tests," he wrote. Wei also acknowledged the broader implications. "This underscores how fast AI has advanced in recent years. In 2021, my PhD advisor, Jacob Steinhardt, had me forecast AI math progress by July 2025. I predicted 30% on the MATH benchmark... Instead, we have IMO gold." Wei credited Sheryl Hsu, Noam Brown, and others for their role in the research. Last year, Google DeepMind's AlphaProof and AlphaGeometry 2 solved four out of six problems from the 2024 International Mathematical Olympiad (IMO), achieving a score equivalent to a silver medalist in the competition.
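The grading procedure Wei describes (each proof marked independently by three former IMO medalists, with a score finalized only on unanimous consensus) is simple enough to express directly. The snippet below is a toy sketch of that rule for illustration; the grader functions are hypothetical and nothing here reflects OpenAI's actual tooling.

```python
from typing import Callable, List, Optional

Grader = Callable[[str], int]  # maps a submitted proof to a 0-7 point mark

def consensus_score(proof: str, graders: List[Grader],
                    max_rounds: int = 3) -> Optional[int]:
    """Finalize a mark only when every grader independently agrees.

    In the described protocol the graders would discuss and re-mark until
    they converge; this toy version simply retries a fixed number of times
    and records no score if unanimity is never reached.
    """
    for _ in range(max_rounds):
        marks = {grade(proof) for grade in graders}
        if len(marks) == 1:  # unanimous agreement
            return marks.pop()
    return None

# Hypothetical graders for illustration only.
graders = [lambda p: 7, lambda p: 7, lambda p: 6]
print(consensus_score("model proof for Problem 3", graders))  # -> None, no consensus
```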
[22]
New Google AI system wins gold medal in prestigious math competition - SiliconANGLE
An artificial intelligence model developed by Alphabet Inc.'s Google DeepMind unit has won a gold medal in the International Mathematical Olympiad. The company announced the achievement today, two days after OpenAI disclosed that it has reached the same milestone. However, the ChatGPT developer reportedly earned its gold medal in a different manner. Whereas Google's AI model underwent the same evaluation as human test takers, OpenAI's submission was graded by a group of former contest participants. The International Mathematical Olympiad, or IMO, is a prestigious math competition for high school students. Participating countries each send six contestants, who must solve six questions across two 4.5-hour sessions. This year, 67 of the 630 participants won a gold medal. IMO questions are relatively narrow in scope. They primarily focus on four branches of mathematics taught in high school: algebra, combinatorics, geometry and number theory. Nevertheless, the proofs necessary to solve the problems are highly complicated. Each proof comprises multiple pages of dense mathematical formulae accompanied by natural language explanations. Google used an algorithm based on its Gemini series of large language models to win the gold medal. The LLM is equipped with Deep Think, a feature that the search giant debuted in May. The capability allows an AI to generate multiple potential answers to a prompt instead of the usual one and then combine them. Google honed the AI's math capabilities using reinforcement learning, a common approach to training reasoning models. In a reinforcement learning project, researchers give an LLM sample questions and provide feedback on the quality of each response. The LLM then analyzes the feedback to find ways of improving its capabilities. Google taught Gemini using a "curated corpus of high-quality solutions to mathematics problems." The company's researchers didn't simply include a set of math problems and their solutions, but also provided information on the intermediate steps necessary to reach each given solution. Additionally, they added in "general hints and tips" on how to tackle IMO questions. Google's model answered five of the six questions in this year's contest correctly. The sixth problem, the most complicated in the set, required calculating the number of tiles needed to cover a two-dimensional space. It was solved correctly by 5 of the 630 students who participated. In last year's IMO contest, two Google-developed AI models jointly earned a silver medal by solving four of the six problems. One of the algorithms was optimized to answer geometry questions while the other focused on generating proofs. According to DeepMind, accuracy is only one of the metrics by which its Gemini model outperforms those algorithms. "AlphaGeometry and AlphaProof required experts to first translate problems from natural language into domain-specific languages, such as Lean, and vice-versa for the proofs. It also took two to three days of computation," DeepMind researchers detailed in a blog post. "Our advanced Gemini model operated end-to-end in natural language, producing rigorous mathematical proofs directly from the official problem descriptions - all within the 4.5-hour competition time limit." Google plans to roll out the AI model in phases. It will first test the algorithm with a group of mathematicians before making it available in Google AI Ultra, a $250-per-month subscription announced earlier this year.
The plan increases the usage limits of the company's Gemini AI assistant and provides access to a number of other AI services.
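SiliconANGLE's description of the reinforcement learning setup above (the model attempts sample questions, receives feedback on the quality of each response, and adjusts based on that feedback) is the standard reward-driven training loop. The toy loop below only illustrates that shape: it reinforces a trivial "policy" over three made-up answer strategies, with a purely illustrative reward function, and has nothing to do with DeepMind's actual training code.

```python
import random

# Toy "policy": a preference weight per answer strategy. A real RL run would
# update billions of model parameters; here we just nudge three numbers.
strategies = {"direct_proof": 1.0, "induction": 1.0, "contradiction": 1.0}

def sample_strategy() -> str:
    # Sample a strategy with probability proportional to its current weight.
    names, weights = zip(*strategies.items())
    return random.choices(names, weights=weights, k=1)[0]

def reward(strategy: str) -> float:
    # Illustrative feedback signal: pretend induction tends to score well.
    return 1.0 if strategy == "induction" else 0.1

LEARNING_RATE = 0.5
for step in range(200):
    choice = sample_strategy()                      # model attempts a practice problem
    feedback = reward(choice)                       # graded feedback on the attempt
    strategies[choice] += LEARNING_RATE * feedback  # reinforce what scored well

print(max(strategies, key=strategies.get))          # almost always "induction"
```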
[23]
Google and OpenAI Battle for Math Olympiad Glory | AIM
Google says that it is the first time an AI system has officially reached the gold-medal threshold in the IMO. An advanced version of Google DeepMind's Gemini model has achieved gold-medal performance at the 2025 International Mathematical Olympiad (IMO), solving five of the six problems and scoring 35 out of a possible 42 points. The full set of Gemini's solutions is available online. Google says that it is the first time an AI system has officially reached the gold-medal threshold in the IMO, the world's leading mathematics competition for pre-university students. The result was confirmed by IMO coordinators, who graded the model's work using the same standards applied to human participants. However, OpenAI also recently announced achieving a similar feat. Announcing the result on X, OpenAI researcher Alexander Wei said, "Our latest experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world's most prestigious math competition -- the International Math Olympiad." Google DeepMind chief Demis Hassabis addressed this disparity in a post on X. "We didn't announce on Friday because we respected the IMO Board's original request that all AI labs share their results only after the official results had been verified by independent experts & the students had rightly received the acclamation they deserved," he said. He added that they have now been authorised to share results and are pleased to have been part of the inaugural cohort to have their model results officially graded and certified by IMO coordinators and experts, receiving the first official gold-level performance grading for an AI system. "We can confirm that Google DeepMind has reached the much-desired milestone, earning 35 out of a possible 42 points -- a gold medal score," said Prof Gregor Dolinar, president of the IMO. "Their solutions were astonishing in many respects. IMO graders found them to be clear, precise, and most of them easy to follow." Meanwhile, OpenAI's model also successfully solved 5 out of 6 problems, earning 35 out of 42 possible points. Three former IMO medalists graded each solution independently, with final scores based on unanimous agreement. Gemini operated entirely in natural language, generating rigorous proofs directly from the official problem statements within the IMO's 4.5-hour time limit. This represents a significant improvement over last year's effort, when DeepMind's AlphaGeometry and AlphaProof systems required translating problems into formal languages and involved multiple days of computation. The latest result by Google was made possible by Gemini's Deep Think mode, which uses research techniques including parallel thinking. This allows the model to consider and combine multiple solution paths simultaneously before arriving at a final answer. DeepMind also applied reinforcement learning strategies to improve multi-step reasoning and trained Gemini using a curated dataset of mathematical solutions. General strategies for approaching IMO problems were also included in its instruction set. A version of the Deep Think model will be released to a limited group of testers, including professional mathematicians, before a broader rollout to Google AI Ultra subscribers. DeepMind noted that while this year's performance used natural language exclusively, work continues on formal reasoning tools like AlphaGeometry and AlphaProof.
The long-term goal is to develop AI systems that combine fluency in natural language with the reliability of formal mathematical verification. "We are still only at the start of AI's potential to contribute to mathematics," the company said. "By teaching our systems to reason more flexibly and intuitively, we are getting closer to building AI that can solve more complex and advanced mathematics."
[24]
Google's Gemini Deep Think AI wins gold at the International Mathematical Olympiad
TL;DR: Google's Gemini Deep Think AI achieved a gold-medal performance at the International Mathematical Olympiad by solving five of six advanced problems within the 4.5-hour limit. Its parallel thinking and specialized training mark a significant breakthrough in AI-driven complex mathematical problem-solving. The International Mathematical Olympiad (IMO) has been recognized as the premier mathematics competition for some of the world's brightest young minds since 1959. With participants from a wide range of countries, each competitor is tasked with solving six "exceptionally difficult" problems spanning fields such as algebra, combinatorics, geometry, and number theory. Only 8% of participants earn a gold medal, and now we can add Google's Gemini Deep Think AI to the list. This advanced version of Gemini solved five out of the six problems "perfectly" according to Google, which was enough for it to achieve a gold-medal performance. If you're wondering what the math problems and their solutions were, head here (PDF). But fair warning, it's a lot more advanced than 12 + 57. "We can confirm that Google DeepMind has reached the much-desired milestone, earning 35 out of a possible 42 points - a gold medal score," IMO President Prof. Dr. Gregor Dolinar said, adding that it was not only the solutions that were impressive, but how well laid out and easy to follow they were. "Their solutions were astonishing in many respects. IMO graders found them to be clear, precise, and most of them easy to follow." Making Gemini Deep Think AI's performance even more impressive is that the AI was able to solve the problems within the 4.5-hour competition time limit. To put that into perspective, Google DeepMind's 2024 silver-medal performance using AlphaProof and AlphaGeometry 2 systems took two to three days of computation alongside a team of experts translating the problems from natural language to something the AI could work with. Part of the impressive improvement comes from Gemini Deep Think's "parallel thinking" features, which allow it to explore multiple solutions simultaneously. In addition to this, Google also trained this custom version of Deep Think with reinforcement learning and problem-solving techniques better suited to complex mathematics. And yes, Google will roll out the Gemini Deep Think model to "trusted" mathematicians and testers ahead of its availability to Google AI Ultra subscribers. "Google DeepMind has ongoing collaborations with the mathematical community, but we are still only at the start of AI's potential to contribute to mathematics," Google writes in its announcement. "By teaching our systems to reason more flexibly and intuitively, we are getting closer to building AI that can solve more complex and advanced mathematics."
[25]
Gemini AI just won a gold medal in a math Olympiad
Google has announced that an advanced version of Gemini Deep Think achieved gold-medal level performance at the International Mathematical Olympiad (IMO) 2025 by solving five out of six problems perfectly. The International Mathematical Olympiad, established in 1959, is an annual competition for pre-university mathematicians. Participating countries send six elite students to solve six complex problems across algebra, combinatorics, geometry, and number theory. Medals are awarded to the top half of contestants, with approximately 8% earning gold medals. During the IMO 2025, which concluded last week, an advanced version of Gemini Deep Think secured 35 points, achieving a gold-medal equivalent performance. This iteration of Gemini Deep Think was specifically trained using novel reinforcement learning techniques. These techniques enable the model to leverage more multi-step reasoning, problem-solving, and theorem-proving data. Additionally, Google provided Gemini with a curated corpus of high-quality solutions to mathematics problems and integrated general hints and tips on approaching IMO problems into its instructions. In contrast to previous efforts, this advanced Gemini Deep Think operated entirely in natural language. It generated rigorous mathematical proofs directly from the official problem descriptions within the 4.5-hour competition time limit. This marks a progression from Google DeepMind's 2024 performance at the IMO, where AlphaGeometry and AlphaProof achieved a silver medal by solving four out of six problems and scoring 28 points. However, those systems required problems to be translated into domain-specific languages and utilized 2-3 days of computation. Deep Think is characterized as an "enhanced reasoning mode" that incorporates "latest research techniques," including parallel thinking. This setup allows the model to simultaneously explore and combine multiple potential solutions before delivering a final answer, rather than following a single, linear line of reasoning. Google plans to release a version of this Deep Think model to a select group of trusted testers, including mathematicians. Following this testing phase, the model will subsequently become available through Google AI Ultra.
[26]
Gemini and ChatGPT Outscore High Schoolers in International Math Olympiad
Both chatbots achieved the score without any human interaction. Gemini and ChatGPT both participated in this year's International Math Olympiad (IMO) and achieved gold medal-level scores in the competition. Google DeepMind highlighted that its artificial intelligence (AI) chatbot officially entered the competition and was able to solve five out of six questions, following the test's rules, without any human interaction. On the other hand, OpenAI's experimental research model was used for the test, and its results were independently evaluated. The San Francisco-based AI firm says scores were finalised after unanimous consensus. In separate posts on X (formerly known as Twitter), Google DeepMind CEO Demis Hassabis and OpenAI's Member of Technical Staff Alexander Wei announced that their chatbots achieved gold medal-level scores in the 2025 IMO. Both Gemini and ChatGPT solved five out of six problems and scored 35 out of 42 marks, which is considered enough for a gold medal. While the Gemini Deep Think model was used for the competition, OpenAI used an unnamed experimental research model for the Olympiad. IMO is one of the longest-running annual mathematics competitions for school students. It was first held in 1959 in Romania, and at present, students from more than 100 countries participate in the competition. The competition focuses on mathematical proofs instead of solution-based problems. This means participants have to use logic, various mathematical theorems, and knowledge of applied mathematics to provide a proof. The quality of proof is then graded by evaluators, and participants are given scores. Hassabis said Gemini was able to operate end-to-end in natural language and produce mathematical proofs directly from the problem descriptions within the 4.5-hour time limit. This enhanced Gemini Deep Think model will now be made available to select testers and mathematicians, and later rolled out to Google AI Ultra subscribers. According to a TechCrunch report, OpenAI also participated in the competition, but not officially. The company is said to have hired three former IMO medalists familiar with the grading system to act as third-party evaluators. The AI firm reportedly reached out to IMO with the scores. Wei, in his post, highlighted that the scores were announced after unanimous consensus. In a separate post, Hassabis indirectly called out OpenAI for not following all of the official rules and lengthy processes that IMO asked other AI labs to follow. He also hinted at OpenAI announcing results prematurely on Friday and said, "we didn't announce on Friday because we respected the IMO Board's original request that all AI labs share their results only after the official results had been verified by independent experts & the students had rightly received the acclamation they deserved."
[27]
'Astonishing': AI Models From Google, OpenAI Win Gold Medals in an International Math Competition
The AI models were able to solve five of the six problems presented at the competition. AI just scored a major win at an international math competition. For the first time, AI models from Google DeepMind and OpenAI achieved gold medal status at the 2025 International Math Olympiad (IMO), a challenging math contest for high school students that has been held annually since 1959. The competition involves two 4.5-hour exams to solve six total problems, without the help of the Internet or external tools. The New York Times reports that OpenAI and Google's AI models responded to questions using natural language with no human intervention. Both models were able to solve five of the six problems presented at the 2025 competition within the contest's time constraints, marking the first time AI models have achieved such a level of success. The two models tied in score, with each earning 35 points out of a possible 42 points on the IMO, exactly at the cutoff point for a gold medal. OpenAI announced the results on Saturday while Google waited until Monday. Google DeepMind worked with IMO to have its AI system's performance graded and certified by the committee this year, while OpenAI did not formally enter the competition. Instead, OpenAI asked three former IMO medalists to independently grade its AI model's answers to each question, finalizing scores after "unanimous consensus." According to the Google announcement, only 8% of the high school students who compete in IMO typically receive a gold medal. Google's gold-medal performance this year was one step above its results last year, when its AI received a silver medal, solving four out of the six problems presented in the competition. IMO's President, Dr. Gregor Dolinar, called Google DeepMind's solutions this year "astonishing in many respects," adding that IMO graders found them to be "clear, precise, and most of them easy to follow." OpenAI CEO Sam Altman said in a post on X on Saturday that while OpenAI does not plan to release an AI model with IMO gold capabilities "for many months," the gold medal was "a significant marker of how far AI has come over the past decade." OpenAI used a general-purpose reasoning system to tackle the competition, not a specialized math system, as the company works towards general intelligence. Meanwhile, Google DeepMind CEO Demis Hassabis wrote in a post on X on Monday that Google also used an advanced version of its general-purpose Gemini reasoning model, which will be available "to a set of trusted testers" before rolling it out to Google AI Ultra subscribers, who pay $250 per month for advanced capabilities and 30 TB of storage. This year, 630 high school students participated in IMO in Queensland, Australia, with 67 students achieving gold medals, per Reuters.
[28]
Google and OpenAI's AI models win milestone gold at global math competition - The Economic Times
Google and OpenAI's AI models won gold medals at the International Mathematical Olympiad, marking a breakthrough in AI math reasoning. Using general-purpose models processing natural language, they solved five of six problems. This signals that AI may soon be used in advanced research, with collaboration between AI and mathematicians on the horizon. Alphabet's Google and OpenAI said their artificial-intelligence models won gold medals at a global mathematics competition, signaling a breakthrough in math capabilities in the race to build powerful systems that can rival human intelligence. The results marked the first time that AI systems crossed the gold-medal scoring threshold at the International Mathematical Olympiad for high-school students. Both companies' models solved five out of six problems, achieving the result using general-purpose "reasoning" models that processed mathematical concepts using natural language, in contrast to the previous approaches used by AI firms. The achievement suggests AI is less than a year away from being used by mathematicians to crack unsolved research problems at the frontier of the field, according to Junehyuk Jung, a math professor at Brown University and visiting researcher in Google's DeepMind AI unit. "I think the moment we can solve hard reasoning problems in natural language will enable the potential for collaboration between AI and mathematicians," Jung told Reuters. OpenAI's breakthrough was achieved with a new experimental model centered on massively scaling up "test-time compute." This was done by both allowing the model to "think" for longer periods and deploying parallel computing power to run numerous lines of reasoning simultaneously, according to Noam Brown, researcher at OpenAI. Brown declined to say how much computing power it cost OpenAI, but called it "very expensive." To OpenAI researchers, it is another clear sign that AI models can command extensive reasoning capabilities that could expand into other areas beyond math. The optimism is shared by Google researchers, who believe AI models' capabilities can apply to research quandaries in other fields such as physics, said Jung, who won an IMO gold medal as a student in 2003. Of the 630 students participating in the 66th IMO on the Sunshine Coast in Queensland, Australia, 67 contestants, or about 11%, achieved gold-medal scores. Google's DeepMind AI unit last year achieved a silver medal score using AI systems specialized for math. This year, Google used a general-purpose model called Gemini Deep Think, a version of which was previously unveiled at its annual developer conference in May. Unlike previous AI attempts that relied on formal languages and lengthy computation, Google's approach this year operated entirely in natural language and solved the problems within the official 4.5-hour time limit, the company said in a blog post. OpenAI, which has its own set of reasoning models, similarly built an experimental version for the competition, according to a post by researcher Alexander Wei on social media platform X. He noted that the company does not plan to release anything with this level of math capability for several months. This year marked the first time the competition coordinated officially with some AI developers, who have for years used prominent math competitions like IMO to test model capabilities. IMO judges certified the results of those companies, including Google, and asked them to publish results on July 28.
"We respected the IMO Board's original request that all AI labs share their results only after the official results had been verified by independent experts and the students had rightly received the acclamation they deserved," Google DeepMind CEO Demis Hassabis said on X on Monday. OpenAI, which published its results on Saturday and first claimed gold-medal status, said in an interview that it had permission from an IMO board member to do so after the closing ceremony on Saturday. The competition on Monday allowed cooperating companies to publish results, Gregor Dolinar, president of IMO's board, told Reuters.
[29]
Humans beat AI gold-level score at top maths contest - The Economic Times
Humans beat generative AI models made by Google and OpenAI at a top international mathematics competition, despite the programmes reaching gold-level scores for the first time. Neither model scored full marks -- unlike five young people at the International Mathematical Olympiad (IMO), a prestigious annual competition where participants must be under 20 years old. Google said Monday that an advanced version of its Gemini chatbot had solved five out of the six maths problems set at the IMO, held in Australia's Queensland this month. "We can confirm that Google DeepMind has reached the much-desired milestone, earning 35 out of a possible 42 points -- a gold medal score," the US tech giant cited IMO president Gregor Dolinar as saying. "Their solutions were astonishing in many respects. IMO graders found them to be clear, precise and most of them easy to follow." Around 10 percent of human contestants won gold-level medals, and five received perfect scores of 42 points. US ChatGPT maker OpenAI said that its experimental reasoning model had scored a gold-level 35 points on the test. The result "achieved a longstanding grand challenge in AI" at "the world's most prestigious math competition", OpenAI researcher Alexander Wei wrote on social media. "We evaluated our models on the 2025 IMO problems under the same rules as human contestants," he said. "For each problem, three former IMO medalists independently graded the model's submitted proof." Google achieved a silver-medal score at last year's IMO in the British city of Bath, solving four of the six problems. That took two to three days of computation -- far longer than this year, when its Gemini model solved the problems within the 4.5-hour time limit, it said. The IMO said tech companies had "privately tested closed-source AI models on this year's problems", the same ones faced by 641 competing students from 112 countries. "It is very exciting to see progress in the mathematical capabilities of AI models," said IMO president Dolinar. Contest organisers could not verify how much computing power had been used by the AI models or whether there had been human involvement, he cautioned.
[30]
OpenAI's New AI Model Solves 5 Out Of 6 Problems On The World's Toughest Math Olympiad -- But Critics Say It Stole The Spotlight From Student Geniuses
OpenAI's latest experimental AI model reportedly gave a near-gold performance on the International Mathematical Olympiad (IMO), but its premature announcement drew criticism for eclipsing the student competitors the contest was meant to honor. What Happened: On Saturday, OpenAI's Alexander Wei revealed on X, formerly Twitter, that the company's new large language model (LLM) solved 5 out of 6 IMO problems under authentic exam conditions -- matching the performance of gold medalists in what's widely considered the most prestigious and difficult high school math competition globally. "This is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence," OpenAI CEO Sam Altman posted on X, calling the achievement a decade-long dream come true. However, the announcement stirred backlash among the math and AI communities. Thang Luong, a lead AI researcher at Alphabet Inc.'s Google DeepMind, replied to a report that the IMO had asked AI companies to delay announcements until after the closing ceremony to allow young human competitors to be recognized without being overshadowed. Luong added that without an official evaluation from IMO's private marking rubric, OpenAI's claim of "gold-level" performance was inaccurate. "With one point deducted, it is a Silver, not Gold," he said. Elon Musk, whose AI venture xAI earlier this month launched Grok 4, also commented, comparing OpenAI's feat to when AI beat humans at chess and Go. He said math competitions will soon be "trivial" for AI. Why It's Important: The incident highlights growing tensions around AI's rapid progress and the ethics of showcasing machine achievements in human-centered spaces. It also underscores how quickly general-purpose AI is encroaching on elite intellectual tasks once thought immune to automation. Last week, ARK Invest CEO Cathie Wood raised concerns about rising unemployment among new college graduates, pointing to AI's disruptive impact on entry-level roles. Citing Wall Street Journal data, she noted that recent grad unemployment has risen from 4% to over 6%. Earlier, Craig Shapiro also warned that AI could disrupt 25% of all jobs by 2030 -- a challenge he believes the Federal Reserve has little ability to address.
[31]
OpenAI's latest programme took home a gold medal at the International Math Olympiad
It managed to crack five of the six questions being posed at the event. Not even the brainiest of mathematicians are safe from the rise of artificial intelligence. Recently, during the International Math Olympiad, OpenAI decided to test its latest experimental reasoning LLM technology, which proved to be so effective it earned a prestigious gold medal. According to researcher Alexander Wei, the technology competed in the event under the same rules as its human counterparts, and managed to solve five of the six problems that it was faced with, with three former medalists grading its work and determining that it earned a score of 35/42, enough for a gold medal. As per Wei, the AI had to operate in the same way as humans, meaning it had 4.5 hours to complete the exam, could not use any tools or internet, and had to read the problems and pose solutions in the form of natural language proofs. It achieved the task at hand and now OpenAI has a "model that can craft intricate, watertight arguments at the level of human mathematicians." While OpenAI does intend to release GPT-5 soon, Wei did mention that this model, with this level of advanced mathematics capability, will not be available for several months, meaning any students looking to get a jump on their homework will have to wait a tad longer...
[32]
How Google and OpenAI's AI Models Won Gold at the 2025 Math Olympiad
What happens when artificial intelligence outshines human brilliance on one of the world's most prestigious stages? At the 2025 International Mathematical Olympiad (IMO), Google's DeepMind Gemini and OpenAI's large language model (LLM) achieved what many thought was still years away: earning gold medals in a competition historically dominated by the sharpest human minds. Yet, this new achievement was not without its share of turbulence. While Google's Gemini basked in well-earned accolades, OpenAI found itself embroiled in controversy over the timing of its announcement, sparking debates about the ethical responsibilities of AI developers. The juxtaposition of triumph and tension paints a vivid picture of the evolving relationship between innovative technology and societal expectations. Wes Roth provides more insights into the remarkable advancements that propelled these AI models to success, from reinforcement learning breakthroughs to their ability to solve complex problems in natural language. But beyond the technical marvels lies a deeper narrative: the ethical dilemmas, the growing gap between AI capabilities and human intuition, and the implications for the future of human-AI collaboration. As we unpack the triumphs and tribulations of Google and OpenAI, you'll discover not just how these models excelled, but also what their success -- and controversy -- means for the broader AI landscape. The question remains: can innovation and integrity evolve hand in hand? For the first time, general-purpose AI models demonstrated their capability to compete alongside the world's brightest human minds in mathematical problem-solving. Google DeepMind's Gemini and OpenAI's LLM earned gold medals by solving problems presented in natural language, showcasing a significant leap in AI capabilities. Despite this success, human participants still outperformed the AI, with top competitors achieving perfect scores of 42 out of 42 points. This underscores the gap that remains between AI and human reasoning in certain domains, even as AI continues to evolve and improve. The participation of AI models in such a prestigious competition also raises questions about the future of human-AI collaboration. While these systems excel at processing vast amounts of data and performing logical reasoning, they still lack the intuitive and creative problem-solving abilities that define human intelligence. This balance between AI's strengths and limitations will likely shape its role in future problem-solving scenarios. The exceptional performance of these AI models can be attributed to advancements in their architecture and training methodologies. Several key innovations played a pivotal role. These advancements reflect a broader trend toward creating more autonomous and versatile AI systems capable of addressing real-world challenges. By integrating natural language understanding with advanced reasoning capabilities, these models are setting new benchmarks for what AI can achieve. The 2025 achievement represents a significant leap forward compared to previous years. In 2024, Google relied on specialized AI models that required manual problem translation, which limited their effectiveness and resulted in a silver medal. The transition to general-purpose LLMs capable of directly solving problems in natural language marks a major milestone in AI development.
This evolution highlights the growing importance of self-reasoning and autonomous problem-solving in modern AI systems. The progress made over the past year also underscores the accelerating pace of AI innovation. As models become more sophisticated, their ability to tackle increasingly complex tasks will continue to expand. This trajectory suggests that AI could soon play a more prominent role in fields such as scientific research, engineering, and education, where advanced problem-solving skills are essential. While the technical achievements of these AI models were widely celebrated, OpenAI faced criticism for allegedly announcing its results prematurely. Reports indicate that the IMO had requested all announcements be delayed until after the competition's closing ceremony to maintain focus on the student participants. OpenAI denied any wrongdoing, stating that it adhered to the IMO's guidelines. Nevertheless, the incident has sparked broader discussions about the ethical responsibilities of AI developers in publicizing their accomplishments. This controversy highlights the need for clear communication and ethical standards in the rapidly evolving field of AI. As AI systems become more integrated into high-profile events and real-world applications, the importance of transparency and responsible behavior will only grow. Making sure that AI achievements are communicated in a way that respects all stakeholders will be critical to maintaining public trust and fostering collaboration within the AI community. Reinforcement learning (RL) was a cornerstone of the success achieved by both Gemini and OpenAI's LLM. Several RL techniques were instrumental in enhancing their performance. This approach represents a shift from traditional pre-training compute to reinforcement learning compute, which emphasizes iterative improvement and adaptability. As RL techniques continue to evolve, they are expected to drive further advancements in AI, allowing systems to handle increasingly sophisticated tasks with minimal human intervention. The ability of AI models to self-train and improve without heavy reliance on human-generated data is a notable development. This capability accelerates AI progress while reducing dependence on large-scale pre-training datasets. As reinforcement learning techniques advance, AI systems will likely tackle increasingly complex tasks, unlocking new possibilities across various fields. However, these advancements also raise important questions about the ethical and societal implications of AI. As AI systems become more capable, making sure that their development aligns with human values and priorities will be essential. Google has announced plans to publish detailed research on the techniques used in its Gemini model, continuing its tradition of transparency in AI development. This openness is expected to benefit the broader AI community, fostering collaboration and innovation. OpenAI is also likely to contribute to the growing body of research, as competition between leading organizations drives further advancements in reinforcement learning and self-learning methodologies. The commitment to transparency by leading AI organizations is a positive step toward building trust and encouraging responsible innovation. By sharing their findings, these companies can help ensure that the benefits of AI are widely distributed and that the technology is developed in a way that serves the broader interests of society.
The success of AI at the IMO has surprised experts and observers alike, many of whom underestimated the timeline for such achievements. This milestone serves as a reminder of the rapid pace of AI development and its growing capabilities in areas once thought to be exclusive to human intelligence. As AI continues to evolve, its role in solving real-world problems will expand, presenting both opportunities and challenges for society. Looking ahead, the focus will likely shift toward addressing the ethical, social, and economic implications of AI. Making sure that AI systems are developed and deployed responsibly will be critical to maximizing their benefits while minimizing potential risks. By fostering collaboration, transparency, and ethical practices, the AI community can help shape a future where technology serves as a powerful tool for progress and innovation.
[33]
The Truth Behind AI's Gold Medal at the Math Olympiad: What the Media Isn't Saying
What if the next headline you read about AI wasn't just exciting -- but also misleading? Imagine seeing "AI Wins Gold at the International Math Olympiad" and immediately picturing a machine outsmarting the brightest human minds in real-world problem-solving. Sounds impressive, right? But here's the catch: while OpenAI's model did earn a gold medal, it also stumbled on the most creative problem, exposing the limits of its reasoning. This isn't just a story of triumph -- it's a reminder of how easily we can misinterpret AI's achievements when headlines oversimplify the nuances. In a world captivated by AI breakthroughs, the way we read and interpret these milestones matters more than ever. This perspective from AI Explained unpacks the layers behind AI's latest accomplishments, from its gold medal at the IMO to the unveiling of GPT-5, and explores what these advancements truly mean for society. You'll discover why AI's victories often come with caveats, how competition between tech giants shapes the narrative, and why transparency in AI research is more urgent than ever. Along the way, we'll challenge the hype and highlight the critical questions that often go unasked. Understanding AI's strengths and limitations isn't just about staying informed -- it's about shaping how we prepare for the future. After all, the headlines may dazzle, but the real story lies in the details we often overlook.

OpenAI's model successfully solved five out of six problems at the IMO, earning a gold medal. This is particularly noteworthy because the model was not specifically trained for mathematics. However, it struggled with the most complex problem, which required creative reasoning -- a skill that remains challenging for AI to replicate. This limitation underscores a crucial distinction: while AI demonstrates exceptional computational efficiency, it often falls short in areas requiring nuanced ingenuity and abstract thinking. Achievements like these, though impressive, are confined to controlled environments and do not necessarily translate to solving real-world challenges. By highlighting both strengths and weaknesses, this milestone serves as a reminder of the boundaries of current AI technology. The announcement also sheds light on the competitive nature of AI research. OpenAI's achievement comes amid reports that Google DeepMind has achieved similar results, though detailed findings have not yet been released. The timing of OpenAI's announcement has sparked speculation about strategic positioning in the race for AI dominance. This rivalry reflects a broader trend in the field, where public perception and technological milestones increasingly shape the narrative. As organizations compete to showcase their breakthroughs, the focus often shifts from collaboration to competition. This competitive environment raises questions about transparency and the potential for shared progress, as companies prioritize proprietary advancements over open collaboration.

AI's growing proficiency in reasoning and professional tasks has profound implications for the workforce. Tools like OpenAI's agent mode demonstrate the potential to enhance productivity, but they also raise concerns about job displacement, particularly in entry-level roles. For example, AI can now draft reports, analyze data, and assist in legal research -- tasks traditionally performed by humans.
While these advancements streamline workflows and improve efficiency, they also challenge traditional career pathways. This shift emphasizes the need for workforce training and education to help individuals adapt to an evolving job market. Preparing for these changes will require proactive measures to ensure that workers can thrive alongside AI technologies. Despite its achievements, AI remains far from flawless. One of the most significant challenges is hallucination, where the model generates incorrect or nonsensical information. This poses serious risks in high-stakes fields such as financial analysis, medical research, or legal decision-making. Moreover, AI's performance can vary widely depending on the context, with its weakest moments undermining its reliability. These limitations highlight the importance of cautious deployment and rigorous oversight, especially in industries where errors can have severe consequences. Making sure that AI is used responsibly requires a combination of technical safeguards, ethical guidelines, and regulatory frameworks. A critical issue in AI development is the lack of transparency. OpenAI's announcement, while impressive, provided limited insight into the methodology, computational resources, or costs involved in training the model. This opacity makes it difficult for researchers, policymakers, and the public to assess the broader implications of such achievements. Greater transparency -- through peer-reviewed publications, open data sharing, and detailed disclosures -- could foster a more collaborative and accountable research environment. This would not only benefit the AI community but also help build public trust in these technologies. Transparency is essential for making sure that AI advancements are understood, scrutinized, and responsibly integrated into society. AI's impact extends far beyond academic benchmarks like the IMO. In software development, for instance, AI tools can assist with coding and debugging. However, they may also introduce inefficiencies for experienced developers by generating suboptimal solutions that require additional refinement. On the other hand, AI has delivered tangible benefits in areas such as data center management, where it has optimized energy usage and reduced operational costs. These mixed results underscore the importance of context when evaluating AI's effectiveness. Success in one domain does not guarantee universal applicability, and careful consideration is needed to determine where AI can truly add value. Headlines celebrating AI milestones can sometimes lead to overestimations of its capabilities. For example, solving IMO problems is undoubtedly impressive, but it does not equate to replacing human creativity or expertise in complex, real-world scenarios. Similarly, benchmarks like the IMO, while valuable, do not fully capture AI's practical utility across diverse applications. It is essential to maintain a nuanced understanding of these achievements to avoid misconceptions about AI's true potential. By critically evaluating such milestones, you can better appreciate both the opportunities and limitations of this rapidly evolving technology. The release of new models, such as GPT-5, promises further advancements in AI reasoning and problem-solving. Competitors like Google DeepMind are also expected to unveil their own breakthroughs, intensifying the pace of innovation. However, it is crucial to approach these developments with a balanced perspective. 
While AI's progress is undeniable, its limitations remain significant. Recognizing both its potential and its shortcomings is essential for navigating this complex field responsibly. As AI continues to evolve, staying informed and thoughtful will help ensure that its benefits are maximized while its risks are carefully managed.
[34]
OpenAI Achieves Breakthrough in AI Model, Wins Gold at Math Olympiad
OpenAI's new AI model has won a gold medal at the 2025 International Math Olympiad (IMO). The unreleased model was permitted to participate under official test conditions without access to the internet or any coding tools. It scored 35 out of 42, earning just enough points to qualify for the gold medal. The IMO is regarded as the most challenging math competition for high school students worldwide. Unlike Google DeepMind's task-specific AlphaGeometry 2, OpenAI's model is based on a general-purpose reasoning system. Alexander Wei, an OpenAI technical staff member, stated on X (formerly Twitter), "We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling." In other words, the model wasn't trained just for geometry or Olympiad-style problems; it was optimized to reason very broadly.
[35]
Google clinches milestone gold at global math competition, while OpenAI also claims win
The results marked the first time that AI systems crossed the gold-medal scoring threshold at the International Mathematical Olympiad (IMO) for high-school students. Both companies' models solved five out of six problems, achieving the result using general-purpose "reasoning" models that processed mathematical concepts using natural language, in contrast to the previous approaches used by AI firms. While Google DeepMind worked with the IMO to have its model graded and certified by the committee, OpenAI did not officially enter the competition. The startup revealed on Saturday that its model had achieved a gold medal-worthy score on this year's questions, citing grades by three external IMO medalists. The achievement suggests AI is less than a year away from being used by mathematicians to crack unsolved research problems at the frontier of the field, according to Junehyuk Jung, a math professor at Brown University and visiting researcher in Google's DeepMind AI unit. "I think the moment we can solve hard reasoning problems in natural language will enable the potential for collaboration between AI and mathematicians," Jung told Reuters. OpenAI's breakthrough was achieved with a new experimental model centered on massively scaling up "test-time compute." This was done by both allowing the model to "think" for longer periods and deploying parallel computing power to run numerous lines of reasoning simultaneously, according to Noam Brown, a researcher at OpenAI. Brown declined to say how much the computing power cost OpenAI, but called it "very expensive." To OpenAI researchers, it is another clear sign that AI models can command extensive reasoning capabilities that could expand into other areas beyond math. The optimism is shared by Google researchers, who believe AI models' capabilities can apply to research quandaries in other fields such as physics, said Jung, who won an IMO gold medal as a student in 2003. Of the 630 students participating in the 66th IMO on the Sunshine Coast in Queensland, Australia, 67 contestants, or about 11 per cent, achieved gold-medal scores. Google's DeepMind AI unit last year achieved a silver medal score using AI systems specialized for math. This year, Google used a general-purpose model called Gemini Deep Think, a version of which was previously unveiled at its annual developer conference in May. Unlike previous AI attempts that relied on formal languages and lengthy computation, Google's approach this year operated entirely in natural language and solved the problems within the official 4.5-hour time limit, the company said in a blog post. OpenAI, which has its own set of reasoning models, similarly built an experimental version for the competition, according to a post by researcher Alexander Wei on social media platform X. He noted that the company does not plan to release anything with this level of math capability for several months. This year marked the first time the competition coordinated officially with some AI developers, who have for years used prominent math competitions like IMO to test model capabilities. IMO judges certified the results of those companies, including Google, and asked them to publish results on July 28. "We respected the IMO Board's original request that all AI labs share their results only after the official results had been verified by independent experts and the students had rightly received the acclamation they deserved," Google DeepMind CEO Demis Hassabis said on X on Monday.
OpenAI, which published its results on Saturday and first claimed gold-medal status, said in an interview that it had permission from an IMO board member to do so after the closing ceremony on Saturday. The competition on Monday allowed cooperating companies to publish results, Gregor Dolinar, president of IMO's board, told Reuters.
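To make Brown's description of test-time compute scaling more tangible, here is a minimal, hypothetical sketch: it runs many independent reasoning attempts for the same problem in parallel and keeps the answer the attempts most often agree on. The solve_once function and its 60% per-attempt success rate are invented stand-ins for illustration; neither company has published code at this level of detail.

```python
# Hypothetical illustration of test-time compute scaling via parallel reasoning chains.
# Spend more inference-time compute by sampling many independent attempts, then pick
# the most frequent final answer (a simple self-consistency vote). Not OpenAI's code.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import random

def solve_once(problem: str, seed: int) -> str:
    """Stand-in for one sampled reasoning chain that ends in a short final answer."""
    rng = random.Random(seed)
    # Pretend each independent attempt lands on the right answer 60% of the time.
    return "42" if rng.random() < 0.6 else str(rng.randint(0, 41))

def solve_with_more_compute(problem: str, num_chains: int = 64) -> str:
    """Run many chains in parallel and return the majority answer."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        answers = list(pool.map(lambda s: solve_once(problem, s), range(num_chains)))
    answer, votes = Counter(answers).most_common(1)[0]
    print(f"{votes}/{num_chains} chains agreed on {answer!r}")
    return answer

solve_with_more_compute("toy problem with a numeric answer")
```

For proof-based problems like the IMO's, a simple vote over short final answers would not suffice; the selection step would instead need to verify or rank full written proofs, which is part of why the approach is so computationally expensive.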
[36]
Google and OpenAI's AI models win milestone gold at global math competition
(Reuters) - Alphabet's Google and OpenAI said their artificial-intelligence models won gold medals at a global mathematics competition, signaling a breakthrough in math capabilities in the race to build powerful systems that can rival human intelligence. The results marked the first time that AI systems crossed the gold-medal scoring threshold at the International Mathematical Olympiad for high-school students. Both companies' models solved five out of six problems, achieving the result using general-purpose "reasoning" models that processed mathematical concepts using natural language, in contrast to the previous approaches used by AI firms. The achievement suggests AI is less than a year away from being used by mathematicians to crack unsolved research problems at the frontier of the field, according to Junehyuk Jung, a math professor at Brown University and visiting researcher in Google's DeepMind AI unit. "I think the moment we can solve hard reasoning problems in natural language will enable the potential for collaboration between AI and mathematicians," Jung told Reuters. The same idea can apply to research quandaries in other fields such as physics, said Jung, who won an IMO gold medal as a student in 2003. Of the 630 students participating in the 66th IMO on the Sunshine Coast in Queensland, Australia, 67 contestants, or about 11%, achieved gold-medal scores. Google's DeepMind AI unit last year achieved a silver medal score using AI systems specialized for math. This year, Google used a general-purpose model called Gemini Deep Think, a version of which was previously unveiled at its annual developer conference in May. Unlike previous AI attempts that relied on formal languages and lengthy computation, Google's approach this year operated entirely in natural language and solved the problems within the official 4.5-hour time limit, the company said in a blog post. OpenAI, which has its own set of reasoning models, similarly built an experimental version for the competition, according to a post by researcher Alexander Wei on social media platform X. He noted that the company does not plan to release anything with this level of math capability for several months. This year marked the first time the competition coordinated officially with some AI developers, who have for years used prominent math competitions like IMO to test model capabilities. IMO judges certified the results of those companies, including Google, and asked them to publish results on July 28. "We respected the IMO Board's original request that all AI labs share their results only after the official results had been verified by independent experts and the students had rightly received the acclamation they deserved," Google DeepMind CEO Demis Hassabis said on X on Monday. However, OpenAI, which did not work with the IMO, self-published its results on Saturday, allowing it to be first among AI firms to claim gold-medal status. In turn, the competition on Monday allowed cooperating companies to publish results, Gregor Dolinar, president of IMO's board, told Reuters. (Reporting by Kenrick Cai in San Francisco and Jaspreet Singh in Bengaluru; Editing by Matthew Lewis)
[37]
How OpenAI's LLM mastered one of the world's toughest math olympiads
Reinforcement learning powers OpenAI's LLM to gold-level math olympiad success under real exam conditions. In a sun-drenched convention center on Australia's Sunshine Coast, the 66th International Mathematical Olympiad (IMO) unfolded this month. It brought together 635 of the world's brightest young minds from 114 countries. Amid the flurry of pencils and the tension of geopolitical debates, an unexpected contender emerged, not a teenager with a calculator, but an artificial intelligence model developed by OpenAI. On July 19, 2025, OpenAI research scientist Alexander Wei announced that its experimental reasoning large language model (LLM) had achieved a gold medal-level performance, solving five of the six grueling problems in the 2025 IMO. This milestone, validated under the same stringent conditions as human contestants - two 4.5-hour sessions, no tools or internet - marks a seismic shift in the landscape of artificial intelligence. The IMO, widely regarded as the pinnacle of pre-college mathematical competition since its inception in 1959, is known for its challenging problems that demand creative, sustained reasoning. This year, the spotlight shifted to a machine. Wei's thread on X detailed the LLM's journey: tackling problems like the 2025 IMO Problem 1, which required intricate proofs about non-negative integers and geometric lines, the model crafted multi-page, watertight arguments that earned it 35 out of 42 points, a score sufficient for gold. What sets this achievement apart is not just the result but the approach. Unlike previous AI successes in narrow domains, OpenAI's LLM leveraged general-purpose reinforcement learning and test-time compute scaling. This method, which recent studies in Nature Machine Intelligence (2024) suggest can boost multi-step reasoning by 40%, allowed the model to think creatively over extended horizons, up to 100 minutes per problem, far surpassing earlier benchmarks like the MATH dataset, where a 30% accuracy was optimistically forecasted in 2021. "We've progressed from quick calculations to sustained, human-like reasoning," Wei wrote, a sentiment echoed by OpenAI CEO Sam Altman, who called it a "significant marker of how far AI has come." The LLM's solutions, released online, reveal a distinct style, sometimes sassy, often meticulous. For Problem 1, it navigated a labyrinth of conditions about "sunny" lines (those not parallel to the axes or the line x+y=0) with a proof spanning several pages, culminating in a playful "No citation necessary" after computing a mystery number. This flair hints at the model's experimental nature, a research prototype distinct from the forthcoming GPT-5, which OpenAI plans to release soon but without this level of mathematical prowess for months. The implications are profound. This breakthrough challenges the notion that AI lacks true understanding, suggesting a leap toward general intelligence. Experts point to recent advancements in reinforcement learning, which allow models to adapt and reason without task-specific training, as a key driver. For the human contestants, the AI's presence is both inspiration and a call to elevate their own skills. As the Sunshine Coast event concluded, the LLM's gold medal stood as a testament to AI's evolving capabilities, proof that when it comes to pure reason, the boundary between human and machine is blurring.
Whether this heralds a new era of intelligence or a redefinition of competition, the 2025 IMO will be remembered not just for its equations, but for the code that cracked them.
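As a small aside on the Problem 1 setup described above, the "sunny line" condition is simple to state in code. The helper below merely encodes the definition quoted in the article (a line is sunny when it is not parallel to the x-axis, the y-axis, or the line x + y = 0); it is an illustration of the definition, not part of any model's solution.

```python
def is_sunny(a: float, b: float) -> bool:
    """Return True if the line a*x + b*y = c is 'sunny': not parallel to the x-axis
    (which happens when a == 0), the y-axis (b == 0), or the line x + y = 0 (a == b)."""
    if a == 0 and b == 0:
        raise ValueError("a and b cannot both be zero; that is not a line")
    return a != 0 and b != 0 and a != b

# Quick checks against the definition.
assert not is_sunny(0, 1)   # y = const: parallel to the x-axis
assert not is_sunny(1, 0)   # x = const: parallel to the y-axis
assert not is_sunny(2, 2)   # 2x + 2y = c: parallel to x + y = 0
assert is_sunny(1, 2)       # slope -1/2: sunny
```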
[38]
Google's Gemini, OpenAI's ChatGPT score gold-level marks at Math Olympiad: Here's what we know
Google's Demis Hassabis criticised OpenAI for announcing results before official IMO verification. Google DeepMind's Gemini and OpenAI's AI chatbot ChatGPT won gold medals at the 2025 International Mathematical Olympiad (IMO). According to separate announcements from the two companies, both AI models solved five of six problems during the competition, scoring 35 out of 42 points. If you are unfamiliar, the IMO was founded in 1959 and is one of the oldest and most difficult mathematics competitions for school students in the world, with participants from more than 100 countries. It entails complex mathematical proofs rather than simple calculations, requiring deep logical reasoning and advanced knowledge. Speaking about the test, Google DeepMind CEO Demis Hassabis stated that the Gemini Deep Think model officially entered the Olympiad and followed the rules, completing the natural language test in approximately 4.5 hours. He stated that Gemini's mathematical proofs were generated entirely without human intervention. Select testers and mathematicians will now have access to the model before it is made available to all Google AI Ultra subscribers. OpenAI, on the other hand, used an unnamed experimental research model. However, the participation was unofficial, with three former IMO medallists serving as independent evaluators to determine the results. The panel confirmed that the 35-point score was reached through unanimous agreement among the evaluators. The company then reportedly submitted the results to the IMO committee, but did not complete the board's formal entry process. While both models achieved gold-level results, the episode caused friction between the two technology giants. Hassabis criticised OpenAI for prematurely announcing the results, claiming that DeepMind waited out of respect for the IMO's guidelines and the student participants. Despite this disagreement, both companies have once again demonstrated their reasoning and complex problem-solving abilities.
Google DeepMind and OpenAI's AI models have achieved gold medal-level performance in the 2025 International Mathematical Olympiad (IMO), marking a significant advancement in AI's mathematical reasoning capabilities.
In a significant advancement for artificial intelligence, both Google DeepMind and OpenAI have announced that their AI models achieved gold medal-level performance in the 2025 International Mathematical Olympiad (IMO). This prestigious competition, running since 1959, is known for its challenging proof-based problems that test mathematical reasoning and creativity [1][3].
Google DeepMind's AI model, named Gemini Deep Think, scored 35 out of 42 points on the six IMO problems, correctly solving five out of six questions [1][2]. This performance marks a significant improvement over their 2024 entry, which achieved a silver medal equivalent using specialized systems AlphaProof and AlphaGeometry 2 [2].
Thang Luong, a senior scientist at DeepMind, described this year's achievement as a "big paradigm shift" [1]. Unlike previous attempts that required human experts to translate problems into a formal language, Gemini Deep Think processed and solved problems entirely in natural language [1][2].
OpenAI also reported that its experimental AI model achieved gold medal-level performance on the IMO problems [3]. However, their announcement came earlier than expected, causing some controversy within the AI and mathematical communities [3][4].
Both companies claim their models operated under the same time constraints as human participants: 4.5 hours per session, without internet access or calculators [1][3].
This achievement represents a major step forward in AI's ability to handle complex mathematical reasoning. Gary Marcus, a neuroscientist and AI critic, called the results "awfully impressive," noting that solving problems at this level demonstrates "really good math problem solving chops" [1].
The success of these general-purpose language models in tackling IMO problems suggests potential applications beyond mathematics. Researchers from both companies believe these advancements could lead to AI systems capable of addressing challenging scientific and research problems [1][5].
The announcements have not been without controversy. OpenAI's early release of their results, before the agreed-upon date of July 28, drew criticism from the IMO community and other AI companies [3][4]. Additionally, questions were raised about the self-grading of OpenAI's results, as opposed to the official IMO grading received by Google DeepMind [3][4].
This competition between AI companies highlights the ongoing race for supremacy in the field, with implications for public perception, talent acquisition, and future development [4].
While these results are promising, some mathematicians and researchers urge caution. Kevin Buzzard of Imperial College London noted that success in the IMO doesn't necessarily translate to readiness for advanced mathematical research [1]. Similarly, Ken Ono from the University of Virginia views AI as a valuable research partner but emphasizes that these benchmarks don't fully align with the work of theoretical mathematicians [1].
Both DeepMind and OpenAI plan to make versions of their models available to researchers in the coming months, potentially opening new avenues for collaboration between AI and human mathematicians [1][5]. However, the full impact of these advancements on mathematical research and problem-solving remains to be seen.