Curated by THEOUTPOST
On Sat, 8 Feb, 12:05 AM UTC
5 Sources
[1]
DeepMind AI crushes tough maths problems on par with top human solvers
A year ago AlphaGeometry, an artificial-intelligence (AI) problem solver created by Google DeepMind, surprised the world by performing at the level of silver medallists in the International Mathematical Olympiad (IMO), a competition that sets tough maths problems for gifted high-school students. The DeepMind team now says the performance of its upgraded system, AlphaGeometry2, has surpassed the level of the average gold medallist. The results are described in a preprint on the arXiv. "I imagine it won't be long before computers are getting full marks on the IMO," says Kevin Buzzard, a mathematician at Imperial College London.

Solving problems in Euclidean geometry is one of the four topics covered in IMO problems -- the others are number theory, algebra and combinatorics. Geometry demands specific skills of an AI, because competitors must provide a rigorous proof for a statement about geometric objects on the plane. In July, AlphaGeometry2 made its public debut alongside a newly unveiled system, AlphaProof, which DeepMind developed for solving the non-geometry questions in the IMO problem sets.

AlphaGeometry is a combination of components that include a specialized language model and a 'neuro-symbolic' system -- one that does not train by learning from data like a neural network but has abstract reasoning coded in by humans. The team trained the language model to speak a formal mathematical language, which makes it possible to automatically check its output for logical rigour -- and to weed out the 'hallucinations', the incoherent or false statements that AI chatbots are prone to making.

For AlphaGeometry2, the team made several improvements, including the integration of Google's state-of-the-art large language model, Gemini. The team also introduced the ability to reason by moving geometric objects around the plane -- such as moving a point along a line to change the height of a triangle -- and to solve linear equations. The system was able to solve 84% of all geometry problems given in IMOs in the past 25 years, compared with 54% for the first AlphaGeometry. (Teams in India and China used different approaches last year to achieve gold-medal-level performance in geometry, but on a smaller subset of IMO geometry problems.)

The authors of the DeepMind paper write that future improvements of AlphaGeometry will include dealing with maths problems that involve inequalities and non-linear equations, which will be required to "fully solve geometry". The first AI system to achieve a gold-medal score for the overall test could win a US$5-million award called the AI Mathematical Olympiad Prize -- although that competition requires systems to be open-source, which is not the case for DeepMind's system.

Buzzard says he is not surprised by the rapid progress made both by DeepMind and by the Indian and Chinese teams. But, he adds, although the problems are hard, the subject is still conceptually simple, and there are many more challenges to overcome before AI is able to solve problems at the level of research mathematics. AI researchers will be eagerly awaiting the next iteration of the IMO in Sunshine Coast, Australia, in July. Once its problems are made public for human participants to solve, AI-based systems get to solve them, too. (AI agents are not allowed to take part in the competition, and are therefore not eligible to win medals.)
Fresh problems are seen as the most reliable test for machine-learning-based systems, because there is no risk that the problems or their solution existed online and may have 'leaked' into training data sets, skewing the results.
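The article above describes a language model trained to write its output in a formal mathematical language so that every step can be checked automatically, weeding out hallucinated statements. Below is a minimal sketch of that propose-and-verify pattern, assuming a toy formal language of tuple "predicates" and a single hand-coded deduction rule; none of the names are taken from DeepMind's system.

```python
# Minimal sketch of a propose-and-verify loop: candidate proof steps written in a
# tiny formal language are accepted only if a rule-based checker can derive them.
# All names (rules, predicates) are illustrative, not AlphaGeometry2's actual language.

# Known facts and one hand-coded deduction rule over string "predicates".
facts = {("midpoint", "M", "A", "B")}          # M is the midpoint of segment AB

def check_step(step, known):
    """Return True if `step` follows from `known` by one of the coded rules."""
    kind = step[0]
    if kind == "equal_seg":                    # midpoint M of AB => |MA| = |MB|
        _, s1, s2 = step
        return any(f[0] == "midpoint" and {s1, s2} == {f[1] + f[2], f[1] + f[3]}
                   for f in known)
    return False

def verify_proof(candidate_steps):
    """Accept steps one by one, rejecting any the checker cannot justify."""
    known = set(facts)
    accepted = []
    for step in candidate_steps:
        if check_step(step, known):
            known.add(step)
            accepted.append(step)
        else:
            print("rejected (possible hallucination):", step)
    return accepted

# A hypothetical language-model proposal: one valid step, one unsupported one.
proposal = [("equal_seg", "MA", "MB"), ("equal_seg", "MA", "AB")]
print("accepted steps:", verify_proof(proposal))
```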
[2]
DeepMind AI achieves gold-medal level performance on challenging Olympiad math questions
A team of researchers at Google DeepMind reports that its AlphaGeometry2 AI performed at a gold-medal level when tasked with solving problems that were given to high-school students participating in the International Mathematical Olympiad (IMO) over the past 25 years. In their paper posted on the arXiv preprint server, the team gives an overview of AlphaGeometry2 and its scores when solving IMO problems.

Prior research has suggested that AI that can solve geometry problems could lead to more sophisticated applications, because such problems require both a high level of reasoning ability and the ability to choose from possible steps in working toward a solution. To that end, the team at DeepMind has been developing increasingly sophisticated geometry-solving systems. Its first iteration, released last January, was called AlphaGeometry; its second iteration is called AlphaGeometry2. DeepMind has been combining it with another system it developed, AlphaProof, which constructs mathematical proofs; the team found the combined system was able to solve four of the six problems posed at the IMO this past summer. For this new study, the research team expanded testing of the system's ability by giving it multiple problems used by the IMO over the past 25 years.

The research team built AlphaGeometry2 by combining multiple core elements, one of which is Google's Gemini language model. Other elements use mathematical rules to derive solutions to the original problem or parts of it. The team notes that many IMO problems can only be solved after certain constructs have been added to the diagram, which means the system must be able to create them. The system predicts which constructs should be added to a diagram to make the deductions required to solve a problem; it suggests steps that might be used to solve the problem and then checks those steps for logical validity before using them.

To test the system, the researchers chose 45 problems from the IMO, some of which required translating into a more usable form, resulting in 50 problems in total. They report that AlphaGeometry2 was able to solve 42 of them correctly, slightly higher than the average human gold medalist in the competition.
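The study summarized above notes that many IMO diagrams only become solvable after auxiliary constructs (extra points, lines or circles) are added, and that the system predicts which constructs to try before its deduction engine takes over. Here is a toy sketch of that add-a-construct-then-deduce loop; the candidate constructs, the deduction rules and the backtracking are all invented for illustration and are not DeepMind's actual interface.

```python
# Toy sketch: a "proposer" enumerates candidate auxiliary constructs for a diagram,
# and a deduction routine is re-run after each construct is added, keeping only a
# construct that unlocks the goal. Names and heuristics are illustrative only.

def propose_constructs(diagram):
    # A real system would use a trained model; here we simply enumerate candidates.
    candidates = []
    for seg in diagram["segments"]:
        candidates.append(("midpoint", seg))
    for tri in diagram["triangles"]:
        candidates.append(("circumcircle", tri))
    return candidates

def deduce(diagram):
    # Stand-in for a symbolic engine: derive trivial facts from what is present.
    new_facts = set()
    for kind, obj in diagram["constructs"]:
        if kind == "midpoint":                       # midpoint M of obj splits it evenly
            new_facts.add(f"|{obj[0]}M| = |M{obj[1]}|")
        if kind == "circumcircle":                   # vertices lie on the circumcircle
            new_facts.add(f"{', '.join(obj)} are concyclic")
    return new_facts

def solve(diagram, goal):
    for construct in propose_constructs(diagram):
        diagram["constructs"].append(construct)
        facts = deduce(diagram)
        if goal in facts:
            return construct, facts
        diagram["constructs"].pop()                  # construct did not help; backtrack
    return None, set()

diagram = {"segments": ["AB"], "triangles": [("A", "B", "C")], "constructs": []}
print(solve(diagram, "A, B, C are concyclic"))
```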
[3]
Google's DeepMind AI Can Solve Math Problems on Par with Top Human Solvers
A year ago AlphaGeometry, an artificial-intelligence (AI) problem solver created by Google DeepMind, surprised the world by performing at the level of silver medallists in the International Mathematical Olympiad (IMO), a prestigious competition that sets tough maths problems for gifted high-school students. The DeepMind team now says the performance of its upgraded system, AlphaGeometry2, has surpassed the level of the average gold medallist. The results are described in a preprint on the arXiv. "I imagine it won't be long before computers are getting full marks on the IMO," says Kevin Buzzard, a mathematician at Imperial College London.

Solving problems in Euclidean geometry is one of the four topics covered in IMO problems -- the others are number theory, algebra and combinatorics. Geometry demands specific skills of an AI, because competitors must provide a rigorous proof for a statement about geometric objects on the plane. In July, AlphaGeometry2 made its public debut alongside a newly unveiled system, AlphaProof, which DeepMind developed for solving the non-geometry questions in the IMO problem sets.

AlphaGeometry is a combination of components that include a specialized language model and a 'neuro-symbolic' system -- one that does not train by learning from data like a neural network but has abstract reasoning coded in by humans. The team trained the language model to speak a formal mathematical language, which makes it possible to automatically check its output for logical rigour -- and to weed out the 'hallucinations', the incoherent or false statements that AI chatbots are prone to making.

For AlphaGeometry2, the team made several improvements, including the integration of Google's state-of-the-art large language model, Gemini. The team also introduced the ability to reason by moving geometric objects around the plane -- such as moving a point along a line to change the height of a triangle -- and to solve linear equations. The system was able to solve 84% of all geometry problems given in IMOs in the past 25 years, compared with 54% for the first AlphaGeometry. (Teams in India and China used different approaches last year to achieve gold-medal-level performance in geometry, but on a smaller subset of IMO geometry problems.)

The authors of the DeepMind paper write that future improvements of AlphaGeometry will include dealing with maths problems that involve inequalities and non-linear equations, which will be required to "fully solve geometry." The first AI system to achieve a gold-medal score for the overall test could win a US$5-million award called the AI Mathematical Olympiad Prize -- although that competition requires systems to be open-source, which is not the case for DeepMind. Buzzard says he is not surprised by the rapid progress made both by DeepMind and by the Indian and Chinese teams. But, he adds, although the problems are hard, the subject is still conceptually simple, and there are many more challenges to overcome before AI is able to solve problems at the level of research mathematics.
AI researchers will be eagerly awaiting the next iteration of the IMO in Sunshine Coast, Australia, in July. Once its problems are made public for human participants to solve, AI-based systems get to solve them, too. (AI agents are not allowed to take part in the competition, and are therefore not eligible to win medals.) Fresh problems are seen as the most reliable test for machine-learning-based systems, because there is no risk that the problems or their solution existed online and may have 'leaked' into training data sets, skewing the results.
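The article mentions that AlphaGeometry2 can now reason by moving geometric objects around the plane, such as sliding a point along a line to change a triangle's height. The short numeric check below illustrates the kind of geometric fact such arguments lean on -- moving the apex parallel to the base leaves the area unchanged, while moving it off that line changes the area linearly -- and is an illustration of the style of reasoning only, not code from the paper.

```python
# Numeric illustration: slide the apex C of triangle ABC along a line and watch
# how the area responds. Along a line parallel to AB the height, and hence the
# area, stays constant; along a non-parallel line the area changes linearly in
# the slide parameter t. This mirrors the "moving points" arguments mentioned above.

def area(a, b, c):
    """Unsigned area of the triangle with vertices a, b, c (shoelace formula)."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2

A, B = (0.0, 0.0), (4.0, 0.0)

for t in (0.0, 1.0, 2.0):
    parallel = (1.0 + t, 3.0)        # C moves parallel to AB: height fixed at 3
    slanted = (1.0, 3.0 + t)         # C moves vertically: height grows with t
    print(f"t={t}: parallel-move area={area(A, B, parallel)}, "
          f"slanted-move area={area(A, B, slanted)}")
```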
[4]
While We Grapple With Geometry, Google DeepMind's AI Model Beats Math Olympiad Gold Medalists
Google's AI lab, DeepMind, has unveiled a new AI model, AlphaGeometry2, which it claims outperforms some of the top minds who have won a gold medal in the International Mathematical Olympiad. Last year, it hit the silver-medal mark, and this year, we have a gold. The research paper reports an overall solve rate of 84% on all IMO geometry problems from the last 25 years. DeepMind published the first iteration of the model back in January 2024 with a 54% solve rate, so this amounts to substantial progress for a year of development. With AlphaGeometry2, the model now tackles locus-type theorems, linear equations, and non-constructive problem statements.

The AI model is built as a neuro-symbolic system that combines a language model with a symbolic engine to tackle challenging geometry problems. Under the hood, it leverages the Gemini architecture with an increased model size and a diverse dataset. DeepMind's model was trained on algorithmically generated synthetic data: the method starts by sampling a random diagram and using the symbolic engine to deduce all possible facts from it, avoiding human-crafted problems entirely.

According to the research paper, AlphaGeometry2 translates geometry problems stated in natural language into its own formal language. The paper notes, "To do this, we utilise Gemini [Gemini Team, 2024] to translate problems from natural language into the AlphaGeometry language and implement a new automated diagram generation algorithm."

The paper describes the test set as follows: "There are a total of 45 geometry problems in the 2000-2024 International Math Olympiad (IMO), which we translate into 50 AlphaGeometry problems (we call this set IMO-AG-50). Some problems are split into two due to specifics of our formalisation." It also reports how effective the system was: the model solved 42 of the 50 problems in the 2000-2024 IMO geometry set, thus surpassing an average gold medallist for the first time. The model was also pitched against other models such as OpenAI's o1, and, according to the paper's comparison table, AlphaGeometry2 solved far more of the questions.

Summing up, the paper notes, "Our geometry experts and International Math Olympiad (IMO) medallists consider many AlphaGeometry solutions to exhibit superhuman creativity." The authors add, "Despite good initial results, we think the auto-formalisation can be further improved with more formalisation examples and supervised fine-tuning." With models like AlphaGeometry2, AI is now competing at the level of high-school math olympiads, which is an intriguing development.
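The piece above notes that AlphaGeometry2's training data was generated algorithmically: sample a random diagram, let the symbolic engine deduce every fact it can, and turn the results into training problems. The sketch below mimics that pipeline at toy scale; the diagram sampler and the single "longest side" deduction rule are stand-ins, not the paper's actual generators.

```python
# Simplified sketch of synthetic-data generation: sample random point diagrams,
# exhaustively deduce facts with a tiny rule set, and emit (premises -> conclusion)
# records that could serve as training examples. All rules are illustrative stand-ins.
import itertools
import random

def sample_diagram(n_points=4, seed=None):
    """Sample a random diagram as named points in the plane."""
    rng = random.Random(seed)
    return {name: (rng.uniform(0, 10), rng.uniform(0, 10))
            for name in "ABCDEFG"[:n_points]}

def deduce_all_facts(points):
    """Tiny stand-in for a symbolic engine: for every triple of points, record
    which side of the triangle they form is the longest."""
    def dist(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    facts = []
    for a, b, c in itertools.combinations(points, 3):
        sides = {a + b: dist(points[a], points[b]),
                 b + c: dist(points[b], points[c]),
                 a + c: dist(points[a], points[c])}
        longest = max(sides, key=sides.get)
        facts.append(f"in triangle {a}{b}{c}, {longest} is the longest side")
    return facts

def make_training_records(n_diagrams=3):
    """Turn each deduced fact into a (premises, conclusion) training record."""
    records = []
    for i in range(n_diagrams):
        points = sample_diagram(seed=i)
        for fact in deduce_all_facts(points):
            records.append({"premises": sorted(points), "conclusion": fact})
    return records

print(f"generated {len(make_training_records())} synthetic records")
```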
[5]
DeepMind claims its AI performs better than International Mathematical Olympiad gold medalists | TechCrunch
An AI system developed by Google DeepMind, Google's leading AI research lab, appears to have surpassed the average gold medalist in solving geometry problems in an international mathematics competition. The system, called AlphaGeometry2, is an improved version of a system, AlphaGeometry, that DeepMind released last January. In a newly published study, the DeepMind researchers behind AlphaGeometry2 claim their AI can solve 84% of all geometry problems over the last 25 years in the International Mathematical Olympiad (IMO), a math contest for high school students.

Why does DeepMind care about a high-school-level math competition? Well, the lab thinks the key to more capable AI might lie in discovering new ways to solve challenging geometry problems -- specifically Euclidean geometry problems. Proving mathematical theorems, or logically explaining why a theorem (e.g. the Pythagorean theorem) is true, requires both reasoning and the ability to choose from a range of possible steps toward a solution. These problem-solving skills could -- if DeepMind's right -- turn out to be a useful component of future general-purpose AI models. Indeed, this past summer, DeepMind demoed a system that combined AlphaGeometry2 with AlphaProof, an AI model for formal math reasoning, to solve four out of six problems from the 2024 IMO. In addition to geometry problems, approaches like these could be extended to other areas of math and science -- for example, to aid with complex engineering calculations.

AlphaGeometry2 has several core elements, including a language model from Google's Gemini family of AI models and a "symbolic engine." The Gemini model helps the symbolic engine, which uses mathematical rules to infer solutions to problems, arrive at feasible proofs for a given geometry theorem. Olympiad geometry problems are based on diagrams that need "constructs" to be added before they can be solved, such as points, lines, or circles. AlphaGeometry2's Gemini model predicts which constructs might be useful to add to a diagram, which the engine references to make deductions.

Basically, AlphaGeometry2's Gemini model suggests steps and constructions in a formal mathematical language to the engine, which -- following specific rules -- checks these steps for logical consistency. A search algorithm allows AlphaGeometry2 to conduct multiple searches for solutions in parallel and store possibly useful findings in a common knowledge base. AlphaGeometry2 considers a problem to be "solved" when it arrives at a proof that combines the Gemini model's suggestions with the symbolic engine's known principles.

Owing to the complexities of translating proofs into a format AI can understand, there's a dearth of usable geometry training data. So DeepMind created its own synthetic data to train AlphaGeometry2's language model, generating over 300 million theorems and proofs of varying complexity.

The DeepMind team selected 45 geometry problems from IMO competitions over the past 25 years (from 2000 to 2024), including linear equations and equations that require moving geometric objects around a plane. They then "translated" these into a larger set of 50 problems. (For technical reasons, some problems had to be split into two.) According to the paper, AlphaGeometry2 solved 42 out of the 50 problems, clearing the average gold medalist score of 40.9. Granted, there are limitations. A technical quirk prevents AlphaGeometry2 from solving problems with a variable number of points, nonlinear equations, and inequalities.
And AlphaGeometry2 isn't technically the first AI system to reach gold-medal-level performance in geometry, although it's the first to achieve it with a problem set of this size. AlphaGeometry2 also did worse on another set of harder IMO problems. For an added challenge, the DeepMind team selected problems -- 29 in total -- that had been nominated for IMO exams by math experts, but that haven't yet appeared in a competition. AlphaGeometry2 could only solve 20 of these.

Still, the study results are likely to fuel the debate over whether AI systems should be built on symbol manipulation -- that is, manipulating symbols that represent knowledge using rules -- or the ostensibly more brain-like neural networks. AlphaGeometry2 adopts a hybrid approach: its Gemini model has a neural network architecture, while its symbolic engine is rules-based.

Proponents of neural network techniques argue that intelligent behavior, from speech recognition to image generation, can emerge from nothing more than massive amounts of data and computing. Unlike symbolic systems, which solve tasks by defining sets of symbol-manipulating rules dedicated to particular jobs, like editing a line in word processor software, neural networks try to solve tasks through statistical approximation and learning from examples. Neural networks are the cornerstone of powerful AI systems like OpenAI's o1 "reasoning" model. But supporters of symbolic AI argue that they're not the end-all-be-all; symbolic systems might be better positioned to efficiently encode the world's knowledge, reason their way through complex scenarios, and "explain" how they arrived at an answer.

"It is striking to see the contrast between continuing, spectacular progress on these kinds of benchmarks, and meanwhile, language models, including more recent ones with 'reasoning,' continuing to struggle with some simple commonsense problems," Vince Conitzer, a Carnegie Mellon University computer science professor specializing in AI, told TechCrunch. "I don't think it's all smoke and mirrors, but it illustrates that we still don't really know what behavior to expect from the next system. These systems are likely to be very impactful, so we urgently need to understand them and the risks they pose much better."

AlphaGeometry2 perhaps demonstrates that the two approaches -- symbol manipulation and neural networks -- combined are a promising path forward in the search for generalizable AI. Indeed, according to the DeepMind paper, o1, which also has a neural network architecture, couldn't solve any of the IMO problems that AlphaGeometry2 was able to answer. This may not be the case forever. In the paper, the DeepMind team said it found preliminary evidence that AlphaGeometry2's language model was capable of generating partial solutions to problems without the help of the symbolic engine. "[The] results support ideas that large language models can be self-sufficient without depending on external tools [like symbolic engines]," the DeepMind team wrote in the paper, "but until [model] speed is improved and hallucinations are completely resolved, the tools will stay essential for math applications."
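The article above describes a search algorithm that runs several proof searches in parallel and stores potentially useful findings in a common knowledge base that all searches can draw from. The miniature sketch below shows only that sharing pattern, using threads and a lock; the "derivation" step is a placeholder and nothing here reflects DeepMind's actual search.

```python
# Miniature sketch of parallel proof search with a shared knowledge base:
# several workers explore different "branches", and any fact one of them derives
# becomes immediately available to the others. The derivation step is a placeholder.
import threading

knowledge_base = set()            # facts shared by every search worker
kb_lock = threading.Lock()

def derive(branch, step, known):
    # Placeholder "deduction": each branch contributes facts of its own flavour
    # and can combine them with anything already in the shared knowledge base.
    fact = f"{branch}-fact-{step}"
    combos = {f"{fact}+{k}" for k in known if not k.startswith(branch)}
    return {fact} | combos

def search_worker(branch, steps=3):
    for step in range(steps):
        with kb_lock:
            known = set(knowledge_base)      # snapshot of everything found so far
        new_facts = derive(branch, step, known)
        with kb_lock:
            knowledge_base.update(new_facts)  # publish findings to all workers

workers = [threading.Thread(target=search_worker, args=(b,)) for b in ("beam0", "beam1")]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(f"shared knowledge base holds {len(knowledge_base)} facts")
```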
Google DeepMind's AI system, AlphaGeometry2, has achieved gold-medal-level performance in solving geometry problems from the International Mathematical Olympiad, outperforming the average human gold medallist and raising questions about the future of AI in mathematics.
Google DeepMind has unveiled AlphaGeometry2, an artificial intelligence system that has surpassed the average gold medalist's performance in solving geometry problems from the International Mathematical Olympiad (IMO). This breakthrough represents a significant advancement in AI's problem-solving capabilities, particularly in the field of mathematics [1].
AlphaGeometry2 is a sophisticated AI model that combines multiple core elements, including a language model from Google's Gemini family and a rules-based symbolic engine that uses mathematical rules to infer solutions.
The system's architecture allows it to speak a formal mathematical language, enabling automatic checking of its output for logical rigor and helping to weed out AI-generated hallucinations [2].
AlphaGeometry2 demonstrated remarkable problem-solving abilities, solving 42 of 50 problems drawn from IMO exams held between 2000 and 2024 (an 84% solve rate) and surpassing the average gold medallist's score of 40.9.
The DeepMind team employed innovative training methods for AlphaGeometry2, training its language model on synthetic data -- over 300 million algorithmically generated theorems and proofs -- rather than on human-crafted problems.
The success of AlphaGeometry2 has significant implications for AI research and mathematics, suggesting that hybrid systems combining neural networks with rules-based symbolic reasoning are a promising path toward more general problem-solving AI.
Despite its impressive performance, AlphaGeometry2 faces some limitations, including an inability to handle problems with a variable number of points, nonlinear equations, and inequalities, as well as a weaker showing on a harder set of problems nominated for, but not yet used in, IMO exams.
As AI continues to advance in mathematical problem-solving, researchers eagerly anticipate the next IMO in Sunshine Coast, Australia, in July. This event will provide a fresh set of problems to test the capabilities of AI systems like AlphaGeometry2, offering valuable insights into their real-world performance and potential.
Reference
[1] DeepMind AI crushes tough maths problems on par with top human solvers
[2] DeepMind AI achieves gold-medal level performance on challenging Olympiad math questions
[3] Google's DeepMind AI Can Solve Math Problems on Par with Top Human Solvers
[4] While We Grapple With Geometry, Google DeepMind's AI Model Beats Math Olympiad Gold Medalists (Analytics India Magazine)
[5] DeepMind claims its AI performs better than International Mathematical Olympiad gold medalists (TechCrunch)