Curated by THEOUTPOST
On Fri, 10 Jan, 8:02 AM UTC
3 Sources
[1]
Microsoft introduces rStar-Math, an SLM for math reasoning and problem solving
A team of math and AI researchers at Microsoft Research Asia has designed and developed a small language model (SLM) that can be used to solve math problems. The group has posted a paper on the arXiv preprint server outlining the technology and math behind the new tool and how well it has performed on standard benchmarks.
Over the past several years, multiple tech giants have been working hard to steadily improve their LLMs, resulting in AI products that have in a very short time become mainstream. Unfortunately, such tools require massive amounts of computing power, which means they consume a lot of electricity, making them expensive to maintain. Because of that, some in the field have been turning to SLMs, which, as their name implies, are smaller and thus far less resource intensive. Some are small enough to run on a local device.
One of the main ways AI researchers make the best use of SLMs is by narrowing their focus: instead of trying to answer any question about anything, the models are designed to answer questions about something much more specific, like math. In this new effort, Microsoft has focused not just on solving math problems but also on teaching an SLM how to reason its way through a problem.
Microsoft also built its model in a way that allows for its use by other, larger models, an overall strategy that could be the wave of the future: new LLMs could soon be nothing more than an amalgam of many SLMs.
Notably, the announcement by Microsoft came not long after the debut of its Phi-4 SLM, which also serves to solve math problems. rStar-Math does its work differently than Phi-4, the researchers note, by making use of Monte Carlo Tree Search, a search method that works through a problem in a step-by-step process, much as humans do. They note that by using such an approach, their new SLM can break a problem down into smaller parts as a way to figure out how to solve it.
They also note that rStar-Math shows its work by outputting its thought process in both Python code and natural language. The team also noted that rStar-Math has already scored well on several benchmarks. And according to a post on Hugging Face, the team plans to make the code and data publicly available on GitHub.
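To make the Monte Carlo Tree Search idea concrete, here is a minimal sketch of the generic algorithm (selection, expansion, simulation, backpropagation) on a toy number puzzle. It is not Microsoft's implementation: the puzzle, the constants `START`, `TARGET` and `MAX_STEPS`, the two moves, and the exploration weight `c=1.4` are all assumptions chosen for illustration, and random moves stand in for the language model that would propose reasoning steps in rStar-Math.

```python
import math
import random

# Toy task (an assumption for illustration): reach TARGET from START
# in at most MAX_STEPS moves, where each move is "+3" or "*2".
START, TARGET, MAX_STEPS = 1, 11, 4
ACTIONS = {"+3": lambda x: x + 3, "*2": lambda x: x * 2}


class Node:
    def __init__(self, value, depth, parent=None, action=None):
        self.value, self.depth = value, depth
        self.parent, self.action = parent, action
        self.children = {}        # action name -> Node
        self.visits = 0
        self.total_reward = 0.0

    def is_terminal(self):
        return self.value == TARGET or self.depth == MAX_STEPS

    def q(self):
        # Mean reward of all simulations that passed through this node.
        return self.total_reward / self.visits if self.visits else 0.0


def uct_child(node, c=1.4):
    # UCT score = average reward (exploitation) + visit-based bonus (exploration).
    return max(
        node.children.values(),
        key=lambda ch: ch.q() + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(value, depth, rng):
    # Finish the attempt with random moves; reward 1.0 only if TARGET is hit.
    while value != TARGET and depth < MAX_STEPS:
        value = ACTIONS[rng.choice(list(ACTIONS))](value)
        depth += 1
    return 1.0 if value == TARGET else 0.0


def mcts(iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(START, 0)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while every move from this node has been tried.
        while not node.is_terminal() and len(node.children) == len(ACTIONS):
            node = uct_child(node)
        # 2. Expansion: add one untried move as a new child.
        if not node.is_terminal():
            name = rng.choice([a for a in ACTIONS if a not in node.children])
            child = Node(ACTIONS[name](node.value), node.depth + 1, node, name)
            node.children[name] = child
            node = child
        # 3. Simulation: estimate how promising the new node is.
        reward = rollout(node.value, node.depth, rng)
        # 4. Backpropagation: update statistics on the path back to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return root


def best_path(root):
    # Read off a solution by following the most-visited child at each level.
    path, node = [], root
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
        path.append(node.action)
    return path


root = mcts()
path = best_path(root)
print(path)
```

The per-node average reward returned by `q()` is the kind of step-level score that the search accumulates as a by-product; each node here plays the role of one intermediate step in a solution.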
[2]
Microsoft Launches rStar-Math, Achieves Top-Level Math Reasoning
Smaller models are easier to use, require less powerful hardware, and make advanced AI tools available to more people and organisations.
Microsoft researchers have developed 'rStar-Math', a method that enables small language models (SLMs) to solve challenging math problems with remarkable accuracy, matching or even surpassing larger models like OpenAI's o1. Instead of relying on knowledge distillation from bigger models, rStar-Math allows smaller models to improve independently through self-evolution. "Our work demonstrates that small language models can achieve frontier-level performance in math reasoning through self-evolution and careful step-by-step verification," the researchers said in the paper.
Why does this matter? Smaller models are easier to use, require less powerful hardware, and make advanced AI tools available to more people and organisations. They are especially useful in areas like education, math, coding, and research, where accurate, step-by-step reasoning is crucial. The planned open-source release of rStar-Math, together with Microsoft's Phi-4 model on Hugging Face, would allow others to customise and use these tools for a wide range of applications, making AI more affordable and accessible.
The system uses Monte Carlo Tree Search (MCTS), a strategy often used in games like chess, to tackle problems in smaller, manageable steps. Each step is validated with code execution to ensure accuracy, avoiding the common issue of producing correct answers with flawed reasoning.
Features of rStar-Math: rStar-Math incorporates three innovations to improve performance. It uses MCTS rollouts to generate step-by-step training data whose steps are verified by code execution. A process preference model (PPM) evaluates and guides intermediate steps without relying on imprecise scoring. The system then evolves iteratively over four rounds to refine models and data for solving increasingly complex problems.
On the MATH benchmark, the accuracy of the Qwen2.5-Math-7B model increased from 58.8% to 90.0%, outperforming OpenAI's o1-preview. The system also solved 53.3% of problems on the American Invitational Mathematics Examination (AIME), ranking in the top 20% of high school competitors. It performed strongly on other benchmarks, including GSM8K, Olympiad Bench, and college-level challenges.
The study highlights the potential of smaller AI models to achieve advanced reasoning capabilities typically associated with larger systems. It also shows how such models can develop intrinsic self-reflection, enabling them to identify and correct errors during problem-solving. The researchers plan to release the framework, along with its code and data, as open source on GitHub. This would make it accessible to researchers and developers, paving the way for smaller, more efficient AI systems capable of handling complex reasoning tasks.
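The idea of validating each step with code execution can be sketched as follows. This is an illustrative assumption about how such a filter might look, not the researchers' code: the helper names `run_step` and `filter_steps` and the apple word problem are hypothetical, and a real system would run model-generated code in a sandbox rather than with a bare `exec`.

```python
def run_step(code, namespace):
    """Try one step on a copy of the current variables.

    Returns (True, updated_variables) if the code runs,
    or (False, unchanged_variables) if it raises an exception.
    """
    trial = dict(namespace)
    try:
        exec(code, trial)  # unsafe for untrusted input; a sandbox is assumed in practice
    except Exception:
        return False, namespace
    return True, trial


def filter_steps(candidate_steps):
    """Keep only the steps whose code executes, carrying variables forward."""
    namespace, kept = {}, []
    for code in candidate_steps:
        ok, namespace = run_step(code, namespace)
        if ok:
            kept.append(code)
    return kept, namespace


# Hypothetical candidate steps for "12 apples, half are eaten, how many remain?".
# Each step pairs a natural-language comment with the Python that carries it out.
steps = [
    "# Step 1: there are 12 apples\nx = 12",
    "# Step 2 (faulty): uses a variable that was never defined\ny = x / z",
    "# Step 3: half of the apples are eaten\ny = x // 2",
    "# Step 4: the rest remain\nanswer = x - y",
]

kept, variables = filter_steps(steps)
print(len(kept), variables["answer"])
```

Here the faulty second step raises a `NameError` and is discarded, while the other three run and produce the final answer. Executing a step only shows that its code is well formed and consistent with earlier variables; it does not by itself prove the reasoning is correct, which is where step-level scoring comes in.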
[3]
Microsoft's new rStar-Math technique upgrades small models to outperform OpenAI's o1-preview at math problems
Microsoft is doubling down on the potential of small language models (SLMs) with the unveiling of rStar-Math, a new reasoning technique that can be applied to small models to boost their performance on math problems, matching and in some cases exceeding the performance of OpenAI's o1-preview model.
While still in a research phase (as outlined in a paper published on preprint site arXiv.org and credited to eight authors at Microsoft, Peking University and Tsinghua University in China), the technique was applied to several different smaller open source models, including Microsoft's own Phi-3 mini, Alibaba's Qwen-1.5B (a 1.5-billion-parameter model), and Qwen-7B (a 7-billion-parameter model). It showed improved performance on all of them, even exceeding OpenAI's previously most advanced model on MATH, a third-party word-problem benchmark of 12,500 questions covering various branches such as geometry and algebra, and all levels of difficulty.
Ultimately, according to a post on Hugging Face, the researchers plan to make their code and data available on GitHub at https://github.com/microsoft/rStar, though one of the paper's authors, Li Lyna Zhang, wrote in the comments on the Hugging Face post that the team is "still undergoing the internal review process for open-source release." As such, "the repository remains private for now. Please stay tuned!"
Community members expressed enthusiasm, calling the innovations "impressive" and praising the blend of Monte Carlo Tree Search with step-by-step reasoning. One commenter highlighted the simplicity and utility of using Q-values for step scoring, while others speculated on future applications in geometric proofs and symbolic reasoning.
This news follows closely on the heels of the open-sourcing of Microsoft's Phi-4 model, a smaller 14-billion-parameter AI system now available on Hugging Face under the permissive MIT license. While the Phi-4 release has expanded access to high-performance small models, rStar-Math showcases a specialized approach: using smaller AI systems to achieve state-of-the-art results in mathematical reasoning.
rStar-Math works by using several different models and components to help a target small model 'self-evolve'
The key to rStar-Math is that it leverages Monte Carlo Tree Search (MCTS), a method that mimics human "deep thinking" by iteratively refining step-by-step solutions to mathematical problems. The researchers used MCTS because it "breaks down complex math problems into simpler single-step generation tasks, reducing the difficulty" for smaller models.
However, they didn't just apply MCTS as other researchers in the past have done. Instead, in a stroke of brilliance, they also had the model they trained always output its "chain-of-thought" reasoning steps as both natural language descriptions and Python code. They required the model to include the natural language responses as Python code comments, and only those outputs using Python would be used to train the model.
The researchers also trained a "policy model" to generate math reasoning steps and a Process Preference Model (PPM) to select the most promising steps toward answering the problems, and improved them both over four rounds of "self-evolution," with each model improving the other.
For their starting data, the researchers said they used "747,000 math word problems from publicly available sources," along with their solutions, but generated new steps for solving them with the two models described above.
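The articles do not spell out how the Process Preference Model is trained, so the following is only a sketch of one common way to learn from preferences rather than absolute scores. It assumes each candidate step already carries a Q-value from the tree search, pairs the highest-scoring steps with the lowest-scoring ones, and applies a Bradley-Terry style pairwise loss. The function names, the `k=2` pairing rule, and the example Q-values are all hypothetical.

```python
import math


def preference_pairs(steps_with_q, k=2):
    """Pair the k highest-Q steps (preferred) with the k lowest-Q steps (rejected).

    steps_with_q is a list of (step_text, q_value) tuples for the same partial solution.
    Pairs whose two members do not differ in Q-value are skipped.
    """
    ranked = sorted(steps_with_q, key=lambda s: s[1], reverse=True)
    top, bottom = ranked[:k], ranked[-k:]
    return [(pos[0], neg[0]) for pos in top for neg in bottom if pos[1] > neg[1]]


def pairwise_loss(score_pos, score_neg):
    """Bradley-Terry style loss: -log(sigmoid(score_pos - score_neg)).

    The loss is small when the model scores the preferred step above the rejected one.
    """
    return math.log1p(math.exp(-(score_pos - score_neg)))


# Hypothetical candidate next steps with Q-values taken from a tree search.
candidates = [("a", 0.9), ("b", 0.1), ("c", 0.7), ("d", -0.5)]
pairs = preference_pairs(candidates, k=2)
print(pairs)
print(pairwise_loss(2.0, 0.0), pairwise_loss(0.0, 2.0))
```

The design point is that only the ordering within a pair matters, so noisy Q-values need to be right about which step is better, not about how good each step is in absolute terms.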
Record-Breaking Results
After four rounds of self-evolution, rStar-Math achieved significant milestones:
* On the MATH benchmark, the accuracy of the Qwen2.5-Math-7B model jumped from 58.8% to 90.0%, outperforming OpenAI o1-preview.
* On the American Invitational Mathematics Examination (AIME), it solved 53.3% of problems, placing among the top 20% of high school competitors.
These results highlight the power of SLMs in handling complex mathematical reasoning, traditionally dominated by larger systems.
Smaller is better?
In recent years, AI innovation has largely been driven by scaling up language models, with increasing parameters seen as a way to improve performance. Yet the high costs associated with these massive models, from computational resources to energy consumption, have raised questions about scalability.
Microsoft is offering an alternative path, focusing on efficiency. The release of rStar-Math further underscores this commitment by demonstrating how SLMs can rival, and in some cases exceed, the capabilities of their larger counterparts.
Microsoft's dual releases of Phi-4 and the rStar-Math paper suggest that compact, specialized models can provide powerful alternatives to the industry's largest systems. Moreover, by outperforming larger competitors in key benchmarks, these models challenge the notion that bigger is always better. They open doors for mid-sized organizations and academic researchers to access cutting-edge capabilities without the financial or environmental burden of massive models.
Microsoft introduces rStar-Math, a small language model (SLM) that outperforms larger models in solving complex math problems, showcasing the potential of efficient AI in specialized tasks.
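Microsoft has introduced rStar-Math, a small language model (SLM) designed to solve complex mathematical problems with remarkable accuracy. This innovation represents a significant shift in AI development, focusing on specialized, efficient models rather than large-scale systems [1].
rStar-Math demonstrates that SLMs can achieve frontier-level performance in math reasoning through self-evolution and careful step-by-step verification [2]. This approach offers several advantages:
* Smaller models are easier to use.
* They require less powerful hardware.
* They make advanced AI tools available to more people and organisations.
The model incorporates three key innovations [2]:
* MCTS rollouts that generate step-by-step training data, with each step validated by code execution.
* A process preference model (PPM) that evaluates and guides intermediate steps without relying on imprecise scoring.
* Four rounds of iterative self-evolution that refine the models and data for increasingly complex problems.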
rStar-Math outputs its thought process in both Python code and natural language, allowing for transparent reasoning [1].
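rStar-Math has achieved remarkable results on several mathematical benchmarks:
* On the MATH benchmark, the accuracy of the Qwen2.5-Math-7B model rose from 58.8% to 90.0%, outperforming OpenAI's o1-preview [2][3].
* On the American Invitational Mathematics Examination (AIME), it solved 53.3% of problems, placing among the top 20% of high school competitors [3].
* It also performed strongly on GSM8K, Olympiad Bench, and college-level challenges [2].
Microsoft's focus on SLMs challenges the notion that bigger models are always better. rStar-Math demonstrates that smaller, specialized models can rival or exceed the capabilities of larger systems [3].
This approach offers several benefits:
* Lower computational and energy costs than massive models [3].
* Access to cutting-edge capabilities for mid-sized organizations and academic researchers [3].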
Microsoft plans to make the rStar-Math framework, along with its code and data, open-source and available on GitHub [2]. This move will enable researchers and developers to build upon and customize the technology for various applications.
The release of rStar-Math follows closely on the heels of Microsoft's Phi-4 model, another SLM focused on math problem-solving [3]. These developments suggest a growing trend towards more efficient and specialized AI models in the industry.