Curated by THEOUTPOST
On Thu, 1 May, 4:02 PM UTC
5 Sources
[1]
Microsoft's most capable new Phi 4 AI model rivals the performance of far larger systems | TechCrunch
Microsoft launched several new "open" AI models on Wednesday, the most capable of which is competitive with OpenAI's o3-mini on at least one benchmark. All of the new permissively licensed models -- Phi 4 mini reasoning, Phi 4 reasoning, and Phi 4 reasoning plus -- are "reasoning" models, meaning they're able to spend more time fact-checking solutions to complex problems. They expand Microsoft's Phi "small model" family, which the company launched a year ago to offer a foundation for AI developers building apps at the edge. Phi 4 mini reasoning was trained on roughly 1 million synthetic math problems generated by Chinese AI startup DeepSeek's R1 reasoning model. Around 3.8 billion parameters in size, Phi 4 mini reasoning is designed for educational applications, Microsoft says, like "embedded tutoring" on lightweight devices. Parameters roughly correspond to a model's problem-solving skills, and models with more parameters generally perform better than those with fewer parameters. Phi 4 reasoning, a 14-billion-parameter model, was trained using "high-quality" web data as well as "curated demonstrations" from OpenAI's aforementioned o3-mini. It's best for math, science, and coding applications, according to Microsoft. As for Phi 4 reasoning plus, it's Microsoft's previously released Phi-4 model adapted into a reasoning model to achieve better accuracy on particular tasks. Microsoft claims that Phi 4 reasoning plus approaches the performance levels of R1, a model with significantly more parameters (671 billion). The company's internal benchmarking also has Phi 4 reasoning plus matching o3-mini on OmniMath, a math skills test. Phi 4 mini reasoning, Phi 4 reasoning, and Phi 4 reasoning plus are available on the AI dev platform Hugging Face accompanied by detailed technical reports. "Using distillation, reinforcement learning, and high-quality data, these [new] models balance size and performance," wrote Microsoft in a blog post. "They are small enough for low-latency environments yet maintain strong reasoning capabilities that rival much bigger models. This blend allows even resource-limited devices to perform complex reasoning tasks efficiently."
[2]
Microsoft just unveiled new Phi-4 reasoning AI models -- here's why they're a big deal
A new week, a new AI model. Joining the rush is Microsoft, launching three new models under the "Phi-4" range. These include Phi-4 reasoning, Phi-4-reasoning plus, and Phi-4-mini reasoning. Apart from showing its commitment to the weird naming schemes found in the AI world, these names also give away the type of AI model they are. Reasoning models have become all the rage recently. These are specifically trained to go beyond simple answer or image generation, engaging in a more logical approach to prompts. They are often better equipped for fact-checking and complex problems. Phi-4's reasoning range builds on Microsoft's "small model" family - a project started by Microsoft roughly a year ago. The aim here is to build small language models, prioritizing efficiency and low cost. These differ from large language models, which are fed huge amounts of data, allowing them to accomplish most tasks in great detail, but at a high cost. This is completely separate from Microsoft Copilot and Copilot 365 -- the brand's better-known AI models that serve both consumer and work-based tasks. Instead, Microsoft's new models focus on more specific tasks, prioritizing high-quality datasets and more specific training schedules to achieve high performance on a smaller scale. Phi-4 reasoning is a 14-billion-parameter model (parameters refer to how much a model knows; the higher the number, the more it generally knows). It was trained using high-quality web data and is best used for math, science, and coding applications. Phi-4-reasoning-plus is trained on a similar dataset, but it was trained to put in more effort to achieve better accuracy on particular tasks. Microsoft claims that Phi-4-reasoning-plus approaches the performance of DeepSeek R1, despite R1 having far more parameters (671 billion). All of these models, along with their technical reports, are available on the AI development website Hugging Face. However, for most people, Phi-4 won't be a model you'll end up using. These types of models are designed for more advanced purposes, such as research, coding, and scientific work. By building a model that is smaller and more affordable to run, Microsoft is joining the likes of DeepSeek and Alibaba, focusing on making this kind of technology more affordable compared to options like OpenAI's o3 models. This area of AI is advancing rapidly, seeing models take on more complicated tasks with less energy, shorter training windows, and more accessible user interfaces for more people to get involved with.
[3]
Microsoft launches Phi-4-Reasoning-Plus, a small, powerful, open weights reasoning model!
Microsoft Research has announced the release of Phi-4-reasoning-plus, an open-weight language model built for tasks requiring deep, structured reasoning. Building on the architecture of the previously released Phi-4, the new model integrates supervised fine-tuning and reinforcement learning to deliver improved performance on benchmarks in mathematics, science, coding, and logic-based tasks. Phi-4-reasoning-plus is a 14-billion-parameter dense decoder-only Transformer model that emphasizes quality over scale. Its training process involved 16 billion tokens -- about 8.3 billion of them unique -- drawn from synthetic and curated web-based datasets. A reinforcement learning (RL) phase, using only about 6,400 math-focused problems, further refined the model's reasoning capabilities. The model has been released under a permissive MIT license -- enabling its use for broad commercial and enterprise applications, and fine-tuning or distillation, without restriction -- and is compatible with widely used inference frameworks including Hugging Face Transformers, vLLM, llama.cpp, and Ollama. Microsoft provides detailed recommendations on inference parameters and system prompt formatting to help developers get the most from the model.

Outperforms larger models

The model's development reflects Microsoft's growing emphasis on training smaller models capable of rivaling much larger systems in performance. Despite its relatively modest size, Phi-4-reasoning-plus outperforms larger open-weight models such as DeepSeek-R1-Distill-Llama-70B on a number of demanding benchmarks. On the AIME 2025 math exam, for instance, it delivers a higher average first-attempt accuracy across all 30 questions (the metric known as "pass@1") than the 70B-parameter distillation model, and approaches the performance of DeepSeek-R1 itself, which is far larger at 671B parameters.

Structured thinking via fine-tuning

To achieve this, Microsoft employed a data-centric training strategy. During the supervised fine-tuning stage, the model was trained using a curated blend of synthetic chain-of-thought reasoning traces and filtered high-quality prompts. A key innovation in the training approach was the use of structured reasoning outputs marked with special <think> and </think> tokens. These guide the model to separate its intermediate reasoning steps from the final answer, promoting both transparency and coherence in long-form problem solving.

Reinforcement learning for accuracy and depth

Following fine-tuning, Microsoft used outcome-based reinforcement learning -- specifically, the Group Relative Policy Optimization (GRPO) algorithm -- to improve the model's output accuracy and efficiency. The RL reward function was crafted to balance correctness with conciseness, penalize repetition, and enforce formatting consistency. This led to longer but more thoughtful responses, particularly on questions where the model initially lacked confidence.

Optimized for research and engineering constraints

Phi-4-reasoning-plus is intended for use in applications that benefit from high-quality reasoning under memory or latency constraints. It supports a context length of 32,000 tokens by default and has demonstrated stable performance in experiments with inputs up to 64,000 tokens.
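For developers who want to experiment, the sketch below shows what inference might look like with Hugging Face Transformers, one of the supported runtimes listed above, asking the model to work through a problem step by step. The repo id "microsoft/Phi-4-reasoning-plus", the prompt wording, and the generation settings are illustrative assumptions, not Microsoft's documented recommendations.

```python
# Minimal, hedged sketch: load the model with Hugging Face Transformers and
# prompt it to reason step by step. Repo id and settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-reasoning-plus"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    # The model is reported to work best when told to reason before answering.
    {"role": "system", "content": "Reason through the problem step by step, "
                                  "then state the final answer."},
    {"role": "user", "content": "If 3x + 5 = 20, what is x?"},
]

# Build the chat-formatted prompt and generate a response.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.8)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```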
It is best used in a chat-like setting and performs optimally with a system prompt that explicitly instructs it to reason through problems step-by-step before presenting a solution.

Extensive safety testing and use guidelines

Microsoft positions the model as a research tool and a component for generative AI systems rather than a drop-in solution for all downstream tasks. Developers are advised to carefully evaluate performance, safety, and fairness before deploying the model in high-stakes or regulated environments. Phi-4-reasoning-plus has undergone extensive safety evaluation, including red-teaming by Microsoft's AI Red Team and benchmarking with tools like Toxigen to assess its responses across sensitive content categories. According to Microsoft, this release demonstrates that with carefully curated data and training techniques, small models can deliver strong reasoning performance -- and open, democratized access to boot.

Implications for enterprise technical decision-makers

The release of Microsoft's Phi-4-reasoning-plus may present meaningful opportunities for enterprise technical stakeholders managing AI model development, orchestration, or data infrastructure. For AI engineers and model lifecycle managers, the model's 14B-parameter size coupled with competitive benchmark performance introduces a viable option for high-performance reasoning without the infrastructure demands of significantly larger models. Its compatibility with frameworks such as Hugging Face Transformers, vLLM, llama.cpp, and Ollama provides deployment flexibility across different enterprise stacks, including containerized and serverless environments.

Teams responsible for deploying and scaling machine learning models may find the model's support for 32k-token contexts -- expandable to 64k in testing -- particularly useful in document-heavy use cases such as legal analysis, technical QA, or financial modeling. The built-in structure of separating chain-of-thought reasoning from the final answer could also simplify integration into interfaces where interpretability or auditability is required.

For AI orchestration teams, Phi-4-reasoning-plus offers a model architecture that can be more easily slotted into pipelines with resource constraints. This is relevant in scenarios where real-time reasoning must occur under latency or cost limits. Its demonstrated ability to generalize to out-of-domain problems, including NP-hard tasks like 3SAT and TSP, suggests utility in algorithmic planning and decision-support use cases beyond those explicitly targeted during training.

Data engineering leads may also consider the model's reasoning format -- designed to reflect intermediate problem-solving steps -- as a mechanism for tracking logical consistency across long sequences of structured data. The structured output format could be integrated into validation layers or logging systems to support explainability in data-rich applications, as the parsing sketch after this section illustrates.

From a governance and safety standpoint, Phi-4-reasoning-plus incorporates multiple layers of post-training safety alignment and has undergone adversarial testing by Microsoft's internal AI Red Team. For organizations subject to compliance or audit requirements, this may reduce the overhead of developing custom alignment workflows from scratch.
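As a concrete illustration of the auditability point above, here is a minimal sketch of splitting a response into its reasoning trace and final answer. The <think>/</think> tag names are assumed to match the output format described earlier, and the helper itself is hypothetical rather than part of any Microsoft tooling.

```python
# Minimal sketch: separate the model's intermediate reasoning from its final
# answer so the trace can be routed to audit logs while only the answer is
# shown to end users. Tag names are assumed to match the model's output.
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Return (reasoning_trace, final_answer) from a raw model response."""
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match:
        reasoning = match.group(1).strip()
        answer = response[match.end():].strip()
        return reasoning, answer
    return "", response.strip()  # no reasoning block found

reasoning, answer = split_reasoning(
    "<think>3x + 5 = 20, so 3x = 15 and x = 5.</think> x = 5"
)
print("AUDIT LOG:", reasoning)
print("ANSWER:", answer)
```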
Overall, Phi-4-reasoning-plus shows how the reasoning craze kicked off by the likes of OpenAI's "o" series of models and DeepSeek R1 is continuing to accelerate and move downstream to smaller, more accessible, affordable, and customizable models. For technical decision-makers tasked with managing performance, scalability, cost, and risk, it offers a modular, interpretable alternative that can be evaluated and integrated on a flexible basis -- whether in isolated inference endpoints, embedded tooling, or full-stack generative AI systems.
[4]
Microsoft releases small but mighty Phi-4 reasoning AI models that outperform larger models - SiliconANGLE
Microsoft Corp. announced Wednesday the release of three new advanced small language artificial intelligence models, extending its "Phi" range of AI models with reasoning capability. The new releases introduce Phi-4-reasoning, Phi-4-reasoning-plus and Phi-4-mini-reasoning, which add a thinking capability that allows them to break down complex queries and reason through them efficiently. The model family is designed to provide users with a model that can run locally on a PC graphics processing unit or mobile device. This release follows Microsoft's last release of Phi-3, which added multimodality to the efficient and compact model series. Phi-4-reasoning is a 14-billion-parameter open-weight model that the company says rivals larger models on complex tasks. Phi-4-reasoning-plus is a more advanced version with the same parameter count; it was trained with reinforcement learning to use about 1.5 times more tokens, delivering higher accuracy than the base model - though this also increases response time and compute. The smallest of the models, Phi-4-mini-reasoning, is designed to be loaded onto mobile and small-footprint devices. It is only a 3.8-billion-parameter open-weight model and was optimized for mathematical reasoning with an eye toward educational applications. "Phi-reasoning models introduce a new category of small language models. Using distillation, reinforcement learning, and high-quality data, these models balance size and performance," the Microsoft team said in a blog post. "They are small enough for low-latency environments yet maintain strong reasoning capabilities that rival much bigger models." To reach these capabilities, Microsoft trained its Phi-4-reasoning model using web data and curated demonstrations from OpenAI's o3-mini model. The Phi-4-mini-reasoning model was fine-tuned with synthetic teaching data generated by DeepSeek-R1 and was trained on over one million diverse math problems spanning multiple difficulty levels from middle school to Ph.D. Synthetic data is frequently used to train AI models by leveraging a "teacher AI" that curates and augments the training material for a student AI. This teacher model can generate thousands, even millions, of practice math and science problems, ranging from simple to complex. In reasoning-based scenarios, it provides step-by-step solutions rather than just final answers, enabling the student AI to learn how to solve problems, not just what the answers are. By tailoring the problems and solutions to a diverse set of mathematics, physics, and science curricula, the resulting model can achieve high performance while remaining compact and efficient in size. Microsoft said that despite their significantly smaller size, Phi-4-reasoning and Phi-4-reasoning-plus outperformed OpenAI o1-mini and DeepSeek-R1-Distill-Llama-70B on most benchmarks for mathematical and scientific reasoning at the Ph.D. level. The company went on to say the models could also exceed the full DeepSeek-R1 model, which weighs in at 671 billion parameters, on the AIME 2025 test, a 15-question, 3-hour exam used as a qualifier for the USA Mathematical Olympiad.
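To make the teacher-student idea above concrete, here is a rough, hypothetical sketch of such a data-generation loop. The `teacher_generate` function is a placeholder for whatever API serves the teacher model (for example DeepSeek-R1); none of the names or file formats here come from Microsoft's actual pipeline.

```python
# Conceptual sketch of the teacher-student distillation data pipeline: a large
# "teacher" reasoning model produces step-by-step solutions to math problems,
# which are stored as supervised fine-tuning examples for the small "student"
# model. All names below are illustrative placeholders.
import json

def teacher_generate(problem: str) -> str:
    """Placeholder for a call to the teacher model's API."""
    raise NotImplementedError("wire this to your teacher model endpoint")

def build_sft_dataset(problems: list[str], out_path: str) -> None:
    with open(out_path, "w", encoding="utf-8") as f:
        for problem in problems:
            # The teacher returns a full worked solution, not just an answer,
            # so the student learns how to solve, not only what to output.
            solution = teacher_generate(
                f"Solve step by step, then state the final answer:\n{problem}"
            )
            f.write(json.dumps({"prompt": problem, "completion": solution}) + "\n")

# Example usage (problems list is hypothetical):
# build_sft_dataset(middle_school_to_phd_problems, "phi4_mini_sft.jsonl")
```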
[5]
Microsoft Launches Phi-4 Reasoning AI Models to Rival DeepSeek R1
Microsoft says Phi-4 reasoning models can run on Windows Copilot+ PCs, thanks to their small size. Microsoft has launched three new AI reasoning models: Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning. These are small language models, designed for edge devices like Windows PCs and mobile devices. The Phi-4-reasoning AI model has 14 billion parameters and can perform complex reasoning tasks. The Phi-4-reasoning-plus model uses the same base model, but it uses more inference-time compute, nearly 1.5x more tokens than Phi-4-reasoning, to deliver higher accuracy. Despite being much smaller in size, Phi-4-reasoning models rival larger models such as DeepSeek R1 671B and o3-mini. In the GPQA benchmark, the Phi-4-reasoning-plus-14B model achieves 69.3% while o3-mini scores 77.7%. Next, in the AIME 2025 test, Phi-4-reasoning-plus-14B gets 78%, and o3-mini achieves 82.5%. This shows that Microsoft's small model comes very close to flagship reasoning models, which are much larger in size. Microsoft says Phi-4 reasoning models are trained via supervised fine-tuning "on carefully curated reasoning demonstrations from OpenAI o3-mini." Further, Microsoft writes, "The model demonstrates that meticulous data curation and high-quality synthetic datasets allow smaller models to compete with larger counterparts." Apart from that, the smaller Phi-4-mini-reasoning model, with just 3.8B parameters, outperforms many 7B and 8B models. In benchmarks like AIME 24, MATH 500, and GPQA Diamond, the Phi-4-mini-reasoning-3.8B model delivers competitive scores, nearly matching o1-mini. The Phi-4-mini model has been "fine-tuned with synthetic data generated by Deepseek-R1 model." Microsoft's Phi models are already being used locally on Windows Copilot+ PCs, and they leverage the built-in NPU. It will be interesting to see how the Phi-4 reasoning models improve on-device AI performance.
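For readers curious about local use, below is a minimal sketch with the Ollama Python client, one of the inference runtimes named in source [3] above. The model tag "phi4-reasoning" is an assumption and may differ from the tag actually published in the Ollama library.

```python
# Minimal, hedged sketch: run a Phi-4 reasoning model locally through Ollama's
# Python client. The model tag is assumed; check the Ollama library for the
# real published name before running this.
import ollama

response = ollama.chat(
    model="phi4-reasoning",  # assumed tag, may differ in the Ollama library
    messages=[{"role": "user", "content": "Reason step by step: what is 17 * 24?"}],
)
print(response["message"]["content"])
```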
Microsoft launches three new Phi-4 AI models that rival larger systems in reasoning tasks, showcasing advancements in efficient AI for edge devices and complex problem-solving.
Microsoft has unveiled a trio of new AI models under its Phi-4 range, designed to perform complex reasoning tasks while maintaining a relatively small size. The new models - Phi-4 reasoning, Phi-4-reasoning plus, and Phi-4-mini reasoning - expand Microsoft's "small model" family, which aims to offer efficient AI solutions for edge devices and resource-constrained environments [1][2].
The Phi-4 reasoning model boasts 14 billion parameters and is trained on high-quality web data and curated demonstrations from OpenAI's o3-mini. It excels in math, science, and coding applications [1]. Phi-4-reasoning plus, while maintaining the same parameter count, utilizes more compute power at inference time to achieve higher accuracy [5].
The smallest of the trio, Phi-4-mini reasoning, contains 3.8 billion parameters and is specifically designed for educational applications and lightweight devices. It was trained on approximately one million synthetic math problems generated by DeepSeek's R1 reasoning model [1][4].
Despite their compact size, these models have shown remarkable performance:
- Phi-4-reasoning plus approaches DeepSeek R1 (671 billion parameters) and matches o3-mini on the OmniMath math test in Microsoft's internal benchmarking [1].
- On the AIME 2025 test, Phi-4-reasoning plus scores 78% versus o3-mini's 82.5%, and it reaches 69.3% on GPQA against o3-mini's 77.7% [5].
- Phi-4 reasoning and Phi-4-reasoning plus outperform OpenAI's o1-mini and DeepSeek-R1-Distill-Llama-70B on most math and science reasoning benchmarks [4].
- Phi-4-mini reasoning outperforms many 7B and 8B models on AIME 24, MATH 500, and GPQA Diamond, nearly matching o1-mini [5].
Microsoft employed several innovative techniques in developing these models:
- Supervised fine-tuning on carefully curated reasoning demonstrations from OpenAI's o3-mini [1][5].
- Distillation from larger "teacher" models, with Phi-4-mini reasoning fine-tuned on roughly one million synthetic math problems generated by DeepSeek's R1 [1][4].
- Outcome-based reinforcement learning using the Group Relative Policy Optimization (GRPO) algorithm, with a reward function balancing correctness, conciseness, and formatting consistency [3].
- Structured reasoning outputs that separate intermediate steps from the final answer [3].
All three models are available on the AI development platform Hugging Face, accompanied by detailed technical reports [1]. They are released under a permissive MIT license, allowing for broad commercial and enterprise applications without restrictions [3].
The models are compatible with widely used inference frameworks, including Hugging Face Transformers, vLLM, llama.cpp, and Ollama [3]. They support a context length of 32,000 tokens by default, with experiments showing stable performance up to 64,000 tokens [3].
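As a hedged illustration of that framework support, the sketch below shows offline inference with vLLM at the default 32k-token context. The repo id and sampling settings are assumptions, not documented recommendations.

```python
# Minimal sketch: offline inference with vLLM, one of the supported frameworks
# listed above, configured for the reported 32k default context window.
from vllm import LLM, SamplingParams

llm = LLM(model="microsoft/Phi-4-reasoning-plus", max_model_len=32768)  # assumed repo id
params = SamplingParams(temperature=0.8, max_tokens=2048)

# Single chat-style conversation; vLLM applies the model's chat template.
outputs = llm.chat(
    [{"role": "user", "content": "Think step by step: is 2027 a prime number?"}],
    sampling_params=params,
)
print(outputs[0].outputs[0].text)
```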
The release of these models represents a significant step in making powerful AI more accessible and efficient:
- They are small enough to run locally on PC GPUs, mobile devices, and Windows Copilot+ PCs, where they leverage the built-in NPU [4][5].
- They suit low-latency, resource-constrained settings such as edge devices and embedded tutoring applications [1][4].
- By lowering the cost of reasoning-capable AI, Microsoft joins the likes of DeepSeek and Alibaba in making this class of technology more affordable than options like OpenAI's o3 models [2].
Microsoft has conducted extensive safety evaluations, including red-teaming and benchmarking with tools like Toxigen [3]. However, the company advises careful evaluation of performance, safety, and fairness before deploying the models in high-stakes or regulated environments [3].
This development demonstrates that with carefully curated data and advanced training techniques, small models can deliver strong reasoning performance, potentially democratizing access to powerful AI tools across various industries and applications.
Microsoft has released a new series of Phi-3.5 AI models, showcasing impressive performance despite their smaller size. These models are set to compete with offerings from OpenAI and Google, potentially reshaping the AI landscape.
4 Sources
Microsoft unveils Phi-4, a 14-billion-parameter AI model that challenges the "bigger is better" paradigm by outperforming larger models in mathematical reasoning and language processing tasks while using fewer computational resources.
10 Sources
Microsoft has released its Phi-4 small language model as open-source, making it freely available on Hugging Face. Despite its compact size, Phi-4 demonstrates impressive performance in various benchmarks, challenging larger models.
5 Sources
Microsoft introduces Phi-4-multimodal and Phi-4-mini, new small language models capable of processing text, speech, and visual data with impressive efficiency and performance.
5 Sources
Microsoft introduces rStar-Math, a small language model (SLM) that outperforms larger models in solving complex math problems, showcasing the potential of efficient AI in specialized tasks.
3 Sources