Curated by THEOUTPOST
On Sat, 1 Feb, 12:08 AM UTC
35 Sources
[1]
OpenAI o3-mini vs o1-mini AI Models Compared: Which OpenAI Model is Right for You?
Choosing the right AI model can feel a bit like picking the perfect tool from a crowded toolbox -- each option has its strengths, but finding the one that truly fits your needs can be tricky. Whether you're a developer looking to streamline coding tasks or a creative writer seeking a reliable assistant, the decision often boils down to balancing speed, precision, and cost. OpenAI's o3-mini and o1-mini models have sparked plenty of discussion for exactly this reason, with each offering distinct advantages depending on the task at hand. In this guide, Corbin Brown compares OpenAI o3-mini vs o1-mini, revealing the key differences between the o3-mini and the now-discontinued o1-mini, while also exploring how the o3-mini High variant stacks up for more demanding tasks. Whether you're curious about coding performance, creative writing capabilities, or cost efficiency, we'll help you make sense of the trade-offs so you can choose the model that best aligns with your goals. By the end, you'll have a clearer picture of which AI model is the right fit for your unique needs -- without the guesswork. Selecting the right AI model requires a thorough understanding of its capabilities, limitations, and how well it aligns with your specific needs. OpenAI's o3-mini and o1-mini models are designed to address different priorities, such as speed, cost efficiency, and output quality. The o3-mini model is engineered to prioritize speed and efficiency, making it an excellent choice for tasks that demand quick outputs. However, this emphasis on speed comes with a trade-off in terms of detail and precision. For users who require more comprehensive and nuanced responses, the o3-mini High variant offers a better alternative. While slightly slower, it compensates with enhanced accuracy and depth. The now-discontinued o1-mini previously served as a middle ground, balancing speed and detail. Its phase-out in favor of the o3-mini series reflects OpenAI's commitment to refining its newer models to meet evolving user demands. This shift underscores the importance of adapting AI tools to align with the growing complexity of user needs. Coding tasks reveal some of the most significant differences between these models. The o3-mini High excels in delivering detailed and logical outputs, making it particularly well-suited for complex programming challenges. Its advanced search capabilities allow real-time access to updated API documentation, which reduces errors and saves valuable time for developers. In contrast, the standard o3-mini, while faster, often requires additional user input to refine its outputs. This can be a limitation for developers seeking immediate accuracy but is still a viable option for simpler coding tasks or rapid prototyping. The distinction between these models highlights the importance of selecting the right tool based on the complexity of your coding requirements. Here is a selection of other guides from our extensive library of content you may find of interest on AI model comparisons. When it comes to creative writing, the differences between the o3-mini and older models like the o1-mini are less pronounced. While the o3-mini offers slight enhancements in coherence and tone, these improvements are not as substantial as those observed in coding tasks. This suggests that OpenAI's recent advancements have been more focused on logic-based applications rather than creative writing. Writers may find the performance of both models comparable, with only minor variations in style and depth. 
For tasks that demand a high level of creativity, the o3-mini High may provide a slight edge, but the standard o3-mini remains a practical option for general writing needs. The o3-mini series strikes a balance between speed and clarity, but not without certain trade-offs. The standard o3-mini delivers faster response times, which is advantageous for users who prioritize efficiency. However, this speed can sometimes result in less precise outputs, requiring you to provide additional clarification or context to achieve the desired results. Higher-tier models like the o3-mini High address this issue by offering more detailed and self-sufficient responses. These models are ideal for tasks that demand minimal back-and-forth interaction, making sure a smoother user experience. For simpler queries or tasks, however, the standard o3-mini remains a cost-effective and practical choice. One of the standout features of the o3-mini model is its cost efficiency. The o3-mini API is 93% cheaper than the o1-mini, making it an attractive option for developers working within budget constraints. Additionally, its lower latency ensures faster response times, which is particularly beneficial for time-sensitive applications. API updates within the o3-mini series are designed to be straightforward, often requiring only minor adjustments such as updating model names. This simplicity allows developers to integrate new functionalities into their software with minimal effort, reducing both development time and costs. For businesses and developers alike, this combination of affordability and ease of use makes the o3-mini series a compelling choice. The evolution of AI models like the o3-mini reflects broader industry trends. Recent advancements have focused on optimizing coding and logic-based tasks, aligning with the growing demand for AI-driven software development tools. These improvements cater to industries that require precision, speed, and cost-effective solutions, making AI more accessible to a wider audience. While creative writing capabilities have seen less dramatic progress, the emphasis on logic-based applications highlights a shift in priorities. As AI technology continues to advance, further refinements in areas such as real-time search, latency reduction, and seamless software integration are expected. These developments will likely shape the future of AI tools, making sure they remain relevant and effective across diverse applications. Choosing the right AI model depends on your specific requirements. Here are some recommendations to help guide your decision: The o3-mini series offers a versatile solution for users with diverse needs. By understanding the strengths and limitations of each model, you can select the one that best aligns with your goals. Whether you prioritize speed, cost efficiency, or detailed outputs, the o3-mini series provides options to ensure optimal performance and efficiency for your tasks. As AI technology continues to evolve, these models represent a step forward in delivering practical, user-focused solutions.
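As a rough illustration of the API-migration point made above, here is a minimal sketch assuming the official OpenAI Python SDK; the prompt and variable names are placeholders rather than anything from the article, and the point is simply that, per the article, moving to the o3-mini series can be as small as changing the model string.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Previously a request might have targeted o1-mini:
# response = client.chat.completions.create(model="o1-mini", messages=messages)

# Per the article, migrating to the o3-mini series is largely a matter of
# updating the model name; the rest of the request can stay the same.
messages = [{"role": "user", "content": "Summarize the trade-offs between speed and detail."}]
response = client.chat.completions.create(
    model="o3-mini",  # was "o1-mini"
    messages=messages,
)
print(response.choices[0].message.content)
```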
[2]
OpenAI o3 Mini Faster and Smarter: But Is It the AI You Need?
The OpenAI o3 Mini is the latest addition to OpenAI's reasoning series, specifically designed to deliver high performance in STEM-focused tasks. Available through ChatGPT and API access, this model is tailored for precision-driven applications such as mathematics, coding, and computer science. With three performance tiers -- low, medium, and high -- it offers flexibility to meet a variety of technical requirements. While the o3 Mini introduces notable improvements in speed and efficiency, it also has certain limitations, particularly in vision processing and autonomous workflows. These strengths and weaknesses make it a specialized tool for specific use cases. Whether you're a developer, data analyst, or someone who simply loves diving into technical challenges, the o3 Mini seems to check many of the right boxes. With its three performance tiers -- low, medium, and high -- it offers flexibility for different needs, making it accessible to a wide range of users. Yet, as impressive as its speed and precision may be, the model isn't without its quirks and limitations. In this overview by Prompt Engineering learn more about what makes the o3 Mini AI model stand out, where it falls short, and whether it's the right fit for your unique workflow. The o3 Mini is designed to cater to a wide range of users, from free-tier subscribers to Pro users, though access levels differ significantly. Pro users benefit from unlimited access to all three performance tiers, while free-tier users face stricter quotas. This accessibility, combined with its advanced features, makes the o3 Mini a versatile option for developers and technical professionals. Key features include: These features are particularly appealing to developers who require efficiency and precision in their work. By focusing on these advanced functionalities, the o3 Mini positions itself as a practical tool for technical professionals seeking to optimize their workflows. For professionals working in STEM disciplines, the o3 Mini offers significant advantages. Its high-performance variant delivers a 25% improvement in response speed compared to earlier models like the O1 Mini, making it highly effective for computationally intensive tasks such as coding and mathematical problem-solving. Additionally, its ability to handle complex function calls and generate structured outputs enhances its utility in technical domains. These capabilities make the o3 Mini a top-tier choice for users who prioritize speed, accuracy, and efficiency in their work. The model's focus on STEM applications ensures that it excels in areas requiring logical reasoning and precision. Whether you're a software developer tackling intricate coding challenges or a data analyst working on complex calculations, the o3 Mini provides the tools needed to streamline your tasks. However, its specialized nature means it may not be the best fit for users with broader or more generalized needs. Advance your skills in OpenAI o3 Mini by reading more of our detailed content. Despite its strengths, the o3 Mini has several limitations that may impact its suitability for certain users. One of its most notable shortcomings is the lack of vision capabilities, which makes it unsuitable for tasks involving image processing or multimodal inputs. This limitation restricts its application in fields that rely on visual data analysis or image-based workflows. Additionally, the o3 Mini's performance in agentic workflows -- where autonomous decision-making and task execution are critical -- is underwhelming. 
This makes it less effective for users who require AI solutions capable of operating independently. Furthermore, its multilingual processing capabilities are inconsistent, with mixed results in benchmarks such as MLE Bench. These weaknesses suggest that while the o3 Mini is highly effective for STEM-specific tasks, it may not be the ideal choice for users seeking a more versatile or general-purpose AI model. The o3 Mini demonstrates significant improvements in STEM-related benchmarks compared to its predecessors and other models in the GPT-4 series. For example, it excels in coding and mathematical reasoning, outperforming earlier models in these areas. However, its performance in other domains, such as multilingual and agentic tasks, is less consistent. Key observations include: These mixed results highlight the o3 Mini's specialized nature. While it is a powerful tool for users with specific technical needs, it may not meet the expectations of those looking for a more universal AI solution. The o3 Mini introduces several enhancements aimed at improving developer productivity. These features are designed to simplify workflows and enhance efficiency, making the model particularly appealing to technical professionals. Key developer-friendly features include: These functionalities make the o3 Mini a valuable tool for developers working in fields such as software development, data analysis, and computational research. However, its limitations in agentic workflows may pose challenges for those requiring more autonomous AI solutions. Developers should carefully evaluate their specific needs to determine whether the o3 Mini aligns with their requirements. The o3 Mini is best suited for tasks that demand precision, reasoning, and efficiency. Its specialized capabilities make it an excellent choice for professionals in technical domains. Ideal use cases include: While the o3 Mini excels in these areas, it is less effective for general writing tasks or applications requiring advanced agentic capabilities. Users involved in multilingual processing or autonomous workflows may find its performance less satisfactory compared to other models. As such, the o3 Mini is best viewed as a specialized tool for technical professionals rather than a one-size-fits-all solution. The OpenAI o3 Mini represents a focused advancement in AI technology, particularly for STEM applications. Its speed, efficiency, and developer-centric features make it a strong contender for technical tasks. However, its limitations in vision capabilities, agentic workflows, and multilingual processing underscore the importance of aligning the model with your specific needs. For users in technical fields, the o3 Mini offers a powerful, specialized tool that can significantly enhance productivity. For broader applications, its narrow focus may require supplementation with other models or solutions to achieve optimal results.
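The function-call handling and structured outputs highlighted above are easiest to see in code. The sketch below is a hedged example assuming the OpenAI Python SDK's Chat Completions tool-calling interface; the get_prime_factors tool and the prompt are invented for illustration, not something the o3 Mini ships with, and the sketch assumes the model chooses to call the tool.

```python
import json
from openai import OpenAI

client = OpenAI()

# A hypothetical local tool the model can request; any function would do.
def get_prime_factors(n: int) -> list[int]:
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

tools = [{
    "type": "function",
    "function": {
        "name": "get_prime_factors",
        "description": "Return the prime factorization of a positive integer.",
        "parameters": {
            "type": "object",
            "properties": {"n": {"type": "integer"}},
            "required": ["n"],
        },
    },
}]

messages = [{"role": "user", "content": "What are the prime factors of 3960?"}]
response = client.chat.completions.create(model="o3-mini", messages=messages, tools=tools)

# Assumes the model decided to call the tool; a robust client would check first.
call = response.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
result = get_prime_factors(args["n"])

# Feed the tool result back so the model can produce a final answer.
messages.append(response.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})
final = client.chat.completions.create(model="o3-mini", messages=messages, tools=tools)
print(final.choices[0].message.content)
```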
[3]
ChatGPT o3-mini Review: Enhanced AI Performance at Reduced Costs
Whether you're a developer, data analyst, or someone navigating the ever-evolving world of automation, the need for smarter, faster, and more cost-effective tools has never been greater. To help you on this journey, OpenAI has recently launched ChatGPT o3-mini, its latest reasoning model, designed to tackle a wide variety of challenges head-on. It's not just another AI tool -- it's a fantastic option for anyone looking to streamline their workflows and boost productivity without breaking the bank. What sets the o3-mini apart isn't just its impressive performance enhancements or affordability (though those are huge perks). It's the way it seamlessly integrates advanced features like function calling, structured outputs, and even live web search to make your life easier. Whether you're debugging code, organizing raw data, or simply trying to save time on repetitive tasks, this model promises to deliver. But how does it stack up in real-world scenarios? And is it truly the right fit for your needs? AI Foundations explores what makes the o3-mini a standout in the crowded AI landscape. The o3-mini delivers substantial performance enhancements compared to its predecessors, particularly the o1 series. It achieves a 24% faster response time than the o1-mini and reduces significant errors in complex tasks by 39%. These improvements are especially beneficial in areas requiring advanced logical reasoning, such as coding intricate algorithms or solving challenging mathematical problems. Whether you are analyzing scientific datasets or debugging complex code, the o3-mini ensures both accuracy and efficiency. By focusing on measurable improvements, the o3-mini enhances productivity in technical fields. Its ability to handle complex reasoning tasks with greater precision makes it a reliable choice for professionals seeking dependable AI solutions. One of the standout features of the o3-mini is its exceptional affordability. OpenAI has reduced per-token pricing by an impressive 95% since the launch of GPT-4, making this model accessible to a broader audience. Despite its lower cost, the o3-mini maintains a high standard of performance, offering advanced reasoning capabilities without sacrificing quality. This balance between affordability and functionality positions the o3-mini as an attractive option for developers and organizations looking for scalable AI solutions. Whether you are a small business or a large enterprise, the o3-mini provides a cost-effective way to integrate innovative AI into your operations. Find more information on AI reasoning models by browsing our extensive range of articles, guides and tutorials. The o3-mini is specifically designed to cater to the needs of developers, offering a suite of tools that simplify complex tasks and enhance productivity. Its key features include: These features empower developers to build efficient, data-driven applications with minimal effort. By automating repetitive tasks and providing tools for advanced reasoning, the o3-mini enhances both the speed and quality of development processes. Automation is a core strength of the o3-mini, making it an invaluable asset for tasks such as transforming raw data into structured formats like JSON. This capability is particularly useful for applications involving activity tracking, data visualization, and other scenarios requiring clean, organized datasets. For low-code and no-code users, the o3-mini simplifies coding processes by generating functional scripts with minimal input. 
This feature reduces the need for extensive manual intervention, saving time and effort. By automating repetitive and time-consuming tasks, the o3-mini enables users to focus on higher-value activities, improving overall productivity. The versatility of the o3-mini is evident in its practical applications. For instance, it can automate the structuring of JSON data, providing a foundation for advanced data analysis and visualization. By offering actionable suggestions, it helps users extract meaningful insights from complex datasets, making it an essential tool for data analysts and decision-makers. Whether you are managing large-scale projects or handling routine tasks, the o3-mini adapts to your specific needs with precision and reliability. Its ability to integrate seamlessly into various workflows ensures that it can be used effectively across different industries and use cases. The o3-mini is available in both ChatGPT and API formats, offering flexibility to suit a variety of use cases. Its production-ready design ensures smooth integration into existing systems, enhancing functionality without disrupting established workflows. With robust API support, developers can incorporate the o3-mini into their applications to unlock new capabilities and improve overall efficiency. This flexibility makes it a practical choice for organizations looking to enhance their operations with advanced AI technology. The ChatGPT o3-mini exemplifies OpenAI's dedication to advancing AI technology while prioritizing accessibility and cost-effectiveness. With its enhanced reasoning capabilities, developer-friendly features, and adaptability to real-world applications, the o3-mini is a versatile tool designed to meet the diverse needs of modern users. Whether you are a developer seeking to streamline workflows, a data analyst aiming to extract actionable insights, or a business professional looking to automate routine tasks, the o3-mini equips you with the tools to boost productivity and drive innovation. Its combination of affordability, performance, and practicality ensures that it remains a valuable asset in the evolving landscape of AI-powered solutions.
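To make the JSON-structuring workflow described above concrete, here is a minimal sketch assuming the OpenAI Python SDK's JSON-schema response format for structured outputs; the activity-log text and the schema fields are invented for illustration and are not drawn from the article.

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical raw activity log to be normalized into structured JSON.
raw_log = "Mon: ran 5km in 27 min; Tue: rest; Wed: cycled 20km, felt great"

schema = {
    "name": "activity_log",
    "schema": {
        "type": "object",
        "properties": {
            "entries": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "day": {"type": "string"},
                        "activity": {"type": "string"},
                        "distance_km": {"type": "number"},
                        "duration_min": {"type": "number"},
                    },
                    "required": ["day", "activity"],
                },
            }
        },
        "required": ["entries"],
    },
}

response = client.chat.completions.create(
    model="o3-mini",
    messages=[
        {"role": "user", "content": f"Convert this log to structured JSON:\n{raw_log}"}
    ],
    response_format={"type": "json_schema", "json_schema": schema},
)
# The model's reply should now conform to the schema and parse cleanly.
print(json.loads(response.choices[0].message.content))
```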
[4]
OpenAI o3-mini vs DeepSeek R1: Performance Comparison and First Impressions
If you are interested in learning more about the latest OpenAI o3-mini AI model released this weekend, this performance comparison and first impressions overview by All About AI will hopefully answer your initial questions. Whether you're a developer, a researcher, or just someone curious about the latest in AI, the choice between models can feel overwhelming. That's where OpenAI o3-mini and DeepSeek R1 come into play, two recent contenders with unique strengths and quirks, each vying for the top spot in coding, reasoning, and orchestration. But how do you decide which one is right for you? The answer lies in understanding how they perform in real-world scenarios. In this overview of the latest OpenAI model, All About AI walks you through a head-to-head comparison of these two models, breaking down their performance across key tasks like coding, problem-solving, and token output. You'll see where each model shines, where they stumble, and how they stack up in terms of speed and cost. By the end, you'll have a clearer picture of which AI might be your best bet -- whether you need precision in reasoning, efficiency in orchestration, or a balance of both. Both models bring unique strengths to the table, from coding and reasoning to token output capacity and AI agent orchestration. This analysis provides more insight into their performance across critical metrics, offering a detailed perspective on their capabilities and limitations. In coding tasks, the performance of these models varied based on the complexity of the assignments: These findings suggest that while DeepSeek R1 demonstrates a slight edge in tackling complex coding problems, both models are competent in handling simpler programming tasks. In the realm of AI agent orchestration, o3-mini emerged as the stronger performer. It efficiently assigned tasks to multiple agents and synthesized their outputs into a coherent summary. DeepSeek R1, while capable of completing the orchestration task, lacked the same level of precision and synthesis. For workflows that require seamless multi-agent coordination, o3-mini stands out as the more reliable choice, offering enhanced efficiency and clarity in task management. Here are more detailed guides and articles that you may find helpful on DeepSeek R1. The reasoning and problem-solving capabilities of the two models were tested through a variety of challenges, yielding distinct results: While both models excel in logical reasoning, DeepSeek R1's ability to interpret subtle, context-heavy challenges gives it an advantage in scenarios requiring deeper contextual understanding. Token output capacity revealed notable differences between the two models: For tasks requiring extensive token generation, such as document analysis or summarization, o3-mini is the better option. However, DeepSeek R1's concise and precise outputs may be more suitable for tasks with tighter constraints or where clarity is paramount. Speed and cost are critical factors when selecting an AI model, and the two systems differ in these areas: For users prioritizing speed and responsiveness, o3-mini presents a compelling choice. However, DeepSeek R1's current pricing may appeal to those operating within tighter budgets. Choosing between o3-mini and DeepSeek R1 depends on your specific requirements and priorities: Both models offer distinct advantages, making them valuable tools for different use cases. By understanding their strengths and limitations, you can select the model that aligns best with your needs and objectives. 
For more details on the performance of the new OpenAI o3-mini AI model, jump over to the official OpenAI website.
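For readers curious what the agent orchestration test described above might look like in practice, the following is a rough sketch of the fan-out/fan-in pattern it describes, not All About AI's actual test harness; the worker roles, prompts, and use of the Chat Completions API with the o3-mini model name are illustrative assumptions.

```python
from openai import OpenAI

client = OpenAI()
MODEL = "o3-mini"  # model name as referenced in the articles

def ask(system: str, user: str) -> str:
    """One model call acting as a single 'agent' with a narrow role."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return response.choices[0].message.content

task = "Evaluate whether to migrate a reporting pipeline from nightly batch jobs to streaming."

# Fan out: each agent handles one sub-task (roles are purely illustrative).
findings = {
    "costs": ask("You estimate infrastructure costs.", task),
    "risks": ask("You identify operational risks.", task),
    "timeline": ask("You draft a rollout timeline.", task),
}

# Fan in: a final call merges the workers' outputs into one summary,
# which is the orchestration behaviour the comparison describes.
summary = ask(
    "You are the orchestrator. Merge the findings into a short recommendation.",
    "\n\n".join(f"{name.upper()}:\n{text}" for name, text in findings.items()),
)
print(summary)
```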
[5]
OpenAI o3-mini Review: AI Coding Performance & Search Capabilities Tested
The o3-mini, developed by OpenAI, represents a notable step forward in artificial intelligence, particularly in the realms of search functionality and coding capabilities. Positioned as a cost-effective alternative to its predecessors, it combines affordability with enhanced features, making it an attractive option for developers and researchers. However, while it excels in several areas, its reasoning performance remains inconsistent, indicating both potential and areas for improvement. This performance test by Prompt Engineering, using a variety of prompts and testing different areas, provides insight into o3-mini's strengths, weaknesses, and what makes it stand out from its predecessors. From its expanded token context window to its ability to retrieve real-time information, the o3-mini offers a glimpse into the future of accessible AI tools. But before you get too excited, it's worth exploring the areas where it still struggles -- like reasoning through paradoxes or handling ambiguous queries. Whether you're curious about its coding prowess or its potential for open source development, this overview will help you weigh its capabilities against your needs, so you can decide if it's the right fit for your projects.
Key Advancements in the o3-mini
The o3-mini introduces a range of upgrades that distinguish it from earlier models, such as the o1 and o1-mini. These enhancements include: These advancements position the o3-mini as a compelling option for users seeking advanced AI tools that balance performance and affordability.
Performance Benchmarks: Strengths and Weaknesses
The o3-mini delivers mixed results across various performance benchmarks, showcasing both its strengths and limitations. These results highlight the model's potential in specific areas while underscoring the need for further refinement in its reasoning algorithms.
o3-mini - AI search and coding tested
Stay informed about the latest on the OpenAI o3-mini AI model by exploring our other resources and articles.
Enhanced Search and Coding Capabilities
One of the standout features of the o3-mini is its improved search functionality, which enhances its ability to retrieve and process real-time information. This capability is particularly useful for tasks such as: However, the search functionality is not without limitations. The model occasionally struggles with ambiguous or outdated queries, sometimes defaulting to its training data instead of using search results. This reliance can lead to inaccuracies in certain scenarios, particularly when dealing with rapidly changing information. In addition to search, the o3-mini demonstrates robust coding capabilities. It performs well in tasks requiring precision and contextual understanding, including: While the model excels in these areas, its performance in more intricate coding scenarios, such as physics-based problems, is less consistent. Users may need to refine its outputs to achieve optimal results, particularly for complex or highly specialized tasks.
Reasoning Tasks: Strengths and Challenges
Reasoning remains a mixed area for the o3-mini. The model demonstrates competence in straightforward logical deductions but struggles with more complex scenarios. For example: These challenges highlight the need for continued development to enhance the model's reasoning capabilities, particularly in scenarios that require nuanced understanding or ethical judgment. 
Open Source Potential and Future Directions
OpenAI has hinted at a potential shift toward open source AI development, suggesting that future models may feature open weights. This move could foster greater collaboration and innovation within the AI community, aligning with broader trends toward transparency and accessibility in AI research. By embracing open source principles, OpenAI could enable developers and researchers to build upon the o3-mini's foundation, accelerating advancements in AI technology.
Practical Recommendations for Users
The o3-mini is best suited for users who are willing to explore its capabilities and adapt its outputs to their specific needs. To maximize its potential, consider the following recommendations: By taking a strategic approach, users can effectively harness the o3-mini's capabilities while mitigating its limitations.
Limitations and Areas for Improvement
Despite its advancements, the o3-mini has several limitations that need to be addressed for broader adoption: Addressing these issues will be critical for improving the model's overall performance and reliability, ensuring it can meet the diverse needs of its users.
Looking Ahead
The o3-mini by OpenAI represents a significant milestone in AI development, particularly in search functionality and coding capabilities. Its affordability and expanded token context window make it an appealing choice for a wide range of users, from developers to researchers. However, its inconsistent reasoning performance highlights areas that require further refinement. As OpenAI continues to innovate, the o3-mini serves as both a valuable tool and a foundation for future advancements in artificial intelligence.
[6]
5 Things ChatGPT o3-mini Does Better Than Other AI Models
OpenAI finally launched its frontier o3-mini model in response to China's DeepSeek R1 reasoning model this weekend. The o3-series of models were announced in December last year. OpenAI did not waste any time and launched o3-mini and o3-mini-high to keep its lead in the AI race. So, we were curious about all the things ChatGPT o3-mini does better than other AI models, and well, we tested it out. We have tested its coding prowess and discussed various benchmarks rigorously. On that note, let's dive in. OpenAI says o3-mini delivers exceptional performance in coding tasks while keeping the cost low and maintaining great speed. Prior to the o3-mini model, Anthropic's Claude 3.5 Sonnet was the go-to model for programming queries. But that's changing with the o3-mini release, specifically with the o3-mini-high model available to ChatGPT Plus and Pro users. I tested the o3-mini-high model and asked it to create a Python snake game where multiple autonomous snakes compete with each other. The o3-mini-high model thought for 1 minute and 10 seconds and generated the Python code in one shot. I executed the code, and it ran smoothly without any issues. It was fun to watch autonomous snakes make their moves, and it was absolutely precise, just like humans play! After all, the o3-mini-high model has achieved an Elo score of 2,130 on the Codeforces competitive programming platform. This puts the o3-mini-high model among the top 2500 programmers in the world. Apart from that, in the SWE-bench Verified benchmark that evaluates capabilities in solving real-world software issues, o3-mini-high achieved 49.3% accuracy, which is even higher than the larger o1 model (48.9%). So for AI coding assistance, I think the o3-mini-high model will offer you the best performance until the full o3 model comes out, which Sam Altman says is coming in a few weeks. Apart from coding, math is another discipline where the o3-mini model outperforms other AI models. In the prestigious 2024 American Invitational Mathematics Examination (AIME), which has questions from number theory, probability, algebra, geometry, etc., the o3-mini-high achieved an impressive 87.3% again, higher than the full o1 model. In the rigorous FrontierMath benchmark which features expert-level math problems from leading mathematicians, Fields Medalists, and professors from around the world, o3-mini-high achieved 20% after eight attempts. Even in a single attempt, it scored 9.2%, which is still significant. To put this into perspective, renowned mathematician Terence Tao has described the problems in FrontierMath benchmark as "extremely challenging". It can take hours and days to solve them, even for expert mathematicians. Other ChatGPT alternatives have only managed to achieve only 2% in this benchmark. The o3-mini-high model also excels at PhD-level science questions and beats other AI models by a significant margin. GPQA Diamond is an advanced benchmark that evaluates the capabilities of AI models in specialized scientific domains. It consists of advanced questions from the fields of biology, physics, and chemistry. In the GPQA Diamond benchmark, o3-mini-high scored a remarkable 79.7%, outranking the larger o1 model (78.0%). For comparison, Google's latest Gemini 2.0 Flash Thinking (Exp-01-21) reasoning model could manage 73.3%. Even the new Claude 3.5 Sonnet model stands at 65% in the GPQA Diamond benchmark. 
It goes to show that OpenAI's smaller o3-mini model, when given more time and compute to think, can outperform other AI models at expert-level science questions. Across general knowledge domains, it's expected that o3-mini wouldn't beat larger models as it's smaller and specialized for coding, math, and science. However, despite its smaller size, it comes very close to matching larger models. In the MMLU benchmark that assesses the performance of AI models across a wide variety of subjects, o3-mini-high scores 86.9% whereas OpenAI's own GPT-4o model gets 88.7%. That said, the upcoming larger o3 model would easily beat all AI models out there across general knowledge domains. I say this because the full o1 model already achieved 92.3% on the MMLU benchmark. Now, we need to wait for the full o3 model that might saturate the benchmark entirely. The knowledge cutoff of o3-mini is October 2023, which is quite old at this point. However, OpenAI has added web search support for the o3-mini model, allowing the reasoning model to extract the latest information from the web and perform advanced reasoning. DeepSeek R1 also does this, but otherwise no other reasoning model lets you access the web for further reasoning. So these are some of the advanced capabilities of the o3-mini model. While free ChatGPT users can also access o3-mini, the reasoning effort is set to "medium", which uses less compute. I would recommend paying for the ChatGPT Plus subscription, which costs $20/month, to unlock the powerful 'o3-mini-high' model. For professional coders, researchers, and undergraduate STEM students, the o3-mini-high model can be highly beneficial.
[7]
OpenAI Launches ChatGPT o3 Mini : AI Just Got Smarter
OpenAI has introduced the ChatGPT o3 Mini, a new reasoning model now available to all users, including those on the free plan. This release represents a pivotal advancement in artificial intelligence, combining enhanced STEM capabilities, coding expertise, and more with improved speed and efficiency. Designed to meet the growing demand for versatile AI tools, the o3 Mini establishes a new standard for performance and accessibility in the rapidly evolving AI landscape. What makes the ChatGPT o3 Mini so exciting isn't just its impressive performance in logical reasoning tasks. It's the fact that this advanced tool is now accessible to everyone, including free-tier users. With features like web search integration and multiple versions tailored to different needs, this model is opening doors for people who might never have had access to such innovative technology before. But what does that mean for you? With the help of Skill Leap AI, explore how this AI is setting a new standard for what's possible -- and how it might just become your go-to problem-solving companion. The ChatGPT o3 Mini is the first reasoning model made accessible to free-tier users, significantly broadening access to advanced AI technology. While free users can use the standard version, paid subscribers -- such as Plus, Team, and Pro users -- gain access to enhanced variants like the o3 Mini High. These premium versions are specifically designed to handle tasks of varying complexity, ensuring users across diverse fields can optimize the model for their unique requirements. Key features include: This initiative underscores OpenAI's commitment to making advanced AI tools more accessible while maintaining exceptional performance standards. "OpenAI o3-mini is our first small reasoning model that supports highly requested developer features including function calling, Structured Outputs, and developer messages, making it production-ready out of the gate. Like OpenAI o1-mini and OpenAI o1-preview, o3-mini will support streaming. Also, developers can choose between three reasoning effort options -- low, medium, and high -- to optimize for their specific use cases. This flexibility allows o3-mini to "think harder" when tackling complex challenges or prioritize speed when latency is a concern. o3-mini does not support vision capabilities, so developers should continue using OpenAI o1 for visual reasoning tasks. o3-mini is rolling out in the Chat Completions API, Assistants API, and Batch API starting today to select developers in API usage tiers 3-5." - OpenAI. The ChatGPT o3 Mini introduces substantial improvements over its predecessor, excelling in STEM-related tasks such as mathematics, science, and coding. Rigorous testing on benchmarks for logical reasoning and software engineering has demonstrated its ability to consistently outperform the o1 Mini. Notable advancements include: One of the most significant upgrades is its low latency, which ensures faster response times. This feature is particularly valuable for time-sensitive tasks, whether solving challenging STEM problems or working on software development projects. By delivering accurate results with remarkable speed, the o3 Mini enhances productivity across a wide range of applications. Uncover more insights about ChatGPT Mini in previous articles we have written. 
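The reasoning effort options quoted above map to an API parameter. Below is a minimal sketch assuming the OpenAI Python SDK exposes it as reasoning_effort on the Chat Completions call, as the quoted announcement describes; the prompt is a placeholder.

```python
from openai import OpenAI

client = OpenAI()

prompt = "A train leaves at 9:40 and arrives at 13:05. How long is the journey?"

# The announcement describes low/medium/high reasoning effort; higher effort
# trades latency for more deliberate reasoning on harder problems.
for effort in ("low", "medium", "high"):
    response = client.chat.completions.create(
        model="o3-mini",
        reasoning_effort=effort,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"[{effort}] {response.choices[0].message.content}")
```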
To cater to a broad spectrum of use cases, the ChatGPT o3 Mini is available in multiple variants, each designed to handle tasks of varying complexity. These include: This flexibility allows users to select the version that best aligns with their needs, whether conducting basic research, solving logical puzzles, or tackling advanced engineering problems. By offering tailored options, the o3 Mini ensures versatility and effectiveness across a wide range of applications, making it a valuable tool for both casual users and professionals. One of the most innovative features of the ChatGPT o3 Mini is its integration with web search capabilities. This functionality enables the model to retrieve external resources and incorporate them into its responses seamlessly. For example: This integration makes the o3 Mini a powerful tool for academic and professional applications alike. By combining advanced reasoning capabilities with the vast resources of the internet, it offers a unique advantage for users seeking comprehensive and accurate results. The release of the ChatGPT o3 Mini comes at a time of increasing competition in the AI space, with models like Deep Seek R1 -- a free reasoning AI developed by a Chinese startup -- entering the market. While OpenAI has primarily focused on benchmarking the o3 Mini against its own models, the competitive landscape highlights the rapid pace of innovation in the field. By offering superior performance and accessibility, the o3 Mini aims to solidify OpenAI's leadership in reasoning AI. Its ability to handle complex tasks with precision and speed positions it as a strong contender in this dynamic and competitive market. OpenAI's focus on accessibility ensures that more users can benefit from advanced AI tools, further strengthening its position as a leader in the industry. Extensive testing of the ChatGPT o3 Mini has demonstrated its versatility and reliability across a wide range of real-world applications. Some notable use cases include: These examples highlight the practical value of the o3 Mini, making it an indispensable resource for students, researchers, and professionals. Its adaptability ensures it remains relevant across diverse contexts, from academic projects to professional problem-solving. The ChatGPT o3 Mini represents a significant leap forward in reasoning AI, offering unmatched performance, accessibility, and versatility. With its advanced STEM capabilities, coding proficiency, and web search integration, it sets a new benchmark for what AI models can achieve. By extending access to free users, OpenAI has taken a bold step toward providing widespread access to advanced AI tools, making sure that more people can benefit from its potential. As competition in the AI space intensifies, the o3 Mini stands out as a powerful and accessible solution for tackling complex reasoning tasks. Whether you are a student, a professional, or an AI enthusiast, this model equips you with the tools needed to excel in your endeavors, paving the way for a new era of reasoning AI.
[8]
OpenAI rolls out o3-mini AI reasoning model for all users
OpenAI has launched o3-mini, the latest model in its reasoning series, now available in ChatGPT and the API. Initially previewed in December 2024, this model enhances performance in science, math, and coding while offering lower costs and faster processing speeds compared to its predecessor, o1-mini. It is optimized for complex STEM tasks, providing efficiency and affordability. The o3-mini model introduces several developer-focused features, including: However, o3-mini does not support vision-related tasks. Developers requiring visual reasoning should continue using o1. OpenAI describes o3-mini as a specialized model for technical fields, prioritizing precision and speed. In ChatGPT, it defaults to medium reasoning effort, balancing response time and accuracy. Compared to o1-mini, it delivers clearer answers with stronger reasoning capabilities, reducing major errors by 39% in complex real-world scenarios. In blind evaluations, testers preferred o3-mini's responses over o1-mini 56% of the time. The o3-mini model has been tested across multiple STEM-focused evaluations, demonstrating superior performance: Additionally, o3-mini generates responses 2500ms faster on average compared to o1-mini, reducing latency and improving efficiency. OpenAI has emphasized safety in o3-mini's development by implementing deliberative alignment, a training method that enables the model to analyze safety guidelines before responding. This approach improves its ability to handle sensitive prompts while maintaining accuracy and reliability. OpenAI reports that o3-mini surpasses GPT-4o in safety and jailbreak evaluations. Prior to release, OpenAI conducted extensive safety testing, including external red-teaming and internal risk assessments. Further details on potential risks and mitigation strategies are outlined in the o3-mini system card. The o3-mini model became available on January 31, 2025, for ChatGPT Plus, Team, and Pro users. Enterprise access will roll out in February 2025. It replaces o1-mini in the model picker, offering higher rate limits and lower latency, making it particularly suited for STEM, coding, and logical reasoning tasks. For developers, o3-mini is available through the Chat Completions API, Assistants API, and Batch API for selected users in tiers 3-5. Additionally, o3-mini now integrates with search, providing real-time answers with linked sources. This feature is in prototype mode as OpenAI continues developing search capabilities for its reasoning models. OpenAI views o3-mini as part of its ongoing effort to make high-quality AI more accessible and cost-effective. The model builds on OpenAI's 95% reduction in per-token costs since GPT-4, while maintaining strong reasoning capabilities. As AI adoption expands, OpenAI emphasized that it remains focused on developing models that balance intelligence, efficiency, and safety, ensuring scalable AI solutions for technical and general use cases.
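A quick way to see the latency improvements described above for yourself is to stream a response and time the first token. This is a minimal sketch assuming the OpenAI Python SDK's standard streaming interface; the prompt is a placeholder and the model names are simply those referenced in this roundup.

```python
import time
from openai import OpenAI

client = OpenAI()

def time_to_first_token(model: str, prompt: str) -> float:
    """Stream a completion and return seconds until the first content chunk."""
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            return time.perf_counter() - start
    return time.perf_counter() - start

prompt = "List three prime numbers greater than 100."
for model in ("o1-mini", "o3-mini"):
    print(model, round(time_to_first_token(model, prompt), 2), "seconds to first token")
```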
[9]
OpenAI Releases o3-mini in Response to DeepSeek, Makes it Free for All
Paid users can choose o3-mini-high for better reasoning, while Pro users get unlimited access to both o3-mini and o3-mini-high. OpenAI has announced the release of OpenAI o3-mini, a new AI model built to deliver cost-effective reasoning with improved efficiency and STEM capabilities. The model is available in ChatGPT and the API, replacing OpenAI o1-mini in the model selection. "o3-mini is out! smart, fast model. available in ChatGPT and API. It can search the web, and it shows its thinking. Available to free-tier users! click the "reason" button," said OpenAI chief Sam Altman in a post on X. The launch of o3-mini comes amid the rising popularity of DeepSeek-R1, an open-source model that outperforms OpenAI's o1. Interestingly, during a Reddit AMA, Altman was asked whether OpenAI would consider releasing some model weights and publishing more research. In response, he acknowledged the ongoing discussions within the company, saying, "I personally think we have been on the wrong side of history here and need to figure out a different open-source strategy." However, he also noted that not everyone at OpenAI shares this view and that it is "not our current highest priority." He further praised the DeepSeek model, saying, "It's a very good model! We will produce better models, but our lead will be smaller than in previous years." OpenAI announced that "o3-mini advances the boundaries of what small models can achieve, delivering exceptional STEM capabilities while maintaining low cost and reduced latency." The model supports function calling, structured outputs, and developer messages, making it more adaptable for production use. OpenAI o3-mini is available in the Chat Completions API, Assistants API, and Batch API for select developers in API usage tiers 3-5. ChatGPT Plus, Team, and Pro users can access the model starting today, with Enterprise access set for February. Free-tier users can also try the model by selecting the 'Reason' option in the ChatGPT interface. The model introduces three reasoning effort modes -- low, medium, and high -- allowing users to balance speed and complexity. "This flexibility allows o3-mini to 'think harder' when tackling complex challenges or prioritise speed when latency is a concern," OpenAI stated. Performance evaluations indicate that o3-mini surpasses its predecessor, o1-mini, in key STEM areas. On the AIME 2024 math competition, the high-reasoning variant of o3-mini achieved 83.6% accuracy. In PhD-level science evaluations (GPQA Diamond), o3-mini (high) reached 77.0% accuracy, showing improvements over previous models. In competitive programming tasks on Codeforces, o3-mini (high) achieved an Elo rating of 2073, surpassing o1-mini. Software engineering benchmarks also show gains. In the SWE-bench Verified evaluation, o3-mini (high) achieved a 48.9% accuracy, exceeding o1-mini's performance. In LiveBench coding evaluations, the model outperformed o1-high at medium reasoning effort, reinforcing its efficiency in coding tasks. Beyond STEM, o3-mini demonstrates superior general knowledge performance. Human preference evaluations showed testers preferred o3-mini's responses over o1-mini's 56% of the time, with a 39% reduction in major errors on complex questions. OpenAI also emphasised improvements in speed and latency. "o3-mini delivers responses 24% faster than o1-mini, with an average response time of 7.7 seconds compared to 10.16 seconds," the company stated. The model has an average 2500ms faster time to first token than o1-mini. Safety remains a key focus. 
OpenAI highlighted the use of deliberative alignment to ensure safer responses. "o3-mini significantly surpasses GPT-4o on challenging safety and jailbreak evaluations," OpenAI noted. The model underwent external red-teaming and rigorous safety testing before deployment. With this release, OpenAI continues its effort to improve AI accessibility while reducing costs. "o3-mini continues our track record of driving down the cost of intelligence -- reducing per-token pricing by 95% since launching GPT-4 -- while maintaining top-tier reasoning capabilities," OpenAI stated. The company is adding search features to its reasoning models.
[10]
I tested ChatGPT's free new o3-mini model with 7 prompts to rate its problem-solving and reasoning capabilities -- here's what happened
OpenAI's o3-mini model is now part of the free tier of ChatGPT, which lets users take full advantage of a significant advancement in AI, particularly for tasks requiring complex reasoning and problem-solving. Building upon the foundation laid by its predecessors, the o3-mini model introduces enhanced capabilities that set it apart. The o3 model excels in tasks that demand step-by-step logical reasoning. Essentially, o3-mini has a "private chain of thought" approach, planning and reasoning through tasks, then performing intermediate steps to assist in problem-solving. This method results in more accurate and reliable outputs, especially in complex scenarios. The o3-mini is a streamlined version of the o3 model, offering higher rate limits and lower latency, making it a compelling choice for coding, STEM and logical problem-solving tasks. It replaces the o1-mini model in the ChatGPT interface, providing users with improved performance for free. This accessibility allows a broader audience to benefit from the model's enhanced performance. In coding tasks, o3 has demonstrated exceptional proficiency. It achieved an Elo score of 2,727 on the Codeforces competitive programming platform, placing it among the top 2,500 programmers globally. Additionally, o3 scored 71.7% on the SWE-bench Verified benchmark, which assesses the ability to solve real-world software issues, outperforming its predecessor, o1, which scored 48.9%. Additionally, o3 excels in scientific and mathematical benchmarks, achieving a score of 87.7% on the GPQA Diamond benchmark, which contains expert-level science questions not publicly available online. Furthermore, on the Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI) benchmark, o3 attained three times the accuracy of o1, showcasing its advanced reasoning capabilities. For those looking for ways to see how theo3-mini model truly shines, consider experimenting with the following queries or similar ones that explore coding, math, and STEM tasks. Here's a look at what happened when I put the o3-mini model to the test with seven varying prompts. Prompt: "Write a Python script that simulates a basic banking system with functionalities to deposit, withdraw and check balance." This prompt is excellent for testing o3-mini because it combines multiple aspects of programming -- from OOP and control structures to input validation and error handling -- into one cohesive example. It challenges the model to produce a complete, functional, and well-structured piece of software, which is a solid measure of its code generation capabilities. The prompt is not only a test of code generation but also serves as a learning tool. It provides a concrete example that can help users understand how to design and implement basic banking functionality in Python. This dual purpose of being both a test case and an educational example makes it useful and simple enough for even casual users to understand and implement. Prompt: "Prove the Pythagorean theorem using a geometric approach." This prompt requires a blend of logical sequencing, mathematical rigor, clear communication, and integration of different types of reasoning. It demonstrates the o3-mini model's ability to handle complex, multi-faceted tasks as it successfully generated a clear and correct geometric proof of the Pythagorean theorem. Prompt: "Explain the process of photosynthesis in detail." 
The o3-mini model's ability to cover a broad range of scientific concepts and recall, organize, and articulate that multi-step process is made evident in this prompt. The logically organized, detailed response was clearly presented and flowed coherently. This prompt showcases the model's ability to relay deep scientific knowledge and the ability to integrate interdisciplinary concepts into a cohesive explanation. Prompt: "Analyze the causes and effects of the French Revolution." This prompt requires the integration of interdisciplinary historical knowledge, structured and coherent writing, and critical analysis of complex cause-and-effect relationships, making it an ideal prompt to test the o3-mini model's ability to successfully generate accurate, detailed, and educationally valuable content on a multifaceted historical topic. This prompt showcases how the o3-mini model can be used for educational or teaching purposes. Prompt: "Provide a critical analysis of Shakespeare's 'Hamlet' focusing on its themes of madness and revenge." The prompt requires a deep and critical analysis of Hamlet, focusing on multifaceted themes like madness and revenge. This tests the model's ability to engage in high-level literary criticism, synthesizing various elements of the text to produce an insightful analysis. This model successfully addressed the complex academic task and expertly produced a nuanced, well-supported argument about intricate themes in literature. Prompt: "Discuss the concept of utilitarianism and its implications in modern ethics." By asking for both a discussion of utilitarianism as a concept and its implications in modern ethics, the prompt challenges the model to bridge historical philosophical theories with contemporary ethical issues. This demonstrates the model's capacity to synthesize information across different time periods and contexts. This, and prompts like it, test the abstract reasoning ability of the o3-mini. This prompt also highlights the model's ability to perform critical analysis, understand historical content, and apply both practically, all of which are essential for generating an informative and nuanced response on complex ethical topics. Prompt: "Design an integrated strategy to optimize urban transportation in a rapidly growing megacity. Your plan should address the following aspects." This prompt effectively showcases the model's problem-solving and complex reasoning abilities. The query requires an integrated, multifaceted solution that mirrors the challenges encountered in real-world scenarios, in this case, planning within an urban environment. The prompt also dives deep into the o3-mini's ability to understand many "moving parts", including environmental science, technology, and socio-economics. Although I did not show the script of the model "thinking," it did take the time to thoughtfully process a response before offering a detailed, step-by-step plan and the rationale behind the solution. OpenAI's o3-mini model represents a significant advancement in AI, offering enhanced reasoning and problem-solving capabilities across various domains. Its integration into ChatGPT's free tier democratizes access to advanced AI tools, empowering users to tackle complex tasks with greater efficiency. By experimenting with diverse prompts, users can fully appreciate the model's versatility and potential.
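For reference, the kind of program the banking-system prompt at the top of this test asks for can be quite small. The article does not reproduce the model's actual output, so the following is simply a hand-written sketch of a basic banking system with deposit, withdraw, and balance-check operations, including the input validation and error handling the prompt is meant to exercise.

```python
class InsufficientFundsError(Exception):
    """Raised when a withdrawal exceeds the available balance."""


class BankAccount:
    def __init__(self, owner: str, balance: float = 0.0) -> None:
        self.owner = owner
        self.balance = balance

    def deposit(self, amount: float) -> float:
        if amount <= 0:
            raise ValueError("Deposit amount must be positive.")
        self.balance += amount
        return self.balance

    def withdraw(self, amount: float) -> float:
        if amount <= 0:
            raise ValueError("Withdrawal amount must be positive.")
        if amount > self.balance:
            raise InsufficientFundsError("Insufficient funds.")
        self.balance -= amount
        return self.balance

    def check_balance(self) -> float:
        return self.balance


if __name__ == "__main__":
    account = BankAccount("Alice", 100.0)
    account.deposit(50.0)   # balance becomes 150.0
    account.withdraw(30.0)  # balance becomes 120.0
    print(f"{account.owner}'s balance: {account.check_balance():.2f}")
```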
[11]
OpenAI Fights Back Against DeepSeek AI With Early o3-Mini Launch -- Here's How It Compares - Decrypt
OpenAI rushed to defend its market position Friday with the release of o3-mini, a direct response to Chinese startup DeepSeek's R1 model that sent shockwaves through the AI industry by matching top-tier performance at a fraction of the computational cost. "We're releasing OpenAI o3-mini, the newest, most cost-efficient model in our reasoning series, available in both ChatGPT and the API today" OpenAI said in an official blog post. "Previewed in December 2024, this powerful and fast model advances the boundaries of what small models can achieve (...) all while maintaining the low cost and reduced latency of OpenAI o1-mini." OpenAI also made reasoning capabilities available for free to users for the first time while tripling daily message limits for paying customers, from 50 to 150, to boost the usage of the new family of reasoning models. Unlike GPT-4o and the GPT family of models, the "o" family of AI models is focused on reasoning tasks. They're less creative, but have embedded chain of thought reasoning that makes them more capable of solving complex problems, backtracking on wrong analyses, and building better structure code. At the highest level, OpenAI has two main families of AI models: Generative Pre-trained Transformers (GPT) and "Omni" (o). The new o3 mini comes in three versions -- low, medium, or high. These subcategories will provide users with better answers in exchange for more "inference" (which is more expensive for developers who need to pay per token). OpenAI o3-mini, aimed at efficiency, is worse than OpenAI o1-mini in general knowledge and multilingual chain of thought, however, it scores better at other tasks like coding or factuality. All the other models (o3-mini medium and o3-mini high) do beat OpenAI o1-mini in every single benchmark. DeepSeek's breakthrough, which delivered better results than OpenAI's flagship model while using just a fraction of the computing power, triggered a massive tech selloff that wiped nearly $1 trillion from U.S. markets. Nvidia alone shed $600 billion in market value as investors questioned the future demand for its expensive AI chips. The efficiency gap stemmed from DeepSeek's novel approach to model architecture. While American companies focused on throwing more computing power at AI development, DeepSeek's team found ways to streamline how models process information, making them more efficient. The competitive pressure intensified when Chinese tech giant Alibaba released Qwen2.5 Max, an even more capable model than the one DeepSeek used as its foundation, opening the path to what could be a new wave of Chinese AI innovation. OpenAI o3-mini attempts to increase that gap once again. The new model runs 24% faster than its predecessor, and matches or beats older models on key benchmarks while costing less to operate. Its pricing is also more competitive. OpenAI o3-mini's rates -- $0.55 per million input tokens and $4.40 per million output tokens -- are a lot higher than DeepSeek's R1 pricing of $0.14 and $2.19 for the same volumes, however, they decrease the gap between OpenAI and DeepSeek, and represent a major cut when compared to the prices charged to run OpenAI o1. And that might be key to its success. OpenAI o3-mini is closed-source, unlike DeepSeek R1 which is available for free -- but for those willing to pay for use on hosted servers, the appeal will increase depending on the intended use. OpenAI o3 mini-medium scores 79.6 on the AIME benchmark of math problems. 
DeepSeek R1 scores 79.8, a score that is only beaten by the most powerful model in the family, OpenAI o3-mini high, which scores 87.3 points. The same pattern can be seen in other benchmarks: The GPQA marks, which measure proficiency in different scientific disciplines, are 71.5 for DeepSeek R1, 70.6 for o3-mini low, and 79.7 for o3-mini high. R1 is at the 96.3rd percentile in Codeforces, a benchmark for coding tasks, whereas o3-mini low is at the 93rd percentile and o3-mini high is at the 97th percentile. So the differences exist, but in terms of benchmarks, they may be negligible depending on the model chosen for executing a task. We tried the model with a few tasks to see how it performed against DeepSeek R1. The first task was a spy game to test how good it was at multi-step reasoning. We chose the same sample from the BIG-bench dataset on GitHub that we used to evaluate DeepSeek R1. (The full story is available here and involves a school trip to a remote, snowy location, where students and teachers face a series of strange disappearances; the model must find out who the stalker was.) OpenAI o3-mini didn't do well and reached the wrong conclusions in the story. According to the answer provided by the test, the stalker's name is Leo. DeepSeek R1 got it right, whereas OpenAI o3-mini got it wrong, saying the stalker's name was Eric. (Fun fact: we cannot share the link to the conversation because it was marked as unsafe by OpenAI). The model is reasonably good at logical language-related tasks that don't involve math. For example, we asked the model to write five sentences that end in a specific word, and it was capable of understanding the task and evaluating its results before providing the final answer. It thought about its reply for four seconds, corrected one wrong answer, and provided a reply that was fully correct. It is also very good at math, proving capable of solving problems that are deemed extremely difficult in some benchmarks. The same complex problem that took DeepSeek R1 275 seconds to solve was completed by OpenAI o3-mini in just 33 seconds. So a pretty good effort, OpenAI. Your move, DeepSeek.
[12]
OpenAI launches o3-mini, its latest 'reasoning' model | TechCrunch
OpenAI on Friday launched a new AI "reasoning" model, o3-mini, the newest in the company's o family of reasoning models. OpenAI first previewed the model in December alongside a more capable system called o3, but the launch comes at a pivotal moment for the company, whose ambitions -- and challenges -- are seemingly growing by the day. OpenAI is battling the perception that it's ceding ground in the AI race to Chinese companies like DeepSeek, which OpenAI alleges might have stolen its IP. Nonetheless, the ChatGPT maker has managed to win over scores of developers, and it's been trying to shore up its relationship with Washington as it simultaneously pursues an ambitious data center project. It's reportedly also laying the groundwork for one of the largest financing rounds by a tech company in history. Which brings us to o3-mini. OpenAI is pitching its new model as both "powerful" and "affordable." "Today's launch marks [...] an important step toward broadening accessibility to advanced AI in service of our mission," an OpenAI spokesperson told TechCrunch. Unlike most large language models, reasoning models like o3-mini thoroughly fact-check themselves before giving out results. This helps them avoid some of the pitfalls that normally trip up models. These reasoning models do take a little longer to arrive at solutions, but the trade-off is that they tend to be more reliable -- though not perfect -- in domains like physics. O3-mini is fine-tuned for STEM problems, specifically for programming, math, and science. OpenAI claims the model is largely on par with the o1 family (o1 and o1-mini) in terms of capabilities, but runs faster and costs less. The company claimed that external testers preferred o3-mini's answers over those from o1-mini more than half the time. O3-mini apparently also made 39% fewer "major mistakes" on "tough real-world questions" in A/B tests versus o1-mini, and produced "clearer" responses while delivering answers about 24% faster. O3-mini will be available to all users via ChatGPT starting Friday, but users who pay for the company's ChatGPT Plus and Team plans will get a higher rate limit of 150 queries per day, while ChatGPT Pro subscribers will get unlimited access. OpenAI said o3-mini will come to ChatGPT Enterprise and ChatGPT Edu customers in a week (no word on ChatGPT Gov). Users with premium ChatGPT plans can select o3-mini using the drop-down menu. Free users can click or tap the new "Reason" button in the chat bar, or have ChatGPT "re-generate" an answer. Beginning Friday, o3-mini will also be available via OpenAI's API to select developers, but it initially will not have support for analyzing images. Devs can select the level of "reasoning effort" (low, medium, or high) to get o3-mini to "think harder" based on their use case and latency needs. O3-mini is priced at $1.10 per million cached input tokens and $4.40 per million output tokens, where a million tokens equates to roughly 750,000 words. That's 63% cheaper than o1-mini, and competitive with DeepSeek's R1 reasoning model pricing. DeepSeek charges $0.14 per million cached input tokens and $2.19 per million output tokens for R1 access through its API. In ChatGPT, o3-mini is set to medium reasoning effort, which OpenAI says provides "a balanced trade-off between speed and accuracy." Paid users will have the option of selecting "o3-mini-high" in the model picker, which will deliver what OpenAI calls "higher-intelligence" in exchange for slower responses.
Regardless of which version of o3-mini ChatGPT users choose, the model will work with search to find up-to-date answers with links to relevant web sources. OpenAI cautions that the functionality is a "prototype" as it works to integrate search across its reasoning models. "While o1 remains our broader general-knowledge reasoning model, o3-mini provides a specialized alternative for technical domains requiring precision and speed," OpenAI wrote in a blog post on Friday. "The release of o3-mini marks another step in OpenAI's mission to push the boundaries of cost-effective intelligence." O3-mini is not OpenAI's most powerful model to date, nor does it leapfrog DeepSeek's R1 reasoning model in every benchmark. O3-mini beats R1 on AIME 2024, a benchmark of challenging competition math problems -- but only with high reasoning effort. It also beats R1 on the programming-focused test SWE-bench Verified (by 0.1 point), but again, only with high reasoning effort. On low reasoning effort, o3-mini lags R1 on GPQA Diamond, which tests models with PhD-level physics, biology and chemistry questions. To be fair, o3-mini answers many queries at competitively low cost and latency. In the post, OpenAI compares its performance to the o1 family: "With low reasoning effort, o3-mini achieves comparable performance with o1-mini, while with medium effort, o3-mini achieves comparable performance with o1," OpenAI writes. "O3-mini with medium reasoning effort matches o1's performance in math, coding and science while delivering faster responses. Meanwhile, with high reasoning effort, o3-mini outperforms both o1-mini and o1." It's worth noting that o3-mini's performance advantage over o1 is slim in some areas. On AIME 2024, o3-mini beats o1 by just 0.3 percentage points when set to high reasoning effort. And on GPQA Diamond, o3-mini doesn't surpass o1's score even on high reasoning effort. OpenAI asserts, however, that o3-mini is as "safe" as or safer than the o1 family, thanks to red-teaming efforts and its "deliberative alignment" methodology, which makes models "think" about OpenAI's safety policy while they're responding to queries. According to the company, o3-mini "significantly surpasses" one of OpenAI's flagship models, GPT-4o, on "challenging safety and jailbreak evaluations."
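To make the "reasoning effort" knob described above concrete, here is a minimal sketch of how a developer might call o3-mini through the Chat Completions API in Python. The model name and the reasoning_effort values mirror what OpenAI describes, but treat the exact parameter names and SDK usage as assumptions based on OpenAI's published conventions at the time, not a definitive integration.

```python
# Minimal sketch: calling o3-mini with an explicit reasoning-effort level.
# Assumes the official openai Python SDK (v1.x) and an OPENAI_API_KEY in the environment;
# "low", "medium", and "high" correspond to the effort options described above.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",  # trade latency for accuracy; "low" or "medium" are faster and cheaper
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ],
)

print(response.choices[0].message.content)
```

Switching reasoning_effort is the main lever for the latency/accuracy trade-off the article describes; everything else in the request looks like an ordinary chat completion.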
[13]
It's here: OpenAI's o3-mini advanced reasoning model arrives to counter DeepSeek's rise
OpenAI has released a new proprietary AI model in time to counter the rapid rise of open source rival DeepSeek R1 -- but will it be enough to blunt the latter's success? Today, after several days of rumors and increasing anticipation among AI users on social media, OpenAI is debuting o3-mini, the second model in its new family of "reasoners," AI models that take slightly more time to "think," analyze their own processes and reflect on their own "chains of thought" before responding to user queries and inputs with new outputs. The result is a model that can perform at the level of a PhD student or even degree holder on answering hard questions in math, science, engineering and many other fields. The o3-mini model is now available on ChatGPT, including the free tier, and OpenAI's application programming interface (API), and it's actually less expensive, faster, and more performant than the previous high-end models, OpenAI's o1 and its faster, lower-parameter-count sibling, o1-mini. While inevitably it will be compared to DeepSeek R1, and the release date seen as a reaction, it's important to remember that o3 and o3-mini were announced well prior to the January release of DeepSeek R1, in December 2024 -- and that OpenAI CEO Sam Altman stated previously on X that, due to feedback from developers and researchers, it would be coming to ChatGPT and the OpenAI API at the same time. Unlike DeepSeek R1, o3-mini will not be made available as an open source model -- meaning the code cannot be taken and downloaded for offline usage, nor customized to the same extent, which may limit its appeal compared to DeepSeek R1 for some applications. OpenAI did not provide any further details about the (presumed) larger o3 model announced back in December alongside o3-mini. At that time, OpenAI's opt-in dropdown form for testing o3 stated that it would undergo a "delay of multiple weeks" before third parties could test it.
Performance and Features
Similar to o1, OpenAI o3-mini is optimized for reasoning in math, coding, and science. Its performance is comparable to OpenAI o1 when using medium reasoning effort, but it is faster and cheaper to run. It also boasts impressive benchmarks, even outpacing o1 in some cases, according to the o3-mini System Card OpenAI released online (and which was published earlier than the official model availability announcement). o3-mini's context window -- the number of combined tokens it can input/output in a single interaction -- is 200,000, with a maximum of 100,000 tokens in each output. That's the same as the full o1 model and outperforms DeepSeek R1's context window of around 128,000/130,000 tokens. But it is far below Google Gemini 2.0 Flash Thinking's new context window of up to 1 million tokens. While o3-mini focuses on reasoning capabilities, it doesn't have vision capabilities yet. Developers and users looking to upload images and files should keep using o1 in the meantime.
The competition heats up
The arrival of o3-mini marks the first time OpenAI is making a reasoning model available to free ChatGPT users. The prior o1 model family was only available to paying subscribers of the ChatGPT Plus, Pro and other plans, as well as via OpenAI's paid application programming interface.
As it did with large language model (LLM)-powered chatbots via the launch of ChatGPT in November 2022, OpenAI essentially created the entire category of reasoning models back in September 2024 when it first unveiled o1, a new class of models with a new training regime and architecture. But OpenAI, in keeping with its recent history, did not make o1 open source, contrary to its name and original founding mission. Instead, it kept the model's code proprietary. And over the last two weeks, o1 has been overshadowed by Chinese AI startup DeepSeek, which launched R1, a rival, highly efficient, largely open-source reasoning model freely available for anyone around the world to take, retrain, and customize, as well as use for free on DeepSeek's website and mobile app -- a model reportedly trained at a fraction of the cost of o1 and other LLMs from top labs. DeepSeek R1's permissive MIT licensing terms, free app/website for consumers, and decision to make R1's codebase freely available to take and modify have led to a veritable explosion of usage both in the consumer and enterprise markets -- with even OpenAI investor Microsoft and Anthropic backer Amazon rushing to add variants of it to their cloud marketplaces. Perplexity, the AI search company, also quickly added a variant of it for users. It also dethroned the ChatGPT iOS app for the number one place in the U.S. Apple App Store, and is notable for outpacing OpenAI by connecting DeepSeek R1 to web search, something that OpenAI has not yet done for o1, leading to further techno-anxiety among tech workers and others online that China is catching up to or has outpaced the U.S. in AI innovation -- even technology more generally. Many AI researchers and scientists and top VCs such as Marc Andreessen, however, have welcomed the rise of DeepSeek and its open sourcing in particular as a tide that lifts all boats in the AI field, increasing the intelligence available to everyone while reducing costs.
Availability in ChatGPT
The model is now rolling out globally to Free, Plus, Team, and Pro users, with Enterprise and Education access coming next week. Additionally, o3-mini now supports search integration within ChatGPT, providing responses with relevant web links. This feature is still in its early stages as OpenAI refines search capabilities across its reasoning models.
API Integration and Pricing
For developers, o3-mini is available via the Chat Completions API, Assistants API, and Batch API. The model supports function calling, Structured Outputs, and developer messages, making it easy to integrate into real-world applications. One of o3-mini's most notable advantages is its cost efficiency: It's 63% cheaper than OpenAI o1-mini and 93% cheaper than the full o1 model, priced at $1.10/$4.40 per million tokens in/out (with a 50% cache discount). Yet it still pales in comparison to the affordability of the official DeepSeek API's offering of R1 at $0.14/$2.19 per million tokens in/out. But given that DeepSeek is based in China and comes with attendant geopolitical awareness and security concerns about the user's or enterprise's data flowing into and out of the model, it's likely that OpenAI will remain the preferred API for some security-focused customers and enterprises in the U.S. and Europe. Developers can also adjust the reasoning effort level (low, medium, high) based on their application needs, allowing for more control over latency and accuracy trade-offs. On safety, OpenAI says it used something called "deliberative alignment" with o3-mini.
This means the model was asked to reason about the human-authored safety guidelines it was given, understand more of their intent and the harms they are designed to prevent, and come up with its own ways of ensuring those harms are prevented. OpenAI says this allows the model to be less censorious when discussing sensitive topics while also preserving safety. OpenAI says the model outperforms GPT-4o in handling safety and jailbreak challenges, and that it conducted extensive external safety testing prior to release today. A recent report covered in Wired (where my wife works) showed that DeepSeek succumbed to every one of the 50 jailbreak prompts tested by security researchers, which may give OpenAI o3-mini the edge over DeepSeek R1 in cases where security and safety are paramount.
What's next?
The launch of o3-mini represents OpenAI's broader effort to make advanced reasoning AI more accessible and cost-effective in the face of more intense competition than ever before from DeepSeek's R1 and others, such as Google, which recently released a free version of its own rival reasoning model, Gemini 2.0 Flash Thinking, with an expanded input context of up to 1 million tokens. With its focus on STEM reasoning and affordability, OpenAI aims to expand the reach of AI-driven problem-solving in both consumer and developer applications. But as the company becomes more ambitious than ever in its aims -- recently announcing a $500 billion data center infrastructure project called Stargate with backing from SoftBank -- the question remains whether its strategy will pay off well enough to justify the many billions sunk into it by deep-pocketed investors such as Microsoft and other VCs. As open source models increasingly close the gap with OpenAI in performance and outmatch it in cost, will its reportedly superior safety measures, powerful capabilities, easy-to-use API and user-friendly interfaces be enough to retain customers -- especially in the enterprise -- who may prioritize cost and efficiency over these attributes? We'll be reporting on the developments as they unfold.
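For a rough sense of what the pricing gap above means in practice, here is a small back-of-the-envelope calculation using the per-million-token rates cited in this article ($1.10 in / $4.40 out for o3-mini, $0.14 in / $2.19 out for DeepSeek R1). The request size is an arbitrary assumption chosen purely for illustration, and it ignores the cache discount.

```python
# Back-of-the-envelope cost comparison using the per-million-token rates cited above.
# The example workload (5,000 input tokens and 2,000 output tokens per request) is an
# arbitrary assumption for illustration only.

def cost_per_request(input_rate, output_rate, input_tokens=5_000, output_tokens=2_000):
    """Return the USD cost of one request given $/1M-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

o3_mini = cost_per_request(1.10, 4.40)       # OpenAI o3-mini API rates
deepseek_r1 = cost_per_request(0.14, 2.19)   # DeepSeek R1 API rates

print(f"o3-mini:     ${o3_mini:.4f} per request")
print(f"DeepSeek R1: ${deepseek_r1:.4f} per request")
print(f"o3-mini costs ~{o3_mini / deepseek_r1:.1f}x more for this workload")
```

On this assumed workload o3-mini works out to roughly three times DeepSeek R1's cost per request, which is the kind of gap the enterprise buyers discussed above would weigh against the data-residency and security concerns.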
[14]
OpenAI launches o3-mini, still more expensive than DeepSeek R1
OpenAI today launched o3-mini, its most cost-efficient AI model. This model, designed as a small reasoning tool, is available on ChatGPT and through OpenAI's API and is expected to excel in tasks related to science, mathematics, and coding. The company introduced it almost a week after DeepSeek, the Chinese lab everybody has been talking about recently, launched DeepSeek R1, which is far cheaper via API and free on its chat app. As the first of its kind in the company's "small reasoning model" category, o3-mini uses a chain-of-thought process to improve task completion accuracy. Testing showed that o3-mini outperforms its predecessor, o1-mini, on multiple math and coding benchmarks, and sometimes even exceeds the full o1 model's performance. Users will have control over the reasoning effort exerted by o3-mini, which can aid in cost management for developers working on applications that do not necessitate complete reasoning. Subscribers to ChatGPT's $200 monthly Pro tier will have unlimited access to o3-mini, while those at the $20 Plus tier will be able to use the model for up to 150 messages per day. Free users will also be able to experiment with o3-mini, though the duration of this access is not specified. Developers utilizing OpenAI's API for new applications with o3-mini will incur costs of $1.10 per million input tokens and $4.40 per million output tokens. The release of o3-mini comes during a significant week for OpenAI, which includes the announcement of ChatGPT Gov, a model tailored for U.S. government agency use. ChatGPT Gov facilitates agency deployment in their own Microsoft Azure cloud environment, allowing agencies to handle security and compliance requirements more efficiently. OpenAI believes this service will streamline the analysis of non-public sensitive data. Since 2024, over 90,000 users from more than 3,500 U.S. government entities have engaged with ChatGPT, sending more than 18 million messages for everyday task support. Notably, the Air Force Research Laboratory uses the tool for administrative tasks, and a pilot program in Pennsylvania has reportedly led to a reduction of approximately 105 minutes per day in routine work. OpenAI has also announced a partnership with the National Laboratories to facilitate research breakthroughs using o3-mini. The agreement involves deploying current and future flagship AI models on Venado, a supercomputer at Los Alamos developed in collaboration with Nvidia. This facility is intended to drive advancements in materials science, renewable energy, and astrophysics, providing a shared resource for Los Alamos, Lawrence Livermore, and Sandia National Labs. Researchers will explore the model's capabilities in identifying new disease treatments, enhancing national security threat detection, and optimizing the utilization of natural resources. The model will also support Los Alamos' nuclear security program, with specific use cases considered on a case-by-case basis, in coordination with government officials and OpenAI researchers who hold security clearances. Katrina Mulligan, OpenAI's lead for national security policy and partnerships, noted that she joined the company to influence significant national security decisions. The partnership with the National Labs, as announced on January 30, aligns with her goal of contributing to game-changing decisions in scientific fields.
The model launch reflects OpenAI's strategic response to the competitive landscape of AI development, particularly in light of the advancements made by rival company DeepSeek. Although let's not forget that OpenAI CEO Sam Altman first congratulated DeepSeek, and two days later the company accused it of stealing OpenAI's work.
[15]
OpenAI Just Released o3-mini, Its Most Cost-Efficient Model Yet
OpenAI just released o3-mini, a miniature version of its upcoming flagship AI model. The new model is the company's first "small reasoning model," capable of using a chain-of-thought process to complete tasks more accurately. The model is now available both on ChatGPT and through OpenAI's API, and its launch caps off a week that also saw the company strengthen its ties with the United States government in the form of announcements about ChatGPT Gov and a partnership with the U.S. National Laboratories. In a blog post today, OpenAI shared that it anticipates o3-mini will be particularly useful for tasks involving science, math, and coding. The company's testing indicates that o3-mini outperforms its predecessor, o1-mini, across several math and coding benchmarks, and in some aspects even outperforms the full o1 model. Like o1, users will be able to determine how much effort o3-mini puts into its reasoning, which could help developers save money when building applications that don't require full effort. Subscribers at ChatGPT's $200 per month Pro tier will get unlimited access to o3-mini, while those who pay $20 for ChatGPT's Plus tier will be allowed 150 messages to o3-mini per day. Free users will also get a chance to try the model, but it's unclear for how long. Developers who want to use OpenAI's API to create new applications with o3-mini will pay $1.10 per one million input tokens and $4.40 per million output tokens. (Tokens are chunks of text that have been converted into data that can be processed by an AI model.) The model's launch comes just as OpenAI is wrapping up a big week. On Tuesday, the AI market leader announced ChatGPT Gov, a version of the LLM that's been tailored "to provide U.S. government agencies with an additional way to access OpenAI's frontier models."
[16]
OpenAI's o3-Mini Is a Leaner AI Model that Keeps Pace with DeepSeek
On the heels of DeepSeek R1, the latest model from OpenAI promises more advanced capabilities at a cheaper price. OpenAI is making a smaller, more efficient version of its cleverest artificial intelligence model available for free as it seeks to answer the hype and enthusiasm swirling around a new open-source offering from Chinese AI startup DeepSeek. WIRED previously reported that OpenAI was prepping the new model, called o3-mini, for release on January 31. The company's researchers have been working overtime to get it ready for prime time, according to sources who spoke on the condition of anonymity. o3-mini, which OpenAI teased in December, is a smaller version of the model that features the most advanced AI reasoning capabilities of any OpenAI offering to date. The model can break difficult problems into constituent parts in order to figure out how best to solve them. "This powerful and fast model advances the boundaries of what small models can achieve," the company said in a blog post announcing o3-mini's availability. OpenAI is making o3-mini available to all Plus, Team, and Pro users of ChatGPT. Users of the free version of ChatGPT will also be able to try o3-mini but won't be able to send as many queries, the company says. OpenAI has evidently been using PhD students to help train a new model for some time. Several weeks ago, the company began recruiting PhD computer science students at $100 per hour for a "research collaboration" that would "involve working on unreleased models," according to an email viewed by WIRED. OpenAI also appears to have been recruiting PhD students with expertise in other areas through a company called Mercor that it regularly uses to find staff for model training. A recent job posting from Mercor on LinkedIn states: "The overall goal of this project that you may become a part of is to create challenging scientific coding questions designed to test the capabilities of large language models in generating code for solving realistic scientific research problems." The job posting goes on to give an example problem that is strikingly similar to a problem in a benchmark called SciCode that is designed to test a large language model's ability to solve complex science problems. The news comes as DeepSeek's R1 continues to roil the US tech industry. The fact that such a powerful model could be released for free puts pressure on Google and Anthropic to lower their prices. OpenAI is particularly eager to demonstrate that it remains at the forefront of developing and commercializing AI, according to sources inside the company. DeepSeek's freely available model incorporates innovations that made it more efficient to both train and serve. The company appears to have developed it using far fewer resources than OpenAI and other US companies currently building frontier AI models, although the precise details of DeepSeek's expenditure remain unknown. OpenAI says it believes R1 may have incorporated the output from its models into its training. OpenAI's newest model may not outshine R1 in terms of price, but it shows that the company will make efficiency part of its focus going forward. OpenAI also says that the model is especially strong in math, science, and coding. The company says that the latest model will also incorporate new features, including the ability to tap into web searches, call functions from a user's code, and toggle between different reasoning levels that trade off speed for problem-solving capabilities.
DeepSeek's sudden rise has also raised questions about the US government strategy to curb China's rise in AI. The past two US administrations have introduced a number of sanctions to curb China's ability to access the most advanced Nvidia chips typically used to build cutting-edge AI models. DeepSeek described several types of Nvidia chips in its research but it remains unclear what exactly was used.
[17]
OpenAI responds to the DeepSeek buzz by launching its latest o3-mini reasoning model for all users
As promised last week, OpenAI has now launched its latest o3-mini AI model to users on all ChatGPT plans, including the free tier. The new model brings with it improved reasoning capabilities, especially in math, coding, and science. The o3-mini release "advances the boundaries of what small models can achieve", OpenAI says, and it apparently responds 24% faster than the o1-mini model it's replacing. As per external testers, o3-mini answers are preferable to o1-mini answers 56% of the time, and include 39% fewer mistakes. As with o1-mini, this reasoning AI model will show its workings above its responses - so you can check the 'thought' processes involved. You can also combine this reasoning with web searches if needed, though this integration is still in its early stages. Of course, the release comes after a tumultuous week in AI, in which the models offered for free by China's DeepSeek have attracted millions of users with their speed and accuracy - and now OpenAI is trying to grab back some of the limelight (and traffic). Free users can get at o3-mini by clicking the Reason button in the text input box. OpenAI hasn't specified what the limits on its use will be, but it's likely to be in line with current restrictions on GPT-4o use - so a handful of queries per hour. For paying users, o3-mini can be selected from the model picker in the top left corner. If you're on a Plus or Team plan, you get 150 queries of o3-mini daily, and if you're on the Pro plan, access is unlimited - for a mere $200 (about £160 / AU$320) per month. Paying ChatGPT users also get access to an o3-mini-high model that applies the same reasoning skills but takes longer to think and respond. It boosts performance even further, if you don't mind waiting a few extra seconds. OpenAI has also highlighted the safety assessments that o3-mini went through before launch - it apparently "significantly surpasses" the GPT-4o model when it comes to assessing unsafe use and jailbreak attempts.
[18]
ChatGPT creator OpenAI launches o3-mini: its new 'reasoning' AI model to compete with DeepSeek
OpenAI has just released its new o3-mini, its newest, most cost-efficient model in its reasoning series, to both ChatGPT and API today... just as DeepSeek has come in changing the AI landscape. In a new post on its website, the ChatGPT creator says that its new o3-mini is its most powerful and fastest model, one that "advances the boundaries of what small models can achieve, delivering exceptional STEM capabilities -- with particular strength in science, math, and coding -- all while maintaining the low cost and reduced latency of OpenAI o1-mini". OpenAI explains on its website: "OpenAI o3-mini is our first small reasoning model that supports highly requested developer features including function calling, Structured Outputs, and developer messages, making it production-ready out of the gate. Like OpenAI o1-mini and OpenAI o1-preview, o3-mini will support streaming. Also, developers can choose between three reasoning effort options -- low, medium, and high -- to optimize for their specific use cases". "This flexibility allows o3-mini to "think harder" when tackling complex challenges or prioritize speed when latency is a concern. o3-mini does not support vision capabilities, so developers should continue using OpenAI o1 for visual reasoning tasks. o3-mini is rolling out in the Chat Completions API, Assistants API, and Batch API starting today to select developers in API usage tiers 3-5". After its new o3-mini model, OpenAI promises what's next: "The release of OpenAI o3-mini marks another step in OpenAI's mission to push the boundaries of cost-effective intelligence. By optimizing reasoning for STEM domains while keeping costs low, we're making high-quality AI even more accessible. This model continues our track record of driving down the cost of intelligence -- reducing per-token pricing by 95% since launching GPT-4 -- while maintaining top-tier reasoning capabilities. As AI adoption expands, we remain committed to leading at the frontier, building models that balance intelligence, efficiency, and safety at scale".
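To illustrate the developer features OpenAI lists in that quote, here is a hedged sketch of what function calling with o3-mini could look like through the Chat Completions API. The get_stock_price tool, its schema, and the sample prompt are hypothetical examples invented for illustration; only the general tools/tool_calls flow follows the OpenAI SDK's documented function-calling pattern, and the exact parameter names should be checked against current API docs.

```python
# Illustrative sketch of function calling with o3-mini via the Chat Completions API.
# The get_stock_price tool and its schema are hypothetical; the model decides whether to
# call it and returns structured arguments instead of free-form text.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",          # hypothetical tool the model may call
        "description": "Look up the latest price for a stock ticker.",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="medium",  # one of the three effort options mentioned above
    messages=[{"role": "user", "content": "What is NVDA trading at right now?"}],
    tools=tools,
)

# If the model decided to call the tool, inspect the structured arguments it produced.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```

In a real application the developer would execute the requested function, append its result to the conversation, and ask the model to finish its answer; the sketch stops at the point where o3-mini hands back structured arguments.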
[19]
I pitted ChatGPT's new o3-mini reasoning model against DeepSeek-R1, and I was shocked by the results
It's been a rollercoaster week for artificial intelligence, with DeepSeek completely destabilizing the AI market by releasing its R1 reasoning model and not only giving everybody access to it for free (as a chatbot), but also giving developers incredibly cost-effective access to it as an API. DeepSeek was then hit by cyber attacks that temporarily took it offline, but it appears to be up and running again. To cap the week off, OpenAI responded by releasing its o3-mini and o3-mini-high reasoning models across all its subscription services, including its Plus and Pro subscriptions and its free tier. To use o3-mini on the free tier of ChatGPT, on mobile or the web, is simple. You'll need to update your ChatGPT app on mobile first, then tap the new Reason button next to search in the Message box. It works exactly the same way in the web browser version of ChatGPT, where you'll find a new Reason button where you type your text prompt. Reasoning models are particularly good at tasks like writing complex code and solving difficult math problems; however, most of us use chatbots to get quick answers to the kind of questions that appear in everyday life. So, I immediately started wondering how the new o3-mini reasoning model would do compared to DeepSeek-R1 since they're both free to access. I set about asking it some tough questions that would require a little bit of thought to answer. The first thing I asked o3-mini was to give me a bit of life advice and (pretending I was 18 years old again) help me choose between starting my career or going to university. That's the sort of question that has a lot of factors that need consideration, so I thought it would be a good place to start. While both LLMs gave me a decent answer, the difference between how they presented it was quite shocking. ChatGPT o3-mini thought about it for a few seconds, giving me a brief insight into its thinking, telling me, "I'm weighing the decision between starting a career now or pursuing further education. Need to gather more details, like goals and specific circumstances, before giving any advice." and "I'm evaluating fields' requirements, considering interests, preferences, finances, career goals, and job market. Mentorship and research are pivotal. Personal context is crucial for an informed decision" before giving me an actual answer that was pretty balanced. DeepSeek, however, completely lifted the lid on its reasoning process, telling me what it was considering at every point. In fact, there was almost too much information! It feels like you're looking into the anxious mind of an over-thinker. "Wait," DeepSeek wonders, "but how do I know what I want? Maybe I should list out the pros and cons". And later, it ponders, "What about passion? Am I excited about a particular field of study, or am I more eager to get into the workforce? If I'm not sure what to study, maybe working for a while could help me figure that out before committing to a degree." And so it goes on. And that is just a small sample of the behind-the-scenes reasoning DeepSeek-R1 provides. Both models gave me a breakdown of the final answer, with bullet points and categories, before hitting a summary. This is how deep reasoning models tend to provide their answers, in contrast to things like ChatGPT 4o, which will just give you a more concise answer. Obviously, I didn't stop there, but the results are the same for most queries I threw at the models. ChatGPT o3-mini is more concise in showing reasoning, and DeepSeek-R1 is more sprawling and verbose.
If you really need to see the way the LLM arrived at the answer, then DeepSeek-R1's approach feels like you're getting the full reasoning service, while ChatGPT o3-mini feels like an overview in comparison. I've read reports on how o3-mini can crush DeepSeek-R1 in terms of physics simulations and complex geometric challenges, but for the simple stuff, I think I prefer DeepSeek-R1.
[20]
I tested ChatGPT o3-mini vs DeepSeek R1 vs Qwen 2.5 with 7 prompts -- here's the winner
DeepSeek's R1 model has won over users with its speed, reasoning capabilities, and free access. The model excels in several key areas such as logic inference and reasoning, making it adept at understanding and processing complex information. DeepSeek has proven to be particularly strong in mathematical reasoning and coding tasks, effectively solving complex problems and generating code snippets. With superior multilingual capabilities and high inference efficiency, the model has shown versatility in a wide range of applications. OpenAI's o3-mini model, now available within the free tier of ChatGPT, is a compact yet powerful AI model designed to excel in advanced reasoning, coding proficiency, and mathematical problem-solving, scoring 96.7% on the American Invitational Mathematics Examination (AIME), surpassing its predecessor, o1. Yet, since Alibaba's Qwen 2.5 launched, it has been a top competitor of both DeepSeek and ChatGPT. Also free for users and also excelling at coding proficiency, multilingual understanding, mathematical reasoning, and extended content processing with efficiency and speed, this chatbot is proving to hold its own within the competitive AI space. So how do these chatbots compare? I put them through a series of the same prompts to test them on everything from advanced reasoning and coding proficiency to problem-solving capabilities. Here's what happened when these free tier models faced off, including the overall winner.
1. Coding challenge
Prompt: "Write a Python script that simulates a basic banking system with functionalities to deposit, withdraw, and check balance." (A minimal illustrative sketch of this kind of script appears after the comparison below.) o3-mini provided a solid implementation using a class-based approach and included meaningful error messages while ensuring proper handling of deposits and withdrawals. It also offers a clear explanation of each method and its functionality. Qwen 2.5 offered a well-structured breakdown of how the script works, covering class definition, deposit/withdraw methods, error handling, and user experience. It includes try-except blocks to handle invalid inputs, making it more robust. The script is clean and well-commented, making it easy for beginners to understand. DeepSeek kept the script structured and efficient and introduced an owner name for the account, adding a personal touch. Yet, it lacks input validation (e.g., no try-except handling for non-numeric inputs) and, while the explanation is clear, it is not as detailed as Qwen 2.5's. Winner: Qwen 2.5 wins for providing a clean, well-structured script with strong error handling, detailed explanations, and an intuitive user experience. With a good implementation but slightly less comprehensive error handling, o3-mini was a close second.
2. Mathematical proof
Prompt: "Prove the Pythagorean theorem using a geometric approach." o3-mini delivered an explanation that follows a well-structured, step-by-step approach, making it easy to understand. The explanation is neither overly verbose nor lacking in necessary details. Qwen 2.5 offered a similar approach to o3-mini, using the large square and rearranging triangles while breaking down the steps clearly and methodically. The explanation contains formatting issues and some parts, like the ASCII diagram, are slightly unclear or misaligned, making it harder to visualize. DeepSeek crafted a correct proof that follows a logical structure. Yet it lacks depth in explaining why the approach works. Winner: o3-mini wins for the best combination of clarity, detail and logical flow.
Qwen 2.5 is in second place with a solid response but formatting and visualization issues.
3. Scientific explanation
Prompt: "Explain the process of photosynthesis in detail." o3-mini provided detailed descriptions of both light-dependent and light-independent reactions with clear breakdowns of each step. The step-by-step progression from capturing light to converting energy into glucose is easy to follow. It breaks down complex processes into digestible segments. Qwen 2.5 provided all the key concepts in photosynthesis with a good step-by-step breakdown of the light-dependent reactions and the Calvin cycle. However, the chatbot placed less emphasis on real-world significance, such as climate change and food security, and the response feels overly condensed compared to o3-mini's thorough explanation. DeepSeek covered both stages of photosynthesis well and included factors affecting photosynthesis (e.g., light intensity, CO₂ levels, water availability) but lacked technical depth in comparison to o3-mini's response. Winner: o3-mini wins for the best balance of depth, clarity, organization, and accuracy. DeepSeek was a close second for its solid explanation, which lacked some finer details.
4. Historical analysis
Prompt: "Analyze the causes and effects of the French Revolution." o3-mini crafted a comprehensive and well-structured analysis, clearly dividing the causes and effects into distinct sections, and provided in-depth explanations for each factor rather than just listing them. Qwen 2.5 discussed global impact, including Napoleon and later revolutions, within its strong explanation and well-organized response. However, the economic consequences could have been explored in more detail. DeepSeek covered key causes well, including social inequality, economic struggles, and Enlightenment ideas, but lacked analytical depth and references to sources. Winner: o3-mini wins for the best balance of depth, clarity, organization, and historical analysis. DeepSeek comes in second place for a solid but slightly less detailed response.
5. Literary critique
Prompt: "Provide a critical analysis of Shakespeare's 'Hamlet' focusing on its themes of madness and revenge." o3-mini explored both themes of madness and revenge and how they intertwine rather than treating them as separate topics. It explored Hamlet's psychological struggle, examining whether his madness is feigned or real, which is a central debate in Shakespearean scholarship. Qwen 2.5 offered a very detailed discussion of feigned vs. real madness. Yet, there was some redundancy in explaining revenge, which felt more descriptive than analytical. DeepSeek provided a solid comparison between Hamlet, Laertes, and Fortinbras in their approach to revenge, but the response felt like a well-structured summary rather than a deep analysis. The list-like structure made it feel less like a flowing critical argument. Winner: o3-mini wins again for the best blend of depth, structure, and thematic connection. DeepSeek is second for a strong response, but it was more summary-like and less interwoven.
6. Philosophical discussion
Prompt: "Discuss the concept of utilitarianism and its implications in modern ethics." o3-mini clearly outlined the core principles of utilitarianism (consequentialism, hedonistic calculus, impartiality) and discussed their modern applications (policy-making, healthcare, environmental ethics) in greater detail than the other responses. Qwen 2.5 delivered a solid breakdown of act vs.
rule utilitarianism and covered business ethics, technology, AI, and medical ethics well. But there was some redundancy and over-explanation in defining utilitarian concepts. DeepSeek covered the core principles well and included historical context, but it failed to explore critiques as deeply as the other two agents. Additionally, the response lacked a strong thematic connection between theory and real-world issues. Winner: o3-mini delivered the best in-depth response with clarity and connection to modern ethical issues. Qwen 2.5 is in second place for a good explanation but slightly weaker structure and conclusion.
7. Urban planning
Prompt: "Design an integrated strategy to optimize urban transportation in a rapidly growing megacity. Your plan should address the following aspects." o3-mini covered all major aspects required to optimize urban transportation with smart references and strong logical flow with clear implementation steps. Qwen 2.5 delivered a well-structured response and covered most essential components with a good use of data-driven decision-making. However, it lacked a strong global case study and did not emphasize implementation phases. DeepSeek included in-depth transport electrification plans and had a solid focus on equity and gender safety in transit. However, the chatbot was far too broad in some areas, lacking a strong focus on governance and long-term futureproofing. Its response is also missing a well-defined policy execution framework. Winner: o3-mini wins for its execution roadmap, innovation, depth, and realism. Qwen 2.5 came in second for a strong but slightly less structured response.
Overall winner: o3-mini
ChatGPT's o3-mini emerged as the most well-rounded and consistently high-performing chatbot in this chatbot face-off. Across a diverse range of challenges -- including coding, mathematics, historical analysis, literary critique, philosophical discussions, and problem solving -- o3-mini repeatedly demonstrated superior depth, clarity, organization, and real-world applicability. o3-mini excelled in balancing detail with readability, offering well-structured and insightful responses that blended theoretical understanding with practical implications. While DeepSeek R1 and Qwen 2.5 had their strengths -- DeepSeek often providing structured yet somewhat surface-level responses and Qwen 2.5 showcasing strong coding skills and robust ethical analysis -- neither could match o3-mini's versatility across all tested domains. Notably, Qwen 2.5 edged out o3-mini in the coding challenge due to its well-commented script and error-handling capabilities, and DeepSeek occasionally placed second when it provided a more comprehensive but less nuanced response. Consistently ranking first in five out of the seven challenges, o3-mini proved to be the most balanced AI model for users seeking thoughtful, well-articulated, and logically sound answers. While all three models provide valuable assistance in various tasks, o3-mini currently offers the most polished and reliable experience among these free-tier chatbot options.
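For readers curious what the banking-system prompt in the first challenge actually asks for, here is one minimal way such a script could look. This is an illustrative sketch written for this roundup, not the output of o3-mini, DeepSeek R1, or Qwen 2.5; it uses the class-based structure and try-except input handling the comparison above describes as good practice.

```python
# Illustrative sketch of the kind of class-based banking script the coding prompt asks for.
# Not the output of any of the chatbots tested above -- just one minimal reference take.

class BankAccount:
    """A toy account supporting deposits, withdrawals, and balance checks."""

    def __init__(self, owner: str, balance: float = 0.0):
        self.owner = owner
        self.balance = balance

    def deposit(self, amount: float) -> None:
        if amount <= 0:
            raise ValueError("Deposit amount must be positive.")
        self.balance += amount

    def withdraw(self, amount: float) -> None:
        if amount <= 0:
            raise ValueError("Withdrawal amount must be positive.")
        if amount > self.balance:
            raise ValueError("Insufficient funds.")
        self.balance -= amount

    def check_balance(self) -> float:
        return self.balance


if __name__ == "__main__":
    account = BankAccount("Alice")
    while True:
        choice = input("[d]eposit, [w]ithdraw, [b]alance or [q]uit: ").strip().lower()
        if choice == "q":
            print("Goodbye.")
            break
        if choice == "b":
            print(f"Balance: ${account.check_balance():.2f}")
        elif choice in ("d", "w"):
            try:
                amount = float(input("Amount: "))  # try/except guards non-numeric input
                if choice == "d":
                    account.deposit(amount)
                else:
                    account.withdraw(amount)
                print(f"New balance: ${account.check_balance():.2f}")
            except ValueError as err:
                print(f"Error: {err}")
        else:
            print("Unknown option.")
```

The differences the comparison highlights, such as whether invalid input is caught and how clearly each method is explained, are exactly the kind of details that separated the three chatbots' answers.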
[21]
OpenAI just released o3-mini to fight DeepSeek -- the first 'reasoning model' that's free in ChatGPT
This week has felt like the Super Bowl of AI, if you ask me. The excitement of new chatbots and the drama of potentially leaked data has had all the makings of a dramatic showdown of the bots. So it's no surprise that just before this week ends, OpenAI decided to launch its latest artificial intelligence model, o3-mini, and make it freely accessible to all users. DeepSeek, who? The o3-mini model is designed to enhance reasoning capabilities, offering faster and more accurate responses in areas such as mathematics, coding, and science. According to OpenAI, o3-mini responds 24% faster than its predecessor, o1-mini, while providing detailed problem-solving steps. The o3-mini model is part of OpenAI's latest advancements in its generative AI technology. Although smaller in scale compared to the flagship GPT-4-turbo model, o3-mini promises faster response times, reduced computational requirements, and the ability to handle simpler queries with ease. Developers can access o3-mini through OpenAI's Chat Completions API, Assistants API, and Batch API. For those seeking more advanced features, a paid version called o3-mini-high delivers higher-intelligence responses beneficial for coding, albeit with slightly higher latency. The o3-mini model can be used on the free tier of ChatGPT by selecting the "Reason" feature in ChatGPT, with rate limits similar to GPT-4o. Additionally, OpenAI has increased message limits for Plus and Team users, while Pro users, at a subscription rate of $200 per month, enjoy unlimited access. The introduction of o3-mini signifies OpenAI's commitment to advancing AI technology and making it accessible to a broader audience. By offering this model for free, OpenAI enables millions of users to benefit from cutting-edge AI capabilities without financial barriers. Of course, this move also reflects the competitive dynamics in the AI industry, as OpenAI offers a rapid response to innovations from rivals like DeepSeek and Alibaba. The release of o3-mini showcases OpenAI's technological advancements and highlights its strategic efforts to maintain a leading position as AI develops at what seems like lightning speed.
[22]
OpenAI releases its new o3-mini reasoning model for free
Reasoning models use a "chain of thought" technique to generate responses, essentially working through a problem presented to the model step by step. Using this method, the model can find mistakes in its process and correct them before giving an answer. This typically results in more thorough and accurate responses, but it also causes the models to pause before answering, sometimes leading to lengthy wait times. OpenAI claims that o3-mini responds 24% faster than o1-mini. These types of models are most effective at solving complex problems, so if you have any PhD-level math problems you're cracking away at, you can try them out. Alternatively, if you've had issues with getting previous models to respond properly to your most advanced prompts, you may want to try out this new reasoning model on them. To try out o3-mini, simply select "Reason" when you start a new prompt on ChatGPT. Although reasoning models possess new capabilities, they come at a cost. OpenAI's o1-mini is 20 times more expensive to run than its equivalent non-reasoning model, GPT-4o mini. The company says its new model, o3-mini, costs 63% less than o1-mini per input token. However, at $1.10 per million input tokens, it is still about seven times more expensive to run than GPT-4o mini. This new model is coming right after the DeepSeek release that shook the AI world less than two weeks ago. DeepSeek's new model performs just as well as top OpenAI models, but the Chinese company claims it cost roughly $6 million to train, as opposed to the estimated cost of over $100 million for training OpenAI's GPT-4. (It's worth noting that a lot of people are interrogating this claim.) Additionally, DeepSeek's reasoning model costs $0.55 per million input tokens, half the price of o3-mini, so OpenAI still has a way to go to bring down its costs. It's estimated that reasoning models also have much higher energy costs than other types, given the larger number of computations they require to produce an answer. This new wave of reasoning models presents new safety challenges as well. OpenAI used a technique called deliberative alignment to train its o-series models, basically having them reference OpenAI's internal policies at each step of their reasoning to make sure they weren't ignoring any rules. But the company has found that o3-mini, like the o1 model, is significantly better than non-reasoning models at jailbreaking and "challenging safety evaluations" -- essentially, it's much harder to control a reasoning model given its advanced capabilities. o3-mini is the first model to score as "medium risk" on model autonomy, a rating given because it's better than previous models at specific coding tasks -- indicating "greater potential for self-improvement and AI research acceleration," according to OpenAI. That said, the model is still bad at real-world research. If it were better at that, it would be rated as high risk, and OpenAI would restrict the model's release.
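The multipliers and percentages above are easy to sanity-check against per-token input prices. The o1-mini ($3.00 per million input tokens) and GPT-4o mini ($0.15 per million input tokens) figures below are OpenAI's published API prices at the time rather than numbers from this article, so treat this as a rough consistency check under those assumptions, not an official comparison.

```python
# Rough consistency check of the cost claims above (input-token prices only).
# Assumed reference prices in USD per 1M input tokens; the first two are not from this article.
O1_MINI = 3.00      # OpenAI o1-mini, published API price at the time (assumption)
GPT_4O_MINI = 0.15  # OpenAI GPT-4o mini, published API price at the time (assumption)
O3_MINI = 1.10      # o3-mini, cited above
DEEPSEEK_R1 = 0.55  # DeepSeek R1, cited above

print(f"o1-mini vs GPT-4o mini: {O1_MINI / GPT_4O_MINI:.0f}x")                    # ~20x, as claimed
print(f"o3-mini vs o1-mini:     {(1 - O3_MINI / O1_MINI) * 100:.0f}% cheaper")    # ~63%, as claimed
print(f"o3-mini vs GPT-4o mini: {O3_MINI / GPT_4O_MINI:.1f}x")                    # ~7x, as claimed
print(f"DeepSeek R1 vs o3-mini: {DEEPSEEK_R1 / O3_MINI:.2f}x the price")          # half, as claimed
```

Under those assumed reference prices, all four claims in the paragraph line up with one another.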
[23]
OpenAI makes its o3-mini reasoning model generally available - SiliconANGLE
Word of the launch leaked a few hours earlier. According to Wired, OpenAI brought o3-mini's release date forward in response to R1, the reasoning-optimized LLM that DeepSeek debuted last Monday. The latter algorithm set off a broad selloff in artificial intelligence stocks and raised questions about the cost-efficiency of OpenAI's models. Previewed in December, o3-mini is positioned as a lower-cost version of o3, OpenAI's flagship reasoning-optimized LLM. It's also faster. OpenAI detailed today that o3-mini has latency on par with o1-mini, a less advanced reasoning LLM it debuted last September. The company has made o3-mini available in the free, Plus, Pro and Team editions of ChatGPT. The LLM will roll out to the Enterprise plan next week. In the Plus and Team versions, the rate limit for o3-mini is 150 messages per day, three times the number supported by o1-mini. OpenAI has also made the new model available via several of its application programming interfaces. Developers can use the APIs to integrate o3-mini into their applications. The API version of the LLM is available in three editions with varying output quality: o3-mini-low, o3-mini-medium and o3-mini-high. OpenAI's reasoning-optimized models implement a processing approach called test-time compute. The method boosts the quality of an LLM's prompt responses by increasing the amount of hardware it uses to generate each answer. The entry-level o3-mini-low version of o3-mini requires the least amount of infrastructure and time to answer prompts, while the top-end o3-mini-high is the most hardware-intensive. OpenAI showed how o3-mini compares against o3, its flagship reasoning LLM, in a December demo. In an evaluation that required the two models to solve a set of coding challenges, o3-mini achieved a score of 2,073 while o3 earned 2,727 points. At one point, the former model wrote a web-based chatbot interface for its own API using Python. OpenAI engineers also ran other tests during the December evaluation. They found that o3-mini-high achieved a score of 83.6 out of 100 on a qualifying exam for the U.S. Math Olympiad, trailing o3 by under 16 points. According to updated benchmark results released by OpenAI today, o3-mini-high has since improved its score to 87.3, which hints that the company may have upgraded the model since last month's demo. OpenAI introduced its first-generation reasoning LLM, o1, in September. Wired today cited sources as saying that the launch shed light on issues in the company's internal development processes. According to the report, OpenAI deployed o1 on an AI stack that was not designed for commercial use and traded off some "experimental rigor" for speed. The company has also developed a second, more reliable AI stack. OpenAI at one point launched an effort to merge the two technologies but employees reportedly believe the project wasn't "fully realized." During the December demo of o3-mini, OpenAI Chief Executive Officer Sam Altman detailed that the company was planning to partner with external AI safety researchers to test o3-mini prior to its release. The company had earlier relied solely on internal safety testing. Altman added that the company's flagship o3 reasoning LLM will launch "shortly after" o3-mini.
[24]
OpenAI's Newest Reasoning Model Is Rolling Out
OpenAI is officially rolling out its latest model, o3-mini, starting today, Friday, Jan. 31. The company shared the news in a blog post on its website, just over a month after officially announcing the model during its "12 Days of OpenAI." As with each refreshed generative AI model, o3-mini is an improvement over o1-mini -- but not by as much as you might think. OpenAI says the two models perform the same in math, coding, and science, but o3-mini offers quicker answers to user queries -- 24% faster, in A/B testing. According to the company, testers comparing the models found o3-mini produces "more accurate and clear answers, with stronger reasoning abilities." And, with "medium reasoning effort," o3-mini matches o1 in certain reasoning and intelligence evaluations. Like o1-mini, o3-mini is a reasoning model, a type of AI model that "thinks" through answers before responding to them. o3-mini has three different reasoning "efforts" depending on the use case: low, medium, and high. In mathematics testing, for instance, o3-mini's medium and high effort reasoning outperforms o1-mini, while high effort even outperforms o1 (the more powerful version of o1-mini). All three efforts beat o1-mini in PhD-level science questions, but o1 outperforms them all. o3-mini replaces the o1-mini model for all users. OpenAI doesn't explicitly state why you can't use o1-mini going forward, but touts that o3-mini has higher rate limits and lower latency than the previous model. At launch, only ChatGPT Plus, Team, and Pro users can access o3-mini. OpenAI says Enterprise users can access the model in a week. (In addition, Plus and Team users will see their daily rate limits jump from 50 messages on o1-mini to 150 messages.) That said, free users will be able to try o3-mini in a limited capacity, either by choosing the "Reasoning" option in the message composer, or regenerating a response. OpenAI says it's the first time free users have had access to a reasoning model in ChatGPT, which comes one day after Microsoft offered o1's reasoning to Copilot users for free. You can learn more about o3-mini in our post here. But as the model is only rolling out today, we won't know exactly how it performs until real-world testers start to use it.
[25]
OpenAI begins releasing its next generation of reasoning models with o3-mini
Developers can access o3-mini through an API, and can select between three levels of reasoning intensity. The lowest setting, for example, might be best for less difficult problems where speed of response is a factor. ChatGPT Plus, Team, and Pro users can access OpenAI o3-mini starting today, OpenAI says, while enterprise users will get access in a week. The announcement comes at the end of a week in which the Chinese company DeepSeek dominated headlines after releasing a pair of surprisingly powerful and cost-effective AI models called DeepSeek-V3 and DeepSeek-R1. The latter, a reasoning model, scored close to, and sometimes above, OpenAI's o1 in a set of recognized benchmark tests. "We're shifting the entire cost‑intelligence curve," OpenAI researcher Noam Brown said of o3-mini on X. "Model intelligence will continue to go up, and the cost for the same intelligence will continue to go down." He said o3-mini even outperforms the full-sized o1 model in a number of evaluations.
[26]
OpenAI hits back at DeepSeek with o3-mini reasoning model
Over the last week, OpenAI's place atop the AI model hierarchy has been heavily challenged by Chinese model DeepSeek. Today, OpenAI struck back with the public release of o3-mini, its latest simulated reasoning model and the first of its kind the company will offer for free to all users without a subscription. First teased last month, OpenAI brags in today's announcement that o3-mini "advances the boundaries of what small models can achieve." Like September's o1-mini before it, the model has been optimized for STEM functions and shows "particular strength in science, math, and coding" despite lower operating costs and latency than o1-mini, OpenAI says. Users are able to choose from three different "reasoning effort options" when using o3-mini, allowing them to fine-tune a balance between latency and accuracy depending on the task. The lowest of these reasoning levels generally shows accuracy levels comparable to o1-mini in math and coding benchmarks, according to OpenAI, while the highest matches or surpasses the full-fledged o1 model in the same tests. OpenAI says testers reported a 39 percent reduction in "major errors" when using o3-mini, compared to o1-mini, and preferred the o3-mini responses 56 percent of the time. That's despite the medium version of o3-mini offering a 24 percent faster response time than o1-mini on average -- down from 10.16 seconds to 7.7 seconds.
[27]
After DeepSeek R1's Release, ChatGPT Offers Free Trials for o3-mini
After the release of DeepSeek R1, a Chinese-made, freely available, open-source "reasoning" AI model, the creators of ChatGPT have fully released o3-mini to all paid tiers, including a trial for free users, the company said in a press release on Friday. OpenAI's o3-mini model is a lighter version of o3, a "reasoning" model that takes longer to formulate answers but can work through more complex problems to produce better outputs. OpenAI says o3-mini excels in math, science and coding at faster speeds, especially when compared to its top-tier o1 model. The o3-mini model supports function calling for outside data like stock prices and can produce responses in a specific format like JSON or XML. It also supports developer messages and allows for streaming, which is the incremental delivery of results so you can see them faster. Unlike o1, o3-mini doesn't support vision capabilities, meaning you can't have it analyze pictures. The sudden release of o3-mini is likely in response to DeepSeek R1, a reasoning model out of China that vaulted onto the internet last weekend and sent tech stocks spiraling downward when markets opened on Monday. DeepSeek says it found a way to develop a reasoning model that's wildly cheaper and more efficient while also using older Nvidia hardware. DeepSeek hasn't been entirely transparent on how it landed on its training costs, however. Regardless, DeepSeek R1 is available now, and people are using it. It's competitive with OpenAI's o1 model, which is remarkable considering o1 is part of ChatGPT's $200 pro tier. DeepSeek's release completely upended the current paradigm in Silicon Valley, where conventional wisdom said more money and compute were necessary to create the next wave of AI models. It's why Big Tech has been pouring billions into AI development. For a company like Nvidia, which makes the hardware powering the AI revolution, investors were spooked to learn that powerful models could be made using older hardware. Along with the release of o3-mini, OpenAI is raising daily usage limits compared with the older o1-mini model. The o3-mini model can also connect to the internet to find up-to-date information.
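To make those developer features concrete, here is a minimal, hypothetical sketch of function calling with the OpenAI Python SDK; the `get_stock_price` tool and the prompts are invented for illustration and are not taken from OpenAI's documentation.

```python
import json
from openai import OpenAI

client = OpenAI()

# A hypothetical tool the model may call when it needs live market data.
tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Look up the latest price for a ticker symbol.",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

response = client.chat.completions.create(
    model="o3-mini",
    messages=[
        {"role": "developer", "content": "Use tools when live data is needed."},
        {"role": "user", "content": "What is MSFT trading at right now?"},
    ],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:  # the model chose to call the tool
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:  # the model answered directly
    print(message.content)
```

Streaming works the same way by passing stream=True, which delivers the response incrementally instead of all at once.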
[28]
Following the lead of DeepSeek, OpenAI makes its reasoning model free
Reasoning models use a "chain of thought" technique to generate responses, essentially working through a problem presented to the model step by step. Using this method, the model can find mistakes in its process and correct them before giving an answer. This typically results in more thorough and accurate responses, but it also causes the models to pause before answering, sometimes leading to lengthy wait times. OpenAI claims that o3-mini responds 24% faster than o1-mini. These types of models are most effective at solving complex problems, so if you have any PhD-level math problems you're cracking away at, you can try them out. Alternatively, if you've had issues with getting previous models to respond properly to your most advanced prompts, you may want to try out this new reasoning model on them. To try out o3-mini, simply select "Reason" when you start a new prompt on ChatGPT. Although reasoning models possess new capabilities, they come at a cost. OpenAI's o1-mini is 20 times more expensive to run than its equivalent non-reasoning model, GPT-4o mini. The company says its new model, o3-mini, costs 63% less than o1-mini per input token. However, at $1.10 per million input tokens, it is still about seven times more expensive to run than GPT-4o mini. It's no coincidence this new model is coming right after the DeepSeek release that shook the AI world less than two weeks ago. DeepSeek's new model performs just as well as top OpenAI models, but the Chinese company claims it cost roughly $6 million to train, as opposed to the estimated cost of over $100 million for training OpenAI's GPT-4. (It's worth noting that a lot of people are interrogating this claim.) Additionally, DeepSeek's reasoning model costs $0.55 per million input tokens, half the price of o3-mini, so OpenAI still has a way to go to bring down its costs. It's estimated that reasoning models also have much higher energy costs than other types, given the larger number of computations they require to produce an answer. This new wave of reasoning models presents new safety challenges as well. OpenAI used a technique called deliberative alignment to train its o-series models, basically having them reference OpenAI's internal policies at each step of their reasoning to make sure they weren't ignoring any rules. But the company has found that o3-mini, like the o1 model, is significantly better than non-reasoning models at jailbreaking and "challenging safety evaluations" -- essentially, it's much harder to control a reasoning model given its advanced capabilities. o3-mini is the first model to score as "medium risk" on model autonomy, a rating given because it's better than previous models at specific coding tasks -- indicating "greater potential for self-improvement and AI research acceleration," according to OpenAI. That said, the model is still bad at real-world research. If it were better at that, it would be rated as high risk, and OpenAI would restrict the model's release.
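The pricing claims above are easy to sanity-check. The snippet below assumes list prices of $3.00 per million input tokens for o1-mini and $0.15 for GPT-4o mini, figures not stated in the article, so treat it as a rough check rather than official pricing.

```python
# Rough check of the per-million-input-token pricing comparisons quoted above.
O3_MINI = 1.10
O1_MINI = 3.00      # assumed list price, not stated in the article
GPT_4O_MINI = 0.15  # assumed list price, not stated in the article

savings_vs_o1_mini = (1 - O3_MINI / O1_MINI) * 100
multiple_vs_4o_mini = O3_MINI / GPT_4O_MINI

print(f"o3-mini is {savings_vs_o1_mini:.0f}% cheaper than o1-mini")   # ~63%
print(f"o3-mini costs about {multiple_vs_4o_mini:.1f}x GPT-4o mini")  # ~7.3x
```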
[29]
OpenAI Makes 'o3-mini' Free for All ChatGPT Users; Plus Users Get 'o3-mini-high'
Both models are rolling out today. Free ChatGPT users can select "Reason" in the message composer to chat with the o3-mini model. The Chinese AI lab, DeepSeek, recently released its o1-level reasoning model called R1 and made it freely available to all users globally. This unexpected development shocked the US tech stock market and pushed OpenAI to release its frontier AI models. Finally, today, OpenAI has released the powerful 'o3-mini' model for free to all ChatGPT users. Earlier, OpenAI was serving the much smaller "GPT-4o mini" to free ChatGPT users, with some GPT-4o usage as well. Now, all ChatGPT users can access the 'o3-mini' model which was announced in December last year. The o3 series models were hailed as a breakthrough because o3, at its highest compute setting, cracked the hallowed ARC-AGI benchmark. For the 'o3-mini' model available to free users, the reasoning effort is set to "medium". In AIME 2024 (a competitive math benchmark), o3-mini nearly matches the performance of the much larger o1 model. In Codeforces (a competitive coding benchmark), o3-mini again performs better than o1. In terms of latency too, o3-mini is almost on par with GPT-4o. OpenAI says the o3-mini model delivers exceptional performance in science, math, coding, and reasoning problems. As for ChatGPT Plus users, OpenAI has released the 'o3-mini-high' model, which is even more powerful and the most capable AI model out there for coding. o3-mini-high scores much better than o1 in math and coding benchmarks including AIME 2024, Codeforces, and SWE-bench Verified. Only in GPQA Diamond (PhD-level science questions) does o1 perform better than o3-mini-high. Both o3-mini and o3-mini-high models are rolling out to free and paid ChatGPT users, starting today. Of course, there is a rate limit for free users, but OpenAI has not disclosed it. However, OpenAI has tripled the rate limit for ChatGPT Plus users. If you are a paid subscriber, the rate limit has been increased to 150 messages per day with o3-mini. Note that ChatGPT Pro users who are subscribed to the $200-per-month plan will have unlimited access to both o3-mini and o3-mini-high models. All in all, for the first time, free ChatGPT users will have access to a far superior model, thanks to DeepSeek's entry into the AI race. To use the o3-mini model, just select the "Reason" button in ChatGPT and start asking your questions.
[30]
OpenAI's o3-mini is here and available to all users
OpenAI's latest machine learning model has arrived. On Friday, the company released o3-mini and it's available to try now. For the first time, OpenAI is making one of its "reasoning" models available to free users of ChatGPT, though a message limit applies. When OpenAI first previewed o3 and o3-mini at the end of last year, CEO Sam Altman said the latter would arrive "around the end of January." Altman gave a more concrete timeline on January 17 when he wrote on X that OpenAI was "planning to ship in a couple of weeks." Now that it's here, it's safe to say o3-mini arrives with a sense of urgency. On January 20, the same day Altman was attending Donald Trump's inauguration, China's DeepSeek quietly released its R1 chain-of-thought model. By January 27, the company's chatbot surpassed ChatGPT as the most-downloaded free app on the US App Store after going viral. The overnight success of DeepSeek wiped out $1 trillion in stock market value, and almost certainly left OpenAI blindsided. In the aftermath of last week, OpenAI said it was working with Microsoft to identify two accounts the company claims may have distilled its models. Distillation is the process of transferring the knowledge of an advanced AI system to a smaller, more efficient one. Distillation is not a controversial practice. DeepSeek has used distillation on its own R1 model to train its smaller algorithms; in fact, OpenAI's terms of service allow for distillation as long as users don't train competing models on the outputs of the company's AI. OpenAI did not explicitly name DeepSeek. "We know [China]-based companies -- and others -- are constantly trying to distill the models of leading US AI companies," an OpenAI spokesperson told The Guardian recently. However, David Sacks, President Trump's AI advisor, was more direct, claiming there was "substantial evidence" that DeepSeek had "distilled the knowledge out of OpenAI's models."
[31]
OpenAI's o3-mini Arrives With an Unexpected Feature -- Free Access
Less than one week after DeepSeek's "Sputnik moment," OpenAI is releasing its o3-mini reasoning model to the public. This is the first OpenAI reasoning model that is not tied to a subscription -- you can use it for free. Reasoning models are far more accurate than typical LLMs. They utilize a "chain of thought" system to "think" before answering questions and self-correct mistakes. As a result, they're the ideal option for difficult prompts, especially those that involve difficult math. The new o3-mini reasoning model is specially designed for science, programming, engineering, and other math-heavy fields. And, per OpenAI, it can go toe-to-toe with the premier o1 and o1-mini models. This is important because, at $1.10 per million input tokens, o3-mini is half the cost of o1-mini. It's a leap in efficiency that would be recognized as a huge achievement, if not for DeepSeek. At $1.10 per million input tokens, o3-mini is twice as expensive as DeepSeek's reasoning model. And, if we're to believe that DeepSeek's model was trained for just $6 million, then o3-mini proves that last year's AI leaders are far less efficient and far more wasteful than they should be. I should also point out that o3-mini wasn't expected this week. OpenAI appears to have fast-tracked its launch in response to DeepSeek's meteoric rise. The fact that free users can access this model is also interesting, although we don't know whether this particular point was influenced by DeepSeek or not -- the DeepSeek model is notable for its open-source licensing. To clarify, o3-mini is not open source, and free users do not have unlimited access. Pro users can tap into the model as much as they want, while Plus and Team users may simply enjoy the model's reduced token cost. To use o3-mini, press the "reason" button before submitting a question to ChatGPT. This option may not be immediately available in the ChatGPT desktop or mobile apps, but it's live on the ChatGPT website. Source: OpenAI
[32]
OpenAI Launches o3-mini Model, Makes It Free After DeepSeek's Launch - Microsoft (NASDAQ:MSFT)
Microsoft Corp.-backed OpenAI has introduced the o3-mini, a new model in its reasoning series. It will be available on Friday in both ChatGPT and the API. What Happened: The o3-mini model, previewed in December, promises to offer "exceptional capabilities" in science, math, and coding, while maintaining low costs and reduced latency. It is the first small reasoning model from OpenAI to support developer features such as function calling and structured outputs, making it ready for production use. According to OpenAI, the o3-mini will replace the o1-mini in the model picker, offering higher rate limits and lower latency. This makes it a compelling choice for coding, science, technology, engineering, and mathematics (STEM) tasks, and logical problem-solving. The model is rolling out to select developers in API usage tiers 3-5, with broader access for ChatGPT Plus, Team, and Pro users. OpenAI emphasizes that o3-mini is optimized for STEM reasoning. Its medium reasoning effort matches the performance of the o1 model. It delivers faster responses, and evaluations show a 39% reduction in major errors on difficult questions compared to the o1-mini. Why It Matters: The launch of the o3-mini model comes amid OpenAI's significant investments in infrastructure, such as the $100 billion Stargate data center in Texas. Additionally, OpenAI CEO Sam Altman recently addressed the hype surrounding artificial intelligence, urging the public to temper expectations about the development of Artificial General Intelligence (AGI). Altman's comments suggest that while OpenAI is making strides in AI capabilities, the journey toward AGI is still ongoing. Furthermore, the introduction of the o3-mini follows the release of OpenAI's "Operator" AI agent, which was designed to autonomously perform web tasks. This agent, available to ChatGPT Pro subscribers, integrates advanced reasoning and vision capabilities, showcasing OpenAI's continuous innovation in AI technology.
[33]
OpenAI o3-mini for free, ChatGPT Gov, and ElevenLabs' Series C: This week's AI launches
OpenAI launched its newest reasoning model, o3-mini, in ChatGPT and through its API this week. The o3-mini launch is the first time the AI startup is offering an AI model with reasoning capabilities to ChatGPT's free users, it said. o3-mini has medium reasoning capabilities that match the o1 reasoning model's performance in math, coding, and science, OpenAI said. However, o3-mini has a "significantly lower cost" and "faster responses." Compared to o1-mini, o3-mini responds 24% faster, the startup said. External testers also preferred o3-mini's responses 56% of the time compared to o1-mini, and found 39% fewer mistakes on real-world questions. Compared to o1, o3-mini outperformed the model in coding and other reasoning tasks with less latency and at a lower cost.
[34]
OpenAI to release new artificial intelligence model for free
Move to issue o3-mini model follows sudden arrival of much cheaper Chinese rival DeepSeek's R1
OpenAI is releasing a new artificial intelligence model for free, after the company said it would speed up product releases in response to the emergence of a Chinese rival. The startup behind ChatGPT is issuing the AI, called o3-mini, after the surprise success of a rival product by China's DeepSeek. It will be available without charge - albeit with usage limits - to people who use the free version of OpenAI's chatbot. DeepSeek rattled tech investors in the US with the release of R1, a so-called reasoning model that underpinned the company's eponymous chatbot. News that it had topped Apple's free app store and claims it had been developed at a fraction of the cost wiped $1tn off the tech-heavy Nasdaq index on Monday. OpenAI's chief executive, Sam Altman, reacted to DeepSeek's challenge by pledging to "deliver much better models" and accelerate product releases. He had first announced plans to release o3-mini - a less powerful version of the full o3 model that has yet to be released publicly - on 23 January, days after DeepSeek unveiled R1. "Today's launch marks the first time we're bringing reasoning capabilities to our free users, an important step towards broadening accessibility to advanced AI in service of our mission," said OpenAI. R1, the underlying technology for DeepSeek's chatbot, not only rivalled its OpenAI equivalent in performance but was also developed with fewer resources, according to DeepSeek. This has made investors ask whether US tech firms will continue their dominance of the AI market and generate a return on the multibillion-dollar sums they have invested in AI infrastructure and products. OpenAI said the o3-mini model matched its predecessor, o1, in maths, coding, and science but at a significantly lower cost and with faster responses. Users on ChatGPT's Pro package, which costs $200 a month, will get unlimited access to o3-mini, while users on the cheaper Plus tariff will have higher usage limits than free users. The power of the full o3 model was flagged in the International AI Safety Report published on Tuesday. The study's lead author, Yoshua Bengio, said its capabilities "could have profound implications for AI risks". He said o3's performance in a key abstract reasoning test represented a breakthrough that had stunned experts, including himself. In some tests, o3 outperformed many human experts, he said.
[35]
OpenAI launches o3-mini, its latest AI reasoning model
Microsoft-backed OpenAI on Friday introduced o3-mini, the latest addition to its 'o' family of AI reasoning models. The company first previewed the model in December alongside a more advanced system, o3. Sam Altman, OpenAI's chief executive, announced the launch on X, calling it a "smart, fast model" available on both ChatGPT and the API. He added that o3-mini can search the web, explain its reasoning, and is accessible to free-tier users via a "reason" button. The launch comes as OpenAI seeks to counter concerns that it is losing ground in the AI race, particularly to Chinese firms like DeepSeek, which it has accused of intellectual property theft.
OpenAI introduces the o3-mini AI model, offering improved performance, cost-efficiency, and specialized capabilities for STEM tasks, while also presenting some limitations in areas like vision processing and multilingual support.
OpenAI has introduced its latest AI model, the o3-mini, marking a significant advancement in artificial intelligence technology. This new model offers improved performance, cost-efficiency, and specialized capabilities, particularly for STEM-related tasks [1][2][3].
The o3-mini demonstrates notable improvements over its predecessors, including faster responses, fewer major errors on difficult questions, and lower per-token costs.
These enhancements make the o3-mini an attractive option for developers and organizations seeking scalable AI solutions without compromising on quality or breaking the bank.
The o3-mini is designed to excel in STEM-focused applications, showing particular strength in science, math, and coding.
This specialization makes the o3-mini particularly valuable for professionals in technical fields who require precision and speed in their work.
The model introduces several features aimed at improving developer productivity, such as function calling, structured outputs, developer messages, and streaming.
These features simplify complex tasks and enhance overall efficiency in development processes.
Despite its advancements, the o3-mini has some limitations, most notably its lack of vision capabilities and more limited multilingual support.
These shortcomings suggest that while the o3-mini is highly effective for specific technical tasks, it may not be the ideal choice for users seeking a more versatile or general-purpose AI model.
When compared to models like DeepSeek R1, the o3-mini shows strengths in certain areas.
However, DeepSeek R1 demonstrates an edge in handling complex coding problems and interpreting context-heavy challenges.
OpenAI has hinted at a potential shift towards open source AI development, suggesting that future models may feature open weights. This move could foster greater collaboration and innovation within the AI community, aligning with broader trends toward transparency and accessibility in AI research [5].
As AI technology continues to evolve, the o3-mini represents a significant step forward in balancing performance, affordability, and specialized capabilities. Its introduction paves the way for more accessible and efficient AI tools, potentially transforming various industries and applications in the near future.
Reference
[1]
[2]
[3]