Curated by THEOUTPOST
On Sat, 14 Sept, 4:02 PM UTC
11 Sources
[1]
ChatGPT o1 AI reasoning and thinking explained
OpenAI has introduced two new models, ChatGPT o1 Preview and ChatGPT o1 Mini, which represent a significant shift from the previous GPT series. These models are specifically designed to enhance reasoning through reinforcement learning. In contrast to traditional models that generate a single response, the o1 models perform multiple iterations and produce comprehensive reasoning traces to arrive at more accurate and reliable answers. This approach, however, requires substantial computational resources during both training and inference.

The o1 Preview and o1 Mini models are not intended to replace ChatGPT-5. Instead, they are specialized models focused on reasoning and problem-solving tasks, and they rely heavily on reinforcement learning, which sets them apart from earlier versions. Their primary strength lies in breaking complex problems down into manageable steps, producing more precise and logical outcomes and allowing the models to tackle intricate reasoning tasks with remarkable effectiveness.

The training process involves large-scale reinforcement learning algorithms. During both training and inference, the models employ a chain-of-thought process, which demands significant computational power. They generate detailed reasoning traces to support their conclusions, and this heavy use of compute is what allows them to handle complex reasoning tasks effectively.

One of the key strengths of the o1 models is their ability to break prompts into detailed steps. They perform multiple passes and backtrack to refine their answers, improving accuracy, and this iterative process produces long-form reasoning traces that offer insight into how the models reach their conclusions.

The o1 models excel at tasks that require logical reasoning, such as mathematics and coding, but they may be less effective at subjective tasks such as creative writing. The models are evaluated at maximum test-time compute settings, a regime that highlights their strengths in logical reasoning and problem-solving.

The o1 models also carry a higher computational cost than previous models. Users are charged for reasoning tokens, which are not visible in the output but are essential to the models' reasoning process. Automated routing could eventually optimize cost-efficiency, making these models more accessible for a wider range of applications.

Finally, the o1 models have significant potential for complex problem-solving and planning in AI agents, and they can be integrated with future GPT models to extend what AI can achieve. Challenges remain, however, from the higher computational cost and hidden reasoning-token charges to weaker performance on subjective tasks, and addressing them will be crucial for the future success and widespread adoption of advanced reasoning AI models like o1 and o1 mini.
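Those hidden reasoning tokens have a direct billing impact. The sketch below is a back-of-the-envelope estimate only: the per-token rates are the o1-preview API prices reported elsewhere in this roundup, the token counts are made-up illustrative values, and the assumption that reasoning tokens are billed at the output rate is exactly that, an assumption.

```python
# Rough cost estimate for a single o1-preview API call.
# Prices are the reported o1-preview rates; the token counts are
# hypothetical and chosen only to show how hidden reasoning tokens
# can dominate the bill.

INPUT_RATE = 15 / 1_000_000    # USD per input token
OUTPUT_RATE = 60 / 1_000_000   # USD per output token (assumed to cover reasoning tokens too)

prompt_tokens = 1_200          # what you send
visible_output_tokens = 800    # what you see in the response
reasoning_tokens = 4_000       # hidden chain-of-thought tokens you are still charged for

cost = (prompt_tokens * INPUT_RATE
        + (visible_output_tokens + reasoning_tokens) * OUTPUT_RATE)

print(f"Estimated cost: ${cost:.3f}")  # ~$0.31, most of it from tokens you never see
```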
The ChatGPT o1 Preview and ChatGPT o1 Mini models represent a significant milestone in the development of AI reasoning capabilities. By using reinforcement learning and extensive reasoning processes, these models offer a new approach to problem-solving. While they come with higher computational costs and some challenges, their potential applications and future integration with other AI models make them a promising development in the field of artificial intelligence. As research and development continue, we can expect to see even more impressive advancements in AI reasoning models, unlocking new possibilities for complex problem-solving and decision-making.
[2]
OpenAI ChatGPT o1 AI model use cases explored
The rapid advancements in artificial intelligence (AI) are transforming industries, and OpenAI's latest AI model, ChatGPT o1, is at the forefront of this shift. With its advanced multi-step reasoning capabilities, ChatGPT o1 is reshaping software development, decision-making, and more. This overview by AI Advantage provides insights into the practical applications of the latest OpenAI o1 models and other recent AI innovations, showcasing their potential to streamline processes and drive innovation.

The OpenAI o1 model's multi-step reasoning is a notable advance in AI capabilities. By breaking complex problems down into manageable steps, it enhances the model's utility across domains such as software development and decision-making. Currently, access to ChatGPT o1 is limited to teams or individuals with a Plus subscription, ensuring that its features are used by those who can fully harness them. Both o1-preview and o1-mini excel in STEM reasoning but differ in cost, processing speed, and scope of knowledge.

Replit Agent, another notable AI tool, benefits greatly from the o1 model's multi-step reasoning. This integration lets the agent provide more accurate code suggestions, identify potential issues, and offer solutions, streamlining the development process. Internal tools and applications built with Replit Agent demonstrate its practical utility, such as automating task assignments based on team members' strengths and project requirements, leading to improved efficiency and productivity.

Google continues to push the boundaries of AI with innovations like Notebook LM and Illuminate. Notebook LM serves as a research environment where users can manage and interact with multiple sources, and its new feature for generating audio summaries lets researchers quickly grasp the essence of lengthy documents. Illuminate, on the other hand, focuses on converting academic papers into podcasts, making complex information accessible to a wider audience.

The integration of AI into smartphones has brought enhanced search capabilities in photos and videos, allowing users to easily locate specific content within their media files. These advancements also raise privacy concerns, so companies are implementing safeguards such as on-device processing and encryption to protect personal information while still benefiting from AI's capabilities.

Anthropic Workspaces introduces a new feature for organizing API keys and projects, similar to OpenAI's project feature. This tool simplifies the management of multiple APIs, ensuring you can easily access and use the resources your projects need. By streamlining this process, Anthropic Workspaces enhances productivity and reduces the risk of errors.

AI video generators are also making significant strides, with current capabilities allowing the creation of high-quality video content. These tools can automate various aspects of video production, from scriptwriting to editing, making it easier for creators to produce engaging content. As these technologies continue to evolve, their impact on video production workflows will only grow, with future advancements potentially making them integral to professional-grade video creation.

The advancements in AI, particularly with the OpenAI o1 model, are transforming various fields.
From enhancing software development and decision-making to improving content creation and privacy solutions, these innovations demonstrate AI's vast potential. As these technologies continue to evolve, their impact on our daily lives and professional environments will only grow, ushering in a new era of efficiency and innovation.
[3]
How to use ChatGPT-o1 Preview for best results
OpenAI's latest AI models, ChatGPT-o1 Preview and ChatGPT-o1 Mini, are engineered for deep reasoning and complex problem-solving. These models stand out from traditional models like GPT-4 because of their unique features and capabilities. By understanding the core functionalities and practical applications of the GPT-o1 series, you can harness their power to tackle intricate challenges and enhance your productivity across various domains.

The ChatGPT-o1 series models excel at processing complex problems that require thorough analysis and deep reasoning. The ChatGPT-o1 Preview model is designed for high-accuracy tasks that demand extensive background knowledge, while the ChatGPT-o1 Mini model offers a faster and more cost-effective option for coding, math, and science tasks where such broad knowledge is less critical.

What sets the ChatGPT-o1 series apart from traditional models is its ability to think deeply before responding. By using reasoning tokens, these models process information more thoroughly than models like GPT-4. Although currently in beta and limited to text input, the ChatGPT-o1 series is expected to gain advanced functionality such as function calling in future updates, further expanding its problem-solving capabilities.

To get the best results from the ChatGPT-o1 series, it's crucial to employ effective prompting techniques: keep prompts clear and concise, ask for direct answers rather than detailed reasoning steps, use delimiters to separate instructions from context, and provide only the most relevant background information.

When choosing between the GPT-o1 Preview and GPT-o1 Mini models, consider the nature of your tasks. GPT-o1 Preview is ideal for deep reasoning, complex problem-solving, and tasks requiring broad general knowledge and high accuracy, making it suitable for research, detailed analysis, and intricate problem-solving scenarios. GPT-o1 Mini is best for faster processing of routine tasks, coding, and technical work that doesn't require extensive background knowledge, such as basic coding and straightforward mathematical calculations.

The GPT-o1 series models have practical applications across fields including coding, scientific research, and data analysis. For example, you can use the models to write Python functions that sort lists, summarize key findings from recent research, or analyze sales data to identify top-performing products. When structuring your prompts, keep them clear and concise so the model can provide accurate and relevant responses.

To maximize the capabilities of the GPT-o1 series, avoid asking for detailed reasoning steps, as this can lead to confusion; focus on getting direct answers instead. You can further improve response quality by using delimiters and providing only the most relevant context, as in the sketch below.

By following these guidelines and understanding the strengths of OpenAI's ChatGPT-o1 series models, you can put their capabilities to work on a wide range of complex tasks. Whether you are working on coding projects, conducting scientific research, or analyzing data, these models offer powerful tools to enhance your productivity and problem-solving abilities. As you explore them, experiment with different prompting techniques and tailor your approach to the specific requirements of your tasks to unlock their full potential.
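A minimal sketch of that prompting advice, using OpenAI's Python SDK: the prompt separates the instruction from the supporting context with delimiters and asks for the answer directly. The sample report text and figures are invented for illustration, and the exact API parameters available for the o1 models may differ from what is shown here.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Keep the instruction concise, ask for the answer directly (not the
# reasoning steps), and fence off the context with clear delimiters.
prompt = (
    "List the three top-performing products by revenue in the report below, "
    "with one sentence of justification each.\n\n"
    "### REPORT ###\n"
    "Q3 revenue: Product A $1.2M, Product B $0.9M, Product C $1.5M, "
    "Product D $0.4M. Returns rose 3% quarter over quarter.\n"
    "### END REPORT ###"
)

response = client.chat.completions.create(
    model="o1-preview",  # or "o1-mini" for routine coding and math tasks
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```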
The ChatGPT-o1 series models represent a significant step forward in AI technology, providing users with sophisticated tools for deep reasoning and complex problem-solving. By mastering the tips and tricks outlined in this guide, you can harness the power of these models to tackle challenges, gain valuable insights, and drive innovation in your field.
[4]
How you can try OpenAI's new o1 model for yourself | Digital Trends
Despite months of rumored development, OpenAI's release of its Project Strawberry last week came as something of a surprise, with many analysts believing the model wouldn't be ready for weeks at least, if not later in the fall. The new o1-preview model, and its o1-mini counterpart, are already available for use and evaluation; here's how to get access for yourself.

"We're releasing a preview of OpenAI o1 -- a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math." -- OpenAI (@OpenAI), September 12, 2024

What is o1?

OpenAI has made no secret of its artificial general intelligence (AGI) aspirations, and Project Strawberry (now known as "o1") is the company's next step toward that goal. It's the first in a new line of "reasoning" models, "designed to spend more time thinking before they respond," per an OpenAI announcement post. That strategy enables the model to "reason through complex tasks and solve harder problems than previous models in science, coding, and math." The models reportedly reason in a human-like manner, allowing them to "refine their thinking process, try different strategies, and recognize their mistakes" as they gain experience through training.

According to OpenAI, o1-preview operates on par with Ph.D. students in physics, chemistry, and biology, and performs similarly on benchmark tests in those subjects. o1 is also adept at coding and math problems, scoring 83% on an International Mathematics Olympiad (IMO) qualifying exam where GPT-4o scored only 13%, and reaching the 89th percentile in a Codeforces competition against human opponents.

"here is o1, a series of our most capable and aligned models yet... o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it." -- Sam Altman (@sama), September 12, 2024

o1-mini is a lightweight version of the standard o1-preview model. It is reportedly 80% less expensive to operate than the larger iteration, making it especially capable in coding analysis and generation tasks.

Is o1-preview available to try?

Yes, the o1-preview models launched on September 12 for ChatGPT Plus and Teams subscribers. Enterprise and Educational users will have access at the start of the following week.

How secure is o1 against bad actors?

Very, it would seem. OpenAI reportedly developed an entirely new safety training program that leverages the model's increased reasoning capabilities to make it adhere more efficiently to its safety and alignment guidelines. The company notes that in testing, where GPT-4o scored a 22 (out of 100) in resisting jailbreak attempts, the new o1 model scored an 84.

How do I get access to o1-preview?

As with all new generative AI features, the newly released o1-preview is currently only available to paying subscribers. If you want to try it for yourself, you'll need to pick up a $20/month Plus subscription. Click on the Upgrade Plan button at the bottom of the left-hand navigation pane and follow the onscreen prompts to enter your payment details. Once your subscription is activated, select either o1-preview or o1-mini from the model picker toggle on the left side of the ChatGPT homepage.
Note that access is limited, even for paying users, with a weekly rate limit of 30 messages for o1-preview and 50 messages for o1-mini. OpenAI says it will eventually make o1-mini available to free-tier users, though the company has yet to set a date for that rollout.
[5]
OpenAI's new o1 model can solve 83% of International Mathematics Olympiad problems
OpenAI will launch o1, its latest artificial intelligence (AI) model, in two weeks. This marks the debut of a new class of reasoning AI models and comes amid speculation about the release of "Strawberry" AI. The company will also release o1-mini, a lighter and more cost-effective version of the new model, ideal for tasks such as coding and problem-solving.

o1 can solve intricate, multi-step problems, such as those in math and coding. It mimics human-like reasoning and explains its thought process along the way. It also promises improved accuracy and a big reduction in hallucinations (which occur when an AI model generates false or misleading information). o1 can serve as a powerful tool for scientific research in physics, chemistry, and engineering, where rigorous reasoning and complex problem-solving are important.

o1 uses reinforcement learning, in contrast to the pattern-mimicking training method used by older AI models. Reinforcement learning is when a system learns through rewards or penalties. It also uses a "chain of thought" process, which mimics human cognition by breaking problems down into logical, sequential steps. OpenAI Chief Research Officer Bob McGrew says that o1 outshines its predecessors in math-related tasks: o1 was able to tackle 83% of the problems in the International Mathematics Olympiad, while GPT-4o in comparison could only solve 13% of problems correctly.

All ChatGPT Plus and Team users already have access to o1-preview and o1-mini, while Enterprise and Edu users will get access next week. OpenAI plans to make o1-mini available to free users as well, but no date has been announced for this. For developers wanting to integrate it into their applications, o1-preview costs $15 per million input tokens and $60 per million output tokens, three times the price of GPT-4o. o1 is more expensive and slower to use than its predecessors, and it is not optimised for web browsing or for processing files and images.

For OpenAI, o1 is a step towards a future where AI can be an autonomous agent capable of making decisions, taking action on behalf of users, and solving real-world problems, revolutionising various industries from healthcare to engineering.
[6]
ChatGPT Now Shows You Its Thought Process
o1, OpenAI's latest generative AI model, has arrived. The company announced o1-preview and o1-mini on Thursday, marking a departure from the GPT naming scheme. There's good reason for that: OpenAI says that unlike its other models, o1 is designed to spend more time "thinking" through issues before returning results -- and it will also show you how it solved your problem.

In OpenAI's announcement, the company says this new "thought process" helps its models try new tactics and think through their mistakes. According to the company, o1 performs "similarly to PhD students" in biology, chemistry, and physics. Where GPT-4o solved 13% of the problems on the International Mathematics Olympiad, o1 reportedly solved 83%. The company also emphasized how the models are more effective for coding and programming. That "thinking" means o1 takes longer to respond than previous models.

As OpenAI research lead Jerry Tworek tells The Verge, o1 is trained through reinforcement learning. Rather than looking for patterns in a training set, o1 learns through "rewards and penalties." OpenAI is keeping the exact methodology vague, but says this new thought model does hallucinate less than previous models -- though it still does hallucinate. There are two versions of o1: o1-preview, which is the fully powered version of the model, and o1-mini, a lighter version trained on a similar framework. The company is reportedly shipping these models early in development, and says that's the reason they don't include standard GPT features like web access and file and image uploading.

I admit, I am not a programmer, nor do I have many advanced math problems to solve on a daily basis. That makes it difficult to properly test OpenAI's latest models for their proposed strengths and use cases. What I can appreciate, as a non-technical party, is o1-preview's thought process: when you prompt the new model, it displays a feedback message as it works through the question (e.g. "Thinking..."). When finished, it displays the results as you'd expect, but with a drop-down menu above.

When I used OpenAI's suggested prompt of "Is a hot dog a sandwich," its answer was preceded by a message that reads "Thought for 4 seconds." (Its answer, by the way, amounted to three paragraphs of "it depends.") Anyway, when I clicked the "Thought for 4 seconds" drop-down, I got to see the model's reasoning. For this prompt, it broke its process into two parts. The first, "Analyzing the question," reads: "OK, let me see. The question about whether a hot dog is a sandwich involves understanding semantics and considering OpenAI's policies, focusing on accuracy and avoiding personal opinions or disallowed content." The second, "Examining definitions," reads: "I'm thinking through whether a hot dog is a sandwich by looking at definitions and cultural views. This shows the room for debate." I guess that's all the thinking it needed to answer the question.

I also asked o1 to weigh in on another controversial matter involving food: is a taco a sandwich? The model has a lot to say. After thinking for five whole seconds, the AI returned a 364-word response. Its thought process included focusing on definitions, clarifying definitions ("I'm defining a taco by its main ingredients: tortilla, filling, and sauce.
This helps in understanding whether it fits the definition of a sandwich."), and examining perspectives ("I'm looking into the classification of tacos and sandwiches, underscoring their culinary distinctions: tacos use tortillas, sandwiches use bread; tacos rest on cultural roots from Mexican cuisine, while sandwiches stem from European influence."). Admitting this is "a topic of debate," it reasoned that "the answer hinges on definitions from culinary traditions, cultural contexts, and even legal interpretations," and weighed "key differences" (specifically, there's no bread in a taco, and while a sandwich involves placing ingredients between pieces of bread, a taco involves placing ingredients onto a tortilla). All things considered, o1 concluded that a taco is not a sandwich, according to "most culinary experts and food enthusiasts" -- even citing a legal case in which a judge ruled that a burrito isn't a sandwich. (Here's the context, if you're interested.)

As a followup, I asked o1 if it would classify a taco as a hot dog. After nine seconds, it delivered a definitive answer: "While both tacos and hot dogs involve placing fillings inside a form of bread or bread-like base, they are not the same and belong to different culinary categories." There you have it, internet. You can stop arguing this one.

Let's try another. I chose a second OpenAI-suggested prompt: "Generate a 6x6 nonogram puzzle for me to solve, where the solved grid looks like the letter Q." As you might expect from a more demanding request, o1-preview took longer to process this task -- 84 seconds, to be exact. It delivered just such a puzzle, with instructions on how to solve it. Clicking on the drop-down menu, I could see it took 36 individual thought processes as it worked through the prompt. In "Formulating the puzzle," the bot said "I'm thinking through the process of creating a 6x6 nonogram where the solution reveals the letter Q. We need to design the grid, derive clues, and present the puzzle for solving." It then goes on to try to figure out how to incorporate the "tail" of the Q in the image. It decides it must adjust the bottom row of its layout in order to add the tail in, before continuing to figure out how to set up the puzzle.

It's definitely interesting to scroll through each step o1-preview takes. OpenAI has apparently trained the model to use words and phrases like "OK," "hm," and "I'm curious about" when "thinking," perhaps in an effort to make the model sound more human. (Is that really what we want from AI?) If the request is too simple, however, and takes the model only a couple of seconds to solve, it won't show its work.

It's very early, so it's tough to know whether o1 represents a significant leap over previous AI models. We'll need to see whether this new "thinking" really improves on the usual quirks that clue you into whether a piece of text was generated by AI. These new models are available now, but you need to be an eligible user to try them out. That means having a ChatGPT Plus or ChatGPT Team subscription. If you're a ChatGPT Enterprise or ChatGPT Edu user, the models should appear next week. ChatGPT free users will get o1-mini at some point in the future. If you do have one of those subscriptions, you'll be able to select o1-preview and o1-mini from the model drop-down menu when starting a chat. OpenAI says that, at launch, the weekly rate limits are 30 messages for o1-preview and 50 for o1-mini.
If you plan to test these models frequently, just keep that in mind before wasting all your messages on day one.
[7]
ChatGPT o1 is the new 'strawberry' model from OpenAI -- 5 prompts to try it out
OpenAI has unveiled its new o1 model, which, while taking a bit longer to respond to queries, is considerably more likely to be accurate and provides significantly more detailed responses than previous models. Formerly known as project Strawberry or Q*, this is a reasoning model that takes a prompt and thoughtfully works through how to solve it step by step, rather than generating a response token by token. While not perfect for every task, it excels at math, coding, and problems that demand extended thought and analysis. For instance, it can analyze timesheets and shift data for a large store to devise an optimal working pattern.

Currently, the new model is offered in two versions: o1-preview and o1-mini. Somewhat confusingly, it seems that o1-mini is the more powerful model, but with a smaller knowledge base. Reports indicate that o1-preview was trained on an earlier architecture than mini, and the full o1 is deemed too powerful to release without additional security protections and guardrails. This new model will be especially beneficial to researchers and students, as it has demonstrated PhD-level capability in mathematics and other science, technology, and engineering subjects.

I've devised a number of prompts to truly test its limits, but with only 30 messages per week, I've had to find ways to maximize each one. That said, OpenAI reset the rate limit to give Plus and Teams users more time to play with the model. It isn't available to free users of ChatGPT.

With a new type of model come new approaches to prompting. o1 processes a query by working through the problem and thinking about it until it reaches a solution. Therefore, your best strategy is to be as descriptive as possible, outlining every aspect of what you want to achieve, and then letting the AI handle it. One of my top tips is to use another AI model, like GPT-4o or Sonnet 3.5, to refine your basic idea into a workable prompt for o1; this could involve having it outline each step the model needs to take or breaking the problem down into smaller components (see the sketch at the end of this article).

In addition to improved performance and accuracy, o1 also boasts a significantly larger output window. This means it's more capable of generating a full report, writing an entire codebase, or providing a detailed response to a complex query compared to other OpenAI models. One of the most impressive things I found when trying o1 was its ability to outline its responses and offer detailed explanations of why it responded the way it did. Here was a prime example of that, where it broke down the response section by section and gave an explanation. The prompt: "Develop a comprehensive plan to terraform Mars, addressing major challenges such as radiation protection, atmosphere generation, and sustainable resource management. Include estimated timelines and potential technological breakthroughs required." You can view the full Mars Terraform report in a Google Doc.

My next experiment was a simple prompt holding a complex problem. I wanted a new form of math that didn't require numbers, but it still had to be functional, and the AI had to explain how we could make use of this new math, with potential applications. The prompt: "Design an alternative system of mathematics not based on our current numerical system or logic. Explain its fundamental principles, operations, and potential applications." You can read the full detail of "Qualitative Mathematics" in a Google Doc.

After two fairly simple prompts, I went more descriptive with the third test.
Here I asked it to come up with a new system of government that solves the problems of our current models. The prompt: "Design a new system of government that addresses the major shortcomings of current democratic, autocratic, and other existing systems. Your proposal should consider: decision-making processes and power structures; representation and participation of citizens; checks and balances to prevent abuse of power; economic model and resource allocation; approach to law-making and enforcement; handling of individual rights and collective responsibilities; methods for adapting to long-term challenges and crises; integration of technology in governance; and scalability from local to global levels. Evaluate the potential strengths and weaknesses of your proposed system, and discuss how it might be implemented or transitioned to from current forms of government." You can see o1's full explanation of "Dynamic Participatory Governance (DPG)" in a Google Doc.

Code is where o1 really shines. Its ability to generate longer outputs, as well as more reasoned and accurate responses, allows it to be more thorough in its code generation. What better test than a Mars colony game? Here it has to create resource management functionality, a UI, and a fun gameplay element, all from a single prompt. The prompt for this is fairly long and comprehensive, so for brevity I'll include the first line and a summary: "Create a 2D version of Age of Empires set on Mars using Python and Pygame." It goes on to say "The game should include the following elements and specifications," including game window size, color schemes, buildings, and gameplay mechanics.

Finally, this idea came about after multiple attempts to give it reasoning problems other models couldn't solve -- but the other models kept solving them. I wanted it to come up with a new language, but that seemed a bit generic, so I had it turn emoji into a formal language instead. The prompt: "Assume a scenario where a group of people can only communicate using emoji. It is how they communicate with one another. Using only widely available emoji, create an emoji-to-English dictionary that would allow someone from that group to communicate with someone outside of the group that speaks English as we know it today. It has to be comprehensive enough to be both conversational and technical." You can check out the full Emoji Dictionary and phrase guide in a Google Doc.

What I found when first using the two different o1 models is that the biggest issue was coming up with ideas to try. They essentially cause the AI to go away, have a think, and come back with a more reasoned response. But they don't have access to any of the features we've come to appreciate from modern AI, including web access, memory, and data analysis. o1 is exceptionally good at coding, long-form conceptual work such as the emoji dictionary, and problems that require reasoning. One example I saw on X was someone using it to create a work schedule by having it analyze available hours for different employees and required shifts. When OpenAI adds the ability to load data files, this will be game-changing in the business space; it could even be used to organize the family vacation, working out all the different complexities of the trip, including timings and schedules. Right now, with only 30 messages per week (I used half in a day), it's a fun diversion, but for most use cases GPT-4o is more than enough.
In fact, GPT-4o mini is more than enough for how the vast majority of people use AI, and Apple Intelligence is as good as that model.
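As promised above, here is one way to turn the "use GPT-4o to refine your idea into an o1 prompt" tip into a small two-stage pipeline. This is an illustrative sketch using OpenAI's Python SDK; the model names follow the article, but the refinement wording, the sample idea, and the overall structure are assumptions rather than a workflow recommended by OpenAI.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

rough_idea = "make a plan for a 2D Mars colony game in Pygame"  # hypothetical example

# Stage 1: ask GPT-4o to expand the rough idea into a detailed o1 prompt,
# spelling out requirements, constraints, and the steps to cover.
refined_prompt = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": (
            "Rewrite the following rough idea as a detailed prompt for a "
            "reasoning model. Spell out the requirements, constraints, and "
            "steps the model should work through.\n\nIdea: " + rough_idea
        ),
    }],
).choices[0].message.content

# Stage 2: hand the refined prompt to o1-preview and let it reason through it.
answer = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": refined_prompt}],
).choices[0].message.content

print(answer)
```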
[8]
What OpenAI's new o1-preview and o1-mini models mean for developers
OpenAI surprised the world yesterday afternoon by announcing not "Strawberry" as rumored, nor GPT-5, but a new family of "reasoning" large language models (LLMs) called o1 that aims to offer high performance and accuracy on tasks related to science, technology, engineering and math (STEM) fields. OpenAI's two new models are o1-preview and the lower-parameter (less advanced) o1-mini, available now to ChatGPT Plus users as well as developers who use OpenAI's paid application programming interface (API). This way, developers can test them as the backend of existing third-party apps and services, or build new apps and services atop them.

The new o1 models use a form of "reasoning," according to OpenAI, and they "try different strategies, recognize mistakes, and are doing the full thinking process," according to Michelle Pokrass, OpenAI's API Tech Lead, who shared some of the thinking behind the development of the models in a video call interview with VentureBeat. "In our tests, these models perform pretty similarly to PhD students on kind of some of the most challenging benchmarks," Pokrass noted. Specifically, the o1 models "perform much better" than the GPT series on "reasoning-related problems," said Nikunj Handa, who works on Product at OpenAI and also took time to share thoughts about the o1 model family with VentureBeat. Here's what third-party developers should know about the new o1-preview and o1-mini models.

Limited to text -- no image or file analysis -- and slower...for now

The o1-preview and o1-mini models are limited to text inputs and outputs for now, and are therefore unlikely at this time to supplant third-party developers' usage of GPT-4o, OpenAI's previous most advanced model, which offers multimodal inputs and outputs, including analyzing file attachments and generating imagery. The o1 series models aren't multimodal, according to Pokrass and Handa. The o1 models also aren't yet able to connect to web browsing, meaning no outside knowledge past their training cutoff date (October 2023), although users can of course provide their own knowledge in the form of text inputs for the model to reference and analyze. They're also slower to respond with outputs, taking over a minute -- sometimes even several -- in some cases. However, some developers who received early alpha access over the last weeks and months have reported increased performance on tasks such as coding and drafting legal documents, so using one of them could still be a good option for developers looking to experiment and pay more for increased performance.

As OpenAI writes in the API documentation for its new o1-preview and o1-mini reasoning models: "For applications that need image inputs, function calling, or consistently fast response times, the GPT-4o and GPT-4o mini models will continue to be the right choice. However, if you're aiming to develop applications that demand deep reasoning and can accommodate longer response times, the o1 models could be an excellent choice."
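That routing guidance lends itself to a simple decision helper when picking a backend model. The sketch below is a toy heuristic invented for illustration; it only encodes the criteria quoted above and is not an official OpenAI recommendation.

```python
def pick_model(needs_images: bool, needs_function_calling: bool,
               needs_fast_response: bool, needs_deep_reasoning: bool) -> str:
    """Toy routing heuristic based on the guidance quoted above:
    GPT-4o (or GPT-4o mini) for image inputs, function calling, or fast
    responses; o1 for deep-reasoning tasks that can tolerate slow answers."""
    if needs_images or needs_function_calling or needs_fast_response:
        return "gpt-4o"
    if needs_deep_reasoning:
        return "o1-preview"
    return "gpt-4o-mini"  # cheap default for routine prompts

print(pick_model(needs_images=False, needs_function_calling=False,
                 needs_fast_response=False, needs_deep_reasoning=True))
# -> "o1-preview"
```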
o1 costs a lot more than other OpenAI models, but o1-mini is a bargain

First up, you need to be a heavy user of OpenAI's APIs in order to qualify. The o1-preview and o1-mini models are being made available initially to "Tier 5" users -- that is, those who have spent $1,000 through the API and made payments to the company at least 30 (or more) days ago. OpenAI warns that the new o1 models are previews and are limited to 20 requests per minute, compared to other OpenAI models that have higher limits or are limited by tokens per minute or per day. The company also doesn't currently accept "batched" requests for o1 as it does for other models at a lower price -- essentially bunching API inputs that don't require immediate responses, which are analyzed and their responses returned within 24 hours.

The main o1-preview model, which Pokrass says offers much more "world knowledge" of subjects outside of STEM, is the most expensive OpenAI model currently offered, by a wide margin: $15 per 1 million tokens in and $60 per 1 million tokens out ($15/$60), versus $5/$15 for GPT-4o, making the new full o1-preview model three to four times as expensive as GPT-4o. The o1-mini model, by contrast, is a steal at $3 per 1 million input tokens and $12 per 1 million output tokens, 80% cheaper than o1-preview. "Of course, we will be retreating the pricing over the coming weeks and months to get this to the right spot," said Pokrass.

When it comes to context -- how many tokens a given LLM can handle in one interaction, input and output -- the o1 series has a limit of 128,000, comparable to GPT-4o and OpenAI's other top models. The o1-preview model can produce a maximum of 32,768 tokens in a single output, or response, while o1-mini can produce double that number, at 65,536.

What developers are using OpenAI o1-preview and o1-mini for so far

It's been less than 24 hours since OpenAI released o1-preview and o1-mini, but some developers are already thinking up uses for them and testing them to see what they do well and don't. And, as previously mentioned, OpenAI did "seed" them among a select group of early alpha users and testers over the last few weeks and months. Based on that work, here are some of the most interesting uses of the o1-preview and o1-mini models so far.

AI influencer and enterprise consultant Allie K. Miller posted a thread on X of various impressive outputs from OpenAI's o1-preview model, including automatically (and much more rapidly than a human) optimizing a staff's schedules for an organization, assessing merger risks, designing warehouses for efficiency, and even balancing a city's power grid.

Creating apps and games quickly

OpenAI o1-preview seems to be a direct shot across the bow at Anthropic's Claude family, and specifically the Artifacts feature, as it is also a capable and quick way for users to generate their own interactive apps and games, as Ammaar Reshi, Head of Design at AI voice and audio startup ElevenLabs, pointed out on X. Note that he used another software tool, Cursor Composer, to run the model. However, as Anand Sukumaran, CTO of web notification startup Engagespot, posted on his X account, GPT-4o still achieves much faster speeds when coding simple programs such as one to display "Hello, World!"

Completing requests-for-proposal (RFPs) on its own

Contractors, particularly those offering products for government agencies, are all too familiar with the request-for-proposal (RFP) -- a call by an agency soliciting contract bids in a standardized format that can be tedious and time-consuming to fill out.
While specialized and AI-driven software has arisen to help contractors fill out these documents more efficiently, University of Pennsylvania Wharton School of Business Professor Ethan Mollick, a leading AI influencer and early adopter who had access to o1 as part of its alpha testing phase, posted on X that o1 can fill out RFPs on its own -- though of course, it is limited to text and doesn't accept file uploads, so the user would need to copy and paste the text version of the RFP into o1's context window in ChatGPT or through another app.

Strategizing engagement and growth hacking

Ruben Hassid, founder of EasyGen, a Chrome app for automatically generating LinkedIn posts, posted a demo video on X showing how o1-preview was able to generate a comprehensive and well-reasoned plan for using Reddit to help grow his company. "I can't believe the length of the answers. There is no way an LLM is capable of this much strategizing," he wrote.

Where to get access to OpenAI o1-preview and o1-mini

Developers can of course access the new OpenAI o1 models through the company's public API, as well as through Microsoft Azure OpenAI Service, Azure AI Studio, and GitHub Models. While clearly not right for all (or potentially even most) developers, the o1 family's debut makes for an exciting time for those with room to experiment and a desire to build new apps and services. OpenAI has also committed to continuing to develop both the capabilities of the o1 family and its GPT series, so there is no shortage of options for those looking to build atop the leading AI company's platforms.
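To make the API prices above concrete, here is a small, illustrative cost comparison. The per-million-token rates come from the figures reported in this article; the request size is a hypothetical example, and the calculation ignores any hidden reasoning tokens, which would push the o1 figures higher.

```python
# Per-million-token API prices reported above (USD).
PRICES = {
    "o1-preview": {"input": 15.00, "output": 60.00},
    "o1-mini":    {"input": 3.00,  "output": 12.00},
    "gpt-4o":     {"input": 5.00,  "output": 15.00},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request, ignoring hidden reasoning tokens."""
    rate = PRICES[model]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000

# Hypothetical request: a 10,000-token prompt and a 2,000-token answer.
for model in PRICES:
    print(f"{model:>10}: ${call_cost(model, 10_000, 2_000):.3f}")
# o1-preview: $0.270, o1-mini: $0.054, gpt-4o: $0.080
```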
[9]
Here are 9 things you need to know about OpenAI's o1 model
OpenAI has announced a much-anticipated new family of AI models that can solve difficult reasoning and math questions better than previous large language models. On Thursday, it launched a "preview" version of two of these models, called o1-preview and o1-mini, to some of its paying users.

AI with improved reasoning and math skills could help chemists, physicists, and engineers work out answers to complex problems, which might help them create new products. It could also help investors calculate options trading strategies or financial planners work through how to construct portfolios that better trade off risks and rewards. Better reasoning, planning, and problem-solving skills are also essential as tech companies try to build AI agents that can perform sophisticated tasks, such as writing entire computer programs or finding information on the web, importing it into a spreadsheet, and then analyzing that data and writing a report summarizing the findings.

OpenAI published impressive benchmark results for the o1 models. On questions from the AIME mathematics competition, which is geared towards challenging high school students, o1 got 83.3% of the questions correct, compared to just 13.4% for GPT-4o. On a different assessment, o1 answered 78% of PhD-level science questions accurately, compared to 56.1% for GPT-4o and 69.7% for human experts. The o1 model is also significantly less likely to hallucinate -- to confidently provide plausible but inaccurate answers -- than the company's previous models, according to test results published by OpenAI. It is also harder to "jailbreak," or prompt into jumping the safety guardrails the company has tried to get the model to adhere to when providing responses.

In tests users have conducted in the hours since o1-preview became widely available, the model does seem able to correctly answer many questions that befuddled previous models, including OpenAI's most powerful ones, such as GPT-4 and GPT-4o. But o1-preview is still tripped up by some riddles, and in OpenAI's own assessments it sometimes failed at seemingly simple tasks, such as tic-tac-toe (although in my own experiments, o1-preview was much improved over GPT-4o in its tic-tac-toe skills). This may indicate significant limits to the "reasoning" o1 exhibits. And when it came to language tasks, such as writing and editing, the human evaluators OpenAI employed tended to find that GPT-4o produced preferable responses to the o1 models.

The o1 model also takes significantly longer to produce its responses than GPT-4o. In tests OpenAI published, its o1-preview model could take more than 30 seconds to answer a question that GPT-4o answered in three. The o1 models are also not yet fully integrated into ChatGPT: a user needs to decide whether they want their prompt handled by o1-preview or by GPT-4o, and the model itself cannot decide whether the question requires the slower, step-by-step reasoning process o1 affords or whether GPT-4, or even GPT-3, will suffice. In addition, the o1 model only works on text and, unlike other AI models, cannot handle image, audio, or video inputs and outputs.

OpenAI has made its o1-preview and o1-mini models available to all subscribers to its premium ChatGPT Plus and ChatGPT Teams products, as well as to the top tier of developers who use its enterprise-focused application programming interface (API). Here are 9 things to know about the o1 models:

1. This is not AGI.
The stated mission of OpenAI, Google DeepMind, more recently Meta, and a few other AI startups, such as Anthropic, is the achievement of artificial general intelligence. That is usually defined as a single AI system that can perform cognitive tasks as well as or better than humans. While o1-preview is much more capable at reasoning tasks, its limitations and failures still show that the system is far from the kind of intelligence humans exhibit.

2. o1 puts pressure on Google, Meta, and others to respond, but is unlikely to significantly alter the competitive landscape.

At a time when foundation model capabilities had been looking increasingly commoditized, o1 gives OpenAI a temporary advantage over its rivals. But this is likely to be very short-lived. Google has publicly stated it's working on models that, like o1, offer advanced reasoning and planning capabilities. Its Google DeepMind research unit has some of the world's top experts in reinforcement learning, one of the methods that we know has been used to train o1. It's likely that o1 will compel Google to accelerate its timelines for releasing these models. Meta and Anthropic also have the expertise and resources to quickly create models that match o1's capabilities, and they will likely roll these out in the coming months too.

3. We don't know exactly how o1 works.

While OpenAI has published a lot of information about o1's performance, it has said relatively little about exactly how o1 works or what it was trained on. We know that the model combines several different AI techniques. We know that it uses a large language model that performs "chain of thought" reasoning, where the model must work out an answer through a series of sequential steps. We also know that the model uses reinforcement learning, where an AI system discovers successful strategies for performing a task through a process of trial and error. Some of the errors both OpenAI and users have documented so far with o1-preview are telling: they would seem to indicate that what the model does is search through several different "chain of thought" pathways that an LLM generates and then pick the one that seems most likely to be judged correct by the user. The model also seems to perform some steps in which it may check its own answers to reduce hallucinations and to enforce AI safety guardrails. But we don't really know. We also don't know what data OpenAI used to train o1.

4. Using o1-preview won't be cheap.

While ChatGPT Plus users are currently getting access to o1-preview at no additional cost beyond their $20 monthly subscription fee, their usage is capped at a certain number of queries per day. Corporate customers typically pay to use OpenAI's models based on the number of tokens -- which are words or parts of words -- that a large language model uses in generating an answer. For o1-preview, OpenAI has said it is charging these customers $15 per 1 million input tokens and $60 per 1 million output tokens. That compares to $5 per 1 million input tokens and $15 per 1 million output tokens for GPT-4o, OpenAI's most powerful general LLM. What's more, the chain-of-thought reasoning o1 engages in requires the LLM portion of the model to generate many more tokens than a straightforward LLM answer. That means o1 may be even more expensive to use than those headline comparisons to GPT-4o imply.
In reality, companies will likely be reluctant to use o1 except in rare circumstances when the model's additional reasoning abilities are essential and the use case can justify the added expense.

5. Customers may balk at OpenAI's decision to hide o1's "chain of thought."

While OpenAI said that o1's chain-of-thought reasoning allows its own engineers to better assess the quality of the model's answers and potentially debug the model, it has decided not to let users see the chain of thought. It has done so for what it says are both safety and competitive reasons. Revealing the chain of thought might help people figure out ways to better jailbreak the model. But more importantly, letting users see the chain of thought would allow competitors to potentially use that data to train their own AI models to mimic o1's responses. Hiding the chain of thought, however, might present issues for OpenAI's enterprise customers, who could be in the position of having to pay for tokens without a way to verify that OpenAI is billing them accurately. Customers might also object to the inability to use the chain-of-thought outputs to refine their prompting strategies to be more efficient, improve results, or avoid errors.

6. OpenAI says its o1 shows new "scaling laws" that apply to inference, not just training.

AI researchers have been discussing OpenAI's publication, alongside o1, of a new set of "scaling laws" that seem to show a direct correlation between the amount of time o1 is allowed to spend "thinking" about a question -- searching possible answers and logic strategies -- and its overall accuracy. The longer o1 had to produce an answer, the more accurate its answers became. Before, the paradigm was that model size, in terms of the number of parameters, and the amount of data a model was fed during training essentially determined performance. More parameters equaled better performance, or similar performance could be achieved with a smaller model trained for longer on more data. But once trained, the idea was to run inference -- when a trained model produces an answer to a specific input -- as quickly as possible. The new o1 "scaling laws" upend this logic, indicating that with models designed like o1, there is an advantage to applying additional computing resources at inference time too. The more time the model is given to search for the best possible answer, the more likely it is to come up with accurate results. This has implications for how much computing power companies will need to secure if they want to take advantage of the reasoning abilities of models like o1, and for how much it will cost, in both energy and money, to run these models. It points to the need to run models for longer, potentially using much more inference compute, than before.

7. o1 could help create powerful AI agents -- but carries some risks.

In a video, OpenAI spotlighted its work with AI startup Cognition, which got early access to o1 and used it to help augment the capabilities of its coding assistant Devin. In the example in the video, Cognition CEO Scott Wu asked Devin to create a system to analyze the sentiment of posts on social media using some off-the-shelf machine learning tools. When it couldn't read the posts correctly from a web browser, Devin, using o1's reasoning abilities, found a workaround by accessing the content directly from the social media company's API. This was a great example of autonomous problem-solving. But it is also a little bit scary.
Devin didn't come back and ask the user if it was okay to solve the problem in this way. It just did it. In its safety report on o1, OpenAI itself said it found instances where the model engaged in "reward hacking" -- which is essentially when a model cheats, finding a way to achieve a goal that is not what the user intended. In one cybersecurity exercise, o1 failed in its initial efforts to gain network information from a particular target -- which was the point of the exercise -- but found a way to get the same information from elsewhere on the network. This would seem to indicate that o1 could power a class of very capable AI agents, but that companies will need to figure out how to ensure those agents don't take unintended actions in the pursuit of goals that could pose ethical, legal, or financial risks.

8. OpenAI says o1 is safer in many ways, but presents a "medium risk" of assisting a biological attack.

OpenAI published the results of numerous tests that indicate that in many ways o1 is a safer model than its earlier GPT models. It's harder to jailbreak and less likely to produce toxic, biased, or discriminatory answers. Interestingly, despite improved coding abilities, OpenAI said that in its evaluations neither o1 nor o1-mini presented a significantly enhanced risk of helping someone carry out a sophisticated cyberattack compared to GPT-4. But AI safety and national security experts were buzzing last night about several aspects of OpenAI's safety evaluations. The one that created the most alarm was OpenAI's decision to classify its own model as presenting a "medium risk" of aiding a person in taking the steps needed to carry out a biological attack. OpenAI has said it will only release models that it classifies as presenting a "medium risk" or less, so many researchers are scrutinizing the information OpenAI has published about its process for making this determination to see whether it seems reasonable or whether OpenAI graded itself too leniently in order to be able to still release the model.

9. AI safety experts are worried about o1 for other reasons too.

OpenAI also graded o1 as presenting a "medium risk" on a category of dangers the company calls "persuasion," which judges how easily the model can convince people to change their views or take actions recommended by the model. This persuasive power could be dangerous in the wrong hands. It would also be dangerous if some future powerful AI model developed intentions of its own and could then persuade people to carry out tasks and actions on its behalf. At least that danger doesn't seem too imminent: in safety evaluations by both OpenAI and external "red teaming" organizations it hired to evaluate o1, the model did not show any indication of consciousness, sentience, or self-volition. (The evaluations did, however, find that o1 gave answers that seemed to imply greater self-awareness and self-knowledge compared to GPT-4.)

AI safety experts pointed to a few other areas of concern as well. Red-teaming tests carried out by Apollo Research, a firm that specializes in conducting safety evaluations of advanced AI models, found evidence of what is called "deceptive alignment," where an AI model realizes that in order to be deployed and carry out some secret long-term goal, it should lie to the user about its true intentions and capabilities. AI safety researchers consider this particularly dangerous since it makes it much more difficult to evaluate a model's safety based solely on its responses.
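Points 3 and 6 above describe, speculatively, a model that samples several chain-of-thought pathways and picks the most promising one, with accuracy improving as more inference-time compute is spent. OpenAI has not said that this is how o1 works, so the sketch below is only a toy illustration of that general idea using self-consistency voting over a noisy stand-in "reasoner"; every function and number in it is invented for demonstration.

```python
import collections
import random

def sample_chain_of_thought(question: str) -> str:
    """Stand-in for one sampled 'reasoning pathway'. It is a noisy oracle that
    returns the correct answer only 40% of the time, so we can watch accuracy
    change with the amount of inference-time compute spent per question."""
    return "42" if random.random() < 0.4 else random.choice(["41", "43", "44"])

def answer_with_test_time_compute(question: str, n_samples: int) -> str:
    """Self-consistency: sample several pathways and return the most common
    final answer. More samples means more compute spent at inference time."""
    votes = collections.Counter(sample_chain_of_thought(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

random.seed(0)
for n in (1, 5, 25, 125):
    correct = sum(answer_with_test_time_compute("q", n) == "42" for _ in range(200))
    print(f"{n:>3} samples per question -> {correct / 200:.0%} accuracy")
```

In this toy setup accuracy climbs steadily with the number of samples, which is the qualitative shape of the inference-time scaling behavior the article describes.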
[10]
First impressions of ChatGPT o1: An AI designed to overthink it | TechCrunch
OpenAI released its new o1 models on Thursday, giving ChatGPT users their first chance to try AI models that pause to "think" before they answer. There's been a lot of hype building up to these models, codenamed "strawberry" inside OpenAI. But does strawberry live up to the hype? Sort of.

Compared to GPT-4o, the o1 models feel like one step forward and two steps back. ChatGPT o1 excels at reasoning and answering complex questions, but the model is roughly four times more expensive to use than GPT-4o. OpenAI's latest model also lacks the tools, multimodal capabilities, and speed that made GPT-4o so impressive. In fact, OpenAI even admits on its help page that "GPT-4o is still the best option for most prompts," and notes elsewhere that ChatGPT o1 struggles at simpler tasks. "It's impressive, but I think the improvement is not very significant," said Ravid Shwartz Ziv, an NYU professor who studies AI models. "It's better at certain problems, but you don't have this across-the-board improvement."

For all of these reasons, it's important to use ChatGPT o1 only for the questions it's truly designed to help with: big ones. To be clear, most people are not using generative AI to answer these kinds of questions today, largely because today's AI models are not very good at it. However, o1 is a tentative step in that direction.

ChatGPT o1 is unique because it "thinks" before answering, breaking big problems down into small steps and attempting to identify when it gets one of those steps right or wrong. This "multi-step reasoning" isn't entirely new (researchers have proposed it for years, and You.com uses it for complex queries), but it hasn't been practical until recently. "There's a lot of excitement in the AI community," said Workera CEO and Stanford professor Kian Katanforoosh, who teaches classes on machine learning, in an interview. "If you can train a reinforcement learning algorithm paired with some of the language model techniques that OpenAI has, you can technically create step-by-step thinking, and allow the AI model to walk backwards from big ideas you're trying to work through."

ChatGPT o1 is also uniquely pricey. In most models, you pay for input tokens and output tokens. However, ChatGPT o1 adds a hidden process (the small steps the model breaks big problems into), which adds a large amount of compute you never fully see. OpenAI is hiding some details of this process to maintain its competitive advantage, but you still get charged for it in the form of "reasoning tokens." This further emphasizes why you need to be careful about using ChatGPT o1, so you don't get charged a ton of tokens for asking where the capital of Nevada is.

The idea of an AI model that helps you "walk backwards from big ideas" is powerful, though, and in practice the model is pretty good at that. In one example, I asked ChatGPT o1-preview to help my family plan Thanksgiving, a task that could benefit from a little unbiased logic and reasoning. Specifically, I wanted help figuring out whether two ovens would be sufficient to cook a Thanksgiving dinner for 11 people, and wanted to talk through whether we should consider renting an Airbnb to get access to a third oven. After 12 seconds of "thinking," ChatGPT wrote me a 750+ word response ultimately telling me that two ovens should be sufficient with some careful strategizing, which would allow my family to save on costs and spend more time together.
But it broke down its thinking for me at each step of the way, and explained how it considered all of these external factors, including costs, family time and oven management. ChatGPT o1 told me how to prioritize oven space at house that is hosting the event, which was smart. Oddly, it suggested I consider renting a portable oven for the day. That said, the model performed much better than GPT-4o, which required multiple follow up questions about what exact dishes I was bringing, and then gave me bare bones advice I found less useful. Asking about Thanksgiving dinner may seem silly, but you could see how this tool would be helpful for breaking down complicated tasks. I also asked ChatGPT o1 to help me plan out a busy day at work, where I needed to travel between the airport, multiple in-person meetings in various locations, and my office. It gave me a very detailed plan, but maybe was a little bit much. Sometimes, all the added steps can be a little overwhelming. For a simpler question, ChatGPT o1 does way too much - it doesn't know when to stop overthinking. I asked where you can find cedar trees in America, and it delivered an 800+ word response, outlining every variation of cedar tree in the country, including their scientific name. It even had to consult with OpenAI's policies at some point, for some reason. GPT-4o did a much better job answering this question, delivering me about three sentences explaining you can find the trees all over the country. In some ways, Strawberry was never going to live up to the hype. Reports about OpenAI's reasoning models date back to November 2023, right around the time everyone was looking for an answer about why OpenAI's board ousted Sam Altman. That spun up the rumor mill in the AI world, leaving some to speculate that Strawberry was a form of AGI, the enlightened version of AI that OpenAI aspires to ultimately create. Altman confirmed o1 is not AGI to clear up any doubts, not that you'd be confused after using the thing. The CEO also trimmed expectations around this launch, tweeting that "o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it." The rest of the AI world is coming to terms with a less exciting launch than expected. "The hype sort of grew out of OpenAI's control," said Rohan Pandey a research engineer with the AI startup ReWorkd, which builds web scrapers with OpenAI's models. He's hoping that o1's reasoning ability is good enough to solve a niche set of complicated problems where GPT-4 falls short. That's likely how most people in the industry are viewing ChatGPT o1, but not quite as the revolutionary step forward that GPT-4 represented for the industry. "Everybody is waiting for a step function change for capabilities, and it is unclear that this represents that. I think it's that simple," said Brightwave CEO Mike Conover, who previously co-created Databricks' AI model Dolly, in an interview. The underlying principles used to create o1 go back years. Google used similar techniques in 2016 to create AlphaGo, the first AI system to defeat a world champion of the board game, former Googler and CEO of the venture firm S32, Andy Harrison, points out. AlphaGo trained by playing against itself countless times, essentially self-teaching until it reached superhuman capability. He notes that this brings up an age old debate in the AI world. "Camp one thinks that you can automate workflows through this agentic process. 
Camp two thinks that if, if you had generalized intelligence and reasoning, you wouldn't need the workflow and, like a human, the AI would just make a judgment," said Harrison in an interview. Harrison says he's in camp one, and that camp two requires you to trust AI to make the right decision. He doesn't think we're there yet. However, others think of o1 as less of a decision maker, and more of a tool to question your thinking on big decisions. Katanforoosh, the Workera CEO, described an example where he was going to interview a data scientist to work at his company. He tells ChatGPT o1 that he only has 30 minutes, and wants to asses a certain number of skills. He can work backwards with the AI model to understand if he's thinking about this correctly, and ChatGPT o1 will understand time constraints and whatnot. The question is whether this helpful tool is worth the hefty price tag. As AI models continue to get cheaper, o1 is one of the first AI models in a long time that we've seen get more expensive.
[11]
First impressions of OpenAI o1: An AI designed to overthink it | TechCrunch
OpenAI released its new o1 models on Thursday, giving ChatGPT users their first chance to try AI models that pause to "think" before they answer. There's been a lot of hype building up to these models, codenamed "Strawberry" inside OpenAI. But does Strawberry live up to the hype? Sort of. Compared to GPT-4o, the o1 models feel like one step forward and two steps back. OpenAI o1 excels at reasoning and answering complex questions, but the model is roughly four times more expensive to use than GPT-4o. OpenAI's latest model lacks the tools, multimodal capabilities, and speed that made GPT-4o so impressive. In fact, OpenAI even admits that "GPT-4o is still the best option for most prompts" on its help page, and notes elsewhere that o1 struggles at simpler tasks. "It's impressive, but I think the improvement is not very significant," said Ravid Shwartz Ziv, an NYU professor who studies AI models. "It's better at certain problems, but you don't have this across-the-board improvement." For all of these reasons, it's important to use o1 only for the questions it's truly designed to help with: big ones. To be clear, most people are not using generative AI to answer these kinds of questions today, largely because today's AI models are not very good at it. However, o1 is a tentative step in that direction. OpenAI o1 is unique because it "thinks" before answering, breaking down big problems into small steps and attempting to identify when it gets one of those steps right or wrong. This "multi-step reasoning" isn't entirely new (researchers have proposed it for years, and You.com uses it for complex queries), but it hasn't been practical until recently. "There's a lot of excitement in the AI community," said Workera CEO and Stanford adjunct lecturer Kian Katanforoosh, who teaches classes on machine learning, in an interview. "If you can train a reinforcement learning algorithm paired with some of the language model techniques that OpenAI has, you can technically create step-by-step thinking and allow the AI model to walk backwards from big ideas you're trying to work through." OpenAI o1 is also uniquely pricey. In most models, you pay for input tokens and output tokens. However, o1 adds a hidden process (the small steps the model breaks big problems into), which adds a large amount of compute you never fully see. OpenAI is hiding some details of this process to maintain its competitive advantage. That said, you still get charged for these in the form of "reasoning tokens." This further emphasizes why you need to be careful about using OpenAI o1, so you don't get charged a ton of tokens for asking where the capital of Nevada is. The idea of an AI model that helps you "walk backwards from big ideas" is powerful, though. In practice, the model is pretty good at that. In one example, I asked ChatGPT o1 preview to help my family plan Thanksgiving, a task that could benefit from a little unbiased logic and reasoning. Specifically, I wanted help figuring out if two ovens would be sufficient to cook a Thanksgiving dinner for 11 people and wanted to talk through whether we should consider renting an Airbnb to get access to a third oven. After 12 seconds of "thinking," ChatGPT wrote me out a 750+ word response ultimately telling me that two ovens should be sufficient with some careful strategizing, and will allow my family to save on costs and spend more time together. 
But it broke down its thinking for me at each step of the way and explained how it considered all of these external factors, including costs, family time, and oven management. ChatGPT o1 preview told me how to prioritize oven space at the house that is hosting the event, which was smart. Oddly, it suggested I consider renting a portable oven for the day. That said, the model performed much better than GPT-4o, which required multiple follow-up questions about what exact dishes I was bringing, and then gave me bare-bones advice I found less useful. Asking about Thanksgiving dinner may seem silly, but you could see how this tool would be helpful for breaking down complicated tasks. I also asked o1 to help me plan out a busy day at work, where I needed to travel between the airport, multiple in-person meetings in various locations, and my office. It gave me a very detailed plan, but maybe was a little bit much. Sometimes, all the added steps can be a little overwhelming. For a simpler question, o1 does way too much -- it doesn't know when to stop overthinking. I asked where you can find cedar trees in America, and it delivered an 800+ word response, outlining every variation of cedar tree in the country, including their scientific name. It even had to consult with OpenAI's policies at some point, for some reason. GPT-4o did a much better job answering this question, delivering me about three sentences explaining you can find the trees all over the country. In some ways, Strawberry was never going to live up to the hype. Reports about OpenAI's reasoning models date back to November 2023, right around the time everyone was looking for an answer about why OpenAI's board ousted Sam Altman. That spun up the rumor mill in the AI world, leaving some to speculate that Strawberry was a form of AGI, the enlightened version of AI that OpenAI aspires to ultimately create. Altman confirmed o1 is not AGI to clear up any doubts, not that you'd be confused after using the thing. The CEO also trimmed expectations around this launch, tweeting that "o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it." The rest of the AI world is coming to terms with a less exciting launch than expected. "The hype sort of grew out of OpenAI's control," said Rohan Pandey, a research engineer with the AI startup ReWorkd, which builds web scrapers with OpenAI's models. He's hoping that o1's reasoning ability is good enough to solve a niche set of complicated problems where GPT-4 falls short. That's likely how most people in the industry are viewing o1, but not quite as the revolutionary step forward that GPT-4 represented for the industry. "Everybody is waiting for a step function change for capabilities, and it is unclear that this represents that. I think it's that simple," said Brightwave CEO Mike Conover, who previously co-created Databricks' AI model Dolly, in an interview. The underlying principles used to create o1 go back years. Google used similar techniques in 2016 to create AlphaGo, the first AI system to defeat a world champion of the board game Go, former Googler and CEO of the venture firm S32, Andy Harrison, points out. AlphaGo trained by playing against itself countless times, essentially self-teaching until it reached superhuman capability. He notes that this brings up an age-old debate in the AI world. "Camp one thinks that you can automate workflows through this agentic process. 
Camp two thinks that if you had generalized intelligence and reasoning, you wouldn't need the workflow and, like a human, the AI would just make a judgment," said Harrison in an interview. Harrison says he's in camp one and that camp two requires you to trust AI to make the right decision. He doesn't think we're there yet. However, others think of o1 as less of a decision-maker and more of a tool to question your thinking on big decisions. Katanforoosh, the Workera CEO, described an example where he was going to interview a data scientist to work at his company. He tells OpenAI o1 that he only has 30 minutes and wants to assess a certain number of skills. He can work backward with the AI model to understand if he's thinking about this correctly, and o1 will understand time constraints and whatnot. The question is whether this helpful tool is worth the hefty price tag. As AI models continue to get cheaper, o1 is one of the first AI models in a long time that we've seen get more expensive.
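To put that pricing point in rough numbers, here is a minimal back-of-the-envelope sketch in Python. The per-million-token prices below are assumptions (roughly the API list prices quoted around launch; check OpenAI's pricing page for current figures), and the token counts are made up. The takeaway is only that o1's hidden reasoning tokens are billed like output tokens, so even a short visible answer can cost several times what GPT-4o would charge for a similar exchange.

```python
# Back-of-the-envelope cost comparison. Prices are ASSUMED (roughly the
# per-million-token list prices quoted around launch); token counts are
# invented. Reasoning tokens never appear in the visible answer but are
# billed like output tokens.

PRICES_PER_MILLION_USD = {
    "gpt-4o":     {"input": 5.0,  "output": 15.0},
    "o1-preview": {"input": 15.0, "output": 60.0},
}

def request_cost(model, input_tokens, visible_output_tokens, reasoning_tokens=0):
    """Cost of one request, treating hidden reasoning tokens as output tokens."""
    p = PRICES_PER_MILLION_USD[model]
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * p["input"] + billed_output * p["output"]) / 1_000_000

if __name__ == "__main__":
    # Same prompt, similar-length visible answer; o1 also "thinks" for a few
    # thousand hidden reasoning tokens before it replies.
    print(f"GPT-4o:     ${request_cost('gpt-4o', 500, 800):.4f}")
    print(f"o1-preview: ${request_cost('o1-preview', 500, 800, reasoning_tokens=4000):.4f}")
```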
OpenAI introduces the o1 model, showcasing remarkable problem-solving abilities in mathematics and coding. This advancement signals a significant step towards more capable and versatile artificial intelligence systems.
OpenAI has recently introduced its latest artificial intelligence model, dubbed o1, marking a significant advancement in AI capabilities. This new model demonstrates exceptional problem-solving skills across various domains, particularly in mathematics and coding [1].
One of the most striking features of the o1 model is its ability to solve complex mathematical problems. Reports indicate that o1 can solve 83% of problems on a qualifying exam for the International Mathematics Olympiad (IMO), a feat that showcases its advanced reasoning capabilities [5]. This result is particularly noteworthy given the high difficulty level of these questions, which are designed to challenge the brightest young mathematicians globally.
Beyond mathematics, the o1 model exhibits remarkable proficiency in coding and general problem-solving tasks. It has demonstrated the ability to write complex code and solve intricate programming challenges with a high degree of accuracy [2]. This versatility makes o1 a potentially valuable tool for developers and researchers across various fields of computer science and software engineering.
The o1 model represents a significant improvement over its predecessors, including GPT-4. While specific comparisons are not fully detailed, the ability to handle olympiad-qualifier-level mathematics problems suggests a substantial leap in reasoning and problem-solving capabilities [3].
Currently, access to the o1 model is limited, with OpenAI providing early access to select individuals and organizations for testing and feedback [4]. This controlled release allows for thorough evaluation and refinement of the model before a potential wider release.
The development of o1 has significant implications for the future of AI. Its advanced problem-solving abilities could potentially revolutionize fields such as scientific research, engineering, and education. However, it also raises questions about the impact of such powerful AI systems on human roles in these domains.
As with any major advancement in AI, the introduction of the o1 model brings forth important ethical considerations. Issues such as AI safety, potential misuse, and the need for responsible development and deployment of such powerful systems are likely to be at the forefront of discussions surrounding this technology.
The o1 model represents a significant milestone in the evolution of artificial intelligence, showcasing capabilities that blur the line between human and machine problem-solving. As research and development in this area continue, it will be crucial to balance the potential benefits with careful consideration of the broader implications for society and technology.