Curated by THEOUTPOST
On Fri, 13 Sept, 12:05 AM UTC
11 Sources
[1]
OpenAI's New "Strawberry" AI Is Still Making Idiotic Mistakes
The Sam Altman-led company made some big promises in its announcement, claiming that its "o1-preview" AI model "performs similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology." With its new "human-like" ability to "reason," the AI model can tackle even more "complex tasks" and "harder problems," according to the company. But as early testers have already discovered firsthand, it's still miles away from replacing a human scientist or coder. In fact, if recent posts making their rounds on social media are anything to go by, the o1-preview is still often struggling with the absolute basics. For instance, INSA Rennes researcher Mathieu Acher found, it's still repeatedly suggesting illegal chess moves in response to certain puzzles. Tasks as basic as counting also remain elusive. In one example flagged by Meta AI scientist Colin Fraser, Strawberry attempts to take on a rudimentary word puzzle about a farmer transporting sheep across a river -- and accidentally abandons the correct answer in favor of illogical garble at the end. Even entering a prompt OpenAI used in its demo -- a logic puzzle fittingly involving a strawberry -- gave users varying answers. "o1-preview gives the wrong answer to this prompt 75 percent of the time," one user found. In fact, some users are claiming, the model is even still sometimes struggling with one of the most confounding word problems for AI language models: how many times the letter "R" appears in the word "strawberry." In all fairness, OpenAI was clear right from the start that its latest AI is still a work in progress. "As an early model, it doesn't yet have many of the features that make ChatGPT useful, like browsing the web for information and uploading files and images," the company wrote in its announcement. "For many common cases GPT-4o will be more capable in the near term."
Thanks to a new "chain of thought" process, o1-preview differs significantly from its predecessors like GPT-4o, which powers the company's popular ChatGPT chatbot. Instead of spitting out the first answer it can produce, it takes its time to build out iterative answers before arriving at a conclusion. That can extend its response time significantly. As one user found, the new AI model took 92 seconds to come up with an answer to a word riddle -- before bungling the answer. OpenAI research scientist Noam Brown, who worked on the new model, argued that having it take its time could result in some groundbreaking answers. "OpenAI's o1 thinks for seconds, but we aim for future versions to think for hours, days, even weeks," he tweeted. "Inference costs will be higher, but what cost would you pay for a new cancer drug? For breakthrough batteries? For a proof of the Riemann Hypothesis?" Those lofty conclusions didn't sit well with noted AI critic Gary Marcus. "I really like a lot of your work, but the tweet below rubs me the wrong way," he wrote in response, "because it invites the inference that running versions of o1 for weeks or months might create a new cancer drug (in reality, at best you just get new candidates, but still need to do the clinical work), or create breakthrough batteries (again you aren't going to shortcut the lab work) or prove the Riemann Hypothesis." "This is not realistic," he added. "As you acknowledge o1 is still unreliable even at tic-tac-toe, and in some cases no better than earlier models. Longer processing times are unlikely to reach transcendent reasoning." (To be fair, Brown also conceded that the new model is still flubbing certain answers, including ones as fundamental as tic-tac-toe.) Marcus is tapping into a heated debate surrounding the tremendous hype gripping the AI industry.
As companies continue to lock down billions in funding -- OpenAI is looking to raise a whopping $6.5 billion from investors, boosting its already sky-high valuation to $150 billion -- skeptics and tech investors alike are growing uneasy about the amount of money being poured into the tech, never mind its environmental impact. In short, the company's latest AI still falling for the same old traps isn't exactly confidence-inducing. OpenAI promised that it's only the beginning, though, symbolically naming its model to reset the "counter back to 1" -- which, given it's stumbling right out of the gate, might end up being an appropriate name after all.
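The letter-counting puzzle mentioned in this article has a deterministic answer that is easy to check outside a language model. A minimal Python sketch follows; the helper name count_letter is an illustrative assumption and has nothing to do with how o1 works internally.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    if len(letter) != 1:
        raise ValueError("letter must be a single character")
    return word.lower().count(letter.lower())


if __name__ == "__main__":
    # The puzzle from the article: how many times "R" appears in "strawberry".
    print(count_letter("strawberry", "R"))  # 3
```

A plain string operation like this is the ground truth that testers compare the model's answer against.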
[2]
OpenAI's 'Strawberry' Model Sparks Fresh Discussions on AI Capabilities | PYMNTS.com
OpenAI's new "Strawberry" AI model is winning praise from industry observers, who commend its reasoning capabilities but note its limitations. The company unveiled its latest AI model, dubbed "OpenAI o1" and nicknamed "Strawberry," on Thursday (Sept. 12). The o1 model family, available in o1-preview and o1-mini versions, aims to advance artificial intelligence (AI) problem-solving and reasoning. Scott Dylan, founder of NexaTech Ventures, a venture capital firm focused on AI, called the new model "an exciting leap forward in AI development." He told PYMNTS that "the model's ability to handle complex problems in fields like science, coding, and mathematics by spending more time thinking before responding sets it apart." According to OpenAI, o1-preview ranked in the 89th percentile on competitive programming questions from Codeforces. In mathematics, it scored 83% on an International Mathematics Olympiad qualifying exam, compared to GPT-4o's 13%. Some early users reported mixed experiences. They said o1 doesn't consistently outperform GPT-4o across all metrics. Others criticized slower response times, which OpenAI attributes to more complex processing. OpenAI Product Manager Joanne Jang addressed concerns on social media. "There's a lot of o1 hype on my feed, so I'm worried that it might be setting the wrong expectations," she wrote on X. Jang described o1 as "the first reasoning model that shines in really hard tasks" but cautioned it isn't a "miracle model that does everything better than previous models." One area of interest is whether the model is a step toward artificial general intelligence (AGI), which refers to highly autonomous systems that outperform humans at most economically valuable work. Unlike narrow AI systems designed for specific tasks, AGI would possess human-like general intelligence and adaptability across various domains. "While it's not quite AGI, it's a strong step in that direction," Dylan said.
Steve Wilson, CPO at the AI security company Exabeam, told PYMNTS he was impressed by o1's ability to explain its reasoning. "The biggest takeaway from OpenAI's o1 is its ability to explain its reasoning. The new o1 model uses step-by-step reasoning rather than relying solely on 'next token' logic," he said. Wilson provided an example: "I posed a riddle to o1, asking it 'What has 18 legs and catches flies?' It responded: a baseball team. A baseball team has nine players on the field (totaling 18 legs), and they catch 'flies' -- which are fly balls hit by the opposing team." He noted a new feature that shows users how o1 arrives at its conclusions. "This feels like a huge step forward! The concept of explainability has always been a huge topic and a major challenge for applications based on machine learning," Wilson added. Dylan sees significant potential in specific sectors: "Industries such as healthcare, legal tech and scientific research will see the greatest benefits." He elaborated, "In healthcare, the model can help interpret complex genomics or protein data with far greater accuracy; in legal tech, its ability to analyze nuanced legal language could lead to more thorough contract reviews." The slower processing may challenge industries like customer service or real-time data analysis, where speed is essential, Dylan noted. "For tasks requiring precision, like medical diagnostics or complex legal cases, this model could be a game-changer," he said. Wilson underscored the significance of o1's explainability feature. Explainability in AI refers to the ability of a system to provide clear, understandable reasons for its outputs or decisions. This feature lets users see how the AI model arrives at its conclusions, making the decision-making process more transparent. 
"What's exciting about my initial testing isn't so much that it's going to 'score better on benchmarks' but that it offers a level of 'explainability' that has never been present in production AI/LLM models," he said. Looking ahead, Wilson predicted, "When you start to combine these reasoning models with multi-modal vision models and voice interaction, we're in for a radical shift in the next 12 months." OpenAI credits o1's advancements to a novel reinforcement learning approach. This method teaches the model to spend more time analyzing problems before responding, similar to human reasoning processes. Researchers and developers are now testing o1 to determine its capabilities and limitations. The release has reignited discussions about AI reasoning technologies' current state and future. "The o1 model isn't just an upgrade; it's a shift toward more careful, calculated reasoning in AI, which will likely reshape how we solve real-world problems," Dylan said.
[3]
OpenAI to launch models with 'reasoning' abilities that are 'much like a person'
'Strawberry' models can break down complex problems into smaller logical steps, an area where other AIs stumble.
OpenAI said on Thursday it was launching its "Strawberry" series of AI models designed to spend more time processing answers to queries in order to solve hard problems. The models are capable of reasoning through complex tasks and can solve more challenging problems than previous models in science, coding and math, the AI firm said in a blog post. OpenAI used the code name Strawberry to refer to the project internally, while it dubbed the models announced on Thursday o1 and o1-mini. The o1 will be available in ChatGPT and its API starting Thursday, the company said. ChatGPT has struggled to recognize that the word "strawberry" contains three instances of the letter R. Noam Brown, a researcher at OpenAI focused on improving reasoning in the company's models, confirmed in a post on X that the models were the same as the Strawberry project. "I'm excited to share with you all the fruit of our effort at OpenAI to create AI models capable of truly general reasoning," Brown wrote. In its blog post, OpenAI said the o1 model scored 83% on the qualifying exam for the International Mathematics Olympiad, compared with 13% for its previous model, GPT-4o. The model also improved performance on competitive programming questions and exceeded human PhD-level accuracy on a benchmark of science problems, the company said. Brown said the models were able to accomplish the scores by incorporating a technique known as "chain-of-thought" reasoning, which involves breaking down complex problems into smaller logical steps. Researchers have noted that AI model performance on complex problems tends to improve when the approach has been used as a prompting technique. OpenAI has now automated this capability so the models can break down problems on their own, without user prompting, the company claimed in its blog post.
"We trained these models to spend more time thinking through problems before they respond, much like a person would. Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes," OpenAI said.
[4]
OpenAI launches new 'Strawberry' series of AI models
Microsoft-backed OpenAI said on Thursday it was launching its "Strawberry" series of AI models designed to spend more time processing answers to queries in order to solve hard problems. The models, first reported by Reuters, are capable of reasoning through complex tasks and can solve more challenging problems than previous models in science, coding and math, the AI firm said in a blog post. OpenAI used the code name Strawberry to refer to the project internally, while it dubbed the models announced on Thursday o1 and o1-mini. The o1 will be available in ChatGPT and its API starting Thursday, the company said. Noam Brown, a researcher at OpenAI focused on improving reasoning in the company's models, confirmed in a post on social media platform X that the models were the same as the Strawberry project. "I'm excited to share with you all the fruit of our effort at OpenAI to create AI models capable of truly general reasoning," Brown wrote. In its blog post, OpenAI said the o1 model scored 83% on the qualifying exam for the International Mathematics Olympiad, compared with 13% for its previous model, GPT-4o. The model also improved performance on competitive programming questions and exceeded human PhD-level accuracy on a benchmark of science problems, the company said. Brown said the models were able to accomplish the scores by incorporating a technique known as "chain-of-thought" reasoning, which involves breaking down complex problems into smaller logical steps. Researchers have noted that AI model performance on complex problems tends to improve when the approach has been used as a prompting technique. OpenAI has now automated this capability so the models can break down problems on their own, without user prompting. "We trained these models to spend more time thinking through problems before they respond, much like a person would.
Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes," OpenAI said. Reuters was the first to report OpenAI's work on the reasoning project, then called Q*, in November 2023. It reported in July that the project had come to be known as Strawberry. Published - September 13, 2024 08:11 am IST
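The article notes that chain-of-thought began as a prompting technique that users applied by hand before OpenAI built it into the model. A minimal Python sketch of that manual form is below. The prompt wording and the names build_cot_prompt and extract_answer are assumptions for illustration; they do not come from OpenAI, and no model API is called.

```python
from typing import Optional


def build_cot_prompt(question: str) -> str:
    """Wrap a question in a step-by-step instruction (hypothetical wording)."""
    return (
        f"Question: {question}\n"
        "Work through the problem in small, numbered steps, "
        "then give the final answer on a line starting with 'Answer:'."
    )


def extract_answer(response: str) -> Optional[str]:
    """Return the text after the last 'Answer:' line, or None if absent."""
    for line in reversed(response.splitlines()):
        stripped = line.strip()
        if stripped.startswith("Answer:"):
            return stripped[len("Answer:"):].strip()
    return None


if __name__ == "__main__":
    print(build_cot_prompt("How many times does R appear in 'strawberry'?"))
    # A hand-written stand-in for a model reply, used only to show parsing.
    fake_reply = "1. s-t-r-a-w-b-e-r-r-y\n2. R appears at three positions\nAnswer: 3"
    print(extract_answer(fake_reply))
```

With o1, according to the company, the user no longer needs to add the step-by-step instruction; the sketch only shows what the older manual approach looked like.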
[5]
OpenAI Launching "Strawberry" Model With "Human-Like Reasoning" as Soon as This Week
ChatGPT maker OpenAI is rumored to be imminently releasing a brand-new AI model, internally dubbed "Strawberry," that has a "human-like" ability to reason. As Bloomberg reports, a person familiar with the project says it could be released as soon as this week. We've seen rumors surrounding an OpenAI model capable of reasoning swirl for many months now. In November, Reuters and The Information reported that the company was working on a shadowy project called Q* -- pronounced Q-Star -- which was alleged to represent a breakthrough in OpenAI's efforts to realize artificial general intelligence, the theoretical point at which an AI could outperform a human. In July, Reuters' sources revealed that the latest model dubbed Strawberry is a new name for Q*. But exactly when the company will publicly release the system -- never mind whether it'll live up to the sky-high hype -- remains to be seen. The pressure is on as OpenAI is looking to raise a whopping $6.5 billion from investors, boosting its already sky-high valuation to $150 billion. A next-generation AI model could address growing concerns that its releases so far don't represent the technological revolution they promised and that the "AI bubble" is starting to burst. Other AI companies are also said to be working on AI models capable of "reasoning." Earlier this year, Google's AI unit DeepMind claimed that its AlphaProof and AlphaGeometry2 could complete high school-level math problems. AI competitor Anthropic is also looking to upgrade its AI Claude's ability to reason. Instead of giving users an answer straight away like with OpenAI's current ChatGPT chatbot, getting an answer out of Strawberry may take a little longer. That's because it uses a new technique called "chain of thought" prompting, which considers a number of different responses before choosing which it deems the best. OpenAI still has a lot to prove. Could Strawberry really represent a major breakthrough in AI tech?
Will it power the next generation of chatbots? What will the experience of using it look like, and how much slower will it be than ChatGPT? Even more importantly, will it still be as error-prone as its predecessors, which still have the tendency to "hallucinate"? It's likely that the Sam Altman-led company will play it safe and slowly roll out its new AI model, so these answers will probably come gradually.
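The article describes the model as considering several candidate responses and then choosing the one it deems best. How OpenAI actually does this has not been published, so the Python sketch below swaps in a simple, well-known stand-in: majority voting over candidate answers. The function name pick_by_majority and the sample candidates are assumptions for illustration only.

```python
from collections import Counter
from typing import Sequence


def pick_by_majority(candidates: Sequence[str]) -> str:
    """Return the most frequent candidate answer.

    Ties are broken in favor of the answer that appeared first.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    counts = Counter(candidates)
    # max() returns the first maximal element, so earlier answers win ties.
    return max(candidates, key=lambda answer: counts[answer])


if __name__ == "__main__":
    # Hypothetical answers from several independent attempts at one question.
    attempts = ["3", "2", "3", "3", "2"]
    print(pick_by_majority(attempts))  # 3
```

Generating several attempts is one reason such a system would respond more slowly than a chatbot that returns its first answer.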
[6]
OpenAI might release its hyped-up Strawberry AI model soon
OpenAI is reportedly close to releasing a new artificial intelligence model that can "think," as it seeks billions in funding. The AI startup is preparing to release Strawberry, its reasoning-focused AI model, as part of ChatGPT, The Information reported, citing unnamed people familiar with the matter. While people told The Information the release could be in the next two weeks, Bloomberg reported that the timing is unclear; the latter outlet cited unnamed sources who said a limited number of users could get access as soon as this week. Despite being part of ChatGPT, Strawberry would be a standalone product, The Information reported, adding that details on how it will be made available to customers aren't clear, but one possibility is for the model to be part of the dropdown menu on ChatGPT that allows customers to pick the model they want to use. Unlike other conversational AI models that immediately respond, Strawberry "reasons" before it responds to a query, which usually takes 10 to 20 seconds, people told The Information. The longer "reasoning" stage reportedly helps the model avoid errors and "know" when it should ask follow-up questions. Strawberry's initial version will reportedly not be multimodal, meaning it will only take in and generate text. The model could be priced differently from ChatGPT, which has a free version and different subscription tiers, The Information reported, adding that it could also have limits on speed and how many queries users can make per hour. In July, OpenAI shared a five-level rating system it developed to track its artificial general intelligence progress with employees. The levels go from the currently available conversational AI (i.e. chatbots), to AI that might someday perform the same amount of work as an organization. 
While OpenAI executives believe the startup's tech is on this self-defined first level, a spokesperson told Bloomberg the company is close to level two; OpenAI described level two as "Reasoners," or AI that can perform basic problem-solving, and is supposedly on the level of a human with a doctorate degree but no access to tools. OpenAI did not immediately respond to a request for comment.
[7]
OpenAI close to releasing 'Strawberry' Model with reasoning capabilities
The timing is still unclear, but a release to a limited number of users could come as soon as this week, said the person, who asked not to be identified discussing private information. AI with the ability to reason is considered a major step in the development of the technology -- in this case it means that OpenAI's tools should be able to solve multi-step problems, including complicated math and coding questions. The model's release, which has been rumored for months, comes as OpenAI is looking to raise billions in funding and faces heightened competition in the race to develop ever more sophisticated artificial intelligence systems. OpenAI isn't the only company working on such capabilities; competitors Anthropic and Google have also touted "reasoning" skills with their advanced AI models. OpenAI declined to comment. The experience of using OpenAI's updated AI system will differ somewhat from what people have come to expect with ChatGPT, the company's chatbot. Before responding to a user's prompt, the new software will pause for a matter of seconds while, behind the scenes and invisible to the user, it considers a number of related prompts and then summarizes what appears to be the best response, the person said. This technique is sometimes referred to as "chain of thought" prompting. The Information previously reported some details of how Strawberry would process prompts. This approach could enable the technology to respond more accurately to prompts that currently bedevil ChatGPT and other chatbots.
For instance, when asked whether the number 9.11 is larger than 9.9 -- a question that may be simple for a human but isn't always answered correctly even by state-of-the-art AI systems -- the updated model was able to correctly determine that 9.9 is bigger, the person said. During an all-hands meeting in July, OpenAI executives showed off a demonstration of the company's most advanced AI system enhanced with new reasoning capabilities, Bloomberg previously reported. The product was able to answer several word problems that have stumped its models in the past and also solve an advanced chemistry problem. OpenAI has been working to get computers to carry out multi-step actions for some time. In May 2023, for instance, the company released a blog post and an accompanying research paper about its efforts to improve AI systems' abilities to solve math problems. According to the paper, the company trained a model by rewarding it for each correct step in the process toward coming up with an answer to a problem, rather than by just rewarding it for generating an accurate answer. The topic is also something the company is increasingly addressing publicly. Noam Brown, a research scientist at OpenAI, is scheduled to speak about generative AI and multi-step "reasoning agents" at a TED AI event in San Francisco next month, according to the event's website.
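The May 2023 work described above contrasts two ways of rewarding a model: for each correct step, or only for a correct final answer. A toy Python sketch of that contrast follows. The function names and the fraction-of-correct-steps score are assumptions for illustration only; they are not the method from OpenAI's paper.

```python
from typing import Sequence


def outcome_reward(final_answer_correct: bool) -> float:
    """Reward only the final answer: 1.0 if correct, otherwise 0.0."""
    return 1.0 if final_answer_correct else 0.0


def process_reward(step_is_correct: Sequence[bool]) -> float:
    """Reward each step: the fraction of steps judged correct (toy scoring)."""
    if not step_is_correct:
        return 0.0
    return sum(step_is_correct) / len(step_is_correct)


if __name__ == "__main__":
    # A hypothetical three-step solution whose last step goes wrong,
    # which also makes the final answer wrong.
    steps = [True, True, False]
    print(outcome_reward(final_answer_correct=False))
    print(process_reward(steps))
```

In this toy case the outcome-only score gives no credit at all, while the per-step score still credits the two sound steps, which is the extra training signal the step-level approach is meant to provide.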
[8]
OpenAI launches new series of AI models with 'reasoning' abilities
Microsoft-backed OpenAI has launched its 'Strawberry' series of AI models, designed to solve complex problems by spending more time processing answers. The new models, o1 and o1-mini, excel in science, coding, and math tasks. They use a technique called 'chain-of-thought' reasoning to break down problems into smaller steps.
Microsoft-backed OpenAI said on Thursday it was launching its "Strawberry" series of AI models designed to spend more time processing answers to queries in order to solve hard problems. The models, first reported by Reuters, are capable of reasoning through complex tasks and can solve more challenging problems than previous models in science, coding and math, the AI firm said in a blog post. OpenAI used the code name Strawberry to refer to the project internally, while it dubbed the models announced on Thursday o1 and o1-mini. The o1 will be available in ChatGPT and its API starting Thursday, the company said. Noam Brown, a researcher at OpenAI focused on improving reasoning in the company's models, confirmed in a post on social media platform X that the models were the same as the Strawberry project. "I'm excited to share with you all the fruit of our effort at OpenAI to create AI models capable of truly general reasoning," Brown wrote. In its blog post, OpenAI said the o1 model scored 83% on the qualifying exam for the International Mathematics Olympiad, compared with 13% for its previous model, GPT-4o. The model also improved performance on competitive programming questions and exceeded human PhD-level accuracy on a benchmark of science problems, the company said. Brown said the models were able to accomplish the scores by incorporating a technique known as "chain-of-thought" reasoning, which involves breaking down complex problems into smaller logical steps. Researchers have noted that AI model performance on complex problems tends to improve when the approach has been used as a prompting technique.
OpenAI has now automated this capability so the models can break down problems on their own, without user prompting. "We trained these models to spend more time thinking through problems before they respond, much like a person would. Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes," OpenAI said. Reuters was the first to report OpenAI's work on the reasoning project, then called Q*, in November 2023. It reported in July that the project had come to be known as Strawberry.
[9]
OpenAI launches 'Strawberry' bots with 'reasoning' abilities
STORY: OpenAI has launched a new series of bots built to answer really tough questions. The AI models are designed to spend more time processing problems before spitting out an answer. First reported by Reuters, the new products are capable of reasoning through complex tasks. OpenAI says that makes them better able to solve challenging problems in science, coding and math. Known within the firm as 'Strawberry', they've officially been branded as o1 and o1-mini. The o1 version will be available in ChatGPT from Thursday. In a blog post, the firm said the new model scored 83% on the qualifying exam for the International Mathematics Olympiad. That compares with just 13% for a previous model. The new bot also exceeded human PhD-level accuracy on some scientific tests. An OpenAI researcher said the system was built on so-called "chain of thought" reasoning, which breaks complex problems down into small steps. The models have refined their abilities through training, and learning from their mistakes.
[10]
OpenAI Nears Release of 'Strawberry' Model, With Reasoning Capabilities
OpenAI is getting closer to releasing a new artificial intelligence model known internally as "Strawberry" that can perform some human-like reasoning tasks, according to a person familiar with the matter. The timing is still unclear, but a release to a limited number of users could come as soon as this week, said the person, who asked not to be identified discussing private information.
[11]
OpenAI Releases Strawberry Reasoning Model to Paying Customers
OpenAI on Thursday released a first version of its highly anticipated Strawberry artificial intelligence that aims to reason through complex problems just as it raises up to $7 billion from Thrive Capital and potentially also from a UAE state-backed investment firm, MGX. The model, officially known as o1-preview, is being made available first to paying ChatGPT users and to developers who
OpenAI has launched its new Strawberry series of AI models, sparking discussions about advancements in AI reasoning and capabilities. The model's introduction has led to both excitement and concerns in the tech community.
OpenAI, the artificial intelligence research laboratory, has recently introduced its latest innovation: the Strawberry series of AI models. This new development has quickly become a focal point in the AI community, generating both excitement and debate about the future of artificial intelligence [1].
The Strawberry model represents a significant leap forward in AI reasoning capabilities. Unlike its predecessors, this new model demonstrates an improved ability to understand context, make logical inferences, and solve complex problems. These advancements have the potential to revolutionize various fields, from scientific research to everyday applications [2].
One of the key improvements in the Strawberry model is its ability to overcome certain limitations observed in earlier AI systems. The model shows enhanced performance in tasks that require nuanced understanding and multi-step reasoning, areas where previous models often struggled [3].
Experts suggest that the Strawberry model could have far-reaching implications across various sectors. From healthcare diagnostics to financial analysis, the model's advanced reasoning capabilities could lead to more accurate predictions and decision-making processes. Additionally, its potential in natural language processing could further bridge the gap between human-machine interactions [4].
While the Strawberry model's capabilities are impressive, its introduction has also reignited discussions about AI ethics and potential risks. Some experts express concerns about the model's decision-making processes and the need for transparency in its operations. Questions about data privacy, bias, and the societal impact of such advanced AI systems remain at the forefront of these debates [5].
The tech industry has responded to the Strawberry model with a mix of enthusiasm and caution. While many see it as a significant step forward in AI development, others emphasize the need for responsible innovation and thorough testing. As OpenAI continues to refine and expand the Strawberry series, the AI community eagerly anticipates further developments and potential applications of this groundbreaking technology [2].
Reference
[4]
OpenAI, the artificial intelligence research laboratory, is reportedly working on a new reasoning technology under the codename 'Strawberry'. This development aims to enhance AI's ability to solve complex problems and could potentially revolutionize the field of artificial intelligence.
11 Sources
OpenAI, the creator of ChatGPT, is reportedly working on a new AI technology codenamed "Strawberry" that aims to enhance reasoning capabilities in artificial intelligence models. This development could potentially revolutionize AI's ability to perform complex tasks and conduct deep research.
13 Sources
OpenAI introduces its latest AI model, O1, codenamed 'Strawberry', showcasing advanced reasoning capabilities and a novel approach to AI response time. This development marks a significant step in AI's evolution towards more thoughtful and accurate problem-solving.
12 Sources
OpenAI is reportedly preparing to launch its highly anticipated AI model, codenamed 'Strawberry', within the next two weeks. This release comes earlier than initially planned and is expected to showcase significant advancements in AI capabilities.
3 Sources
OpenAI is set to release 'Strawberry', a new AI model for ChatGPT, within the next two weeks. This update aims to enhance ChatGPT's reasoning capabilities and text handling, potentially revolutionizing AI interactions.
17 Sources
© 2025 TheOutpost.AI All rights reserved