Curated by THEOUTPOST on September 19, 2024
2 Sources
[1]
OpenAI o1 Likely Uses RL over Chains of Thought to Build System 2 LLMs
OpenAI's o1 could be considered the first successful commercial launch of a System 2 LLM.

OpenAI recently released two models, OpenAI o1-preview and OpenAI o1-mini, marking a significant leap in the AI world. These models can reason using chains of thought and reasoning tokens.

Jim Fan, in a recent post on X, said the o1 models mark a significant shift towards inference-time scaling in AI, emphasising the importance of search and reasoning over mere knowledge accumulation. This approach suggests that effective reasoning can be achieved with smaller models. By implementing techniques like Monte Carlo tree search (MCTS) during inference, a model can explore multiple strategies and scenarios before converging on an optimal solution. The key advantage of using MCTS during inference is that it lets the model consider many different approaches to a problem rather than committing to a single strategy early on.

Subbarao Kambhampati, professor at Arizona State University, said that OpenAI's o1 uses reinforcement learning over auto-generated chains of thought, similar to AlphaGo's self-play approach, to optimise problem-solving by building a generalised System 2 component atop LLM substrates, albeit without guarantees. "One interesting issue with o1 is that it seems to be significantly less steerable compared to LLMs. For example, it often completely ignores any output formatting instructions, making it hard to automatically check its solutions," he added, saying that once you are an approximate reasoner, you might develop the "don't tell me how to solve the problem; I already have a way I use to solve it" complex.

In his 2011 book Thinking, Fast and Slow, Daniel Kahneman popularised the term 'System 2 thinking', which refers to complex problem-solving, logical reasoning, and careful decision-making, often involving step-by-step analysis and focused attention. That sounds very similar to what OpenAI has promised with its latest o1 model, which "thinks".

Indeed, o1 could be considered the first successful commercial launch of a System 2 LLM, and the most important reason is its reasoning tokens. These tokens are designed to guide the system through step-by-step reasoning; they are generated from the user's prompt and folded into the reasoning process. In illustrations, reasoning tokens are often notated with single or double angle brackets. OpenAI decided to use English words as reasoning tokens for convenience, such as "Interesting", "First", "Let's test this theory", "Wait", "That seems unlikely", "Alternatively", "So this works" and "Perfect".

With reasoning tokens, the o1 models demonstrated significantly better performance on complex tasks than previous models. For example, o1 solved 83% of problems in a qualifying exam for the International Mathematics Olympiad, compared to GPT-4o's 13%. When AIM first tapped into o1 in ChatGPT, our debut question was, "How many 'R's does 'Strawberry' have?", and it nailed it. Later, we asked which was bigger, 9.9 or 9.11, and it got that right as well. This suggests OpenAI has made real headway on "jagged intelligence", the tendency of LLMs to ace difficult problems while fumbling trivial ones.

Anatoly Geyfman, co-founder and CEO of Carevoyance, explained that reasoning tokens account for the time a model spends "thinking": they pay for the additional submissions the model makes to itself to refine its answer, whatever the actual mechanism of action is.
"This is important - there is now a way for model builders to monetise the more sophisticated actions of a model beyond "input" and "output" tokens. The reasoning tokens let OpenAI and, I bet, others in the near future release models that aren't so much better trained, but instead, are better at thinking through responses," he added further. A similar approach was mentioned in a paper titled 'Guiding Language Model Reasoning with Planning Tokens', published in July. It proposed adding specialised planning tokens at the beginning of each chain-of-thought step to guide and improve language models' maths reasoning ability. Saurabh Sarkar, the CEO of Phenx Machine Learning Technologies, mentioned that when you try to solve a question like "What is 2 + 2, then multiply the result by 3?" with a traditional approach, it will first calculate 2 + 2, get the result 4, and then multiply 4 by 3 to get 12. Using reasoning tokens, the model anticipates the need to multiply the intermediate result (4) with 3 while still calculating 2 + 2. It "pre-computes" and stores this information, so when it reaches the multiplication step, it already has the necessary data, allowing for faster and more efficient processing. This is how reasoning tokens allow for more thorough and accurate responses to challenging queries. Theo Browne, a popular YouTuber, founder and CEO of Ping Labs, recently posted a video on the reasoning capabilities of o1 models. In response to the popular saying "We have PhD in our pocket," Browne said, "PhD that can't do basic maths" as the o1 models were not able to find all the possible corners of a parallelogram. https://x.com/allgarbled/status/1834344480797057307 (Please embed this in HTML) A Reddit user mentioned that OpenAI is advertising this as some kind of mega-assistant for research scientists and quantum physicists. "I gave it a fairly simple twin paradox time dilation problem, and it failed just as miserably as all the previous versions. It seems like it still has no understanding, just probabilistic word guessing," he added. He suggested that even after using reasoning tokens and taking more time to generate the answer, the model does not give satisfactory results. Another user mentioned that o1 was, in fact, performing worse than ChatGPT 4o. He mentioned that the responses from o1 were wordy, generic and 'safe' and he had to coax it several times to give him the same response that GPT4 provided on the first try. Apart from the reasoning capabilities of reasoning tokens, not showing tokens to API users raised concerns amongst users.
[2]
OpenAI's o1: More Than Meets The Eye
With all the buzz around OpenAI's project "Strawberry," I was eager to try out OpenAI's o1-preview when it launched. At first, it felt like an incremental update. The more I explored, the more I realized this model is a significant step forward and a preview of what is to come. Here's why.

I had hoped OpenAI would implement "self-taught reasoning," where models can evaluate and refine their internal processing (something akin to human "thoughts"). While o1 isn't there yet, it combines three key innovations: deep reinforcement learning (Q-learning), "Chain of Thought" (CoT), and a "Tree of Thoughts" approach.

OpenAI also improved the model's ability to reason through safety protocols, making it much more resistant to jailbreak attempts (efforts to bypass its safeguards). In safety tests, o1 scored 84 out of 100, compared to GPT-4o's score of just 22. OpenAI is also working with AI safety institutes in the U.S. and U.K. to further evaluate and refine these capabilities. This improvement makes o1 a strong candidate for future applications where AI agents must operate autonomously while adhering to company policies and regulations. In its current form, however, a lack of tool access prevents the preview model from taking actions.

If you want to do a fun demo, ask GPT-4o how many r's are in the word "strawberry." It may tell you 2, because the model represents the word as tokens rather than letter by letter. Ask o1 and you will see it think for a split second and get the answer right.

To test both models' capabilities, I asked both GPT-4o and o1 to develop a quantum circuit that solves a Max-Cut optimization problem. o1 clearly outperformed GPT-4o, not only delivering a better solution but also providing a detailed explanation of its reasoning process. This transparency is crucial for business applications in regulated industries, where explainability is key.

The additional accuracy comes at the cost of time: o1 takes longer to generate results. In my case, o1 took 8 seconds more than GPT-4o. That makes it unsuitable for real-time applications, but ideal for decision-support systems where detailed reasoning matters more than speed. The model's higher computational demands also translate into a higher price: $15 per 1 million input tokens and $60 per 1 million output tokens, compared to GPT-4o's $5 and $15, respectively. You also pay for the tokens the model uses in its internal "thinking," on top of the input and output tokens. Businesses will need to weigh o1's capabilities against its cost and determine where it fits into their system architecture.

At first glance, o1 may seem like a minor update, but it marks a major step forward in AI reasoning. As OpenAI continues its strategy of steady, incremental releases, improvements in problem-solving, explainability, and safety lay the groundwork for future breakthroughs. I hope introspection and self-teaching are coming soon. While the higher cost and slower speed are trade-offs, o1 is the better choice for use cases where transparency and accuracy are essential and can justify the extra resources. As you think through what o1 means for your generative and agentic AI aspirations, clients can book a guidance session with me to discuss what this all means in the short and long term and how to plan for the rapid pace of AI progress.
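To put the pricing above in perspective, here is a back-of-the-envelope sketch using the per-million-token rates quoted in the piece. The token counts are hypothetical, and it assumes, as the article notes, that the hidden "thinking" tokens are billed at the output rate.

```python
# Back-of-the-envelope cost comparison using the per-million-token prices
# quoted above. Token counts are hypothetical; assumes reasoning ("thinking")
# tokens are billed at the output rate, as the article notes.

PRICES_PER_1M = {
    "o1":     {"input": 15.00, "output": 60.00},
    "gpt-4o": {"input":  5.00, "output": 15.00},
}

def request_cost(model, input_tokens, output_tokens, reasoning_tokens=0):
    """Cost in USD for a single request; reasoning tokens are billed as output."""
    p = PRICES_PER_1M[model]
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * p["input"] + billed_output * p["output"]) / 1_000_000

# Hypothetical request: 2,000 input tokens, 1,000 visible output tokens,
# plus 5,000 hidden reasoning tokens for o1.
print(f"o1:     ${request_cost('o1', 2_000, 1_000, reasoning_tokens=5_000):.4f}")
print(f"gpt-4o: ${request_cost('gpt-4o', 2_000, 1_000):.4f}")
```

In this hypothetical, the o1 call costs roughly fifteen times the GPT-4o call ($0.39 versus $0.025), almost entirely because of the hidden reasoning tokens, which is exactly why the author advises weighing where o1 fits in a system architecture.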
OpenAI's o1 project is making waves in the AI community, potentially using reinforcement learning to develop more advanced language models. This approach could lead to significant improvements in AI capabilities, moving closer to human-like reasoning.

OpenAI, a leading artificial intelligence research laboratory, has recently unveiled its latest project, o1, which is garnering significant attention in the tech world. This endeavor aims to push the boundaries of AI capabilities, potentially bringing us closer to achieving System 2 thinking in language models [1].

According to insights from Analytics India Magazine, o1 likely applies reinforcement learning (RL) over auto-generated chains of thought, rather than relying on prompted chain-of-thought alone [1]. This shift in methodology could prove crucial in developing more advanced and capable AI systems that can mimic human-like reasoning and decision-making processes.

The concept of System 1 and System 2 thinking, popularized by psychologist Daniel Kahneman, distinguishes between quick, intuitive responses (System 1) and more deliberate, logical reasoning (System 2). Current large language models (LLMs) primarily operate within the realm of System 1 thinking. o1's approach aims to bridge this gap, potentially enabling AI to engage in more complex, reasoned thought processes [1].

Forrester's analysis suggests that o1 could represent a significant leap forward in AI capabilities [2]. By incorporating reinforcement learning, o1 may be able to develop more sophisticated problem-solving skills, improved decision-making abilities, and enhanced adaptability to new situations. This could lead to AI systems that handle more complex tasks and provide more nuanced responses.

While the prospects of o1 are exciting, it's important to note that developing System 2-like capabilities in AI is a complex undertaking. Challenges include ensuring the ethical use of more advanced AI systems, addressing potential biases, and maintaining transparency in the decision-making processes of these models [2].

The development of o1 could have far-reaching implications for various industries. From healthcare to finance, more advanced AI systems could revolutionize decision-making processes, data analysis, and problem-solving capabilities. However, it also raises important questions about the future of human-AI interaction and the potential impact on the job market [2].
References
[1] OpenAI o1 Likely Uses RL over Chains of Thought to Build System 2 LLMs, Analytics India Magazine
[2] OpenAI's o1: More Than Meets The Eye, Forrester