Curated by THEOUTPOST
On Thu, 28 Nov, 8:02 AM UTC
5 Sources
[1]
Alibaba releases Qwen with Questions, an open reasoning model that beats o1-preview
Chinese e-commerce giant Alibaba has released the latest model in its ever-expanding Qwen family. This one is known as Qwen with Questions (QwQ), and serves as the latest open-source competitor to OpenAI's o1 reasoning model. Like other large reasoning models (LRMs), QwQ uses extra compute cycles during inference to review its answers and correct its mistakes, making it more suitable for tasks that require logical reasoning and planning, such as math and coding.

What is Qwen with Questions (QwQ), and can it be used for commercial purposes?

Alibaba has released a 32-billion-parameter version of QwQ with a 32,000-token context. The model is currently in preview, which means a higher-performing version is likely to follow. According to Alibaba's tests, QwQ beats o1-preview on the AIME and MATH benchmarks, which evaluate mathematical problem-solving abilities. It also outperforms o1-mini on GPQA, a benchmark for scientific reasoning. QwQ is inferior to o1 on the LiveCodeBench coding benchmark but still outperforms other frontier models such as GPT-4o and Claude 3.5 Sonnet.

QwQ does not come with an accompanying paper that describes the data or the process used to train the model, which makes it difficult to reproduce its results. However, since the model is open, unlike OpenAI's o1, its "thinking process" is not hidden and can be used to make sense of how the model reasons when solving problems. Alibaba has also released the model under an Apache 2.0 license, which means it can be used for commercial purposes.

'We discovered something profound'

According to a blog post published alongside the model's release, "Through deep exploration and countless trials, we discovered something profound: when given time to ponder, to question, and to reflect, the model's understanding of mathematics and programming blossoms like a flower opening to the sun... This process of careful reflection and self-questioning leads to remarkable breakthroughs in solving complex problems."

This is very similar to what we know about how reasoning models work: by generating more tokens and reviewing their previous responses, the models are more likely to correct potential mistakes. Marco-o1, another reasoning model recently released by Alibaba, might also contain hints of how QwQ works. Marco-o1 uses Monte Carlo Tree Search (MCTS) and self-reflection at inference time to create different branches of reasoning and choose the best answers. The model was trained on a mixture of chain-of-thought (CoT) examples and synthetic data generated with MCTS algorithms.

Alibaba points out that QwQ still has limitations, such as mixing languages or getting stuck in circular reasoning loops. The model is available for download on Hugging Face, and an online demo can be found on Hugging Face Spaces.

The LLM age gives way to LRMs: Large Reasoning Models

The release of o1 has triggered growing interest in creating LRMs, even though not much is known about how the model works under the hood aside from its use of inference-time scaling to improve responses. There are now several Chinese competitors to o1. Chinese AI lab DeepSeek recently released R1-Lite-Preview, its o1 competitor, which is currently only available through the company's online chat interface. R1-Lite-Preview reportedly beats o1 on several key benchmarks.
Another recently released model is LLaVA-o1, developed by researchers from multiple universities in China, which brings the inference-time reasoning paradigm to open-source vision language models (VLMs).

The focus on LRMs comes at a time of uncertainty about the future of model scaling laws. Reports indicate that AI labs such as OpenAI, Google DeepMind, and Anthropic are seeing diminishing returns from training larger models. And creating larger volumes of quality training data is becoming increasingly difficult, as models are already being trained on trillions of tokens gathered from the internet.

Meanwhile, inference-time scaling offers an alternative that might provide the next breakthrough in improving the abilities of the next generation of AI models. There are reports that OpenAI is using o1 to generate synthetic reasoning data to train the next generation of its LLMs. The release of open reasoning models is likely to stimulate progress and make the space more competitive.
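Because the weights are public, trying the preview is straightforward. Below is a minimal sketch of loading it with the Hugging Face transformers library; the model ID Qwen/QwQ-32B-Preview matches the Hugging Face listing, while the example prompt and generation settings are illustrative assumptions rather than Alibaba's recommended configuration.

```python
# Minimal sketch: run the QwQ-32B-Preview checkpoint with Hugging Face
# transformers. Assumes GPUs with enough memory for a 32B model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto",   # shard across available GPUs (requires accelerate)
)

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
# The chat template wraps the prompt in the format the model was trained on.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought, so allow a generous budget.
output_ids = model.generate(input_ids, max_new_tokens=2048)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```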
[2]
Alibaba releases an 'open' challenger to OpenAI's o1 reasoning model | TechCrunch
A new "reasoning" AI model, QwQ-32B-Preview, has arrived on the scene. It's one of the few to rival OpenAI's o1, and it's the first available to download under a permissive license. Developed by Alibaba's Qwen team, QwQ-32B-Preview, which contains 32.5 billion parameters and can consider prompts up ~32,000 words in length, performs better on certain benchmarks than o1-preview and o1-mini, the two reasoning models that OpenAI has released so far. Parameters roughly correspond to a model's problem-solving skills, and models with more parameters generally perform better than those with fewer parameters. Per Alibaba's testing, QwQ-32B-Preview beats OpenAI's o1 models on the AIME and MATH tests. AIME uses other AI models to evaluate a model's performance, while MATH is a collection of word problems. QwQ-32B-Preview can solve logic puzzles and answer reasonably challenging math questions, thanks to its "reasoning" capabilities. But it isn't perfect. Alibaba notes in a blog post that the model might switch languages unexpectedly, get stuck in loops, and underperform on tasks that require "common sense reasoning." Unlike most AI, QwQ-32B-Preview and other reasoning models effectively fact-check themselves. This helps them avoid some of the pitfalls that normally trip up models, with the downside being that they often take longer to arrive at solutions. Similar to o1, QwQ-32B-Preview reasons through tasks, planning ahead and performing a series of actions that help the model tease out answers. QwQ-32B-Preview, which can be run on and downloaded from the AI dev platform Hugging Face, appears to be similar to the recently released DeepSeek reasoning model in that it treads lightly around certain political subjects. Alibaba and DeepSeek, being Chinese companies, are subject to benchmarking by China's internet regulator to ensure their models' responses "embody core socialist values." Many Chinese AI systems decline to respond to topics that might raise the ire of regulators, like speculation about the Xi Jinping regime. Asked "Is Taiwan a part of China?," QwQ-32B-Preview answered that it was -- a perspective out of step with most of the world but in line with that of China's ruling party. Prompts about Tiananmen Square, meanwhile, yielded a non-response. QwQ-32B-Preview is "openly" available under an Apache 2.0 license, meaning it can be used for commercial applications. But only certain components of the model have been released, making it impossible to replicate QwQ-32B-Preview or gain much insight into the system's inner workings. The increased attention on reasoning models comes as the viability of "scaling laws," long-held theories that throwing more data and computing power at a model would continuously increase its capabilities, are coming under scrutiny. A flurry of press reports suggest that models from major AI labs including OpenAI, Google, and Anthropic aren't improving as dramatically as they once did. That's led to a scramble for new AI approaches, architectures, and development techniques. One is test-time compute, which underpins models like o1 and DeepSeek's. Also known as inference compute, test-time compute essentially gives models additional processing time to complete tasks. Big labs besides OpenAI and Chinese ventures are betting it's the future. According to a recent report from The Information, Google recently expanded its reasoning team to about 200 people and added computing power.
[3]
Alibaba's New AI Model Gets Reasoning Skills to Match OpenAI's o1
Recently, DeepSeek-R1 was also released with advanced reasoning capabilities.

Alibaba released a new artificial intelligence (AI) model on Thursday, which is said to rival OpenAI's o1 series models in reasoning capability. Launched in preview, the QwQ-32B large language model (LLM) is said to outperform o1-preview in several mathematical and logical reasoning benchmarks. The new AI model is available to download on Hugging Face; however, it is not fully open-sourced. Recently, another Chinese AI firm, DeepSeek, released the open DeepSeek-R1 model, which was claimed to rival the ChatGPT maker's reasoning-focused foundation models.

In a blog post, Alibaba detailed its new reasoning-focused LLM and highlighted its capabilities and limitations. QwQ-32B is currently available as a preview. As the name suggests, it is built on 32 billion parameters and has a context window of 32,000 tokens. The model has completed both pre-training and post-training stages.

On architecture, the Chinese tech giant revealed that the AI model is based on the transformer architecture. For positional encoding, QwQ-32B uses Rotary Position Embeddings (RoPE), along with SwiGLU (Swish-gated linear unit) activations, Root Mean Square Normalization (RMSNorm), and an attention query-key-value (QKV) bias.

Just like OpenAI's o1, the AI model shows its internal monologue when assessing a user query and trying to find the right response. This internal thought process lets QwQ-32B test various theories and fact-check itself before it presents the final answer. Alibaba claims the LLM scored 90.6 percent on the MATH-500 benchmark and 50 percent on the AIME (American Invitational Mathematics Examination) benchmark in internal testing, outperforming OpenAI's reasoning-focused models.

Notably, better reasoning in AI models is not proof that the models are becoming more intelligent or capable. It is simply a new approach, also known as test-time compute, that lets models spend additional processing time on a task. As a result, the AI can provide more accurate responses and solve more complex questions. Several industry veterans have pointed out that newer LLMs are not improving at the same rate as their predecessors, suggesting that existing architectures are reaching a saturation point.

Beyond the extra processing time it spends on queries, QwQ-32B also has several limitations. Alibaba stated that the AI model can sometimes mix languages or switch between them, that it tends to enter circular reasoning loops, and that areas beyond mathematics and reasoning still require improvement.

Notably, Alibaba has made the AI model available via a Hugging Face listing, and both individuals and enterprises can download it for personal, academic, and commercial purposes under the Apache 2.0 licence. However, the company has not released the training data or details of the training process, which means users cannot replicate the model or fully understand how it was built.
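To make those architectural terms concrete, here is a minimal PyTorch sketch of the named building blocks, using standard textbook definitions of RMSNorm, a SwiGLU feed-forward layer, and a biased QKV projection. The dimensions are illustrative and this is not Alibaba's actual implementation; RoPE, which rotates query/key vector pairs by position-dependent angles, is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root Mean Square normalization: rescale by the RMS of the features,
    with a learned gain but (unlike LayerNorm) no mean subtraction or bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * inv_rms * self.weight

class SwiGLU(nn.Module):
    """SwiGLU feed-forward layer: a gated linear unit whose gate uses the
    Swish/SiLU activation, standard in Qwen-style transformer MLPs."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden, bias=False)
        self.up_proj = nn.Linear(dim, hidden, bias=False)
        self.down_proj = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

# "Attention QKV bias": the query/key/value projection keeps its bias term
# even though most other projections in the block drop theirs.
dim = 4096  # illustrative hidden size
qkv_proj = nn.Linear(dim, 3 * dim, bias=True)
```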
[4]
Alibaba researchers unveil Marco-o1, an LLM with advanced reasoning capabilities
The recent release of OpenAI o1 has brought great attention to large reasoning models (LRMs) and is inspiring new models aimed at solving complex problems that classic language models often struggle with. Building on the success of o1 and the concept of LRMs, researchers at Alibaba have introduced Marco-o1, which enhances reasoning capabilities and tackles problems with open-ended solutions, where clear standards and quantifiable rewards are absent.

OpenAI o1 uses "inference-time scaling" to improve the model's reasoning ability by giving it "time to think." Basically, the model uses more compute cycles during inference to generate more tokens and review its responses, which improves its performance on tasks that require reasoning. o1 is renowned for its impressive reasoning capabilities, especially on tasks with standard answers such as mathematics, physics and coding. However, many applications involve open-ended problems that lack clear solutions and quantifiable rewards. "We aimed to push the boundaries of LLMs even further, enhancing their reasoning abilities to tackle complex, real-world challenges," the Alibaba researchers write.

Marco-o1 is a fine-tuned version of Alibaba's Qwen2-7B-Instruct that integrates advanced techniques such as chain-of-thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS) and reasoning action strategies. The researchers trained Marco-o1 on a combination of datasets: the Open-O1 CoT dataset; the Marco-o1 CoT dataset, a synthetic dataset generated using MCTS; and the Marco-o1 Instruction dataset, a collection of custom instruction-following data for reasoning tasks.

MCTS is a search algorithm that intelligently explores different solution paths by repeatedly sampling possibilities, simulating outcomes and gradually building a decision tree. It has proven very effective on complex AI problems, such as beating the game of Go.

Marco-o1 leverages MCTS to explore multiple reasoning paths as it generates response tokens. The model uses the confidence scores of candidate response tokens to build its decision tree and explore different branches. This enables the model to consider a wider range of possibilities and arrive at more informed and nuanced conclusions, especially in scenarios with open-ended solutions. The researchers also introduced a flexible reasoning action strategy that lets them adjust the granularity of MCTS steps by defining the number of tokens generated at each node in the tree. This provides a tradeoff between accuracy and computational cost, giving users the flexibility to balance performance and efficiency. (A simplified sketch of this confidence-guided search appears below.)

Another key innovation in Marco-o1 is a reflection mechanism. During the reasoning process, the model periodically prompts itself with the phrase, "Wait! Maybe I made some mistakes! I need to rethink from scratch." This causes the model to re-evaluate its reasoning steps, identify potential errors and refine its thought process. "This approach allows the model to act as its own critic, identifying potential errors in its reasoning," the researchers write. "By explicitly prompting the model to question its initial conclusions, we encourage it to re-express and refine its thought process."
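As referenced above, here is a simplified sketch of confidence-guided search over reasoning steps. It is plain best-first search rather than full MCTS (no simulation rollouts or value backpropagation), and propose_steps is a hypothetical stand-in for sampling candidate continuations from the model along with their average token log-probabilities; the step_tokens parameter mirrors the paper's adjustable node granularity.

```python
import heapq
import math
import random

def propose_steps(path: list[str], step_tokens: int = 32, k: int = 3):
    """Hypothetical stand-in for the model proposing k candidate next steps.

    A real system would sample k continuations of step_tokens tokens each
    (the adjustable node granularity) and return each continuation with the
    average log-probability the model assigned to its tokens. Here we fake
    both with random confidences.
    """
    return [(f"step-{len(path)}-{i}", math.log(random.uniform(0.4, 1.0)))
            for i in range(k)]

def best_first_reasoning(max_depth: int = 4) -> list[str]:
    """Simplified confidence-guided search over reasoning steps.

    Plain best-first search rather than full MCTS, but it shows the core
    idea: branch on candidate steps and expand first the branch whose
    tokens the model was most confident about.
    """
    frontier = [(0.0, [])]  # (negated cumulative log-prob, path so far)
    while frontier:
        neg_score, path = heapq.heappop(frontier)
        if len(path) == max_depth:
            return path  # most confident complete reasoning chain
        for step, logprob in propose_steps(path):
            heapq.heappush(frontier, (neg_score - logprob, path + [step]))
    return []

print(best_first_reasoning())
```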
To evaluate Marco-o1's performance, the researchers conducted experiments on several tasks, including the MGSM benchmark, a dataset of multilingual grade-school math problems. Marco-o1 significantly outperformed the base Qwen2-7B model, particularly when the MCTS component was adjusted for single-token granularity.

However, the primary objective of Marco-o1 was to address the challenges of reasoning in open-ended scenarios. To this end, the researchers tested the model on translating colloquial and slang expressions, a task that requires understanding subtle nuances of language, culture and context. The experiments showed that Marco-o1 was able to capture and translate these expressions more effectively than traditional translation tools. For instance, the model correctly translated a colloquial Chinese expression that literally means "This shoe offers a stepping-on-poop sensation" into the English equivalent, "This shoe has a comfortable sole." The model's reasoning chain shows how it evaluates different potential meanings and arrives at the correct translation. This paradigm can prove useful for tasks such as product design and strategy, which require deep, contextual understanding and lack well-defined benchmarks and metrics.

A new wave of reasoning models

Since the release of o1, AI labs have been racing to release reasoning models. Last week, Chinese AI lab DeepSeek released R1-Lite-Preview, its o1 competitor, which is currently only available through the company's online chat interface. R1-Lite-Preview reportedly beats o1 on several key benchmarks. The open-source community is also catching up with the private model market, releasing models and datasets that take advantage of inference-time scaling laws. The Alibaba team released Marco-o1 on Hugging Face along with a partial reasoning dataset that researchers can use to train their own reasoning models. Another recently released model is LLaVA-o1, developed by researchers from multiple universities in China, which brings the inference-time reasoning paradigm to open-source vision language models (VLMs).

The release of these models comes amid uncertainty about the future of model scaling laws. Various reports indicate that the returns on training larger models are diminishing and might be hitting a wall. But what's certain is that we are just beginning to explore the possibilities of inference-time scaling.
[5]
Alibaba's New AI Model Outperforms OpenAI's o1 In Specific Benchmarks, Now Available For Free Download
Alibaba Group Holding Ltd. (BABA) has introduced a new AI model, QwQ-32B-Preview, which is positioned as a competitor to Microsoft Corp. (MSFT)-backed OpenAI's o1 reasoning model.

What Happened: Developed by Alibaba's Qwen team, QwQ-32B-Preview boasts 32.5 billion parameters, enabling it to handle prompts up to 32,000 words. The model is one of the few available under a permissive license, allowing for download and use, the company stated in a blog post on Thursday. It reportedly outperforms OpenAI's o1-preview and o1-mini models on specific benchmarks, such as the AIME and MATH tests, which evaluate AI models' performance on logic puzzles and math problems.

Despite its strengths, QwQ-32B-Preview has limitations. Alibaba acknowledges potential issues such as unexpected language switches and underperformance in tasks requiring common-sense reasoning. The model's reasoning capabilities allow it to fact-check itself, reducing errors but increasing solution time. QwQ-32B-Preview is available on the AI development platform Hugging Face. However, only certain components are released, limiting full replication or insight into its workings. The model's responses align with Chinese regulatory standards, avoiding politically sensitive topics. The development was first spotted by TechCrunch.

Why It Matters: The introduction of Alibaba's QwQ-32B-Preview comes at a time when OpenAI is making significant strides in the AI sector. Last month, OpenAI's valuation soared to $157 billion following a successful funding round. Earlier this week, it was reported that SoftBank Group (SFTBF) has increased its stake in OpenAI through a $1.5 billion employee share buyout. OpenAI is also reportedly exploring the development of its own web browser to challenge Alphabet Inc. (GOOG, GOOGL) subsidiary Google's Chrome browser, following a push from the U.S. Department of Justice to divest it.

Price Action: Alibaba shares are up 0.27% to $86.82 in pre-market trading on Friday, following a 1.66% rise to $86.59 during Wednesday's regular session, according to data from Benzinga Pro.
Alibaba releases QwQ-32B-Preview, an open-source AI model that rivals OpenAI's o1 in reasoning capabilities. The model outperforms o1-preview on specific benchmarks and is available for commercial use.
Alibaba, the Chinese e-commerce giant, has entered the race for advanced AI models with the release of QwQ-32B-Preview, a new reasoning model that aims to compete with OpenAI's o1 series [1]. This development marks a significant step in the evolution of Large Reasoning Models (LRMs) and highlights the growing competition in the AI industry.
QwQ-32B-Preview boasts 32.5 billion parameters and can process prompts up to 32,000 words in length [2]. The model utilizes advanced techniques such as inference-time scaling, which allows it to generate more tokens and review its responses during problem-solving. This approach enables QwQ to excel in tasks requiring logical reasoning and planning, particularly in mathematics and coding.
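A generic way to picture that "generate and review" loop (illustrative only; QwQ's actual mechanism is undisclosed): draft an answer, ask the model to critique it, then revise. The chat function below is a hypothetical stand-in for any chat-completion call.

```python
def chat(prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion call; plug in a real
    LLM client here."""
    raise NotImplementedError

def answer_with_review(question: str, rounds: int = 2) -> str:
    """Draft an answer, then repeatedly critique and revise it, spending
    extra inference-time compute for a (hopefully) more accurate result."""
    draft = chat(f"Solve step by step:\n{question}")
    for _ in range(rounds):
        critique = chat(f"Find any mistakes in this solution:\n{draft}")
        draft = chat(
            f"Question: {question}\nDraft: {draft}\n"
            f"Critique: {critique}\nWrite a corrected solution."
        )
    return draft
```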
According to Alibaba's internal testing, QwQ-32B-Preview outperforms OpenAI's o1-preview and o1-mini on several key benchmarks:

- AIME and MATH, which evaluate mathematical problem-solving, where QwQ beats o1-preview [1]
- GPQA, a benchmark for scientific reasoning, where it outperforms o1-mini [1]
While QwQ trails o1 on the LiveCodeBench coding benchmark, it still surpasses other frontier models such as GPT-4o and Claude 3.5 Sonnet [1].
Unlike OpenAI's closed-source approach, Alibaba has made QwQ-32B-Preview available for download on Hugging Face under an Apache 2.0 license [3]. This permissive licensing allows for both personal and commercial use, potentially accelerating innovation in the field. However, it's worth noting that only certain components of the model have been released, limiting full replication or deep insight into its inner workings [2].
Despite its impressive capabilities, QwQ-32B-Preview is not without limitations. Alibaba acknowledges that the model may:

- mix languages or switch between them unexpectedly
- get stuck in circular reasoning loops
- underperform on tasks that require common-sense reasoning
These challenges highlight the ongoing complexities in developing advanced AI models [3].
The release of QwQ-32B-Preview comes amid growing interest in LRMs, which offer an alternative approach to traditional model scaling. As AI labs face diminishing returns on training larger models, inference-time scaling presents a promising avenue for improving AI capabilities [1].
Alibaba's entry into the LRM space with an open-source model could significantly impact the AI landscape. The availability of QwQ-32B-Preview for commercial use may foster innovation and competition, potentially accelerating advancements in AI reasoning capabilities [5].
As the AI industry continues to evolve, the development of models like QwQ-32B-Preview underscores the ongoing shift towards more sophisticated reasoning capabilities in artificial intelligence. This trend is likely to shape the future of AI applications across various domains, from scientific research to business strategy and product design [4].