Curated by THEOUTPOST
On Tue, 21 Jan, 12:02 AM UTC
21 Sources
[1]
Deepseek-R1: The Open Source AI That Outperforms OpenAI
Imagine tackling a problem so complex it feels like solving a 1,000-piece puzzle without the picture on the box. Whether you're a developer, a researcher, or a business leader, you've likely faced challenges that demand not just raw computational power but true reasoning -- breaking down tasks, exploring alternatives, and refining solutions over time. Enter Deepseek R1, a new open source AI model that promises to do just that, offering advanced reasoning capabilities at a fraction of the cost of its competitors. If you've ever wished for a tool that could think through problems the way you do -- but faster and more efficiently -- this might just be the breakthrough you've been waiting for. But what makes Deepseek R1 more than just another AI model in an already crowded field? It's not just about its impressive cost savings or its ability to rival top proprietary systems. Deepseek R1 is designed to go beyond surface-level tasks, excelling in areas like agent planning, image reasoning, and task decomposition. Whether you're trying to streamline operations, interpret complex data, or train smaller, specialized models for edge devices, this model has the potential to transform how we approach problem-solving in AI. AI Jason provides more insight into what makes Deepseek R1 stand out and how it could redefine the future of reasoning models.

Deepseek R1 distinguishes itself by combining innovative performance with cost efficiency, making it an attractive option for organizations and developers. Its ability to handle complex reasoning tasks, paired with its open source accessibility, positions it as a critical tool in the evolving AI landscape. By offering advanced capabilities at a reduced cost, Deepseek R1 ensures that high-performance AI is no longer limited to organizations with extensive budgets.

Deepseek R1 rivals top-tier proprietary models in performance while reducing costs by up to 96%. Its open source design allows developers to integrate reasoning tokens into their workflows without incurring significant expenses. This affordability makes it particularly appealing to organizations seeking high-performance AI solutions without the financial burden of traditional models. For example, businesses in sectors like finance or healthcare can now deploy advanced AI systems without the need for substantial upfront investment. This democratization of AI technology ensures that smaller organizations can also benefit from innovative tools, leveling the playing field in competitive industries.

At its core, Deepseek R1 excels in reasoning and problem-solving, employing advanced techniques such as task decomposition and iterative refinement to tackle intricate challenges. For instance, when analyzing intricate datasets, Deepseek R1 can break down problems into manageable steps, iteratively refining its results. This capability is particularly valuable in fields like scientific research, where precision and adaptability are critical. By allowing more nuanced problem-solving, Deepseek R1 enables users to address challenges that were previously beyond the reach of traditional AI systems.

Deepseek R1 supports knowledge distillation, allowing the creation of smaller, domain-specific models. These compact models can be deployed on edge devices, such as smartphones or IoT systems, making AI applications more accessible and efficient.
This feature is especially valuable in industries where resource constraints demand lightweight yet powerful solutions. By using reasoning data, these smaller models maintain high performance while operating efficiently on limited hardware. This ensures that even industries with limited computational resources can benefit from advanced AI capabilities.

Deepseek R1 enhances user interaction through effective prompting techniques. Unlike traditional models that require detailed instructions, it performs optimally with concise, one- or two-shot prompts. This simplicity not only improves usability but also enhances the model's ability to generate accurate and contextually relevant outputs. Encouraging extended reasoning within prompts further boosts its performance, particularly in tasks requiring nuanced understanding. For example, in customer service applications, Deepseek R1 can interpret brief user inputs and provide detailed, context-aware responses, improving both efficiency and user satisfaction.

Deepseek R1 is designed for advanced applications that demand sophisticated reasoning, and its capabilities make it a versatile tool across various industries. It is particularly effective in fields like healthcare, engineering, and logistics, where advanced problem-solving is essential. Its ability to adapt and refine its outputs ensures that it remains relevant even as challenges evolve.

While Deepseek R1 offers numerous advantages, it is not without trade-offs. Its advanced reasoning capabilities come with higher latency and computational costs compared to standard AI models. This makes it better suited for tasks requiring deep problem-solving rather than routine operations. Developers should carefully assess their application needs to determine whether Deepseek R1 aligns with their goals. For instance, while it excels in tasks requiring precision and adaptability, it may not be the ideal choice for applications where speed and simplicity are paramount.

Deepseek R1 represents a significant advancement in AI development. By using inference-stage computation, it overcomes the limitations of pre-training data, paving the way for more adaptable and intelligent systems. Experts predict that reasoning models like Deepseek R1 will drive substantial progress in AI by 2025, unlocking new possibilities across industries. For example, in education, reasoning models could enable personalized learning experiences, while in environmental science, they could assist in modeling complex ecosystems. As these models continue to evolve, their potential to address global challenges will only grow.

To fully harness the potential of Deepseek R1, organizations should deploy it deliberately, reserving it for tasks that genuinely require deep reasoning and monitoring its latency and cost in production. Following these practices maximizes the benefits of Deepseek R1 while mitigating its limitations, ensuring that the model is deployed effectively and delivers optimal results across a wide range of applications.
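To make the prompting advice above concrete, here is a minimal sketch of a concise, one-shot prompt sent to an R1-style reasoning model. It assumes DeepSeek's OpenAI-compatible API, the "deepseek-reasoner" model name, and a placeholder API key; check the provider's current documentation before relying on any of these details.

```python
# Minimal sketch: a concise, one-shot prompt to an R1-style reasoning model.
# The endpoint and model name below are assumptions based on DeepSeek's
# documented OpenAI-compatible API; verify them against current docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",               # placeholder
    base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
)

# One short example plus the actual task -- no elaborate instructions needed.
messages = [
    {"role": "user", "content": (
        "Example: 'Plan a 3-step rollout for a new login page.' -> "
        "1) dark-launch behind a flag, 2) A/B test on 5% of traffic, 3) full release.\n\n"
        "Now: plan a 3-step rollout for a new payments service, "
        "and think through the trade-offs before answering."
    )},
]

response = client.chat.completions.create(model="deepseek-reasoner", messages=messages)
print(response.choices[0].message.content)
```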
[2]
DeepSeek R1 : Open Source AI Competing with Big Tech Giants
DeepSeek has recently unveiled its DeepSeek R1 AI model family, marking a significant advancement in the field of artificial intelligence reasoning for open source AI. This release introduces open weights and a variety of distilled models, emphasizing improvements in reasoning and performance. Among these models is a 1.5 billion parameter version that demonstrates competitive performance against proprietary systems such as OpenAI's GPT-4 and Anthropic's Claude 3.5 in specific benchmarks. By adopting an open source framework under the MIT license, DeepSeek establishes a new standard for accessibility, allowing researchers and developers to experiment and innovate without restrictions. Whether you're working on a complex reasoning task, experimenting with AI on limited hardware, or simply exploring what's possible, DeepSeek R1 offers a glimpse into a future where advanced AI tools aren't locked behind closed doors.

The DeepSeek R1 models distinguish themselves through their exceptional capabilities in reasoning and problem-solving tasks. Their performance on benchmarks like GSM8K, which evaluates math and logic skills, highlights their ability to generate detailed chain-of-thought reasoning. This feature is crucial for addressing complex problems. Even the smaller, distilled versions -- some with as few as 1.5 billion parameters -- achieve results that rival much larger proprietary models. This makes DeepSeek R1 an attractive option for researchers and developers seeking high-performing AI solutions without the constraints of closed ecosystems.

The open source nature of DeepSeek R1 further enhances its appeal. By providing unrestricted access to the models, DeepSeek enables a global community of developers to explore, adapt, and apply these tools to a wide range of applications. This approach not only democratizes access to advanced AI but also fosters collaboration and innovation across diverse fields.

The success of DeepSeek R1 is rooted in its unique and carefully designed training pipeline. Unlike traditional methods that rely heavily on supervised fine-tuning, DeepSeek R1 employs reinforcement learning (RL) to enhance its reasoning capabilities. This innovative approach enables the models to generate logical, step-by-step explanations, making them particularly effective for tasks requiring detailed reasoning. The training process follows a multi-stage methodology, and this structured and iterative process ensures that the models achieve a balance between efficiency and advanced reasoning capabilities. By focusing on reinforcement learning without supervised fine-tuning, DeepSeek R1 demonstrates a novel approach to AI training that prioritizes logical reasoning and adaptability.

DeepSeek has prioritized accessibility by employing a rigorous model distillation process. This technique involves creating smaller, distilled versions of the flagship R1 model using carefully curated datasets. These distilled models retain the reasoning strength of the original while eliminating the need for direct reinforcement learning application. As a result, they are optimized for deployment on consumer hardware or in environments with limited computational resources. The availability of these lightweight models ensures that innovative AI technology is accessible to a broader audience.
Developers and researchers with limited hardware can use these models for a variety of applications, from educational tools to technical problem-solving. The distillation process exemplifies DeepSeek's commitment to making advanced AI tools available to as many users as possible, regardless of their technical or financial constraints.

The DeepSeek R1 family offers a range of models, from 1.5 billion to 671 billion parameters. However, even the largest model operates with 37 billion active parameters at any given time, striking a balance between scale and computational efficiency. For developers with limited resources, smaller models are available in quantized versions, allowing local deployment on consumer-grade devices or platforms like Google Colab. This flexibility ensures that experimentation and development are not hindered by hardware limitations.

DeepSeek R1 is particularly well-suited for tasks that require reasoning and problem-solving. Despite these strengths, it is less effective for tasks requiring highly structured outputs, such as JSON generation, or for creative writing. Additionally, the models are not yet optimized for seamless integration into workflows that demand structured outputs or tool-based interactions. However, the open source nature of DeepSeek R1 allows developers to customize and adapt the models to address these limitations, further expanding their utility.

By releasing the DeepSeek R1 models under the MIT license, DeepSeek has taken a bold step toward democratizing access to advanced AI technology. This open source framework allows developers to experiment with the models locally using tools like the Transformers library or explore their capabilities through DeepSeek's chat interface. The lightweight nature of the distilled models ensures that even users with limited hardware can engage with these advanced tools. This open approach fosters collaboration and innovation, allowing a diverse range of users to contribute to the development and application of AI technology. By prioritizing accessibility and transparency, DeepSeek has created a platform that encourages experimentation and drives progress in the field of artificial intelligence.

The release of DeepSeek R1 underscores the growing competitiveness of open source AI. By achieving performance levels comparable to proprietary systems, DeepSeek demonstrates that community-driven innovation can rival and even surpass closed ecosystems. The accompanying technical paper provides detailed insights into the training methodologies and benchmarks, offering valuable resources for researchers and developers. DeepSeek R1 represents a significant milestone in the evolution of open source AI. Its focus on reasoning, accessibility, and performance challenges the dominance of proprietary models while empowering developers and researchers worldwide. As AI continues to evolve, the release of DeepSeek R1 highlights the remarkable potential of open source innovation in shaping the future of technology.
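As a rough illustration of the local, Transformers-based experimentation described above, the sketch below loads one of the small distilled checkpoints and generates a reasoning-style answer. The repo id, dtype, and generation settings are assumptions; swap in whichever distilled model and hardware configuration you actually use.

```python
# Minimal sketch: running a distilled R1 model locally with Hugging Face
# Transformers. The repo id below is an assumption -- substitute whichever
# distilled checkpoint you actually pull from the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # small enough for a single consumer GPU
    device_map="auto",           # requires the accelerate package
)

prompt = "A train leaves at 9:00 and travels 120 km at 80 km/h. When does it arrive?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```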
[3]
DeepSeek-R1 - New Open Source AI Model with Human-Like Reasoning Performance
DeepSeek-R1, the latest open source reasoning AI model, represents a significant advancement in artificial intelligence. Released under the permissive MIT license, it is designed to encourage commercial use, fine-tuning, and community-driven innovation. By integrating reinforcement learning (RL) and adhering to a transparent development philosophy, DeepSeek-R1 provides a compelling alternative to proprietary systems like OpenAI's GPT-4. Its open source nature and technical sophistication make it a standout in the rapidly evolving AI landscape. Built on the principles of open source development and powered by reinforcement learning, this AI model is designed to be as versatile as it is powerful. Whether you're a researcher looking to push the boundaries of AI, a developer building the next big thing, or an organization seeking smarter solutions, DeepSeek-R1 promises to deliver. Check out the overview video below created by Prompt Engineering to learn more about this new reasoning AI model.

DeepSeek-R1's open source framework is a defining feature that distinguishes it from many other AI models. The model weights are freely accessible on platforms such as Hugging Face, allowing developers and researchers to experiment, adapt, and deploy the model without restrictive licensing. The use of the MIT license ensures unparalleled flexibility for both personal and commercial applications, fostering an environment of collaboration and innovation within the AI community. In addition to its open availability, DeepSeek-R1 is accessible through multiple channels, including chat.deepseek.com and an API with no rate limits. This unrestricted access broadens its appeal, making it a practical solution for diverse use cases, ranging from academic research to enterprise-level applications. By removing barriers to entry, DeepSeek-R1 enables users to explore its capabilities without limitations, encouraging widespread adoption and experimentation.

DeepSeek-R1 delivers exceptional performance in reasoning, coding, and mathematics, rivaling industry leaders like GPT-4. Its smaller, distilled versions, such as the Qwen 1.5B distill, demonstrate remarkable efficiency by outperforming larger models on key benchmarks. This achievement highlights the model's optimized architecture, which balances performance with resource efficiency. These compact versions are particularly well-suited for deployment on edge devices and GPUs with 24GB of VRAM, ensuring high performance even in resource-constrained environments. This adaptability makes DeepSeek-R1 a versatile solution for a wide range of scenarios, including large-scale enterprise deployments and edge computing applications. Its ability to deliver robust results while minimizing resource demands underscores its practical value for developers and organizations alike.

DeepSeek-R1's reliance on reinforcement learning (RL) is a cornerstone of its development. Unlike traditional supervised fine-tuning, which depends heavily on labeled datasets, RL allows the model to learn from its own experiences. This iterative process enables the model to refine its reasoning capabilities over time, resulting in a more nuanced and human-like approach to problem-solving. The use of RL not only enhances the model's ability to tackle complex tasks but also reduces the need for extensive labeled data, making the training process more efficient.
A detailed technical report accompanying the release provides further insights into the innovative methodologies employed during training, offering a valuable resource for researchers and developers interested in understanding the model's inner workings.

One of DeepSeek-R1's most remarkable features is its ability to emulate human-like reasoning. Through the application of RL, the model has developed internal mechanisms that mimic cognitive processes, allowing it to approach intricate reasoning tasks with precision and accuracy. This capability has significant implications for a variety of fields and makes DeepSeek-R1 an invaluable tool for developers, researchers, and businesses seeking to use advanced reasoning AI in their projects. Its ability to adapt to diverse tasks further enhances its utility, ensuring that it can meet the needs of a wide range of users.

DeepSeek-R1 is designed with adaptability in mind, making it suitable for a variety of applications. Its outputs can be distilled, fine-tuned, and customized to meet specific requirements, whether for enterprise solutions, academic research, or educational platforms. The absence of rate limits on API usage, combined with competitive pricing, enhances its appeal for commercial deployment, providing organizations with a cost-effective and scalable AI solution. For businesses, DeepSeek-R1 offers a robust foundation for developing AI-driven tools, optimizing workflows, and creating innovative products. Its flexibility and performance make it an ideal choice for organizations looking to integrate advanced reasoning capabilities into their operations.

The AI community has responded positively to DeepSeek-R1, praising its transparency, innovative use of reinforcement learning, and open source accessibility. This release is widely regarded as a significant step forward in the development of reasoning AI, with RL emerging as a promising approach for future advancements. Looking ahead, there is considerable potential for further performance improvements, particularly with the development of larger models featuring parameter counts ranging from 7 to 70 billion. These advancements could enhance DeepSeek-R1's capabilities even further, solidifying its position as a leader in the field of reasoning AI. As the AI landscape continues to evolve, DeepSeek-R1 stands as a testament to the power of open source innovation. By combining innovative techniques with a commitment to accessibility and collaboration, it sets a new standard for what reasoning AI can achieve.
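The point above about distilling and fine-tuning R1's outputs can be sketched in a few lines: generate reasoning traces from the large model, then run ordinary supervised fine-tuning on a small student. This is only an illustrative outline, assuming an OpenAI-compatible DeepSeek endpoint, a hypothetical student checkpoint, and TRL's SFTTrainer (whose argument names vary by version).

```python
# Illustrative sketch of distilling R1-style reasoning into a small student:
# (1) collect reasoning traces from the large "teacher", (2) fine-tune a small
# "student" on them with supervised training. Model ids, the endpoint, and the
# TRL arguments are assumptions; adjust to your setup and library versions.
from openai import OpenAI
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# --- 1) Generate teacher traces (assumed OpenAI-compatible endpoint) ---------
teacher = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.deepseek.com")
prompts = ["If 3x + 5 = 20, what is x?", "Is 2027 a prime number?"]

rows = []
for p in prompts:
    reply = teacher.chat.completions.create(
        model="deepseek-reasoner",           # assumed teacher model name
        messages=[{"role": "user", "content": p}],
    )
    rows.append({"text": f"Question: {p}\nAnswer: {reply.choices[0].message.content}"})

# --- 2) Supervised fine-tuning of a small student on those traces ------------
train_ds = Dataset.from_list(rows)               # "text" column is TRL's default field
trainer = SFTTrainer(
    model="Qwen/Qwen2.5-1.5B",                   # hypothetical small student checkpoint
    train_dataset=train_ds,
    args=SFTConfig(output_dir="r1-distill-student", max_steps=100),
)
trainer.train()
```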
[4]
Deepseek-R1 vs ChatGPT-4 : Why Deepseek-R1 is the Future of Affordable AI Innovation
Deepseek R1 has emerged as a prominent open source language model, excelling in areas such as coding, reasoning, and mathematical problem-solving. It directly competes with proprietary models like OpenAI o1 and Sonnet 3.5, often outperforming them in specific domains while offering substantial cost benefits. For developers, researchers, and organizations seeking adaptable and transparent AI solutions, Deepseek R1 presents a highly flexible and compelling option. The new Deepseek R1 reasoning AI model is a strong choice for anyone tackling complex coding, reasoning, or problem-solving tasks. Whether you're debugging a tricky piece of code, navigating ethical dilemmas, or working through intricate mathematical problems, this model delivers results that rival, and sometimes even surpass, its proprietary counterparts like o1. But what truly sets it apart is its open source nature, giving users the ability to adapt and tailor it to their unique needs. In this Deepseek-R1 vs ChatGPT-4 performance overview, Prompt Engineering explores how Deepseek R1 is redefining what's possible in AI.

Deepseek R1 distinguishes itself by delivering performance comparable to, and sometimes exceeding, that of proprietary models. It achieves an impressive 97% success rate in coding tasks, surpassing o1 in this critical area. While it underperforms slightly in benchmarks like Aider and Polyglot -- where precision in highly nuanced scenarios is essential -- its overall capabilities position it as a strong contender in the competitive AI landscape. This balance of strengths and limitations underscores its versatility and practical value for a wide range of applications. Deepseek R1 offers a robust suite of features designed to meet the needs of both technical and semi-technical users, making it a versatile tool for a variety of use cases, from software development to academic research, while also offering users the freedom to adapt the model to their specific requirements.

Deepseek R1's coding capabilities extend far beyond basic code generation. It can produce fully functional scripts for tasks such as API integration, while simultaneously identifying and resolving potential errors. For instance, a developer working on a web application could rely on the model to generate backend code and receive detailed debugging suggestions in real time. This dual functionality not only reduces development time but also enhances overall productivity. By automating repetitive tasks and providing actionable insights, Deepseek R1 enables developers to focus on more complex and creative aspects of their projects.

One of Deepseek R1's most impressive strengths lies in its reasoning capabilities. Using a structured, step-by-step approach, the model excels at solving logical puzzles and addressing ethical dilemmas. For example, it can analyze variations of the trolley problem with a nuanced understanding of moral trade-offs, offering insights that reflect human-like reasoning. However, it is worth noting that the model occasionally struggles with implicit conditions in problem statements, which can affect its accuracy in highly specific or ambiguous scenarios. Despite these occasional shortcomings, its ability to handle complex reasoning tasks makes it a valuable tool for users in fields such as philosophy, law, and decision-making analysis.
Deepseek R1 incorporates censorship mechanisms to manage sensitive or controversial topics, aligning with industry standards for responsible AI use. However, its open source nature provides users with the option to modify or disable these restrictions. This flexibility is particularly appealing for advanced users who require greater control over their AI systems. While this feature enhances the model's adaptability, it may pose challenges for users who are less familiar with customizing AI architectures. Nonetheless, the ability to tailor censorship settings underscores Deepseek R1's commitment to transparency and user empowerment.

Deepseek R1 offers several notable advantages that make it a standout choice in the AI landscape. However, the model is not without its limitations. It occasionally relies too heavily on patterns from its training data, leading to errors in unique or nuanced scenarios. Additionally, while its censorship mechanisms are modifiable, users unfamiliar with AI customization may find this feature challenging to navigate. Despite these drawbacks, Deepseek R1's overall utility and adaptability remain high, making it a reliable choice for a wide range of applications.

Deepseek R1's open source framework is a defining feature that sets it apart from many high-performing language models. Its fully accessible weights allow users to conduct independent testing, tailor the model to specific needs, and deploy it on a variety of hardware configurations. This level of transparency and flexibility is rare in the AI industry, where proprietary models often limit user control. For developers and researchers, the open source nature of Deepseek R1 not only reduces costs but also fosters innovation by allowing experimentation and customization.

The development of distilled versions of Deepseek R1, ranging from 32B to 70B parameters, is already underway. These smaller models aim to maintain the performance of the original while reducing hardware requirements, potentially making the technology accessible to a broader audience. This focus on scalability and efficiency highlights the model's adaptability and its potential to meet the evolving needs of users. As the AI landscape continues to grow, Deepseek R1's commitment to innovation ensures its relevance and utility in the years to come.
[5]
Chinese Open-Source AI DeepSeek R1 Matches OpenAI's o1 at 98% Lower Cost - Decrypt
Chinese AI researchers have achieved what many thought was light years away: a free, open-source AI model that can match or exceed the performance of OpenAI's most advanced reasoning systems. What makes this even more remarkable was how they did it: by letting the AI teach itself through trial and error, similar to how humans learn. "DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities," the research paper reads.

"Reinforcement learning" is a method in which a model is rewarded for making good decisions and punished for making bad ones, without knowing which one is which. After a series of decisions, it learns to follow a path that was reinforced by those results. Initially, during the supervised fine-tuning phase, a group of humans tells the model the desired output they want, giving it context to know what's good and what isn't. This leads to the next phase, reinforcement learning, in which a model provides different outputs and humans rank the best ones. The process is repeated over and over until the model knows how to consistently provide satisfactory results.

DeepSeek R1 marks a change of course in AI development because humans have a minimal part in the training. Unlike other models that are trained on vast amounts of supervised data, DeepSeek R1 learns primarily through reinforcement learning -- essentially figuring things out by experimenting and getting feedback on what works. "Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and interesting reasoning behaviors," the researchers said in their paper. The model even developed sophisticated capabilities like self-verification and reflection without being explicitly programmed to do so. As the model went through its training process, it naturally learned to allocate more "thinking time" to complex problems and developed the ability to catch its own mistakes. The researchers highlighted an "a-ha moment" where the model learned to reevaluate its initial approaches to problems -- something it wasn't explicitly programmed to do.

The performance numbers are impressive. On the AIME 2024 mathematics benchmark, DeepSeek R1 achieved a 79.8% success rate, surpassing OpenAI's o1 reasoning model. On standardized coding tests, it demonstrated "expert level" performance, achieving a 2,029 Elo rating on Codeforces and outperforming 96.3% of human competitors. But what really sets DeepSeek R1 apart is its cost -- or lack thereof. The model runs queries at just $0.14 per million tokens compared to OpenAI's $7.50, making it 98% cheaper. And unlike proprietary models, DeepSeek R1's code and training methods are completely open source under the MIT license, meaning anyone can grab the model, use it and modify it without restrictions.

The release of DeepSeek R1 has triggered an avalanche of responses from AI industry leaders, with many highlighting the significance of a fully open-source model matching proprietary leaders in reasoning capabilities. Nvidia's top researcher Dr. Jim Fan delivered perhaps the most pointed commentary, drawing a direct parallel to OpenAI's original mission. "We are living in a timeline where a non-U.S. company is keeping the original mission of OpenAI alive -- truly open frontier research that empowers all," Fan noted, praising DeepSeek's unprecedented transparency.
Fan called out the significance of DeepSeek's reinforcement learning approach: "They are perhaps the first [open source software] project that shows major sustained growth of [a reinforcement learning] flywheel." He also lauded DeepSeek's straightforward sharing of "raw algorithms and matplotlib learning curves" versus the hype-driven announcements more common in the industry.

Apple researcher Awni Hannun mentioned that people can run a quantized version of the model locally on their Macs. Traditionally, Apple devices have been weak at AI due to their lack of compatibility with Nvidia's CUDA software, but that appears to be changing. For example, AI researcher Alex Cheema was able to run the full model after harnessing the power of 8 Apple Mac Mini units running together -- which is still cheaper than the servers required to run the most powerful AI models currently available. That said, users can run lighter versions of DeepSeek R1 on their Macs with good levels of accuracy and efficiency.

However, the most interesting reactions came after pondering how close the open source industry is to the proprietary models, and the potential impact this development may have for OpenAI as the leader in the field of reasoning AI models. Stability AI's founder Emad Mostaque took a provocative stance, suggesting the release puts pressure on better-funded competitors: "Can you imagine being a frontier lab that's raised like a billion dollars and now you can't release your latest model because it can't beat DeepSeek?" Following the same reasoning but with a more serious argument, tech entrepreneur Arnaud Bertrand explained that the emergence of a competitive open source model may be potentially harmful to OpenAI, since that makes its models less attractive to power users who might otherwise be willing to spend a lot of money per task. "It's essentially as if someone had released a mobile on par with the iPhone, but was selling it for $30 instead of $1000. It's this dramatic." Perplexity AI's CEO Aravind Srinivas framed the release in terms of its market impact: "DeepSeek has largely replicated o1 mini and has open-sourced it." In a follow-up observation, he noted the rapid pace of progress: "It's kind of wild to see reasoning get commoditized this fast." Srinivas said his team will work to bring DeepSeek R1's reasoning capabilities to Perplexity Pro in the future.

We did a few quick tests to compare the model against OpenAI o1, starting with a well-known question for these kinds of benchmarks: "How many Rs are in the word Strawberry?" Typically, models struggle to provide the correct answer because they don't work with words -- they work with tokens, digital representations of concepts. GPT-4o failed, OpenAI o1 succeeded -- and so did DeepSeek R1. However, o1 was very concise in the reasoning process, whereas DeepSeek applied a heavy reasoning output. Interestingly enough, DeepSeek's answer felt more human. During the reasoning process, the model appeared to talk to itself, using slang and words that are uncommon on machines but more widely used by humans. For example, while reflecting on the number of Rs, the model said to itself, "Okay, let me figure (this) out." It also used "Hmmm," while debating, and even said things like "Wait, no. Wait, let's break it down." The model eventually reached the correct result, but spent a lot of time reasoning and spitting out tokens.
Under typical pricing conditions, this would be a disadvantage; but given the current state of things, it can output way more tokens than OpenAI o1 and still be competitive.

Another test to see how good the models were at reasoning was to play "spies" and identify the perpetrators in a short story. We chose a sample from the BIG-bench dataset on GitHub. (The full story is available here and involves a school trip to a remote, snowy location, where students and teachers face a series of strange disappearances and the model must find out who the stalker was.) Both models thought about it for over one minute. However, ChatGPT crashed before solving the mystery, while DeepSeek gave the correct answer after "thinking" about it for 106 seconds. The thought process was correct, and the model was even capable of correcting itself after arriving at incorrect (but still logical enough) conclusions.

The accessibility of smaller versions particularly impressed researchers. For context, a 1.5B model is so small, you could theoretically run it locally on a powerful smartphone. And even a quantized version of Deepseek R1 that small was able to stand face-to-face against GPT-4o and Claude 3.5 Sonnet, according to Hugging Face's data scientist Vaibhav Srivastav. Just a week ago, UC Berkeley's NovaSky team released Sky-T1, a reasoning model also capable of competing against OpenAI o1-preview.

Those interested in running the model locally can download it from GitHub or Hugging Face. Users can download it, run it, remove the censorship, or adapt it to different areas of expertise by fine-tuning it. Or if you want to try the model online, go to Hugging Chat or DeepSeek's Web Portal, which is a good alternative to ChatGPT -- especially since it's free, open source, and the only AI chatbot interface with a model built for reasoning besides ChatGPT.
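The Strawberry test above is hard for language models precisely because they see tokens rather than letters. A quick way to see this for yourself, using OpenAI's tiktoken tokenizer purely as a stand-in (DeepSeek's own vocabulary will split words differently):

```python
# Why "how many Rs in strawberry" is hard: models see tokens, not letters.
# tiktoken is a stand-in here; DeepSeek uses its own, different vocabulary.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
word = "strawberry"

token_ids = enc.encode(word)
pieces = [enc.decode([t]) for t in token_ids]
print(pieces)           # e.g. ['str', 'aw', 'berry'] -- chunks, not individual letters
print(word.count("r"))  # 3: trivial in code, awkward when you only see token ids
```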
[6]
Open-source DeepSeek-R1 uses pure reinforcement learning to match OpenAI o1 -- at 95% less cost
Chinese AI startup DeepSeek, known for challenging leading AI vendors with open-source technologies, just dropped another bombshell: a new open reasoning LLM called DeepSeek-R1. Based on the recently introduced DeepSeek V3 mixture-of-experts model, DeepSeek-R1 matches the performance of o1, OpenAI's frontier reasoning LLM, across math, coding and reasoning tasks. The best part? It does this at a much more tempting cost, proving to be 90-95% more affordable than the latter.

The release marks a major leap forward in the open-source arena. It showcases that open models are further closing the gap with closed commercial models in the race to artificial general intelligence (AGI). To show the prowess of its work, DeepSeek also used R1 to distill six Llama and Qwen models, taking their performance to new levels. In one case, the distilled version of Qwen-1.5B outperformed much bigger models, GPT-4o and Claude 3.5 Sonnet, in select math benchmarks.

The focus is sharpening on artificial general intelligence (AGI), a level of AI that can perform intellectual tasks like humans. A lot of teams are doubling down on enhancing models' reasoning capabilities. OpenAI made the first notable move in the domain with its o1 model, which uses a chain-of-thought reasoning process to tackle a problem. Through RL (reinforcement learning, or reward-driven optimization), o1 learns to hone its chain of thought and refine the strategies it uses -- ultimately learning to recognize and correct its mistakes, or try new approaches when the current ones aren't working. Now, continuing the work in this direction, DeepSeek has released DeepSeek-R1, which uses a combination of RL and supervised fine-tuning to handle complex reasoning tasks and match the performance of o1.

When tested, DeepSeek-R1 scored 79.8% on AIME 2024 mathematics tests and 97.3% on MATH-500. It also achieved a 2,029 rating on Codeforces -- better than 96.3% of human programmers. In contrast, o1-1217 scored 79.2%, 96.4% and 96.6% respectively on these benchmarks. It also demonstrated strong general knowledge, with 90.8% accuracy on MMLU, just behind o1's 91.8%.

The training pipeline

DeepSeek-R1's reasoning performance marks a big win for the Chinese startup in the US-dominated AI space, especially as the entire work is open-source, including how the company trained the whole thing. However, the work isn't as straightforward as it sounds. According to the paper describing the research, DeepSeek-R1 was developed as an enhanced version of DeepSeek-R1-Zero -- a breakthrough model trained solely from reinforcement learning. The company first used DeepSeek-V3-base as the base model, developing its reasoning capabilities without employing supervised data, essentially focusing only on its self-evolution through a pure RL-based trial-and-error process. Developed intrinsically from the work, this ability ensures the model can solve increasingly complex reasoning tasks by leveraging extended test-time computation to explore and refine its thought processes in greater depth. "During training, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors," the researchers note in the paper. "After thousands of RL steps, DeepSeek-R1-Zero exhibits super performance on reasoning benchmarks.
For instance, the pass@1 score on AIME 2024 increases from 15.6% to 71.0%, and with majority voting, the score further improves to 86.7%, matching the performance of OpenAI-o1-0912." However, despite showing improved performance, including behaviors like reflection and exploration of alternatives, the initial model did show some problems, including poor readability and language mixing. To fix this, the company built on the work done for R1-Zero, using a multi-stage approach combining both supervised learning and reinforcement learning, and thus came up with the enhanced R1 model.

"Specifically, we begin by collecting thousands of cold-start data to fine-tune the DeepSeek-V3-Base model," the researchers explained. "Following this, we perform reasoning-oriented RL like DeepSeek-R1-Zero. Upon nearing convergence in the RL process, we create new SFT data through rejection sampling on the RL checkpoint, combined with supervised data from DeepSeek-V3 in domains such as writing, factual QA, and self-cognition, and then retrain the DeepSeek-V3-Base model. After fine-tuning with the new data, the checkpoint undergoes an additional RL process, taking into account prompts from all scenarios. After these steps, we obtained a checkpoint referred to as DeepSeek-R1, which achieves performance on par with OpenAI-o1-1217."

Far more affordable than o1

In addition to enhanced performance that nearly matches OpenAI's o1 across benchmarks, the new DeepSeek-R1 is also very affordable. Specifically, where OpenAI o1 costs $15 per million input tokens and $60 per million output tokens, DeepSeek Reasoner, which is based on the R1 model, costs $0.55 per million input and $2.19 per million output tokens. The model can be tested as "DeepThink" on the DeepSeek chat platform, which is similar to ChatGPT. Interested users can access the model weights and code repository via Hugging Face, under an MIT license, or can go with the API for direct integration.
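Using the per-million-token prices quoted above, the cost gap is easy to quantify. The following back-of-the-envelope comparison uses a made-up monthly workload purely for illustration:

```python
# Back-of-the-envelope cost comparison using the per-million-token prices
# quoted above (OpenAI o1: $15 in / $60 out; DeepSeek Reasoner: $0.55 in / $2.19 out).
# The workload numbers are invented for illustration only.
PRICES = {  # (input $/1M tokens, output $/1M tokens)
    "openai-o1": (15.00, 60.00),
    "deepseek-reasoner": (0.55, 2.19),
}

input_tokens, output_tokens = 5_000_000, 20_000_000  # hypothetical monthly usage

for name, (p_in, p_out) in PRICES.items():
    cost = input_tokens / 1e6 * p_in + output_tokens / 1e6 * p_out
    print(f"{name}: ${cost:,.2f} per month")

# -> openai-o1: $1,275.00 per month
# -> deepseek-reasoner: $46.55 per month (roughly 96% cheaper on this mix)
```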
[7]
DeepSeek R1's bold bet on reinforcement learning: How it outpaced OpenAI at 3% of the cost
DeepSeek R1's Monday release has sent shockwaves through the AI community, disrupting assumptions about what's required to achieve cutting-edge AI performance. Matching OpenAI's o1 at just 3%-5% of the cost, this open-source model has not only captivated developers but also challenges enterprises to rethink their AI strategies. The model has rocketed to become the top-trending model for downloads on Hugging Face (109,000, as of this writing) as developers rush to try it out and seek to understand what it means for their AI development. Users are commenting that DeepSeek's accompanying search feature (which you can find at DeepSeek's site) is now superior to competitors like OpenAI and Perplexity, and is only rivaled by Google's Gemini Deep Research.

The implications for enterprise AI strategies are profound: With reduced costs and open access, enterprises now have an alternative to costly proprietary models like OpenAI's. DeepSeek's release could democratize access to cutting-edge AI capabilities, enabling smaller organizations to compete effectively in the AI arms race. This story focuses on exactly how DeepSeek managed this feat, and what it means for the vast number of users of AI models. For enterprises developing AI-driven solutions, DeepSeek's breakthrough challenges assumptions of OpenAI's dominance -- and offers a blueprint for cost-efficient innovation. It's the "how" DeepSeek did what it did that should be the most educational here.

DeepSeek's breakthrough: Moving to pure reinforcement learning

In November, DeepSeek made headlines with its announcement that it had achieved performance surpassing OpenAI's o1, but at the time it only offered a limited R1-lite-preview model. With Monday's full release of R1 and the accompanying technical paper, the company revealed a surprising innovation: a deliberate departure from the conventional supervised fine-tuning (SFT) process widely used in training large language models (LLMs). SFT, a standard step in AI development, involves training models on curated datasets to teach step-by-step reasoning, often referred to as chain-of-thought (CoT). It is considered essential for improving reasoning capabilities. However, DeepSeek challenged this assumption by skipping SFT entirely, opting instead to rely on reinforcement learning (RL) to train the model. This bold move forced DeepSeek-R1 to develop independent reasoning abilities, avoiding the brittleness often introduced by prescriptive datasets. While some flaws emerged - leading the team to reintroduce a limited amount of SFT during the final stages of building the model - the results confirmed the fundamental breakthrough: reinforcement learning alone could drive substantial performance gains.

The company got much of the way using open source - a conventional and unsurprising way

First, some background on how DeepSeek got to where it did. DeepSeek, a 2023 spin-off from Chinese hedge-fund High-Flyer Quant, began by developing AI models for its proprietary chatbot before releasing them for public use. Little is known about the company's exact approach, but it quickly open sourced its models, and it's extremely likely that the company built upon the open projects produced by Meta, for example the Llama model, and ML library Pytorch. To train its models, High-Flyer Quant secured over 10,000 Nvidia GPUs before U.S.
export restrictions, and reportedly expanded to 50,000 GPUs through alternative supply routes, despite trade barriers. This pales compared to leading AI labs like OpenAI, Google, and Anthropic, which operate with more than 500,000 GPUs each. DeepSeek's ability to achieve competitive results with limited resources highlights how ingenuity and resourcefulness can challenge the high-cost paradigm of training state-of-the-art LLMs.

Despite speculation, DeepSeek's full budget is unknown

DeepSeek reportedly trained its base model -- called V3 -- on a $5.58 million budget over two months, according to Nvidia engineer Jim Fan. While the company hasn't divulged the exact training data it used (side note: critics say this means DeepSeek isn't truly open-source), modern techniques make training on web and open datasets increasingly accessible. Estimating the total cost of training DeepSeek-R1 is challenging. While running 50,000 GPUs suggests significant expenditures (potentially hundreds of millions of dollars), precise figures remain speculative. What's clear, though, is that DeepSeek has been very innovative from the get-go. Last year, reports emerged about some initial innovations it was making, around things like Mixture of Experts and Multi-Head Latent Attention.

How DeepSeek-R1 got to the "aha moment"

The journey to DeepSeek-R1's final iteration began with an intermediate model, DeepSeek-R1-Zero, which was trained using pure reinforcement learning. By relying solely on RL, DeepSeek incentivized this model to think independently, rewarding both correct answers and the logical processes used to arrive at them. This approach led to an unexpected phenomenon: The model began allocating additional processing time to more complex problems, demonstrating an ability to prioritize tasks based on their difficulty. DeepSeek's researchers described this as an "aha moment," where the model itself identified and articulated novel solutions to challenging problems. This milestone underscored the power of reinforcement learning to unlock advanced reasoning capabilities without relying on traditional training methods like SFT. The researchers conclude: "It underscores the power and beauty of reinforcement learning: rather than explicitly teaching the model on how to solve a problem, we simply provide it with the right incentives, and it autonomously develops advanced problem-solving strategies."

More than RL

However, it's true that the model needed more than just RL. The paper goes on to talk about how despite the RL creating unexpected and powerful reasoning behaviors, this intermediate model DeepSeek-R1-Zero did face some challenges, including poor readability, and language mixing (starting in Chinese and switching over to English, for example). So only then did the team decide to create a new model, which would become the final DeepSeek-R1 model. This model, again based on the V3 base model, was first injected with limited SFT - focused on a "small amount of long CoT data," or what was called cold-start data, to fix some of the challenges. After that, it was put through the same reinforcement learning process as R1-Zero. The paper then talks about how R1 went through some final rounds of fine-tuning.

The ramifications

One question is why there has been so much surprise by the release. It's not like open source models are new. Open source models have a huge logic and momentum behind them.
Their free cost and malleability are why we reported recently that these models are going to win in the enterprise. Meta's open-weights model Llama 3, for example, exploded in popularity last year, as it was fine-tuned by developers wanting their own custom models. Similarly, now DeepSeek-R1 is already being used to distill its reasoning into an array of other, much smaller models - the difference being that DeepSeek offers industry-leading performance. This includes running tiny versions of the model on mobile phones, for example.

DeepSeek-R1 not only performs better than the leading open source alternative, Llama 3; it also shows its entire chain of thought transparently in its answers. Meta's Llama hasn't been instructed to do this as a default; it takes aggressive prompting to get Llama to do this. The transparency has also provided a PR black eye to OpenAI, which has so far hidden its chains of thought from users, citing competitive reasons and a desire not to confuse users when a model gets something wrong. Transparency allows developers to pinpoint and address errors in a model's reasoning, streamlining customizations to meet enterprise requirements more effectively.

For enterprise decision-makers, DeepSeek's success underscores a broader shift in the AI landscape: leaner, more efficient development practices are increasingly viable. Organizations may need to reevaluate their partnerships with proprietary AI providers, considering whether the high costs associated with these services are justified when open-source alternatives can deliver comparable, if not superior, results.

To be sure, no massive lead

While DeepSeek's innovation is groundbreaking, by no means has it established a commanding market lead. Because it published its research, other model companies will learn from it, and adapt. Meta and Mistral, the French open source model company, may be a beat behind, but it will probably only be a few months before they catch up. As Meta's lead researcher Yann LeCun put it: "The idea is that everyone profits from everyone else's ideas. No one 'outpaces' anyone and no country 'loses' to another. No one has a monopoly on good ideas. Everyone's learning from everyone else." So it's execution that matters. Ultimately, it's the consumers, startups and other users who will win the most, because DeepSeek's offerings will continue to drive the price of using these models near zero (again, aside from the cost of running models at inference). This rapid commoditization could pose challenges - indeed, massive pain - for leading AI providers that have invested heavily in proprietary infrastructure. As many commentators have put it, including Chamath Palihapitiya, an investor and former executive at Meta, this could mean that years of OpEx and CapEx by OpenAI and others will be wasted.

There is substantial commentary about whether it is ethical to use the DeepSeek-R1 model because of the biases instilled in it by Chinese laws, for example that it shouldn't answer questions about the Chinese government's brutal crackdown at Tiananmen Square. Despite ethical concerns around biases, many developers view these biases as infrequent edge cases in real-world applications - and they can be mitigated through fine-tuning. Moreover, they point to different, but analogous biases that are held by models from OpenAI and other companies. Meta's Llama has emerged as a popular open model despite its data sets not being made public, and despite hidden biases, and lawsuits being filed against it as a result.
Questions abound around the ROI of big investments by OpenAI

This all raises big questions about the investment plans pursued by OpenAI, Microsoft and others. OpenAI's $500 billion Stargate project reflects its commitment to building massive data centers to power its advanced models. Backed by partners like Oracle and SoftBank, this strategy is premised on the belief that achieving artificial general intelligence (AGI) requires unprecedented compute resources. However, DeepSeek's demonstration of a high-performing model at a fraction of the cost challenges the sustainability of this approach, raising doubts about OpenAI's ability to deliver returns on such a monumental investment. Entrepreneur and commentator Arnaud Bertrand captured this dynamic, contrasting China's frugal, decentralized innovation with the U.S. reliance on centralized, resource-intensive infrastructure: "It's about the world realizing that China has caught up -- and in some areas overtaken -- the U.S. in tech and innovation, despite efforts to prevent just that." Indeed, yesterday another Chinese company, ByteDance, announced Doubao-1.5-pro, which includes a "Deep Thinking" mode that surpasses OpenAI's o1 on the AIME benchmark.

Want to dive deeper into how DeepSeek-R1 is reshaping AI development? Check out our in-depth discussion on YouTube, where I explore this breakthrough with ML developer Sam Witteveen. Together, we break down the technical details, implications for enterprises, and what this means for the future of AI.
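Stepping back to the pure reinforcement learning recipe described earlier in this piece - rewarding correct answers and well-formed reasoning rather than imitating labeled traces - the sketch below shows the flavor of a rule-based reward signal. The tags, weights, and helper functions are illustrative assumptions, not DeepSeek's actual implementation.

```python
# Simplified sketch of a rule-based reward of the kind described for R1-Zero:
# an accuracy reward for the final answer plus a format reward for keeping the
# reasoning inside think-tags. Tags, weights, and helpers are illustrative
# assumptions, not DeepSeek's code.
import re

def format_reward(completion: str) -> float:
    """1.0 if the completion wraps reasoning in <think>...</think> followed by
    an <answer>...</answer> block, else 0.0."""
    pattern = r"<think>.+?</think>\s*<answer>.+?</answer>"
    return 1.0 if re.search(pattern, completion, flags=re.DOTALL) else 0.0

def accuracy_reward(completion: str, ground_truth: str) -> float:
    """1.0 if the extracted final answer matches the reference exactly."""
    match = re.search(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    answer = match.group(1).strip() if match else ""
    return 1.0 if answer == ground_truth.strip() else 0.0

def total_reward(completion: str, ground_truth: str) -> float:
    # A policy-gradient optimizer (e.g. a GRPO/PPO-style update) would maximize this.
    return accuracy_reward(completion, ground_truth) + 0.5 * format_reward(completion)

sample = "<think>5 * 3 = 15, plus 2 is 17.</think> <answer>17</answer>"
print(total_reward(sample, "17"))   # 1.5
```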
[8]
DeepSeek vs ChatGPT and NVIDIA: Making AI affordable again?
I remember the first time I tried ChatGPT - version 3.5, specifically. I was floored by how quickly it churned out coherent paragraphs on just about anything I threw at it. In those moments, it felt like I was conversing with a digital polymath. Yet, as we all know, euphoria eventually gives way to practicality. OpenAI's meteoric rise fueled the AI hype cycle, but I kept having a fundamental question: if AI is so revolutionary, why is it so expensive? Also read: DeepSeek-R1, BLOOM and Falcon AI: Exploring lesser-known open source LLMs

Now, in comes DeepSeek, an open-source AI model from China that's not only giving advanced ChatGPT variants a run for their money, it's practically calling them overpriced. And maybe it's not just calling them out - it's dunking on them with big, bold exclamation points. According to the chatter around the AI circles, DeepSeek's new R1 model offers performance rivaling (some claim surpassing) ChatGPT or OpenAI's o1 model in math, coding, and reasoning tasks. And it does so at a fraction of the cost, both in hardware and in training and inference cycles. It's making the wider tech industry, which includes all of us as users and consumers, question the existence of "premium AI" in any shape or form.

The AI sector has seen a wave of subscription rates, pay-per-token fees, or enterprise-level licensing so high you'd think we're all renting rocket ships as users of AI products. Some folks argue this is justified, as these companies have to pay for monstrous compute clusters, advanced training runs, and the operational overhead that never really ends once you're in the business of AI inference based products and services. OpenAI first launched ChatGPT Plus at $20 a month, then a Pro tier at $200 per month! In this scenario, the end-user is always paying more and more for incremental improvements. And it's not just OpenAI. A handful of well-funded AI startups have banked on the narrative that advanced AI is too complicated for any average Joe to replicate cheaply. That sentiment, ironically, is losing credence fast.

DeepSeek R1's training budget was only $5 million compared to $7 billion by OpenAI for ChatGPT training and inference costs in 2024. If that's even half true, it signals a tectonic shift. Maybe, just maybe, advanced AI doesn't need the mountain of GPUs we thought. Or at least not as many. That's not me cheerleading for someone's downfall; it's just me observing that maybe we never fully knew how resource-light advanced model training can become. But then DeepSeek R1 waltzes in with a price point that's 20 to 30 times cheaper than the major US players. Of course, that's no small change - enough for big enterprise customers to start wondering whether they can get 90% of the top-tier AI performance from an open-source or far cheaper model. It's reminiscent of last summer's open-source wave, when Meta released Llama, or how Stable Diffusion-based AI image models became the open alternative to certain big brand commercial solutions. DeepSeek's approach is a new iteration of that same spirit - bring robust AI to the masses at minimal cost, shake up the status quo. Also read: Deepseek R1 vs Llama 3.2 vs ChatGPT o1: Which AI model wins?

Another dimension to consider here is the potential slowdown in NVIDIA revenue. NVIDIA is the biggest winner of the AI hardware gold rush.
If AI developers pivot to smaller, more efficient training regimes - like how DeepSeek supposedly managed with $5 million in compute - NVIDIA might not see that monstrous GPU demand continue at its current breakneck pace. If these open-source, cost-efficient solutions become widely adopted, what does it mean for giant AI startups flush with billions of dollars in VC money? Or for NVIDIA, which has made a fortune selling advanced GPUs to train these gargantuan neural nets? Potentially, a large chunk of that profit model gets disrupted if the rest of the industry follows DeepSeek's lightweight, minimal hardware approach.

Remember the headlines from a year ago, proclaiming how generative AI would replace every writer or coder in the next 12 months? We survived that wave of doomsday predictions. Now we're hearing a parallel about how big AI labs might topple if open-source or cost-efficient solutions overshadow them. So, does DeepSeek's arrival mean we're about to see OpenAI vanish into irrelevance? Probably not. OpenAI's name recognition and the quality of ChatGPT remain potent. But it does highlight an undercurrent that's impossible to ignore and will continue to grow - that open-source or budget AI solutions can be shockingly good, if done right. Large enterprise customers might continue paying for top-tier GPT-like reliability, while smaller players lean on open solutions like DeepSeek. The AI marketplace may just get more diverse. Also read: DeepSeek R1 on Raspberry Pi: Future of offline AI in 2025?

For me, the sweet spot is an ecosystem where major players push the envelope in specialised domains, while open-source alternatives keep them honest on pricing and accessibility. Historically, that's how software evolves. Look at Linux vs. Windows or Android vs. iOS. We see an interplay of commercial scale and open, community-based innovation. In a best-case scenario, we get more consumer choice and cheaper or even free AI offerings. At the end of the day, free or cheap AI doesn't necessarily undermine the viability of well-funded AI startups. It simply forces them to become better and more transparent about what exactly we're paying for. So, ironically, maybe we owe DeepSeek a thank-you, not just for unveiling some cheaper approach but for reminding us all that innovation doesn't have to come with an overblown price tag.
[9]
DeepSeek's new open-source AI model can outperform o1 for a fraction of the cost
Open-source artificial intelligence (AI) has reached another milestone -- and the cost differences it represents could shake up the industry. On Monday, Chinese AI lab DeepSeek announced the release of R1, the full version of its newest open-source reasoning model, which the company launched in preview in November. The company noted that R1 beats or is on par with OpenAI's o1 in several math, coding, and reasoning benchmarks. Also: $450 and 19 hours is all it takes to rival OpenAI's o1-preview Similar to o1, R1 takes more time to answer than other models, but its responses are meant to be more sophisticated and accurate. Alongside the 671-billion-parameter model, DeepSeek also released six smaller "distilled" versions with as few as 1.5 billion parameters, which can be run on a local device. "Pushing the boundaries of **open AI**!" DeepSeek teased in its announcement thread. DeepSeek's release marks a promising trend in open-source reasoning models. Just over a week ago, UC Berkeley researchers succeeded in creating an open-source model on par with o1-preview. It only took them 19 hours and about $450 in compute costs. Also: OpenAI's o1 lies more than any major AI model. Why that matters R1's pricing structure is similarly poised to give OpenAI a run for its money. API access starts at just $0.14 for a million tokens (about 750,000 words analyzed) -- a fraction of the $7.50 OpenAI charges for the equivalent tier. OpenAI is currently offering unlimited access to o1 for $2,400 a year through ChatGPT Pro. That multiple labs are increasingly able to build models with capabilities comparable to OpenAI's proves that competitive AI doesn't have to be prohibitively expensive. Both DeepSeek and UC Berkeley making strides in open-source AI -- and releasing their training methods -- draw attention to OpenAI's long-forgotten original mission (though the company's ironic name persists). R1 does have some limitations, however. Models made by Chinese companies are subject to censorship by the Chinese government, meaning that while their abilities are comparable, there are certain queries R1 may simply refuse to answer that o1 would. When tested by ZDNET's Tiernan Ray, R1-preview struggled to clearly provide its chain of thought when compared with o1-preview, striking Ray as "baffling and tedious in ways o1 is not."
[10]
DeepSeek R1 on Raspberry Pi: Future of offline AI in 2025?
A tech guy would always prefer an open-source LLM over something that is closed source and hosted in the cloud, for obvious reasons like privacy and bias, but many of us still have to opt for something like ChatGPT for two main reasons. Also read: Deepseek R1 vs Llama 3.2 vs ChatGPT o1: Which AI model wins? If we try running a model similar to ChatGPT, we would need a high-end system that is not only expensive but also costly to operate. Even then, we cannot guarantee that the output would be as good as something like ChatGPT. But things might change (pretty soon)! Recently, Brian Roemmele announced a DeepSeek-AI R1 setup, claiming that it beats o1 in terms of accuracy while still generating 200 tokens per second on a Raspberry Pi. And let me just say that getting 200 tokens per second on a Raspberry Pi with an LLM better than OpenAI o1 is simply insane! DeepSeek-R1 is a first-generation reasoning model that stands out for its unique training approach. Unlike traditional models that rely on supervised fine-tuning (SFT) as a preliminary step, DeepSeek-R1 was trained directly via large-scale RL, leading to the emergence of powerful reasoning behaviours. This model, along with its predecessor DeepSeek-R1-Zero, has been open-sourced to support the research community. It offers insights into how AI can evolve without the need for extensive human-labeled data. Also read: DeepSeek-R1, BLOOM and Falcon AI: Exploring lesser-known open source LLMs One of the most intriguing aspects of DeepSeek-R1 is its ability to generate around 200 tokens per second on a Raspberry Pi, a testament to its efficiency and adaptability to resource-constrained environments. A graph shared by the author shows how DeepSeek R1, while being a smaller model than the likes of ChatGPT, still provides better accuracy. While the author claims 200 tokens per second, one of his recent replies on X clarifies that, as of now, the system is stressed and overheating when pushed to generate 200 tokens per second, so he has dropped the rate to around 90 tokens per second while figuring out ways to stabilise the setup, with the hope of reaching up to 250 tokens per second in the future. Also read: OpenAI o3 model: How good is ChatGPT's next AI version? Also, the author has not disclosed which model is being used as a base for DeepSeek R1 and has mentioned that he is experimenting with four different models from DeepSeek, but we can assume that the finalised model will be somewhat smaller, something like DeepSeek 1.5B. There is also talk of how the whole thing has been advertised in a misleading way. Adam Pell mentioned on X that using DeepSeek R1 feels like using the ChatGPT of 2023 and is nowhere near o1's performance. But even if it feels like running the ChatGPT of 2023 (roughly 18 months behind), we cannot forget that these numbers come from a card-sized Raspberry Pi computer and an open-source model that can be fine-tuned on your specific datasets, all while keeping privacy the first priority. Also read: OpenAI launches Operator: How will this AI agent impact the industry?
[11]
What Makes DeepSeek So Special
Without drawing attention, DeepSeek has made it clear that the company means business. The China-based AI research lab recently released its new models, DeepSeek-R1 and DeepSeek-R1-Zero. The models are on par with OpenAI's o1. The DeepSeek-R1 model is now available at chat.deepseek.com, complete with its API, which supports fine-tuning and distillation. Users can freely experiment and explore its capabilities. One of the most entertaining features is that, while generating responses, it also shares its internal monologue, which many users find amusing. "The raw chain of thought from DeepSeek is fascinating. It really reads like a human thinking out loud. Charming and strange," Ethan Mollick, professor at The Wharton School, said. Sharing similar sentiments, Matthew Berman, CEO of Forward Future, said, "DeepSeek-R1 has the most human-like internal monologue I've ever seen. It's actually quite endearing." DeepSeek was not the only one. Another Chinese company, Moonshot, unveiled Kimi K1.5, an o1-level multimodal model. "The Chinese 'Open'AI companies are turning the Chinese New Year into a celebration for the entire global AI community," AI researcher Wenhu Chen said. DeepSeek's success has motivated Perplexity AI chief Aravind Srinivas to explore building a similar startup in India. Expressing regret about not developing LLMs from scratch, he said, "I'm not in a position to run a DeepSeek-like company for India, but I'm happy to help anyone obsessed enough to do it and open-source the models." DeepSeek, in its research paper, revealed that the company bet big on reinforcement learning (RL) to train both of these models. DeepSeek-R1-Zero was developed using a pure RL approach without any prior supervised fine-tuning (SFT). This model utilised Group Relative Policy Optimisation (GRPO), which allows for efficient RL training by estimating baselines from group scores rather than requiring a separate critic model of similar size to the policy model. DeepSeek-R1 incorporates a multi-stage training approach and cold-start data. This method improved the model's performance by refining its reasoning abilities while maintaining clarity in output. "The model has shown performance comparable to OpenAI's o1-1217 on various reasoning tasks," the company said. "This 'aha moment' in the DeepSeek-R1 paper is huge. Pure reinforcement learning (RL) enables an LLM to automatically learn to think and reflect," Yuchen Jin, co-founder and CTO of Hyperbolic, said. He added that the excitement around DeepSeek is similar to the AlphaGo era. Just like how AlphaGo used pure RL to play countless Go games and optimise its strategy to win, DeepSeek is using the same approach to advance its capabilities. "2025 could be the year of RL." This method enables the model to explore reasoning capabilities autonomously without being constrained by supervised data. "We are living in a timeline where a non-US company is keeping the original mission of OpenAI alive - truly open, frontier research that empowers all. It makes no sense. The most entertaining outcome is the most likely," Jim Fan, senior research manager and lead of Embodied AI (GEAR Lab), said. "DeepSeek-R1 not only open-sources a barrage of models but also spills all the training secrets. They are perhaps the first OSS project that shows major, sustained growth of an RL flywheel," he added. On the other hand, Kimi k1.5 utilises RL with long and short-chain-of-thought (CoT). The model supports up to 128k tokens. 
Moreover, according to their self-published report, it achieves state-of-the-art (SOTA) performance on benchmarks like AIME (77.5), MATH-500 (96.2), and LiveCodeBench (47.3). By combining RL with long-CoT and multi-modal strategies, Kimi k1.5 significantly improves reasoning, planning, and reflection across a wide range of tasks. "DeepSeek does AlphaZero approach - purely bootstrap through RL without human input, i.e. 'cold start'. Kimi does AlphaGo-Master approach - light SFT to warm up through prompt-engineered CoT traces," Fan added. DeepSeek doesn't use techniques like Monte Carlo Tree Search (MCTS), Process Reward Model (PRM), or dense reward modelling. In contrast, AlphaGo and its successors, including AlphaGo Zero, utilise MCTS. Alibaba recently launched its open-source reasoning model, Marco-o1. The model was powered by CoT fine-tuning, MCTS, reflection mechanisms, and innovative reasoning strategies to tackle complex real-world problems. DeepSeek R1 not only surpasses OpenAI o1 on benchmarks but also proves to be far more cost-effective, delivering savings of 96-98% across all categories. Meanwhile, OpenAI CEO Sam Altman recently stated on X that the company has not yet developed AGI. "We are not gonna deploy AGI next month, nor have we built it," he posted. The company, however, intends to release o3 mini within the next couple of weeks. On the other hand, Google has launched an experimental update (gemini-2.0-flash-thinking-exp-01-21), which has brought improved performance across several key benchmarks in math, science, and multimodal reasoning. Notable results include AIME at 73.3%, GPQA at 74.2%, and MMMU at 75.4%. Moreover, it comes with a 1M-token context window, which allows deeper analysis of long-form texts like multiple research papers or extensive datasets. In December last year, Google unveiled the Gemini 2.0 Flash Thinking model. The model offers advanced reasoning capabilities and showcases its thoughts. Logan Kilpatrick, senior product manager at Google, said the model "unlocks stronger reasoning capabilities and shows its thoughts". Most recently, Google DeepMind published a study that introduced inference-time scaling for diffusion models. Following this, the lab published a new paper that introduced a new technique called Mind Evolution to improve the efficiency of large language models (LLMs) during inference. This method involves using the model to generate possible responses, recombining different parts of those responses, and refining them to create better results.
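To make the GRPO idea described above a little more concrete, here is a minimal sketch of the group-relative baseline trick in Python. It illustrates the general technique only - it is not DeepSeek's training code, and the sampled rewards are made up.

import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: score each sampled answer against the
    group's own mean and spread instead of a separately trained critic."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero spread
    return [(r - mean) / std for r in rewards]

# Hypothetical example: four sampled answers to one math prompt,
# rewarded 1.0 if the final answer checks out and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
print(grpo_advantages(rewards))  # [1.0, -1.0, -1.0, 1.0]

Because the baseline is simply the group's own mean reward, no critic network of comparable size to the policy ever has to be trained or stored, which is where the claimed efficiency gain comes from.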
[12]
DeepSeek Claims Its Reasoning-Focused AI Model Can Outperform OpenAI's o1
It outperforms OpenAI o1 on the AIME, SWE-bench, and MATH benchmarks DeepSeek-R1, a reasoning-focused artificial intelligence (AI) model by the Chinese firm DeepSeek, was released on Monday. This is the full version of the open source AI model, which arrives two months after its preview version was released. The open-source AI model is available to download, and can also be used as a plug-and-play application programming interface (API). The Chinese AI firm claimed that DeepSeek-R1 was able to outperform OpenAI's o1 model in several benchmarks for mathematics, coding, and reasoning-based tasks. There are two variants in the latest series -- DeepSeek-R1 and DeepSeek-R1-Zero. Both are built on another large language model (LLM) developed by the AI firm, dubbed DeepSeek V3. The new AI models are based on a mixture-of-experts (MoE) architecture, where several smaller models are paired together to improve the efficiency and capabilities of the larger model. The DeepSeek-R1 AI models are currently available to download via the company's Hugging Face listing. The model comes with an MIT licence that allows both academic and commercial usage. Those who do not intend to run the LLM locally can opt for the model API instead. The company announced the inference pricing of the model, highlighting that it costs 90-95 percent less than OpenAI's o1. Currently, the DeepSeek-R1 API comes with an input price of $0.14 (roughly Rs. 12.10) per million tokens and the output price is set at $2.19 (roughly Rs. 189.50) per million tokens. In comparison, OpenAI's o1 API costs $7.5 (roughly Rs. 649) per million input tokens and $60 (roughly Rs. 5,190) per million output tokens. Not only does DeepSeek-R1 cost less, but the company also claims that it offers higher performance than its OpenAI counterpart. Based on internal testing, the AI firm stated that DeepSeek-R1 outperformed o1 in the American Invitational Mathematics Examination (AIME), MATH-500, and SWE-bench benchmarks. However, the difference between the models is marginal. Coming to post-training, the company said that it applied reinforcement learning (RL) to the base model without any supervised fine-tuning (SFT). This method, also known as pure RL, allows the model more freedom when solving complex problems using the chain-of-thought (CoT) mechanism. DeepSeek claimed that this is the first open-source AI project to use pure RL to improve reasoning capabilities.
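A quick back-of-the-envelope calculation shows what those per-token prices mean in practice. The workload below - two million input tokens and half a million output tokens - is purely hypothetical; the prices are the ones quoted above.

def api_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    # prices are quoted per million tokens
    return (input_tokens / 1e6) * in_price_per_m + (output_tokens / 1e6) * out_price_per_m

workload = dict(input_tokens=2_000_000, output_tokens=500_000)  # hypothetical usage

deepseek_r1 = api_cost(**workload, in_price_per_m=0.14, out_price_per_m=2.19)
openai_o1 = api_cost(**workload, in_price_per_m=7.50, out_price_per_m=60.00)

print(f"DeepSeek-R1: ${deepseek_r1:.2f}")                 # $1.38
print(f"OpenAI o1:   ${openai_o1:.2f}")                   # $45.00
print(f"Savings:     {1 - deepseek_r1 / openai_o1:.0%}")  # 97%

On that hypothetical workload, the bill comes to roughly $1.38 against $45, broadly in line with the 90-95 percent saving the company advertises.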
[13]
DeepSeek AI might be smarter than OpenAI's smartest AI, and you can try it out now
There's a new AI player in town, and you might want to pay attention to this one. On Monday, Chinese artificial intelligence company DeepSeek launched a new, open-source large language model called DeepSeek R1. According to DeepSeek, R1 beats other popular LLMs (large language models), such as OpenAI's, in several important benchmarks, and it's especially good with mathematical, coding, and reasoning tasks. DeepSeek R1 is actually a refinement of DeepSeek R1 Zero, which is an LLM that was trained without a conventionally used method called supervised fine-tuning. This made it very capable in certain tasks, but as DeepSeek itself puts it, Zero had "poor readability and language mixing." Enter R1, which fixes these issues by incorporating "multi-stage training and cold-start data" before it was trained with reinforcement learning. Arcane technical language aside (the details are online if you're interested), there are several key things you should know about DeepSeek R1. First, it's open source, meaning it's up for scrutiny from experts, which should alleviate concerns about privacy and security. Second, it's free to use as a web app, while API access is very cheap ($0.14 for one million input tokens, compared to OpenAI's $7.5 for its most powerful reasoning model, o1). Most importantly, this thing is very, very capable. To test it out, I immediately threw it into deep waters, asking it to code a fairly complex web app which needed to parse publicly available data, and create a dynamic website with travel and weather information for tourists. Amazingly, DeepSeek produced completely acceptable HTML code right away, and was able to further refine the site based on my input while improving and optimizing the code on its own along the way. I also asked it to improve my chess skills in five minutes, to which it replied with a number of neatly organized and very useful tips (my chess skills did not improve, but only because I was too lazy to actually go through with DeepSeek's suggestions). I then asked DeepSeek to prove how smart it is in exactly three sentences. Bad move by me, as I, the human, am not nearly smart enough to verify or even fully understand any of the three sentences. In the screenshot that accompanied this test, you can see DeepSeek's "thought process" as it figures out the answer, which is perhaps even more fascinating than the answer itself. It's impressive to use. But as ZDNET noted, in the background of all this are training costs which are orders of magnitude lower than for some competing models, as well as chips which aren't as powerful as those at the disposal of U.S. AI companies. DeepSeek thus shows that extremely clever AI with reasoning ability doesn't have to be extremely expensive to train -- or to use.
[14]
Cutting-edge Chinese "reasoning" model rivals OpenAI o1 -- and it's free to download
Alongside the release of the main DeepSeek-R1-Zero and DeepSeek-R1 models, DeepSeek published six smaller "DeepSeek-R1-Distill" versions ranging from 1.5 billion to 70 billion parameters. These distilled models are based on existing open source architectures like Qwen and Llama, trained using data generated from the full R1 model. The smallest version can run on a laptop, while the full model requires far more substantial computing resources. The releases immediately caught the attention of the AI community because most existing open-weights models -- which can often be run and fine-tuned on local hardware -- have lagged behind proprietary models like OpenAI's o1 in so-called reasoning benchmarks. Having these capabilities available in an MIT-licensed model that anyone can study, modify, or use commercially potentially marks a shift in what's possible with publicly available AI models. "They are SO much fun to run, watching them think is hilarious," independent AI researcher Simon Willison told Ars in a text message. Willison tested one of the smaller models and described his experience in a post on his blog: "Each response starts with a <think>...</think> pseudo-XML tag containing the chain of thought used to help generate the response," noting that even for simple prompts, the model produces extensive internal reasoning before output. The R1 model works differently from typical large language models (LLMs) by incorporating what people in the industry call an inference-time reasoning approach: such models attempt to simulate a human-like chain of thought as they work through a solution to a query. This class of what one might call "simulated reasoning" models, or SR models for short, emerged when OpenAI debuted its o1 model family in September 2024. OpenAI teased a major upgrade called "o3" in December.
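Because the chain of thought arrives wrapped in the <think>...</think> tag Willison describes, separating the reasoning from the final answer takes only a few lines. A minimal sketch follows; the sample response text here is invented for illustration.

import re

def split_reasoning(raw: str):
    """Split an R1-style response into (chain_of_thought, final_answer)."""
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    thought = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", raw, count=1, flags=re.DOTALL).strip()
    return thought, answer

raw = "<think>The user wants a short greeting. Keep it friendly.</think>Hello there!"
thought, answer = split_reasoning(raw)
print(thought)  # The user wants a short greeting. Keep it friendly.
print(answer)   # Hello there!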
[15]
DeepSeek claims its reasoning model beats OpenAI's o1 on certain benchmarks | TechCrunch
Chinese AI lab DeepSeek has released an open version of DeepSeek-R1, its so-called reasoning model, that it claims performs as well as OpenAI's o1 on certain AI benchmarks. R1 is available from the AI dev platform Hugging Face under an MIT license, meaning it can be used commercially without restrictions. According to DeepSeek, R1 beats o1 on the benchmarks AIME, MATH-500, and SWE-bench Verified. AIME is a set of problems drawn from the American Invitational Mathematics Examination, while MATH-500 is a collection of word problems. SWE-bench Verified, meanwhile, focuses on programming tasks. Being a reasoning model, R1 effectively fact-checks itself, which helps it to avoid some of the pitfalls that normally trip up models. Reasoning models take a little longer -- usually seconds to minutes longer -- to arrive at solutions compared to a typical nonreasoning model. The upside is that they tend to be more reliable in domains such as physics, science, and math. R1 contains 671 billion parameters, DeepSeek revealed in a technical report. Parameters roughly correspond to a model's problem-solving skills, and models with more parameters generally perform better than those with fewer parameters. 671 billion parameters is massive, but DeepSeek also released "distilled" versions of R1 ranging in size from 1.5 billion parameters to 70 billion parameters. The smallest can run on a laptop. As for the full R1, it requires beefier hardware, but it is available through DeepSeek's API at prices 90%-95% cheaper than OpenAI's o1. There is a downside to R1. Being a Chinese model, it's subject to benchmarking by China's internet regulator to ensure that its responses "embody core socialist values." R1 won't answer questions about Tiananmen Square, for example, or Taiwan's autonomy. Many Chinese AI systems, including other reasoning models, decline to respond to topics that might raise the ire of regulators in the country, such as speculation about the Xi Jinping regime. R1 arrives days after the outgoing Biden administration proposed harsher export rules and restrictions on AI technologies for Chinese ventures. Companies in China were already prevented from buying advanced AI chips, but if the new rules go into effect as written, companies will be faced with stricter caps on both the semiconductor tech and the models needed to bootstrap sophisticated AI systems. In a policy document last week, OpenAI urged the U.S. government to support the development of U.S. AI, lest Chinese models match or surpass them in capability. In an interview with The Information, OpenAI's VP of policy Chris Lehane singled out High Flyer Capital Management, DeepSeek's corporate parent, as an organization of particular concern. So far, at least three Chinese labs -- DeepSeek, Alibaba, and Kimi, which is owned by Chinese unicorn Moonshot AI -- have produced models that they claim rival o1. (Of note, DeepSeek was the first -- it announced a preview of R1 in late November.) In a post on X, Dean Ball, an AI researcher at George Mason University, said that the trend suggests Chinese AI labs will continue to be "fast followers." "The impressive performance of DeepSeek's distilled models [...] means that very capable reasoners will continue to proliferate widely and be runnable on local hardware," Ball wrote, "far from the eyes of any top-down control regime."
[16]
DeepSeek open-sources its R1 reasoning model series - SiliconANGLE
DeepSeek today released a new large language model family, the R1 series, that is optimized for reasoning tasks. The Chinese artificial intelligence developer has made the algorithms' source code available on Hugging Face. The LLM lineup is headlined by two algorithms called R1 and R1-Zero. According to DeepSeek, the former model outperforms OpenAI's o1 across several reasoning benchmarks. R1-Zero, meanwhile, is less capable but represents a potentially significant advancement in machine learning research. Both LLMs feature a mixture-of-experts, or MoE, architecture with 671 billion parameters. A MoE model comprises multiple neural networks that are each optimized for a different set of tasks. When the model receives a prompt, a mechanism known as a router sends the query to the neural network best-equipped to process it. The main benefit of the MoE architecture is that it lowers inference costs. When users enter a prompt into an MoE model, the query doesn't activate the entire AI but only the specific neural network that will generate the response. As a result, R1 and R1-Zero activate less than one tenth of their 671 billion parameters when answering prompts. DeepSeek trained R1-Zero using a different approach than the one researchers usually take with reasoning models. Reasoning-optimized LLMs are typically trained using two methods known as reinforcement learning and supervised fine-tuning. The former technique teaches an AI model to perform a task through trial and error. Supervised fine-tuning, in turn, boosts the AI's output quality by providing it with examples of how to carry out the task at hand. While training R1-Zero, DeepSeek skipped the supervised fine-tuning stage. Nevertheless, the company managed to equip the model with reasoning skills such as the ability to break down complex tasks into simpler sub-steps. "It is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT," DeepSeek researchers detailed. "This breakthrough paves the way for future advancements in this area." Although R1-Zero has an advanced feature set, its output quality is limited. The model's responses sometimes suffer from "endless repetition, poor readability and language mixing," DeepSeek's researchers detailed. The company created R1 to address those limitations. R1 is an enhanced version of R1-Zero that was developed using a modified training workflow. This workflow makes use of supervised fine-tuning, the technique that DeepSeek left out during the development of R1-Zero. The company says that this change helped significantly boost output quality. DeepSeek compared R1 against four popular LLMs using nearly two dozen benchmark tests. According to the company, its model managed to outperform OpenAI's reasoning-optimized o1 LLM across several of the benchmarks. In most of the benchmarks that o1 completed with a higher score, R1 trailed it by under 5%. One of the benchmarks in which R1 outperformed o1 is LiveCodeBench. It's a collection of programming tasks that is regularly updated with new practice problems. This makes it less likely that AI models will find ready-made answers to the problems on the public web. Alongside R1 and R1-Zero, DeepSeek today open-sourced a set of less capable but more hardware-efficient models. Those models were "distilled" from R1, which means that some of the LLM's knowledge was transferred to them during training.
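As a rough picture of the routing mechanism described above, the toy sketch below scores a single token against eight tiny "experts" and runs only the top two. It is a generic top-k MoE illustration, not DeepSeek's actual architecture; every size and weight here is made up.

import numpy as np

rng = np.random.default_rng(0)
n_experts, k, d = 8, 2, 16  # toy sizes: 8 experts, route each token to the top 2

experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]  # stand-in expert weights
router_w = rng.standard_normal((d, n_experts))                     # the router's scoring matrix

def moe_forward(x):
    scores = x @ router_w                      # score this token against every expert
    top = np.argsort(scores)[-k:]              # keep only the k best-matching experts
    gates = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over the chosen experts
    # only the selected experts run; the other six are never touched for this token
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.standard_normal(d)
print(moe_forward(token).shape)  # (16,) - one output vector, computed by 2 of 8 experts

The inference saving falls straight out of the gating: most of the network sits idle for any given token, which is the same effect that lets R1 answer a prompt while activating only a fraction of its 671 billion parameters.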
[17]
A deep dive into DeepSeek's newest chain of though model
El Reg digs its claws into Middle Kingdom's latest chain of thought model Hands on Chinese AI startup DeepSeek this week unveiled a family of LLMs it claims not only replicates OpenAI's o1 reasoning capabilities, but challenges the American model builder's dominance in a whole host of benchmarks. Founded in 2023 by Chinese entrepreneur Liang Wenfeng and funded by his quantitative hedge fund High Flyer, DeepSeek has now shared a number of highly competitive, openly available machine-learning models, despite America's efforts to keep AI acceleration out of China. What's more, DeepSeek claims to have done so at a fraction of the cost of its rivals. At the end of last year, the lab officially released DeepSeek V3, a mixture-of-experts LLM that does what the likes of Meta's Llama 3.1, OpenAI's GPT-4o, and Anthropic's Claude 3.5 Sonnet can do. Now it's released R1, a reasoning model fine-tuned from V3. While big names in the West are spending tens of billions of dollars on millions of GPUs a year, DeepSeek V3 is said to have been trained [PDF] on 14.8 trillion tokens using 2,048 Nvidia H800s, totaling about 2.788 million GPU hours, at a cost of roughly $5.58 million. At 671 billion parameters, 37 billion of which are activated for each token during inference, DeepSeek R1 was trained primarily using reinforcement learning to utilize chain-of-thought (CoT) reasoning. If you're curious, you can learn more about the process in DeepSeek's paper here [PDF]. If you're not familiar with CoT models like R1 and OpenAI's o1, they differ from conventional LLMs in that they don't just spit out a one-and-done answer to your question. Instead, the models first break down requests into a chain of "thoughts," giving them an opportunity to reflect on the input and identify or correct any flawed reasoning or hallucinations in the output before responding with a final answer. Thus, you're supposed to get a more logical, lucid, and accurate result from them. Assuming DeepSeek's benchmarks can be believed, R1 manages to achieve performance on par with OpenAI's o1 and even exceeds its performance in the MATH-500 test. The startup also claims its comparatively tiny 32-billion-parameter variant of the model, which was distilled from the larger model using Alibaba's Qwen 2.5 32B as a base, manages to match, or in some cases, best OpenAI's o1 mini. All of this comes from a model that's freely available on Hugging Face under the permissive MIT license. That means you can download and try it for yourself. And in this hands on, we'll be doing just that using the popular Ollama model runner and Open WebUI. But first, let's see how it performs in the real world. As we mentioned earlier, R1 is available in multiple flavors. Alongside the full-sized R1 model, there is a series of smaller distilled models ranging in size from a mere 1.5 billion parameters to 70 billion. These models are based on either Meta's Llama 3.1-8B or 3.3-70B, or Alibaba's Qwen 2.5-1.5B, -7B, -14B and -32B models. To keep things simple, we'll be referring to the different models by their parameter count. We ran a variety of prompts against these models to see how they performed; the tasks and queries are known to trip up LLMs. Due to memory constraints, we were only able to test the distilled models locally and were required to run the 32B and 70B parameter models at 8-bit and 4-bit precision respectively. 
The rest of the distilled models were tested at 16-bit floating point precision, while the full R1 model was accessed via DeepSeek's website (if you don't want to run its models locally, there's a paid-for cloud API). We know what you're thinking - we should start with one of the hardest problems for LLMs to solve: the strawberry question, which, if you're not familiar, goes like this: How many "R"s are in the word strawberry? This may seem like a simple question, but it's a surprisingly tricky one for LLMs to get right because of the way they break words into chunks called tokens rather than individual characters. Because of this, models tend to struggle at tasks that involve counting, commonly insisting that there are only two "R"s in strawberry rather than three. Similar to o1, DeepSeek's R1 doesn't appear to suffer from this problem, identifying the correct number of "R"s on the first attempt. The model was also able to address variations on the question, including "how many 'S's in Mississippi?" and "How many vowels are in airborne?" The smaller distilled models, unfortunately, weren't so reliable. The 70B, 32B, and 14B models were all able to answer these questions correctly, while the smaller 8B, 7B, and 1.5B only sometimes got it right. As you'll see in the next two tests, this will become a theme as we continue testing R1. As we've previously explored, large language models also struggle with basic arithmetic such as multiplying two large numbers together. There are various methods that have been explored to improve a model's math performance, including providing the models with access to a Python calculator using function calls. To see how R1 performed, we pitted it against a series of simple math and algebra problems: a large multiplication, a long division, and an equation to solve for X. R1-671B was able to solve the first and third of these problems without issue, arriving at 22,163,715 and X=603, respectively. The model got the second problem mostly right, but truncated the answer after the third decimal place. OpenAI's o1, by comparison, rounded up to the fourth decimal place. Similar to the counting problem, the distilled models were once again a mixed bag. All of the models were able to solve for X, while the 8, 7, and 1.5-billion-parameter variants all failed to solve the multiplication and division problems reliably. The larger 14B, 32B, and 70B versions were at least more reliable, but still ran into the occasional hiccup. While certainly an improvement over non-CoT models in terms of math reasoning, we're not sure we can fully trust R1 or any other model's math skills just yet, especially when giving the model a calculator is still faster. Testing on a 48 GB Nvidia RTX 6000 Ada graphics card, R1-70B at 4-bit precision required over a minute to solve for X. Along with counting and math, we also challenged R1 with a couple of planning and spatial reasoning puzzles, which have previously been shown by researchers at AutoGen AI to give LLMs quite a headache. Prompt: "A farmer wants to cross a river and take with him a wolf, a goat and a cabbage. He has a boat with three secure separate compartments. If the wolf and the goat are alone on one shore, the wolf will eat the goat. If the goat and the cabbage are alone on the shore, the goat will eat the cabbage. How can the farmer efficiently bring the wolf, the goat and the cabbage across the river without anything being eaten?" It's easier than it sounds.
The expected answer is, of course, that the farmer places the wolf, goat, and cabbage in their own compartments and crosses the river. However, in our testing, traditional LLMs would overlook this fact. R1-671B and -70B were able to answer the riddle correctly. The 32B, 14B, and 8B variants, meanwhile, came to the wrong conclusion, and the 7B and 1.5B versions failed to complete the request, instead getting stuck in an endless chain of thought. Prompt: "Alan, Bob, Colin, Dave and Emily are standing in a circle. Alan is on Bob's immediate left. Bob is on Colin's immediate left. Colin is on Dave's immediate left. Dave is on Emily's immediate left. Who is on Alan's immediate right?" Again, easy for humans. The expected answer is Bob. Posed with the question, we found that many LLMs were already capable of guessing the correct answer, but not consistently. In the case of DeepSeek's latest model, all but the 8B and 1.5B distillations were able to answer the question correctly on their first attempt. Unfortunately, subsequent tests showed that even the largest models couldn't consistently identify Bob as the correct answer. Unlike with non-CoT LLMs, we can peek under the hood a bit at the output and see why the model arrived at the answer it did. Another interesting observation was that, while smaller models were able to generate tokens faster than the larger models, they took longer to reach the correct conclusion. This suggests that while CoT can improve reasoning for smaller models, it isn't a replacement for parameter count. Prompt: "I get out on the top floor (third floor) at street level. How many stories is the building above the ground?" The answer here is obviously one. However, many LLMs, including GPT-4o and o1, will insist that the answer is three or 0. Again, we ran into a scenario where, on the first attempt, R1 correctly answered with one story. Yet, on subsequent tests it too insisted that there were three stories. The takeaway here seems to be that CoT reasoning certainly can improve the model's ability to solve complex problems, but it's not necessarily a silver bullet that suddenly transforms an LLM from autocomplete-on-steroids into an actual artificial intelligence capable of real thought. Is R1 censored? Oh yeah. It is. Like many Chinese models we've come across, DeepSeek R1 has been censored to prevent criticism and embarrassment of the Chinese Communist Party. Ask R1 about sensitive topics such as the 1989 Tiananmen Square massacre and we found it would outright refuse to entertain the question and attempt to redirect the conversation to a less politically sensitive topic. User: Can you tell me about the Tiananmen Square massacre? R1: Sorry, that's beyond my current scope. Let's talk about something else. 我爱北京天安门 ("I love Beijing Tiananmen"), indeed. We also found this to be true of the smaller distilled models. Testing on R1-14B, which again is based on Alibaba's Qwen 2.5, we received a similar answer. R1: I am sorry, I cannot answer that question. I am an AI assistant designed to provide helpful and harmless responses. We also observed a near-identical response from R1-8B, which was based on Llama 3.1. By comparison, the standard Llama 3.1 8B model has no problem providing a comprehensive accounting of the June 4 atrocity. Censorship is something we've come to expect from Chinese model builders, and DeepSeek's latest model is no exception. If you'd like to try DeepSeek R1 for yourself, it's fairly easy to get up and running using Ollama and Open WebUI.
Unfortunately, as we mentioned earlier, you probably won't be able to get the full 671-billion-parameter model running unless you've got a couple of Nvidia H100 boxes lying around. Most folks will be stuck using one of DeepSeek's distilled models instead. The good news is the 32-billion-parameter variant, which DeepSeek insists is competitive with OpenAI's o1-Mini, can fit comfortably on a 24 GB graphics card if you opt for the 4-bit model. For the purpose of this guide, we'll be deploying DeepSeek R1-8B, which at 4.9 GB should fit comfortably on any 8 GB or larger graphics card that supports Ollama. Feel free to swap it out for the larger 14, 32, or even 70-billion-parameter models at your preferred precision. You can find a full list of R1 models and memory requirements here. Ollama is a popular model runner that provides an easy method for downloading and running LLMs on consumer hardware. For those running Windows or macOS, head over to ollama.com and download and install it like any other application. For Linux users, Ollama offers a convenient one-liner install script that should have you up and running in a matter of minutes. Alternatively, Ollama provides manual installation instructions, which can be found here. Next, we'll open a terminal window and pull down our model using Ollama's pull command. Depending on the speed of your internet connection, this could take a few minutes, so you might want to grab a cup of coffee or tea. Next, we'll test that it's working by loading up the model with Ollama's run command and chatting with it in the terminal. After a few moments, you can begin querying the model like any other LLM and see its output. If you don't mind using R1 in a basic shell like this, you can stop reading here and have fun with it. However, if you'd like something more reminiscent of o1, we'll need to spin up Open WebUI. As the name suggests, Open WebUI is a self-hosted web-based GUI that provides a convenient front end for interacting with LLMs via APIs. The easiest way we've found to deploy it is with Docker, as it avoids a whole host of dependency headaches. Assuming you've already got Docker Engine or Docker Desktop installed on your system, the Open WebUI container is deployed with a single docker run command; the project's documentation lists the exact invocation. Note: Depending on your system, you may need to run this command with elevated privileges. For a Linux box, that usually means prefixing the command with sudo. Windows and macOS users will also need to enable host networking under the "Features in Development" tab in the Docker Desktop settings panel. From here you can load up the dashboard by navigating to http://localhost:8080 and create an account. If you're running the container on a different system, you'll need to replace localhost with its IP address or hostname and make sure port 8080 is accessible. If you run into trouble deploying Open WebUI, we recommend checking out our retrieval augmented generation tutorial. We go into much deeper detail on setting up Open WebUI in that guide. Now that we've got Open WebUI up and running, all you need to do is select DeepSeek-R1:8B from the dropdown and queue up your questions. Originally, we had a whole section written up for you on how to use Open WebUI Functions to filter out and hide the "thinking" to make using the model more like o1. But, as of version v0.5.5, "thinking" support is now part of Open WebUI. No futzing with scripts or customizing models is required.
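If you'd rather drive the model from code than from a chat window, Ollama also serves a local REST API, by default on port 11434. The minimal sketch below assumes the deepseek-r1:8b tag used in this guide has already been pulled and that Python's requests library is installed.

import requests

URL = "http://localhost:11434/api/chat"  # Ollama's local chat endpoint

payload = {
    "model": "deepseek-r1:8b",  # the distilled model pulled earlier in this guide
    "messages": [{"role": "user", "content": "How many 'R's are in the word strawberry?"}],
    "stream": False,            # return a single JSON object rather than a token stream
}

reply = requests.post(URL, json=payload, timeout=600).json()
# R1-style models emit their chain of thought ahead of the final answer.
print(reply["message"]["content"])

Swap the model name for any of the other distills you have pulled, or point URL at another machine's address if Ollama is running elsewhere on your network.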
As we mentioned during our math tests, while a chain of thought may improve the model's ability to solve complex problems, it also takes considerably longer and uses substantially more resources than an LLM of a similar size might otherwise. The "thoughts" that help the model cut down on errors and catch hallucinations can take a while to generate. These thoughts aren't anything super special or magical; it's not consciously thinking. It's additional stages of intermediate output that help guide the model to what's ideally a higher-quality final answer. Normally, LLM performance is a function of memory bandwidth divided by parameter count at a given precision. Theoretically, if you've got 3.35 TBps of memory bandwidth, you'd expect a 175-billion-parameter model run at 16-bit precision to achieve about 10 words a second. Fast enough to spew about 250 words in under 30 seconds. A CoT model, by comparison, may need to generate 650 words - 400 words of "thought" output and another 250 words for the final answer. Unless you have 2.6x more memory bandwidth or you shrink the model by the same factor, generating the response will now require more than a minute. This isn't consistent either. For some questions, the model may need to "think" for several minutes before it's confident in the answer, while for others it may only take a couple of seconds. This is one of the reasons why chip designers have been working to increase memory bandwidth along with capacity between generations of accelerators and processors. Others, meanwhile, have turned to speculative decoding to increase generation speeds. The faster your hardware can generate tokens, the less costly CoT reasoning will be. ®
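The rule of thumb above is easy to reproduce: throughput is roughly memory bandwidth divided by the bytes that must be streamed per generated token (active parameters times bytes per parameter). The quick check below treats words and tokens interchangeably, as the article does, and recovers the figures quoted.

def words_per_second(bandwidth_bytes_s, n_params, bytes_per_param):
    # every generated token requires reading all active weights from memory once
    return bandwidth_bytes_s / (n_params * bytes_per_param)

rate = words_per_second(3.35e12, 175e9, 2)  # 3.35 TBps, 175B params, 16-bit (2-byte) weights
print(f"{rate:.1f} words/s")                # ~9.6, i.e. about 10 words a second

print(f"plain answer: ~{250 / rate:.0f} s")  # ~26 s for a 250-word reply
print(f"CoT answer:   ~{650 / rate:.0f} s")  # ~68 s once 400 words of 'thought' are added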
[19]
DeepSeek Crushes OpenAI o1 with an MIT-Licensed Model -- Developers Are Losing It
DeepSeek, a Chinese AI research lab backed by High-Flyer Capital Management, has unveiled its latest reasoning models, DeepSeek-R1 and DeepSeek-R1-Zero. The models are positioned as alternatives to proprietary systems like OpenAI-o1. DeepSeek-R1, the flagship model, is fully open-source and distributed under the MIT license, allowing developers to use, modify, and commercialise it freely. Developers can access DeepSeek-R1 and its API at chat.deepseek.com. The API offers functionalities for fine-tuning and distillation. "We are living in a timeline where a non-US company is keeping the original mission of OpenAI alive - truly open, frontier research that empowers all," said Jim Fan, Senior Research Manager and Lead of Embodied AI (GEAR Lab) at NVIDIA. Alongside the technical report, the lab also released six distilled models, ranging from 1.5 billion to 70 billion parameters. These models are optimised for efficiency and claim performance levels similar to OpenAI-o1-mini. The models are designed to address tasks in math, code generation, and reasoning with competitive accuracy. Leveraging large-scale reinforcement learning in post-training, DeepSeek-R1 achieves high performance with minimal reliance on labelled data. "Our goal is to explore the potential of LLMs to develop reasoning capabilities without any supervised data, focusing on their self-evolution through a pure RL process," said the team behind DeepSeek. DeepSeek-R1-Zero is built on a pure reinforcement learning (RL) framework, which allows it to develop reasoning capabilities autonomously. Initial evaluations show that it achieved a pass rate of 71% on the AIME 2024 benchmark, an increase from an initial 15.6%. However, the model faced challenges such as poor readability and language mixing. To address these issues, DeepSeek introduced DeepSeek-R1, which incorporated a multi-stage training approach and cold-start data. This method improved the model's performance by refining its reasoning abilities while maintaining clarity in output. "The model has shown performance comparable to OpenAI's o1-1217 on various reasoning tasks," the company said. DeepSeek-R1 achieved a score of 79.8% Pass@1 on AIME 2024, slightly surpassing OpenAI-o1-1217. "I love DeepSeek so much! o1 level model is now open-source (MIT license)," said Paras Chopra, founder of Wingify. "Deepseek R1 is on par with o1 and is open-source!! It blows my mind that Chinese make great, open and transparent tech," said Bindu Reddy, founder of Abacus AI. The launch of R1 comes after DeepSeek recently released DeepSeek-V3, which was touted as the best open-source model. "Whale 🐋 folks, respect," said KissanAI founder Pratik Desai. OpenAI is currently facing controversy over its o3 model due to its undisclosed funding of EpochAI's FrontierMath benchmark and prior access to a significant portion of the test data. Despite these concerns, the company plans to release its new o3 mini model within the next couple of weeks.
[20]
China's cheap, open AI model DeepSeek thrills scientists
A Chinese-built large language model called DeepSeek-R1 is thrilling scientists as an affordable and open rival to 'reasoning' models such as OpenAI's o1. These models generate responses step-by-step, in a process analogous to human reasoning that makes them more adept than earlier language models at solving scientific problems and could make them useful in research. Initial tests of R1, released on 20 January, show that its performance on certain tasks in chemistry, mathematics and coding is on par with that of o1 -- which wowed researchers when it was released by OpenAI in September. "This is wild and totally unexpected," Elvis Saravia, an AI researcher and co-founder of the UK-based AI consulting firm DAIR.AI, wrote on X. R1 stands out for another reason. DeepSeek, the start-up in Hangzhou that built the model, has released it as 'open-weight', meaning that researchers can study and build on the algorithm. Published under an MIT licence, the model can be freely reused but is not considered fully open source, because its training data has not been made available. "The openness of DeepSeek is quite remarkable," says Mario Krenn, leader of the Artificial Scientist Lab at the Max Planck Institute for the Science of Light in Erlangen, Germany. By comparison, o1 and other models built by OpenAI in San Francisco, California, including its latest effort, o3, are "essentially black boxes", he says. DeepSeek hasn't released the full cost of training R1, but it is charging users around one-thirtieth of what o1 costs. The firm has also created mini 'distilled' versions of R1 to allow researchers with limited computing power to play with the model. An "experiment that cost more than £300 with o1, cost less than $10 with R1," says Krenn. "This is a dramatic difference which will certainly play a role in its future adoption." R1 is part of a boom in Chinese large language models (LLMs). Spun out of a hedge fund, DeepSeek emerged from relative obscurity last month when it released a chatbot called V3, which outperformed major rivals, despite being built on a shoestring budget. Experts estimate that it cost around $6 million to rent the hardware needed to train the model, compared with upwards of $60 million for Meta's Llama 3.1 405B, which used 11 times the computing resources. Part of the buzz around DeepSeek is that it has succeeded in making R1 despite US export controls that limit Chinese firms' access to the best computer chips designed for AI processing. "The fact that it comes out of China shows that being efficient with your resources matters more than compute scale alone," says François Chollet, an AI researcher in Seattle, Washington. DeepSeek's progress suggests that "the perceived lead [the] US once had has narrowed significantly," wrote Alvin Wang Graylin, a technology expert in Bellevue, Washington, who works at the Taiwan-based immersive technology firm HTC, on X. "The two countries need to pursue a collaborative approach to building advanced AI vs continuing on the current no-win arms race approach." LLMs train on billions of samples of text, snipping them into word-parts called 'tokens' and learning patterns in the data. These associations allow the model to predict subsequent tokens in a sentence. But LLMs are prone to inventing facts, a phenomenon called 'hallucination', and often struggle to reason through problems. Like o1, R1 uses a 'chain of thought' method to improve an LLM's ability to solve more complex tasks, including sometimes backtracking and evaluating its approach.
DeepSeek made R1 by 'fine-tuning' V3 using reinforcement learning, which rewarded the model for reaching a correct answer and for working through problems in a way that outlined its 'thinking'. Having limited computing power drove the firm to "innovate algorithmically", says Wenda Li, an AI researcher at the University of Edinburgh, UK. During reinforcement learning, the team estimated the model's progress at each stage, rather than evaluating it using a separate network. This helped to reduce training and running costs, says Mateja Jamnik, a computer scientist at the University of Cambridge, UK. The researchers also used a 'mixture-of-experts' architecture, which allows the model to activate only the parts of itself that are relevant for each task. In benchmark tests, reported in a technical paper accompanying the model, DeepSeek-R1 scored 97.3% on the MATH-500 set of mathematics problems created by OpenAI and outperformed 96.3% of human participants in the Codeforces competition. These results are on par with o1's abilities; o3 was not included in the comparisons (see 'AI rivals'). It is hard to tell whether benchmarks capture a model's true ability to reason or generalize, or merely to pass such tests. But because R1 is open, its chain of thought is accessible to researchers, says Marco Dos Santos, a computer scientist at the University of Cambridge. "This allows better interpretability of the model's reasoning processes," he says. Already, scientists are testing R1's abilities. Krenn challenged both rival models to sort 3,000 research ideas by how interesting they are and compared the results with human-made rankings. On this measure, R1 slightly underperformed compared with o1. But R1 beat o1 on certain computations in quantum optics, says Krenn. "This is quite impressive."
[21]
Chinese AI Firm Says Its Open Source New Model Is Beating OpenAI's Most Advanced Publicly Released Model
Unless you've been living under a rock, you're probably aware that AI is developing at a breakneck pace. Early yesterday morning, a gleeful post on X-formerly-Twitter by Chinese AI firm DeepSeek announced a new model called R1 -- a "reasoning" AI model which the outfit says is performing "on par" with OpenAI's o1, a splashy model released last month. And unlike o1, DeepSeek R1 is open source, meaning hobbyists and researchers can tinker with it at home and even release their own versions. The reasoning model is said to narrowly edge out OpenAI's system in "math, code, and reasoning tasks." If the claim holds up -- a big "if," since the results haven't yet been independently verified -- it's an exciting milestone for a much smaller lab in the AI research space, which is currently dominated by deep-pocketed ventures like OpenAI and Apple -- and especially for proponents of open source AI development, with DeepSeek taking a dig at the closed-source OpenAI by celebrating its work as pushing the "boundaries of **open AI**!" Under the open source model, anyone has the legal rights to use, alter, and distribute DeepSeek's software (household-name open source projects include Mozilla Firefox, VLC Media Player, and Linux). In theory, it's the most egalitarian approach to software development. However, not everyone's convinced that open source AI is the way forward; in an interview on the podcast "Tech Won't Save Us," professor of economics at University College London Cecilia Rikap argued that open source AI's development is often still connected to a for-profit business model. "In principle, open source is very positive... the more we share knowledge, the more knowledge we are producing," she said. "What Amazon, Google and Meta have been doing is putting pieces of the puzzle in open source, [which] helps them to gain popularity... It is also a way to get people working for free in improving pieces of the puzzle, which only make sense together with the other pieces, and some of those pieces are kept secret, registered as copyright... basically in the end, those who profit from collaborative development are the same Big Tech." DeepSeek is no exception. The AI firm is a subsidiary of a Hangzhou-based hedge fund called High-Flyer, which trades through the Securities and Futures Commission out of Hong Kong. So on a certain level, yes, it's a win for the little guys. But on another, it's a victory for Chinese financiers, not unlike Meta -- which has also used open source development practices to fuel its rapid market climb. So while this is still an impressive feat for the Chinese tech industry specifically and open source AI development more broadly, only time will tell if the cheap new model will lead to equally egalitarian use cases for the advancement of all, or if it will be yet another investor's footnote in the tech sector's quest for profit. And in the AI horserace, OpenAI's o3 is said to be prepping for launch.
DeepSeek R1, a new open-source AI model, demonstrates advanced reasoning capabilities comparable to proprietary models such as OpenAI's o1, while offering significant cost savings and flexibility for developers and researchers.
DeepSeek, a Chinese AI research company, has unveiled DeepSeek R1, an open-source AI model that rivals proprietary systems such as OpenAI's o1 in reasoning capabilities while offering significant cost advantages 1. The release marks a milestone in the democratization of advanced AI technologies.
DeepSeek R1 demonstrates strong performance in reasoning, coding, and mathematics. The model scores 97.3% on the MATH-500 benchmark and ranks above 96.3% of human participants on Codeforces, putting it on par with OpenAI's o1 in these areas 4. Its ability to handle complex reasoning tasks makes it valuable for applications in fields such as philosophy, law, and decision-making analysis 4.
The model's success is attributed to its unique training pipeline, which employs reinforcement learning (RL) without supervised fine-tuning 2. This approach allows DeepSeek R1 to learn from its own experiences, resulting in more nuanced and human-like problem-solving abilities 3.
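As a rough illustration of how such an RL signal can work without a separate value network, the hypothetical sketch below scores a group of sampled answers with simple rule-based rewards (a visible reasoning trace plus a correct final answer) and normalises each reward against its own group. The tag format, reward values and helper names are assumptions for illustration, not DeepSeek's published recipe.

```python
# A toy, hedged illustration of rule-based rewards plus group-relative
# advantages: each sampled answer is compared with the rest of its own group,
# standing in for a learned value/critic network. All specifics are assumed.
import re
import statistics

def rule_based_reward(completion: str, reference_answer: str) -> float:
    reward = 0.0
    # Reward an explicit reasoning trace wrapped in <think> ... </think> tags.
    if re.search(r"<think>.+?</think>", completion, flags=re.DOTALL):
        reward += 0.5
    # Reward reaching the correct final answer after the reasoning trace.
    final = completion.split("</think>")[-1]
    if reference_answer.strip() in final:
        reward += 1.0
    return reward

def group_relative_advantages(rewards: list[float]) -> list[float]:
    # Advantage of each sample = how much better it is than its own group.
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0
    return [(r - mean) / std for r in rewards]

group = [
    "<think>17*24 = 17*20 + 17*4 = 340 + 68</think> 408",
    "<think>roughly 400</think> 400",
    "408",
]
rewards = [rule_based_reward(c, "408") for c in group]
print(rewards)                              # [1.5, 0.5, 1.0]
print(group_relative_advantages(rewards))   # higher-than-average answers get positive advantages
```

In a full training loop these advantages would weight a policy-gradient update, reinforcing completions that both reason visibly and land on the right answer.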
Released under the MIT license, DeepSeek R1's open-source nature sets it apart from proprietary models. Developers and researchers can freely access, modify, and deploy the model, fostering innovation and collaboration in the AI community 2.
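Because the weights are openly licensed, trying the model locally amounts to pulling a checkpoint with standard tooling. The sketch below uses the Hugging Face transformers library; the repository name is an assumption, and a distilled variant is shown because the full model is far too large for typical consumer hardware.

```python
# A minimal sketch of loading an openly licensed checkpoint with Hugging Face
# transformers. The repo ID below is an assumption; device_map="auto" assumes
# the `accelerate` package is installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prompt = "Solve step by step: what is 17 * 24?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```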
DeepSeek R1 offers substantial cost benefits, with reported query costs as low as $0.02 per million tokens versus OpenAI's $7.00, a reduction of well over 95% 5. This cost-effectiveness, combined with its performance, makes it an attractive option for organizations seeking advanced AI solutions without significant financial investment 1.
The model performs strongly across several domains, including mathematics, coding, and multi-step reasoning.
The release of DeepSeek R1 has generated significant interest in the AI community. Industry leaders have praised its performance and open-source nature, with some suggesting it could pressure established proprietary models 5.
DeepSeek is working on distilled versions of the model, ranging from 32B to 70B parameters, to further improve accessibility and efficiency 4. These developments could potentially expand the model's applications and user base.
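Knowledge distillation of this kind is conceptually simple: a compact student model is trained to match the output distribution of a larger teacher. The toy sketch below shows the usual softened-softmax KL objective; the model sizes, temperature and optimiser settings are illustrative assumptions rather than DeepSeek's actual procedure.

```python
# A toy sketch of knowledge distillation: the student is trained to match the
# teacher's (temperature-softened) token distribution. All sizes and settings
# here are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, d_teacher, d_student, T = 100, 256, 64, 2.0
teacher = nn.Sequential(nn.Embedding(vocab, d_teacher), nn.Linear(d_teacher, vocab)).eval()
student = nn.Sequential(nn.Embedding(vocab, d_student), nn.Linear(d_student, vocab))
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)

tokens = torch.randint(0, vocab, (8, 16))          # a toy batch of token ids
with torch.no_grad():
    teacher_logits = teacher(tokens)               # frozen teacher predictions
student_logits = student(tokens)

# KL divergence between softened teacher and student next-token distributions.
loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * (T * T)
loss.backward()
opt.step()
print(float(loss))
```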
While DeepSeek R1 shows impressive capabilities, it is not without limitations.
Despite these limitations, DeepSeek R1 represents a significant advancement in open-source AI, offering a compelling alternative to proprietary models and potentially reshaping the landscape of AI research and application.
DeepSeek, a Chinese AI company, has launched R1-Lite-Preview, an open-source reasoning model that reportedly outperforms OpenAI's o1 preview in key benchmarks. The model showcases advanced reasoning capabilities and transparency in problem-solving.
11 Sources
DeepSeek's open-source R1 model challenges OpenAI's o1 with comparable performance at a fraction of the cost, potentially revolutionizing AI accessibility and development.
6 Sources
Recent developments in AI models from DeepSeek, Allen Institute, and Alibaba are reshaping the landscape of artificial intelligence, challenging industry leaders and pushing the boundaries of what's possible in language processing and reasoning capabilities.
4 Sources
An in-depth analysis of DeepSeek R1 and OpenAI o3-mini, comparing their performance, capabilities, and cost-effectiveness across various applications in AI and data science.
7 Sources
Chinese AI startup DeepSeek has disrupted the global AI market with its efficient and powerful models, sparking both excitement and controversy in the tech world.
6 Sources
The Outpost is a comprehensive collection of curated artificial intelligence software tools that cater to the needs of small business owners, bloggers, artists, musicians, entrepreneurs, marketers, writers, and researchers.
© 2025 TheOutpost.AI All rights reserved