Curated by THEOUTPOST
On Mon, 7 Apr, 4:02 PM UTC
5 Sources
[1]
DeepSeek readies the next AI disruption with self-improving models
Barely a few months ago, Wall Street's big bet on generative AI had a moment of reckoning when DeepSeek arrived on the scene. Despite its heavily censored nature, the open-source DeepSeek proved that a frontier reasoning AI model doesn't necessarily require billions of dollars and can be pulled off on modest resources. It quickly found commercial adoption by giants such as Huawei, Oppo, and Vivo, while the likes of Microsoft, Alibaba, and Tencent gave it a spot on their platforms. Now, the buzzy Chinese company's next target is self-improving AI models that use a looping judge-reward approach to better themselves.

In a pre-print paper (via Bloomberg), researchers at DeepSeek and China's Tsinghua University describe a new approach that could make AI models more intelligent and efficient in a self-improving fashion. The underlying technique is called self-principled critique tuning (SPCT), a way of training what is technically known as a generative reward model (GRM). In the simplest terms, it is somewhat like creating a feedback loop in real time.

An AI model is conventionally improved by scaling up its size during training, which takes a lot of human work and computing resources. DeepSeek is proposing a system where an underlying "judge" comes up with its own set of critiques and principles for an AI model as it prepares an answer to user queries. That set of critiques and principles is then compared against the static rules at the heart of the AI model and the desired outcome. If there is a high degree of match, a reward signal is generated, which effectively guides the AI to perform even better in the next cycle.

The experts behind the paper refer to the next generation of self-improving AI models as DeepSeek-GRM. Benchmarks listed in the paper suggest that these models perform better than Google's Gemini, Meta's Llama, and OpenAI's GPT-4o models. DeepSeek says these next-gen AI models will be released via the open-source channel.

Self-improving AI?

The topic of AI that can improve itself has drawn some ambitious and controversial remarks. Former Google CEO Eric Schmidt has argued that we might need a kill switch for such systems. "When the system can self-improve, we need to seriously think about unplugging it," Schmidt was quoted as saying by Fortune.

Recursively self-improving AI is not exactly a novel concept. The idea of an ultra-intelligent machine capable of making even better machines traces all the way back to mathematician I.J. Good in 1965. In 2007, AI expert Eliezer Yudkowsky hypothesized about Seed AI, an AI "designed for self-understanding, self-modification, and recursive self-improvement." In 2024, Japan's Sakana AI detailed the concept of an "AI Scientist," a system capable of handling the whole pipeline of a research paper from beginning to end.

In a research paper, Meta's experts described self-rewarding language models, where the AI itself acts as a judge to provide rewards during training. Meta's internal tests on its Llama 2 AI model using the novel self-rewarding technique saw it outperform rivals such as Anthropic's Claude 2, Google's Gemini Pro, and OpenAI's GPT-4 models. Amazon-backed Anthropic has detailed what it calls reward tampering, an unexpected process "where a model directly modifies its own reward mechanism."
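Stepping back to DeepSeek's mechanism for a moment: the looping judge-reward idea can be made concrete with a minimal toy sketch in Python. Everything in it (the class names, the fixed principles, the random score) is a hypothetical stand-in for what would be LLM calls in a real system; it shows the shape of the loop the paper describes, not DeepSeek's actual implementation.

```python
# Toy sketch of the judge-reward feedback loop described above. The class
# names, the hard-coded principles, and the random scoring stub are
# hypothetical stand-ins for what would really be LLM calls.
import random

class ToyJudge:
    """Plays the role of the generative reward model (GRM)."""

    def principles(self, query: str) -> list[str]:
        # SPCT has the judge write query-specific principles on the fly;
        # a real system would generate these with the model itself.
        return ["be direct", "be factually correct", "explain the reasoning"]

    def critique_and_score(self, query: str, response: str,
                           principles: list[str]) -> float:
        # A real GRM writes a textual critique against each principle and
        # distils it into a score; this stub just returns a random one.
        return sum(random.random() for _ in principles) / len(principles)

class ToyPolicy:
    """Plays the role of the model being improved."""

    def answer(self, query: str) -> str:
        return f"stub answer to: {query}"

    def update(self, reward: float) -> None:
        # In RL fine-tuning, a high reward reinforces the behaviour that
        # produced the sampled response.
        print(f"reward={reward:.2f} -> reinforce this behaviour")

judge, policy = ToyJudge(), ToyPolicy()
for _ in range(3):  # the self-improving cycle
    query = "Why is the sky blue?"
    response = policy.answer(query)
    rules = judge.principles(query)
    policy.update(judge.critique_and_score(query, response, rules))
```

The departure from conventional reward models is that the principles are generated per query rather than fixed in advance; SPCT is the training that makes the judge good at writing them.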
Google is not too far behind on the idea. In a study published in the journal Nature earlier this month, experts at Google DeepMind showcased an AI algorithm called Dreamer that can self-improve, using the game Minecraft as a testbed. Experts at IBM are working on their own approach, called deductive closure training, in which an AI model evaluates its own responses against its training data to improve itself.

The whole premise, however, isn't all sunshine and rainbows. Research suggests that when AI models try to train themselves on self-generated synthetic data, it leads to defects colloquially known as "model collapse." It would be interesting to see just how DeepSeek executes the idea, and whether it can do it in a more frugal fashion than its rivals from the West.
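The self-evaluation idea behind approaches like IBM's can be illustrated with a toy sketch: filter a model's own generations against trusted training data and keep only the consistent ones for further fine-tuning. The facts, the word-overlap heuristic, and the threshold below are all invented for illustration; the real method is considerably more involved.

```python
# Toy sketch of self-training with a consistency filter, loosely in the
# spirit of the deductive-closure idea described above. The trusted facts
# and the word-overlap heuristic are invented for illustration; a real
# system would score consistency with the model itself.

trusted_facts = [
    "the sky appears blue because of rayleigh scattering",
    "water boils at 100 c at sea level",
]

def consistent(candidate: str, facts: list[str]) -> bool:
    # Crude stand-in for a consistency check: accept a self-generated
    # statement only if most of its words appear in some trusted fact.
    words = set(candidate.lower().split())
    return any(len(words & set(f.split())) / len(words) >= 0.6 for f in facts)

self_generated = [
    "the sky appears blue because of rayleigh scattering of sunlight",
    "the sky is blue because oceans reflect their colour onto it",
]

# Keep only generations that survive the check; a real pipeline would then
# fine-tune the model on this filtered set rather than on raw synthetic data.
fine_tune_set = [s for s in self_generated if consistent(s, trusted_facts)]
print(fine_tune_set)  # only the first statement survives
```

Filtering of this kind is one pragmatic answer to the model-collapse risk mentioned above: the model never trains on self-generated text that nothing vouches for.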
[2]
DeepSeek is developing self-improving AI models. Here's how it works
DeepSeek and China's Tsinghua University say they have found a way that could make AI models more intelligent and efficient.

Chinese AI start-up DeepSeek has introduced a new way to improve the reasoning capabilities of large language models (LLMs) to deliver better and faster results to general queries than its competitors. DeepSeek sparked a frenzy in January when it came onto the scene with R1, an artificial intelligence (AI) model and chatbot that the company claimed was cheaper than, and performed just as well as, OpenAI's rival ChatGPT model.

Collaborating with researchers from China's Tsinghua University, DeepSeek said in its latest paper, released on Friday, that it had developed a technique for self-improving AI models. The underlying technology is called self-principled critique tuning (SPCT), which trains AI to develop its own rules for judging content and then to use those rules to provide detailed critiques. It gets better results by running several evaluations simultaneously rather than by using larger models. The broader approach is known as generative reward modeling (GRM): a machine-learning system that checks and rates what AI models produce, making sure outputs match what humans ask for, with SPCT as the training recipe.

Usually, improving AI requires making models bigger during training, which takes a lot of human effort and computing power. Instead, DeepSeek has created a system with a built-in "judge" that evaluates the AI's answers in real time. When you ask a question, this judge compares the AI's planned response against both the AI's core rules and what a good answer should look like. If there's a close match, the AI gets positive feedback, which helps it improve.

DeepSeek calls this self-improving system "DeepSeek-GRM". The researchers said this would help models perform better than competitors such as Google's Gemini, Meta's Llama, and OpenAI's GPT-4o. DeepSeek plans to make these advanced AI models available as open-source software, but no timeline has been given.

The paper's release comes as rumours swirl that DeepSeek is set to unveil its latest R2 chatbot. But the company has not commented publicly on any such new release.
[3]
DeepSeek to Release Open-Source Model With Enhanced Reward Modeling Techniques
DeepSeek AI, in collaboration with Tsinghua University, has unveiled a new research study on improving reward modelling in large language models with more inference-time compute. The research led to a model named DeepSeek-GRM, which the company says will be released as open source.

The authors propose a novel method called Self-Principled Critique Tuning (SPCT) to develop scalable reward-generation behaviours in generative reward models (GRMs). Simply put, this method teaches AI models to develop their own guiding principles and critiques as they process information and reason, which makes self-evaluation more effective across various types of tasks.

DeepSeek-GRM is a 27-billion-parameter AI model post-trained with SPCT on top of Google's open-source Gemma-2-27B model. To further increase effectiveness, the research proposes running multiple samples, or responses, simultaneously, spending more computing power at inference time. DeepSeek-GRM-27B consistently achieved strong results across diverse reward-modelling benchmarks; the research paper discusses the scores and the methodology in depth.

A few weeks ago, DeepSeek released an update to its DeepSeek-V3 model. The updated model, 'DeepSeek V3-0324', currently ranks highest in benchmarks among all non-reasoning models. Artificial Analysis, a platform that benchmarks AI models, stated, "This is the first time an open weights model is the leading non-reasoning model, marking a milestone for open source." The model scored the highest among all non-reasoning models on the platform's 'Intelligence Index'.

Recently, Reuters reported that DeepSeek plans to release R2 "as early as possible". The company initially intended to launch it in early May but is now contemplating an earlier timeline. The model is expected to produce "better coding" and to reason in languages beyond English. DeepSeek-R2 will be the successor to the DeepSeek-R1 reasoning model, which created quite a storm in both the AI ecosystem and the markets.
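The "multiple samples" idea lends itself to a short sketch: rather than relying on one verdict from a bigger judge, several independent evaluations of the same response are drawn and aggregated. The sample_judgment stub below is a hypothetical stand-in for one full principle-critique-score pass of a GRM.

```python
# Toy sketch of inference-time scaling for a reward model: draw several
# independent judgments of one response and aggregate them, instead of
# using a single pass or a larger judge. sample_judgment() is a stand-in
# for one full principle-critique-score pass of a GRM.
import random
import statistics
from concurrent.futures import ThreadPoolExecutor

def sample_judgment(query: str, response: str) -> float:
    # One stochastic evaluation; a real GRM call would generate fresh
    # principles and a fresh critique each time, giving varied scores.
    return random.gauss(0.7, 0.1)

def aggregated_reward(query: str, response: str, k: int = 8) -> float:
    # Run k evaluations concurrently and average them: more samples cost
    # more compute but yield a more stable final reward.
    with ThreadPoolExecutor(max_workers=k) as pool:
        scores = pool.map(lambda _: sample_judgment(query, response), range(k))
        return statistics.mean(scores)

print(f"aggregated reward: {aggregated_reward('2+2?', 'It equals 4.'):.3f}")
```

Averaging or voting across samples is how a smaller judge can trade extra inference-time compute for a more reliable reward signal, which is the trade-off the paper explores.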
[4]
DeepSeek to Open Source its Inference Engine
The announcement emphasises DeepSeek AI's dedication to open-sourcing key components and libraries of its models.

Chinese AI lab DeepSeek on Monday announced its intention to open-source its inference engine. To achieve this, the company is "collaborating closely" with existing open-source projects and frameworks. Previously, when the company considered open-sourcing the engine, it identified challenges such as significant codebase divergence from the original framework, extensive infrastructure dependencies, and a limited capacity to maintain a large-scale public project.

Recently, during Open Source Week, the company released five high-performance AI infrastructure tools as open-source libraries, which improve the scalability, deployment, and efficiency of training large language models. "It's an honour to contribute to this thriving [open source] ecosystem and to see our models and code embraced by the community. Together, let's push the boundaries of AGI and ensure its benefits serve all of humanity," said DeepSeek in the announcement.

The company also recently unveiled, in collaboration with Tsinghua University, the research study on reward modelling described above, which resulted in the DeepSeek-GRM model slated for open-source release.
[5]
DeepSeek and Tsinghua Developing Self-Improving AI Models
DeepSeek's AI revamp strategy uses fewer computing resources.

DeepSeek is working with Tsinghua University on reducing the amount of training its AI models need, in an effort to lower operational costs. The Chinese startup, which roiled markets with the low-cost reasoning model it released in January, collaborated with researchers from the Beijing institution on a paper detailing a novel approach to reinforcement learning that makes models more efficient.

The new method aims to help artificial intelligence models better adhere to human preferences by offering rewards for more accurate and understandable responses, the researchers wrote. Reinforcement learning has proven effective in speeding up AI tasks in narrow applications. Expanding it to more general applications, however, has proven challenging, and that is the problem DeepSeek's team is trying to solve with something it calls self-principled critique tuning. The strategy outperformed existing methods and models on various benchmarks while using fewer computing resources, according to the paper.

DeepSeek is calling these new models DeepSeek-GRM, short for "generalist reward modeling", and will release them on an open-source basis, the company said. Other AI developers, including Chinese tech giant Alibaba Group Holding Ltd. and San Francisco-based OpenAI, are also pushing into a new frontier of improving reasoning and self-refining capabilities while an AI model performs tasks in real time.

Menlo Park, California-based Meta Platforms Inc. released its latest family of AI models, Llama 4, over the weekend and marked them as its first to use the Mixture of Experts (MoE) architecture. DeepSeek's models rely significantly on MoE to make more efficient use of resources, and Meta benchmarked its new release against the Hangzhou-based startup. DeepSeek hasn't specified when it might release its next flagship model.

© 2025 Bloomberg LP
Chinese AI startup DeepSeek, in collaboration with Tsinghua University, introduces a novel approach to creating self-improving AI models, potentially revolutionizing the field with more efficient and intelligent systems.
Chinese AI startup DeepSeek, in collaboration with Tsinghua University, has unveiled a groundbreaking approach to creating self-improving AI models. This development could revolutionize the field of artificial intelligence by making models more efficient and intelligent [1].
The core of DeepSeek's innovation is a technique called Self-Principled Critique Tuning (SPCT), which trains an AI to develop its own rules for judging content and then use those rules to provide detailed critiques. Applied to what is known as Generative Reward Modeling (GRM), this creates a feedback loop that lets the AI improve its performance in real time [2].
The resulting model, named DeepSeek-GRM, is a 27-billion-parameter AI system based on Google's open-source Gemma-2-27B model. According to the researchers, DeepSeek-GRM outperforms competitors like Google's Gemini, Meta's Llama, and OpenAI's GPT-4o on various benchmarks [3].
In line with its commitment to open-source development, DeepSeek plans to release these advanced AI models as open-source software. The company has also announced its intention to open-source its inference engine, further contributing to the AI community [4].
DeepSeek's innovations have already made waves in the AI industry. The company's previous model, DeepSeek-R1, created a stir in the market with its low-cost, high-performance capabilities. Now, with DeepSeek-GRM, the company is poised to push the boundaries of AI technology even further [5].
While the concept of self-improving AI holds immense potential, it also raises important questions about control and safety. Former Google CEO Eric Schmidt has suggested the need for a "kill switch" for such systems, highlighting the complex ethical considerations surrounding this technology [1].
As DeepSeek continues to develop its self-improving AI models, the industry watches closely. The company's ability to create high-performance models with relatively modest resources could potentially disrupt the AI landscape, challenging the dominance of well-funded Western tech giants.
References

[1] DeepSeek readies the next AI disruption with self-improving models
[2] DeepSeek is developing self-improving AI models. Here's how it works
[3] Analytics India Magazine | DeepSeek to Release Open-Source Model With Enhanced Reward Modeling Techniques
[4] AIM Media House | DeepSeek to Open Source its Inference Engine
[5] DeepSeek and Tsinghua Developing Self-Improving AI Models