7 Sources
[1]
Secrets of DeepSeek AI model revealed in landmark paper
The success of DeepSeek's powerful artificial intelligence (AI) model R1 -- which sent the US stock market plummeting when it was released in January -- did not hinge on being trained on the output of its rivals, researchers at the Chinese firm have said. The statement came in documents released alongside a peer-reviewed version of the R1 model, published today in Nature. R1 is designed to excel at 'reasoning' tasks such as mathematics and coding, and is a cheaper rival to tools developed by US technology firms. As an 'open weight' model, it is available for anyone to download and is the most popular such model on the AI community platform Hugging Face to date, having been downloaded 10.9 million times.

The paper updates a preprint released in January, which describes how DeepSeek augmented a standard large language model (LLM) to tackle reasoning tasks. Its supplementary material reveals for the first time how much R1 cost to train: the equivalent of just US$294,000. This comes on top of the roughly $6 million that the company, based in Hangzhou, spent to make the base LLM that R1 is built on, but the total is still substantially less than the tens of millions of dollars that rival models are thought to have cost. DeepSeek says R1 was trained mainly on Nvidia's H800 chips, which US export controls barred from sale to China in 2023.

R1 is thought to be the first major LLM to undergo the peer-review process. "This is a very welcome precedent," says Lewis Tunstall, a machine-learning engineer at Hugging Face who reviewed the Nature paper. "If we don't have this norm of sharing a large part of this process publicly, it becomes very hard to evaluate whether these systems pose risks or not." In response to peer-review comments, the DeepSeek team reduced the anthropomorphizing language in its descriptions and added clarifications of technical details, including the kinds of data the model was trained on and its safety. "Going through a rigorous peer-review process certainly helps verify the validity and usefulness of the model," says Huan Sun, an AI researcher at Ohio State University in Columbus. "Other firms should do the same."

DeepSeek's major innovation was to use an automated trial-and-error approach known as pure reinforcement learning to create R1. The process rewarded the model for reaching correct answers, rather than teaching it to follow human-selected reasoning examples. The company says that this is how its model learnt its own reasoning-like strategies, such as how to verify its workings without following human-prescribed tactics. To boost efficiency, the model also scored its own attempts, using estimates derived from groups of its answers rather than a separate algorithm, a technique known as group relative policy optimization (GRPO). The model has been "quite influential" among AI researchers, says Sun. "Almost all work in 2025 so far that conducts reinforcement learning in LLMs might have been inspired by R1 one way or another."

Media reports in January suggested that researchers at OpenAI -- the company, based in San Francisco, California, that created ChatGPT and the 'o' series of reasoning models -- thought DeepSeek had used outputs from OpenAI models to train R1, a method that could have accelerated the model's abilities while using fewer resources. DeepSeek has not published its training data as part of the paper. But, in exchanges with referees, the firm's researchers stated that R1 did not learn by copying reasoning examples generated by OpenAI models.
However, they acknowledged that, like most other LLMs, R1's base model was trained on the web, so it will have ingested any AI-generated content already on the Internet. This rebuttal is "as convincing as what we could see in any publication", says Sun. Tunstall adds that although he can't be 100% sure R1 wasn't trained on OpenAI examples, replication attempts by other labs suggest that DeepSeek's recipe for reasoning is probably good enough not to have needed it. "I think the evidence now is fairly clear that you can get very high performance just using pure reinforcement learning," he says.

For researchers, R1 is still very competitive, Sun says. In ScienceAgentBench, a challenge to complete scientific tasks such as analyzing and visualizing data, Sun and colleagues found that although R1 was not first for accuracy, it was one of the best models at balancing ability with cost. Other researchers are now trying to apply the methods used to create R1 to improve the reasoning-like abilities of existing LLMs, and to extend them to domains beyond mathematics and coding, says Tunstall. In that way, he adds, R1 has "kick-started a revolution".
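To make the group-scoring idea concrete, here is a minimal Python sketch of the group-relative advantage computation at the heart of GRPO as described above: each question is answered several times, and each answer is scored against the mean and spread of its own group, with no separately trained critic model. The function and variable names are illustrative, not DeepSeek's actual code.

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style scoring: baseline each reward against its own group.

    `rewards` holds the scores (e.g. 1.0 for a correct final answer,
    0.0 otherwise) for several sampled answers to the *same* question.
    Instead of a separately trained value network, the group mean acts
    as the baseline and the group standard deviation normalizes scale.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero spread
    return [(r - mean) / std for r in rewards]

# Four sampled answers to one maths problem; only two were correct.
rewards = [1.0, 0.0, 1.0, 0.0]
print(group_relative_advantages(rewards))  # [1.0, -1.0, 1.0, -1.0]
```

Dispensing with the critic is the efficiency gain the article alludes to: the baseline comes for free from the sampled group.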
[2]
Secrets of Chinese AI Model DeepSeek Revealed in Landmark Paper
The first peer-reviewed study of the DeepSeek AI model shows how a Chinese start-up firm made the market-shaking LLM for $300,000.

The success of DeepSeek's powerful artificial intelligence (AI) model R1 -- which sent the US stock market plummeting when it was released in January -- did not hinge on being trained on the output of its rivals, researchers at the Chinese firm have said. The statement came in documents released alongside a peer-reviewed version of the R1 model, published today in Nature. R1 is designed to excel at 'reasoning' tasks such as mathematics and coding, and is a cheaper rival to tools developed by US technology firms. As an 'open weight' model, it is available for anyone to download and is the most popular such model on the AI community platform Hugging Face to date, having been downloaded 10.9 million times.

The paper updates a preprint released in January, which describes how DeepSeek augmented a standard large language model (LLM) to tackle reasoning tasks. Its supplementary material reveals for the first time how much R1 cost to train: the equivalent of just US$294,000. This comes on top of the roughly $6 million that the company, based in Hangzhou, spent to make the base LLM that R1 is built on, but the total is still substantially less than the tens of millions of dollars that rival models are thought to have cost. DeepSeek says R1 was trained mainly on Nvidia's H800 chips, which US export controls barred from sale to China in 2023.

R1 is thought to be the first major LLM to undergo the peer-review process. "This is a very welcome precedent," says Lewis Tunstall, a machine-learning engineer at Hugging Face who reviewed the Nature paper. "If we don't have this norm of sharing a large part of this process publicly, it becomes very hard to evaluate whether these systems pose risks or not." In response to peer-review comments, the DeepSeek team reduced the anthropomorphizing language in its descriptions and added clarifications of technical details, including the kinds of data the model was trained on and its safety. "Going through a rigorous peer-review process certainly helps verify the validity and usefulness of the model," says Huan Sun, an AI researcher at Ohio State University in Columbus. "Other firms should do the same."

DeepSeek's major innovation was to use an automated trial-and-error approach known as pure reinforcement learning to create R1. The process rewarded the model for reaching correct answers, rather than teaching it to follow human-selected reasoning examples. The company says that this is how its model learnt its own reasoning-like strategies, such as how to verify its workings without following human-prescribed tactics. To boost efficiency, the model also scored its own attempts, using estimates derived from groups of its answers rather than a separate algorithm, a technique known as group relative policy optimization (GRPO). The model has been "quite influential" among AI researchers, says Sun. "Almost all work in 2025 so far that conducts reinforcement learning in LLMs might have been inspired by R1 one way or another."
Media reports in January suggested that researchers at OpenAI -- the company, based in San Francisco, California, that created ChatGPT and the 'o' series of reasoning models -- thought DeepSeek had used outputs from OpenAI models to train R1, a method that could have accelerated the model's abilities while using fewer resources. DeepSeek has not published its training data as part of the paper. But, in exchanges with referees, the firm's researchers stated that R1 did not learn by copying reasoning examples generated by OpenAI models. However, they acknowledged that, like most other LLMs, R1's base model was trained on the web, so it will have ingested any AI-generated content already on the Internet. This rebuttal is "as convincing as what we could see in any publication", says Sun. Tunstall adds that although he can't be 100% sure R1 wasn't trained on OpenAI examples, replication attempts by other labs suggest that DeepSeek's recipe for reasoning is probably good enough not to have needed it. "I think the evidence now is fairly clear that you can get very high performance just using pure reinforcement learning," he says.

For researchers, R1 is still very competitive, Sun says. In ScienceAgentBench, a challenge to complete scientific tasks such as analyzing and visualizing data, Sun and colleagues found that although R1 was not first for accuracy, it was one of the best models at balancing ability with cost. Other researchers are now trying to apply the methods used to create R1 to improve the reasoning-like abilities of existing LLMs, and to extend them to domains beyond mathematics and coding, says Tunstall. In that way, he adds, R1 has "kick-started a revolution."
[3]
DeepSeek didn't really train its flagship model for $294,000
Training costs detailed in the R1 training report don't include the 2.79 million GPU hours that laid its foundation.

Chinese AI darling DeepSeek's now-infamous R1 research report was published in the journal Nature this week, alongside new information on the compute resources required to train the model. Unfortunately, some people got the wrong idea about just how expensive it was to create. The disclosures led some to believe the company had actually managed to train the model at a cost of just $294,000, a figure much lower than previously reported. In reality, the true cost to train the model was roughly 20x that. At least.

The confusion stemmed from the supplementary information released alongside the original January paper, in which the AI model dev revealed it had used just 64 eight-way H800 boxes, totaling 512 GPUs, running at full tilt for 198 hours to train the preliminary R1-Zero release, and for another 80 hours or so to complete R1 itself. Along with about 5,000 GPU hours to generate the supervised fine-tuning datasets used in the training process, the entire endeavor came out to a hair under $300,000 -- a pretty damning figure considering the tens of billions of dollars American model devs have burned this year alone.

But that's not actually what happened. Never mind that $300,000 won't buy you anywhere close to 512 H800s (those estimates are based on GPU lease rates, not actual hardware costs); the researchers aren't talking about end-to-end model training. Instead, the paper focuses on the application of reinforcement learning used to imbue DeepSeek's existing V3 base model with "reasoning" or "thinking" capabilities. In other words, the company had already done about 95 percent of the work by the time it reached the RL phase detailed in this paper.

There are several ways to approach reinforcement learning, but in a nutshell, it is a post-training process that typically involves reinforcing stepwise reasoning by rewarding models for correct answers, encouraging more accurate responses in the process. The paper very clearly centers on the application of Group Relative Policy Optimization (GRPO), the specific reinforcement learning technique used in the model's training. Headlines touting the $294,000 training cost, however, appear to have conflated this post-training reinforcement learning with the far more costly pre-training process used to build DeepSeek V3.

How do we know? Because DeepSeek's research team disclosed how much compute it used to train the base model. According to that paper, DeepSeek V3 was trained on 2,048 H800 GPUs for approximately two months. In total, the model required 2.79 million GPU hours at an estimated cost of $5.58 million. Since you can't have R1 without first building V3, the actual cost of the model was closer to $5.87 million.

Whether or not these figures have been intentionally understated to cast Western model devs as frivolous hype fiends is a subject of intense debate. It's also worth pointing out that the cost figures are based on the assumption that those H800 GPUs could be rented for $2/hr. By our estimate, the purchase cost of the 256 GPU servers used to train the models is somewhere north of $51 million. And that doesn't take into account research and development, data acquisition, data cleaning, or any false starts or wrong turns on the way to making a successful model. Overall, the idea that DeepSeek was substantially cheaper or more efficient to train than Western models appears to be overblown.
DeepSeek V3 and R1 are roughly comparable to Meta's Llama 4 in terms of compute. Llama 4 required between 2.38M (Maverick) and 5M (Scout) hours to train, but was trained on between 22 and 40 trillion tokens. DeepSeek V3 is larger than Llama 4 Maverick, but used significantly fewer training tokens at 14.8 trillion. In other words, Meta trained a slightly smaller model in slightly fewer GPU hours using significantly more training data.®
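As a rough sanity check on the figures above, the sketch below reproduces the arithmetic from the reported GPU hours and the $2/hour H800 lease rate the article assumes; the hour counts are the ones quoted, and the script itself is illustrative only.

```python
GPU_HOUR_RATE = 2.00  # assumed H800 lease rate, USD per GPU hour

# R1 reinforcement-learning phase: 512 GPUs for ~198 h (R1-Zero) plus
# ~80 h more to finish R1, plus ~5,000 GPU hours of SFT data generation.
r1_gpu_hours = 512 * (198 + 80) + 5_000
r1_cost = r1_gpu_hours * GPU_HOUR_RATE

# DeepSeek V3 base-model pre-training: 2.79 million GPU hours.
v3_gpu_hours = 2_790_000
v3_cost = v3_gpu_hours * GPU_HOUR_RATE

print(f"R1 RL phase:  {r1_gpu_hours:>9,} GPU h  ~ ${r1_cost:,.0f}")
print(f"V3 pre-train: {v3_gpu_hours:>9,} GPU h  ~ ${v3_cost:,.0f}")
print(f"Combined:                        ~ ${r1_cost + v3_cost:,.0f}")
```

At the assumed lease rate this lands on $294,672 for the RL phase (a hair under $300,000) and about $5.87 million combined, matching the figures in the text.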
[4]
In rare disclosure, DeepSeek claims R1 model training cost just $294K
Bottom line: China's DeepSeek has released detailed cost figures for training its R1 artificial intelligence model, providing rare insight into its development and drawing renewed scrutiny of the company's methods and resources. The Hangzhou-based startup said the model was trained for $294,000 using 512 Nvidia H800 chips, a cost far below estimates for US competitors and one that may intensify questions about how Beijing-backed firms are advancing in the global AI race.

The disclosure appeared in a peer-reviewed Nature paper this week, co-authored by founder Liang Wenfeng. The publication marks a rare move for DeepSeek, which has revealed little since its surprise debut on the international stage earlier this year. In January, the company's launch of lower-cost AI systems rattled markets, sending shares of major technology firms down as investors worried the competitive landscape could shift.

The reported $294,000 training cost stands in sharp contrast to estimates for US companies. OpenAI chief executive Sam Altman said in 2023 that training its foundation models cost "much more" than $100 million, though no detailed figures were provided. DeepSeek researchers said the R1 model was trained over 80 hours on a 512-chip cluster of Nvidia H800s, hardware the US chipmaker designed specifically for China's restricted market. A supplementary filing also acknowledged for the first time that DeepSeek owns Nvidia A100 units, which were used in early experiments with smaller models before the team shifted to H800 hardware.

Although the figures outlined in Nature suggest unusually low expenditures for training a frontier model, industry experts have raised doubts. Research firm SemiAnalysis reported that DeepSeek operated at a far larger scale than initially indicated, with access to roughly 50,000 Nvidia Hopper GPUs, including 10,000 H800s and 10,000 H100s. The firm argued that the widely cited $5.5 million pre-training figure represented only a narrow portion of the company's true costs. According to SemiAnalysis, DeepSeek invested about $1.6 billion in servers, incurred roughly $944 million in operating costs, and spent more than $500 million specifically on GPUs. The findings challenge the perception that DeepSeek built frontier AI systems at only a fraction of US costs.

Beyond financials, the company also addressed longstanding questions about the origins of its models. Critics, including US officials and AI executives, have alleged that DeepSeek's progress relied heavily on distillation - a method in which a new model is trained on the outputs of another, allowing it to replicate knowledge at lower cost. DeepSeek has consistently defended the practice, saying it enables more efficient systems that can be deployed affordably at scale. The company previously acknowledged incorporating Meta's open-source Llama in some distilled models. In its Nature paper, DeepSeek researchers further admitted that training data for its V3 model included "a significant number" of responses generated by OpenAI systems. They described this as incidental, the result of crawled web data, rather than a deliberate attempt to replicate outside models.

Taken together, the cost disclosures, disputed claims, and methodological debates highlight the difficulty of verifying DeepSeek's true capabilities.
Since its debut in January, the company has rolled out incremental product updates while keeping a relatively low public profile. Still, evidence of cost efficiency and alternative development methods could increase pressure on US firms grappling with soaring training expenses.
[5]
We Finally Know How Much It Cost to Train China's Astonishing DeepSeek Model
Remember when DeepSeek briefly shook up the entire artificial intelligence industry by launching its large language model, R1, which was trained for a fraction of the money that OpenAI and other big players were pouring into their models? Thanks to a new paper published by the DeepSeek AI team in the journal Nature, we finally know what it took to train R1: $294,000 and 512 Nvidia H800 chips.

The reason it was able to spend less, it seems, is the team's use of trial-and-error-based reinforcement learning techniques. Most AI models tasked with performing reasoning tasks need to be trained on human-annotated data and demonstrations to "learn" how to solve certain problems, which is both expensive and time-consuming to scale as models are given more challenging tasks. DeepSeek found that it could improve the reasoning and outputs of its model simply by incentivizing it to perform a trial-and-error process until it gets the right answer.

In an article accompanying the paper, Carnegie Mellon University assistant professor Daphne Ippolito and PhD student Yiming Zhang explain the reinforcement method by comparing it to a child playing a video game: "As the child navigates their avatar through the game world, they learn through trial and error that some actions (such as collecting gold coins) earn points, whereas others (such as running into enemies) set their score back to zero. In a similar vein, DeepSeek-R1 was awarded a high score when it answered questions correctly and a low score when it gave wrong answers."

Previous research showed that a prompting approach -- asking an LLM to provide a step-by-step explanation of how it comes to its output -- produces more accurate answers. But the DeepSeek team figured out a way to get better answers through reinforcement, by assigning a scoring system to the outputs that R1 produced. That works particularly well with math and programming questions, which usually have a verifiably correct answer. By using this method instead of human-guided reasoning, the LLM was able to come to a correct conclusion on its own as it sought the higher scores.

While the outputs of this method appear to be more accurate, it also obscures the machine's "thought" process for humans trying to follow along. Asked to produce a reasoning trail for its answer, the model would sometimes switch back and forth between English and Chinese. It also produced explanations that ran to 10,000 words or more. And the method worked well only for questions with clear right or wrong answers, rather than for more nuanced or subjective prompts.

Regardless, it's an interesting window into how DeepSeek has managed to be competitive on a smaller budget. Still, the company itself faces plenty of skepticism because of its perceived closeness to the Chinese government. Just recently, researchers showed The Washington Post that the company's model would refuse to produce code, or would produce code with major security flaws, when the prompter indicated that they were working with groups considered sensitive by the Chinese government. The researchers also found that the model spat out less secure code when asked to produce work for Tibet, Taiwan, the Falun Gong religious movement, or the Islamic State.
[6]
DeepSeek releases R1 model trained for $294,000 on 512 H800 GPUs
The Chinese company DeepSeek AI has released its large language model, R1, which was trained for only $294,000 using 512 Nvidia H800 GPUs. In a paper published in the journal Nature, the company detailed how it achieved this low cost by using a trial-and-error reinforcement learning method, allowing the model to achieve competitive performance against rivals with much larger budgets, such as OpenAI.

DeepSeek's key innovation was to move away from the expensive, human-intensive process of creating annotated datasets. Traditional AI models for reasoning tasks are often trained on vast datasets in which human experts provide step-by-step solutions to complex problems. Instead, DeepSeek developed an autonomous learning system that uses reinforcement learning to refine the model's reasoning skills through a system of rewards and penalties.

Researchers from Carnegie Mellon University, in an article accompanying the Nature paper, compared the process to a child learning to play a video game: "As the child navigates their avatar through the game world, they learn through trial and error that some actions (such as collecting gold coins) earn points, whereas others (such as running into enemies) set their score back to zero. In a similar vein, DeepSeek-R1 was awarded a high score when it answered questions correctly and a low score when it gave wrong answers."

This method was particularly effective for tasks in mathematics and programming, where answers can be definitively verified as right or wrong. The model would generate potential solutions, which were then evaluated by an automated scoring system. It would then iterate on its approach until it achieved the highest score, all without human intervention. This efficient, self-directed process allowed the company to build a powerful AI system with a fraction of the investment required by its competitors.

While the reinforcement learning approach proved cost-effective, it also has limitations. The model's outputs often hide the underlying reasoning steps, making it difficult for a human to understand how it arrived at a conclusion. When asked to provide its reasoning, R1 generated extremely long and hard-to-read explanations -- sometimes over 10,000 words -- that switched between English and Chinese. The technique also struggled with tasks requiring nuance or subjectivity, where there is no single "correct" answer.

Beyond its technical limitations, the model's development in China has raised concerns about potential government influence. A recent report from The Washington Post found that R1 exhibited biases in its outputs. Researchers discovered that the model would sometimes refuse to generate code outright, or would generate code with major security flaws, when the prompts involved groups considered sensitive by Chinese authorities. When asked to create code for entities such as Tibet, Taiwan, or the Falun Gong religious movement, the model produced less secure versions with built-in vulnerabilities. This suggests that the model's behavior may be shaped by the political priorities of the Chinese government.
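The generate-score-iterate loop described above is straightforward to sketch for a task with a mechanically verifiable answer. The following is a minimal, hypothetical illustration, not DeepSeek's code: `sample_answers` and `update_policy` are placeholders standing in for the model and the reinforcement-learning update, while the reward function plays the role of the automated verifier.

```python
import random

def sample_answers(question, n=4):
    """Placeholder for the model proposing n candidate answers."""
    return [random.choice([70, 72, 75]) for _ in range(n)]

def reward(candidate, correct_answer):
    """Automated verifier: 1.0 for a correct final answer, 0.0 otherwise.

    This only works when correctness can be checked mechanically --
    the mathematics and programming setting the article describes.
    """
    return 1.0 if candidate == correct_answer else 0.0

def update_policy(candidates, rewards):
    """Placeholder for the RL step that reinforces high-reward answers."""
    for c, r in zip(candidates, rewards):
        print(f"answer {c}: reward {r}")

# One round of the loop for a question with a verifiable answer.
question = "What is 8 * 9?"
candidates = sample_answers(question)
rewards = [reward(c, 72) for c in candidates]
update_policy(candidates, rewards)
```

No human annotator appears anywhere in the loop; the verifier supplies the training signal, which is why the approach scales cheaply for verifiable tasks but struggles on subjective ones.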
[7]
China's DeepSeek says its hit AI model cost just $294,000 to train - The Economic Times
Chinese AI firm DeepSeek revealed its R1 model was trained for just $294,000 using 512 Nvidia H800 chips, far below US rivals' costs. The disclosure revives debates over China's AI progress, export restrictions, and transparency, with skepticism over DeepSeek's true access to banned Nvidia hardware.

Chinese AI developer DeepSeek said it spent $294,000 on training its R1 model, much lower than figures reported for US rivals, in a paper that is likely to reignite debate over Beijing's place in the race to develop artificial intelligence. The rare update from the Hangzhou-based company - the first estimate it has released of R1's training costs - appeared in a peer-reviewed article in the academic journal Nature published on Wednesday.

DeepSeek's release of what it said were lower-cost AI systems in January prompted global investors to dump tech stocks as they worried the new models could threaten the dominance of AI leaders including Nvidia. Since then, the company and founder Liang Wenfeng have largely disappeared from public view, apart from pushing out a few new product updates.

The Nature article, which listed Liang as one of the co-authors, said DeepSeek's reasoning-focused R1 model cost $294,000 to train and used 512 Nvidia H800 chips. A previous version of the article, published in January, did not contain this information. Sam Altman, CEO of US AI giant OpenAI, said in 2023 that what he called "foundational model training" had cost "much more" than $100 million, though his company has not given detailed figures for any of its releases. Training costs for the large language models powering AI chatbots refer to the expenses incurred from running a cluster of powerful chips for weeks or months to process vast amounts of text and code.

Some of DeepSeek's statements about its development costs and the technology it used have been questioned by US companies and officials. The H800 chips it mentioned were designed by Nvidia for the Chinese market after the US in October 2022 made it illegal for the company to export its more powerful H100 and A100 AI chips to China. US officials told Reuters in June that DeepSeek has access to "large volumes" of H100 chips that were procured after US export controls were implemented. Nvidia told Reuters at the time that DeepSeek has used lawfully acquired H800 chips, not H100s.

In a supplementary information document accompanying the Nature article, the company acknowledged for the first time that it does own A100 chips and said it had used them in preparatory stages of development. "Regarding our research on DeepSeek-R1, we utilized the A100 GPUs to prepare for the experiments with a smaller model," the researchers wrote. After this initial phase, R1 was trained for a total of 80 hours on the 512-chip cluster of H800s, they added. Reuters has previously reported that one reason DeepSeek was able to attract the brightest minds in China was that it was one of the few domestic companies to operate an A100 supercomputing cluster.
Chinese AI startup DeepSeek reveals groundbreaking training methods and costs for its R1 model in a peer-reviewed Nature paper, sparking debates over efficiency and transparency in AI development.
Chinese AI startup DeepSeek has made waves in the artificial intelligence community with the publication of a peer-reviewed paper in Nature detailing the development of its R1 model. The landmark study makes R1 the first major large language model (LLM) to undergo the rigorous peer-review process, setting a new precedent for transparency in AI research [1][2].

DeepSeek's primary innovation lies in its use of pure reinforcement learning to create R1. This automated trial-and-error approach rewards the model for reaching correct answers rather than following human-selected reasoning examples, and it allowed R1 to develop its own reasoning-like strategies, including self-verification methods [1][2].

One of the most striking claims in the paper is the reported training cost of just $294,000 for R1. This figure, based on 512 Nvidia H800 GPUs running for 198 hours, is substantially lower than the tens of millions of dollars typically associated with training competitive AI models [3][4].

However, this claim has been met with skepticism. Critics argue that the $294,000 figure accounts only for the final reinforcement-learning phase, not the entire training process. When the development of the base V3 model, which required 2.79 million GPU hours, is included, the total cost rises to approximately $5.87 million [3].

Despite the cost controversy, R1's performance has been impressive. It has become the most popular open-weight model on the AI community platform Hugging Face, with 10.9 million downloads. In scientific task challenges, R1 has proven highly competitive, particularly in balancing ability with cost [1].

The paper also addresses concerns about DeepSeek's training-data sources. While acknowledging that R1's base model was trained on web data, which may have included AI-generated content, the researchers deny deliberately using outputs from rival models such as OpenAI's [1][2].

The publication of this paper in Nature has been widely welcomed as a step towards greater transparency in AI development. It sets a precedent that other firms may be encouraged to follow, potentially leading to more open evaluation of AI systems and their associated risks [1][2].

As researchers continue to explore and apply DeepSeek's methods, the R1 model's influence is likely to grow, potentially revolutionizing how reasoning capabilities are developed in future AI systems [5].

Summarized by Navi