Curated by THEOUTPOST
On Tue, 25 Feb, 4:04 PM UTC
6 Sources
[1]
Why DeepSeek R1 could be right for your business, and why the hysteria around it is wrong
DeepSeek's R1 model beat American tech giants at their own game, and here's how. DeepSeek R1, released January 20, 2025, is an open source large language model (LLM) on par with the capabilities of OpenAI's o1 model, and you can scale it to run on your own hardware, or the cloud infrastructure of your choice, today. It won't cost you anything -- well, maybe a GPU. With this development, 'artificial intelligence' -- the futurist term adopted by companies such as OpenAI, Anthropic, Nvidia and Google, who want you to believe that LLMs can achieve more than they currently can (and probably ever will), and only through their input and infinite money pile -- has been freely proliferated for all, sending these companies into freefall and damage control. Certain reporters on the tech industry have followed suit: suggesting DeepSeek is a Chinese state psyop (untrue; it's a startup that came out of a hedge fund), conflating the R1 and V3 models (R1 is based on V3, and the latter is the model used by the web/app version of DeepSeek, which you may have heard about in the context of it having to be jailbroken -- its safeguards bypassed with enough prompt engineering -- into referencing the 1989 Tiananmen Square massacre), or accusing DeepSeek of plagiarism because some cherry-picked output 'believes' that it's ChatGPT (we'll get onto this latter point in a minute). More sinister are articles that have implied DeepSeek's models are dangerous in some way, or that 'just ask questions' about whether they're 'safe' to use at all. News coverage has editorialised studies by security research firms as "worrying" or "concerning", hammering home the point with plausible deniability by using hero images of the Chinese flag, or of a snarling man in a balaclava sat at a keyboard, looming in the background.
To be clear, yes, state actors are a thing, but you cannot see that a new tech thing has come out of China, jump to the conclusion that the Chinese state is behind it, and still expect to be taken seriously as a mammal, let alone a journalist. These stories' headlines couch a "stupid" and plausibly "xenophobic" agenda (not my words, but those of Better Offline podcast host, CEO of PR firm EZPR, and tech journalist Ed Zitron) in quotes from those studies, and allude to jailbreaking while also backlinking, for that sweet search engine optimization, to articles admitting that ChatGPT and other LLMs are susceptible to it too. It's bewildering to then read headlines falsely implying that DeepSeek has taken code or work from OpenAI. It's important to refute this implication, because it's readily available information -- if you look under the mound of garbage -- that DeepSeek trains its models using synthetic data: AI-generated output from other LLMs, like ChatGPT. An output 'believing' it's ChatGPT is an unfortunate hallucination, perhaps, but there's nothing wrong or unethical about this approach; even Elon Musk has admitted that synthetic data is the way forward for AI training, and reams of it are already available in repositories online. False accusations that DeepSeek has stolen code from OpenAI fall apart easily for another obvious reason: R1 is open source, so you -- anyone -- can go into the code and see that it's patently untrue. That's the whole point, and, by ignoring it, client journalists are muddying the waters around DeepSeek. They're motivated to do this because, with the release of R1, 'artificial intelligence' -- America's next top tech bubble -- has been freely proliferated, which is a direct threat to the business model of the sites they work for -- that 'line go up' mentality shared by American politicians. Now that DeepSeek R1 is out there, you can't be sold an LLM, AI tools, or AI writers, because it's just there.
There's no incentive to buy a $200 ChatGPT Pro subscription (which OpenAI still sells at a loss, by the way), or to race to buy a beefy Nvidia GPU, because the model scales to your hardware, and peer-to-peer networks can be leveraged to pool processing power for AI workloads. AI is now for everyone, which is The One Thing They Didn't Want to Happen. OpenAI CEO Sam Altman, for instance, wanted you buying into the narrative that pumping catastrophic amounts of money into training models, and building electricity-juicing, water-slurping data centers, was the only way to make AI happen -- because that's how they so badly want to make money (and it really, really doesn't; OpenAI lost $5 billion in 2024). Altman has offered mealy-mouthed praise for R1's efficiency but, ever graceful in being shown to be wearing no clothes, also said that OpenAI will "obviously deliver much better models." The claim is disgraceful in defeat and patently inaccurate: with R1, DeepSeek is thought to have achieved with $6 million (£4.8m) what OpenAI spends tens of millions of dollars doing. The training costs of OpenAI o1, OpenAI's closest competitor to R1, are unclear, but the less capable GPT-4 cost in the region of $100 million to train. There's now an alternative way through. DeepSeek's models are on par with OpenAI's: R1 is described on GitHub as being 'on par with OpenAI o1', the company's most advanced model (and R1 is even better at reasoning in some cases), while V3 is thought to be on par with GPT-4. It's great that a solid open source model now exists, and that it could be created efficiently and cheaply. Those facts make R1 easy to build on via platforms like Hugging Face, and AI's sudden decentralization bodes well for more use cases being born out of it.
The DeepSeek-pooh-poohing posturing by AI companies doesn't hold up to scrutiny, and so neither does the press closing ranks, because plenty of bigger companies are taking advantage of R1 being open source: Nvidia now hosts DeepSeek R1 as a NIM microservice, and, in late January 2025, Microsoft, currently OpenAI's largest investor, added distilled versions of DeepSeek R1 to the Azure AI Foundry. The only company I reckon will come out on top is Nvidia, because it makes the hardware used by hyperscalers -- the large-scale data centers you may have heard so much about -- built by AI companies in the search for more infrastructure, so that they can keep throwing money at a solution without a problem, even now. Nvidia lost a sixth of its value after the launch of DeepSeek R1 on January 20, but all signs point to demand for its top-end Blackwell GPUs remaining strong. I want to leave you with the important thing: should you be looking to implement AI into your business, DeepSeek's savings are passed on to you, with DeepSeek R1 being 30 times cheaper to run than OpenAI's models, in part because you can run a distilled version of the model on consumer-level hardware. That's bad for big, subscription-driven AI companies, and the outlets that prop them up, because you can't argue with cost. The AI bubble isn't profitable, and DeepSeek represents an existential threat to a business model that has yet to actually begin to function. That's why you're having this drivel sluiced into your eyeballs, constantly, and why I think it's worth giving DeepSeek R1 a try if you have even a passing interest in AI implementation.
[2]
DeepSeek: Everything you need to know about the AI chatbot app
Chinese AI lab DeepSeek broke into the mainstream consciousness this week after its chatbot app rose to the top of the Apple App Store charts (and Google Play, as well). DeepSeek's AI models, which were trained using compute-efficient techniques, have led Wall Street analysts -- and technologists -- to question whether the U.S. can maintain its lead in the AI race and whether the demand for AI chips will be sustained. But where did DeepSeek come from, and how did it rise to international fame so quickly?

DeepSeek's trader origins

DeepSeek is backed by High-Flyer Capital Management, a Chinese quantitative hedge fund that uses AI to inform its trading decisions. AI enthusiast Liang Wenfeng co-founded High-Flyer in 2015. Liang, who reportedly began dabbling in trading while a student at Zhejiang University, launched High-Flyer Capital Management as a hedge fund in 2019, focused on developing and deploying AI algorithms. In 2023, High-Flyer started DeepSeek as a lab dedicated to researching AI tools separate from its financial business. With High-Flyer as one of its investors, the lab spun off into its own company, also called DeepSeek. From day one, DeepSeek built its own data center clusters for model training. But like other AI companies in China, DeepSeek has been affected by U.S. export bans on hardware. To train one of its more recent models, the company was forced to use Nvidia H800 chips, a less-powerful version of the H100 chip available to U.S. companies. DeepSeek's technical team is said to skew young. The company reportedly recruits doctorate AI researchers aggressively from top Chinese universities. DeepSeek also hires people without any computer science background to help its tech better understand a wide range of subjects, per The New York Times.

DeepSeek's strong models

DeepSeek unveiled its first set of models -- DeepSeek Coder, DeepSeek LLM, and DeepSeek Chat -- in November 2023.
But it wasn't until last spring, when the startup released its next-gen DeepSeek-V2 family of models, that the AI industry started to take notice. DeepSeek-V2, a general-purpose text- and image-analyzing system, performed well in various AI benchmarks -- and was far cheaper to run than comparable models at the time. It forced DeepSeek's domestic competition, including ByteDance and Alibaba, to cut the usage prices for some of their models, and to make others completely free. DeepSeek-V3, launched in December 2024, only added to DeepSeek's notoriety. According to DeepSeek's internal benchmark testing, DeepSeek V3 outperforms both downloadable, openly available models like Meta's Llama and "closed" models that can only be accessed through an API, like OpenAI's GPT-4o. Equally impressive is DeepSeek's R1 "reasoning" model. DeepSeek claims that R1, released in January, performs as well as OpenAI's o1 model on key benchmarks. Being a reasoning model, R1 effectively fact-checks itself, which helps it to avoid some of the pitfalls that normally trip up models. Reasoning models take a little longer -- usually seconds to minutes longer -- to arrive at solutions than a typical non-reasoning model. The upside is that they tend to be more reliable in domains such as physics, science, and math. There is a downside to R1, DeepSeek V3, and DeepSeek's other models, however. Being Chinese-developed AI, they're subject to benchmarking by China's internet regulator to ensure that their responses "embody core socialist values." In DeepSeek's chatbot app, for example, R1 won't answer questions about Tiananmen Square or Taiwan's autonomy.

A disruptive approach

If DeepSeek has a business model, it's not clear what that model is, exactly. The company prices its products and services well below market value -- and gives others away for free. The way DeepSeek tells it, efficiency breakthroughs have enabled it to maintain extreme cost competitiveness.
Some experts dispute the figures the company has supplied, however. Whatever the case may be, developers have taken to DeepSeek's models, which aren't open source as the phrase is commonly understood but are available under permissive licenses that allow for commercial use. According to Clem Delangue, the CEO of Hugging Face, one of the platforms hosting DeepSeek's models, developers on Hugging Face have created over 500 "derivative" models of R1 that have racked up 2.5 million downloads combined. DeepSeek's success against larger and more established rivals has been described as both "upending AI" and "over-hyped." The company's success was at least in part responsible for causing Nvidia's stock price to drop by 18% in January, and for eliciting a public response from OpenAI CEO Sam Altman. Microsoft announced that DeepSeek is available on Azure AI Foundry, its platform that brings together AI services for enterprises under a single banner. When asked about DeepSeek's impact on Meta's AI spending during its first-quarter earnings call, CEO Mark Zuckerberg said spending on AI infrastructure will continue to be a "strategic advantage" for Meta. During Nvidia's fourth-quarter earnings call, CEO Jensen Huang emphasized DeepSeek's "excellent innovation," saying that it and other "reasoning" models are great for Nvidia because they need so much more compute. At the same time, some companies are banning DeepSeek, as are entire countries and governments, including South Korea. New York state has also banned DeepSeek from being used on government devices. As for what DeepSeek's future might hold, it's not clear. Improved models are a given. But the U.S. government appears to be growing wary of what it perceives as harmful foreign influence.
[3]
DeepSeek rushes to launch new AI model as China goes all in
DeepSeek is looking to press home its advantage. The Chinese startup triggered a $1 trillion-plus sell-off in global equities markets last month with a cut-price AI reasoning model that outperformed many Western competitors. Now, the Hangzhou-based firm is accelerating the launch of the successor to January's R1 model, according to three people familiar with the company. DeepSeek had planned to release R2 in early May but now wants it out as early as possible, two of them said, without providing specifics. The company says it hopes the new model will produce better coding and be able to reason in languages beyond English. Details of the accelerated timeline for R2's release have not been previously reported. DeepSeek did not respond to a request for comment for this story. Rivals are still digesting the implications of R1, which was built with less-powerful Nvidia chips but is competitive with models developed at the cost of hundreds of billions of dollars by U.S. tech giants. "The launch of DeepSeek's R2 model could be a pivotal moment in the AI industry," said Vijayasimha Alilughatta, chief operating officer of Indian tech services provider Zensar. DeepSeek's success at creating cost-effective AI models "would likely spur companies worldwide to accelerate their own efforts ... breaking the stranglehold of the few dominant players in the field," he said. R2 is likely to worry the U.S. government, which has identified leadership in AI as a national priority. Its release may further galvanize Chinese authorities and companies, dozens of which say they have started integrating DeepSeek models into their products. Little is known about DeepSeek, whose founder Liang Wenfeng became a billionaire through his quantitative hedge fund High-Flyer.
Liang, who was described by a former employer as "low-key and introverted," has not spoken to any media since July 2024. Reuters interviewed a dozen former employees, as well as quant fund professionals knowledgeable about the operations of DeepSeek and its parent company High-Flyer. It also reviewed state media articles, social-media posts from the companies, and research papers dating back to 2019. They told the story of a company that functioned more like a research lab than a for-profit enterprise, and that was unencumbered by the hierarchical traditions of China's high-pressure tech industry, even as it became responsible for what many investors see as the latest breakthrough in AI.

Different path

Liang was born in 1985 in a rural village in the southern province of Guangdong. He later obtained communication engineering degrees at the elite Zhejiang University. One of his first jobs was running a research department at a smart imaging firm in Shanghai. His then-boss, Zhou Chaoen, told state media on Feb. 9 that Liang had hired prize-winning algorithm engineers and operated with a "flat management style." At DeepSeek and High-Flyer, Liang has similarly shunned the practices of Chinese tech giants known for rigid top-down management, low pay for young employees and "996" -- working from 9 a.m. to 9 p.m., six days a week. Liang opened his Beijing office within walking distance of Tsinghua University and Peking University, China's two most prestigious educational institutions. He regularly delved into technical details and was happy to work alongside the Gen-Z interns and recent graduates who comprised the bulk of the company's workforce, according to two former employees. They also described usually working eight-hour days in a collaborative atmosphere. "Liang gave us control and treated us as experts. He constantly asked questions and learned alongside us," said 26-year-old researcher Benjamin Liu, who left the company in September.
"DeepSeek allowed me to take ownership of critical parts of the pipeline, which was very exciting." Liang did not respond to questions sent via DeepSeek. While Baidu and other Chinese tech giants were racing to build their consumer-facing versions of ChatGPT in 2023 and profit off the global AI boom, Liang told Chinese media outlet Waves last year that he deliberately avoided spending heavily on app development, focusing instead on refining the quality of the AI model. Both DeepSeek and High-Flyer are known for paying generously, according to three people familiar with their compensation practices. At High-Flyer, it is not uncommon for a senior data scientist to make 1.5 million yuan annually, while competitors rarely pay more than 800,000, said one of the people, a rival quant fund manager who knows Liang. The largesse was funded by High-Flyer, which became one of China's most successful quant funds and, even after a government crackdown on the sector, still manages tens of billions of yuan, according to two people in the industry.

Computing power

DeepSeek's success with a low-cost AI model is based on High-Flyer's decade-long and substantial investment in research and computing power, three people said. The quant fund was an early pioneer in AI trading, and a top executive said in 2020 that High-Flyer was going "all in" on AI by re-investing 70% of its revenue, mostly into AI research. High-Flyer spent 1.2 billion yuan on two supercomputing AI clusters in 2020 and 2021. The second cluster, Fire-Flyer II, was made up of around 10,000 Nvidia A100 chips used for training AI models. DeepSeek had not been established at that time, so the accumulation of computing power caught the attention of Chinese securities regulators, said a person with direct knowledge of officials' thinking. "Regulators wanted to know why they need so many chips?" the person said. "How they were going to use it? What kind of impact would that have on the market?"
Authorities decided not to intervene, in a move that would prove crucial for DeepSeek's fortunes: the U.S. banned the export of A100 chips to China in 2022, at which point Fire-Flyer II was already in operation. Beijing now celebrates DeepSeek, but has instructed it not to engage with the media without approval, according to a person familiar with Chinese official thinking. Authorities had asked Liang to keep a low profile because they were worried that too much hype in the media would draw unnecessary attention, the person said. China's cabinet and commerce ministry, as well as China's securities regulator, did not respond to requests for comment. As one of the few companies with a large A100 cluster, High-Flyer and DeepSeek were able to attract some of China's best research talent, two former employees said. "The key advantage of vast (computing) resources is that it allows for large-scale experimentation," said Liu, the former employee. Some Western AI entrepreneurs, like Scale AI CEO Alexandr Wang, have claimed that DeepSeek had as many as 50,000 higher-end Nvidia chips that are banned for export to China. He has not produced evidence for the allegation or responded to Reuters' requests for proof. DeepSeek has not responded to Wang's claims. Two former employees attributed the company's success to Liang's focus on more cost-effective AI architecture. The startup used techniques like Mixture-of-Experts (MoE) and multi-head latent attention (MLA), which incur far lower computing costs, its research papers show. The MoE technique divides an AI model into different areas of expertise and activates only those related to a query, as opposed to more common architectures that use the entire model. MLA allows a model to process different aspects of one piece of information simultaneously, helping it detect key details more effectively.
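The MoE routing idea described above can be sketched in a few lines. The following is a minimal NumPy toy of top-k expert gating in general -- not DeepSeek's actual implementation; the dimensions, the gating network, and the plain linear "experts" are all made-up stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

D, N_EXPERTS, TOP_K = 8, 4, 2  # toy sizes, not DeepSeek's real ones

gate_w = rng.normal(size=(D, N_EXPERTS))                       # gating network
experts = [rng.normal(size=(D, D)) for _ in range(N_EXPERTS)]  # "expert" layers

def moe_forward(x):
    """Route one token vector through only the top-k scoring experts."""
    logits = x @ gate_w                   # one score per expert
    top = np.argsort(logits)[-TOP_K:]     # indices of the k best-scoring experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()              # softmax over the chosen experts only
    # Only the selected experts' parameters are touched; the remaining
    # experts do no work at all, which is where the compute savings come from.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top)), top

output, active = moe_forward(rng.normal(size=D))
print(len(active))  # only TOP_K of the N_EXPERTS experts ran
```

In a real MoE transformer the experts are feed-forward sub-networks inside each layer and the router is trained jointly with them, but the routing principle -- score, pick the top k, run only those -- is the same.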
While competitors like France's Mistral have developed models based on MoE, DeepSeek was the first firm to depend heavily on this architecture while achieving parity with more expensively built models. DeepSeek's pricing was 20 to 40 times cheaper than what OpenAI charged for equivalent models, analysts at Bernstein brokerage estimated in early February. For now, Western and Chinese tech giants have signaled plans to continue heavy AI spending, but DeepSeek's success with R1 and its earlier V3 model has prompted some to alter strategies. OpenAI cut prices this month, while Google's Gemini has introduced discounted tiers of access. Since R1's launch, OpenAI has also released an o3-mini model that relies on less computing power. Adnan Masood of U.S. tech services provider UST told Reuters that his laboratory had run benchmarks that found R1 often used three times as many tokens -- units of data processed by the AI model -- for reasoning as OpenAI's scaled-down model.

State embrace

Even before R1 gripped global attention, there were signs that DeepSeek had caught Beijing's favor. In January, state media reported that Liang attended a meeting with Chinese Premier Li Qiang in Beijing as the designated representative of the AI sector, ahead of the leaders of better-known firms. The subsequent fanfare over the cost competitiveness of its models has buoyed Beijing's belief that it can out-innovate the U.S., with Chinese companies and government bodies embracing DeepSeek models at a pace not afforded to other firms. At least 13 Chinese city governments and 10 state-owned energy companies say they have deployed DeepSeek into their systems, while tech giants Lenovo, Baidu and Tencent -- owner of China's largest social media app WeChat -- have integrated DeepSeek's models into their products. Chinese leader Xi Jinping and Li "have signalled they endorse DeepSeek," said Alfred Wu, an expert on Chinese policymaking at Singapore's Lee Kuan Yew School of Public Policy.
"Now everyone just endorses it." The Chinese embrace comes as governments from South Korea to Italy remove DeepSeek from national app stores, citing privacy concerns. "If DeepSeek becomes the go-to AI model across Chinese state entities, Western regulators might see this as another reason to escalate restrictions on AI chips or software collaborations," said Stephen Wu, an AI expert and founder of hedge fund Carthage Capital. Further limits on advanced AI chips are a challenge that Liang has acknowledged. "Our problem has never been funding," he told Waves in July. "It's the embargo on high-end chips."
[5]
DeepSeek Rushes to Launch New AI Model as China Goes All In
Limits on advanced AI chips are a challenge, DeepSeek's founder has said. DeepSeek is looking to press home its advantage. The Chinese startup triggered a $1 trillion (roughly Rs. 87,20,000 crore)-plus sell-off in global equities markets last month with a cut-price AI reasoning model that outperformed many Western competitors. Now, the Hangzhou-based firm is accelerating the launch of the successor to January's R1 model, according to three people familiar with the company. DeepSeek had planned to release R2 in early May but now wants it out as early as possible, two of them said, without providing specifics. The company says it hopes the new model will produce better coding and be able to reason in languages beyond English. Details of the accelerated timeline for R2's release have not been previously reported. DeepSeek did not respond to a request for comment for this story. Rivals are still digesting the implications of R1, which was built with less-powerful Nvidia chips but is competitive with those developed at a cost of hundreds of billions of dollars by US tech giants. "The launch of DeepSeek's R2 model could be a pivotal moment in the AI industry," said Vijayasimha Alilughatta, chief operating officer of Indian tech services provider Zensar. DeepSeek's success at creating cost-effective AI models "would likely spur companies worldwide to accelerate their own efforts ... breaking the stranglehold of the few dominant players in the field," he said. R2 is likely to worry the US government, which has identified leadership of AI as a national priority. Its release may further galvanise Chinese authorities and companies, dozens of which say they have started integrating DeepSeek models into their products. Little is known about DeepSeek, whose founder Liang Wenfeng became a billionaire through his quantitative hedge fund High-Flyer. Liang, who was described by a former employer as "low-key and introverted," has not spoken to any media since July 2024. 
Reuters interviewed a dozen former employees, as well as quant fund professionals knowledgeable about the operations of DeepSeek and its parent company High-Flyer. It also reviewed state media articles, social-media posts from the companies and research papers dating back to 2019. They told a story of a company that functioned more like a research lab than a for-profit enterprise and was unencumbered by the hierarchical traditions of China's high-pressure tech industry, even as it became responsible for what many investors see as the latest breakthrough in AI. Different Path Liang was born in 1985 in a rural village in the southern province of Guangdong. He later obtained communication engineering degrees at the elite Zhejiang University. One of his first jobs was running a research department at a smart imaging firm in Shanghai. His then-boss, Zhou Chaoen, told state media on February 9 that Liang had hired prize-winning algorithm engineers and operated with a "flat management style." At DeepSeek and High-Flyer, Liang has similarly shunned the practices of Chinese tech giants known for rigid top-down management, low pay for young employees and "996" - working from 9 a.m. to 9 p.m. six days a week. Liang opened his Beijing office within walking distance of Tsinghua University and Peking University, China's two most prestigious education institutions. He regularly delved into technical details and was happy to work alongside Gen-Z interns and recent graduates that comprised the bulk of its workforce, according to two former employees. They also described usually working eight-hour days in a collaborative atmosphere. "Liang gave us control and treated us as experts. He constantly asked questions and learned alongside us," said 26-year-old researcher Benjamin Liu, who left the company in September. "DeepSeek allowed me to take ownership of critical parts of the pipeline, which was very exciting." Liang did not respond to questions sent via DeepSeek. 
While Baidu and other Chinese tech giants were racing to build their consumer-facing versions of ChatGPT in 2023 and profit off the global AI boom, Liang told Chinese media outlet Waves last year that he deliberately avoided spending heavily on app development, focusing instead on refining the AI model's quality. Both DeepSeek and High-Flyer are known for paying generously, according to three people familiar with their compensation practices. At High-Flyer, it is not uncommon for a senior data scientist to make CNY 1.5 million (roughly Rs. 1.8 crore) annually, while competitors rarely pay more than 800,000, said one of the people, a rival quant fund manager who knows Liang. The largesse was funded by High-Flyer, which became one of China's most successful quant funds and, even after a government crackdown on the sector, still manages tens of billions of yuan, according to two people in the industry. Computing Power DeepSeek's success with a low-cost AI model is based on High-Flyer's decade-long and substantial investment in research and computing power, three people said. The quant fund was an early pioneer in AI trading and a top executive said in 2020 that High-Flyer was going "all in" on AI by re-investing 70 percent of its revenue, mostly into AI research. High-Flyer spent CNY 1.2 billion (roughly Rs. 1,441 crore) on two supercomputing AI clusters in 2020 and 2021. The second cluster, Fire-Flyer II, was made up of around 10,000 Nvidia A100 chips, used for training AI models. DeepSeek had not been established at that time, so the accumulation of computing power caught the attention of Chinese securities regulators, said a person with direct knowledge of officials' thinking. "Regulators wanted to know why they need so many chips?" the person said. "How they were going to use it? What kind of impact would that have on the market?" 
Authorities decided not to intervene, in a move that would prove crucial for DeepSeek's fortunes: the US banned the export of A100 chips to China in 2022, at which point Fire-Flyer II was already in operation. Beijing now celebrates DeepSeek, but has instructed it not to engage with the media without approval, according to a person familiar with Chinese official thinking. Authorities had asked Liang to keep a low-profile because they were worried that too much hype in the media would draw unnecessary attention, the person said. China's cabinet and commerce ministry, as well as China's securities regulator, did not respond to requests for comment. As one of the few companies with a large A100 cluster, High-Flyer and DeepSeek were able to attract some of China's best research talent, two former employees said. "The key advantage of vast (computing) resources is that it allows for large-scale experimentation," said Liu, the former employee. Some Western AI entrepreneurs, like Scale AI CEO Alexandr Wang, have claimed that DeepSeek had as many as 50,000 higher-end Nvidia chips that are banned for export to China. He has not produced evidence for the allegation or responded to Reuters' requests to provide proof. DeepSeek has not responded to Wang's claims. Two former employees attributed the company's success to Liang's focus on more cost-effective AI architecture. The startup used techniques like Mixture-of-Experts (MoE) and multihead latent attention (MLA), which incur far lower computing costs, its research papers show. The MoE technique divides an AI model into different areas of expertise and activates only those related to a query, as opposed to more common architectures that use the entire model. MLA architecture allows a model to process different aspects of one piece of information simultaneously, helping it detect key details more effectively. 
While competitors like France's Mistral have developed models based on MoE, DeepSeek was the first firm to depend heavily on this architecture while achieving parity with more expensively built models. DeepSeek's pricing was 20 to 40 times cheaper than what OpenAI charged for equivalent models, analysts at Bernstein brokerage estimated in early February. For now, Western and Chinese tech giants have signaled plans to continue heavy AI spending, but DeepSeek's success with R1 and its earlier V3 model has prompted some to alter strategies. OpenAI cut prices this month, while Google's Gemini has introduced discounted tiers of access. Since R1's launch, OpenAI has also released an O3-Mini model that relies on less computing power. Adnan Masood of US tech services provider UST told Reuters that his laboratory had run benchmarks that found R1 often used three times as many tokens, or units of data processed by the AI model, for reasoning as OpenAI's scaled-down model. State Embrace Even before R1 gripped global attention, there were signs that DeepSeek had caught Beijing's favour. In January, state media reported that Liang attended a meeting with Chinese Premier Li Qiang in Beijing as the designated representative of the AI sector, ahead of the leaders of better-known firms. The subsequent fanfare over the cost competitiveness of its models has buoyed Beijing's belief that it can out-innovate the US, with Chinese companies and government bodies embracing DeepSeek models at a pace that has not been offered to other firms. At least 13 Chinese city governments and 10 state-owned energy companies say they have deployed DeepSeek into their systems, while tech giants Lenovo, Baidu and Tencent - owner of China's largest social media app WeChat - have integrated DeepSeek's models into their products. Chinese leader Xi Jinping and Li "have signalled they endorse DeepSeek," said Alfred Wu, an expert on Chinese policymaking at Singapore's Lee Kuan Yew School of Public Policy. 
"Now everyone just endorses it." The Chinese embrace comes as governments from South Korea to Italy remove DeepSeek from national app stores, citing privacy concerns. "If DeepSeek becomes the go-to AI model across Chinese state entities, Western regulators might see this as another reason to escalate restrictions on AI chips or software collaborations," said Stephen Wu, an AI expert and founder of hedge fund Carthage Capital. Further limits on advanced AI chips are a challenge that Liang has acknowledged. "Our problem has never been funding," he told Waves in July. "It's the embargo on high-end chips." © Thomson Reuters 2025
[6]
Insight: DeepSeek rushes to launch new AI model as China goes all in
BEIJING/HONG KONG/SINGAPORE, Feb 25 (Reuters) - DeepSeek is looking to press home its advantage. The Chinese startup triggered a $1 trillion-plus sell-off in global equities markets last month with a cut-price AI reasoning model that outperformed many Western competitors. Now, the Hangzhou-based firm is accelerating the launch of the successor to January's R1 model, according to three people familiar with the company. DeepSeek had planned to release R2 in early May but now wants it out as early as possible, two of them said, without providing specifics. The company says it hopes the new model will produce better coding and be able to reason in languages beyond English. Details of the accelerated timeline for R2's release have not been previously reported. DeepSeek did not respond to a request for comment for this story. Rivals are still digesting the implications of R1, which was built with less-powerful Nvidia chips but is competitive with those developed at a cost of hundreds of billions of dollars by U.S. tech giants. "The launch of DeepSeek's R2 model could be a pivotal moment in the AI industry," said Vijayasimha Alilughatta, chief operating officer of Indian tech services provider Zensar. DeepSeek's success at creating cost-effective AI models "would likely spur companies worldwide to accelerate their own efforts ... breaking the stranglehold of the few dominant players in the field," he said. R2 is likely to worry the U.S. government, which has identified leadership of AI as a national priority. Its release may further galvanize Chinese authorities and companies, dozens of which say they have started integrating DeepSeek models into their products. Little is known about DeepSeek, whose founder Liang Wenfeng became a billionaire through his quantitative hedge fund High-Flyer. Liang, who was described by a former employer as "low-key and introverted," has not spoken to any media since July 2024. 
Reuters interviewed a dozen former employees, as well as quant fund professionals knowledgeable about the operations of DeepSeek and its parent company High-Flyer. It also reviewed state media articles, social-media posts from the companies and research papers dating back to 2019. They told a story of a company that functioned more like a research lab than a for-profit enterprise and was unencumbered by the hierarchical traditions of China's high-pressure tech industry, even as it became responsible for what many investors see as the latest breakthrough in AI. DIFFERENT PATH Liang was born in 1985 in a rural village in the southern province of Guangdong. He later obtained communication engineering degrees at the elite Zhejiang University. One of his first jobs was running a research department at a smart imaging firm in Shanghai. His then-boss, Zhou Chaoen, told state media on Feb. 9 that Liang had hired prize-winning algorithm engineers and operated with a "flat management style." At DeepSeek and High-Flyer, Liang has similarly shunned the practices of Chinese tech giants known for rigid top-down management, low pay for young employees and "996" - working from 9 a.m. to 9 p.m. six days a week. Liang opened his Beijing office within walking distance of Tsinghua University and Peking University, China's two most prestigious education institutions. He regularly delved into technical details and was happy to work alongside Gen-Z interns and recent graduates that comprised the bulk of its workforce, according to two former employees. They also described usually working eight-hour days in a collaborative atmosphere. "Liang gave us control and treated us as experts. He constantly asked questions and learned alongside us," said 26-year-old researcher Benjamin Liu, who left the company in September. "DeepSeek allowed me to take ownership of critical parts of the pipeline, which was very exciting." Liang did not respond to questions sent via DeepSeek. 
While Baidu and other Chinese tech giants were racing to build their consumer-facing versions of ChatGPT in 2023 and profit off of the global AI boom, Liang told Chinese media outlet Waves last year that he deliberately avoided spending heavily on app development, focusing instead on refining the AI model's quality. Both DeepSeek and High-Flyer are known for paying generously, according to three people familiar with its compensation practices. At High-Flyer, it is not uncommon for a senior data scientist to make 1.5 million yuan annually, while competitors rarely pay more than 800,000, said one of the people, a rival quant fund manager who knows Liang. The largesse was funded by High-Flyer, which became one of China's most successful quant funds and, even after a government crackdown on the sector, still manages tens of billions of yuan, according to two people in the industry. COMPUTING POWER DeepSeek's success with a low-cost AI model is based on High-Flyer's decade-long and substantial investment in research and computing power, three people said. The quant fund was an earlier pioneer in AI trading and a top executive said in 2020 that High-Flyer was going "all in" on AI by re-investing 70% of its revenue, mostly into AI research. High-Flyer spent 1.2 billion yuan on two supercomputing AI clusters in 2020 and 2021. The second cluster, Fire-Flyer II, was made up of around 10,000 Nvidia A100 chips, used for training AI models. DeepSeek had not been established at that time, so the accumulation of computing power caught the attention of Chinese securities regulators, said a person with direct knowledge of officials' thinking. "Regulators wanted to know why they need so many chips?" the person said. "How they were going to use it? What kind of impact would that have on the market?" Authorities decided not to intervene, in a move that would prove crucial for DeepSeek's fortunes: the U.S. 
banned the export of A100 chips to China in 2022, at which point Fire-Flyer II was already in operation. Beijing now celebrates DeepSeek, but has instructed it not to engage with the media without approval, according to a person familiar with Chinese official thinking. Authorities had asked Liang to keep a low-profile because they were worried that too much hype in the media would draw unnecessary attention, the person said. China's cabinet and commerce ministry, as well as China's securities regulator, did not respond to requests for comment. As one of the few companies with a large A100 cluster, High-Flyer and DeepSeek were able to attract some of China's best research talent, two former employees said. "The key advantage of vast (computing) resources is that it allows for large-scale experimentation," said Liu, the former employee. Some Western AI entrepreneurs, like Scale AI CEO Alexandr Wang, have claimed that DeepSeek had as many as 50,000 higher-end Nvidia chips that are banned for export to China. He has not produced evidence for the allegation or responded to Reuters' requests to provide proof. DeepSeek has not responded to Wang's claims. Two former employees attributed the company's success to Liang's focus on more cost-effective AI architecture. The startup used techniques like Mixture-of-Experts (MoE) and multihead latent attention (MLA), which incur far lower computing costs, its research papers show. The MoE technique divides an AI model into different areas of expertise and activates only those related to a query, as opposed to more common architectures that use the entire model. MLA architecture allows a model to process different aspects of one piece of information simultaneously, helping it detect key details more effectively. While competitors like France's Mistral have developed models based on MoE, DeepSeek was the first firm to depend heavily on this architecture while achieving parity with more expensively built models. 
DeepSeek's pricing was 20 to 40 times cheaper than what OpenAI charged for equivalent models, analysts at Bernstein brokerage estimated in early February. For now, Western and Chinese tech giants have signaled plans to continue heavy AI spending, but DeepSeek's success with R1 and its earlier V3 model has prompted some to alter strategies. OpenAI cut prices this month, while Google's Gemini has introduced discounted tiers of access. Since R1's launch, OpenAI has also released an O3-Mini model that relies on less computing power. Adnan Masood of U.S. tech services provider UST told Reuters that his laboratory had run benchmarks that found R1 often used three times as many tokens, or units of data processed by the AI model, for reasoning as OpenAI's scaled-down model. STATE EMBRACE Even before R1 gripped global attention, there were signs that DeepSeek had caught Beijing's favor. In January, state media reported that Liang attended a meeting with Chinese Premier Li Qiang in Beijing as the designated representative of the AI sector, ahead of the leaders of better-known firms. The subsequent fanfare over the cost competitiveness of its models has buoyed Beijing's belief that it can out-innovate the U.S., with Chinese companies and government bodies embracing DeepSeek models at a pace that has not been offered to other firms. At least 13 Chinese city governments and 10 state-owned energy companies say they have deployed DeepSeek into their systems, while tech giants Lenovo (0992.HK), Baidu (9888.HK) and Tencent (0700.HK) - owner of China's largest social media app WeChat - have integrated DeepSeek's models into their products. Chinese leader Xi Jinping and Li "have signalled they endorse DeepSeek," said Alfred Wu, an expert on Chinese policymaking at Singapore's Lee Kuan Yew School of Public Policy. "Now everyone just endorses it." 
The Chinese embrace comes as governments from South Korea to Italy remove DeepSeek from national app stores, citing privacy concerns. "If DeepSeek becomes the go-to AI model across Chinese state entities, Western regulators might see this as another reason to escalate restrictions on AI chips or software collaborations," said Stephen Wu, an AI expert and founder of hedge fund Carthage Capital. Further limits on advanced AI chips are a challenge that Liang has acknowledged. "Our problem has never been funding," he told Waves in July. "It's the embargo on high-end chips." Additional reporting by Samuel Shen, Gu Li, Larissa Liao, Aditya Soni and Shanghai Newsroom; Editing by Brenda Goh and Katerina Ang. Julie Zhu is a Hong Kong-based senior correspondent for Reuters, focusing on M&A, IPOs, private equity and regulatory changes across Greater China. Since joining Reuters in 2016, Julie has led the coverage of some of the region's most significant stories including China's unprecedented regulatory crackdown, Beijing's COVID policies, Ant Group's IPO fiasco and Evergrande's debt crisis. She was named Reuters' Reporter of the Year in 2021. Prior to Reuters, Julie worked at the Financial Times where she reported on business and general news about South China and Hong Kong.
Chinese AI startup DeepSeek has disrupted the global AI market with its efficient and powerful models, sparking both excitement and controversy in the tech world.
Chinese AI startup DeepSeek has sent shockwaves through the global tech industry with the release of its highly efficient and powerful AI models. The company's R1 model, launched in January 2025, has demonstrated capabilities on par with or exceeding those of Western tech giants, while being significantly more cost-effective [1][2].
DeepSeek's R1 model, an open-source large language model (LLM), has been described as comparable to OpenAI's o1 model in capabilities. What sets R1 apart is its ability to run on various hardware configurations, from personal devices to cloud infrastructure, at a fraction of the cost of its competitors [1].
The release of R1 triggered a substantial market reaction, causing a $1 trillion-plus sell-off in global equities markets. This development has led to questions about the sustainability of the U.S. lead in the AI race and the future demand for AI chips [2][3].
DeepSeek was founded by Liang Wenfeng, who previously made his fortune through the quantitative hedge fund High-Flyer. The company operates more like a research lab than a traditional profit-driven enterprise, with a flat management structure and a focus on collaborative work environments [3][4].
Unlike many Chinese tech giants known for their intense work culture, DeepSeek offers generous compensation and maintains a more balanced work schedule. This approach has allowed the company to attract top talent and foster innovation [3][4].
DeepSeek's success is attributed to its efficient training techniques and substantial investment in computing power. The company's models, including DeepSeek-V2 and DeepSeek-V3, have performed well in various AI benchmarks, often outperforming both open-source and proprietary models from established players [2][3].
The company is now accelerating the launch of its next model, R2, which is expected to offer improved coding capabilities and reasoning in multiple languages [3][4].
DeepSeek's rapid rise has not been without controversy. Some reports have suggested connections to the Chinese state, though these claims are disputed [1]. The company's models are subject to Chinese regulations, which limit certain types of content and responses [2].
The success of DeepSeek has raised concerns in the U.S. government, which views AI leadership as a national priority. Some countries and organizations have banned the use of DeepSeek's products on government devices [2][4].
Major tech companies like Microsoft, Meta, and Nvidia have acknowledged DeepSeek's impact on the industry. While some see it as a threat, others view it as an opportunity for innovation and market expansion [2][3].
As DeepSeek continues to develop its models and expand its reach, the global AI landscape is likely to see increased competition and potentially a shift in the balance of power between established tech giants and emerging players [1][2][3][4].