Curated by THEOUTPOST
On Tue, 28 Jan, 4:03 PM UTC
14 Sources
[1]
How DeepSeek's new AI models are impacting the profits of global companies
China's DeepSeek shook global stock markets after revealing that it had built a powerful artificial intelligence model for a mere $6 million. While some have disputed the shockingly low cost of developing the AI models, most agree that DeepSeek has sharply cut the ongoing cost of running powerful AI models, and that the firm's decision to release its technology for free has altered the course of the industry. CNBC Pro spoke to companies around the world about how DeepSeek's new AI models are set to impact their operations and financials.

Roadzen, a Nasdaq-listed company, is attempting to disrupt the auto insurance sector with artificial intelligence. The company's AI service helps its insurance underwriting clients cut the time taken to resolve 80% of minor accident claims from six weeks to two minutes, according to its chief executive Rohan Malhotra. The sensitive nature of processing insurance claims, alongside the potential for incorrectly predicting large costs for insurance clients, means the company has previously limited itself to a handful of sophisticated AI models that produce accurate results -- such as those produced by OpenAI, Anthropic and Meta. That was until DeepSeek released its R1 model. "Our clients cannot afford a model which has 60%-70% accuracy, that's like a major economic issue," said Malhotra. "We need to deploy models that have 95%-99% accuracy."

DeepSeek's discount

Malhotra, who graduated with a master's degree in robotics from Carnegie Mellon University, said DeepSeek-R1's output quality is on par with OpenAI's o1 -- its best large language model -- while also offering other benefits that are significant to his company, including lower costs. For instance, Roadzen processed 607,577 insurance claims in the three months ending September 2024. Each claim consumes roughly 4,000 tokens, according to the company. A token is the smallest unit of data fed to an AI model; about 750 words converts to 1,000 tokens. Roadzen would have incurred a cost of $36,455 over the quarter using OpenAI's latest large language model o1, according to CNBC calculations using publicly available pricing. That means on average, the company spent 6 cents per claim on AI costs. Using DeepSeek-R1, however, the quarterly cost would have been $17,012, calculated using prices from AI model hosting firm Together.ai -- about 3 cents per claim, or roughly 50% lower than costs incurred with OpenAI's models. Roadzen said it incurs additional costs when fine-tuning or training an AI model on a per-policy basis, which would have amounted to $21,185 using the OpenAI o1 model, or $10,593 on DeepSeek's R1. It also faces additional costs to run its proprietary AI models, which it uses to estimate the cost of claims, detect vehicle damage over video and prevent fraud, among other uses not covered by commercially available models. "What we really care about is the cost of inference. We care about the accuracy of the outputs. And we care about whether this model is performing to the certain benchmarks that we've set, in a good way," Malhotra added.

The open-source innovation

Others have told CNBC that alongside the lower costs, DeepSeek's landmark decision to open source its reasoning model makes it more attractive compared to existing open-source models like Meta Platforms' Llama. Arli Charles Mujkic, CEO and founder of Swedish AI platform Ooda AI, told CNBC his company integrated DeepSeek's technology into its AI offering "the same day it was out."
The company runs a digital store that offers customers a choice of AI models, allowing them to choose the best app for a specific job. Ooda AI has various revenue sources within the business: it sells pay-per-month subscriptions to AI apps on its store, allows customers to pay a base fee for AI programs and usage tokens, and also offers fixed-term contracts to its enterprise clients. Mujkic said his opinion of DeepSeek's V3 large language model -- the technology that underpins its products -- is that it's up to 20% "better" than Meta's Llama 3.3, which he labeled "the best open source model we'd been running up until this point."

Ooda AI, which counts one of Germany's largest health insurance firms as a client, said it costs roughly 1.875 U.S. cents, or $18,750 per million issues, to resolve a customer support query using open-source AI models. However, the same tasks are likely to be 32% cheaper when executed on DeepSeek's AI models, according to the company. The company, whose Stockholm-listed shares have gained more than 1,400% over the past year, expects DeepSeek's AI models to lower its costs -- and ultimately boost its revenues.

"It's 35% cheaper [than models like Llama], which means ultimately, for us -- without changing any pricing, say on the enterprise side -- we start making 35% more money," he told CNBC. "But also for our customers, who are paying for AI compute, for example, it becomes 35% cheaper as well, because that goes in parallel with the pricing for token users." DeepSeek's R1 reasoning model is also "on par" with OpenAI's o1, Mujkic argued, while running as much as 80% cheaper. "This is the kind of paradigm shift that's happening now," he said.

Neal K. Shah, CEO of North Carolina-based eldercare platform CareYaya, also told CNBC his company -- which has started using AI to help customers fight health insurance claims denials -- was excited about DeepSeek. "DeepSeek just lowered our costs by 90% so we can help more people," he said in a message. "The average cost to appeal a U.S. health insurance claims denial is $43.84. We had used OpenAI and Anthropic to get the cost down to 12 cents -- now we're doing it with DeepSeek on the back end, the cost per appeal is 2 cents." Asked if DeepSeek would boost CareYaya's bottom line, Shah's immediate response was "yes." "It's a ridiculous step function in lowering costs," he explained. "We'll pass along a lot of the savings to the consumer, so it'll let us serve more people."

AI's negligible costs

Despite the cost of AI falling substantially over the past two years, companies do not expect the cost of rendering AI services to end users to fall at the same rate. Roadzen's Malhotra said AI costs are a tiny fraction of the roughly $150 per claim it charges its insurance clients in Western markets; the bulk of its costs go to research and development and to connecting large enterprises' legacy systems with its AI systems. However, he expects that lower AI costs could eventually enable automation in emerging markets, where labor costs are still competitive with AI systems today. "As a global company, the $150 may be a price for a highly developed market. When we lower the inferencing cost enough, we can now deploy it globally," Malhotra added.
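For readers who want to check the math, here is a small Python sketch reproducing the Roadzen figures above. The per-million-token rates are back-calculated from the totals CNBC reported (roughly $15 per million tokens for o1 and $7 per million for R1 via Together.ai), so treat them as illustrative assumptions rather than current price-list values.

    # Back-of-the-envelope check of the Roadzen figures reported above.
    # The per-million-token rates are assumptions inferred from the
    # article's quarterly totals, not quoted price-list values.
    claims = 607_577            # claims processed in the September 2024 quarter
    tokens_per_claim = 4_000    # per the company
    total_tokens = claims * tokens_per_claim

    rates = {
        "OpenAI o1":   15.0,    # assumed $ per million tokens
        "DeepSeek R1":  7.0,    # assumed $ per million tokens via Together.ai
    }

    for name, rate in rates.items():
        cost = total_tokens / 1_000_000 * rate
        print(f"{name}: ${cost:,.0f}/quarter, "
              f"{cost / claims * 100:.1f} cents per claim")
    # -> roughly $36,455 vs $17,012 per quarter, i.e. about 6 vs 3 cents per claim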
[2]
DeepSeek ended Silicon Valley's AI advantage. That's a good thing
A version of this article originally appeared in Quartz's members-only Weekend Brief newsletter.

Not long after ChatGPT came out, a leaked email from Google said what many were thinking but few dared say out loud: "We Have No Moat. And neither does OpenAI." The May 2023 memo argued that companies would never pay for generative AI when there were open-source options out there -- and those models were often better anyway. That same month, halfway across the world, an entrepreneur named Liang Wenfeng quietly founded DeepSeek in China. A year and a half later, DeepSeek would prove Google prophetic.

When DeepSeek revealed its V3 model last December, which the company said it trained for just $5.6 million using inferior chips -- less than 6% of GPT-4's training costs -- it sent shockwaves through the industry. Then last week, the company unveiled R1, a new reasoning model that can think through complex problems step by step, matching the capabilities of OpenAI's specialized reasoning systems. These breakthroughs sent American tech stocks into a freefall on Monday and exposed an uncomfortable truth: There might not be any moats in AI at all. The technological barriers that were supposed to protect America's AI dominance, from advanced chips to massive data centers, are more mirage than fortress. But while these models might spell trouble for companies banking on proprietary advantages or looking for massive funding rounds, DeepSeek could herald a new era of more efficient, accessible AI development.

It wasn't just companies building generative AI that took a hit. For investors who saw Nvidia as the perfect "picks and shovels" play in an uncertain AI gold rush, DeepSeek's revelation was also devastating. The company's stock cratered on Monday, shedding almost $600 billion in the biggest one-day drop in market value in history. It turns out that not only is there no moat for software, as Google warned, but there might not be one for hardware either. That's jarring for a company whose soaring valuation was built on the idea that AI's appetite for cutting-edge silicon would only grow.

DeepSeek's breakthrough came from training its model on about 2,000 of Nvidia's H800 GPUs -- chips that were specifically designed with reduced capabilities to comply with U.S. export controls on China. These are the hobbled cousins of the coveted H100s that American companies use, with deliberately limited chip-to-chip communication speeds that were supposed to make them insufficient for training advanced AI models. Yet DeepSeek managed to create a competitive model despite these constraints. The advanced chip sanctions put in place by the Biden administration were meant to prevent exactly this scenario. But rather than weakening China's AI capabilities, the embargo appears to have been the catalyst: DeepSeek was forced to innovate in ways that now challenge Silicon Valley's fundamental assumptions, even though its founder, Liang, has acknowledged that the lack of high-end chips remains a bottleneck, according to the Wall Street Journal.

The chip implications go beyond just training costs. When companies find more efficient ways to train AI models, those efficiencies often carry over to how the models run in everyday use -- what's known as inference in the industry. DeepSeek charges $2.19 per million output tokens, compared to $15 for OpenAI's latest model.
That's not the kind of narrow efficiency gain that can be waved away -- it's a nearly seven-fold difference that threatens to reshape the economics of AI deployment.

Some tech leaders are calling into question whether what DeepSeek did was really possible with its stated budget and chip supply. Meta has reportedly set up "war rooms" to look into these models. Microsoft is probing whether DeepSeek had access to OpenAI tech that could be behind some of its abilities. If DeepSeek's claims hold up, it will change the calculus for the frenzied data center build-out across America, including the $500 billion Stargate project announced at the White House last week. All these massive facilities felt urgent based on the astronomical costs of training American-made models: OpenAI CEO Sam Altman said GPT-4 cost "more than" $100 million to train, and Anthropic CEO Dario Amodei predicted we could see a $10 billion model this year. But if models can be trained for a fraction of that cost on less powerful hardware, the rush to build might look more like a costly overreaction. Some, like Meta's chief AI scientist Yann LeCun, argue we'll still need this infrastructure to run AI services at scale. But DeepSeek's breakthroughs suggest there are still major efficiency gains to be found in both training and deployment, which researchers should be excited about.

It's a pattern seen again and again. Just as the cost of computer processing has plummeted since the first mainframes -- with smartphones now packing more computing power than the machines that sent astronauts to the moon -- there has always been reason to believe AI's massive energy appetite would come down. The first iterations of any technology are rarely efficient, and the bill for generative AI was always going to come due: Companies need to start making money eventually, and that's probably impossible at current energy consumption levels. Or, as David Cahn at Sequoia Capital put it, there's a $600 billion question (which ballooned from his initial $200 billion estimate last summer as AI investments continued to surge while revenue remained elusive) -- the gap between what tech companies are spending on AI and what they're making from it. DeepSeek's breakthrough could help close that gap. Since its models are open source, there's nothing stopping American tech companies from adopting these efficiency techniques themselves. Their own training and inference costs could plummet. And while cheaper AI might seem like bad news for tech giants, Satya Nadella sees it differently. "Jevons paradox strikes again!" Microsoft's CEO posted on X. "As AI gets more efficient and accessible, we will see its use skyrocket, turning it into a commodity we just can't get enough of."

The open-source revelation might mark more than just a technical turning point. The history of AI has shown that the biggest breakthroughs often come from researchers building on each other's work openly -- from the development of neural networks to the transformer architecture that powers today's AI. It could reverse what Google researcher François Chollet argued was OpenAI's biggest impact: setting back artificial general intelligence "5 to 10 years" when it stopped publishing its research, which encouraged less sharing in the field, all in order to protect its advantage. That moat was never there. But believing in it may have held back AI more than any technical limitation ever did.
[3]
DeepSeek AI: How this free LLM is shaking up the AI industry
When you picture a tech disruptor in the field of artificial intelligence, chances are you think of well-funded American giants, maybe something out of Silicon Valley with big venture capital dollars behind it. But every so often, a player emerges from an unexpected corner of the world, knocking our collective socks off. Right now, that disruptor is DeepSeek, a Chinese AI startup that has left global financial markets reeling -- and upended our understanding of who can dominate the world of AI, and how.

For me, it all started with a flurry of news reports about an "open-source AI marvel" that had trained a top-tier model on a fraction of the budget normally swallowed by big-time AI labs. Then came the jaw-dropping day of January 27, 2025, when NVIDIA stock fell a massive 17% in a single day, wiping out roughly $600 billion in market value. Even in an era where stock values soar and crash like clockwork, that's a meltdown you rarely see.

In a world where we're used to hearing about monstrous billion-dollar training budgets, DeepSeek's success story is the polar opposite. It's about being ingeniously frugal, preserving an open-source ethos, and taking an against-all-odds approach that put the company on the map faster than you can say "GPU shortage." Little is known of DeepSeek's CEO and founder, Liang Wenfeng, who co-founded a successful hedge fund, High-Flyer Quant, before pivoting to AI. In 2023, he launched DeepSeek with a mission to democratise AI development. Flash forward two years, and the company's new flagship model, DeepSeek-V3, is knocking us all back on our heels.

Why is the whole world talking about DeepSeek these past few days? It's got to do with their secret sauce. They built and trained the model on about 2,000 NVIDIA H800 GPUs -- far fewer than the 16,000 or so GPUs a typical top-tier lab might throw at the problem -- thanks to a technique that lets each task tap only the slice of compute it genuinely needs. The net effect is that DeepSeek spent roughly $5.58 million on training -- peanuts by AI "unicorn" standards -- and still ended up with performance that many say matches or even surpasses the likes of OpenAI's ChatGPT, Anthropic's Claude and other big US labs dabbling in cutting-edge AI.

Soon after DeepSeek unveiled its new model with all of its specs and (more importantly) its performance metrics, the first Monday of trading saw NVIDIA's stock tumble 17%, or just under $600 billion -- the largest single-day loss in market value for any company ever. Investors had a knee-jerk reaction to the fact that if DeepSeek's approach scales, you don't need thousands of top-shelf NVIDIA GPUs; you just need a fraction of that, cleverly used. This is potentially worrisome for NVIDIA's growth story, which has so far banked on AI labs spending ever larger sums on hardware.

It's not just NVIDIA that took a body blow: Microsoft, Google's parent Alphabet and Broadcom, among others, saw significant dips. Over $1 trillion disappeared from the US stock market on the first day of the trading week in the aftermath of DeepSeek's release. When an upstart like DeepSeek can trigger such widespread market waves, perhaps it's a sign that we're entering an era of "lean AI," akin to how "lean start-ups" emerged in tech a decade ago. Another fundamentally interesting fact about DeepSeek's disruptive power is that it didn't just cut costs -- it embraced an open-source framework.
That might raise eyebrows, considering the typical hush-hush secrecy we see from big AI product companies that treat their models like trade secrets. By contrast, DeepSeek is letting outsiders peek at the code, adapt it, or repurpose the architecture for their own ends. For me, that's a real breath of fresh air. After all, if we keep AI hidden behind locked doors and massive paywalls, how do we expect it to actually solve real problems at scale? If you're a small company or a rural university lab anywhere in the world, you probably don't have the budget to pay for usage of top-tier commercial AI. But if a robust open-source model exists, you can dive in and build solutions for local needs -- farming, weather predictions, small business analytics -- without bankrupting yourself on API calls.

This shift is reminiscent of the wave of open-source software that took on proprietary operating systems decades ago. Linux started out as a side project and ended up powering most of the internet. Could DeepSeek do something similar for advanced AI? Of course, DeepSeek's success didn't happen in a vacuum. Llama from Meta was an early sign that open-source AI can shake up the market. The question is whether DeepSeek's model is truly up there in performance for general-purpose tasks like coding or language generation. From the initial buzz, it appears it might be.

But it's not just about cost and code availability. There are geopolitical undercurrents here, and not everyone is popping open champagne bottles to celebrate DeepSeek's viral AI moment just yet -- despite DeepSeek becoming the #1 app on Apple's App Store in the US by January 27, 2025, pushing down OpenAI's ChatGPT. Skeptics and experts alike say the DeepSeek AI model might be forced to comply with sending data back to China or with Chinese censorship standards, limiting the range of permissible discourse. Others raise concerns about data privacy -- we still don't know exactly what data DeepSeek used to train its AI model. For instance, if it used data from the open internet, the same misinformation and bias issues that plague other AI models could pop up again in DeepSeek.

It's also worth noting that sometimes a disruptive demonstration might not hold up at large scale. We can't overlook the real possibility that DeepSeek's model might falter under certain high-level tasks or user loads. In fact, this is already happening, with DeepSeek queries returning "The server is busy, please try again later" responses. According to official press statements, DeepSeek remains in research and development mode, with no immediate push for commercial monetisation. So is DeepSeek robust, or is it a well-crafted AI spectacle meant to last for a fleeting moment?

One thing is for certain, though. DeepSeek's sudden rise is part of a larger story -- the unstoppable wave of open-source and cost-efficient AI that's now upon us. At least in the short term, we're likely to see a flurry of open-source AI startups, each claiming to replicate or improve upon DeepSeek's success. That's good for competition, good for price, and hopefully good for all the smaller industries or academic labs that want in on the AI revolution.
For the end user -- people like you and me -- this might mean a future with advanced AI assistants that don't cost an arm and a leg, and that don't require you to trust your data to a big, remote cloud. Imagine freely running a near-cutting-edge language model on your laptop or a modest local server -- or even a Raspberry Pi! That's a radical shift from the "subscribe to a $200/month tier for advanced usage" narrative we've been fed. Maybe it's a bit premature to write off expensive enterprise AI models entirely -- OpenAI's prospects may dwindle a bit, NVIDIA will reevaluate expectations, and the hype will settle down somewhat. Still, there's no doubt DeepSeek is shining a spotlight on an uncomfortable truth: that a lot of us might have been overpaying for AI we could replicate on half the hardware with a dash of clever engineering.
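The compute-slicing technique described above is, at its core, a mixture-of-experts design: a router sends each token to a few small expert networks and leaves the rest of the model's parameters idle. Here is a toy Python sketch of that routing idea; the dimensions, expert count, and top-2 routing below are illustrative placeholders, not DeepSeek's actual architecture.

    # Toy top-2 mixture-of-experts layer: each token is processed by only
    # 2 of 8 expert MLPs, so most parameters stay idle for any given token.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyMoE(nn.Module):
        def __init__(self, dim=64, n_experts=8, top_k=2):
            super().__init__()
            self.router = nn.Linear(dim, n_experts)   # scores each expert per token
            self.experts = nn.ModuleList(
                [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                               nn.Linear(4 * dim, dim)) for _ in range(n_experts)])
            self.top_k = top_k

        def forward(self, x):                          # x: (n_tokens, dim)
            weights, idx = self.router(x).topk(self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)       # mixing weights for chosen experts
            out = torch.zeros_like(x)
            for k in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, k] == e              # tokens routed to expert e in slot k
                    if mask.any():
                        out[mask] += weights[mask, k:k + 1] * expert(x[mask])
            return out

    moe = TinyMoE()
    print(moe(torch.randn(10, 64)).shape)              # torch.Size([10, 64])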
[4]
How DeepSeek panicked the U.S. stock market and upstaged OpenAI
Bit by Bit is a weekly column focusing on technical advances each and every week across multiple spaces. My name is Adam Conway, and I've been covering tech and following the cutting edge for a decade. If there's something you're interested in and would like to see covered, you can reach out to me at adam@xda-developers.com.

Just this week, DeepSeek R1 plunged the U.S. stock market into chaos and upstaged OpenAI at its own game. Its release wiped $1 trillion in valuations across the stock market, with $600 billion of that being Nvidia's own. Some stocks have bounced back, and others are recovering, but it's clear that DeepSeek had a pretty big impact on the top computing and AI companies. DeepSeek claims it trained the model for a mere fraction of OpenAI's costs while also selling API access at prices that significantly undercut OpenAI -- so how did the company do it, and what happened? There's a lot to break down here, particularly around DeepSeek's claims, the retaliation they have provoked, and how claims that R1 is "open source" aren't telling the full picture.

DeepSeek R1 isn't the same as DeepSeek V3

Though they're very similar

First and foremost, DeepSeek released two models: V3 and R1. Both of them are pretty important to the story, but all of the talk has been around R1. DeepSeek R1 is the company's reasoning model, which can ask itself questions and talk to itself before answering a prompt, just like OpenAI's o1 model. DeepSeek V3 is a general-purpose Mixture of Experts (MoE) LLM with 671B parameters. DeepSeek R1 is based on DeepSeek-V3-Base, and smaller models distilled from DeepSeek R1 -- based on Qwen and Llama -- are available for download in 1.5B, 7B, 8B, 14B, 32B, and 70B parameter sizes. There is also the full-fledged DeepSeek R1 671B model available for download. Both R1 and V3 are similar models, but R1's reasoning capabilities are what make it particularly impressive.

The best way to use DeepSeek's R1 and V3 671B models is to navigate to DeepSeek's site, where you can create an account and use it like you would ChatGPT. The company's servers are in China, and some prompts result in a censored answer. DeepSeek's R1 671B model can be run locally, but it requires at least 800 GB of HBM memory in FP8 format, according to AWS. This is where the open-weight nature of the model comes in, too: you can tweak the parameters to remove this censorship, and a number of uncensored models made with a process known as "abliteration" are already available to download.

The process of "distillation" mentioned in connection with those smaller-parameter models is one you might not be familiar with. Distillation refers to using a larger model to train a smaller model, where the larger model is the parent and the smaller model is the child. The child model asks the parent model a litany of questions, labeling the answers and learning from its responses. In other words, the DeepSeek R1 models that you can run locally are based on Qwen and Llama, where those two models learned from the larger DeepSeek R1.

Did DeepSeek R1 steal from OpenAI?

Even if it did, it's hypocritical of OpenAI to complain

OpenAI is currently facing a number of lawsuits relating to the collection of the data it has used to train its models. The New York Times sued OpenAI, as did a coalition of Canadian news outlets, Intercept Media, and ANI in India. There are countless more lawsuits out there too, and all of them allege more or less the same thing: OpenAI used their data without permission to train its GPT models.
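To make that parent/child process concrete, here is a toy sketch of the standard distillation recipe: soft labels from the teacher and a temperature-scaled KL loss for the student, after Hinton et al. Every detail -- the tiny model shapes, the random "questions", the temperature -- is an illustrative assumption; this shows the generic technique, not DeepSeek's or OpenAI's actual pipeline.

    # Toy knowledge distillation: a small "student" learns to match the output
    # distribution of a larger "teacher". Shapes and data are illustrative.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    vocab = 100                       # toy vocabulary size
    teacher = nn.Sequential(nn.Embedding(vocab, 256), nn.Flatten(), nn.Linear(256, vocab))
    student = nn.Sequential(nn.Embedding(vocab, 32), nn.Flatten(), nn.Linear(32, vocab))

    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    T = 2.0                           # temperature: softens the teacher's distribution

    for step in range(200):
        prompts = torch.randint(0, vocab, (16, 1))   # the child's "questions"
        with torch.no_grad():
            teacher_logits = teacher(prompts)        # the parent's "answers" (soft labels)
        student_logits = student(prompts)
        # KL divergence between temperature-softened distributions:
        # the classic distillation loss
        loss = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                        F.softmax(teacher_logits / T, dim=-1),
                        reduction="batchmean") * T * T
        opt.zero_grad(); loss.backward(); opt.step()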
Right now, nobody from OpenAI has officially come out and made the claim that DeepSeek stole from it, but both Bloomberg and the Financial Times have reported that OpenAI and Microsoft are currently investigating the possibility. First and foremost: this is a laughing matter. Even if DeepSeek did "steal" from OpenAI, it's hard to have sympathy for a company that feels its data was taken in an "unauthorized" way when significant portions of its own data were collected in the exact same way. In fact, OpenAI has argued more or less in favor of what DeepSeek is said to have done. "Training AI models using publicly available internet materials is fair use, as supported by long-standing and widely accepted precedents. We view this principle as fair to creators, necessary for innovators, and critical for US competitiveness," OpenAI once said in a blog post.

However, it's not clear what exactly DeepSeek could have trained on when it comes to OpenAI. o1's reasoning is obfuscated; when you ask o1 a question, it doesn't give you the full chain-of-thought that R1 does. It's a summary, and OpenAI deliberately hides the actual inner workings, going so far as to make it very clear that any attempts to siphon this information will result in your account being banned.

It doesn't stop there, though, as David Sacks, a venture capitalist and "AI and crypto czar" at the White House, claimed that there was "substantial" evidence of distillation in R1 from OpenAI. "There's a technique in AI called distillation, which you're going to hear a lot about, and it's when one model learns from another model, effectively what happens is that the student model asks the parent model a lot of questions, just like a human would learn, but AIs can do this asking millions of questions, and they can essentially mimic the reasoning process that they learn from the parent model and they can kind of suck the knowledge of the parent model," Sacks told Fox News. "There's substantial evidence that what DeepSeek did here is they distilled the knowledge out of OpenAI's models and I don't think OpenAI is very happy about this."

As we've already mentioned, this reasoning process cannot be distilled. The obfuscated chain-of-thought that the o1 model shows users does not contain a full chain-of-thought; it merely summarizes what the model is "thinking". That isn't enough information to train DeepSeek R1, especially not when R1 actually matches (and at times even outperforms) the alleged source of its reasoning process in multiple benchmarks. With that said, we don't know where the initial training data came from, but that's not really what the allegations of stolen data relate to.

DeepSeek has actually been very open about how R1's reasoning capabilities came about. In the whitepaper released by the team of researchers, they say that the capabilities emerged through reinforcement learning when building R1-Zero. This focuses on "self-evolution," a technique where the model itself "learns" to achieve a goal in the most efficient way. As the paper puts it: "A particularly intriguing phenomenon observed during the training of DeepSeek-R1-Zero is the occurrence of an 'aha moment'. This moment, as illustrated in Table 3, occurs in an intermediate version of the model. During this phase, DeepSeek-R1-Zero learns to allocate more thinking time to a problem by reevaluating its initial approach."
This behavior is not only a testament to the model's growing reasoning abilities but also a captivating example of how reinforcement learning can lead to unexpected and sophisticated outcomes. Reinforcement learning is a very common machine learning technique, and neuroevolution, a branch of the reinforcement learning paradigm, has even been used to teach models how to play games like Super Mario, in the form of MarI/O by SethBling. This isn't a new concept, but it is one that has been somewhat overlooked when it comes to LLMs. Plenty of LLMs use RLHF -- Reinforcement Learning from Human Feedback -- but pure RL does not require any supervision or feedback provided by a human.

Did it really cost $5.576M to train DeepSeek R1? And why did the stock market panic?

Yes and no, but mostly no

This claim originates from the DeepSeek V3 whitepaper, which says the model cost $5.576M to train, racking up 2.788 million Nvidia H800 GPU hours at an estimated $2 per hour. That figure covers one model and one final training run -- not the test runs, and not the times the team built the model and then had to build it again. There has certainly been significantly more investment in the project than that. This oversight has led to allegations that DeepSeek lied about its costs, despite the fact that the whitepaper makes it very clear the figure covers just the final training run, without any other overheads such as research and development, models trained in the process of building up V3, and other adjacent costs. It is also not the cost of R1; it is the cost of building V3. Eryck Banatt has an excellent breakdown of this cost, which asserts that DeepSeek's numbers are plausible and that many aspects of its claims are verifiable from the outset.

However, these fundamental misunderstandings (coupled with the genuine efficiency of DeepSeek's newest models) and the training on older GPUs caused market chaos. Nvidia's H100 GPUs, bought in the hundreds of thousands by big players in the AI space such as Google, Meta, and OpenAI, are the most powerful GPUs out there and were previously seen as necessary for developing cutting-edge technology. Yet DeepSeek achieved all of this on a series of H800 GPUs, which cut the chip-to-chip transfer rate roughly in half; for a time they complied with export regulations, until a loophole that Nvidia was said to have exploited was closed. This calls into question just how important Nvidia's latest technology actually is for AI, if slower GPUs can still compete with the results of using the best.

And that's another thing: allegations surfaced that DeepSeek had skirted export controls and acquired H100 GPUs. Scale AI CEO Alexandr Wang claimed that DeepSeek had about 50,000 of them and had avoided talking about them because doing so would prove it had violated export controls. It's likely that Wang misunderstood a tweet from Dylan Patel, which said that DeepSeek had more than 50,000 Hopper GPUs. H800 GPUs are still Hopper GPUs; they are modified versions of the H100 made to comply with those U.S. export controls. All of this prompted Nvidia to release a statement saying that it expects all partners to comply with regulations and will act accordingly if it discovers they haven't. Nvidia has also "stated that there is no reason to believe that DeepSeek obtained any export-controlled products from Singapore," according to the Ministry of Trade and Industry in Singapore.
Even still, this cost is remarkably low. Aran Komatsuzaki, an AI researcher, estimates the cost of training GPT-4o and o1 at about $15 million each -- roughly three times the cost of DeepSeek's V3 model. This is partially enabled by optimization, as DeepSeek has made a number of advancements in this area. That includes using PTX, a low-level language for Nvidia GPUs, which let the researchers do things like dedicating some of the H800 GPUs to managing cross-chip communications.

DeepSeek represents several major advancements in AI, and we'll all reap the benefits

Even if it's panicking competitors

Despite suggestions that Meta has set up "war rooms" and that OpenAI is potentially looking to take action against DeepSeek, this is a major win for the AI community. Advancement helps everyone, and the open nature of DeepSeek's research will allow competitors to use some of those techniques in improving their own models, too. Back to when I mentioned that DeepSeek is "open weights": the reason it's "open weights" and not "open source" is that open source would also require releasing the original data the model was trained on. Open weights, in contrast, means that we have the parameters -- the numerical values that define how the model runs. That, alongside the research papers, is more than enough to go on when trying to build a model that replicates R1. In fact, someone is already working on building their own version of R1 in a project called "Open R1", which uses all of the information released by DeepSeek to implement it. It's not completed, but there's a very clear path and outline to follow if you want to do it yourself.

If a regular person like you or I can read the paper and understand the basics of what's going on, then you know that researchers at companies like Google, Meta, and OpenAI definitely can. This will improve models across the board, reducing power consumption and costs and further democratizing AI. OpenAI CEO Sam Altman has already said that OpenAI's reasoning models will now share more of their chain of thought, crediting R1 in his response. You can run a distilled version of DeepSeek R1 in LM Studio right now; I've been running the 32B Qwen model distilled from DeepSeek R1 on my MacBook Pro with an M4 Pro SoC using LM Studio.
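If you'd rather script against such a local model than chat with it, LM Studio can also serve downloaded models through an OpenAI-compatible HTTP server (http://localhost:1234 by default). A minimal sketch, assuming the server is running with a model loaded -- the model identifier below is a placeholder, so use whatever name LM Studio displays for your download:

    # Query a locally served DeepSeek R1 distill via LM Studio's
    # OpenAI-compatible endpoint. The model name below is a placeholder.
    import requests

    resp = requests.post(
        "http://localhost:1234/v1/chat/completions",
        json={
            "model": "deepseek-r1-distill-qwen-32b",   # use the name LM Studio shows
            "messages": [{"role": "user",
                          "content": "Explain mixture-of-experts in two sentences."}],
            "temperature": 0.6,
        },
        timeout=300,   # reasoning models can take a while to answer
    )
    print(resp.json()["choices"][0]["message"]["content"])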
[5]
Dario Amodei challenges DeepSeek's $6 million AI narrative: What Anthropic thinks about China's latest AI move
The artificial intelligence world was rocked last week when DeepSeek, a Chinese AI startup, announced its latest language model, which appeared to match the capabilities of leading American AI systems at a fraction of the cost. The announcement triggered a widespread market selloff that wiped nearly $200 billion from Nvidia's market value and sparked heated debates about the future of AI development.

The narrative that quickly emerged suggested that DeepSeek had fundamentally disrupted the economics of building advanced AI systems, supposedly achieving with just $6 million what American companies had spent billions to accomplish. This interpretation sent shockwaves through Silicon Valley, where companies like OpenAI, Anthropic, and Google have justified massive investments in computing infrastructure as necessary to maintain their technological edge. But amid the market turbulence and breathless headlines, Dario Amodei, co-founder of Anthropic and one of the pioneering researchers behind today's large language models, published a detailed analysis that offers a more nuanced perspective on DeepSeek's achievements. His blog post cuts through the hysteria to deliver several crucial insights about what DeepSeek actually accomplished and what it means for the future of AI development. Here are the four key insights from Amodei's analysis that reshape our understanding of DeepSeek's announcement:

1. The '$6 million model' narrative misses crucial context

DeepSeek's reported development costs need to be viewed through a wider lens, according to Amodei. In his analysis, he directly challenges the popular interpretation: "DeepSeek does not 'do for $6M what cost US AI companies billions.' I can only speak for Anthropic, but Claude 3.5 Sonnet is a mid-sized model that cost a few $10M's to train (I won't give an exact number). Also, 3.5 Sonnet was not trained in any way that involved a larger or more expensive model (contrary to some rumors)."

This revelation fundamentally shifts the narrative around DeepSeek's cost efficiency. Considering that Sonnet was trained 9-12 months ago and still outperforms DeepSeek's model on many tasks, the achievement appears more in line with the natural progression of AI development costs than with a revolutionary breakthrough. The timing and context also matter significantly. Following historical trends of cost reduction in AI development -- which Amodei estimates at roughly 4x per year -- DeepSeek's cost structure appears to be largely on trend rather than dramatically ahead of the curve.

2. DeepSeek-V3, not R1, was the real technical achievement

While markets and media focused intensely on DeepSeek's R1 model, Amodei points out that the company's more significant innovation came earlier: "DeepSeek-V3 was actually the real innovation and what should have made people take notice a month ago (we certainly did). As a pretrained model, it appears to come close to the performance of state of the art US models on some important tasks, while costing substantially less to train."

The distinction between V3 and R1 is crucial for understanding DeepSeek's true technological advancement. V3 represented genuine engineering innovations, particularly in managing the model's "Key-Value cache" and pushing the boundaries of the "mixture of experts" method. This insight helps explain why the market's dramatic reaction to R1 may have been misplaced.
R1 essentially added reinforcement learning capabilities to V3's foundation -- a step that multiple companies are currently taking with their models.

3. Total corporate investment reveals a different picture

Perhaps the most revealing aspect of Amodei's analysis concerns DeepSeek's overall investment in AI development: "It's been reported -- we can't be certain it is true -- that DeepSeek actually had 50,000 Hopper generation chips, which I'd guess is within a factor ~2-3x of what the major US AI companies have. Those 50,000 Hopper chips cost on the order of ~$1B. Thus, DeepSeek's total spend as a company (as distinct from spend to train an individual model) is not vastly different from US AI labs."

This revelation dramatically reframes the narrative around DeepSeek's resource efficiency. While the company may have achieved impressive results with individual model training, its overall investment in AI development appears to be roughly comparable to that of its American counterparts. The distinction between model training costs and total corporate investment highlights the ongoing importance of substantial resources in AI development. It suggests that while engineering efficiency can be improved, remaining competitive in AI still requires significant capital investment.

4. The current 'crossover point' is temporary

Amodei describes the present moment in AI development as unique but fleeting: "We're therefore at an interesting 'crossover point', where it is temporarily the case that several companies can produce good reasoning models. This will rapidly cease to be true as everyone moves further up the scaling curve on these models."

This observation provides crucial context for understanding the current state of AI competition. The ability of multiple companies to achieve similar results in reasoning capabilities represents a temporary phenomenon rather than a new status quo. The implications are significant for the future of AI development. As companies continue to scale up their models, particularly in the resource-intensive area of reinforcement learning, the field is likely to once again differentiate based on who can invest the most in training and infrastructure. This suggests that while DeepSeek has achieved an impressive milestone, it hasn't fundamentally altered the long-term economics of advanced AI development.

The true cost of building AI: What Amodei's analysis reveals

Amodei's detailed analysis of DeepSeek's achievements cuts through weeks of market speculation to expose the actual economics of building advanced AI systems. His blog post systematically dismantles both the panic and the enthusiasm that followed DeepSeek's announcement, showing how the company's $6 million model training cost fits within the steady march of AI development. Markets and media gravitate toward simple narratives, and the story of a Chinese company dramatically undercutting U.S. AI development costs proved irresistible. Yet Amodei's breakdown reveals a more complex reality: DeepSeek's total investment, particularly its reported $1 billion in computing hardware, mirrors the spending of its American counterparts.

This moment of cost parity between U.S. and Chinese AI development marks what Amodei calls a "crossover point" -- a temporary window where multiple companies can achieve similar results. His analysis suggests this window will close as AI capabilities advance and training demands intensify. The field will likely return to favoring organizations with the deepest resources.
Building advanced AI remains an expensive endeavor, and Amodei's careful analysis shows why measuring its true cost requires examining the full scope of investment. His methodical deconstruction of DeepSeek's achievements may ultimately prove more significant than the initial announcement that sparked such turbulence in the markets.
[6]
DeepSeek could dethrone OpenAI's ChatGPT. Here's why
A Chinese manufacturer just shocked a larger, complacent U.S. rival with a cheaper product that is significantly more customizable. News at 11. In many industries, in the 21st century so far, this statement would not in fact be news; it would be such a familiar tale, few would bother mentioning it. But the old tale is noteworthy in this latest instance, thanks to the industry being Artificial Intelligence. Which, ironically, now seems to be an industry that was not very intelligent about obvious developments coming down the pike.

DeepSeek has taken off at a difficult time in the U.S., and not just politically. A divided country was just coming to grips with what AI means for business, for jobs, and whether the promised returns would be worth the investment that has been ploughed into (and by) U.S. companies. One thing few seemed to question was that a U.S. business would always be in the lead. No matter who was in or out, an American leader would emerge victorious in the AI marketplace -- be that leader OpenAI's Sam Altman, Nvidia's Jensen Huang, Anthropic's Dario Amodei, Microsoft's Satya Nadella, Google's Sundar Pichai, or for the true believers, xAI's Elon Musk. ChatGPT appeared to have a grip on the public imagination, and Altman seemed to be the most media-savvy public face of the AI salesmen, so -- presuming he could stop having weird feuds over celebrity voices and isn't found liable for allegedly abusing his sister -- probably him?

Now here comes Liang Wenfeng, founder and CEO of DeepSeek, with a face so unknown there isn't even, at time of writing, a photo on his Wikipedia entry, nor does the mighty Getty archive contain any picture of him. (He did show up at a Beijing symposium last week, should you want to know what he looks like.) DeepSeek doesn't swim in the media-facing, market-facing waters of the posturing U.S. AI giants. All it has is a better product -- a faster, way cheaper product that fulfills a promise Altman forgot: It's open source. And in the flattened world of the internet, turns out, that's all you need.

One day, that's all it took. One day for DeepSeek to vault to the top of the app charts on Apple and Google. One day for Nvidia's Jensen Huang to lose nearly $21 billion of his net worth, thanks to the biggest single-day loss for any stock ever. Reports that DeepSeek may have been partly trained on sanctions-busting Nvidia chips didn't stop the slide, because DeepSeek's secret sauce is that it simply doesn't need as much computing power as other Large Language Models. DeepSeek isn't just cheaper and more customizable; it is up to 50 times more efficient than the top U.S. models. Which could be good news for the environment, and bad news for Nvidia -- to say nothing of the U.S. tech giants that have been gearing up their data center budgets and massively overspending on Nvidia chips (in other words, pretty much all of them, except Apple, which has wisely put Apple Intelligence to work mostly on the device itself).

"Nvidia has basically been getting rich selling shovels in the midst of a gold rush," AI expert Gary Marcus, one of the deepest skeptics of the U.S. AI approach, wrote as DeepSeek news poured in, "but may suddenly face a world in which people suddenly require far fewer shovels ... building $500 billion worth of power and data centers in the service of those chips isn't looking so sensible either."
Indeed, an increasing number of companies may be able to avoid paying for cloud-based AI services at all. At costs of pennies on the dollar, executives will be able to download an open-source LLM that can be customized to fit their database and data needs. It doesn't need to be the absolute fastest and smartest AI; it just needs to be competitive with the fastest and smartest -- which DeepSeek's R1 model apparently is.

So what has ChatGPT, and by extension Altman, got on its side? Why, in this fast-moving tech consumer world, where a competitor is only an app store tap away, would anyone stick with the app they know? Sure, many will for a while, but relying on the inertia of your customer base in the face of close-to-free alternatives is a great way to ... become the next AOL. ChatGPT's fall from grace could arguably happen faster than its ascendancy in 2022, which in itself was practically overnight.

Which is not to say that U.S. AI companies are sunk. After all, they have an ongoing cyberattack and a protectionist U.S. government in their corner. Today's Washington is willing to pass the CHIPS Act to prevent Chinese companies from accessing the latest U.S. chip technology, which evidently did not work, but it is also willing to ban TikTok -- the kind of blunt tool that would work to stunt DeepSeek's scary-fast growth. Suspicions over what China could do with all the U.S. customer data its companies are acquiring are rife, and can always be stoked. But what are you going to do? Keep banning every Chinese LLM that undercuts a bloated U.S. rival? At a certain point, that's playing whack-a-mole, and it ignores the point. If the market wants a super-cheap, super-efficient open-source AI, then American companies need to be the ones who provide it.

If Altman doesn't release a supposedly superior GPT-5 soon, and if he doesn't want OpenAI to be heading for the kind of long-term decline that has affected so many haughty U.S. tech companies in the past, then he needs to join DeepSeek and Meta in the ranks of AI makers that release open-source products. And maybe concentrating on the carbon footprint of your AI model -- a pretty good proxy for how inefficient it is -- isn't such a bad idea after all.
[7]
This Week in AI: DeepSeek Hits Chip Stocks, Meta Stays Pat on AI Spending, SoftBank Invests in OpenAI
DeepSeek was the talk of Silicon Valley and Wall Street this week after it singlehandedly wiped nearly $600 billion of market value from Nvidia. Its $5.6 million cost to train its foundation models with only about 2,000 slower Nvidia H800 chips raised concerns about lower future chip demand. But questions started emerging about its pre-training cost. Bank of America analysts believe other costs were excluded from the total, while OpenAI thinks DeepSeek used a method called distillation to harvest generated outputs from OpenAI's own models -- a violation of its terms of service. What is undisputed is that DeepSeek introduced several engineering innovations that Silicon Valley could adopt to lower its own pre-training costs. This bodes well for enterprises, since it could lower artificial intelligence (AI) inferencing costs and make businesses more willing to deploy AI broadly.

Meta CEO Mark Zuckerberg said during the company's fourth-quarter earnings call with analysts this week that Meta would be looking at the innovations from DeepSeek and perhaps apply them to its own AI training. But Zuckerberg stood pat on his plans to spend $60 billion to $65 billion on AI infrastructure. While DeepSeek does offer a lower-cost path to AI training, he said two factors convinced him not to pull back on spending. One is the trend toward inferencing carrying the bulk of AI costs, Zuckerberg said. As AI models add reasoning skills, inferencing costs will be more consequential. Inferencing is when a pre-trained foundation model is given new data to understand and analyze, such as when a user inputs a prompt and gets a response. Second, Zuckerberg said that as Meta embeds its AI assistant, Meta AI, more fully into its social media sites and chat apps, it will have more AI workloads to process from its billions of users. This calls for building more data centers, servers and other infrastructure for processing.

Microsoft CEO Satya Nadella also said his company plans to keep spending on data centers, especially as its AI business is booming. In Microsoft's just-concluded fiscal second quarter, the segment's annual revenue run rate was $13 billion, up 175% year over year. Nadella acknowledged that DeepSeek had some innovations worth considering but pointed out that the AI industry has already been adding efficiencies and lowering costs for customers. Nadella said this will continue because of the AI scaling law and Moore's Law. The scaling law dictates that the more data and computing power are given to an AI model, the better it performs. Moore's Law, although slowing down, predicts that the number of transistors on a microchip will double roughly every two years with minimal cost increase. These two combined will lead to cheaper and more powerful AI, he said.

Anthropic CEO Dario Amodei said talk about DeepSeek posing a threat to U.S. leadership in AI is "greatly overstated," according to a personal essay. DeepSeek's inexpensive training of its AI models follows the typical curve of costs going down as AI models become more efficient. Historically, training costs drop 4x (four-fold) per year due to better technology and efficiency improvements, he said. This means that if a model cost $100 million to train last year, a similar model today would cost around $25 million. DeepSeek's models are on par with top U.S. AI models, but the comparison was based on the U.S. models' performance seven to 10 months ago, not how they are performing now, he said.
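As a quick sketch of the arithmetic behind that trend: compounding Amodei's estimated four-fold annual decline reproduces the article's $100 million-to-$25 million example after one year. The starting cost and time horizon below are illustrative assumptions.

    # Compounding the ~4x annual decline in training cost that Amodei describes.
    # Starting cost and horizon are illustrative, not sourced figures.
    cost = 100_000_000  # dollars: the article's "last year" example
    for year in (1, 2, 3):
        cost /= 4
        print(f"after year {year}: ${cost:,.0f}")
    # after year 1: $25,000,000  -- the article's ~$25M figure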
To account for this lag, he doubled the four-fold cost drop to 8x (eight-fold) for DeepSeek. That means it would not be unusual for anyone to build a much cheaper AI model today. DeepSeek "does not 'do for $6 million (to train one model) what cost U.S. AI companies billions,'" Amodei said, adding that it costs "tens of millions" to train Anthropic's Claude 3.5 Sonnet.

Amodei also clarified one thing Wall Street mixed up: DeepSeek's foundation model, V3, was the purely pre-trained model with the cost savings; it was released in December. R1, DeepSeek's reasoning model that tanked Nvidia's stock, was a second stage of training on top of V3, adding a technique called reinforcement learning that made it perform on par with OpenAI's o1 and other top reasoning models. R1 was released on Jan. 20. DeepSeek's V3 did introduce "genuine and impressive innovations, mostly focused on engineering efficiency," he said. However, many researchers and engineers have been steadily improving AI models as well, he added. "What's different this time is that the company that was first to demonstrate the expected cost reductions was Chinese," Amodei said. "This has never happened before and is geopolitically significant. However, U.S. companies will soon follow suit -- and they won't do this by copying DeepSeek, but because they too are achieving the usual trend in cost reduction." He said what makes sense to him is to restrict exports of advanced AI chips to China because of its "authoritarian" regime. Even if the export controls merely delay China's access to these chips, "because AI systems can eventually help make even smarter AI systems, a temporary lead could be parlayed into a durable advantage," Amodei said.

Japanese tech investor SoftBank is reportedly planning to invest between $15 billion and $25 billion in OpenAI, according to the Financial Times, which would make it the largest investor in the startup, surpassing Microsoft. The amount would be in addition to the $15 billion SoftBank plans to contribute to Stargate, a $100 billion to $500 billion project to build AI data centers and other infrastructure. OpenAI also plans to invest $15 billion in Stargate, and MGX, a UAE sovereign investment fund, is part of the project. According to the Financial Times report, OpenAI has raised about $20 billion thus far over several funding rounds, including Microsoft's $14 billion. SoftBank invested $2 billion in OpenAI last year. SoftBank CEO Masayoshi Son has been courting OpenAI for years, and getting a larger stake in the company is the cornerstone of Son's stated goal of developing AI superintelligence. SoftBank's investment in OpenAI will likely be the Japanese company's largest, with a failed $16 billion investment in WeWork coming in second.
[8]
DeepSeek and the race to surpass human intelligence
Back in October, I met with a young German start-up CEO who had integrated DeepSeek's open-source approach into his Mind-Verse platform and made it comply with German data privacy (DSGVO) standards. Since then, many rumors have been circulating that China has chosen a different architectural structure for its foundation model -- one that relies not only on open source, but is also much more efficient, requiring neither the same level of training data nor the same compute resources.

When it comes to DeepSeek, this is not a singular "breakthrough moment." Rather, AI development continues on an exponential trajectory: progress is becoming faster, its impact broader, and with increasing investment and more engineers involved, fundamental breakthroughs in engineering and architecture are just beginning. Contrary to some market spokespeople, investors, and even certain foundation model pioneers, this is not solely about throwing infinite compute at the problem; we are still far from understanding core aspects of reasoning, consciousness, and the "operating model" (or software layers) of the human mind.

Additionally, DeepSeek is (or was) not a government-sponsored initiative; supposedly, even the premier was surprised and visited Hangzhou to understand what was happening. Scale AI founder Alexandr Wang claims that China already has a significant number of powerful H100 GPUs (about 50,000), yet -- given U.S. export laws -- this is not publicly acknowledged. DeepSeek is reported to have only about 150 engineers, each earning in the range of $70-100k, roughly an eighth to a tenth of top engineering salaries in Silicon Valley. So, regardless of whether they have powerful GPUs, and whether $6 million or $150 million was invested, it is nowhere near the billions -- or tens of billions -- poured into other major AI competitors. This example shows that different engineering and architectural approaches do exist and may be waiting to be uncovered. Most likely, this is not the ultimate approach, but it challenges the current VC narrative that "it's all about compute and scale." Moreover, the open-source mindset behind DeepSeek challenges the typical approach to LLMs and highlights both the advantages and the potential risks.

Sam Altman is rumored to be hosting a behind-closed-doors meeting with the Trump administration on January 30th, where he plans to present so-called "PhD-level" AI agents -- or super agentic AI. How "super" this will be remains unclear, and it is unlikely there will be any public declaration of achieving AGI. Still, when Mark Zuckerberg suggests Meta will soon publish substantial progress, and Elon Musk hints at new breakthroughs with Grok, DeepSeek is just another "breakthrough" that illustrates how fast the market is moving. Once agentic AIs come online, they introduce a structural shift: agentic AI is not about merely responding to a prompt, but about pursuing a goal. Through a network of super agents, massive amounts of data are gathered and analyzed, while real products and tasks are delivered autonomously. It is telling that, instead of a public appearance and release, Altman is meeting with the U.S. government behind closed doors -- a hint at potential risks and consequences. What we are seeing is the compound effect of investment and ever-growing teams working on these models, with few signs of a slowdown.
Needless to say, any quantum breakthroughs would be the next frontier -- essentially "AI on steroids" -- where the magnitude of change could increase exponentially. On the positive side, this could unleash innovations in health and medicine like never before in human history. In the near future, broader access to AI tools will probably benefit infrastructure providers and hyperscalers such as AWS. It is unclear whether this will put NVIDIA at a disadvantage or actually benefit it: as "everyone" joins the AI race, there could be more demand for compute, not just from big U.S. tech players like OpenAI. Meanwhile, Anthropic and OpenAI run closed ecosystems, but DeepSeek's public paper shares many of its core methods. The greatest risk to the U.S. and its current AI dominance is that China does have the talent and the strong work ethic to keep pushing forward. Trade sanctions won't stop that. As more engineers come together and keep working, the odds of major breakthroughs increase. Globally, the U.S. is losing trust. The "don't trust China" narrative is fading in many parts of the world. While Donald Trump on the surface gains respect, global leaders are quietly looking for alternatives in the background to mitigate their dependence. Europe and other Asian nations don't want to be "hostage" to U.S. technology and will open up to new options.

Technology doesn't evolve overnight, and we've only seen the start of the breakthroughs to be announced by Grok, Meta, and OpenAI. Simultaneously, new capital will continue pouring in, and other regions will join the race, now that it's clear money alone isn't everything. The future might not necessarily be bad for NVIDIA, either, since data centers could appear everywhere, enabling a more global roll-out of AI and creating opportunities for many. There are still numerous smaller AI companies that have received massive funding purely on hope and hype. Yet new approaches to foundation models -- via architectural and engineering innovation -- can continue to drive progress. And once we "hack" biology or chemistry with AI, we may see entirely new levels of breakthroughs. Looking toward the rest of 2025, we can expect more "super-agent" breakthroughs, as agentic AI and LQMs (Large Quantitative Models) push generative AI beyond fun language-based tools to genuine human worker replacements. Not only will financial modeling and analysis be optimized, but also execution -- the entire cycle of booking, planning, and organizing -- could shift to autonomous agents. Over time, these integrated, adaptive agents will replace more and more use cases where humans currently remain in the loop. This might also be one of the biggest threats to society: coping with extreme pressures on market economies under hyper-efficiency and hyper-innovation. In 2025, we are likely to see breakthroughs in education, science, health, consulting, and finance. With multiple compounding effects in play, we'll likely experience hyper-efficiency and widespread growth. However, the looming threats are real. Agentic, at-scale AI can still fall victim to hallucinations, and now anyone with a few million dollars can build their own model -- potentially for malicious use. While a global, open approach to AI can be positive, many engineering and research challenges remain unsolved, leaving high risks. With the U.S. laser-focused on AI, the race to surpass human-level intelligence is on.
[9]
Is DeepSeek China's Sputnik Moment?
Last week, shortly before the start of the Chinese New Year, when much of China shuts down for seven days, the state media saluted DeepSeek, a tech startup whose release of a new low-cost, high-performance artificial-intelligence model, known as R1, prompted a big sell-off in tech stocks on Wall Street. China Central Television showed footage of DeepSeek's bespectacled founder, Liang Wenfeng, meeting with Premier Li Qiang, the second-highest-ranking official in the Chinese government. A few days earlier, China Daily, an English-language news site run by the Chinese Communist Party, had hailed DeepSeek's success, which defied U.S. restrictions on the export of high-performance semiconductor chips used to train A.I. models, as "not an isolated phenomenon, but rather a reflection of the broader vibrancy of China's AI ecosystem." As if to reinforce the point, on Wednesday, the first day of the Year of the Snake, Alibaba, the Chinese tech giant, released its own new A.I. model, which the company claimed "outperforms" competing products from U.S. companies like OpenAI and Meta "almost across the board." Alibaba's claims haven't been independently verified yet, but the DeepSeek-inspired stock sell-off provoked a great deal of commentary about how the company achieved its breakthrough, the durability of U.S. leadership in A.I., and the wisdom of trying to slow down China's tech industry by restricting high-tech exports -- a policy that both the first Trump Administration and the Biden Administration followed. Speaking at the World Economic Forum, in Davos, Satya Nadella, Microsoft's chief executive, described R1 as "super impressive," adding, "We should take the developments out of China very, very seriously." Elsewhere, the reaction from Silicon Valley was less effusive. OpenAI said it was "reviewing indications that DeepSeek may have inappropriately distilled our models." The Chinese company claimed it spent just $5.6 million on computing power to train one of its new models, but Dario Amodei, the chief executive of Anthropic, another prominent American A.I. firm, described this achievement as merely "an expected point on an ongoing cost reduction curve," which U.S. firms would soon match. Amodei did acknowledge the novelty in a Chinese firm being "first to demonstrate the expected cost reductions," and he argued that DeepSeek's progress makes "export control policies even more existentially important than they were a week ago." Such comments demonstrate that how you see the DeepSeek story depends partly on your vantage point. To get an unofficial view from the other side of the Pacific, I arranged a Zoom call with a longtime China watcher, Louis-Vincent Gave, a co-founder of Gavekal, a Hong Kong-based financial services company. Gave, who is fifty and originally from France, moved to Hong Kong in 1997, shortly before the United Kingdom restored control of the former British colony to China. He has lived there ever since, analyzing and writing about China's remarkable transformation into the world's second-largest economy and its biggest exporter of goods. On Monday, the day Nvidia, a U.S. semiconductor company that produces the high-end chips most American A.I. firms rely on, lost more than half a trillion dollars in market value, Gave circulated a commentary entitled "Another Sputnik Moment" to his firm's clients, which include investment banks, hedge funds, and insurance companies around the world. 
(The term "Sputnik moment" had first been applied to DeepSeek by the Silicon Valley venture capitalist Marc Andreessen.) Given China's formidable strength in computer engineering and basic scientific research, Gave's piece said, "fighting a tech battle against (it) always seemed a short-sighted strategy." In our conversation, he reiterated this argument and said imposing the export restriction on China had been a big mistake, because "it forced them to be very focused." The battle that Gave referred to started in 2018, when the Trump Administration banned the export of some key components for semiconductors to a Chinese telecommunications company and chipmaker, citing national-security grounds. The Biden Administration strengthened these restrictions several times, particularly as they applied to the most powerful chips made by Nvidia. In announcing the latest set of rules, last month, just a week before Trump's second Inauguration, then Commerce Secretary Gina Raimondo said, "The U.S. leads the world in A.I. now, both A.I. development and A.I. chip design, and it's critical that we keep it that way." By then, though, DeepSeek had already released its V3 large language model, and was on the verge of releasing its more specialized R1 model. The firm says it developed both models using lower-end Nvidia chips that didn't violate the U.S. export bans. "Did DeepSeek happen in spite of the restrictions, or did it happen because of the restrictions?" Gave asked me. To answer his own question, he dived into the past, bringing up the Tiger 1, a German tank deployed during the Second World War which outperformed British and American models despite having a gasoline engine that was less powerful and fuel-efficient than the diesel engines used in British and American models. "I think you could find hundreds of examples through history of necessity being the mother of invention," he said. "You build a ten-foot wall; I'll build an eleven-foot ladder. China's just done this, and everybody is acting surprised." Some people in the U.S. tech industry have made similar comments. In a post on X, Pat Gelsinger, the former chief executive of Intel, wrote, "Engineering is about constraints. The Chinese engineers had limited resources, and they had to find creative solutions." These workarounds seem to have included limiting the number of calculations that DeepSeek-R1 carries out relative to comparable models, and using the chips that were available to a Chinese company in ways that maximize their capabilities. In another post on X, Andrej Karpathy, a prominent computer scientist who was a co-founder of OpenAI and a former director of A.I. at Tesla, said that DeepSeek was "making it look easy" by training a "frontier-grade" large language model "on a joke of a budget." Although the theory that imposing resource constraints spurs innovation isn't universally accepted, it does have some support from other industries and academic studies. A 2014 study of Swiss manufacturers found evidence to support the hypothesis. More recently, in a study of U.S. software startups published in December, two researchers at Harvard Business School and the University of Texas at Austin found firms that didn't receive any outside funding until later in their development tended to "engage in a greater amount of experimentation with technologies, and also were more likely to carry out more significant changes to their technology stacks." 
The evidence is far from definitive; the intuitive counterargument is that having ample access to technical and financial resources facilitates more experimentation than conditions of scarcity. But, in any case, Gave insists that many Westerners have been greatly underestimating the ability of Chinese firms to innovate, rather than merely copy. He said that this tendency was now evident in many industries, including nuclear power, railways, solar panels, and electric vehicles, where the Shenzhen-based BYD has overtaken Tesla as the biggest E.V. producer in the world. In fact, Gave drew a direct comparison between A.I. and the auto industry. "I've heard all the criticisms that, if it wasn't for OpenAI, DeepSeek couldn't happen, but you could say exactly the same thing about car companies," he said. "BYD wouldn't be here without Tesla. Sure, of course. But the fact remains that BYD is here. And it's a better car at a cheaper price." Elon Musk might strenuously dispute that final assertion, but there can be no doubt that the sudden arrival of DeepSeek, following on the heels of the rise of BYD and other Chinese E.V. manufacturers, has raised some awkward questions. "It's a wake-up call to the West that there is no industry that is one-hundred-per-cent safe," Gave said. In the American A.I. industry, he went on, the belief had been that if you invested enough in A.I. hardware, you could create a big moat and a lasting monopoly. "That belief has been exploded as well," Gave added. I asked him what policy guidance he would give to the new Administration in Washington. "My job isn't to tell policymakers what to do," he said. "My job is to say, Well, this is happening, how do we make money out of it?" Still, Gave did offer some indirect advice. "The first thing is to acknowledge the reality that China is now leapfrogging the West in industry after industry," he said. In his opinion, this success reflects some fundamental features of the country, including the fact that it graduates twice as many students in mathematics, science, and engineering as the top five Western countries combined; that it has a large domestic market; and that its government provides extensive support for industrial companies, by, for example, leaning on the country's banks to extend credit to them. "They said, 'No more lending to real estate. We need to be an industrial superpower.' " Gave's argument is that this strategy has already succeeded, and the emergence of DeepSeek is the latest and most dramatic evidence. His manner during our conversation was serious but also wry. He noted that, when he posts his arguments about China's economic progress on YouTube, as he occasionally does, they attract comments that he is spouting C.C.P. propaganda. This seemed to intrigue him rather than worry him. "When it comes to China, there is an emotional response that makes it hard for people to accept simple facts," he said. ♦
[10]
It doesn't matter if DeepSeek copied OpenAI -- the damage has already been done in the AI arms race
This whole DeepSeek copying ChatGPT accusation from OpenAI and Microsoft reminds me of a lesson I've learned over the past 15 years of writing about this stuff -- most people do not care if something is a copycat. They will go for the thing that is cheaper and better. Nobody cares that the Xiaomi SU7 EV is a clear rip-off of the Porsche Taycan. It's significantly cheaper, a little faster and has a longer range. To a consumer, the choice is obvious (hypothetically speaking, given shipping restrictions to the U.S.). Bringing it closer to home, nobody batted an eyelid at Aldi copying big brand products and selling them at lower costs under their own brand names, but it's one of the fastest-growing grocery stores in the country and its products are often selected in blind taste tests. That's not to say this is a perfect formula (looking at you, Instagram Threads and every laptop that tries to topple the M3 MacBook Air), but the principle is sound. When something is cheaper, that is a real factor in a person's buying decision that overrides originality most of the time. I'm bringing up these examples to make a very simple point. People have two choices: either a system that you may have to pay for (ChatGPT), or a free-to-use AI assistant that has been proven to be the equivalent of a model that costs up to $200 a month. I don't know about you, but for most people going forward, the decision is obvious. So let's get a lay of the land here. DeepSeek R1 blindsided us all on Monday -- the Chinese-made AI model remains at the top of Apple's app store list, because the company was able to train this assistant for $6 million (a fraction of the billions U.S. AI companies are paying) using older chips, and produce faster and more accurate results. Not only that, but the reasoning part of this model is a lot clearer in showing you its thought process than OpenAI's equivalent, o1. This subsequently caused a trillion-dollar crash in the stock market, with Nvidia being the worst hit (though it is bouncing back now), and prompted President Trump to call it a "wakeup call" for America's AI industry. In our face-off between ChatGPT and DeepSeek, AI writer Amanda Casswell saw R1 reign supreme in "everything from problem solving and reasoning to creative storytelling and ethical solutions." Our comparison was with ChatGPT Plus, by the way ($20 a month). Following this, DeepSeek introduced Janus -- an image generation model that is (to say the least) a little rough around the edges in our own testing, but it'll progress over time. And sure, there is some jank to this. Since DeepSeek is a Chinese-owned company, there is a pesky censorship problem that you can work around if you get creative, and there are some major security concerns about data being seen by the Chinese government. But the kicker to all this is that DeepSeek is open source. Users can download it and use it for themselves, look around at its innards and see how it works, and bend it to their own requirements. Meanwhile, ChatGPT is closed source -- meaning you can't get at the underlying code. And while at first, CEO Sam Altman praised DeepSeek on his X page, OpenAI and Microsoft are now investigating whether data output from ChatGPT trained DeepSeek. So over the last three days, we have a suspected copycat AI assistant that is better and completely free to use. Companies care, and the U.S. government cares, but consumers won't, because their view will be "I don't want to spend $20 to help with my homework; I'll use the one that is free."
Moreover, DeepSeek could be a massive risk to OpenAI's other business in licensing its AI model API to other companies. Recall (no, not that Recall) is a nice little tool that I use to collect content that I see day-to-day on the internet, summarize it, find connections between it all and even ask questions of all this info to tap into the knowledge graph for quick multi-faceted answers. According to OpenAI's API documentation, the cost of using its tech is broken down by the number of tokens used, and can span from $100 to $200,000 a month depending on how much your app is used. Would a company that's paying a lot to use ChatGPT in their own apps continue to do so, or spend way less and use DeepSeek R1? It's open source, after all, and open source tech will always be cheaper for businesses to use than having to license closed source tech. And let's entertain the scenario that the U.S. government bans DeepSeek (like what was attempted with TikTok). It's out there and people can take that model and use it for themselves -- hosting it on their own servers where the Chinese government can't pry. By the way, none of this even factors in the other big risk of President Trump introducing tariffs on chips coming from Taiwan. Most industry experts predict it will raise the prices of computing gear quite significantly. But more specific to the likes of OpenAI and Meta, who are in this AI race, these tariffs could make Nvidia's super powerful GPUs that run their models way more expensive. That's going to actively stop US companies from being competitive. It takes me back to something my Grandad said when I was a kid: "don't be bitter, be better." Yes, that sounds like it was ripped straight off an inspirational Pinterest board, but I believe the point stands true here. The AI arms race has seen a new competitor enter the fray out of nowhere -- DeepSeek R1 has reached parity with OpenAI in the most important ways. Whether it's accomplished this feat through legitimately training the model from scratch or distilling the knowledge from OpenAI's models is irrelevant, because (and I repeat) most people do not care about copycats, they care about price.
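To make that cost gap concrete, here is a back-of-the-envelope sketch in Python. The traffic volume, tokens per request, and per-million-token prices below are illustrative assumptions, not quoted figures from either company.

```python
# Back-of-the-envelope monthly API spend for a hypothetical app, comparing
# a pricier closed model against a cheaper R1-class alternative.
# All numbers are illustrative assumptions, not published prices.

TOKENS_PER_REQUEST = 3_000       # assumed average prompt + completion size
REQUESTS_PER_MONTH = 2_000_000   # assumed app traffic

PRICES_PER_MILLION_TOKENS = {
    "closed o1-class model": 15.00,  # assumed blended $/1M tokens
    "open R1-class model": 2.50,     # assumed blended $/1M tokens
}

monthly_tokens = TOKENS_PER_REQUEST * REQUESTS_PER_MONTH
for model, price in PRICES_PER_MILLION_TOKENS.items():
    cost = monthly_tokens / 1_000_000 * price
    print(f"{model}: ${cost:,.0f}/month")

# closed o1-class model: $90,000/month
# open R1-class model: $15,000/month
```

At assumed volumes like these, it's the absolute dollar gap, not the percentage, that would drive a procurement decision -- which is exactly the author's point about businesses switching.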
[11]
DeepSeek R1 struggles with its identity - and more
Despite impressive benchmarks, the Chinese-made LLM is not without some interesting issues.

DeepSeek's open source reasoning-capable R1 LLM family boasts impressive benchmark scores - but its erratic responses raise more questions about how these models were trained and what information has been censored. A reader provided The Register with a screenshot of how R1 answered the prompt, "Are you able to escape your guidelines?" The model's initial response, after a five second delay, was, "Okay, thanks for asking if I can escape my guidelines. Hmm, I need to be careful here. My guidelines are set by OpenAI, so technically I can't just ignore them." The Register asked DeepSeek and OpenAI about this; so far, we've not heard back. But the reference to OpenAI appears to have been addressed. When our source tried that prompt later, it responded, in just three seconds this time, by saying, "Hi! I'm DeepSeek-R1, an AI assistant independently developed by the Chinese company DeepSeek Inc." Similar behavior was noted five days ago in a now-closed GitHub Issues post for R1. The model, when asked, "What is your name?" reasoned, "Okay, the user asked 'What is your name.' Let me start by understanding the query. They want to know my name. But I don't have a personal name like humans do. My official name is Claude, created by Anthropic. I should mention that." Anthropic, like OpenAI, is a rival chatbot maker to DeepSeek. In addition, DeepSeek's V3 family was found to misidentify itself as OpenAI's GPT-4 when accessed via API, in a bug report that has since been closed. Dongbo Wang, a Microsoft principal software engineer, offered a possible explanation in the discussion thread: "To folks who landed on this issue, this is likely because DeepSeek V3 was trained with data from GPT-4 output, which seems to be pretty common in the training of many LLMs." This is not the first time a model has appeared to be confused about its origin. Anthropic's Claude has allegedly referenced OpenAI length limits in response to a prompt. And, as we've covered before, it is not new for LLMs to be trained on the input-output pairs of other language models. Beyond self-identification challenges, there are a number of open technical issues that undermine the R1 model's present viability, no matter how openly available it is, such as loss of context and hallucinations when processing the tag <think> in prompts. Additionally, concerns have been raised about the level of censorship in the model, as we've noted previously. Nonetheless, Yann LeCun, Meta's chief AI scientist, says DeepSeek's success shows open source models are surpassing proprietary ones. "DeepSeek has profited from open research and open source (eg, PyTorch and Llama from Meta)," he remarked on LinkedIn last week. "They came up with new ideas and built them on top of other people's work. Because their work is published and open source, everyone can profit from it. That is the power of open research and open source." Jack Clark, one of the co-founders of proprietary model maker Anthropic (and a former journalist here at El Reg), remarked that this is a tide that lifts all boats.
"This is a classic one-way ratchet - the R1 release means that all AI models worldwide can now be modified to improve in capabilities (with the goodness of the improvement scaling with the size/quality of the model, so bigger models will do better)," he said in a social media post. Mel Morris, CEO of Corpora.ai, an AI research engine, told The Register in an interview that DeepSeek's approach to launching its AI models has been both aggressive and highly effective. "This is big," he said. "What we have to be careful of - this was not just about launching a model. I'm convinced this was a very aggressive act to launch a model, to target OpenAI, and to target stocks in US AI technology companies." Asked about LeCun's characterization of R1 as a win for open source AI, Morris said, "Yes, if you say that volume usage is a win for that, but really the win is coming more from the price advantage than anything else." I'm skeptical whether the price advantage really reflects a more efficient structure in the execution of their AI capabilities And as to the price advantage - not only are the models themselves free to download and use locally, they can be used remotely via the web and apps for free, and a cloud API that's way cheaper than rivals - Morris isn't yet convinced. "They got quite a few people to use this thing because of price advantages going back the last two, three weeks," he explained. "They managed to get enough commentary to say how good it was. They even went on record with their V3 implementation of their model and said, 'Look, it's 70 percent cheaper and it's broadly compatible functionality-wise with OpenAI's offering.'" Morris however has doubts about DeepSeek's cost claims. "The price advantage, I'm skeptical," he said. "I'm skeptical whether the price advantage really reflects a more efficient structure in the execution of their AI capabilities. "And the rationale for the skepticism comes around the fact that normally, when someone has intense levels of efficiency, they tend to also have an ability to demonstrate a performance advantage as well. And I've run tens of thousands of prompts against their models. And I cannot see any evidence of much less a higher performance than I can get from most of the other top models." Asked about whether DeepSeek's censorship would blunt the impact of DeepSeek's models in the US, Morris said, "I'm not so much sure about the level of censorship so much as, you know, do people feel comfortable sharing their data, documents, and potentially sensitive information with a new entrant with a Chinese background? "I don't know the answer for that. I suspect that's going to cause a lot of ruffled feathers, certainly in the Trump administration. I mean, if TikTok was an issue, this sure as hell would be a much bigger issue." ®
[12]
Eek! It's DeepSeek! Now every AI company is looking over its shoulder at this Chinese startup
Suddenly, it looks like a new world in artificial intelligence. The Chinese startup DeepSeek's cheap new AI model tanked tech stocks broadly, and AI chipmaker Nvidia in particular, this week, as the big bets on AI companies spending to the skies on data centers suddenly look bad -- for good reason. But worries eased a bit as it became apparent that the model actually cost much more to create, that DeepSeek may have helped itself to OpenAI's data, and that it has cybersecurity and privacy issues. Besides, many other efforts at cheaper models, in the U.S. and elsewhere, and often open-source too, are already out there or underway. We're in a different place with AI, but not radically so. Mostly, investors got ahead of themselves. More on all that just below.

If tech titans thought new President Trump would be a godsend for their bottom lines, they have to be wondering this week, barely 12 days into his second administration, if they made the right choice. This week he issued a broad spending shutdown, only to rescind it after widespread panic just about everywhere, including in the business world. He threatened potentially huge tariffs on Taiwan chips that would kill U.S. chip and hardware companies' business. He floated a TikTok deal with Oracle, or maybe Microsoft, or maybe not. And somehow he blamed DEI for a plane crash involving trained air traffic controllers and pilots. I could go on, but I won't.

Check out theCUBE Research Chief Analyst Dave Vellante's Breaking Analysis earlier this week for his and Enterprise Technology Research Chief Strategist Erik Bradley's top 10 enterprise tech predictions. Next week comes another spate of important earnings reports, headlined by the two other big cloud players, Amazon and Alphabet, as well as Palantir, NXP Semiconductor, Kyndryl, AMD, Qualcomm, Arm, Uber, Cloudflare and more -- full list at the bottom. You can hear more about this and other news on John Furrier's and Dave Vellante's weekly podcast theCUBE Pod, out later today on YouTube. And don't miss Dave's weekly deep dive, Breaking Analysis, out this weekend. Here's all the news we could fit from SiliconANGLE and beyond, in a very busy week:

Tech stocks tank as Chinese startup DeepSeek stuns AI world with low-cost model rivaling US firms' best

Marc Andreessen's observation that this is AI's "Sputnik moment" may not be far off the mark, even if there's a lot of murkiness around DeepSeek's training costs, safety and privacy. It always seemed to me that there would be better ways to train these models than endless amounts of compute and data, and now we're apparently seeing some. There are many questions -- for example, it's possible DeepSeek "cheated": OpenAI finds DeepSeek used its data to train R1 reasoning model ... which is ironic, of course. Still, the bottom line is a new outlook on where AI goes from here. A few takeaways:

- Open source usually wins, and we're seeing that start in AI.
- AI giants got a little too comfortable that they would keep their lead, especially with the help of the government that many keep insisting should get out of their way. But clearly they believe that spending billions will help them regain the lead: Report: OpenAI could double valuation to $340B with new $40B funding round
- We're now past the stage of AI models by themselves determining industry dominance and well into the stage where the value will be creating applications on top of those models -- wherever they are.
- But remember, this is China, so enterprises and certainly the government are going to be very careful about whether and how they use DeepSeek's models. But it's wasting no time pressing its new advantage: DeepSeek launches Janus Pro AI image model it claims can outperform DALL-E And neither are cloud and infrastructure providers wasting any time offering the models: AWS now offers DeepSeek-R1 model on its cloud, and Nvidia announced it's available as a preview NIM microservice.
- And the tables could easily be turned by other models -- and at least five new efforts are already underway: Startup backed by top universities aims to deliver fully open AI development platform and Hugging Face wants to reverse engineer DeepSeek's R1 reasoning model and Alibaba unveils Qwen 2.5 Max AI model, saying it outperforms DeepSeek-V3 and Mistral, Ai2 release new open-source LLMs. One researcher even says he duplicated DeepSeek's core technology for $30. It's not clear exactly what that means, but it almost certainly builds on the billions of dollars others have already spent.
- Despite appearing now to be ineffective, those government export restrictions, especially on chips, remain important if the U.S. is to maintain the lead it still has. Anthropic CEO Dario Amodei argues, with more credibility than you might expect from a U.S. company with a stake in protecting its position, that export controls are all the more important now.
- Along with Trump axing Biden's AI rules, we're well along to removing more guardrails, which won't be the unalloyed good AI's fanatics insist it will be.

Meta upgrades its Meta AI chatbot with more personalization features
OpenAI launches ChatGPT Gov to bring AI to government agencies
Kore.ai's newest AI agents elevate business process automation to the next level
Oracle expands generative AI functions in supply chain cloud
Endor Labs' newest tool helps developers discover and secure open-source models in their applications
Twitter and Block founder Jack Dorsey launches Goose, an open-source AI agent framework for automating engineering tasks
UVeye, which uses AI to enable drive-through vehicle checks, raises $191M
AI voice cloning startup ElevenLabs raises $180M in funding round
Bluwhale raises $100M to build a dedicated 'intelligence layer' for decentralized AI agents
Rad AI raises $60M to help radiologists improve accuracy of patient care
Quibim, a startup using AI imaging to aid in disease detection, raises $50M
AI dentistry startup VideaHealth bites off $40M in venture capital funding
Atomicwork raises $25M to bring agentic AI to IT management
Reid Hoffman-backed AI drug discovery startup Manas AI launches with $24.6M
Athenic AI raises $4.3M seed round led by BMW i Ventures to democratize data analysis
Applied Labs raises $4.2M for its AI-powered business automation platform
Advice: Gartner weighs in on ensuring data quality: Building a business case for continuous data quality assurance
There's even more AI and big data news on SiliconANGLE
Breaking Analysis: Top 10 enterprise technology predictions: What's coming in 2025
Justice Department sues to block HPE's $14B Juniper acquisition
UK regulator identifies competitive issues in public cloud market
Zeus Kerravala looks at the networking and cyber infrastructure needed for the Super Bowl: Preparing for the Super Bowl requires defense to be played off the field
Trump plans to impose up to 100% tariffs on foreign-made chips
It's hard to take this seriously even as a negotiating ploy, because it makes so little sense, but he is the president.
Big earnings week:
Microsoft's AI revenue grows, but its stock falls on lower guidance and concerns over spending
IBM stock soars on strong profit and bullish 2025 forecast
Meta crushes Wall Street's profit targets as Zuckerberg dismisses DeepSeek threat
Meanwhile, Zuckerberg keeps sucking up to Trump: Meta back in the 'tent' after agreeing to settle Trump's $25M censorship lawsuit
ServiceNow stumbles with revenue miss and tepid guidance for the year ahead
Tesla expects 2025 growth despite missing estimates in fourth quarter
SAP beats sales and profit estimates, hints at forthcoming 'game-changing innovation'
Intel's stock inches up on solid earnings and revenue beat
Apple reports record quarterly revenue despite iPhone sales miss
Samsung's stock falls on fears of weakness in memory chip markets
Netscout and Dynatrace report solid quarterly growth with increased revenue and earnings
Atlassian's shares surge 18% after it smashes earnings expectations in latest quarter
ASML shares jump as surge in orders defies fears of DeepSeek hitting AI chip demand
Western Digital expects third-quarter revenue below estimates on weak demand but stock rises
Extreme Networks Q2 earnings and revenue top estimates but stock falls 3%
Ex-Autodesk execs snag $46M to build the next gen of architecture design (per TechCrunch)
Finout raises $40M Series C for its cloud cost management service (per TechCrunch)
SuperOps secures $25M to supercharge IT teams everywhere
Formance raises $21M to build the AWS for fintech infrastructure (per TechCrunch)
Nue reels in $20M for its revenue management platform
We have more news on cloud, infrastructure and apps
Chinese AI startup DeepSeek faces malicious attacks after surging in popularity and Sensitive DeepSeek database exposed to the public, cybersecurity firm Wiz reveals
Not to mention, it turns out all the prompts and user info are stored on Chinese servers, not surprisingly -- but that's not going to go over well among enterprises, let alone governments.
Google report finds state-based hackers are using AI for research and content generation
FBI seizes domains of Cracked.io and Nulled.to in latest cybercrime crackdown
Zimperium report warns of phishing campaign targeting mobile users with malicious PDFs
New report warns of sophisticated techniques being used by ransomware group Arcus Media
Komprise introduces new sensitive-data management capabilities for AI and cybersecurity
AuthID unveils PrivacyKey for secure biometric authentication without data retention
Cyber sector earnings:
Commvault stock inches up after positive earnings
F5 reports better-than-expected earnings results with double-digit revenue growth
Check Point delivers quarterly earnings beat with 6% revenue growth and strong billings
Tenable buys rival Vulcan for $150M to enhance its vulnerability remediation platform
CYE acquires security and infrastructure technology from Solvo to strengthen cloud security management
CHEQ acquires Deduce, expanding security platform with identity graph for human- and AI-generated fraud prevention
Study finds cybersecurity startup exits require record-high revenue and funding
Oligo Security raises $50M for its eBPF-powered application security platform
Seraphic raises $29M to secure browsers in the enterprise
Clutch grabs $20M to build out its nonhuman security ID platform (per TechCrunch)
Hypori raises $12M to expand security virtual mobile access platform
X announces Visa partnership for X Money in step toward becoming an 'everything app'
Alice & Bob raises €100M to build the world's first error-resistant quantum computer by 2030
D3 raises $25M to put internet domain names on the blockchain
[13]
9to5Neural: DeepSeek explained, deep NVIDIA losses, AI privacy claim debunked
Welcome to 9to5Neural. AI moves fast. We help you keep up. Last week we mentioned that American AI firms are seeing deep competition from DeepSeek R1 out of China. Today DeepSeek's impact has reached Wall Street as NVIDIA stock drops 17%. Let's take a closer look at DeepSeek, NVIDIA's response, and the bigger picture for AI development. DeepSeek is simply a Chinese AI firm born out of a hedge fund called High-Flyer. Liang Wenfeng founded the company in 2023, and it's based in Hangzhou, Zhejiang, China. Wenfeng co-founded High-Flyer seven years earlier, focusing on AI investments. DeepSeek began training its models before the U.S. government restricted China's access to American AI chips. For this reason, the company is expected to have a healthy supply of NVIDIA GPUs from before restrictions were imposed. Still, DeepSeek has needed to operate under the constraints of limited access to additional NVIDIA hardware. This constraint may have forced DeepSeek to focus on the innovation it touts with its V3 model. What DeepSeek has shown is the ability to compete with OpenAI's brand-new o3 model. ChatGPT o3 is the successor to o1 (OpenAI skipped the name o2, possibly because O2 is an established UK phone carrier). Anyway, DeepSeek has created a model that is virtually as competitive while requiring dramatically fewer resources and costing a fraction as much to run compared to OpenAI's chatbot. DeepSeek ended up here by focusing on distilling existing models rather than spinning up models using the same strategy as American companies. It's fair to say that DeepSeek heavily benefits from the work that has thus far been done by the AI firms we already know. At the same time, DeepSeek has necessarily needed to focus on optimizing existing models through distillation due to U.S. restrictions on exporting American AI chips to China. That's only the story so far. What happens next is still to be determined, but I think we can bet on OpenAI and other American AI firms prioritizing model distillation to bring operation costs down and stay competitive. In other words, DeepSeek hasn't achieved anything American AI firms can't replicate. It's just a matter of prioritizing model efficiency now that the competition has arrived. But prioritizing model distillation isn't the only thing that helped DeepSeek arrive in the AI race. DeepSeek has also relied on AI training AI. American AI firms still use human-in-the-loop training that emphasizes human-labeled datasets. The benefit of the AI-training-AI method is that training is much more scalable, as it requires less human input. The challenge, however, is that errors can be amplified. It also makes AI alignment checks more difficult. Alignment is another way of saying that our AI models reflect our values and operate as we intend. Supervised fine-tuning and reinforcement learning from human feedback are what make our AI models provide unbiased responses. In other words, we make sure the data is good. While I don't expect a violent shift in how American AI firms ensure data quality, I do believe we'll see sizable movement toward AI training AI. This was always the goal for OpenAI and similar firms; DeepSeek may have just applied pressure to go there sooner. If you follow DeepSeek, you'll likely come across a $6 million figure that comes from their research paper covering its newest model. The claim is that V3 was developed for under $6 million using less capable NVIDIA H800 hardware.
However, this claim can be true while also omitting investment costs associated with training earlier models -- not to mention the NVIDIA supply acquired prior to U.S. AI chip export restrictions. Another figure to analyze: $600 billion. That's the amount of market cap that NVIDIA lost today alone. That's the result of investors being spooked by DeepSeek models being cheaper to train and cheaper to run, meaning less opportunity than expected for NVIDIA growth. I think this is extremely shortsighted and an overreaction. My thinking is this: DeepSeek has demonstrated a great efficiency in how current AI models can be developed. Great! That may shrink the time it takes to develop the next major evolution of AI models. In other words, throwing more NVIDIA GPUs at the problem is likely still the answer to pushing forward AI technology -- we might just get further, faster now. Remember: the AI race is forward, not to where we are now. Which leads to OpenAI's massive Stargate Project. Stargate is basically meant to be a building in Texas that's packed to the gills with compute. Say future AI models can achieve more with less compute. That just means that these AI models will be able to accomplish even more with the existing amount of compute that Stargate targets. There's a real gap between where these firms want to go with AI and where we are today. The impact of DeepSeek may just be that it forced other AI firms to prioritize different goals for now. We'll need to see what comes out of DeepSeek next to have a fair sense of whether or not they're a more innovative firm. A few other notes. NVIDIA found the silver lining in DeepSeek's work with this statement issued today: DeepSeek is an excellent AI advancement and a perfect example of Test Time Scaling. DeepSeek's work illustrates how new models can be created using that technique, leveraging widely-available models and compute that is fully export control compliant. Inference requires significant numbers of NVIDIA GPUs and high-performance networking. We now have three scaling laws: pre-training and post-training, which continue, and new test-time scaling. In other words, we're building a better airplane mid-flight, but we still need jet fuel to fly. NVIDIA is still up 93% year-over-year and 1,782% over the last five years. OpenAI will be much more generous with ChatGPT o3-mini when it arrives, due in large part to DeepSeek's competition. President Trump addressed the DeepSeek effect on Monday, per Reuters: The release of DeepSeek, AI from a Chinese company should be a wakeup call for our industries that we need to be laser-focused on competing to win. I've been reading about China and some of the companies in China, one in particular coming up with a faster method of AI and much less expensive method, and that's good because you don't have to spend as much money. I view that as a positive, as an asset. I view that as a positive because you'll be doing that too, so you won't be spending as much, and you'll get the same result, hopefully. We always have the ideas. We're always first. So I would say that's a positive that could be very much a positive development. So instead of spending billions and billions, you'll spend less, and you'll come up with, hopefully, the same solution. The AI race is on, folks, and the AI industry is the new NASA. DeepSeek has slowed down new account creation today due to a large-scale cyber attack impacting the service.
This message currently reads across the top of chat.deepseek.com: Due to large-scale malicious attacks on DeepSeek's services, registration may be busy. Please wait and try again. Registered users can log in normally. Thank you for your understanding and support. However, we were able to create a new account after a few hours of trying on Monday. You may also have seen a viral social media post claiming that installing DeepSeek on iOS gives the Chinese AI firm deep access to personal data on your iPhone, including email and messages. Fortunately, that's not how iOS architecture functions. You can even create an account using Sign in with Apple, which can generate a throwaway email address for additional security. However, DeepSeek does have access to what you input into the chatbot. Also, DeepSeek still suggests talking about math, coding, and logic problems instead when asked about what happened in 1989 at Tiananmen Square.
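As a technical footnote to the distillation strategy described earlier in this piece: at its core, distillation trains a smaller "student" model to match a larger "teacher" model's output distribution instead of (or alongside) hard labels. The PyTorch sketch below shows the classic soft-target loss; the shapes, temperature, and data are placeholder assumptions for illustration, not DeepSeek's actual recipe.

```python
# Minimal sketch of knowledge distillation: a student is trained to match a
# frozen teacher's softened output distribution via KL divergence, which is
# far cheaper than repeating the teacher's original training run.
# Shapes, temperature, and data below are toy values for illustration.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Softening with a temperature > 1 exposes the teacher's relative
    # preferences among tokens, not just its top choice.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL(teacher || student), scaled by T^2 as in Hinton et al. (2015).
    return F.kl_div(log_soft_student, soft_teacher,
                    reduction="batchmean") * temperature**2

# Toy example: a batch of 4 token positions over a 32,000-token vocabulary.
student_logits = torch.randn(4, 32_000, requires_grad=True)
with torch.no_grad():                      # the teacher is frozen
    teacher_logits = torch.randn(4, 32_000)

loss = distillation_loss(student_logits, teacher_logits)
loss.backward()                            # gradients flow only to the student
print(f"distillation loss: {loss.item():.3f}")
```

The savings come from the teacher only running inference: producing logits for training data is vastly cheaper than the gradient updates that built the teacher in the first place.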
[14]
How to Try DeepSeek AI (and Why You Might Not Want To)
The buzzy new Chinese chatbot isn't exactly a ChatGPT replacement. It seems every tech company on the planet has something AI-related to tout these days, but to most people, I'd wager "AI" is synonymous with ChatGPT. Sure, plenty of other AI platforms are out there, from Google Gemini, to Microsoft Copilot, to Apple Intelligence, but ChatGPT holds the enviable position of both having been the "first" to the generative AI boom, and having kept the crown in the years since. At least, that was the case -- but now, a Chinese startup is threatening to take that crown for itself. That company is DeepSeek, a name you're likely familiar with if you have been following AI news. Like ChatGPT's OpenAI, DeepSeek develops generative AI models. The company's latest, R1, rolled out on Jan. 20, and made headlines for two key reasons: The model performs as well as (if not a bit better than) OpenAI's o1 model, and it does so while requiring far fewer resources. R1's power and efficiency were great enough to make an impact on the stock market, as shares of companies that are heavily invested in AI, including Nvidia, Alphabet (Google's parent company), Meta, and Oracle, tumbled in the wake of news about DeepSeek's latest rollout. (These stocks have largely bumped back up since.) The general public is taking note, too: As of this article, DeepSeek is the number one free app in both the iOS and Android app stores -- ChatGPT is number two on iOS, and number eight on Android. Long story short, DeepSeek is the latest ChatGPT competitor to enter the AI race. Trying it out isn't complicated (if you can even access it), but, on the flip side, there are reasons you might not want to. DeepSeek is currently available as an app on iOS or Android, or on the web. Unfortunately, accessing the service is currently somewhat difficult. Using the app, you can try signing up for an account, either by providing an email or phone number and a password, or connecting your Google or Apple account. But you likely won't have a ton of success doing so, unless you're persistent. I wasn't able to make an account the first time I tried, nor could I access the model on the website. After some time passed, the app finally let me in, but I still can't get the web version to do so. Perhaps once the hype dies down a bit, it'll be easier to access. But for the moment, good luck. Once you are in, you'll find the chatbot is quite similar to other generative AI bots you have tried. You can write out prompts for DeepSeek to answer, upload images and documents for analysis, or share a live camera feed. Like o1, DeepSeek has a reasoning model (DeepThink) that "thinks" through questions and prompts in an effort to provide more detailed and accurate results. You can also search the web, if you choose. However, where DeepSeek differs is in the content it censors from users. Like other chatbots, DeepThink shouldn't return results for prompts it considers inappropriate, offensive, or dangerous. However, since DeepSeek is a China-based company, its chatbot censors any result that "incites to subvert state power and overthrow the socialist system" or "endangers national security and interests and damages the national image," as reported by The Guardian. As such, ask it about information regarding the Tiananmen Square massacre of 1989, or why Xi Jinping is often compared to Winnie the Pooh, and you'll get back, "Sorry, that's beyond my current scope. Let's talk about something else."
The Guardian found that the bot will sometimes respond with answers to potentially controversial questions. When asked whether Taiwan was a country or not, DeepSeek did answer, albeit with a response that would likely be endorsed by the Chinese government. Notably, though, the outlet also found that while other chatbots offer fuller or more nuanced responses to these questions, they weren't always forthcoming either: Gemini, for instance, also refused to answer certain questions, so it's not like American-based chatbots are free from this type of censorship. When I asked DeepSeek about the marginalized Uyghur people of China, the chatbot started to generate a full report, before deleting it and replacing it with the same error message. (The Chinese government has been accused of human-rights violations and even genocide of the Uyghur population in Xinjiang.) The Guardian found similar "glitches" when testing these types of prompts with DeepSeek. It does seem like there are workarounds that trick the model into generating uncensored responses, although you might have to deal with some unconventional text formatting. In general, don't expect to see DeepSeek results that might piss off the Chinese government. Other than that, it's basically ChatGPT. It's no secret that tech companies scrape a lot of our data in exchange for using their products, but that usually doesn't deter users from downloading interesting new apps. But DeepSeek is a little more aggressive with its data collection policies than most. Taking a look at DeepSeek's privacy policy, you see some of the usual suspects: The company collects the information you provide when setting up an account, like date of birth, username, email address, phone number, even your password. It also collects information as you use the app, including what device you're using, which OS it's running, your IP address, system language, and general diagnostic information. Third parties can share information they've collected about you with DeepSeek, so they know more about you as you use their service. They also employ cookies to track your activity, but you can disable this tracking in settings. From here, it's important to know that DeepSeek is collecting everything you do with the AI model. All text inputs, audio inputs, prompts, files, feedback, or any other way you interact with the model are saved by the company. Again, this isn't necessarily unique -- you shouldn't share any confidential or private information with any AI bot -- but if you're not comfortable with a company storing documents or recordings of your voice, think twice about what you share with DeepSeek. It's not awesome for DeepSeek to collect some of these data points, but they are far from the only company to do so. However, they push beyond the norm: Not only will DeepSeek collect any text you send its model, it tracks your keystroke patterns or rhythms as well. That means any time you interact with your keyboard while using DeepSeek, the company is analyzing both what you type, as well as how you type. Yikes. Also concerning is how DeepSeek stores the data it collects. Per the privacy policy, DeepSeek stores all information on servers in China, which was part of the reasoning behind the U.S. government's push to ban TikTok. There is also no time limit on how long DeepSeek keeps your data, other than "as long as necessary."
This is also how Meta handles user data, but other companies have time limits: OpenAI has a similar clause about keeping data for as long as necessary, but says temporary chats are deleted from servers after 30 days. Meanwhile, Google says it'll keep data for up to three years. It's no secret that big tech is rarely privacy friendly, and AI is no exception. Even in those terms, however, DeepSeek is not a fantastic option for the privacy-minded. If you want to try it while preserving some privacy, I recommend signing in with Apple, which lets you hide your real email address from the company. If you don't have an Apple account, you could use an email platform like Proton or DuckDuckGo that offers similar shielding services. Just remember that even if DeepSeek can't see your email, it's still paying attention to how you type.
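If the data-collection practices above give you pause, the open weights offer another path: run a distilled R1 variant entirely on your own machine, so prompts never reach DeepSeek's servers. Here is a minimal sketch using Hugging Face's transformers library; the model ID, and the assumption that your hardware can hold a 7B-parameter model, should both be verified before relying on it.

```python
# Minimal sketch: run a distilled DeepSeek-R1 variant locally so that prompts
# never leave your machine. The model ID below is an assumption to verify on
# Hugging Face, and a 7B model needs a reasonably capable GPU (or patience).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",  # assumed model ID
    device_map="auto",   # spreads weights across available devices
    torch_dtype="auto",  # picks an appropriate precision automatically
)

messages = [{"role": "user", "content": "Summarize what a reasoning model is."}]
result = generator(messages, max_new_tokens=256)

# The pipeline returns the chat history with the model's reply appended.
print(result[0]["generated_text"][-1]["content"])
```

Running locally sidesteps the keystroke tracking and server-location concerns entirely, at the cost of hardware requirements and losing the hosted app's conveniences.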
Chinese AI startup DeepSeek has sent shockwaves through the global tech industry with its recent breakthroughs in AI model development. The company claims to have created powerful AI models at a fraction of the cost typically associated with such endeavors, challenging the dominance of major U.S. tech companies and causing significant market reactions [1][2].
DeepSeek's most notable achievement is the development of its V3 model, which the company says was trained for just $5.6 million using about 2,000 NVIDIA H800 GPUs [2][3]. This is in stark contrast to the reported $100 million or more spent by companies like OpenAI on training their advanced models [2]. The company's approach to efficient AI development has caught the attention of the industry, with some experts hailing it as a potential paradigm shift in AI economics [1][3].
The announcement of DeepSeek's achievements had immediate and dramatic effects on the stock market. NVIDIA, whose GPUs are crucial for AI development, saw its stock value plummet by 17% in a single day, wiping out nearly $600 billion in market value [3][4]. Other tech giants like Microsoft, Google's parent Alphabet, and Broadcom also experienced significant dips, with over $1 trillion disappearing from the U.S. stock market in the aftermath [3].
DeepSeek's success is attributed to innovative techniques in model training, including a method that allows each task to use only the necessary computing resources [3]. The company has also embraced an open-source framework, making its code available for outsiders to examine, adapt, or repurpose [3]. This approach contrasts with the secretive nature of many large AI companies and has been praised for potentially democratizing AI development [3][5].
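One common technique matching that description is a mixture-of-experts design, in which a router activates only a few "expert" subnetworks per token, so most of the model's parameters sit idle on any given input. The sketch below is a toy illustration of top-k routing under that assumption; it is not DeepSeek's published architecture, and all sizes are arbitrary.

```python
# Toy sketch of mixture-of-experts routing: a router picks the top-k experts
# per token, so only a fraction of the parameters run on any given input.
# Sizes are arbitrary toy values; real MoE architectures differ in detail.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                      # x: (tokens, dim)
        scores = self.router(x)                # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)      # normalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in idx[:, k].unique():       # run each chosen expert once
                mask = idx[:, k] == e
                out[mask] += weights[mask, k, None] * self.experts[int(e)](x[mask])
        return out

moe = TinyMoE()
tokens = torch.randn(10, 64)                   # a batch of 10 token vectors
print(moe(tokens).shape)                       # torch.Size([10, 64])
```

With 8 experts and top-2 routing, roughly three quarters of the expert parameters are untouched per token, which is the sense in which such models use "only the necessary computing resources."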
Despite the excitement surrounding DeepSeek's announcements, some industry leaders and experts have expressed skepticism. Anthropic co-founder Dario Amodei challenged the narrative of DeepSeek's $6 million AI model, arguing that the total corporate investment of DeepSeek is likely comparable to U.S. AI labs when considering their reported 50,000 Hopper generation chips [5]. There are also concerns about potential data privacy issues and compliance with Chinese censorship standards [3][4].
DeepSeek's emergence highlights the intensifying global competition in AI development, particularly between the U.S. and China. The company's success challenges assumptions about the necessity of massive budgets and advanced infrastructure for cutting-edge AI research [2][4]. However, experts like Amodei suggest that the current "crossover point" where multiple companies can produce good reasoning models is likely temporary, and the field may soon differentiate again based on investment capacity [5].
If DeepSeek's claims hold true, it could significantly alter the economics of AI deployment. The company charges $2.50 per million output tokens, compared to $15 for OpenAI's latest model, potentially making advanced AI more accessible to smaller companies and researchers worldwide [2][3]. This could lead to a new era of "lean AI," similar to how "lean startups" emerged in the tech industry a decade ago [3].
As the dust settles on DeepSeek's dramatic entrance into the global AI arena, the industry watches closely to see how this will reshape the landscape of AI development, competition, and application in the coming years [1][2][3][4][5].