Curated by THEOUTPOST
On Tue, 25 Mar, 8:02 AM UTC
16 Sources
[1]
DeepSeek V3 Is Now Reportedly the Best Non-Reasoning AI Model
DeepSeek V3 is now the best non-reasoning AI model on the market, besting models from OpenAI, xAI and Google, according to the firm Artificial Analysis on Tuesday. While AI companies trade blows for the top spot as new updates release, what makes DeepSeek V3's accomplishment noteworthy is that it's the first time an open weights model is leading the pack. Artificial Analysis, an AI benchmarking company, says its intelligence index measures responses in reasoning, knowledge, math and coding.

Open weights means that DeepSeek V3's model weights are publicly available for users to download, tweak and modify, though not necessarily the code or data used to train it. OpenAI, despite its name, isn't willing to give away all of ChatGPT's secrets: the company keeps a tight lid on how it trains its models and the underlying architecture of how it operates. There are merits to both the open and closed approaches to software development, and the choice between the two often comes down to business interests.

Given that V3 isn't a "reasoning" AI model, it isn't as powerful. At the same time, it's much faster, meaning it's cheaper and more feasible to run for most applications. DeepSeek V3 launched last December, and its latest update puts increased pressure on American AI companies.

DeepSeek gained prominence earlier this year when it released its R1 reasoning model for free. It was the first time a high-level reasoning model, one that iteratively went back and checked its answers before giving a final output, was widely available to the masses. DeepSeek R1 also impressed with its lower cost to run, showing that the firm had made innovations in efficiency. But reasoning AI models are better suited for research or tasks that involve massive datasets. Non-reasoning models, like GPT-4.5 and Google Gemini 2.0, are better suited for most applications, as they're faster and cheaper to operate.

DeepSeek V3's ascendance shows that Chinese AI firms can outcompete American companies while also keeping things open source. This threat is why OpenAI is pushing the Trump administration to lift restrictions on training with copyrighted material, arguing that if fair-use limitations remain, American AI companies won't be able to compete against China.
[2]
DeepSeek's V3 AI model gets a major upgrade - here's what's new
Not that it ever left, but it appears Chinese AI startup DeepSeek is back in the news -- this time with an updated version of its V3 model, first released in December. On Tuesday, the company officially announced V3-0324, named after its release month and day. A day earlier, people noticed DeepSeek had uploaded the new model to Hugging Face, but with little additional information.

Like R1 -- DeepSeek's top-performing model released in January and an OpenAI competitor -- the new version is open source (in that its weights are public, not its actual code) under an MIT license. In a post on X, DeepSeek noted that the update shows better coding skills for web development and a "major boost in reasoning performance," but it still recommends the model be used for less complex reasoning tasks. R1 remains the lab's top reasoning model, ranking in fourth place on the Chatbot Arena.

DeepSeek said the update shows improved performance over V3 on several industry-standard benchmarks, most notably the AIME (American Invitational Mathematics Examination) math benchmark, scoring nearly 20 points higher. While benchmarks have become too easy for most models, a problem known as benchmark saturation, AIME is still considered more challenging than most; in January, Scale AI and the Center for AI Safety (CAIS) released Humanity's Last Exam to combat saturation. That said, because AIME is based on high school math content, its answers are publicly available online, meaning they can be included in training data.

According to DeepSeek, other improvements include "enhanced" writing style and improved quality, especially for longer-form content. Some Reddit commenters are speculating that the release of the upgrade could foreshadow the arrival of R2, which is anticipated to be as disruptive as R1.

You can access V3-0324 now via Hugging Face or directly through DeepSeek's website and app, though you may want to consider the major security holes and user privacy concerns first. While V3 and R1 proved to be very easily and dangerously jailbroken, it's unclear as of now whether DeepSeek added any layers of security in V3-0324.
[3]
Chinese AI start-ups overhaul business models after DeepSeek's success
Chinese artificial intelligence start-ups are overhauling their business models as they fight to remain competitive following the widespread adoption of rival DeepSeek's technology across the country. Zhipu, once considered China's most prominent large language model start-up, has pinned its hopes on an initial public offering to sustain its cash-intensive growth as it focuses on building up its enterprise sales business, according to two people familiar with the matter. Among China's other leading generative AI start-ups, 01.ai has stopped "pre-training" large language models to focus on selling tailored AI business solutions using DeepSeek's models; Baichuan has opted to concentrate on the healthcare market; and Moonshot has slashed its marketing budget for its Kimi chatbot to focus on model training.

People close to these companies, which all declined or did not respond to requests for comment, said the shifts show how DeepSeek has drastically altered the shape of China's burgeoning AI industry. Since the launch of its breakthrough R1 model in late January, the Hangzhou-based start-up was quickly crowned the country's AI champion by Beijing and has seen lightning adoption of its technology everywhere from hospitals to local governments. It has left some of the country's top AI start-ups -- which over the past two years have gained significant backing from domestic investors as part of the AI boom -- to re-evaluate their existing strategies in an effort to replicate DeepSeek's success. "The Chinese LLM market is rapidly consolidating around a handful of leaders," said Wang Tiezhen, an engineer at AI research hub Hugging Face. "DeepSeek has prompted many companies to redirect resources to applications rather than foundational model development."

Beijing-based 01.ai, founded by venture capitalist and former head of Google China Kai-Fu Lee, has pivoted its business in what he has called "the DeepSeek age". The group, which has launched a series of open-source models called Yi, stopped pre-training -- in which developers use massive data sets to train models -- in late 2024 because of rising costs as its rivals trained ever larger and more powerful models. In a deal with Alibaba, its foundational model team was transferred to the internet giant, according to people familiar with the matter. Last week, 01.ai announced it would sell tailored AI solutions to companies wanting to deploy DeepSeek's models. 01.ai is pitching its expertise in the so-called "mixture of experts" (MoE) method, also used by DeepSeek to train its models, as its competitive advantage. Rather than training one "dense model" on a vast database scraped from the internet and other sources, the approach combines many smaller "expert" sub-models, only a subset of which is activated for any given input. The MoE structure allows chip-poor companies to train larger models on less computing power, but it can be more challenging for third-party developers to deploy.

DeepSeek, which has decided to focus on research rather than seeking to maximise revenues by selling applications to companies, has left a gap to be filled by intermediaries like 01.ai. Internet giant Baidu has also pivoted to offer the same service in recent weeks. Moonshot courted attention last year for its viral AI chatbot Kimi, but its popularity has suffered following frequent outages and rivals launching competitive products.
In recent weeks, the start-up has cut marketing spending for Kimi as it increases its focus on model training to replicate the breakout success of DeepSeek and improve its chatbot's performance, according to two people familiar with the matter. But as Kimi is overtaken by other apps, Moonshot faces an uncertain future as it burns through cash on model training without stable revenue. The start-up has sought to make money by inviting users to send virtual gifts to "Kimi", the AI character behind the chatbot. It raised more than $1.3bn in financing through two investment rounds last year, with a mixture of computing credits from Chinese tech giant Alibaba and cash from venture capital firms, according to people familiar with the deals. In early 2024, Alibaba considered Moonshot a potential acquisition target and secured the first right to buy the start-up in any future sale as part of its $800mn investment, the people said. In recent months, Alibaba has reined in start-up investments after founder Jack Ma directed chief executive Eddie Wu to focus instead on internal AI efforts. The shift makes it less likely that Alibaba will seek to acquire Moonshot in the future, the people added.

Beijing-based start-up Baichuan has doubled down on its healthcare business after previously working on consumer-facing AI chatbots and enterprise pitches to educational, financial and healthcare companies. In February, Baichuan dismissed the sales team focused on selling its tailored financial AI application to banks and investment funds and ended the business line, said two people familiar with the matter. At the time, the company leadership announced to employees that it was focusing on developing its technology for hospitals, which includes an AI doctor that assists with diagnosis.

In contrast, Zhipu, founded by Tang Jie, a prominent computer scientist from Tsinghua University, is still pursuing multiple business lines. It has launched several consumer applications as well as an enterprise business selling personalised AI applications to local governments and companies, a notoriously competitive and low-margin business in China. The start-up has been burning through cash as it builds its enterprise sales business. In 2024, Zhipu made Rmb300mn ($41mn) in sales and Rmb2bn in losses, according to three investors briefed on the figures. The ballooning costs have prompted concern among some investors after DeepSeek demonstrated a pathway to building cutting-edge models on a smaller budget. In contrast to DeepSeek's small workforce of about 160 employees, Zhipu employs about 800 people, making it China's largest LLM start-up by headcount.

Zhipu is hoping for a cash boost after receiving one of Beijing's coveted recommendation letters for an IPO, according to two people familiar with the matter. The company needs approval from regulators before it can pursue a listing on the tech-focused Star Innovation Board, and it received Beijing's nod before DeepSeek altered the competitive landscape of AI players in China. Zhipu previously told investors it was aiming to list before the end of the year, said two people with knowledge of the matter, but they added that the DeepSeek developments could complicate the listing if the company pushes ahead. Investors in the company have also expressed concern that the government's embrace of DeepSeek could threaten Zhipu's business model of selling tailored AI solutions to local governments, according to two people familiar with the matter.
But DeepSeek has shaken up the AI race in China, leading some rivals to decide whether to challenge the group directly or adopt its open-source models to focus on a smaller potential market. "By adopting top-tier models, companies can eliminate the need to invest tens of millions of dollars annually in training inferior in-house alternatives," said Hugging Face's Wang.
[4]
China's DeepSeek releases AI model upgrade, intensifies rivalry with OpenAI
BEIJING, March 25 (Reuters) - Chinese artificial intelligence startup DeepSeek released a major upgrade to its V3 large language model, intensifying competition with U.S. tech leaders like OpenAI and Anthropic. The new model, DeepSeek-V3-0324, was made available through AI development platform Hugging Face, marking the company's latest push to establish itself in the rapidly evolving AI market. The latest model demonstrates significant improvements in areas such as reasoning and coding capabilities compared to its predecessor, with benchmark tests showing enhanced performance across multiple technical metrics published on Hugging Face. DeepSeek has rapidly emerged as a notable player in the global AI landscape in recent months, releasing a series of models that compete with Western counterparts while offering lower operational costs. The company launched its V3 model in December, followed by the release of its R1 model in January. (Reporting by Liam Mo and Brenda Goh; Editing by Kim Coghill)
[5]
DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that's a nightmare for OpenAI
Chinese AI startup DeepSeek has quietly released a new large language model that's already sending ripples through the artificial intelligence industry -- not just for its capabilities, but for how it's being deployed. The 641-gigabyte model, dubbed DeepSeek-V3-0324, appeared on AI repository Hugging Face today with virtually no announcement, continuing the company's pattern of low-key but impactful releases. What makes this launch particularly notable is the model's MIT license -- making it freely available for commercial use -- and early reports that it can run directly on consumer-grade hardware, specifically Apple's Mac Studio with M3 Ultra chip. "The new DeepSeek-V3-0324 in 4-bit runs at > 20 tokens/second on a 512GB M3 Ultra with mlx-lm!" wrote AI researcher Awni Hannun on social media. While the $9,499 Mac Studio might stretch the definition of "consumer hardware," the ability to run such a massive model locally is a major departure from the data center requirements typically associated with state-of-the-art AI.

DeepSeek's stealth launch strategy disrupts AI market expectations

The 685-billion-parameter model arrived with no accompanying whitepaper, blog post, or marketing push -- just an empty README file and the model weights themselves. This approach contrasts sharply with the carefully orchestrated product launches typical of Western AI companies, where months of hype often precede actual releases. Early testers report significant improvements over the previous version. AI researcher Xeophon proclaimed in a post on X.com: "Tested the new DeepSeek V3 on my internal bench and it has a huge jump in all metrics on all tests. It is now the best non-reasoning model, dethroning Sonnet 3.5." This claim, if validated by broader testing, would position DeepSeek's new model above Claude Sonnet 3.5 from Anthropic, one of the most respected commercial AI systems. And unlike Sonnet, which requires a subscription, DeepSeek-V3-0324's weights are freely available for anyone to download and use.

How DeepSeek V3-0324's breakthrough architecture achieves unmatched efficiency

DeepSeek-V3-0324 employs a mixture-of-experts (MoE) architecture that fundamentally reimagines how large language models operate. Traditional models activate their entire parameter count for every task, but DeepSeek's approach activates only about 37 billion of its 685 billion parameters during specific tasks. This selective activation represents a paradigm shift in model efficiency. By activating only the most relevant "expert" parameters for each specific task, DeepSeek achieves performance comparable to much larger fully-activated models while drastically reducing computational demands. The model incorporates two additional breakthrough technologies: Multi-Head Latent Attention (MLA) and Multi-Token Prediction (MTP). MLA enhances the model's ability to maintain context across long passages of text, while MTP generates multiple tokens per step instead of the usual one-at-a-time approach. Together, these innovations boost output speed by nearly 80%. Simon Willison, a developer tools creator, noted in a blog post that a 4-bit quantized version reduces the storage footprint to 352GB, making it feasible to run on high-end consumer hardware like the Mac Studio with M3 Ultra chip. This represents a potentially significant shift in AI deployment.
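For readers who want to try the local-inference setup Hannun describes, here is a minimal sketch using the mlx-lm package on Apple silicon. It is illustrative only: the 4-bit repository id below is an assumption (community conversions appear on Hugging Face under varying names), and a machine with enough unified memory, such as the 512GB Mac Studio, is still required for a model this size.

```python
# pip install mlx-lm   (Apple-silicon Macs only; MLX runs on the unified-memory GPU)
from mlx_lm import load, generate

# Hypothetical 4-bit community conversion -- check Hugging Face for the actual repo id.
model, tokenizer = load("mlx-community/DeepSeek-V3-0324-4bit")

prompt = "Explain in one paragraph why mixture-of-experts models are cheap to run."

# verbose=True streams tokens as they are generated and reports tokens/second,
# which is how figures like "> 20 tokens/second" are typically measured.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```

On paper this is all it takes; in practice, the initial model load and memory pressure dominate the first run.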
While traditional AI infrastructure typically relies on multiple Nvidia GPUs consuming several kilowatts of power, the Mac Studio draws less than 200 watts during inference. This efficiency gap suggests the AI industry may need to rethink assumptions about infrastructure requirements for top-tier model performance.

China's open source AI revolution challenges Silicon Valley's closed garden model

DeepSeek's release strategy exemplifies a fundamental divergence in AI business philosophy between Chinese and Western companies. While U.S. leaders like OpenAI and Anthropic keep their models behind paywalls, Chinese AI companies increasingly embrace permissive open-source licensing. This approach is rapidly transforming China's AI ecosystem. The open availability of cutting-edge models creates a multiplier effect, enabling startups, researchers, and developers to build upon sophisticated AI technology without massive capital expenditure. This has accelerated China's AI capabilities at a pace that has shocked Western observers.

The business logic behind this strategy reflects market realities in China. With multiple well-funded competitors, maintaining a proprietary approach becomes increasingly difficult when competitors offer similar capabilities for free. Open-sourcing creates alternative value pathways through ecosystem leadership, API services, and enterprise solutions built atop freely available foundation models. Even established Chinese tech giants have recognized this shift. Baidu announced plans to make its Ernie 4.5 model series open-source by June, while Alibaba and Tencent have released open-source AI models with specialized capabilities. This movement stands in stark contrast to the API-centric strategy employed by Western leaders.

The open-source approach also addresses unique challenges faced by Chinese AI companies. With restrictions on access to cutting-edge Nvidia chips, Chinese firms have emphasized efficiency and optimization to achieve competitive performance with more limited computational resources. This necessity-driven innovation has now become a potential competitive advantage.

DeepSeek V3-0324: The foundation for an AI reasoning revolution

The timing and characteristics of DeepSeek-V3-0324 strongly suggest it will serve as the foundation for DeepSeek-R2, an improved reasoning-focused model expected within the next two months. This follows DeepSeek's established pattern, where its base models precede specialized reasoning models by several weeks. "This lines up with how they released V3 around Christmas followed by R1 a few weeks later. R2 is rumored for April so this could be it," noted Reddit user mxforest.

The implications of an advanced open-source reasoning model cannot be overstated. Current reasoning models like OpenAI's o1 and DeepSeek's R1 represent the cutting edge of AI capabilities, demonstrating unprecedented problem-solving abilities in domains from mathematics to coding. Making this technology freely available would democratize access to AI systems currently limited to those with substantial budgets. The potential R2 model arrives amid significant revelations about reasoning models' computational demands. Nvidia CEO Jensen Huang recently noted that DeepSeek's R1 model "consumes 100 times more compute than a non-reasoning AI," contradicting earlier industry assumptions about efficiency.
This reveals the remarkable achievement behind DeepSeek's models, which deliver competitive performance while operating under greater resource constraints than their Western counterparts. If DeepSeek-R2 follows the trajectory set by R1, it could present a direct challenge to GPT-5, OpenAI's next flagship model rumored for release in coming months. The contrast between OpenAI's closed, heavily-funded approach and DeepSeek's open, resource-efficient strategy represents two competing visions for AI's future.

How to experience DeepSeek V3-0324: A complete guide for developers and users

For those eager to experiment with DeepSeek-V3-0324, several pathways exist depending on technical needs and resources. The complete model weights are available from Hugging Face, though the 641GB size makes direct download practical only for those with substantial storage and computational resources. For most users, cloud-based options offer the most accessible entry point. OpenRouter provides free API access to the model, with a user-friendly chat interface. Simply select DeepSeek V3 0324 as the model to begin experimenting. DeepSeek's own chat interface at chat.deepseek.com has likely been updated to the new version as well, though the company hasn't explicitly confirmed this. Early users report the model is accessible through this platform with improved performance over previous versions. Developers looking to integrate the model into applications can access it through various inference providers. Hyperbolic Labs announced immediate availability as "the first inference provider serving this model on Hugging Face," while OpenRouter offers API access compatible with the OpenAI SDK, as shown in the sketch at the end of this article.

DeepSeek's new model prioritizes technical precision over conversational warmth

Early users have reported a noticeable shift in the model's communication style. While previous DeepSeek models were praised for their conversational, human-like tone, V3-0324 presents a more formal, technically-oriented persona. "Is it only me or does this version feel less human like?" asked Reddit user nother_level. "For me the thing that set apart deepseek v3 from others were the fact that it felt more like human. Like the tone the words and such it was not robotic sounding like other llm's but now with this version its like other llms sounding robotic af." Another user, AppearanceHeavy6724, added: "Yeah, it lost its aloof charm for sure, it feels too intellectual for its own good."

This personality shift likely reflects deliberate design choices by DeepSeek's engineers. The move toward a more precise, analytical communication style suggests a strategic repositioning of the model for professional and technical applications rather than casual conversation. This aligns with broader industry trends, as AI developers increasingly recognize that different use cases benefit from different interaction styles. For developers building specialized applications, this more precise communication style may actually represent an advantage, providing clearer and more consistent outputs for integration into professional workflows. However, it may limit the model's appeal for customer-facing applications where warmth and approachability are valued.

How DeepSeek's open source strategy is redrawing the global AI landscape

DeepSeek's approach to AI development and distribution represents more than a technical achievement -- it embodies a fundamentally different vision for how advanced technology should propagate through society.
By making cutting-edge AI freely available under permissive licensing, DeepSeek enables exponential innovation that closed models inherently constrain. This philosophy is rapidly closing the perceived AI gap between China and the United States. Just months ago, most analysts estimated China lagged 1-2 years behind U.S. AI capabilities. Today, that gap has narrowed dramatically to perhaps 3-6 months, with some areas approaching parity or even Chinese leadership. The parallels to Android's impact on the mobile ecosystem are striking. Google's decision to make Android freely available created a platform that ultimately achieved dominant global market share. Similarly, open-source AI models may outcompete closed systems through sheer ubiquity and the collective innovation of thousands of contributors. The implications extend beyond market competition to fundamental questions about technology access. Western AI leaders increasingly face criticism for concentrating advanced capabilities among well-resourced corporations and individuals. DeepSeek's approach distributes these capabilities more broadly, potentially accelerating global AI adoption. As DeepSeek-V3-0324 finds its way into research labs and developer workstations worldwide, the competition is no longer simply about building the most powerful AI, but about enabling the most people to build with AI. In that race, DeepSeek's quiet release speaks volumes about the future of artificial intelligence. The company that shares its technology most freely may ultimately wield the greatest influence over how AI reshapes our world.
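To make the cloud access path from the developer guide above concrete, here is a minimal sketch that calls the model through OpenRouter using the OpenAI-compatible Python SDK the article mentions. The model slug is an assumption and should be checked against OpenRouter's current model list; the API key placeholder is, of course, hypothetical.

```python
# pip install openai   (OpenRouter exposes an OpenAI-compatible endpoint)
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",  # placeholder -- use your own key
)

# "deepseek/deepseek-chat-v3-0324" is an assumed slug; verify on openrouter.ai/models.
response = client.chat.completions.create(
    model="deepseek/deepseek-chat-v3-0324",
    messages=[
        {"role": "user",
         "content": "Summarize the mixture-of-experts idea in two sentences."},
    ],
)
print(response.choices[0].message.content)
```

Because the endpoint speaks the OpenAI wire protocol, existing applications can often be pointed at the new model by changing only the base URL and model name.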
[6]
Deepseek's new AI is smarter, faster, cheaper, and a real rival to OpenAI's models
DeepSeek claims its AI models can match or beat those of American AI developers like OpenAI and Anthropic. DeepSeek dropped a major upgrade to its AI model this week, which has people buzzing almost as much as they did when the Chinese AI startup first made its splash earlier this year. The new DeepSeek-V3-0324 model is now live on Hugging Face, setting up an even starker rivalry with OpenAI and other AI developers. According to the company's tests, DeepSeek's new iteration of its V3 model boasts measurable boosts in reasoning and coding ability.

Better thinking and coding might not sound revolutionary on their own, but the pace of improvement and DeepSeek's plans make this release notable. Founded in 2023, DeepSeek has been moving fast, starting with the December release of the original V3 model. A month later, the R1 reasoning model debuted. Now comes V3-0324, named for its March 2025 release. The improvements bring the model to near-parity with flagship models from OpenAI and Anthropic. And even where DeepSeek's models aren't quite as powerful, they run a lot cheaper, according to the company. That's ultimately a huge selling point as AI use, and thus AI costs, continue to increase.

Training AI models is notoriously expensive, and OpenAI and Google have huge cloud budgets that most companies couldn't match without partnerships like OpenAI's with Microsoft. That exclusivity vanishes if DeepSeek's cheaper achievements become more common. U.S. dominance of AI models is starting to slip anyway, thanks in part to Chinese startups like DeepSeek. It no longer seems shocking when the hottest model emerges from Shenzhen or Hangzhou. Geopolitical considerations, as well as business concerns, have spurred calls to ban DeepSeek from government devices, at least in the U.S.

You probably won't see DeepSeek's latest release changing your day-to-day tomorrow, though. But it hints that the ballooning demand for computational power and energy to fuel next-generation AI might not be as staggering as feared. It also just might mean that the AI chatbot rewriting your resume or debugging your website also speaks fluent Mandarin.
[7]
DeepSeek in China-US AI competition
DeepSeek influences US-China competition and cooperation, temporarily impacting the US AI industry while also triggering stricter chip controls on China. DIGITIMES observed that the rise of DeepSeek has influenced AI development directions in both China and the US, bringing significant changes to the economy, policies, and markets. The enthusiasm surrounding DeepSeek has complicated the competitive relationship between the two nations in the AI sector, necessitating rapid adjustments to AI-related policies. Additionally, the launch of DeepSeek R1 has accelerated the development and implementation of global large language model (LLM) technology. Most in the global tech industry hold a positive view, believing that DeepSeek will drive innovation in the AI industry and influence future trends in AI applications. The Chinese startup's open-source R1 model has garnered global attention because it is priced at less than half of OpenAI's offerings while maintaining similar performance: users can access an LLM comparable to OpenAI's o1 model at a significantly lower cost.
[8]
DeepSeek-V3 is the Highest Scoring Non-Reasoning Model - 'A Milestone for Open Source'
The model outperformed all other non-reasoning models across several benchmarks but trailed behind DeepSeek-R1, OpenAI's o1, o3-mini, and other reasoning models. DeepSeek on Monday announced a new update to its general-purpose AI model DeepSeek-V3. The updated model, DeepSeek V3-0324, now ranks highest in benchmarks among all non-reasoning models. Artificial Analysis, a platform that benchmarks AI models, stated, "This is the first time an open weights model is the leading non-reasoning model, marking a milestone for open source." The model scored highest among all non-reasoning models on the platform's Intelligence Index.

In the GPQA Diamond benchmark, the model achieved a score of 66%, surpassing GPT-4o (54%) and Gemini 2.0 Pro Experimental (62%) and matching Anthropic's Claude 3.7 Sonnet (66%). This benchmark assesses AI models on complex, graduate-level science questions. Likewise, the model outperformed all other non-reasoning models across several benchmarks, though it still trails DeepSeek-R1, OpenAI's o1, o3-mini, and other reasoning models. Reasoning models consume additional time to perform a step-by-step thinking process before responding, whereas non-reasoning models prioritise speed and often respond immediately. The performance of DeepSeek V3-0324 across all popular benchmarks can be found on Artificial Analysis.

It is also rumoured on X that DeepSeek V3-0324 may be the base model for the forthcoming DeepSeek-R2 reasoning model. Recently, Reuters reported that DeepSeek plans to release R2 "as early as possible". The company initially intended to launch it in early May but is now contemplating an earlier timeline. The model is expected to produce "better coding" and to reason in languages beyond English. "This release is arguably even more impressive than R1 -- and potentially indicates that R2 is going to be another significant leap forward," added Artificial Analysis.

A few months ago, DeepSeek shook the AI ecosystem and significantly dented NVIDIA's market cap by providing state-of-the-art performance despite using a minimal number of GPUs for training. In addition to their impressive performance, models from DeepSeek are also favoured for their cost efficiency. It was recently announced that DeepSeek would provide discounts for its API platform during off-peak hours, from 16:30 to 00:30 daily. In a recent GitHub post, the company reported a theoretical daily profit margin of 545% for its inference services, despite the limitations in monetisation and discounted pricing structures. While Chinese AI models rival those from the United States, fierce competition also exists among the major players within China: big-tech companies like Alibaba, Baidu, Tencent, and ByteDance have all been regularly announcing AI models across multiple domains, each trying to outperform the others.
[9]
DeepSeek's new open-source colossus upends the AI status quo
Just two days ago, Chinese AI startup DeepSeek quietly dropped a bombshell on Hugging Face: a 685-billion-parameter large language model called DeepSeek-V3-0324. While some innovations arrive with fanfare, this release was different. No splashy press briefings. No polished blog posts. Just a massive set of model weights, an MIT license, and a few technical whispers that were enough to set the AI community ablaze. Now, as developers scramble to test it, the model has already raised alarm bells for leading Western AI companies like OpenAI -- not only for its raw power and efficiency, but for where it can run: a Mac Studio M3 Ultra. It was never supposed to be this simple to host a model of this scale. Yet early reports suggest DeepSeek-V3-0324 is operational, generating over 20 tokens per second on a single machine. For many AI insiders, that is both a tantalizing breakthrough and a serious wake-up call. Most large-scale AI releases follow a familiar script: a teaser announcement, an official paper, and a PR push. DeepSeek, however, opted for its trademark "under-the-radar" approach, quietly uploading 641 GB of data under an MIT license. The model's empty README might suggest an afterthought. In reality, it signals a deliberate, self-assured stance: "Here's our model -- do what you want, and good luck outdoing it." This modus operandi stands in stark contrast to the meticulously orchestrated product reveals in Silicon Valley. AI researchers usually expect detailed documentation, performance benchmarks, and shiny demos. DeepSeek's gambit, on the other hand, hinges on raw, open availability. Want to know how it works? Download it and see for yourself. The Mac Studio M3 Ultra may not sit in everyone's home office -- it's a $9,499 device and definitely high-end. Even so, the fact that DeepSeek-V3-0324 can run locally on this hardware is remarkable. Contemporary models of comparable size typically demand far larger GPU clusters chewing through power in dedicated data centers. This shift in computing requirements could herald a new era where advanced AI isn't strictly tethered to large corporate servers. Early tests from AI researcher Awni Hannun confirm that a 4-bit quantized version of DeepSeek-V3 can exceed 20 tokens per second on this system. That's dizzying speed for a multi-hundred-billion-parameter model. Part of the secret lies in DeepSeek's "mixture-of-experts (MoE)" architecture, which intelligently activates only a fraction of its total parameters for any given task. Critics once dismissed MoE as too specialized; DeepSeek's success suggests it might just be the most efficient path for massive-scale AI. Bigger is not always better, but DeepSeek-V3-0324 is both: enormous in scope and surprisingly nimble. A well-known researcher, Xeophon, posted their initial tests indicating "a huge jump in all metrics" compared to the previous version of DeepSeek. The claim that it has dethroned Claude Sonnet 3.5 by Anthropic -- until recently considered an elite commercial system -- is turning heads. If verified, DeepSeek could stand near the summit of AI language modeling. The difference in distribution models is just as noteworthy. Claude Sonnet, like many Western systems, generally requires a paid subscription for its best offerings. By contrast, DeepSeek's brand-new 0324 release is free to download under MIT terms. Developers everywhere can experiment without handing over credit cards or running into usage limits -- a starkly different approach that highlights the shifting center of gravity in AI. 
Beyond its MoE architecture, DeepSeek-V3-0324 incorporates two major technical leaps: Multi-Head Latent Attention (MLA), which helps the model maintain context across long passages of text, and Multi-Token Prediction (MTP), which generates several tokens per step instead of one at a time. In practical terms, these optimizations slash the time it takes to process or generate text. Because DeepSeek doesn't engage all 685 billion parameters for every request, it can be more efficient than smaller but fully activated models. Simon Willison, a respected figure in developer tools, reported that a 4-bit version of DeepSeek-V3-0324 dips to around 352 GB. This smaller size makes it relatively feasible for specialized workstations and some high-end personal systems.

DeepSeek's success can't be divorced from the bigger conversation around Chinese AI companies embracing open-source licensing. While industry mainstays like OpenAI and Anthropic keep proprietary reins on their models, firms such as Baidu, Alibaba, and Tencent have joined DeepSeek in releasing advanced models under permissive terms. The result is an AI ecosystem defined by shared progress rather than guarded, walled-off technology. This strategy dovetails with China's quest for AI leadership. Hardware restrictions and limited access to the latest Nvidia chips forced these companies to innovate. The outcome? Models like DeepSeek-V3-0324 are engineered to excel even without top-tier GPU clusters. Now that these efficient models are freely circulating, developers worldwide are seizing the opportunity to build at a fraction of the usual cost.

DeepSeek appears to be working in phases: it unveils a foundational model, then follows up with a "reasoning" version. The rumored DeepSeek-R2 could debut in the next month or two, echoing the pattern set by V3's December release, followed by an R1 model that specialized in more advanced problem-solving. Should R2 outperform OpenAI's much-anticipated GPT-5, it will further tilt the scales toward open-source AI's future dominance. Many industry veterans assumed only big, resource-rich players could handle the ballooning complexity of top-tier models. DeepSeek's quiet success challenges that assumption. And as reasoning models typically consume significantly more compute than standard ones, improvements in R2 would spotlight DeepSeek's radical efficiency approach.

Downloading the entire 641 GB of model weights from Hugging Face is no trivial feat. But for many developers, the easiest path is through third-party inference providers such as Hyperbolic Labs or OpenRouter. These platforms let you tap into DeepSeek-V3-0324 without needing your own data center. Both have pledged near-instant updates whenever DeepSeek pushes changes. Meanwhile, chat.deepseek.com likely runs on the new version already -- though the startup hasn't explicitly confirmed it. Early adopters report faster responses and improved accuracy, albeit at the cost of some conversational warmth. If you're a developer who needs more formal, technical outputs, this shift in style is probably a boon. But casual users wanting a friendlier, more "human" chatbot might notice a chillier tone.

Interestingly, many testers have commented on the model's new voice. Earlier DeepSeek releases were known for their surprisingly approachable style. The updated 0324 iteration tends toward a serious, precise manner. Complaints about "robotic" or "overly intellectual" responses are popping up in online forums, suggesting DeepSeek pivoted toward professional settings rather than small talk. Whether this style makes the model more or less engaging depends heavily on usage. For coding or scientific research, the clarity of its responses might be a boon.
Meanwhile, general audiences might find the interactions stiffer than expected. Regardless, this purposeful personality shift signals how top AI players are carefully tuning their models for specific market segments. DeepSeek's release forces a bigger question about how advanced AI should be shared. Open source inherently invites broad collaboration and rapid iteration. By handing out the full model, DeepSeek cedes some control -- but gains an army of researchers, hobbyists, and startups all contributing to its ecosystem. For U.S. rivals, who mostly keep their technology on a short leash, DeepSeek's approach raises a strategic dilemma. It mirrors how Android's open model eventually overtook other operating systems that tried to keep everything locked down. If DeepSeek or other Chinese AI ventures manage to replicate that phenomenon in the AI space, we could see the same unstoppable wave of global adoption. Most crucially, the open model ensures advanced AI isn't just the domain of industry titans. With the right hardware, a wide range of organizations can now deploy leading-edge capabilities. That, more than anything, is what keeps CEOs of Western AI firms up at night. The fact that DeepSeek-V3-0324 can reliably run on a single, well-equipped workstation upends standard thinking about infrastructure needs. According to Nvidia's own statements, advanced reasoning models demand immense power and are often confined to specialized data centers. DeepSeek's counterexample suggests that, once compressed and optimized, next-generation AI could slip into surprisingly modest environments. And if the rumored DeepSeek-R2 matches or surpasses Western equivalents, it's possible we'll witness an open-source reasoning revolution. What was once the exclusive domain of big-budget companies might become a standard resource available to startups, independent researchers, and everyday developers.
[10]
DeepSeek's V3 upgrade challenges OpenAI and Anthropic in global AI race
Chinese artificial intelligence startup DeepSeek released a major upgrade to its V3 large language model, intensifying competition with U.S. tech leaders like OpenAI and Anthropic. The new model, DeepSeek-V3-0324, was made available through AI development platform Hugging Face, marking the company's latest push to establish itself in the rapidly evolving AI market. The latest model demonstrates significant improvements in areas such as reasoning and coding capabilities compared to its predecessor, with benchmark tests showing enhanced performance across multiple technical metrics published on Hugging Face. DeepSeek has rapidly emerged as a notable player in the global AI landscape in recent months, releasing a series of models that compete with Western counterparts while offering lower operational costs. The company launched its V3 model in December, followed by the release of its R1 model in January.
[11]
China's DeepSeek Unveils Latest Update in Race With OpenAI
DeepSeek released updates to its V3 model that promise to deliver better programming capabilities, underscoring the Chinese AI startup's intent to remain a step ahead of competitors. The V3-0324 update -- posted on Hugging Face this week without a formal announcement -- claims to address real-world challenges while setting benchmarks for accuracy and efficiency. In January, DeepSeek surged past ChatGPT to become the most popular free app on Apple's US app store. Its achievements, including an R1 model that seemingly performed as well as OpenAI's best, stunned the industry and sparked a selloff in US markets. V3 is one of DeepSeek's older model lines. The startup's AI services have ignited a debate about whether cutting-edge platforms can be built for far less than the billions that US firms are pouring into datacenter construction. © 2025 Bloomberg LP
[12]
China's DeepSeek releases AI model upgrade, intensifies rivalry with OpenAI
Chinese artificial intelligence startup DeepSeek released a major upgrade to its V3 large language model, intensifying competition with US tech leaders like OpenAI and Anthropic. The new model, DeepSeek-V3-0324, was made available through AI development platform Hugging Face, marking the company's latest push to establish itself in the rapidly evolving AI market. The latest model demonstrates significant improvements in areas such as reasoning and coding capabilities compared to its predecessor, with benchmark tests showing enhanced performance across multiple technical metrics published on Hugging Face. DeepSeek has rapidly emerged as a notable player in the global AI landscape in recent months, releasing a series of models that compete with Western counterparts while offering lower operational costs. The company launched its V3 model in December, followed by the release of its R1 model in January.
[13]
China's DeepSeek Ups The Heat In OpenAI Rivalry, Upgrades V3 Model Improving Coding And Reasoning Capability - Alphabet (NASDAQ:GOOG), Amazon.com (NASDAQ:AMZN)
On Tuesday, Chinese AI startup DeepSeek released its upgraded DeepSeek-V3-0324 model, enhancing reasoning and coding abilities. Available on Hugging Face, the model shows improved performance in technical benchmarks, intensifying competition with U.S. leaders like OpenAI and Anthropic. The updated model has shown improvements across multiple benchmarks, scoring 59.4 on the American Invitational Mathematics Examination, up from 39.6 for its predecessor, according to a report by the South China Morning Post. It also gained 10 points on LiveCodeBench, reaching a score of 49.2. The new model, with 685 billion parameters, uses an MIT software license, unlike DeepSeek V3, which has 671 billion parameters and a commercial license.

Earlier this year, DeepSeek's R1 model disrupted American tech supremacy, sparking debates about Big Tech's significant investments in large language models and data centers. R1 made waves with its performance and lower costs, but analysts reportedly believe DeepSeek's biggest impact is encouraging the use of open-source AI models, CNBC reports. This shift has been a key factor in the company's influence on the industry. Wei Sun, principal analyst at Counterpoint Research, told CNBC that DeepSeek's success shows open-source strategies drive faster innovation and wider adoption, with many companies adopting the model. She also said that R1 is influencing China's AI scene, prompting major firms like Baidu to open-source their own LLMs in response.

Recently, Kai-Fu Lee, the former head of Google China and founder of AI startup 01.AI, said that the rise of open-source AI models like DeepSeek has exposed an existential risk to OpenAI's business -- and he is pivoting his company accordingly. He questioned the long-term sustainability of OpenAI's business model, especially when competing against open-source projects that offer similar quality at a fraction of the cost. Tim Wang, managing partner at Monolith Management, told CNBC that models from companies like DeepSeek have been powerful enablers in China, showing how progress can be made with fewer resources. He noted that open-source models have reduced costs, allowing for product innovation -- an area where Chinese companies excel. Wang compared this development to the "Android moment," when Google's decision to make its operating system's source code available sparked innovation in the app ecosystem. He added that the perception of China being 12 to 24 months behind the U.S. in AI has now shifted to just 3 to 6 months. This shift in China's AI landscape, driven by open-source models like DeepSeek, is changing global competition and challenging traditional business models.
[14]
China's DeepSeek reveals updated AI model as OpenAI rivalry intensifies - Reuters By Investing.com
Investing.com - DeepSeek has unveiled an update to its low-cost artificial intelligence model, intensifying the Chinese start-up's bid to rival AI industry leaders like ChatGPT-maker OpenAI and Anthropic, Reuters reported on Tuesday. The newest version of the offering, DeepSeek-V3-0324, has become available through the AI development platform Hugging Face, the news agency said. Reuters added that benchmark metrics on Hugging Face show the model has displayed significant improvements in key areas like reasoning and coding compared with its predecessor. DeepSeek, which launched its V3 model in December and its R1 offering in January, has emerged as a provider of competitive alternatives to Western AI counterparts at a fraction of the cost. Silicon Valley executives and U.S. tech firm engineers have praised V3 and R1 for delivering performance on par with the most advanced models from OpenAI and Facebook-owner Meta Platforms (NASDAQ:META). Depending on the task, R1 in particular costs between 20 and 50 times less to use than OpenAI's o1 model, DeepSeek has said. Although these claims have faced skepticism from some sections of the global AI landscape, DeepSeek's rise has still fueled worries among investors over the necessity -- and eventual financial returns -- of massive AI spending by mega-cap technology companies. Earlier this year, these fears sparked a sharp downturn in stocks, with AI chipmaker Nvidia (NASDAQ:NVDA) shedding $593 billion in market value on January 27, a record one-day decline for any company.
[15]
DeepSeek V3-0324 update advances AI accessibility and performance By Investing.com
Investing.com -- The Chinese AI research group DeepSeek released the latest update to its DeepSeek V3 model, named DeepSeek V3-0324, on March 24, 2025. This open-weights model presents significant enhancements in reasoning, coding, and frontend development capabilities. It surpasses its predecessor and rivals top models such as Claude Sonnet 3.5, showing noticeable improvements in benchmarks like MMLU-Pro and LiveCodeBench. The DeepSeek V3-0324 update is available under the MIT license on platforms like Hugging Face and OpenRouter. This release is part of DeepSeek's continuous effort to improve AI accessibility and performance.

DeepSeek V3-0324 has now become the highest-scoring non-reasoning model on Artificial Analysis' Intelligence Index, which comprises seven evaluations spanning reasoning, knowledge, math, and coding. This is the first time an open weights model has led among non-reasoning models, marking a significant milestone for open source AI. The model gained seven points on the Intelligence Index, surpassing all other non-reasoning models, though it still ranks behind DeepSeek's own R1 model and other reasoning models from OpenAI, Anthropic, and Alibaba (NYSE:BABA). Non-reasoning models provide immediate answers without taking time to 'think', which makes them beneficial in latency-sensitive use cases.

Three months ago, when DeepSeek released the V3 model, it was noted to be close to, but not ahead of, leading proprietary models from Anthropic and Google (NASDAQ:GOOGL). With the release of DeepSeek V3-0324, DeepSeek has not only released the best open-source model but now leads the frontier of non-reasoning open weights models, surpassing proprietary non-reasoning models such as Gemini 2.0 Pro and Claude 3.7 Sonnet, as well as open models like Llama 3.3 70B. The specifications of DeepSeek V3-0324 mostly align with the December 2024 version of DeepSeek V3: a context window of 128k (limited to 64k on DeepSeek's first-party API), 671B total parameters, 37B active parameters, native FP8 precision, text-only operation, and an MIT license. While DeepSeek V3-0324 lags behind leading reasoning models, including DeepSeek's own R1, it holds considerable value for many uses: the increased latency associated with allowing reasoning models to 'think' before answering can make them unusable in certain scenarios.
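As a rough illustration of why the 37B-active / 671B-total split above matters for latency-sensitive use, here is a back-of-envelope sketch using the common approximation of about 2 FLOPs per active parameter per generated token. The figures are estimates for intuition, not measured numbers.

```python
# Back-of-envelope compute per generated token for an MoE model,
# using the rough rule of ~2 FLOPs per ACTIVE parameter per token.
TOTAL_PARAMS = 671e9    # all experts combined (from the specs above)
ACTIVE_PARAMS = 37e9    # parameters actually engaged per token

moe_flops = 2 * ACTIVE_PARAMS      # ~74 GFLOPs per token
dense_flops = 2 * TOTAL_PARAMS     # ~1.34 TFLOPs per token if every parameter were active

print(f"MoE:   ~{moe_flops / 1e9:.0f} GFLOPs per token")
print(f"Dense: ~{dense_flops / 1e12:.2f} TFLOPs per token")
print(f"Saving: ~{dense_flops / moe_flops:.0f}x less compute per token")
```

Under these assumptions the MoE design does roughly 18x less arithmetic per token than an equally sized dense model, which is the main reason a model this large can still respond quickly.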
[16]
China's DeepSeek releases AI model upgrade, intensifies rivalry with OpenAI
BEIJING (Reuters) - Chinese artificial intelligence startup DeepSeek released a major upgrade to its V3 large language model, intensifying competition with U.S. tech leaders like OpenAI and Anthropic. The new model, DeepSeek-V3-0324, was made available through AI development platform Hugging Face, marking the company's latest push to establish itself in the rapidly evolving AI market. The latest model demonstrates significant improvements in areas such as reasoning and coding capabilities compared to its predecessor, with benchmark tests showing enhanced performance across multiple technical metrics published on Hugging Face. DeepSeek has rapidly emerged as a notable player in the global AI landscape in recent months, releasing a series of models that compete with Western counterparts while offering lower operational costs. The company launched its V3 model in December, followed by the release of its R1 model in January. (Reporting by Liam Mo and Brenda Goh; Editing by Kim Coghill)
Chinese AI startup DeepSeek releases a major upgrade to its V3 language model, showcasing improved performance and efficiency. The open-source model challenges industry leaders with its ability to run on consumer hardware.
Chinese AI startup DeepSeek has released a significant upgrade to its V3 large language model, dubbed DeepSeek-V3-0324, intensifying competition with industry giants like OpenAI and Anthropic. The new model, which appeared on AI repository Hugging Face with little fanfare, demonstrates substantial improvements in reasoning and coding capabilities compared to its predecessor [1].
DeepSeek-V3-0324 employs a mixture-of-experts (MoE) architecture, activating only about 37 billion of its 685 billion parameters during specific tasks. This selective activation represents a paradigm shift in model efficiency, allowing performance comparable to much larger fully-activated models while drastically reducing computational demands [5].
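To make the selective-activation idea concrete, here is a minimal, illustrative sketch of top-k expert routing in PyTorch. The dimensions and routing details are toy values for exposition only and do not reflect DeepSeek's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy mixture-of-experts layer: each token is processed by only k of n experts."""
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores every expert for each token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                              # x: (n_tokens, d_model)
        scores = self.router(x)                        # (n_tokens, n_experts)
        top_w, top_i = scores.topk(self.k, dim=-1)     # keep the k best experts per token
        top_w = F.softmax(top_w, dim=-1)               # normalize weights over the chosen k
        out = torch.zeros_like(x)
        for expert_id, expert in enumerate(self.experts):
            for slot in range(self.k):
                mask = top_i[:, slot] == expert_id     # tokens that picked this expert
                if mask.any():
                    out[mask] += top_w[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(10, 64)   # 10 token embeddings
layer = TinyMoE()
print(layer(tokens).shape)     # torch.Size([10, 64]) -- only 2 of 8 experts ran per token
```

Only the selected experts' weights participate in each token's forward pass, which is why compute scales with active rather than total parameters.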
The model incorporates two additional breakthrough technologies: Multi-Head Latent Attention (MLA), which enhances the model's ability to maintain context across long passages of text, and Multi-Token Prediction (MTP), which generates multiple tokens per step instead of one at a time. Together, these innovations boost output speed by nearly 80% [5].
One of the most striking features of DeepSeek-V3-0324 is its ability to run on consumer-grade hardware. Early reports suggest that a 4-bit quantized version of the model can achieve speeds of over 20 tokens per second on an Apple Mac Studio with an M3 Ultra chip [5]. This development challenges traditional assumptions about the infrastructure requirements for top-tier AI model performance.
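A quick back-of-envelope check makes the quantization claim plausible. The sketch below assumes one byte per parameter for the native FP8 release and half a byte at 4-bit, ignoring embeddings and quantization overhead, so the figures are approximations rather than exact file sizes.

```python
# Rough storage footprint of a 685B-parameter model at different precisions.
PARAMS = 685e9                  # reported parameter count

fp8_gb = PARAMS * 1.0 / 1e9     # 1 byte/param   -> ~685 GB (reported release: 641 GB)
q4_gb  = PARAMS * 0.5 / 1e9     # 0.5 byte/param -> ~343 GB (reported 4-bit build: 352 GB)

print(f"FP8 estimate:   ~{fp8_gb:.0f} GB")
print(f"4-bit estimate: ~{q4_gb:.0f} GB  -- small enough for a 512GB Mac Studio")
```

Halving the bytes per weight roughly halves the footprint, which is what brings the model within reach of a single high-memory workstation.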
DeepSeek's decision to release the model under an MIT license, making it freely available for commercial use, exemplifies a growing trend among Chinese AI companies [2]. This open-source approach contrasts sharply with the closed, API-centric strategies of Western leaders like OpenAI and Anthropic [5].
The availability of cutting-edge models like DeepSeek-V3-0324 is transforming China's AI ecosystem, enabling startups, researchers, and developers to build upon sophisticated AI technology without massive capital expenditure [3].
DeepSeek's rapid ascent has prompted other Chinese AI startups to reevaluate their strategies: 01.ai has stopped pre-training its own large language models to sell tailored solutions built on DeepSeek's, Baichuan has opted to concentrate on the healthcare market, Moonshot has slashed marketing for its Kimi chatbot to refocus on model training, and Zhipu is pinning its hopes on an IPO to sustain its enterprise sales business [3].
Early testers report that DeepSeek-V3-0324 may now be the best non-reasoning AI model, potentially surpassing Claude Sonnet 3.5 from Anthropic [5]. If validated, this claim would solidify DeepSeek's position as a formidable competitor in the global AI market.
As DeepSeek and other Chinese AI companies continue to innovate and release open-source models, the competitive landscape of the AI industry may see significant shifts, challenging the dominance of established Western players and potentially accelerating the pace of AI development worldwide.
© 2025 TheOutpost.AI All rights reserved