Curated by THEOUTPOST
On Fri, 27 Dec, 12:02 AM UTC
7 Sources
[1]
DeepSeek v3: The First Open AI Model to Rival OpenAI and Anthropic
DeepSeek v3 is an open-weight AI model that stands as a direct competitor to proprietary systems like OpenAI's ChatGPT and Anthropic's Claude AI models. By combining advanced technical innovations, cost-efficient training, and impressive performance benchmarks, it represents a significant milestone in the evolution of open source AI. This overview by Prompt Engineering explores its defining features, technical advancements, performance metrics, and challenges, offering a comprehensive view of its role in the AI landscape.

DeepSeek v3 isn't just another AI model; it's a compelling option for those who value accessibility, collaboration, and innovation. With its impressive performance, cost-efficient training, and open-weight design, it's proving that open source AI can stand toe-to-toe with the giants. Whether you're a developer looking for a flexible tool, a researcher eager to push boundaries, or simply someone curious about the future of AI, DeepSeek v3 offers a glimpse into what's possible when technology is made for everyone.

DeepSeek v3 has 671 billion parameters in total, of which 37 billion are activated for each token. Trained on 14.8 trillion high-quality tokens, the model was developed at a cost of about $5.6 million over 57 days using a constrained GPU cluster. This scale of training enables it to rival -- and in some cases outperform -- proprietary models like GPT-4o and Claude 3.5 Sonnet on specific benchmarks.

The open-weight nature of DeepSeek v3 is one of its most defining attributes. Unlike closed proprietary systems, it allows developers and researchers to access, modify, and build upon the model. This openness fosters collaboration and innovation, making it a valuable tool for advancing AI research and practical applications. By providing broad access to its weights, DeepSeek v3 enables you to explore new possibilities, whether in academic research, software development, or enterprise solutions.

DeepSeek v3 delivers strong performance across a wide range of tasks, showcasing its versatility and efficiency. Its key capabilities include fast text generation (around 60 tokens per second), solid coding and mathematical reasoning, and efficient inference thanks to its mixture-of-experts design. These features position DeepSeek v3 as a strong contender in the AI space, particularly for applications requiring speed, precision, and adaptability. Its ability to handle diverse tasks with efficiency makes it a versatile tool for both research and practical use.

DeepSeek v3 incorporates several advanced technical features that distinguish it from other models in the market. These innovations not only enhance its performance but also contribute to its cost-efficiency: a mixture-of-experts architecture that activates only a fraction of the model per token, multi-head latent attention, an auxiliary-loss-free load-balancing strategy, multi-token prediction, and FP8 mixed-precision training. These technical advancements highlight the innovative approach behind DeepSeek v3, setting a benchmark for future AI development practices. By addressing the challenges of scalability and cost, it paves the way for more accessible and efficient AI solutions.

Benchmark results indicate that DeepSeek v3 performs on par with or surpasses proprietary models in several critical domains. Its performance highlights include outperforming leading open models such as Llama 3.1 405B and Qwen 2.5 72B, beating GPT-4o on most benchmarks, and leading on math-focused tests such as Math-500. These results underscore the model's potential for applications requiring nuanced decision-making, problem-solving, and technical expertise. Its ability to deliver consistent results across diverse tasks makes it a reliable choice for both research and industry use.

DeepSeek v3 is designed with accessibility and flexibility in mind, offering multiple ways for users to interact with and deploy the model.
Key features include web access via the DeepSeek Chat platform, a commercial API, and downloadable weights hosted on Hugging Face. This accessibility makes DeepSeek v3 an attractive option for developers, researchers, and organizations looking to explore new use cases or enhance existing systems. Its flexibility ensures that it can be integrated into a wide range of applications.

While DeepSeek v3 offers numerous advantages, it also faces several challenges that must be addressed to ensure its reliability and ethical use, including benchmark results that have not yet been independently verified, heavy hardware requirements for self-hosted deployment, and content restrictions on politically sensitive topics. Addressing these challenges will be critical to ensuring the long-term success and ethical deployment of DeepSeek v3. By prioritizing transparency and accountability, developers can build trust and confidence in the model's capabilities.

The release of DeepSeek v3 represents a significant step forward for open-weight AI models. Its success highlights the potential of open source innovation to challenge the dominance of proprietary systems, offering accessible and cost-efficient alternatives. As the AI landscape continues to evolve, models like DeepSeek v3 will play a crucial role in driving advancements in accessibility, collaboration, and technical innovation.

By combining innovative features with an open-weight design, DeepSeek v3 sets a new standard for what open source AI can achieve. Its ability to deliver high performance at a fraction of the cost of proprietary models makes it a compelling choice for developers, researchers, and organizations worldwide. As challenges are addressed and the model continues to evolve, its impact on the AI ecosystem is likely to grow, shaping the future of artificial intelligence for years to come.
[2]
DeepSeek-V3, ultra-large open-source AI, outperforms Llama and Qwen on launch
Chinese AI startup DeepSeek, known for challenging leading AI vendors with its innovative open-source technologies, today released a new ultra-large model: DeepSeek-V3. Available via Hugging Face under the company's license agreement, the new model comes with 671B parameters but uses a mixture-of-experts architecture to activate only select parameters, in order to handle given tasks accurately and efficiently. According to benchmarks shared by DeepSeek, the offering is already topping the charts, outperforming leading open-source models, including Meta's Llama 3.1-405B, and closely matching the performance of closed models from Anthropic and OpenAI.

The release marks another major development closing the gap between closed and open-source AI. Ultimately, DeepSeek, which started as an offshoot of Chinese quantitative hedge fund High-Flyer Capital Management, hopes these developments will pave the way for artificial general intelligence (AGI), where models will have the ability to understand or learn any intellectual task that a human being can.

What does DeepSeek-V3 bring to the table?

Just like its predecessor DeepSeek-V2, the new ultra-large model uses the same basic architecture revolving around multi-head latent attention (MLA) and DeepSeekMoE. This approach ensures efficient training and inference -- with specialized and shared "experts" (individual, smaller neural networks within the larger model) activating 37B parameters out of 671B for each token.

While the basic architecture ensures robust performance for DeepSeek-V3, the company has also debuted two innovations to further push the bar. The first is an auxiliary loss-free load-balancing strategy, which dynamically monitors and adjusts the load on experts to utilize them in a balanced way without compromising overall model performance. The second is multi-token prediction (MTP), which allows the model to predict multiple future tokens simultaneously. This innovation not only enhances training efficiency but also enables the model to generate text three times faster than its predecessor, at 60 tokens per second.

"During pre-training, we trained DeepSeek-V3 on 14.8T high-quality and diverse tokens...Next, we conducted a two-stage context length extension for DeepSeek-V3," the company wrote in a technical paper detailing the new model. "In the first stage, the maximum context length is extended to 32K, and in the second stage, it is further extended to 128K. Following this, we conducted post-training, including Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) on the base model of DeepSeek-V3, to align it with human preferences and further unlock its potential. During the post-training stage, we distill the reasoning capability from the DeepSeek-R1 series of models, and meanwhile carefully maintain the balance between model accuracy and generation length."

Notably, during the training phase, DeepSeek used multiple hardware and algorithmic optimizations, including the FP8 mixed precision training framework and the DualPipe algorithm for pipeline parallelism, to cut down on the costs of the process. Overall, it claims to have completed DeepSeek-V3's entire training in about 2788K H800 GPU hours, or about $5.57 million, assuming a rental price of $2 per GPU hour. This is much lower than the hundreds of millions of dollars usually spent on pre-training large language models.
Llama-3.1, for instance, is estimated to have been trained with an investment of over $500 million.

Strongest open-source model currently available

Despite the economical training, DeepSeek-V3 has emerged as the strongest open-source model on the market. The company ran multiple benchmarks to compare the performance of the AI and noted that it convincingly outperforms leading open models, including Llama-3.1-405B and Qwen 2.5-72B. It even outperforms closed-source GPT-4o on most benchmarks, except the English-focused SimpleQA and FRAMES, where the OpenAI model sat ahead with scores of 38.2 and 80.5 (vs 24.9 and 73.3), respectively.

Notably, DeepSeek-V3's performance particularly stood out on the Chinese and math-centric benchmarks, scoring better than all counterparts. In the Math-500 test, it scored 90.2, with Qwen's score of 80 the next best. The only model that managed to challenge DeepSeek-V3 was Anthropic's Claude 3.5 Sonnet, which outperformed it with higher scores on MMLU-Pro, IF-Eval, GPQA-Diamond, SWE-bench Verified and Aider-Edit.

The work shows that open source is closing in on closed-source models, promising nearly equivalent performance across different tasks. Such developments are good for the industry because they reduce the chances of a single big AI player dominating the field and give enterprises multiple options to choose from when orchestrating their stacks.

Currently, the code for DeepSeek-V3 is available via GitHub under an MIT license, while the model itself is provided under the company's model license. Enterprises can also test out the new model via DeepSeek Chat, a ChatGPT-like platform, and access the API for commercial use. DeepSeek is providing the API at the same price as DeepSeek-V2 until February 8. After that, it will charge $0.27/million input tokens ($0.07/million tokens with cache hits) and $1.10/million output tokens.
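As a quick sanity check on those figures, here is a minimal back-of-the-envelope sketch in Python. The $2 per GPU hour rental price is the assumption stated in DeepSeek's paper, and the API workload below is purely hypothetical.

```python
# Back-of-the-envelope check of the training cost quoted above.
gpu_hours = 2_788_000            # ~2788K H800 GPU hours reported by DeepSeek
price_per_gpu_hour = 2.00        # assumed rental price, per the technical paper
training_cost = gpu_hours * price_per_gpu_hour
print(f"Estimated training cost: ${training_cost:,.0f}")    # ~$5,576,000

# Post-February-8 API pricing applied to a hypothetical workload
input_tokens, output_tokens = 10_000_000, 2_000_000          # example workload only
api_cost = input_tokens * 0.27 / 1e6 + output_tokens * 1.10 / 1e6
print(f"Example API bill: ${api_cost:.2f}")                  # $2.70 + $2.20 = $4.90
```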
[3]
DeepSeek's Groundbreaking AI Model V3 Outshines Meta and OpenAI, Achieving Unmatched Performance with Fewer Resources
DeepSeek, a Chinese AI start-up, has made headlines with the release of its advanced large language model, DeepSeek V3, a game-changing development in artificial intelligence. The model, boasting 671 billion parameters, outperformed prominent AI models like Meta's Llama 3.1 and OpenAI's GPT-4o in benchmark tests evaluating text understanding, coding, and problem-solving. This achievement is a major step for China's AI industry.

The Hangzhou-based company revealed in a WeChat post that DeepSeek V3 was developed at a cost of just $5.58 million, using only 2.78 million GPU hours. By contrast, Meta's Llama 3.1 required 30.8 million GPU hours. DeepSeek relied on Nvidia's H800 GPUs, tailored for the Chinese market, sidestepping US sanctions that block access to advanced chips.

Computer scientist Andrej Karpathy praised the achievement on X (formerly Twitter), noting that DeepSeek managed to create a frontier-grade model with minimal resources. According to DeepSeek's technical report, the V3 model not only surpassed Meta's and Alibaba's models but also delivered results comparable to OpenAI's GPT-4o and Amazon-backed Anthropic's Claude 3.5 Sonnet.

DeepSeek, spun off in 2023 from High-Flyer Quant, emphasizes cost-effective AI development. The company's Fire Flyer GPU clusters have been instrumental in driving innovation. DeepSeek aims to democratize AI, offering its models for third-party application development alongside its chatbot services.

What is DeepSeek V3?

DeepSeek V3 is a large language model (LLM) developed by the Chinese start-up DeepSeek. It has 671 billion parameters and is designed to understand and generate text more effectively than many existing models.

How does DeepSeek V3 compare to other AI models?

DeepSeek V3 outperforms models like Meta's Llama 3.1 and OpenAI's GPT-4o in various benchmark tests, including text generation, coding, and problem-solving. It matches the capabilities of some of the most advanced AI systems available.
[4]
DeepSeek-V3 is Now The Best Open Source AI Model
DeepSeek, a Chinese AI research lab backed by High-Flyer Capital Management, has released DeepSeek-V3, the latest version of its frontier model. The Mixture-of-Experts model features 671B total parameters, with 37B activated for each token, and has been trained on 14.8 trillion tokens. DeepSeek has released the model on GitHub along with a detailed technical paper outlining its capabilities.

DeepSeek AI also released benchmark scores, which show the model outperforming Meta's flagship Llama 3.1 405B parameter model as well as several closed-source models. It is also three times faster than its predecessor, DeepSeek V2. "Comprehensive evaluations demonstrate that DeepSeek-V3 has emerged as the strongest open-source model currently available and achieves performance comparable to leading closed-source models like GPT-4o and Claude-3.5-Sonnet," reads the technical paper.

Moreover, DeepSeek mentioned that it has distilled reasoning capabilities from the DeepSeek R1 series of models into V3. "Our pipeline elegantly incorporates the verification and reflection patterns of R1 into DeepSeek-V3 and notably improves its reasoning performance," reads the paper.

API pricing will remain the same as DeepSeek V2 until February 8, 2025. After that, it will cost $0.27 per million input tokens and $1.10 per million output tokens.

While it may not be a fair comparison, how does the model fare against OpenAI's o1? Where o1 scored 76% on the GPQA Diamond (PhD-Level Science Questions) benchmark, DeepSeek-V3 lags behind with a 59.1% score. The full version of o1 beats DeepSeek on multiple benchmarks. However, DeepSeek-V3 does outperform the coveted Claude 3.5 Sonnet across multiple benchmarks.

That said, DeepSeek has been taking major strides in the open-source AI ecosystem over the last few months. Only a few weeks ago, the company launched V2.5-1210, the final model in its V2 series. The model is accessible on chat.deepseek.com. Users can toggle the Internet Search feature on the website for real-time responses or integrate the model via Hugging Face.

Models from the east are giving the ones from the west a run for their money, and DeepSeek isn't the only one. Alibaba's Qwen 2.5, for instance, offers performance parity with many leading models, and the Qwen2.5-Coder series excels in code generation, matching the capabilities of GPT-4o on benchmarks like EvalPlus, LiveCodeBench, and BigCodeBench.
[5]
DeepSeek's new AI model appears to be one of the best 'open' challengers yet | TechCrunch
A Chinese lab has created what appears to be one of the most powerful "open" AI models to date. The model, DeepSeek V3, was developed by the AI firm DeepSeek and released on Wednesday under a permissive license that allows developers to download and modify it for most applications, including commercial ones.

DeepSeek V3 can handle a range of text-based workloads and tasks, like coding, translating, and writing essays and emails from a descriptive prompt. According to DeepSeek's internal benchmark testing, DeepSeek V3 outperforms both downloadable, "openly" available models and "closed" AI models that can only be accessed through an API. In a subset of coding competitions hosted on Codeforces, a platform for programming contests, DeepSeek outperforms models including Meta's Llama 3.1 405B, OpenAI's GPT-4o, and Alibaba's Qwen 2.5 72B. DeepSeek V3 also crushes the competition on Aider Polyglot, a test designed to measure, among other things, whether a model can successfully write new code that integrates into existing code.

DeepSeek claims that DeepSeek V3 was trained on a data set of 14.8 trillion tokens. In data science, tokens are used to represent bits of raw data; 1 million tokens is equal to about 750,000 words. It's not just the training set that's massive. DeepSeek V3 is enormous in size: 685 billion parameters. (Parameters are the internal variables models use to make predictions or decisions.) That's around 1.7 times the size of Llama 3.1 405B, which has 405 billion parameters.

Parameter count often (but not always) correlates with skill; models with more parameters tend to outperform models with fewer parameters. But large models also require beefier hardware in order to run. An unoptimized version of DeepSeek V3 would need a bank of high-end GPUs to answer questions at reasonable speeds.

While it's not the most practical model, DeepSeek V3 is an achievement in some respects. DeepSeek was able to train the model using a data center of Nvidia H800 GPUs in just around two months -- GPUs that Chinese companies were recently restricted by the U.S. Commerce Department from procuring. The company also claims it only spent $5.576 million to train DeepSeek V3, a fraction of the development cost of models like OpenAI's GPT-4.

The downside is, the model's political views are a bit -- filtered. Ask DeepSeek V3 about Tiananmen Square, for instance, and it won't answer. DeepSeek, being a Chinese company, is subject to benchmarking by China's internet regulator to ensure its models' responses "embody core socialist values." Many Chinese AI systems decline to respond to topics that might raise the ire of regulators, like speculation about the Xi Jinping regime.

DeepSeek, which recently unveiled DeepSeek-R1, an answer to OpenAI's o1 "reasoning" model, is a curious organization. It's backed by High-Flyer Capital Management, a Chinese quantitative hedge fund that uses AI to inform its trading decisions. DeepSeek's models have forced competitors like ByteDance, Baidu, and Alibaba to cut the usage prices for some of their models -- and make others completely free. High-Flyer builds its own server clusters for model training, one of the most recent of which reportedly has 10,000 Nvidia A100 GPUs and cost 1 billion yuan (~$138 million). Founded by Liang Wenfeng, a computer science graduate, High-Flyer aims to achieve "superintelligent" AI through its DeepSeek org.
In an interview earlier this year, Liang described open sourcing as a "cultural act" and characterized closed-source AI like OpenAI's as a "temporary" moat. "Even OpenAI's closed-source approach hasn't stopped others from catching up," he noted.
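To put the hardware point above in perspective, here is a rough Python sketch of the memory needed just to hold weights of that size. The precisions and the 80 GB accelerator size are illustrative assumptions, and the estimate ignores activations, KV cache, and framework overhead, which add substantially more.

```python
import math

total_params = 685e9  # parameter count cited above for DeepSeek V3

# Bytes per parameter at a few common precisions (assumed, for illustration only)
for name, bytes_per_param in [("FP16/BF16", 2.0), ("FP8", 1.0), ("4-bit", 0.5)]:
    weight_gb = total_params * bytes_per_param / 1e9
    gpus = math.ceil(weight_gb / 80)  # assuming 80 GB accelerators, weights only
    print(f"{name:>9}: ~{weight_gb:,.0f} GB of weights -> at least {gpus} x 80 GB GPUs")
```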
[6]
DeepSeek-V3 Open-Source AI Model With MoE Architecture Released
The AI model also comes with advanced reasoning capabilities.

DeepSeek, a Chinese artificial intelligence (AI) firm, released the DeepSeek-V3 AI model on Thursday. The new open-source large language model (LLM) features a massive 671 billion parameters, surpassing the Meta Llama 3.1 model, which has 405 billion parameters. Despite its size, the researchers claim the LLM is focused on efficiency thanks to its mixture-of-experts (MoE) architecture: the model activates only the parameters relevant to a given task, which keeps inference efficient and accurate. Notably, it is a text-based model and does not have multimodal capabilities.

The open-source DeepSeek-V3 AI model is currently hosted on Hugging Face. According to the listing, the LLM is geared towards efficient inference and cost-effective training. For this, the researchers adopted the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures. Essentially, the AI model only activates the parameters relevant to the topic of the prompt, ensuring faster processing and higher accuracy compared to typical models of this size.

Pre-trained on 14.8 trillion tokens, DeepSeek-V3 uses techniques such as supervised fine-tuning and reinforcement learning to generate high-quality responses. The Chinese firm claimed that, despite its size, the AI model was fully trained in 2.788 million GPU hours on Nvidia H800 GPUs. DeepSeek-V3's architecture also includes a load-balancing technique to minimise performance degradation, which was first used on its predecessor.

Coming to performance, the researchers shared evals from internal testing of the model and claimed that it outperforms the Meta Llama 3.1 and Qwen 2.5 models on BIG-Bench Hard (BBH), Massive Multitask Language Understanding (MMLU), HumanEval, MATH, and several other benchmarks. However, these results have not yet been verified by third-party researchers.

One of the main highlights of DeepSeek-V3 is its massive size of 671 billion parameters. While larger models exist (Gemini 1.5 Pro, for example, is reported to have more than a trillion parameters), such size is rare in the open-source space. Prior to this, the largest open-source AI model was Meta's Llama 3.1 with 405 billion parameters.

At present, DeepSeek-V3's code can be accessed via its Hugging Face listing under an MIT license for personal and commercial usage. Additionally, the AI model can be tested via the company's online chatbot platform, and those looking to build with it can access the API.
[7]
DeepSeek open-sources new AI model with 671B parameters - SiliconANGLE
Chinese artificial intelligence developer DeepSeek today open-sourced DeepSeek-V3, a new large language model with 671 billion parameters. The LLM can generate text, craft software code and perform related tasks. DeepSeek says it outperforms two of the most advanced open-source LLMs on the market across more than a half-dozen benchmark tests.

DeepSeek-V3 is based on a so-called mixture of experts, or MoE, architecture. It comprises multiple neural networks that are each optimized for a different set of tasks. When DeepSeek-V3 receives a prompt, a component known as a router sends the request to the neural networks best-equipped to answer it. The MoE architecture's main benefit is that it reduces hardware costs: sending a prompt to DeepSeek-V3 doesn't activate the entire LLM, only the specific neural networks to which the request is routed, so only about 37 billion of the model's 671 billion parameters are active for any given token. As a result, the model requires a relatively limited amount of infrastructure to run.

Alongside its benefits, the MoE architecture also introduces certain challenges. During the training process, some of a MoE model's neural networks receive more training data than the others, which can create inconsistencies in the LLM's output quality. DeepSeek says it has developed a new method of mitigating this challenge and implemented it in DeepSeek-V3.

The LLM was trained on 14.8 trillion tokens' worth of information. One token corresponds to a few letters or numbers. The training process took 2.788 million graphics processing unit hours, which means it used relatively little infrastructure. The industry's most advanced AI clusters have tens of thousands of GPUs or more that can complete such a training project in a few days.

Alongside its MoE architecture, DeepSeek-V3 is equipped with several optimizations designed to boost its output quality. LLMs use a technique called attention to identify the most important details in a sentence. DeepSeek-V3 implements multi-head latent attention, an improved version of the technique that allows it to extract key details from a text snippet several times rather than only once. This makes the LLM less likely to overlook important information.

DeepSeek-V3 also features a multi-token prediction capability. Language models usually generate text one token at a time. DeepSeek-V3, in contrast, generates several at once, which speeds up inference.
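To make the routing idea concrete, here is a toy Python sketch of top-k expert routing. It is not DeepSeek's implementation (which uses far more and much larger experts, shared experts, and an auxiliary-loss-free balancing scheme); it only illustrates why a huge model can activate a small fraction of its weights per token.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, n_experts, top_k = 32, 8, 2   # toy sizes, chosen for illustration

# Each "expert" here is just a tiny feed-forward layer (weight matrix + bias)
experts = [(rng.normal(size=(hidden, hidden)), rng.normal(size=hidden))
           for _ in range(n_experts)]
router_w = rng.normal(size=(hidden, n_experts))

def moe_forward(x):
    # The router scores every expert for this token and keeps only the top_k ...
    scores = x @ router_w
    chosen = np.argsort(scores)[-top_k:]
    gate = np.exp(scores[chosen]) / np.exp(scores[chosen]).sum()
    # ... so only the chosen experts actually run; the rest stay idle.
    out = np.zeros(hidden)
    for g, idx in zip(gate, chosen):
        W, b = experts[idx]
        out += g * np.tanh(x @ W + b)
    return out, chosen

_, picked = moe_forward(rng.normal(size=hidden))
print("Experts activated for this token:", picked)
```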
Chinese AI startup DeepSeek releases DeepSeek V3, an open-weight AI model with 671 billion parameters, outperforming leading open-source models and rivaling proprietary systems in various benchmarks.
Chinese AI startup DeepSeek has unveiled its latest large language model, DeepSeek V3, marking a significant advancement in open-source artificial intelligence. This ultra-large model, boasting 671 billion parameters, has emerged as a formidable competitor to both open-source and proprietary AI systems [1][2].
DeepSeek V3 employs a mixture-of-experts architecture, activating only 37 billion parameters out of its total 671 billion for each token. This approach ensures efficient processing while maintaining high performance [2]. The model incorporates two key innovations: an auxiliary-loss-free load-balancing strategy, which keeps the experts evenly utilized without hurting quality, and multi-token prediction (MTP), which lets the model predict several future tokens at once and roughly triples generation speed to about 60 tokens per second [2].
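As a rough illustration of the multi-token prediction idea (a toy sketch, not DeepSeek's actual MTP module), the snippet below attaches a few small prediction heads to one shared hidden state so that a single forward step proposes several future tokens:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, hidden, k = 50, 16, 2  # toy vocabulary size, hidden size, number of future tokens

# One shared hidden state per position (a stand-in for the transformer trunk's output)
h = rng.normal(size=hidden)

# k lightweight heads: head i is trained to predict the token i+1 steps ahead
heads = [rng.normal(size=(hidden, vocab)) for _ in range(k)]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# A single forward pass proposes k future tokens instead of just one
proposals = [int(np.argmax(softmax(h @ W))) for W in heads]
print("Proposed next-token ids:", proposals)
```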
Trained on 14.8 trillion high-quality tokens, DeepSeek V3 underwent a two-stage context length extension, reaching up to 128K tokens [2]. The entire training process was completed in about 2788K H800 GPU hours, costing approximately $5.57 million – significantly less than the hundreds of millions typically spent on training large language models [2][3].
DeepSeek V3 has demonstrated impressive performance across various benchmarks: it outperforms leading open models such as Llama 3.1 405B and Qwen 2.5 72B, beats GPT-4o on most tests (with SimpleQA and FRAMES among the exceptions), and is particularly strong on math- and Chinese-focused benchmarks, scoring 90.2 on Math-500 [2]. It still trails OpenAI's o1 on reasoning-heavy tests such as GPQA Diamond, where o1 scored 76% against DeepSeek V3's 59.1% [4].
DeepSeek V3 is designed for broad accessibility: the code is available on GitHub under an MIT license, the model weights are hosted on Hugging Face under the company's model license, and users can try the model through the DeepSeek Chat platform or integrate it via a commercial API [2][6].
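For developers weighing the API route, the sketch below shows roughly what a call looks like using the OpenAI-compatible Python client. The base URL and model name reflect DeepSeek's public documentation as best understood here and should be treated as assumptions to verify against the current docs.

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",       # placeholder credential
    base_url="https://api.deepseek.com",   # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",                 # assumed identifier for DeepSeek-V3
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the mixture-of-experts idea in two sentences."},
    ],
)
print(response.choices[0].message.content)
```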
The release of DeepSeek V3 represents a significant step in narrowing the gap between closed and open-source AI models. Its performance and cost-efficiency challenge the dominance of proprietary systems, potentially democratizing access to advanced AI capabilities [1][5].
Despite its achievements, DeepSeek V3 faces several challenges: its benchmark results come from internal testing and have not yet been verified by third parties [6], it lags behind OpenAI's o1 on several reasoning benchmarks [4], and its sheer size means an unoptimized deployment requires a bank of high-end GPUs to run at reasonable speeds [5].
Additionally, being a Chinese company, DeepSeek is subject to regulatory oversight, which may influence certain model responses on sensitive topics [5].
DeepSeek V3's success highlights the potential of open-source innovation in AI. As the model continues to evolve and address challenges, it could significantly impact the AI ecosystem, driving advancements in accessibility, collaboration, and technical innovation [1][4]. This development may also intensify competition in the AI industry, potentially leading to more rapid advancements and reduced costs for AI technologies [4][5].
References
[1] DeepSeek v3: The First Open AI Model to Rival OpenAI and Anthropic
[2] DeepSeek-V3, ultra-large open-source AI, outperforms Llama and Qwen on launch
[3] DeepSeek's Groundbreaking AI Model V3 Outshines Meta and OpenAI, Achieving Unmatched Performance with Fewer Resources
[4] DeepSeek-V3 is Now The Best Open Source AI Model
[5] DeepSeek's new AI model appears to be one of the best 'open' challengers yet | TechCrunch
[6] DeepSeek-V3 Open-Source AI Model With MoE Architecture Released
[7] DeepSeek open-sources new AI model with 671B parameters - SiliconANGLE