Curated by THEOUTPOST
On Tue, 25 Mar, 4:02 PM UTC
2 Sources
[1]
DeepSeek releases improved DeepSeek-V3 model under MIT license - SiliconANGLE
DeepSeek today released an improved version of its DeepSeek-V3 large language model under a new open-source license. Software developer and blogger Simon Willison was first to report the update. DeepSeek itself didn't issue an announcement, and the new model's readme file, a component of code repositories that usually contains explanatory notes, is currently empty.

DeepSeek-V3 is an open-source LLM that made its debut in December. It forms the basis of DeepSeek-R1, the reasoning model that propelled the Chinese artificial intelligence lab to prominence earlier this year. DeepSeek-V3 is a general-purpose model that isn't specifically optimized for reasoning, but it can solve some math problems and generate code.

Until now, the LLM was distributed under a custom open-source license. The new release that DeepSeek rolled out today switches to the widely used MIT License, so developers can use the updated model in commercial projects and modify it with practically no limitations.

More notably, the new DeepSeek-V3 release appears to be more capable and hardware-efficient than the original. Most cutting-edge LLMs can only run on data center graphics cards, but Awni Hannun, a research scientist at Apple Inc.'s machine learning research group, successfully ran the new release on a Mac Studio, where it generated output at a rate of about 20 tokens per second. The Mac Studio in question featured a high-end configuration with a $9,499 price tag. Deploying DeepSeek-V3 on the machine required optimizing it with four-bit quantization, a model optimization technique that trades off some output accuracy for lower memory usage and latency.

According to an X post spotted by VentureBeat, the new DeepSeek-V3 version is better at programming than the original release. The post contains what is described as a benchmark test that evaluated the model's ability to generate Python and Bash code.
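The four-bit quantization mentioned above can be illustrated with a minimal sketch: each weight is rounded to one of 16 integer levels and rescaled on the way back, shrinking memory use at the cost of small rounding error. This is a simplified per-tensor version for illustration only; real deployments typically quantize per group of weights and use more elaborate schemes than this.

```python
import numpy as np

def quantize_4bit(weights: np.ndarray):
    """Symmetric 4-bit quantization: map floats to integers in [-8, 7]."""
    scale = np.abs(weights).max() / 7.0  # one scale per tensor (per-group in practice)
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the 4-bit integers."""
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.53, 0.91, -0.07], dtype=np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize(q, s)
# w_hat approximates w; per-element error is bounded by about scale/2
```

Storing 4-bit integers plus one scale instead of 32-bit floats is roughly an 8x memory reduction, which is what makes running a very large model on a single workstation plausible.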
The new release achieved a score of about 60%, several percentage points better than the original DeepSeek-V3. The model still trails DeepSeek-R1, the Chinese AI lab's reasoning-optimized LLM, and also scored lower than Qwen-32B, another reasoning-optimized model.

Although DeepSeek-V3 features 671 billion parameters, it activates only about 37 billion of them when answering prompts. This arrangement lets the model make do with less infrastructure than traditional LLMs that activate all their parameters. According to DeepSeek, the LLM is also more efficient than DeepSeek-R1.

The original version of DeepSeek-V3 was trained on a dataset that included 14.8 trillion tokens. The training process used about 2.8 million graphics card hours, significantly less than what frontier LLMs typically require. To improve the model's output quality, DeepSeek engineers fine-tuned it using prompt responses from DeepSeek-R1.
[2]
DeepSeek updates V3 AI model, adopts new license By Investing.com
Investing.com -- DeepSeek updated its V3 artificial intelligence model this week with claimed upgrades to the model's reasoning and programming skills, while also updating its open-source license. The model was updated to V3-0324 on GitHub, while its license was changed to MIT, a popular open-source license originating from the Massachusetts Institute of Technology. DeepSeek did not make a formal announcement on the update.

The model performed substantially better than its predecessor, early testing showed, and appeared to be ahead of comparable thinking models such as OpenAI's o3-mini, AI entrepreneur Paul Gauthier said on X.

DeepSeek's update comes after the release of its R1 model sent waves through global markets in late January, as the model appeared to match or even surpass the performance of its rivals while using older hardware and a fraction of their budgets. DeepSeek also spurred increased confidence in China's AI capabilities, with a host of other Chinese tech majors, such as Baidu Inc (NASDAQ:BIDU), Bytedance, Alibaba Group Holding Ltd (NYSE:BABA), and Tencent Holdings Ltd (HK:0700), capitalizing on this popularity by releasing new AI models. Tencent formally released its Hunyuan T1 reasoning model last week, which it claimed rivals DeepSeek R1 in performance and price.
DeepSeek has released an improved version of its DeepSeek-V3 large language model under the MIT License, offering better performance in programming and reasoning tasks while increasing its accessibility for commercial use.
DeepSeek, a Chinese artificial intelligence lab, has quietly rolled out an updated version of its DeepSeek-V3 large language model (LLM) with significant improvements and a new open-source license. The release, first reported by software developer Simon Willison, marks a notable advancement in the accessibility and capabilities of open-source AI models [1].
The latest iteration of DeepSeek-V3, dubbed V3-0324, introduces several notable improvements:
MIT License Adoption: The model has transitioned from a custom open-source license to the widely used MIT License, allowing developers to use and modify the model in commercial projects with minimal restrictions [1].
Improved Performance: Early benchmarks suggest that the new version outperforms its predecessor in programming tasks. A reported benchmark test showed the model achieving a score of about 60% in generating Python and Bash code, several percentage points higher than the original DeepSeek-V3 [1].
Hardware Efficiency: Despite its 671 billion parameters, DeepSeek-V3 activates only about 37 billion when responding to prompts, making it more efficient than traditional LLMs [1].
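The sparse-activation arrangement behind that 671-billion/37-billion split is a mixture-of-experts design: a router sends each token to only a few small expert networks, so most parameters sit idle on any given prompt. The toy sketch below illustrates the general idea only; the expert count, sizes, and routing details here are invented for illustration and are not DeepSeek's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, top_k, d = 16, 2, 8  # toy sizes; real MoE models use far larger values

# Each expert is a small weight matrix; the router scores experts per token.
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route a token vector to its top-k experts and mix their outputs."""
    logits = x @ router
    top = np.argsort(logits)[-top_k:]  # indices of the k highest-scoring experts
    weights = np.exp(logits[top] - logits[top].max())
    gates = weights / weights.sum()    # softmax over the selected experts only
    # Only top_k of n_experts run, so most parameters stay inactive per token.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

y = moe_forward(rng.standard_normal(d))
```

Here 2 of 16 experts run per token, a ratio loosely analogous to activating ~37 billion of 671 billion parameters: total capacity is large, but per-prompt compute stays small.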
While DeepSeek-V3 is a general-purpose model, it has shown promising capabilities in specific areas:
Reasoning and Math Skills: The model can solve some math problems and generate code, although it isn't specifically optimized for reasoning like its counterpart, DeepSeek-R1 [1].
Competitive Performance: Early testing indicates that the updated V3 model performs better than comparable models such as OpenAI's o3-mini, according to AI entrepreneur Paul Gauthier [2].
Hardware Compatibility: Awni Hannun, a research scientist at Apple Inc.'s machine learning research group, successfully ran the new DeepSeek-V3 on a high-end Mac Studio, generating output at about 20 tokens per second [1].
The release of the improved DeepSeek-V3 model has broader implications for the AI industry:
Open-Source Advancement: By releasing under the MIT License, DeepSeek is contributing to the democratization of AI technology, potentially accelerating innovation in the field [1][2].
Chinese AI Capabilities: The update follows the success of DeepSeek's R1 model, which had previously demonstrated China's growing prowess in AI development [2].
Industry Competition: DeepSeek's advancements have spurred increased activity among Chinese tech giants, with companies like Baidu, Bytedance, Alibaba, and Tencent releasing new AI models to capitalize on the momentum [2].
The original DeepSeek-V3 model was trained on a dataset of 14.8 trillion tokens, using approximately 2.8 million graphics card hours, significantly less than what is typically required for frontier LLMs. To enhance output quality, DeepSeek engineers fine-tuned the model using prompt responses from DeepSeek-R1 [1].
As the AI landscape continues to evolve rapidly, DeepSeek's latest release represents a significant step forward in making powerful language models more accessible and efficient for developers and researchers worldwide.
The Outpost is a comprehensive collection of curated artificial intelligence software tools that cater to the needs of small business owners, bloggers, artists, musicians, entrepreneurs, marketers, writers, and researchers.
© 2025 TheOutpost.AI All rights reserved