2 Sources
[1]
Accelerate DeepSeek Reasoning Models With NVIDIA GeForce RTX 50 Series AI PCs
The recently released DeepSeek-R1 model family has brought a new wave of excitement to the AI community, allowing enthusiasts and developers to run state-of-the-art reasoning models with problem-solving, math and code capabilities, all from the privacy of local PCs. With up to 3,352 trillion operations per second of AI horsepower, NVIDIA GeForce RTX 50 Series GPUs can run the DeepSeek family of distilled models faster than anything on the PC market.

A New Class of Models That Reason

Reasoning models are a new class of large language models (LLMs) that spend more time on "thinking" and "reflecting" to work through complex problems, while describing the steps required to solve a task. The fundamental principle is that any problem can be solved with deep thought, reasoning and time, just as humans tackle problems. By spending more time -- and thus compute -- on a problem, the LLM can yield better results. This phenomenon is known as test-time scaling, where a model dynamically allocates compute resources during inference to reason through problems.

Reasoning models can enhance user experiences on PCs by deeply understanding a user's needs, taking actions on their behalf and allowing them to provide feedback on the model's thought process -- unlocking agentic workflows for solving complex, multi-step tasks such as analyzing market research, performing complicated math problems, debugging code and more.

The DeepSeek Difference

The DeepSeek-R1 family of distilled models is based on a large 671-billion-parameter mixture-of-experts (MoE) model. MoE models consist of multiple smaller expert models for solving complex problems, and DeepSeek models further divide the work by assigning subtasks to smaller sets of experts. DeepSeek employed a technique called distillation to build a family of six smaller student models -- ranging from 1.5 billion to 70 billion parameters -- from the large 671-billion-parameter DeepSeek-R1 model.
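The article doesn't detail how the distillation was performed, but a common formulation (in the spirit of Hinton et al.'s knowledge distillation) trains the student to match the teacher's temperature-softened output distribution. A minimal sketch, assuming a KL-divergence objective over next-token logits:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the student's distribution to the teacher's.

    The student is trained to match the teacher's softened next-token
    distribution; the T^2 factor keeps gradient magnitudes comparable
    across temperatures (the convention from Hinton et al.).
    """
    p = softmax(teacher_logits, temperature)   # teacher "soft targets"
    q = softmax(student_logits, temperature)   # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Identical logits give zero loss; diverging logits give a positive loss.
```

In practice this loss is computed per token position over the training set, with the 671-billion-parameter model as teacher and a Llama or Qwen checkpoint as student.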
The reasoning capabilities of the larger DeepSeek-R1 671-billion-parameter model were taught to the smaller Llama and Qwen student models, resulting in powerful, smaller reasoning models that run locally on RTX AI PCs with fast performance.

Peak Performance on RTX

Inference speed is critical for this new class of reasoning models. GeForce RTX 50 Series GPUs, built with dedicated fifth-generation Tensor Cores, are based on the same NVIDIA Blackwell GPU architecture that fuels world-leading AI innovation in the data center. RTX fully accelerates DeepSeek, offering maximum inference performance on PCs.

Experience DeepSeek on RTX in Popular Tools

NVIDIA's RTX AI platform offers the broadest selection of AI tools, software development kits and models, opening access to the capabilities of DeepSeek-R1 on over 100 million NVIDIA RTX AI PCs worldwide, including those powered by GeForce RTX 50 Series GPUs. High-performance RTX GPUs make AI capabilities always available -- even without an internet connection -- and offer low latency and increased privacy, because users don't have to upload sensitive materials or expose their queries to an online service.
[2]
DeepSeek-R1 Now Live With NVIDIA NIM
DeepSeek-R1 is an open model with state-of-the-art reasoning capabilities. Instead of offering direct responses, reasoning models like DeepSeek-R1 perform multiple inference passes over a query, using chain-of-thought, consensus and search methods to generate the best answer. Performing this sequence of inference passes -- using reason to arrive at the best answer -- is known as test-time scaling.

DeepSeek-R1 is a perfect example of this scaling law, demonstrating why accelerated computing is critical for the demands of agentic AI inference. As models are allowed to iteratively "think" through a problem, they create more output tokens and longer generation cycles, so model quality continues to scale. Significant test-time compute is critical to enable both real-time inference and higher-quality responses from reasoning models like DeepSeek-R1, requiring larger inference deployments.

R1 delivers leading accuracy for tasks demanding logical inference, reasoning, math, coding and language understanding, while also delivering high inference efficiency.

To help developers securely experiment with these capabilities and build their own specialized agents, the 671-billion-parameter DeepSeek-R1 model is now available as an NVIDIA NIM microservice preview on build.nvidia.com. The DeepSeek-R1 NIM microservice can deliver up to 3,872 tokens per second on a single NVIDIA HGX H200 system. Developers can test and experiment with the application programming interface (API), which is expected to be available soon as a downloadable NIM microservice, part of the NVIDIA AI Enterprise software platform.

The DeepSeek-R1 NIM microservice simplifies deployments with support for industry-standard APIs. Enterprises can maximize security and data privacy by running the NIM microservice on their preferred accelerated computing infrastructure.
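The consensus method mentioned above can be made concrete: sample several independent reasoning traces for the same query and keep the answer most traces agree on (often called self-consistency). A minimal sketch, where `ask_model` is a hypothetical stand-in for one sampled inference pass:

```python
from collections import Counter

def consensus_answer(ask_model, query, n_samples=8):
    """Test-time scaling via consensus: run several independent
    inference passes and majority-vote the final answers. More
    samples means more compute spent at inference time -- and,
    typically, a more reliable answer."""
    answers = [ask_model(query, sample_id=i) for i in range(n_samples)]
    best, votes = Counter(answers).most_common(1)[0]
    return best, votes / n_samples  # winning answer and its vote share

# Toy stand-in: a "model" that answers correctly on most samples.
def toy_model(query, sample_id):
    return "4" if sample_id % 4 != 0 else "5"  # 3 of every 4 samples say "4"

answer, share = consensus_answer(toy_model, "2 + 2 = ?", n_samples=8)
# answer == "4", share == 0.75
```

Each extra sample is another full generation cycle, which is why this style of inference benefits so directly from faster token throughput.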
Using NVIDIA AI Foundry with NVIDIA NeMo software, enterprises will also be able to create customized DeepSeek-R1 NIM microservices for specialized AI agents.

DeepSeek-R1 -- a Perfect Example of Test-Time Scaling

DeepSeek-R1 is a large mixture-of-experts (MoE) model. It incorporates an impressive 671 billion parameters -- 10x more than many other popular open-source LLMs -- and supports a large input context length of 128,000 tokens. The model also uses an extreme number of experts per layer: each layer of R1 has 256 experts, with each token routed to eight separate experts in parallel for evaluation.

Delivering real-time answers for R1 requires many GPUs with high compute performance, connected with high-bandwidth, low-latency communication to route prompt tokens to all the experts for inference. Combined with the software optimizations available in the NVIDIA NIM microservice, a single server with eight H200 GPUs connected using NVLink and NVLink Switch can run the full 671-billion-parameter DeepSeek-R1 model at up to 3,872 tokens per second. This throughput is made possible by the NVIDIA Hopper architecture's FP8 Transformer Engine at every layer -- and the 900 GB/s of NVLink bandwidth for MoE expert communication.

Getting every floating-point operation per second (FLOPS) of performance out of a GPU is critical for real-time inference. The next-generation NVIDIA Blackwell architecture will give test-time scaling on reasoning models like DeepSeek-R1 a giant boost, with fifth-generation Tensor Cores that can deliver up to 20 petaflops of peak FP4 compute performance and a 72-GPU NVLink domain specifically optimized for inference.

Get Started Now With the DeepSeek-R1 NIM Microservice

Developers can experience the DeepSeek-R1 NIM microservice, now available on build.nvidia.com. With NVIDIA NIM, enterprises can deploy DeepSeek-R1 with ease and ensure they get the high efficiency needed for agentic AI systems.
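The expert routing described above (256 experts per layer, with each token sent to its top eight) can be sketched in miniature. This is an illustrative top-k gating scheme, a common MoE pattern, not DeepSeek's exact router:

```python
import math

def moe_route(token_vec, expert_fns, gate_weights, k=2):
    """Route one token through a mixture-of-experts layer.

    A gate scores every expert for this token, only the top-k experts
    actually run, and their outputs are combined with renormalized
    softmax weights. In R1's case each layer has 256 experts with
    k=8, so only ~3% of a layer's experts run per token.
    """
    # Gate logits: one score per expert (here, a simple dot product).
    logits = [sum(w * x for w, x in zip(g, token_vec)) for g in gate_weights]
    top = sorted(range(len(expert_fns)), key=lambda e: logits[e], reverse=True)[:k]
    # Softmax over the selected experts only.
    m = max(logits[e] for e in top)
    weights = {e: math.exp(logits[e] - m) for e in top}
    z = sum(weights.values())
    # Weighted sum of the k expert outputs (run in parallel in practice).
    out = [0.0] * len(token_vec)
    for e in top:
        for i, y in enumerate(expert_fns[e](token_vec)):
            out[i] += (weights[e] / z) * y
    return out, sorted(top)

# Four tiny "experts" that just scale the token; a real expert is an MLP.
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
gates = [[0.1], [0.9], [0.5], [0.3]]  # per-expert gate vectors
out, chosen = moe_route([1.0], experts, gates, k=2)
# chosen == [1, 2]: experts 1 and 2 score highest for this token
```

The "route prompt tokens to all the experts" requirement follows directly from this structure: different tokens pick different expert subsets, and those experts may live on different GPUs, which is why NVLink bandwidth matters for MoE inference.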
NVIDIA introduces acceleration for DeepSeek-R1 reasoning models on GeForce RTX 50 Series GPUs and launches a NIM microservice, enhancing AI capabilities for local PCs and enterprise deployments.
NVIDIA has announced significant advancements in AI reasoning capabilities through support for DeepSeek-R1 models on its latest GeForce RTX 50 Series GPUs and the introduction of a new NIM microservice. These developments aim to enhance AI performance on local PCs and enterprise deployments, marking a notable step forward in AI accessibility and efficiency [1][2].
DeepSeek-R1 represents a novel category of large language models (LLMs) designed for advanced reasoning and problem-solving. These models employ a "test-time scaling" approach, allocating more compute resources during inference to tackle complex tasks. The DeepSeek-R1 family, based on a 671-billion-parameter mixture-of-experts (MoE) model, has been distilled into smaller, yet powerful versions ranging from 1.5 billion to 70 billion parameters [1].
NVIDIA's GeForce RTX 50 Series GPUs, featuring fifth-generation Tensor Cores and based on the Blackwell architecture, deliver up to 3,352 trillion operations per second of AI compute, running the distilled DeepSeek models faster than anything else on the PC market [1].
To cater to developers and enterprises, NVIDIA has launched the DeepSeek-R1 NIM microservice, now available as a preview on build.nvidia.com. It supports industry-standard APIs and can be run on an organization's preferred accelerated computing infrastructure for security and data privacy [2].
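An "industry-standard API" here typically means an OpenAI-style chat-completions interface. As a sketch (the endpoint URL and model identifier below are illustrative assumptions, not values confirmed by the article), a reasoning query against such an endpoint might be assembled like this:

```python
import json

# Assumed endpoint and model name, for illustration only.
API_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
MODEL_ID = "deepseek-ai/deepseek-r1"

def build_reasoning_request(prompt, max_tokens=4096):
    """Assemble the JSON body for one reasoning query. Reasoning
    models emit long chain-of-thought output before the final answer,
    so the token budget is kept generous and streaming is enabled to
    surface tokens as the model "thinks"."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "max_tokens": max_tokens,
        "stream": True,
    }

body = json.dumps(build_reasoning_request("Prove that sqrt(2) is irrational."))
# POST `body` to API_URL with any HTTP client, sending an
# Authorization: Bearer <api-key> header.
```

Because the microservice speaks this standard shape, existing OpenAI-compatible client libraries can be pointed at it without code changes beyond the base URL and key.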
The full 671-billion-parameter DeepSeek-R1 model delivers up to 3,872 tokens per second on a single eight-GPU NVIDIA HGX H200 system, enabled by the Hopper architecture's FP8 Transformer Engine and 900 GB/s of NVLink bandwidth for expert communication [2].
The upcoming NVIDIA Blackwell architecture promises even greater advancements, with fifth-generation Tensor Cores delivering up to 20 petaflops of peak FP4 compute performance and a 72-GPU NVLink domain optimized for inference [2].
These developments by NVIDIA represent a significant leap in making advanced AI reasoning capabilities more accessible and efficient, both for individual users and enterprise applications. The combination of powerful hardware and optimized software solutions paves the way for more sophisticated AI applications in various fields, from personal computing to large-scale enterprise deployments.
Summarized by Navi