Curated by THEOUTPOST
On Wed, 18 Dec, 4:03 PM UTC
3 Sources
[1]
Nvidia may postpone volume ramp-up of Blackwell machines: TrendForce
Nvidia may have to postpone the volume ramp of next-generation AI servers based on the B200 and GB200 platforms due to overheating, power consumption, and the need to optimize interconnects, according to a TrendForce report. The market research firm believes that mass production and peak shipments of Blackwell machines will occur sometime in mid-2025, a delay of nearly half a year. Nvidia has yet to confirm or deny the claims.

As expected, Nvidia and its partners can ship only limited quantities of Blackwell-based servers in 2024, as the company has to rely on its low-yielding B200 for them, though Dell is already shipping Blackwell server racks. Although refined versions of Nvidia's B200 processors entered mass production in October and will therefore reach the company in January, TrendForce does not expect shipments of Blackwell-based servers to skyrocket immediately. According to the firm, due to overheating, power consumption, and requirements for higher-speed interconnects, mass production and peak shipments of B200 and GB200 will occur only between the second and third quarters of 2025.

Just a few months ago, it was reported that an Nvidia NVL72 rack based on the GB200 platform with 72 B200 GPUs would consume 120 kW of power, already significantly higher than current AI server racks (typical high-density rack power is up to 20 kW, while an H100-based rack reportedly consumes around 40 kW). TrendForce now claims that Nvidia has updated the device's specification and it now consumes 140 kW, more than typical data centers can deliver to a single rack. The problem is that Nvidia's Blackwell GPUs were reportedly prone to overheating in servers equipped with 72 processors even when the racks consumed up to 120 kW. This issue has forced Nvidia to repeatedly revise its server rack designs, as overheating not only reduces GPU performance but also risks hardware damage.
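The rack-power comparison above can be put in rough numbers (the "typical" and H100 figures are the industry estimates quoted in the report, not Nvidia specifications):

```python
# Back-of-the-envelope comparison of the rack power figures quoted above.
RACK_POWER_KW = {
    "typical high-density rack": 20,
    "H100-based rack (reported)": 40,
    "GB200 NVL72 (original spec)": 120,
    "GB200 NVL72 (updated spec)": 140,
}

baseline = RACK_POWER_KW["typical high-density rack"]
for rack, kw in RACK_POWER_KW.items():
    # Express each rack's draw as a multiple of a typical high-density rack
    print(f"{rack}: {kw} kW ({kw / baseline:.0f}x a typical rack)")
```

At the updated 140 kW specification, a single GB200 NVL72 rack draws as much as seven typical high-density racks combined, which is why most existing data-center rows cannot feed one without electrical upgrades.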
A power consumption of 140 kW per rack means further alterations to server designs, which could result in setbacks. Higher power consumption also means additional cooling requirements. Liquid cooling is essential for Blackwell servers, but modern sidecar coolant distribution units (CDUs) can only handle 60 kW to 80 kW of thermal power. To that end, cooling system providers are optimizing cold plate designs and aiming to double or triple the capacity of CDUs. TrendForce expects the performance of liquid-to-liquid in-row CDUs to exceed 1.3 MW, with further advancements possible, so excessive heat dissipation should eventually cease to be a major problem.

However, according to the report, power consumption and heat management are not the only issues that Nvidia and its partners have to solve. TrendForce claims that Nvidia has to optimize its interconnects but does not elaborate on which ones. It remains to be seen how the claimed teething problems with Nvidia's B200 and GB200 servers will affect the launch timeframe and availability of the B200A, based on simplified Blackwell processors, and the B300 and GB300 machines featuring refreshed Blackwell GPUs. While the B200A will likely feature considerably lower power consumption than the B200/GB200, the refreshed B300-series Blackwell GPUs promise more memory and higher compute performance, which usually comes at higher power, so these machines will likely consume even more than 140 kW per rack, necessitating even more sophisticated components and cooling.
[2]
NVIDIA GB200 AI server mass production, peak shipments could be delayed until Q2 or Q3 2025
NVIDIA's new GB200 rack-mounted AI servers are still experiencing issues, with a new report suggesting that the supply chain requires more time for optimization and adjustment, and that mass production and peak shipments could be delayed until Q2 or even Q3 2025. TrendForce's latest report says the supply chain needs more time for GB200 rack servers, mostly because of the higher design specifications of the GB200 rack, including its requirement for high-speed interconnect interfaces and a thermal design power (TDP) that "significantly exceed market norms". TrendForce is now projecting that mass production and peak shipments of NVIDIA GB200 rack servers are "unlikely to occur until between Q2 and Q3 of 2025".

The NVIDIA GB rack series includes the GB200 and new GB300 models, which feature even more complex technology and higher production costs. NVIDIA's new GB200 NVL72 AI server is expected to become "the most widely adopted model in 2025", potentially accounting for up to 80% of total deployments as NVIDIA ramps up its push into the market with the GB200. The high-speed interconnect issue stems from NVIDIA's in-house NVLink connectivity (the high-speed connection between GPUs): the GB200 uses fifth-generation NVLink, which offers significantly higher total bandwidth than the current industry standard, PCIe 5.0. TrendForce notes that the TDP of the 2024-dominant HGX AI server typically ranges from 60 kW to 80 kW per rack, but the new GB200 NVL72 AI server can reach an insane 140 kW per rack, close to doubling the power demands of current racks. CSPs (cloud service providers) are pushing the adoption of liquid cooling over air cooling, because air cooling is no longer sufficient for the higher thermal loads.
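To give the "significantly higher than PCIe 5.0" claim rough scale: fifth-generation NVLink is commonly quoted at about 1.8 TB/s of total bandwidth per GPU, while a PCIe 5.0 x16 slot provides roughly 128 GB/s bidirectional. These figures are general public specifications, not taken from the TrendForce report:

```python
# Rough bandwidth comparison behind the NVLink-vs-PCIe claim.
# Figures are commonly quoted public specs (assumptions, not from the
# TrendForce report).
NVLINK5_PER_GPU_GBPS = 1800   # ~1.8 TB/s total bandwidth per GPU
PCIE5_X16_BIDIR_GBPS = 128    # ~64 GB/s each direction, x16 lanes

ratio = NVLINK5_PER_GPU_GBPS / PCIE5_X16_BIDIR_GBPS
print(f"NVLink 5 offers roughly {ratio:.0f}x the bandwidth of a PCIe 5.0 x16 link")
```

An order-of-magnitude gap on this scale is what makes the interconnect, rather than the individual GPU, the critical design constraint in a 72-GPU rack.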
[3]
NVIDIA's GB200 rack needs more supply chain optimization, mass production expected in Q2 and Q3 of 2025 By Investing.com
Investing.com -- The NVIDIA (NASDAQ:NVDA) GB200 rack-mounted solution requires further optimization and adjustment in its supply chain, according to recent research by TrendForce. The complex design specifications of the GB200 rack, including high-speed interconnect interfaces and thermal design power (TDP) requirements that exceed market norms, are the primary reasons for this need. As a result, TrendForce predicts that mass production and peak shipments will likely take place between Q2 and Q3 of 2025.

The NVIDIA GB rack series, which includes the GB200 and GB300 models, is characterized by complex technology and higher production costs. This makes it a preferred solution for large Cloud Service Providers (CSPs) and other potential users such as Tier-2 data centers, national sovereign cloud providers, and academic research institutions working on High-Performance Computing (HPC) and Artificial Intelligence (AI) applications. The GB200 NVL72 model is expected to be the most popular in 2025, possibly accounting for up to 80% of total deployments as NVIDIA increases its market efforts.

NVIDIA's proprietary NVLink technology is integral to the company's strategy to enhance the computational performance of AI and HPC server systems. This technology allows for high-speed connections between GPU chips. The GB200 uses the fifth-generation NVLink, providing a total bandwidth that significantly surpasses the current industry standard, PCIe 5.0.

The TDP of the HGX AI server, which dominated in 2024, typically ranges from 60 kW to 80 kW per rack. However, the GB200 NVL72's TDP reaches 140 kW per rack, doubling power requirements. This has led manufacturers to speed up the adoption of liquid cooling solutions, as traditional air cooling methods cannot handle such high thermal loads. The advanced design requirements for the GB200 have raised concerns about possible delays in component availability and system shipments.
TrendForce states that the production of Blackwell GPU chips is progressing mostly as planned, with only limited shipments expected in 4Q24. Production volume is expected to increase gradually from 1Q25 onwards. However, due to ongoing supply chain adjustments for the AI server system components, shipments at the end of 2024 are expected to be lower than industry expectations. As a result, TrendForce predicts that the peak shipment period for the GB200 full-rack system will be delayed to between Q2 and Q3 of 2025.

The GB200 NVL72's TDP of 140 kW has made liquid cooling essential, as it surpasses the capabilities of traditional air-cooled solutions. The adoption of liquid-cooling components is gaining momentum, with leading industry players investing heavily in research and development for liquid cooling technologies. Notably, suppliers of coolant distribution units are striving to improve cooling efficiency by increasing rack sizes and developing more efficient cold plate designs. Current sidecar CDUs can dissipate between 60 kW and 80 kW, but future designs are expected to double or even triple this cooling capacity. The development of liquid-to-liquid in-row CDU systems has allowed cooling performance to exceed 1.3 MW, with further improvements expected as computational power demands continue to grow.
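The cooling arithmetic implied by these figures can be sketched as follows. Capacities are the ranges quoted above; the rack count per CDU is a simple division that ignores redundancy and real-world derating:

```python
import math

# Updated GB200 NVL72 rack TDP from the report
RACK_TDP_KW = 140

# CDU classes and thermal capacities quoted in the report (kW)
cdu_capacity_kw = {
    "sidecar CDU (today)": 80,            # upper end of the 60-80 kW range
    "sidecar CDU (2-3x target)": 240,     # tripled capacity
    "liquid-to-liquid in-row CDU": 1300,  # >1.3 MW
}

for cdu, cap in cdu_capacity_kw.items():
    full_racks = cap // RACK_TDP_KW          # racks one CDU can fully cool
    units_per_rack = math.ceil(RACK_TDP_KW / cap)  # CDUs needed for one rack
    print(f"{cdu}: {cap} kW -> cools {full_racks} full rack(s), "
          f"or needs {units_per_rack} unit(s) per rack")
```

The numbers make the report's point concrete: today's sidecar CDUs cannot fully cool even one 140 kW rack on their own, a tripled-capacity sidecar handles one rack with margin, and a 1.3 MW in-row unit can serve on the order of nine racks.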
Nvidia's next-generation Blackwell AI servers, including the GB200 and GB300 models, may experience delays in mass production and peak shipments until mid-2025 due to overheating, power consumption, and interconnection optimization issues.
Nvidia, the leading AI chip manufacturer, may be facing significant challenges with its next-generation Blackwell AI servers. According to a report by TrendForce, the mass production and peak shipments of Blackwell machines, including the B200 and GB200 platforms, could be postponed until mid-2025, representing a delay of nearly six months [1][2].
The primary issues causing the potential delay are:
Overheating: The Blackwell GPUs are reportedly prone to overheating in servers equipped with 72 processors, even at high power consumption levels [1].
Power Consumption: The power requirements for Blackwell-based servers have increased significantly. An Nvidia NVL72 rack based on the GB200 platform with 72 B200 GPUs is now expected to consume 140 kW of power, up from the previously reported 120 kW [1][2].
Interconnection Optimization: TrendForce claims that Nvidia needs to optimize its interconnections, particularly the high-speed NVLink technology used for GPU-to-GPU communication [3].
The extreme power consumption of Blackwell servers necessitates advanced cooling solutions, and the potential delay could have significant implications for the AI hardware market. Meanwhile, the AI industry is adapting to the challenges posed by these high-performance servers.
As Nvidia works to overcome these technical hurdles, the AI hardware landscape continues to evolve, with power efficiency and thermal management becoming increasingly critical factors in the development of next-generation AI infrastructure.
NVIDIA prepares to launch its next-generation Blackwell GB200 AI servers in December, with major cloud providers like Microsoft among the first recipients. This move aims to address supply issues and meet the growing demand for AI computing power.
3 Sources
NVIDIA's latest GB200 AI servers are at the center of controversy, with reports of overheating issues and order reductions from major tech companies. Taiwanese suppliers deny these claims, while the industry grapples with the transition to liquid cooling technology.
6 Sources
NVIDIA's next-generation Blackwell AI GPUs are experiencing unprecedented demand, with the entire supply sold out for the next 12 months. Major tech companies are aggressively acquiring these GPUs, highlighting the intense competition in the AI hardware market.
6 Sources
NVIDIA is set to unveil its GB300 'Blackwell Ultra' AI GPUs at GTC 2025, featuring fully liquid-cooled AI clusters. The new servers promise significant performance improvements and mark a shift in cooling technology for AI infrastructure.
3 Sources
NVIDIA is expected to offer US-compliant GB20 Blackwell AI servers to China, while facing potential high costs for Blackwell server cabinets. This development highlights the complexities of international tech trade and the increasing value of AI infrastructure.
2 Sources
The Outpost is a comprehensive collection of curated artificial intelligence software tools that cater to the needs of small business owners, bloggers, artists, musicians, entrepreneurs, marketers, writers, and researchers.
© 2025 TheOutpost.AI All rights reserved