NVIDIA Blackwell dominates MLPerf Training 6.0 with fastest times and 8,192-GPU scale

NVIDIA Blackwell swept every category in MLPerf Training 6.0, the industry's leading AI training benchmark. The platform achieved the fastest training times across all seven benchmarks and scaled to 8,192 GPUs, and NVIDIA was the only vendor to submit results for every test. GB300 systems trained up to 60% faster than GB200, while competitors submitted no results for the newer workloads.

NVIDIA Blackwell Achieves Fastest Training Times Across All Benchmarks

NVIDIA Blackwell has solidified its position as the dominant force in AI training, sweeping every category in MLPerf Training 6.0, the industry's most rigorous peer-reviewed benchmark suite for evaluating AI training hardware [1]. The platform delivered the fastest training times across all seven benchmarks and scaled to 8,192 GPUs using NVIDIA Blackwell NVL72 systems, and NVIDIA remained the only vendor to submit results across the entire suite [2]. The performance gap between NVIDIA and competitors proved substantial: what NVIDIA accomplished in 4.46 minutes took the nearest alternative 58.63 minutes, a 13.1x difference [2].

Source: Wccftech
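The 13.1x figure follows directly from the two quoted training times. The short Python sketch below reproduces that arithmetic; the function and variable names are illustrative, and only the two times come from the article.

```python
def speedup(baseline_minutes: float, candidate_minutes: float) -> float:
    """Return how many times faster the candidate finished than the baseline."""
    if baseline_minutes <= 0 or candidate_minutes <= 0:
        raise ValueError("training times must be positive")
    return baseline_minutes / candidate_minutes


nvidia_minutes = 4.46        # NVIDIA time quoted in the article
alternative_minutes = 58.63  # nearest alternative, as quoted in the article

ratio = speedup(alternative_minutes, nvidia_minutes)
# Same comparison expressed as the share of wall-clock time saved.
time_saved_pct = (1 - nvidia_minutes / alternative_minutes) * 100

print(f"{ratio:.1f}x faster, {time_saved_pct:.1f}% less wall-clock time")
# prints: 13.1x faster, 92.4% less wall-clock time
```

Note that a ratio of times says nothing about how many accelerators each side used, so it compares submissions rather than chips.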


This round introduced two new mixture-of-experts workloads, DeepSeek-V3 671B and GPT-OSS-20B, reflecting the growing importance of MoE architectures in frontier AI models [1]. No competing results were submitted for these newest benchmarks, including for AMD's MI300 and MI350 series, leaving NVIDIA as the sole participant [2].

GB300 Systems Outpace GB200 by Up to 60% in Large Model Training

The performance difference between GB300 and GB200 systems proved significant, with GB300 NVL72 delivering up to 1.6x faster training than GB200 NVL72 at the same scale [1]. That is up to roughly 60% faster, driven by higher compute density with NVFP4, expanded memory capacity, and a higher power ceiling that lets GPUs sustain peak performance [2].

Source: NVIDIA


Within each NVL72 rack-scale system, fifth-generation NVLink Switches connect all 72 GPUs with high bandwidth, creating a unified pool of compute and memory that enables them to function as one giant GPU [1]. This architecture proves particularly effective for large-scale MoE training, which faces significant all-to-all communication challenges as tokens must be routed across GPUs to reach the correct expert subnetwork.
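To see why MoE routing produces all-to-all traffic, consider the toy Python sketch below. It is not NVIDIA's or any framework's implementation: the GPU count, the expert placement, the top-2 routing rule, and the router scores are all hypothetical values chosen for illustration. Each token picks its highest-scoring experts, and every pick that lands on an expert hosted by a different GPU becomes a cross-GPU transfer.

```python
from collections import Counter

NUM_GPUS = 4         # hypothetical; an NVL72 rack connects 72 GPUs
EXPERTS_PER_GPU = 2  # hypothetical placement: experts 0-1 on GPU 0, 2-3 on GPU 1, ...
TOP_K = 2            # hypothetical: each token is sent to its 2 best experts


def top_k_experts(scores, k):
    """Return the indices of the k highest router scores for one token."""
    return sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:k]


def plan_dispatch(router_scores, token_home_gpu):
    """Count token copies per (source GPU, destination GPU) pair."""
    traffic = Counter()
    for token, scores in enumerate(router_scores):
        src = token_home_gpu[token]
        for expert in top_k_experts(scores, TOP_K):
            dst = expert // EXPERTS_PER_GPU  # GPU hosting this expert
            traffic[(src, dst)] += 1
    return traffic


# Made-up router scores: 4 tokens x 8 experts, one token resident on each GPU.
router_scores = [
    [0.1, 0.9, 0.0, 0.2, 0.8, 0.1, 0.0, 0.3],
    [0.7, 0.1, 0.6, 0.0, 0.1, 0.2, 0.0, 0.1],
    [0.0, 0.1, 0.2, 0.1, 0.0, 0.1, 0.9, 0.8],
    [0.3, 0.2, 0.1, 0.9, 0.0, 0.8, 0.1, 0.0],
]
token_home_gpu = [0, 1, 2, 3]

traffic = plan_dispatch(router_scores, token_home_gpu)
remote = sum(n for (src, dst), n in traffic.items() if src != dst)
total = sum(traffic.values())
print(f"{remote} of {total} token copies cross a GPU boundary")
# prints: 6 of 8 token copies cross a GPU boundary
```

Even in this tiny example most token copies leave their home GPU, which is why the interconnect bandwidth between GPUs matters so much for MoE training.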

Microsoft Azure and CoreWeave Demonstrate Scalability at 8,192 GPUs

NVIDIA's partners achieved notable milestones demonstrating the scalability of Blackwell-based AI training hardware. Microsoft Azure scaled Llama 3.1 405B training to 8,192 GPUs using GB200 NVL72 systems, reaching the reference quality target in 7.07 minutes, the fastest time to train for this benchmark [1]. CoreWeave delivered the fastest time to train for DeepSeek-V3 671B, reaching the quality target in 2.02 minutes at 8,192-GPU scale using GB300 NVL72 systems connected with Spectrum-X Ethernet networking [1].

NVIDIA also submitted its own results at 5,120 GPUs with GB200 NVL72 systems on Llama 3.1 405B, one of the largest dense LLMs in the suite [1].

NVIDIA's Dominance in MLPerf Leaves Competition Behind

The competitive landscape in MLPerf Training 6.0 revealed a widening gap between NVIDIA and other AI training hardware vendors. In the Flux1 benchmark, 32 GB300 GPUs proved faster than 512 AMD MI300X and 64 MI325X accelerators, with no submissions made for AMD's newer MI350 series [2]. Across the Llama 2 70B and Llama 3.1 8B benchmarks, GB300 and GB200 8-accelerator systems consistently outpaced competitors at equivalent configurations.

NVIDIA continues advancing low-precision training innovation across different model architectures, recently using NVFP4 to pretrain the massive 550-billion-parameter NVIDIA Nemotron 3 Ultra model [1]. The platform showcased NVFP4 training methods that increase performance while meeting strict accuracy requirements across large- and small-scale pretraining as well as fine-tuning workloads.

For organizations building frontier AI models, these results signal that the choice of training infrastructure shapes iteration speed, achievable model scale, and job completion reliability. With the Vera Rubin platform on the horizon and continued software optimizations, NVIDIA's position in AI training appears set to strengthen further, particularly as production training runs span weeks or months across hundreds of thousands of GPUs.

© 2026 TheOutpost.AI All rights reserved