NVIDIA Blackwell Sweeps MLPerf Training 6.0 Benchmarks

NVIDIA Blackwell Achieves Fastest Training Times Across All Benchmarks

NVIDIA Blackwell has solidified its position as the dominant force in AI training, sweeping every category in MLPerf Training 6.0, the industry's most rigorous peer-reviewed benchmark suite for evaluating AI training hardware [1]. The platform delivered the fastest training times across all seven benchmarks, scaled to 8,192 GPUs using NVIDIA Blackwell NVL72 systems, and NVIDIA was the only vendor to submit results across the entire suite [2]. The performance gap between NVIDIA and its competitors was substantial: what NVIDIA accomplished in 4.46 minutes took the nearest alternative 58.63 minutes, a 13.1x difference in time to train [2].

This round introduced two new mixture-of-experts (MoE) workloads, DeepSeek-V3 671B and GPT-OSS-20B, reflecting the growing importance of MoE architectures in frontier AI models [1]. No competitor, including AMD with its MI300 and MI350 series, submitted results on these new benchmarks, leaving NVIDIA Blackwell as the only platform with results [2].

GB300 Systems Outpace GB200 by Up to 60% in Large Model Training

GB300 NVL72 systems delivered up to 1.6x the training performance of GB200 NVL72 systems at the same scale [1], or up to 60% faster. The gains come from higher compute density with NVFP4, expanded memory capacity, and a higher power ceiling that lets GPUs sustain peak performance [2].
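A speedup ratio and a percentage are easy to conflate. As a small worked example using only the article's "up to 1.6x" figure, the Python sketch below converts a speedup into the extra throughput it represents and the wall-clock time it saves; `describe_speedup` is a hypothetical helper written for illustration. It shows that 1.6x faster means 60% more work per unit of time, but 37.5% less time per run, not 60% less.

```python
def describe_speedup(speedup: float) -> dict:
    """Convert a speedup ratio (old time / new time) into percentages.

    speedup: how many times faster the new system trains, e.g. 1.6
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    percent_faster = (speedup - 1.0) * 100.0          # extra throughput
    time_fraction = 1.0 / speedup                     # new time as a share of old time
    percent_time_saved = (1.0 - time_fraction) * 100.0
    return {
        "percent_faster": percent_faster,
        "time_fraction": time_fraction,
        "percent_time_saved": percent_time_saved,
    }


summary = describe_speedup(1.6)
print(f"{summary['percent_faster']:.1f}% faster")        # 60.0% faster
print(f"{summary['percent_time_saved']:.1f}% less time")  # 37.5% less time
```

For example, a hypothetical job that takes 100 minutes on GB200 NVL72 would finish in about 62.5 minutes on GB300 NVL72 if it saw the full 1.6x gain.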

Within each NVL72 rack-scale system, fifth-generation NVLink Switches connect all 72 GPUs with high bandwidth, creating a unified pool of compute and memory that lets them operate as one giant GPU [1]. This design is particularly effective for large-scale MoE training, where all-to-all communication is a major challenge: each token must be routed across GPUs to reach the expert subnetworks it has been assigned to.
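The all-to-all pattern can be made concrete with a small simulation. The Python sketch below routes every token to `top_k` experts, with experts sharded evenly across GPUs, and counts how many token copies each GPU must send to every other GPU. It is illustrative only: `route_tokens` and `cross_gpu_fraction` are hypothetical helpers, a uniform random choice stands in for a learned router, and the sizes are arbitrary rather than the DeepSeek-V3 or GPT-OSS-20B configuration.

```python
import random


def route_tokens(num_gpus, tokens_per_gpu, num_experts, top_k, seed=0):
    """Simulate top-k MoE routing and return the all-to-all send matrix.

    Assumptions (illustrative only):
      * experts are sharded evenly and contiguously, so expert e lives on
        GPU e // experts_per_gpu;
      * a uniform random choice stands in for a learned router.
    """
    if num_experts % num_gpus != 0:
        raise ValueError("num_experts must be divisible by num_gpus")
    if not 1 <= top_k <= num_experts:
        raise ValueError("top_k must be between 1 and num_experts")
    experts_per_gpu = num_experts // num_gpus
    rng = random.Random(seed)
    # send[src][dst] counts token copies GPU src must ship to GPU dst.
    send = [[0] * num_gpus for _ in range(num_gpus)]
    for src in range(num_gpus):
        for _ in range(tokens_per_gpu):
            # Each token picks top_k distinct experts.
            for expert in rng.sample(range(num_experts), top_k):
                send[src][expert // experts_per_gpu] += 1
    return send


def cross_gpu_fraction(send):
    """Fraction of token copies that must leave their source GPU."""
    total = sum(sum(row) for row in send)
    local = sum(send[i][i] for i in range(len(send)))
    return (total - local) / total


send = route_tokens(num_gpus=8, tokens_per_gpu=4096, num_experts=64, top_k=2)
frac = cross_gpu_fraction(send)
print(f"{frac:.3f} of routed token copies cross GPUs")  # near 1 - 1/8 under uniform routing
```

Under uniform routing, roughly 1 - 1/num_gpus of token copies leave their source GPU, so the cross-GPU share grows as experts are spread over more GPUs, which is why high-bandwidth GPU-to-GPU links matter for MoE training.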

Microsoft Azure and CoreWeave Demonstrate Scalability at 8,192 GPUs

NVIDIA's partners achieved notable milestones demonstrating the scalability of Blackwell-based AI training hardware. Microsoft Azure scaled Llama 3.1 405B training to 8,192 GPUs using GB200 NVL72 systems, reaching the reference quality target in 7.07 minutes, the fastest time to train for this benchmark [1]. CoreWeave delivered the fastest time to train for DeepSeek-V3 671B, reaching the quality target in 2.02 minutes at 8,192-GPU scale using GB300 NVL72 systems connected with Spectrum-X Ethernet networking [1].
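MLPerf Training scores submissions by time to train: the wall-clock time needed to reach a fixed quality target, such as the reference quality target mentioned above. The Python sketch below shows the basic idea with hypothetical `train_step` and `evaluate` callables and a toy "model" whose accuracy rises linearly. It is a simplification, not the MLPerf harness; the official MLPerf Training rules also specify which setup phases are excluded from timing and how results from multiple runs are aggregated.

```python
import time
from typing import Callable, Tuple


def time_to_train(
    train_step: Callable[[], None],
    evaluate: Callable[[], float],
    target: float,
    eval_every: int = 100,
    max_steps: int = 1_000_000,
) -> Tuple[float, int]:
    """Return (elapsed_seconds, steps) when evaluate() first reaches target.

    Assumes a higher quality metric is better. The clock starts just before
    the first training step and stops at the first evaluation that meets
    the target.
    """
    start = time.perf_counter()
    for step in range(1, max_steps + 1):
        train_step()
        if step % eval_every == 0 and evaluate() >= target:
            return time.perf_counter() - start, step
    raise RuntimeError("target quality not reached within max_steps")


# Toy stand-in for a model: "accuracy" rises by 0.001 per training step.
state = {"steps": 0}


def toy_step() -> None:
    state["steps"] += 1


def toy_eval() -> float:
    return min(1.0, state["steps"] / 1000)


seconds, steps = time_to_train(toy_step, toy_eval, target=0.75, eval_every=50)
print(steps)  # 750
```

Evaluating only every `eval_every` steps keeps evaluation overhead low, at the cost of possibly overshooting the exact step where the target was first crossed.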

NVIDIA also submitted results at 5,120 GPUs with GB200 NVL72 systems on Llama 3.1 405B, one of the largest dense LLMs in the suite, making it one of the largest-scale Blackwell-based submissions in MLPerf Training to date [1].

NVIDIA's Dominance in MLPerf Leaves Competition Behind

The competitive landscape in MLPerf Training 6.0 revealed a widening gap between NVIDIA and other AI training hardware vendors. In the Flux1 benchmark, 32 GB300 GPUs posted a faster time to train than either 512 AMD MI300X or 64 MI325X accelerators, and AMD made no submissions with its newer MI350 series [2]. Across the Llama 2 70B and Llama 3.1 8B benchmarks, GB300- and GB200-based 8-accelerator systems consistently outpaced competing systems with the same accelerator count.

NVIDIA continues to advance low-precision training across model architectures, and recently used NVFP4 to pretrain the 550-billion-parameter NVIDIA Nemotron 3 Ultra model [1]. The platform showcased NVFP4 training methods that increase performance while meeting strict accuracy requirements across large- and small-scale pretraining as well as fine-tuning workloads.
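NVFP4 stores each value as a 4-bit E2M1 floating-point number; per NVIDIA's description of the format, each block of 16 consecutive values shares an FP8 (E4M3) scale factor, with an additional FP32 scale per tensor. The Python sketch below illustrates only the core block-scaling idea: it "fake-quantizes" a list by rounding each block onto the E2M1 grid under a shared scale and then scaling back. It is not NVIDIA's implementation or its NVFP4 training recipe; the helper names are hypothetical, the FP8 rounding of the scale and the per-tensor scale are omitted, and plain round-to-nearest is assumed.

```python
# Magnitudes representable by a 4-bit E2M1 element; the sign is a separate bit.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # values that share one scale factor


def nearest_e2m1(magnitude):
    """Round a non-negative value to the nearest E2M1 grid point."""
    return min(E2M1_GRID, key=lambda g: abs(g - magnitude))


def quantize_block(values):
    """Round one block to signed E2M1 values under a shared scale.

    Simplification: the scale stays a Python float. NVFP4 stores it in
    FP8 (E4M3) and also applies a per-tensor FP32 scale; both are omitted.
    """
    amax = max(abs(v) for v in values)
    scale = amax / 6.0 if amax > 0 else 1.0  # largest magnitude maps to 6.0
    codes = []
    for v in values:
        q = nearest_e2m1(abs(v) / scale)
        codes.append(q if v >= 0 else -q)
    return codes, scale


def fake_quantize(tensor):
    """Quantize and immediately dequantize a flat list, block by block."""
    out = []
    for i in range(0, len(tensor), BLOCK_SIZE):
        codes, scale = quantize_block(tensor[i:i + BLOCK_SIZE])
        out.extend(c * scale for c in codes)
    return out


data = [0.1 * i - 0.8 for i in range(32)]  # two blocks of 16 values
approx = fake_quantize(data)
max_err = max(abs(a - b) for a, b in zip(data, approx))
print(f"max abs error: {max_err:.4f}")
```

Because each block gets its own scale, a large value in one block does not force the other block onto a coarser grid, which is the main point of fine-grained block scaling at 4 bits.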

For organizations building frontier AI models, these results signal how directly NVIDIA's infrastructure shapes critical factors such as iteration speed, achievable model scale, and job-completion reliability. With the Vera Rubin platform on the horizon and continued software optimizations, NVIDIA's position in AI training appears set to strengthen further, particularly as production training runs span weeks or months across hundreds of thousands of GPUs.

References

[1] Fastest, Largest, Strongest: NVIDIA Blackwell Sweeps MLPerf Training 6.0

[2] NVIDIA Blackwell Sweeps Every MLPerf 6.0 Benchmark With No Competition In Sight, While GB300 Systems Run Up to 60% Faster Than GB200
