2 Sources
[1]
NVIDIA Wins Every MLPerf Training v5.1 Benchmark
NVIDIA Blackwell Ultra with NVFP4 delivers a giant leap for large language model training.

In the age of AI reasoning, training smarter, more capable models is critical to scaling intelligence. Delivering the massive performance to meet this new age requires breakthroughs across GPUs, CPUs, NICs, scale-up and scale-out networking, system architectures, and mountains of software and algorithms.

In MLPerf Training v5.1 -- the latest round in a long-running series of industry-standard tests of AI training performance -- NVIDIA swept all seven tests, delivering the fastest time to train across large language models (LLMs), image generation, recommender systems, computer vision and graph neural networks. NVIDIA was also the only platform to submit results on every test, underscoring the rich programmability of NVIDIA GPUs and the maturity and versatility of its CUDA software stack.

NVIDIA Blackwell Ultra Doubles Down

The GB300 NVL72 rack-scale system, powered by the NVIDIA Blackwell Ultra GPU architecture, made its debut in MLPerf Training this round, following a record-setting showing in the most recent MLPerf Inference round. Compared with the prior-generation Hopper architecture, the Blackwell Ultra-based GB300 NVL72 delivered more than 4x the Llama 3.1 405B pretraining performance and nearly 5x the Llama 2 70B LoRA fine-tuning performance using the same number of GPUs.

These gains were fueled by Blackwell Ultra's architectural improvements -- including new Tensor Cores that offer 15 petaflops of NVFP4 AI compute, twice the attention-layer compute and 279GB of HBM3e memory -- as well as new training methods that tapped into the architecture's enormous NVFP4 compute performance.

Connecting multiple GB300 NVL72 systems, the NVIDIA Quantum-X800 InfiniBand platform -- the industry's first end-to-end 800 Gb/s scale-out networking platform -- also made its MLPerf debut, doubling scale-out networking bandwidth compared with the prior generation.
Performance Unlocked: NVFP4 Accelerates LLM Training

Key to the outstanding results this round was performing calculations using NVFP4 precision -- a first in the history of MLPerf Training. One way to increase compute performance is to build an architecture capable of performing computations on data represented with fewer bits, and then to perform those calculations at a faster rate. However, lower precision means less information is available in each calculation, so using low-precision calculations in the training process calls for careful design decisions to keep results accurate.

NVIDIA teams innovated at every layer of the stack to adopt FP4 precision for LLM training. The NVIDIA Blackwell GPU can perform FP4 calculations -- including the NVIDIA-designed NVFP4 format as well as other FP4 variants -- at double the rate of FP8. Blackwell Ultra boosts that to 3x, enabling the GPUs to deliver substantially greater AI compute performance. NVIDIA is the only platform to date that has submitted MLPerf Training results with calculations performed using FP4 precision while meeting the benchmark's strict accuracy requirements.

NVIDIA Blackwell Scales to New Heights

NVIDIA set a new Llama 3.1 405B time-to-train record of just 10 minutes, powered by more than 5,000 Blackwell GPUs working together efficiently. This entry was 2.7x faster than the best Blackwell-based result submitted in the prior round, a gain that came from efficient scaling to more than twice the number of GPUs as well as from the use of NVFP4 precision to dramatically increase the effective performance of each Blackwell GPU. To illustrate the per-GPU performance increase, NVIDIA also submitted results this round using 2,560 Blackwell GPUs, achieving a time to train of 18.79 minutes -- 45% faster than the prior-round submission using 2,496 GPUs.

New Benchmarks, New Records

NVIDIA also set performance records on the two new benchmarks added this round: Llama 3.1 8B and FLUX.1.
Llama 3.1 8B -- a compact yet highly capable LLM -- replaced the long-running BERT-large model, adding a modern, smaller LLM to the benchmark suite. NVIDIA submitted results with up to 512 Blackwell Ultra GPUs, setting the bar at 5.2 minutes to train.

In addition, FLUX.1 -- a state-of-the-art image generation model -- replaced Stable Diffusion v2, with only the NVIDIA platform submitting results on the benchmark. NVIDIA submitted results using 1,152 Blackwell GPUs, setting a record time to train of 12.5 minutes.

NVIDIA continued to hold records on the existing graph neural network, object detection and recommender system tests.

A Broad and Deep Partner Ecosystem

The NVIDIA ecosystem participated extensively this round, with compelling submissions from 15 organizations including ASUSTeK, Dell Technologies, Giga Computing, Hewlett Packard Enterprise, Krai, Lambda, Lenovo, Nebius, Quanta Cloud Technology, Supermicro, University of Florida, Verda (formerly DataCrunch) and Wiwynn.

NVIDIA is innovating at a one-year rhythm, driving significant and rapid performance increases across pretraining, post-training and inference -- paving the way to new levels of intelligence and accelerating AI adoption. See more NVIDIA performance data on the Data Center Deep Learning Product Performance Hub and Performance Explorer pages.
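The scaling claims above can be sanity-checked with simple arithmetic: comparing total GPU-minutes between the 2,560-GPU submission and the roughly 5,000-GPU record run shows how close the larger run comes to linear scaling. A minimal back-of-envelope sketch in Python, using the figures quoted in the articles (5,120 is the GPU count reported for the record run; "scaling efficiency" is a metric derived here for illustration, not one NVIDIA publishes):

```python
# Back-of-envelope scaling check using figures quoted in the articles.
# Total GPU-minutes = gpus * minutes; perfectly linear scaling would keep
# this product constant as the GPU count grows.

small = {"gpus": 2560, "minutes": 18.79}  # smaller submission this round
large = {"gpus": 5120, "minutes": 10.0}   # record submission this round

gpu_min_small = small["gpus"] * small["minutes"]  # ~48,100 GPU-minutes
gpu_min_large = large["gpus"] * large["minutes"]  # 51,200 GPU-minutes

speedup = small["minutes"] / large["minutes"]     # wall-clock gain
efficiency = gpu_min_small / gpu_min_large        # fraction of linear scaling

print(f"speedup {speedup:.2f}x at {efficiency:.0%} scaling efficiency")
```

Doubling the GPU count yields close to a 2x wall-clock speedup here, which is consistent with the article's emphasis on efficient scaling across more than 5,000 GPUs.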
[2]
NVIDIA Blackwell Ultra Secures Win Across All Seven MLPerf AI Training Benchmarks, GB300 NVL72 Sets Record 10 Minutes Training Time For Llama 405B
By securing wins across all MLPerf Training tests, NVIDIA boasts that its Blackwell Ultra-based GB300 NVL72 platform delivers leading AI training performance.

When it comes to delivering leading AI performance, NVIDIA GPUs have always been at the forefront. The Blackwell-based data center GPUs have already showcased their incredible potential several times, and the latest GB300 NVL72 platform is no exception. Today, NVIDIA announced that its Blackwell Ultra-powered AI GPUs have secured first place in every MLPerf Training benchmark, proving that its GB300 NVL72 rack-scale system remains the best possible choice for intensive AI workloads.

In its blog post, NVIDIA claims that it's the only player to have submitted results on every MLPerf test and that it has expanded the performance gap between itself and its rivals. The graph it shared shows that NVIDIA has scored "hundreds" of MLPerf Training and Inference wins in 2025 alone.

The benchmark results show that NVIDIA achieved significantly superior results with the same number of Blackwell Ultra GPUs in the rack system as Hopper-based GPUs. In Llama 3.1 405B pretraining, the GB300 GPUs deliver over 4x the performance of the H100 and nearly 2x that of the Blackwell GB200. Similarly, in Llama 2 70B fine-tuning, eight GB300 GPUs delivered 5x the performance of the H100.

NVIDIA also boasted about its CUDA ecosystem, which gives it big leverage over its competitors. The CUDA software stack excels here, but the rack system itself, plus Quantum-X800 InfiniBand networking at 800 Gb/s, is also unmatched. The GB300 NVL72 brings 279 GB of HBM3e memory per GPU and an incredible 40 TB of total capacity with GPU and CPU memory combined. Such a monster memory configuration speeds up AI workloads, but using FP4 precision for training is also key to the excellent performance.
NVIDIA says it ensured the adoption of FP4 precision for LLM training at every layer of the stack, doubling the speed of calculations compared to FP8. Blackwell Ultra further boosts that to 3x, which is why NVIDIA was able to outpace its competitors and deliver drastically superior performance without increasing the GPU count. Compared to its June submission, the new record was achieved using 5,120 Blackwell GPUs, which took only 10 minutes to train the Llama 3.1 405B-parameter model.
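To make the low-precision idea concrete, here is a minimal Python sketch of block-scaled 4-bit quantization in the spirit of NVFP4. The E2M1 value grid and the shared per-block scale reflect publicly described properties of 4-bit floating-point formats, but the block size, scale encoding, and rounding here are illustrative assumptions; real NVFP4 arithmetic runs in hardware on Blackwell Tensor Cores, not in Python.

```python
# Illustrative block-scaled 4-bit quantization (NOT NVIDIA's implementation).
# A 4-bit E2M1 value can represent only these magnitudes, plus a sign bit:
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Map a block of floats onto signed E2M1 values sharing one scale factor.

    The shared scale stretches the narrow E2M1 range [-6, 6] to cover the
    block's actual magnitudes; the precision lost per value is the trade-off
    that the training recipe must compensate for.
    """
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 6.0  # largest magnitude lands exactly on the grid's 6.0
    quantized = []
    for x in block:
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        quantized.append(mag * scale * (1.0 if x >= 0 else -1.0))
    return scale, quantized

# Example: eight weights quantized with one shared scale factor.
weights = [0.02, -0.41, 1.7, 3.14, -6.0, 0.0, 2.2, -0.07]
scale, q = quantize_block(weights)
max_err = max(abs(a - b) for a, b in zip(weights, q))
```

Doubling or tripling throughput by shrinking each operand to 4 bits only pays off if this kind of quantization error stays small enough to keep training on track, which is why the accuracy requirements of the MLPerf benchmark matter so much here.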
NVIDIA's Blackwell Ultra architecture sweeps all seven MLPerf Training v5.1 benchmarks, delivering unprecedented AI training performance with its GB300 NVL72 system. The platform achieves a record 10-minute training time for Llama 3.1 405B using breakthrough NVFP4 precision technology.

NVIDIA has achieved an unprecedented sweep of all seven benchmarks in MLPerf Training v5.1, the industry's most rigorous AI training performance evaluation. The company's Blackwell Ultra-powered GB300 NVL72 rack-scale system delivered record-breaking results across large language models, image generation, recommender systems, computer vision, and graph neural networks [1]. Notably, NVIDIA was the only platform to submit results across every test category, demonstrating the versatility and maturity of its CUDA software ecosystem [2].

The standout achievement of this benchmark round was NVIDIA's successful implementation of NVFP4 precision for large language model training, a first in MLPerf Training history. This breakthrough enables calculations to be performed at significantly higher speeds while maintaining strict accuracy requirements [1]. The Blackwell Ultra architecture can perform FP4 calculations at triple the rate of FP8, delivering substantially greater AI compute performance. NVIDIA's engineering teams innovated across every layer of the technology stack to adopt this precision level, making it the only platform to successfully submit MLPerf Training results using FP4 calculations [2].

The GB300 NVL72 system demonstrated extraordinary performance improvements over previous generations. Compared to the prior Hopper architecture, Blackwell Ultra delivered more than 4x the performance for Llama 3.1 405B pretraining and nearly 5x the performance for Llama 2 70B LoRA fine-tuning using the same number of GPUs [1]. The most impressive achievement was setting a new Llama 3.1 405B training record of just 10 minutes using more than 5,000 Blackwell GPUs working in coordination, 2.7x faster than the best Blackwell-based result from the previous round [2].

The Blackwell Ultra architecture incorporates significant technological advances, including new Tensor Cores offering 15 petaflops of NVFP4 AI compute, twice the attention-layer compute capacity, and 279GB of HBM3e memory per GPU [1]. The complete GB300 NVL72 system provides an impressive 40TB total memory capacity combining GPU and CPU memory. Supporting this performance is the NVIDIA Quantum-X800 InfiniBand platform, the industry's first end-to-end 800 Gb/s networking solution, which doubles scale-out networking bandwidth compared to previous generations [2].

This MLPerf round introduced two new benchmarks: the Llama 3.1 8B language model and the FLUX.1 image generation model. NVIDIA set performance records on both, achieving 5.2 minutes for Llama 3.1 8B training with 512 Blackwell Ultra GPUs and 12.5 minutes for FLUX.1 with 1,152 Blackwell GPUs [1]. The NVIDIA ecosystem demonstrated strong participation with submissions from 15 organizations including major technology partners such as Dell Technologies, Hewlett Packard Enterprise, Lenovo, and Supermicro, highlighting the broad adoption of NVIDIA's AI training platform across the industry [2].