Nvidia's Blackwell GPUs Dominate Latest MLPerf AI Training Benchmarks

Reviewed by Nidhi Govil


Nvidia's new Blackwell GPUs show significant performance gains in AI model training, particularly for large language models, according to the latest MLPerf benchmarks. The results highlight Nvidia's continued dominance in AI hardware.

Nvidia's Blackwell GPUs Lead in MLPerf Training Benchmarks

Nvidia has once again demonstrated its dominance in AI hardware with its latest Blackwell GPUs, showcasing significant performance gains in the most recent MLPerf training benchmarks. The results, released by MLCommons, a nonprofit consortium of over 125 members, highlight Nvidia's continued leadership in AI model training, particularly for large language models (LLMs) [1].

Benchmark Performance and Improvements

The MLPerf Training v5.0 benchmarks included six tests covering various AI tasks, with the most resource-intensive being the LLM pre-training task. This round featured Meta's Llama 3.1 405B model, which is more than twice the size of the previously used GPT-3 and has a context window four times larger [1].

Key performance highlights include:

  1. Nvidia's Blackwell GPUs achieved the fastest training times across all six benchmarks [3].
  2. On the new Llama 3.1 405B pre-training benchmark, Blackwell delivered 2.2x greater performance than the previous-generation architecture at the same scale [4].
  3. On the Llama 2 70B LoRA fine-tuning benchmark, Nvidia DGX B200 systems with eight Blackwell GPUs delivered 2.5x the performance of the previous round's submission with the same number of GPUs [3].

Scaling and Efficiency

The benchmarks also demonstrated impressive scaling capabilities:

  1. In the fastest result, 2,496 Blackwell chips completed the training test in 27 minutes [2].
  2. Achieving a faster time required more than three times as many of Nvidia's previous-generation chips [5].
  3. Performance scaled nearly linearly as GPUs were added, reaching 90% of ideal linear scaling [1].
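Scaling efficiency here means the ratio of achieved speedup to the ideal linear speedup expected from adding GPUs. A minimal sketch of that calculation — the timings below are hypothetical illustrations, since the benchmark reports only the 90% figure, not the underlying runs:

```python
def scaling_efficiency(base_gpus, base_minutes, scaled_gpus, scaled_minutes):
    """Ratio of achieved speedup to ideal (linear) speedup.

    1.0 means perfect linear scaling; lower values reflect
    communication and synchronization overhead at larger scale.
    """
    ideal_speedup = scaled_gpus / base_gpus
    actual_speedup = base_minutes / scaled_minutes
    return actual_speedup / ideal_speedup

# Hypothetical example: doubling the GPU count cuts training time
# from 60 minutes to 33.3 minutes instead of the ideal 30 minutes.
eff = scaling_efficiency(1248, 60.0, 2496, 33.3)
print(f"{eff:.0%}")  # prints "90%"
```

With perfect scaling the larger run would finish in 30 minutes; the extra 3.3 minutes is the overhead that keeps the efficiency at roughly 90% rather than 100%.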

Technological Advancements

Nvidia's performance improvements are attributed to several factors:

  1. The NVL72 package, which efficiently connects 36 Grace CPUs and 72 Blackwell GPUs [1].

Source: IEEE Spectrum

  2. Advancements in the Blackwell architecture, including high-density liquid-cooled racks and 13.4TB of coherent memory per rack [3].

Source: NVIDIA Blog

  3. Fifth-generation Nvidia NVLink and NVLink Switch interconnect technologies for scale-up [3].
  4. Nvidia Quantum-2 InfiniBand networking for scale-out capabilities [3].

Industry Implications and Future Outlook

The benchmark results underscore Nvidia's vision for "AI factories" – large-scale computing infrastructures designed to train and deploy next-generation AI applications [3]. This concept aligns with the industry trend of building smaller, more efficient GPU clusters for specific AI training tasks, as noted by Chetan Kapoor, chief product officer at CoreWeave [2].

While Nvidia maintains its lead, competitors are not far behind. AMD's latest Instinct MI325X GPU matched the performance of Nvidia's H200 in the LLM fine-tuning benchmark, suggesting AMD is roughly one generation behind Nvidia [1].

As the AI hardware landscape continues to evolve, these benchmarks provide crucial insights into the capabilities of different chip architectures and their potential impact on the development of increasingly sophisticated AI models and applications.
