Curated by THEOUTPOST
On Thu, 3 Apr, 12:05 AM UTC
3 Sources
[1]
Are Nvidia's Blackwell GPUs Truly Unstoppable in AI Inference? See How AMD's Instinct MI325 Stacks Up!
In the latest round of machine learning benchmark results from MLCommons, computers built around Nvidia's new Blackwell GPU architecture outperformed all others. But AMD's latest spin on its Instinct GPUs, the MI325, proved a match for the Nvidia H200, the product it was meant to counter. The comparable results came mostly on tests of Llama2 70B (70 billion parameters), one of the smaller-scale large language models in the suite. And in an effort to keep up with a rapidly changing AI landscape, MLPerf added three new benchmarks to better reflect where machine learning is headed.

MLPerf runs benchmarking for machine learning systems in an effort to provide an apples-to-apples comparison between computer systems. Submitters use their own software and hardware, but the underlying neural networks must be the same. There are now a total of 11 benchmarks for servers, with three added this year. It has been "hard to keep up with the rapid development of the field," says Miro Hodak, the co-chair of MLPerf Inference. ChatGPT only appeared in late 2022, OpenAI unveiled its first large language model (LLM) that can reason through tasks last September, and LLMs have grown exponentially -- GPT-3 had 175 billion parameters, while GPT-4 is thought to have nearly 2 trillion. As a result of the breakneck innovation, "we've increased the pace of getting new benchmarks into the field," says Hodak.

The new benchmarks include two LLMs. The popular and relatively compact Llama2-70B is already an established MLPerf benchmark, but the consortium wanted something that mimicked the responsiveness people expect of chatbots today. So the new benchmark, "Llama2-70B Interactive," tightens the requirements: computers must produce at least 25 tokens per second under any circumstance and cannot take more than 450 milliseconds to begin an answer.

Seeing the rise of "agentic AI" -- networks that can reason through complex tasks -- MLPerf sought to test an LLM with some of the characteristics needed for that. It chose Llama3.1 405B for the job. That LLM has what's called a wide context window: a measure of how much information (documents, samples of code, and so on) it can take in at once. For Llama3.1 405B that's 128,000 tokens, more than 30 times as much as Llama2 70B.

The final new benchmark, called RGAT, is based on what's called a graph attention network; it classifies information in a network. For example, the dataset used to test RGAT consists of scientific papers, which have relationships among authors, institutions, and fields of study, making up 2 terabytes of data. RGAT must classify the papers into just under 3,000 topics.

Nvidia continued its domination of MLPerf benchmarks through its own submissions and those of some 15 partners such as Dell, Google, and Supermicro. Both its first- and second-generation Hopper architecture GPUs -- the H100 and the memory-enhanced H200 -- made strong showings. "We were able to get another 60 percent performance over the last year" from Hopper, which went into production in 2022, says Dave Salvator, director of accelerated computing products at Nvidia. "It still has some headroom in terms of performance." But it was Nvidia's Blackwell architecture GPU, the B200, that really dominated. "The only thing faster than Hopper is Blackwell," says Salvator.
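To make the Llama2-70B Interactive limits concrete, here is a minimal Python sketch (not taken from any of the sources) that checks whether a single measured response satisfies the two constraints described above: a first token within 450 milliseconds and a sustained output rate of at least 25 tokens per second. The QueryTrace structure and the example timestamps are hypothetical illustrations, not MLPerf's actual test harness.

```python
# Hypothetical illustration of the Llama2-70B Interactive limits described above:
# time to first token (TTFT) <= 450 ms and sustained output >= 25 tokens/s.
# This is NOT the MLPerf LoadGen harness, just a sketch of the arithmetic.

from dataclasses import dataclass

TTFT_LIMIT_S = 0.450      # max seconds before the first token appears
MIN_TOKENS_PER_S = 25.0   # min sustained output rate after the first token

@dataclass
class QueryTrace:
    """Timestamps (seconds) for one query: when it was issued and when each token arrived."""
    issued_at: float
    token_times: list[float]

def meets_interactive_limits(trace: QueryTrace) -> bool:
    ttft = trace.token_times[0] - trace.issued_at
    if ttft > TTFT_LIMIT_S:
        return False
    decode_tokens = len(trace.token_times) - 1
    if decode_tokens == 0:
        return True  # single-token answer: only the TTFT limit applies
    # Rate over the decode phase: tokens after the first one / elapsed decode time.
    decode_time = trace.token_times[-1] - trace.token_times[0]
    return decode_tokens / decode_time >= MIN_TOKENS_PER_S

# Example: first token after 0.30 s, then one token every 30 ms (~33 tokens/s) -> passes.
trace = QueryTrace(issued_at=0.0, token_times=[0.30 + 0.03 * i for i in range(100)])
print(meets_interactive_limits(trace))  # True
```

In the real benchmark, MLPerf's load generator applies these kinds of limits across many concurrent queries, typically as percentile constraints rather than a simple per-query check.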
The B200 packs in 36 percent more high-bandwidth memory than the H200, but more importantly it can perform key machine-learning math using numbers with a precision as low as 4 bits, instead of the 8 bits Hopper pioneered. Lower-precision compute units are smaller, so more fit on the GPU, which leads to faster AI computing. In the Llama3.1 405B benchmark, an eight-B200 system from Supermicro delivered nearly four times the tokens per second of an eight-H200 system from Cisco. And the same Supermicro system was three times as fast as the quickest H200 computer at the interactive version of Llama2-70B. Nvidia used its combination of Blackwell GPUs and Grace CPUs, called GB200, to demonstrate how well its NVL72 data links can integrate multiple servers in a rack so that they perform as if they were one giant GPU. In an unverified result the company shared with reporters, a full rack of GB200-based computers delivered 869,200 tokens per second on Llama2 70B. The fastest system reported in this round of MLPerf was an Nvidia B200 server that delivered 98,443 tokens per second.

AMD is positioning its latest Instinct GPU, the MI325X, as providing competitive performance to Nvidia's H200. The MI325X has the same architecture as its predecessor, the MI300, but adds even more high-bandwidth memory and memory bandwidth -- 288 gigabytes and 6 terabytes per second (a 50 percent and 13 percent boost, respectively). Adding more memory is a play to handle larger and larger LLMs. "Larger models are able to take advantage of these GPUs because the model can fit in a single GPU or a single server," says Mahesh Balasubramanian, director of data center GPU marketing at AMD. "So you don't have to have that communication overhead of going from one GPU to another GPU or one server to another server. When you take out those communications your latency improves quite a bit." AMD was able to take advantage of the extra memory through software optimization to boost the inference speed of DeepSeek-R1 eightfold. On the Llama2-70B test, an eight-GPU MI325X computer came within 3 to 7 percent of the speed of a similarly tricked-out H200-based system. And on image generation, the MI325X system was within 10 percent of the Nvidia H200 computer. AMD's other noteworthy mark this round came from its partner Mangoboost, which showed nearly fourfold performance on the Llama2-70B test by doing the computation across four computers.

Intel has historically put forth CPU-only systems in the inference competition to show that for some workloads you don't really need a GPU. This round saw the first data from Intel's Xeon 6 chips, which were formerly known as Granite Rapids and are made using Intel's 3-nanometer process. At 40,285 samples per second, the best image-recognition result for a dual-Xeon 6 computer was about one-third the performance of a Cisco computer with two Nvidia H100s. Compared with Xeon 5 results from October 2024, the new CPU provides about an 80 percent boost on that benchmark and an even bigger boost on object detection and medical imaging. Since it first started submitting Xeon results in 2021 (with the Xeon 3), the company has achieved an 11-fold boost in performance on ResNet.

For now, it seems Intel has quit the field in the AI accelerator chip battle. Its alternative to the Nvidia H100, Gaudi 3, did not make an appearance in the new MLPerf results, nor in version 4.1, released last October. Gaudi 3 got a later-than-planned release because its software was not ready.
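A rough way to see why both lower-precision math and larger HBM capacity matter is to estimate how much memory a model's weights alone need at different precisions. The sketch below is an illustration, not something from the sources or from either vendor's sizing tools: it compares a 70-billion- and a 405-billion-parameter model at 16-, 8-, and 4-bit weights against the 288 GB of HBM the article cites for the MI325X. Real deployments also need memory for the KV cache and activations, so these figures are lower bounds.

```python
# Back-of-the-envelope weight-memory estimate at different precisions.
# Illustration only: real serving also needs KV cache, activations, and framework overhead.

GIB = 1024**3

def weight_memory_gib(params: float, bits_per_weight: int) -> float:
    """Memory (GiB) needed just to hold the model weights at the given precision."""
    return params * bits_per_weight / 8 / GIB

models = {"Llama2-70B": 70e9, "Llama3.1-405B": 405e9}
hbm_per_gpu_gib = 288  # MI325X HBM capacity cited in the article (treated here as GiB)

for name, params in models.items():
    for bits in (16, 8, 4):
        need = weight_memory_gib(params, bits)
        fits = "fits" if need <= hbm_per_gpu_gib else "needs multiple GPUs"
        print(f"{name} @ {bits}-bit weights: ~{need:,.0f} GiB ({fits} in {hbm_per_gpu_gib} GiB)")
```

At 4-bit weights, even the 405-billion-parameter model's weights would in principle fit on a single 288 GB accelerator, which is the kind of consolidation Balasubramanian describes when he talks about avoiding GPU-to-GPU communication overhead.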
In the opening remarks at Intel Vision 2025, the company's invite-only customer conference, newly minted CEO Lip-Bu Tan seemed to apologize for Intel's AI efforts. "I'm not happy with our current position," he told attendees. "You're not happy either. I hear you loud and clear. We are working toward a competitive system. It won't happen overnight, but we will get there for you."

Google's TPU v6e chip also made a showing, though the results were restricted to the image generation task. At 5.48 queries per second, the 4-TPU system saw a 2.5x boost over a similar computer using its predecessor, the TPU v5e, in the October 2024 results. Even so, 5.48 queries per second was roughly in line with a similarly sized Lenovo computer using Nvidia H100s.
[2]
Speed Demon: NVIDIA Blackwell Takes Pole Position in Latest MLPerf Inference Results
NVIDIA GB200 NVL72 system boosts AI factory profitability with outstanding throughput even as models get larger and more complex.

In the latest MLPerf Inference v5.0 benchmarks, which reflect some of the most challenging inference scenarios, the NVIDIA Blackwell platform set records -- and marked NVIDIA's first MLPerf submission using the NVIDIA GB200 NVL72 system, a rack-scale solution designed for AI reasoning.

Delivering on the promise of cutting-edge AI takes a new kind of compute infrastructure, called AI factories. Unlike traditional data centers, AI factories do more than store and process data -- they manufacture intelligence at scale by transforming raw data into real-time insights. The goal for AI factories is simple: deliver accurate answers to queries quickly, at the lowest cost and to as many users as possible. The complexity of pulling this off is significant and takes place behind the scenes. As AI models grow to billions and trillions of parameters to deliver smarter replies, the compute required to generate each token increases. This requirement reduces the number of tokens that an AI factory can generate and increases cost per token. Keeping inference throughput high and cost per token low requires rapid innovation across every layer of the technology stack, spanning silicon, network systems and software.

The latest updates to MLPerf Inference, a peer-reviewed industry benchmark of inference performance, include the addition of Llama 3.1 405B, one of the largest and most challenging-to-run open-weight models. The new Llama 2 70B Interactive benchmark features much stricter latency requirements compared with the original Llama 2 70B benchmark, better reflecting the constraints of production deployments in delivering the best possible user experiences. In addition to the Blackwell platform, the NVIDIA Hopper platform demonstrated exceptional performance across the board, with performance increasing significantly over the last year on Llama 2 70B thanks to full-stack optimizations.

The GB200 NVL72 system -- connecting 72 NVIDIA Blackwell GPUs to act as a single, massive GPU -- delivered up to 30x higher throughput on the Llama 3.1 405B benchmark over the NVIDIA H200 NVL8 submission this round. This feat was achieved through more than triple the performance per GPU and a 9x larger NVIDIA NVLink interconnect domain. While many companies run MLPerf benchmarks on their hardware to gauge performance, only NVIDIA and its partners submitted and published results on the Llama 3.1 405B benchmark.

Production inference deployments often have latency constraints on two key metrics. The first is time to first token (TTFT), or how long it takes for a user to begin seeing a response to a query given to a large language model. The second is time per output token (TPOT), or how quickly tokens are delivered to the user. The new Llama 2 70B Interactive benchmark has a 5x shorter TPOT and 4.4x lower TTFT -- modeling a more responsive user experience. On this test, NVIDIA's submission using an NVIDIA DGX B200 system with eight Blackwell GPUs tripled performance over using eight NVIDIA H200 GPUs, setting a high bar for this more challenging version of the Llama 2 70B benchmark.

Combining the Blackwell architecture and its optimized software stack delivers new levels of inference performance, paving the way for AI factories to deliver higher intelligence, increased throughput and faster token rates.
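The "cost per token" economics described above reduce to simple arithmetic: divide the hourly cost of running a system by the number of tokens it can generate in an hour. The sketch below is a hypothetical illustration of that calculation; the throughput and hourly-cost figures are made-up placeholders, not numbers from the MLPerf results or from either vendor.

```python
# Hypothetical cost-per-token arithmetic for an "AI factory".
# The inputs are placeholders, not figures from the MLPerf submissions.

def cost_per_million_tokens(tokens_per_second: float, cost_per_hour_usd: float) -> float:
    """Dollars per million generated tokens, given sustained throughput and hourly system cost."""
    tokens_per_hour = tokens_per_second * 3600
    return cost_per_hour_usd / tokens_per_hour * 1e6

# Example: the same hourly cost at 3x the throughput cuts cost per token by 3x.
baseline = cost_per_million_tokens(tokens_per_second=10_000, cost_per_hour_usd=60.0)
faster = cost_per_million_tokens(tokens_per_second=30_000, cost_per_hour_usd=60.0)
print(f"baseline: ${baseline:.2f} per 1M tokens; 3x throughput: ${faster:.2f} per 1M tokens")
```

This is why the benchmark's emphasis on throughput under latency constraints matters: a system that sustains more tokens per second at a given responsiveness directly lowers the cost of each answer.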
The NVIDIA Hopper architecture, introduced in 2022, powers many of today's AI inference factories and continues to power model training. Through ongoing software optimization, NVIDIA increases the throughput of Hopper-based AI factories, leading to greater value. On the Llama 2 70B benchmark, first introduced a year ago in MLPerf Inference v4.0, H100 GPU throughput has increased by 1.5x. The H200 GPU, based on the same Hopper GPU architecture with larger and faster GPU memory, extends that increase to 1.6x. Hopper also ran every benchmark, including the newly added Llama 3.1 405B, Llama 2 70B Interactive and graph neural network tests. This versatility means Hopper can run a wide range of workloads and keep pace as models and usage scenarios grow more challenging.

This MLPerf round, 15 partners submitted stellar results on the NVIDIA platform, including ASUS, Cisco, CoreWeave, Dell Technologies, Fujitsu, Giga Computing, Google Cloud, Hewlett Packard Enterprise, Lambda, Lenovo, Oracle Cloud Infrastructure, Quanta Cloud Technology, Supermicro, Sustainable Metal Cloud and VMware. The breadth of submissions reflects the reach of the NVIDIA platform, which is available across all cloud service providers and server makers worldwide.

MLCommons' work to continuously evolve the MLPerf Inference benchmark suite to keep pace with the latest AI developments and provide the ecosystem with rigorous, peer-reviewed performance data is vital to helping IT decision makers select optimal AI infrastructure. Learn more about MLPerf.
[3]
NVIDIA Blackwell & AMD MI325X Showdown In Latest MLPerf Inference Benchmarks: B200 Shatters Records, Instinct Fights Against Hopper
NVIDIA and AMD have just submitted the latest MLPerf Inference performance benchmarks of their newest GPUs, including the Blackwell B200 and the Instinct MI325X.

NVIDIA Blackwell B200, AMD Instinct MI325X & More Added To The Latest MLPerf Inference Benchmarks, Green Team Miles Ahead of The Competition In Raw Performance

MLPerf Inference v5.0 performance benchmarks are out, and the GPU giants have submitted their latest results powered by their newest chips. As we have seen in the past, it's not just about raw GPU horsepower; software optimizations and support for new AI ecosystems and workloads matter a lot too.

We start with the Green Giant, which has once again taken the lead and scored impressive records with its latest Blackwell GPUs such as the B200. The GB200 NVL72 rack, with a total of 72 B200 chips, takes the lead, offering 30x higher throughput on the Llama 3.1 405B benchmark versus the last-generation NVIDIA H200. NVIDIA also saw a tripling of performance on the Llama 2 70B Interactive benchmark when comparing an eight-GPU B200 system against an eight-GPU H200 system.

AMD is also submitting its newest Instinct MI325X 256 GB accelerator, which appears in an x8 configuration. The AMD results put it on par with the H200 system, and the larger memory capacity surely helps with massive LLMs, though it is still far behind the Blackwell B200. With the Ultra platform arriving later this year in the form of the B300, AMD will have to keep the pace up in both hardware and software; it does have the Instinct MI350 series on the way.

There are also benchmarks for the Hopper H200 series, which has seen continued optimizations. Compared with just last year, inference performance has been raised by 50 percent, a substantial gain for firms that continue to rely on the platform.
NVIDIA's new Blackwell GPUs set records in MLPerf Inference v5.0 benchmarks, while AMD's Instinct MI325X shows competitive performance against NVIDIA's H200 in specific tests.
In the recently released MLPerf Inference v5.0 benchmarks, NVIDIA's new Blackwell GPU architecture has demonstrated unprecedented performance, solidifying the company's leadership in AI computing. The benchmarks, which now include new tests to reflect the rapidly evolving AI landscape, showcase the capabilities of NVIDIA's latest offerings, particularly the B200 GPU [1][2].
The NVIDIA GB200 NVL72 system, featuring 72 Blackwell GPUs working as a single unit, delivered up to 30 times higher throughput on the new Llama 3.1 405B benchmark compared to the previous-generation H200 NVL8 system. This remarkable improvement came from more than triple the performance per GPU combined with a 9x larger NVIDIA NVLink interconnect domain (72 GPUs in one domain versus 8) [2].
In the new Llama 2 70B Interactive benchmark, which has stricter latency requirements, an NVIDIA DGX B200 system with eight Blackwell GPUs tripled the performance of a similar system using eight H200 GPUs. This test better reflects the demands of production deployments in delivering responsive user experiences [2].
While NVIDIA dominated the benchmarks, AMD's latest Instinct GPU, the MI325X, showed competitive performance against NVIDIA's H200 in specific tests. In the Llama2-70B test, an eight-GPU MI325X system came within 3 to 7 percent of the speed of a similarly configured H200-based system. For image generation tasks, the MI325X system was within 10 percent of the NVIDIA H200 computer's performance [1][3].
MLPerf has introduced three new benchmarks to keep pace with rapid developments in AI: Llama 2 70B Interactive, which adds strict latency requirements to the existing Llama 2 70B test; Llama 3.1 405B, a much larger model with a 128,000-token context window; and RGAT, a graph attention network benchmark that classifies scientific papers into just under 3,000 topics [1].
NVIDIA's Hopper architecture, introduced in 2022, continues to show improvements through software optimizations. The H100 GPU has seen a 1.5x increase in throughput on the Llama 2 70B benchmark over the past year, while the memory-enhanced H200 extends that increase to 1.6x [2].
The advancements demonstrated in these benchmarks have significant implications for AI factories – infrastructure designed to manufacture intelligence at scale. The increased performance of Blackwell and the optimizations in Hopper architecture contribute to higher throughput and faster token rates, potentially reducing the cost per token for large language model inference [2].
The MLPerf benchmarks saw participation from 15 NVIDIA partners, including major cloud service providers and server manufacturers. This broad involvement reflects the widespread adoption of NVIDIA's AI platforms across the industry [2][3].
As AI models continue to grow in size and complexity, the race for more efficient and powerful hardware intensifies. While NVIDIA maintains a clear lead with its Blackwell architecture, AMD's competitive showing in specific benchmarks indicates ongoing innovation in the field. The upcoming NVIDIA Ultra platform, featuring the B300 GPU, promises to push performance boundaries even further [3].
Reference
[1] IEEE Spectrum: Technology, Engineering, and Science News | Are Nvidia's Blackwell GPUs Truly Unstoppable in AI Inference? See How AMD's Instinct MI325 Stacks Up!
[2] The Official NVIDIA Blog | Speed Demon: NVIDIA Blackwell Takes Pole Position in Latest MLPerf Inference Results
[3] NVIDIA Blackwell & AMD MI325X Showdown In Latest MLPerf Inference Benchmarks: B200 Shatters Records, Instinct Fights Against Hopper