Curated by THEOUTPOST
On Thu, 29 Aug, 12:08 AM UTC
4 Sources
[1]
NVIDIA Blackwell Sets New Standard for Generative AI in MLPerf Inference Benchmark
Training large language models is one challenge, but delivering LLM-powered real-time services is another. In the latest round of MLPerf industry benchmarks, Inference v4.1, NVIDIA platforms delivered leading performance across all data center tests. The first-ever submission of the upcoming NVIDIA Blackwell platform revealed up to 4x more performance than the NVIDIA H100 Tensor Core GPU on MLPerf's biggest LLM workload, Llama 2 70B, thanks to its use of a second-generation Transformer Engine and FP4 Tensor Cores.

The NVIDIA H200 Tensor Core GPU delivered outstanding results on every benchmark in the data center category, including the latest addition to the benchmark, the Mixtral 8x7B mixture of experts (MoE) LLM, which features a total of 46.7 billion parameters, with 12.9 billion parameters active per token.

MoE models have gained popularity as a way to bring more versatility to LLM deployments, as they're capable of answering a wide variety of questions and performing more diverse tasks in a single deployment. They're also more efficient since they only activate a few experts per inference, meaning they deliver results much faster than dense models of a similar size.

The continued growth of LLMs is driving the need for more compute to process inference requests. To meet real-time latency requirements for serving today's LLMs, and to do so for as many users as possible, multi-GPU compute is a must. NVIDIA NVLink and NVSwitch provide high-bandwidth communication between GPUs based on the NVIDIA Hopper architecture, delivering significant benefits for real-time, cost-effective large model inference. The Blackwell platform will further extend NVLink Switch's capabilities with larger NVLink domains of 72 GPUs.

In addition to the NVIDIA submissions, 10 NVIDIA partners - ASUSTek, Cisco, Dell Technologies, Fujitsu, Giga Computing, Hewlett Packard Enterprise (HPE), Juniper Networks, Lenovo, Quanta Cloud Technology and Supermicro - all made solid MLPerf Inference submissions, underscoring the wide availability of NVIDIA platforms.

Relentless Software Innovation

NVIDIA platforms undergo continuous software development, racking up performance and feature improvements on a monthly basis. In the latest inference round, NVIDIA offerings - including the NVIDIA Hopper architecture, NVIDIA Jetson platform and NVIDIA Triton Inference Server - saw leaps and bounds in performance gains. The NVIDIA H200 GPU delivered up to 27% more generative AI inference performance over the previous round, underscoring the added value customers get over time from their investment in the NVIDIA platform.

Triton Inference Server, part of the NVIDIA AI platform and available with NVIDIA AI Enterprise software, is a fully featured open-source inference server that helps organizations consolidate framework-specific inference servers into a single, unified platform. This helps lower the total cost of ownership of serving AI models in production and cuts model deployment times from months to minutes. In this round of MLPerf, Triton Inference Server delivered near-equal performance to NVIDIA's bare-metal submissions, showing that organizations no longer have to choose between using a feature-rich, production-grade AI inference server and achieving peak throughput performance.

Going to the Edge

Deployed at the edge, generative AI models can transform sensor data, such as images and videos, into real-time, actionable insights with strong contextual awareness.
The NVIDIA Jetson platform for edge AI and robotics is uniquely capable of running any kind of model locally, including LLMs, vision transformers and Stable Diffusion. In this round of MLPerf benchmarks, NVIDIA Jetson AGX Orin system-on-modules achieved more than a 6.2x throughput improvement and 2.4x latency improvement over the previous round on the GPT-J LLM workload. Rather than developing for a specific use case, developers can now use this general-purpose 6-billion-parameter model to seamlessly interface with human language, transforming generative AI at the edge.

Performance Leadership All Around

This round of MLPerf Inference showed the versatility and leading performance of NVIDIA platforms - extending from the data center to the edge - on all of the benchmark's workloads, supercharging the most innovative AI-powered applications and services. To learn more about these results, see our technical blog. H200 GPU-powered systems are available today from CoreWeave - the first cloud service provider to announce general availability - and server makers ASUS, Dell Technologies, HPE, QCT and Supermicro. Source: NVIDIA
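To make the MoE efficiency claim above concrete, here is a minimal back-of-the-envelope sketch in Python. It starts from the parameter counts quoted for Mixtral 8x7B (46.7 billion total, 12.9 billion active per token) and, assuming Mixtral-style top-2 routing across 8 experts, estimates how the weights split between shared layers and per-expert layers. The derived split is an illustrative assumption, not a figure from the benchmark.

```python
# Back-of-the-envelope sketch: why an MoE like Mixtral 8x7B is cheaper to run
# per token than a dense model of the same total size. The 46.7B total and
# 12.9B active figures come from the article; the top-2-of-8 routing and the
# derived shared/per-expert split are illustrative estimates, not NVIDIA data.

TOTAL_PARAMS = 46.7e9      # all weights stored in memory
ACTIVE_PARAMS = 12.9e9     # weights actually used for each token
NUM_EXPERTS = 8
EXPERTS_PER_TOKEN = 2      # Mixtral-style top-2 routing (assumption)

# total  = shared + NUM_EXPERTS       * per_expert
# active = shared + EXPERTS_PER_TOKEN * per_expert
per_expert = (TOTAL_PARAMS - ACTIVE_PARAMS) / (NUM_EXPERTS - EXPERTS_PER_TOKEN)
shared = ACTIVE_PARAMS - EXPERTS_PER_TOKEN * per_expert

print(f"estimated params per expert: {per_expert / 1e9:.1f}B")
print(f"estimated shared (attention/embedding) params: {shared / 1e9:.1f}B")
print(f"fraction of weights touched per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.0%}")
# Roughly 5.6B per expert, ~1.6B shared, and only ~28% of the weights (and
# hence of the matrix-multiply work) exercised per generated token, which is
# why MoE models can respond faster than similarly sized dense models.
```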
[2]
NVIDIA Blackwell Debuts at MLPerf & Shatters AI Performance Records, Hopper's Leadership Continues With H100 & H200 Outperforming AMD MI300X
NVIDIA's Dominance In MLPerf Inference & AI Benchmarks Solidified Further With Blackwell AI Chips, Hopper Shows Even Stronger Performance Uplifts Thanks To Continued Optimizations To The CUDA Stack

NVIDIA's Blackwell AI chips have finally made their record debut in MLPerf v4.1, securing record performance numbers across all benchmarks. Coming to data centers later this year, the NVIDIA Blackwell AI chips are poised as the strongest AI solution on the market, with up to a 4x increase in generational performance. Today, NVIDIA announced that it has achieved the highest performance in MLPerf Inference v4.1 across all AI benchmarks.

In Llama 2 70B, NVIDIA's Blackwell AI solutions offer a massive increase over the Hopper H100 chips. In server workloads, a single Blackwell GPU offers a 4x increase (10,756 tokens/second), while in offline scenarios, a single Blackwell GPU offers a 3.7x increase in performance with 11,264 tokens/second. NVIDIA also offered the first publicly measured performance using FP4 running on Blackwell GPUs.

While Blackwell is the beast it was promised to be, NVIDIA's Hopper continues to get even stronger with more optimizations landing through the CUDA stack. The H200 and H100 chips offer leading performance across every test compared to the competition, including the latest benchmarks such as the Mixtral 8x7B mixture of experts LLM (46.7 billion total parameters).

The NVIDIA HGX H200, with 8 Hopper H200 GPUs and NVSwitch, offers strong performance gains in Llama 2 70B: 34,864 (offline) and 32,790 (server) tokens/second in the 1,000W configuration, and 31,303 (offline) and 30,128 (server) tokens/second in the 700W configuration. This is up to a 50% uplift over the Hopper H100 solution. The H100 still offers better AI performance in Llama 2 than the AMD Instinct MI300X solution. The added performance comes from software optimizations that apply to both Hopper chips, along with the roughly 80% higher memory capacity and 40% higher memory bandwidth of the H200.

In Mixtral 8x7B on a multi-GPU test server, the NVIDIA H200 and H100 deliver up to 59,022 and 52,416 tokens/second of output, respectively. AMD's Instinct MI300X is missing in action in this particular workload, as no submission was made by the red team. The same is the case in Stable Diffusion XL, where new full-stack improvements boost performance by up to 27% for Hopper AI chips while AMD has yet to submit MLPerf results under this specific workload.

NVIDIA's efforts to fine-tune its software have paid off tremendously. The company has seen major boosts in every MLPerf release, and the advantage goes directly to customers running Hopper GPUs in their servers. We have stated this before and we will say it again: AI and data centers aren't all about hardware. Hardware is one component, but the other component that's just as crucial (if not more so) is software. There's no point in having the strongest hardware if you don't have the proper software to back it up, and companies investing millions of dollars into AI infrastructure will look at the whole ecosystem. NVIDIA has that ecosystem well in place and ready to roll out to enterprises and AI powerhouses across the world, which is why the company is now announcing the general availability of HGX H200 through various partners.

And it's not just the heavyweights Blackwell and Hopper that continue to get optimized. Even edge solutions such as Jetson AGX Orin have seen a 6x boost since the MLPerf v4.0 submissions, leading to a huge impact on GenAI workloads at the edge.
With Blackwell showcasing such strong performance before its launch, we can expect the new architecture, tailor-made for AI, to get even stronger, just like Hopper has, and pass on the optimization benefits to Blackwell Ultra later next year.
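The multi-GPU HGX H200 figures above and the single-GPU Blackwell figures are not directly comparable, so a quick normalization helps put them side by side. The following is a minimal sketch (not vendor data) that divides the 8-GPU system numbers quoted above by GPU count to get approximate per-GPU throughput and compares them against the single-GPU B200 results; it assumes roughly linear scaling across the 8 GPUs, which real systems only approximate.

```python
# Rough per-GPU normalization of the MLPerf Llama 2 70B numbers quoted above.
# System-level figures come from the article; dividing by GPU count assumes
# near-linear scaling across the 8 GPUs, which is an approximation.

hgx_h200_offline_tokens_s = 34_864   # 8x H200 (1,000W config), offline
hgx_h200_server_tokens_s = 32_790    # 8x H200 (1,000W config), server
num_gpus = 8

b200_offline_tokens_s = 11_264       # single Blackwell B200, offline
b200_server_tokens_s = 10_756        # single Blackwell B200, server

h200_per_gpu_offline = hgx_h200_offline_tokens_s / num_gpus   # ~4,358
h200_per_gpu_server = hgx_h200_server_tokens_s / num_gpus     # ~4,099

print(f"H200 per GPU, offline: {h200_per_gpu_offline:,.0f} tokens/s")
print(f"H200 per GPU, server:  {h200_per_gpu_server:,.0f} tokens/s")
print(f"B200 vs H200 per GPU, offline: {b200_offline_tokens_s / h200_per_gpu_offline:.1f}x")
print(f"B200 vs H200 per GPU, server:  {b200_server_tokens_s / h200_per_gpu_server:.1f}x")
# This lands around 2.6x per GPU versus H200, consistent with the ~2.5x
# single-H200 comparison discussed later in this roundup, while the headline
# "up to 4x" figure is measured against the older H100.
```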
[3]
MLPerf Inference 4.1 results show gains as Nvidia Blackwell makes its testing debut
MLCommons is out today with its latest set of MLPerf inference results. The new results mark the debut of a new generative AI benchmark as well as the first validated test results for Nvidia's next-generation Blackwell GPU processor.

MLCommons is a multi-stakeholder, vendor-neutral organization that manages the MLPerf benchmarks for both AI training and AI inference. The latest round of MLPerf inference benchmarks, released by MLCommons, provides a comprehensive snapshot of the rapidly evolving AI hardware and software landscape. With 964 performance results submitted by 22 organizations, these benchmarks serve as a vital resource for enterprise decision-makers navigating the complex world of AI deployment. By offering standardized, reproducible measurements of AI inference capabilities across various scenarios, MLPerf enables businesses to make informed choices about their AI infrastructure investments, balancing performance, efficiency and cost.

As part of MLPerf Inference v4.1 there are a series of notable additions. For the first time, MLPerf is now evaluating the performance of a Mixture of Experts (MoE) model, specifically the Mixtral 8x7B model. This round of benchmarks featured an impressive array of new processors and systems, many making their first public appearance. Notable entries include AMD's MI300X, Google's TPUv6e (Trillium), Intel's Granite Rapids, Untether AI's SpeedAI 240 and the Nvidia Blackwell B200 GPU.

"We just have a tremendous breadth of diversity of submissions and that's really exciting," David Kanter, founder and head of MLPerf at MLCommons, said during a call discussing the results with press and analysts. "The more different systems that we see out there, the better for the industry, more opportunities and more things to compare and learn from."

Introducing the Mixture of Experts (MoE) benchmark for AI inference

A major highlight of this round was the introduction of the Mixture of Experts (MoE) benchmark, designed to address the challenges posed by increasingly large language models. "The models have been increasing in size," Miro Hodak, senior member of the technical staff at AMD and one of the chairs of the MLCommons inference working group, said during the briefing. "That's causing significant issues in practical deployment."

Hodak explained that, at a high level, instead of having one large, monolithic model, the MoE approach uses several smaller models that act as experts in different domains; any time a query comes in, it is routed to one of the experts. The MoE benchmark tests performance on different hardware using the Mixtral 8x7B model, which consists of eight experts, each with 7 billion parameters, and combines three different tasks.

Hodak noted that the key goals were to better exercise the strengths of the MoE approach compared to a single-task benchmark and to showcase the capabilities of this emerging architectural trend in large language models and generative AI. He explained that the MoE approach allows for more efficient deployment and task specialization, potentially offering enterprises more flexible and cost-effective AI solutions.

Nvidia Blackwell is coming and it's bringing some big AI inference gains

The MLPerf testing benchmarks are a great opportunity for vendors to preview upcoming technology.
Instead of just making marketing claims about performance, the rigor of the MLPerf process provides industry-standard testing that is peer reviewed. Among the most anticipated pieces of AI hardware is Nvidia's Blackwell GPU, which was first announced in March. While it will still be many months before Blackwell is in the hands of real users, the MLPerf Inference 4.1 results provide a promising preview of the power that is coming.

"This is our first performance disclosure of measured data on Blackwell, and we're very excited to share this," Dave Salvator at Nvidia said during a briefing with press and analysts. MLPerf Inference 4.1 has many different benchmarking tests. On the generative AI workload that uses MLPerf's biggest LLM, Llama 2 70B, "We're delivering 4x more performance than our previous generation product on a per GPU basis," Salvator said.

While the Blackwell GPU is a big new piece of hardware, Nvidia is continuing to squeeze more performance out of its existing GPU architectures as well. The Nvidia Hopper GPU keeps on getting better. Nvidia's MLPerf Inference 4.1 results for the Hopper GPU provide up to 27% more performance than the last round of results six months ago. "These are all gains coming from software only," Salvator said. "In other words, this is the very same hardware we submitted about six months ago, but because of ongoing software tuning that we do, we're able to achieve more performance on that same platform."
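Hodak's description of MoE routing above can be made concrete with a small sketch. The example below is a toy illustration, not Mixtral's actual implementation: a learned gating layer scores each expert for an incoming token, only the top-scoring experts run their feed-forward pass, and the outputs are blended by gate weight. The dimensions, expert count and top-2 selection mirror the Mixtral-style design described above but are otherwise arbitrary assumptions.

```python
import numpy as np

# Toy Mixture-of-Experts router: a gating layer picks the top-k experts for a
# token, only those experts run, and their outputs are blended by gate weight.
# This illustrates the routing idea described above, not the real Mixtral code;
# all sizes are arbitrary.

rng = np.random.default_rng(0)
d_model, d_ff, num_experts, top_k = 64, 256, 8, 2

# Each "expert" is a small feed-forward block with its own weights.
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.02,
     rng.standard_normal((d_ff, d_model)) * 0.02)
    for _ in range(num_experts)
]
gate_w = rng.standard_normal((d_model, num_experts)) * 0.02  # router weights


def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token vector through its top-k experts."""
    scores = x @ gate_w                                   # one score per expert
    top = np.argsort(scores)[-top_k:]                     # indices of the chosen experts
    weights = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over chosen
    out = np.zeros_like(x)
    for w, idx in zip(weights, top):                      # only k of the experts execute
        w_in, w_out = experts[idx]
        out += w * (np.maximum(x @ w_in, 0.0) @ w_out)    # ReLU feed-forward block
    return out


token = rng.standard_normal(d_model)
print(moe_layer(token).shape)   # (64,) -- same output shape, but only 2 of 8 experts ran
```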
[4]
Nvidia publishes first Blackwell B200 MLPerf results: Up to 4X faster than its H100 predecessor, when using FP4
There are quite a few caveats and qualifications to that figure.

Nvidia published the first MLPerf 4.1 results of its Blackwell B200 processor. The results reveal that a Blackwell GPU offers up to four times the performance of its H100 predecessor based on the Hopper architecture, highlighting Nvidia's position as the leader in AI hardware. There are some caveats and disclaimers that we need to point out, however.

Based on Nvidia's results, a Blackwell-based B200 GPU delivers 10,755 tokens/second on a single GPU in the server inference test and 11,264 tokens/second in the offline inference test. A quick look at the publicly available MLPerf Llama 2 70B benchmark results reveals that a 4-way Hopper H100-based machine delivers similar results, lending credence to Nvidia's claim that a single Blackwell processor is about 3.7X-4X faster than a single Hopper H100 GPU. But we need to dissect the numbers to better understand them.

First, Nvidia's Blackwell processor used FP4 precision, as its fifth-generation Tensor Cores support that format, whereas the Hopper-based H100 only supports and uses FP8. These differing formats are allowed by MLPerf guidelines, but FP4 performance on Blackwell is double its FP8 throughput, so that's the first important item of note.

Next, Nvidia is somewhat disingenuous in comparing a single B200 to four H100 GPUs. Scaling is never perfect, so a single GPU tends to be something of a best-case scenario for per-GPU performance. There are no single-GPU H100 results listed for MLPerf 4.1, and only a single B200 result, so it becomes even more of an apples-to-oranges comparison. A single H200 achieved 4,488 tokens/s, however, which means the B200 is only 2.5X faster in that particular comparison.

Memory capacity and bandwidth are also critical factors, and there are big generational differences. The tested B200 GPU carries 180GB of HBM3E memory, the H100 SXM has 80GB of HBM3 (up to 96GB in some configurations), and the H200 has 96GB of HBM3 and up to 141GB of HBM3E. One result for a single H200 with 96GB of HBM3 only achieves 3,114 tokens/s in offline mode.

So, there are potential differences in number format, GPU count, and memory capacity and configuration that play into the "up to 4X" figure. Many of those differences are simply due to Blackwell B200 being a new chip with a newer architecture, and all of these things play into its ultimate performance.

Getting back to Nvidia's H200 with 141GB of HBM3E memory: it also performed exceptionally well, not only in the generative AI benchmark featuring the Llama 2 70B large language model but in every single test within the data center category. For obvious reasons, it was significantly faster than the H100 in tests that take advantage of GPU memory capacity.

For now, Nvidia has only shared performance of its B200 in the MLPerf 4.1 generative AI benchmark on the Llama 2 70B model. Whether that's because it's still working on tuning or due to other factors, we can't say, but MLPerf 4.1 has nine core disciplines and for now we can only guess how the Blackwell B200 will handle the other tests.
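The caveats above can be checked with simple arithmetic. The following is a minimal sketch that recomputes the speedup ratios discussed in this article from the tokens/second figures it quotes; the numbers themselves come from the text above, and the script only derives the relative comparisons.

```python
# Reproducing the speedup ratios discussed above from the quoted figures.
# All tokens/second values are taken from the article text; the comparisons
# below simply compute ratios, which depend on precision (FP4 vs FP8),
# GPU count and memory configuration as noted in the caveats.

b200_server = 10_755        # single B200, server scenario, FP4
b200_offline = 11_264       # single B200, offline scenario, FP4
h200_offline = 4_488        # single H200 (141GB HBM3E), offline, FP8
h200_96gb_offline = 3_114   # single H200 result with 96GB, offline, FP8

print(f"B200 vs H200 (141GB), offline: {b200_offline / h200_offline:.2f}x")      # ~2.51x
print(f"B200 vs H200 (96GB),  offline: {b200_offline / h200_96gb_offline:.2f}x")  # ~3.62x

# The "up to 4x" headline compares a single B200 against a quarter of a 4-way
# H100 system's throughput; working backwards from that claim gives an implied
# per-GPU H100 baseline of roughly 2,700-3,000 tokens/s.
implied_h100_offline = b200_offline / 3.7
implied_h100_server = b200_server / 4.0
print(f"implied H100 per GPU, offline: {implied_h100_offline:,.0f} tokens/s")  # ~3,045
print(f"implied H100 per GPU, server:  {implied_h100_server:,.0f} tokens/s")   # ~2,689
```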
NVIDIA's latest Blackwell B200 GPU demonstrates unprecedented AI performance in the MLPerf Inference 4.1 benchmarks, outperforming its predecessor and competitors. The results showcase significant advancements in generative AI and large language model processing.
NVIDIA has once again pushed the boundaries of AI computing with its latest Blackwell B200 GPU, showcasing remarkable performance in the MLPerf Inference 4.1 benchmarks. The results demonstrate a significant leap forward in AI processing capabilities, particularly in the realm of generative AI and large language models (LLMs) 1.
The Blackwell B200 GPU has set new standards in AI inference, outperforming its predecessor, the Hopper H100, by up to 4 times on the Llama 2 70B workload 4. This substantial improvement is attributed to Blackwell's advanced architecture and the implementation of new technologies such as FP4 precision.
NVIDIA's new GPU not only surpasses its own previous generation but also stays ahead of competitors like AMD's MI300X. In the Llama 2 70B inference test, a single Blackwell B200 GPU delivered roughly the throughput of a 4-way Hopper H100 system, while NVIDIA's Hopper-generation H100 and H200 continued to outperform AMD's MI300X on the same workload 2.
The Blackwell B200 excels in generative AI tasks, showcasing its prowess in processing large language models. It achieved 10,755 tokens per second in the server scenario and 11,264 tokens per second in the offline scenario on the Llama 2 70B model, the largest LLM workload in this round of MLPerf 4.
The MLPerf Inference 4.1 benchmarks, organized by MLCommons, provide a comprehensive evaluation of AI hardware performance across various tasks. This round of testing introduced new benchmarks, including a Mixture of Experts (MoE) model, further challenging the capabilities of AI accelerators 3.
NVIDIA's Blackwell B200 GPU is poised to revolutionize the AI industry, offering unprecedented performance for data centers and AI researchers. The significant improvements in processing speed and efficiency are expected to accelerate advancements in various AI applications, from natural language processing to computer vision 4.
As NVIDIA continues to push the boundaries of AI hardware capabilities, the Blackwell B200 GPU sets a new standard for the industry. This leap in performance is likely to spur further innovation and competition in the AI hardware market, ultimately benefiting researchers, businesses, and consumers alike 2.