5 Sources
[1]
Microsoft deploys world's first 'supercomputer-scale' GB300 NVL72 Azure cluster -- 4,608 GB300 GPUs linked together, with each NVL72 rack forming a single, unified accelerator capable of 1.44 exaFLOPS of FP4 inference
Microsoft has just upgraded its Azure cloud platform with Nvidia's Blackwell Ultra, deploying what it calls the world's first large-scale GB300 NVL72 supercomputing cluster. The cluster comprises dozens of racks housing exactly 4,608 GB300 GPUs, connected within each rack by the NVLink 5 switch fabric and interconnected across the entire cluster via Nvidia's Quantum-X800 InfiniBand networking fabric. The NVLink fabric gives a single NVL72 rack 130 TB/s of all-to-all bandwidth, while the InfiniBand fabric provides 800 Gb/s of interconnect bandwidth per GPU between racks.

The 4,608 figure, specified by Nvidia, works out to 64 GB300 NVL72 systems, since each rack houses 72 Blackwell GPUs and 36 Grace CPUs (2,592 Arm cores per rack). That is still well short of a full hyperscale build-out, but it marks a significant milestone for Nvidia's Grace Blackwell GB300, which recently set new benchmark records in inference performance. Microsoft says this cluster will be dedicated to OpenAI workloads, allowing advanced reasoning models to run even faster and enabling model training in "weeks instead of months."

At rack level, each NVL72 system is said to deliver 1,440 petaflops of FP4 Tensor performance, backed by 37 terabytes of unified "fast memory": 20 TB of HBM3E attached to the GPUs and 17 TB of LPDDR5X attached to the Grace CPUs. As mentioned earlier, this memory is pooled using NVLink 5 so that each rack works as a single, unified accelerator with 130 TB/s of direct bandwidth. The memory throughput is one of the most impressive parts of the GB300 NVL72, so it is important to understand how it works. The Quantum-X800 InfiniBand platform then gives each of the 4,608 GPUs 800 Gb/s of bandwidth at the rack-to-rack level, so that every GPU is connected to every other, both within and across racks.

The GB300 NVL72 cluster is liquid-cooled, using standalone heat exchangers and facility loops designed to minimize water usage under intense workloads. Nvidia says Microsoft needed to reimagine every layer of its data center for this deployment, and Microsoft happily points out that this is just the first of many clusters that will spread GB300 across the globe, taking it to full hyperscale potential. OpenAI and Microsoft already use GB200 clusters for training models, so this serves as a natural extension of their exclusive partnership. Nvidia itself is heavily invested in OpenAI: the two recently signed a Letter of Intent (LoI) for a major strategic partnership under which the chipmaker will progressively invest up to $100 billion in OpenAI. In turn, OpenAI will use Nvidia GPUs for its next-generation AI infrastructure, deploying at least 10 gigawatts (GW) worth of accelerators, starting with Vera Rubin next year. This GB300 NVL72 supercluster can thus be seen as an early materialization of that investment, with Microsoft deploying the cluster for OpenAI on Nvidia hardware.
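The headline numbers are easy to sanity-check. Below is a minimal back-of-the-envelope sketch in Python that recomputes the cluster totals from the per-rack figures quoted above; every constant comes from the article itself, nothing is measured.

```python
# Back-of-the-envelope check of the figures quoted in the article above.

GPUS_PER_RACK = 72        # Blackwell Ultra GPUs per GB300 NVL72 rack
CPUS_PER_RACK = 36        # Grace CPUs per rack
CORES_PER_GRACE = 72      # Arm cores per Grace CPU
RACKS = 64                # 64 NVL72 systems, per Nvidia's breakdown

total_gpus = GPUS_PER_RACK * RACKS                    # 4,608 GPUs
arm_cores_per_rack = CPUS_PER_RACK * CORES_PER_GRACE  # 2,592 cores

hbm3e_tb = 20             # HBM3E attached to the 72 GPUs
lpddr5x_tb = 17           # LPDDR5X attached to the 36 Grace CPUs
fast_memory_tb = hbm3e_tb + lpddr5x_tb                # 37 TB pooled via NVLink 5

print(f"GPUs in cluster:      {total_gpus}")          # 4608
print(f"Arm cores per rack:   {arm_cores_per_rack}")  # 2592
print(f"Fast memory per rack: {fast_memory_tb} TB")   # 37
```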
[2]
Microsoft Azure Unveils World's First NVIDIA GB300 NVL72 Supercomputing Cluster for OpenAI
New state-of-the-art platform enables next-generation AI model development and deployment while furthering American leadership in AI.

Microsoft Azure today announced the new NDv6 GB300 VM series, delivering the industry's first supercomputing-scale production cluster of NVIDIA GB300 NVL72 systems, purpose-built for OpenAI's most demanding AI inference workloads. This supercomputer-scale cluster features over 4,600 NVIDIA Blackwell Ultra GPUs connected via the NVIDIA Quantum-X800 InfiniBand networking platform. Microsoft's unique systems approach applied radical engineering to memory and networking to provide the massive scale of compute required to achieve high inference and training throughput for reasoning models and agentic AI systems.

Today's achievement is the result of years of deep partnership between NVIDIA and Microsoft, purpose-building AI infrastructure for the world's most demanding AI workloads and delivering infrastructure for the next frontier of AI. It marks another leadership moment, ensuring that leading-edge AI drives innovation in the United States.

"Delivering the industry's first at-scale NVIDIA GB300 NVL72 production cluster for frontier AI is an achievement that goes beyond powerful silicon -- it reflects Microsoft Azure and NVIDIA's shared commitment to optimize all parts of the modern AI data center," said Nidhi Chappell, corporate vice president of Microsoft Azure AI Infrastructure. "Our collaboration helps ensure customers like OpenAI can deploy next-generation infrastructure at unprecedented scale and speed."

Inside the Engine: The NVIDIA GB300 NVL72

At the heart of Azure's new NDv6 GB300 VM series is the liquid-cooled, rack-scale NVIDIA GB300 NVL72 system. Each rack is a powerhouse, integrating 72 NVIDIA Blackwell Ultra GPUs and 36 NVIDIA Grace CPUs into a single, cohesive unit to accelerate training and inference for massive AI models. The system provides a staggering 37 terabytes of fast memory and 1.44 exaflops of FP4 Tensor Core performance per VM, creating a massive, unified memory space essential for reasoning models, agentic AI systems and complex multimodal generative AI.

NVIDIA Blackwell Ultra is supported by the full-stack NVIDIA AI platform, including collective communication libraries that tap into new formats like NVFP4 for breakthrough training performance, as well as inference-serving technologies like NVIDIA Dynamo for the highest inference performance in reasoning AI. The NVIDIA Blackwell Ultra platform excels at both training and inference. In the recent MLPerf Inference v5.1 benchmarks, NVIDIA GB300 NVL72 systems delivered record-setting performance using NVFP4. Results included up to 5x higher throughput per GPU on the 671-billion-parameter DeepSeek-R1 reasoning model compared with the NVIDIA Hopper architecture, along with leadership performance on all newly introduced benchmarks, such as the Llama 3.1 405B model.

The Fabric of a Supercomputer: NVLink Switch and NVIDIA Quantum-X800 InfiniBand

To connect over 4,600 Blackwell Ultra GPUs into a single, cohesive supercomputer, Microsoft Azure's cluster relies on a two-tiered NVIDIA networking architecture designed for both scale-up performance within the rack and scale-out performance across the entire cluster. Within each GB300 NVL72 rack, the fifth-generation NVIDIA NVLink Switch fabric provides 130 TB/s of direct, all-to-all bandwidth between the 72 Blackwell Ultra GPUs. This transforms the entire rack into a single, unified accelerator with a shared memory pool -- a critical design for massive, memory-intensive models.

To scale beyond the rack, the cluster uses the NVIDIA Quantum-X800 InfiniBand platform, purpose-built for trillion-parameter-scale AI. Featuring NVIDIA ConnectX-8 SuperNICs and Quantum-X800 switches, NVIDIA Quantum-X800 provides 800 Gb/s of bandwidth per GPU, ensuring seamless communication across all 4,608 GPUs. Microsoft Azure's cluster also uses NVIDIA Quantum-X800's advanced adaptive routing, telemetry-based congestion control and performance isolation capabilities, as well as NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) v4, which offloads collective operations into the switches to significantly boost the efficiency of large-scale training and inference.

Driving the Future of AI

Delivering the world's first production NVIDIA GB300 NVL72 cluster at this scale required a reimagination of every layer of Microsoft's data center -- from custom liquid cooling and power distribution to a reengineered software stack for orchestration and storage. This latest milestone marks a big step forward in building the infrastructure that will unlock the future of AI. As Azure scales toward its goal of deploying hundreds of thousands of NVIDIA Blackwell Ultra GPUs, even more innovations are poised to emerge from customers like OpenAI. Learn more about this announcement on the Microsoft Azure blog.
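To make the scale-up/scale-out split concrete, here is a toy Python sketch of a hierarchical all-reduce, the kind of collective this two-tier design favors: the bulk of the traffic stays on the 130 TB/s NVLink fabric inside a rack, and only per-rack partial results cross the 800 Gb/s-per-GPU InfiniBand links. This is an illustrative sketch only; real deployments use collective libraries such as NCCL, and the list-of-lists "racks" structure here is purely hypothetical.

```python
from typing import List

def hierarchical_allreduce(racks: List[List[float]]) -> List[List[float]]:
    """Toy two-tier all-reduce: lists stand in for per-GPU gradient buffers."""
    # Tier 1 (scale-up): reduce within each rack over the fast intra-rack fabric.
    rack_sums = [sum(gpu_values) for gpu_values in racks]

    # Tier 2 (scale-out): combine the per-rack partials across racks.
    # Only one partial per rack crosses the slower inter-rack fabric.
    global_sum = sum(rack_sums)

    # Tier 1 again: broadcast the result back to every GPU within each rack.
    return [[global_sum] * len(gpu_values) for gpu_values in racks]

# Two toy "racks" of four "GPUs" each, one gradient value per GPU.
racks = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
print(hierarchical_allreduce(racks))  # every GPU ends up holding 36.0
```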
[3]
Microsoft Azure Debuts Nvidia GB300 Cluster for OpenAI AI Workloads
Each rack combines 72 Nvidia Blackwell Ultra GPUs and 36 Grace CPUs; the full cluster spans more than 4,600 GPUs.

Microsoft Azure unveiled the new NDv6 GB300 virtual machine (VM) series on Thursday. Claimed to be the industry's first supercomputing-scale production cluster of Nvidia GB300 NVL72 systems, it will be made available for OpenAI's "most demanding artificial intelligence (AI) inference workloads." The Redmond-based tech giant says these VMs are optimised for reasoning models, agentic AI systems, and multimodal generative AI workflows. Interestingly, with the new architecture, Azure has upgraded from the ND GB200 v6 VMs, which were introduced less than a year ago.

Microsoft Azure Upgrades Cloud Computing Stack With New Nvidia Hardware

In a blog post, Microsoft's cloud division, Azure, announced the creation of its new virtual machines. The cluster is powered by more than 4,600 of Nvidia's Blackwell Ultra GPUs, housed in GB300 NVL72 systems and connected via the company's InfiniBand network. Microsoft claims that the cluster will enable model training in weeks instead of months and deliver high throughput for inference workloads. It is said to support training models with "hundreds of trillions of parameters."

Breaking down the system, the cloud division has implemented a rack-scale architecture, where each rack contains 18 virtual machines with a total of 72 GPUs and 36 Nvidia Grace CPUs. Each GPU can communicate at 800Gbps via Nvidia's Quantum-X800 InfiniBand, twice the per-GPU cross-rack bandwidth of the earlier GB200 NVL72 systems. Inside each rack, NVLink and NVSwitch, special high-speed connections that let GPUs talk to each other extremely quickly, allow the rack's 37TB of fast memory to exchange data at up to 130TB per second. Overall, each rack can deliver up to 1,440 petaflops (PFLOPS) of FP4 Tensor Core compute, making the cluster one of the fastest systems in the world for AI tasks. This tight integration means AI models can process larger tasks faster, handle longer sequences of information, and run complex agentic (AI that can make decisions on its own) or multimodal (AI that can process multiple types of data like text, images, and audio together) workloads with minimal delays.

Microsoft says that to expand beyond a single rack, Azure uses a full fat-tree, non-blocking network, a networking design that ensures all racks can communicate without slowdowns, powered by InfiniBand. This allows AI training to scale efficiently across tens of thousands of GPUs while keeping communication delays minimal. By reducing synchronisation overhead (the time GPUs spend waiting for each other), GPUs spend more time computing, helping researchers train massive AI models faster and at lower cost. Azure's co-designed stack combines custom protocols, collective libraries, and in-network computing to ensure the network is reliable and fully utilised. Additionally, Microsoft's cooling systems use standalone heat exchanger units along with facility cooling to reduce water use. On the software side, the company says it has reengineered its stacks for storage, orchestration, and scheduling.
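The synchronisation-overhead point lends itself to simple arithmetic. The sketch below models how the fraction of communication that can be hidden behind computation changes effective GPU utilisation per training step; the 100 ms / 25 ms figures are invented for illustration and are not taken from Microsoft.

```python
def effective_utilization(compute_s: float, comm_s: float,
                          overlap: float = 0.0) -> float:
    """Fraction of wall-clock time spent computing per training step.

    overlap: fraction of communication hidden behind computation
    (0.0 = fully serialised, 1.0 = fully overlapped)."""
    exposed_comm = comm_s * (1.0 - overlap)
    return compute_s / (compute_s + exposed_comm)

# A step with 100 ms of compute and 25 ms of gradient exchange (made-up numbers):
print(effective_utilization(0.100, 0.025, overlap=0.0))  # 0.80
print(effective_utilization(0.100, 0.025, overlap=0.8))  # ~0.95
```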
[4]
Microsoft Azure Gets An Ultra Upgrade With NVIDIA's GB300 "Blackwell Ultra" GPUs, Over 4,600 GPUs Connected Together To Run AI Models With Over A Trillion Parameters
NVIDIA's GB300 "Blackwell Ultra" crunches through AI models with hundreds of trillions of parameters within Microsoft's latest Azure platform.

Microsoft's Azure has received the Blackwell Ultra upgrade. The latest large-scale production cluster integrates over 4,600 GPUs based on NVIDIA's GB300 NVL72 architecture, all connected using the next-generation InfiniBand interconnect fabric. This deployment paves the way for Microsoft to scale to hundreds of thousands of Blackwell Ultra GPUs deployed across datacenters around the globe, all tackling one workload: AI.

According to Microsoft, the Azure cluster with NVIDIA GB300 NVL72 "Blackwell Ultra" GPUs can reduce training times from months to weeks and unlocks the training of models with hundreds of trillions of parameters. NVIDIA also leads in inference performance, as demonstrated repeatedly in MLPerf benchmarks and, most recently, in the InferenceMAX AI tests. The new Microsoft Azure ND GB300 v6 VMs are optimized for reasoning models, agentic AI systems, and multimodal generative AI workloads. Each rack hosts a total of 18 VMs and 72 GPUs. The main spec highlights, via Microsoft, follow:

At the rack level, NVLink and NVSwitch reduce memory and bandwidth constraints, enabling up to 130TB per second of intra-rack data transfer connecting 37TB total of fast memory. Each rack becomes a tightly coupled unit, delivering higher inference throughput at reduced latencies on larger models and longer context windows, empowering agentic and multimodal AI systems to be more responsive and scalable than ever.

To scale beyond the rack, Azure deploys a full fat-tree, non-blocking architecture using NVIDIA Quantum-X800 800 Gb/s InfiniBand, the fastest networking fabric available today. This ensures that customers can scale up training of ultra-large models efficiently to tens of thousands of GPUs with minimal communication overhead, thus delivering better end-to-end training throughput. Reduced synchronization overhead also translates to maximum utilization of GPUs, which helps researchers iterate faster and at lower cost despite the compute-hungry nature of AI training workloads. Azure's co-engineered stack, including custom protocols, collective libraries, and in-network computing, ensures the network is highly reliable and fully utilized by the applications. Features like NVIDIA SHARP accelerate collective operations and double effective bandwidth by performing math in the switch, making large-scale training and inference more efficient and reliable.

Azure's advanced cooling systems use standalone heat exchanger units and facility cooling to minimize water usage while maintaining thermal stability for dense, high-performance clusters like GB300 NVL72. Microsoft adds that it continues to develop and deploy new power distribution models capable of supporting the high energy density and dynamic load balancing required by the ND GB300 v6 VM class of GPU clusters.

As per NVIDIA, the Microsoft Azure partnership marks a leadership moment for the United States in the AI race. The newest Azure VMs are now deployed and ready for use by customers.
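The claim that SHARP "doubles effective bandwidth by performing math in the switch" follows from a simple traffic count: in a host-based ring all-reduce, each GPU transmits roughly twice its gradient payload, whereas with in-network reduction it sends the payload up to the switch once and receives the reduced result once. The sketch below makes that arithmetic explicit; it is a first-order model of the traffic, not a description of SHARP's actual protocol.

```python
def ring_allreduce_tx_bytes(data_bytes: float, n_gpus: int) -> float:
    # Classic ring all-reduce: each GPU transmits 2*(N-1)/N of its data
    # (reduce-scatter plus all-gather phases).
    return 2.0 * (n_gpus - 1) / n_gpus * data_bytes

def in_network_allreduce_tx_bytes(data_bytes: float) -> float:
    # In-network reduction: each GPU sends its data up to the switch once;
    # the reduced result comes back down (counting transmit traffic only).
    return data_bytes

D = 1e9   # 1 GB of gradients per GPU (illustrative)
N = 72    # GPUs in one NVL72 rack

ratio = ring_allreduce_tx_bytes(D, N) / in_network_allreduce_tx_bytes(D)
print(f"Per-GPU wire traffic ratio: {ratio:.2f}x")  # ~1.97x, i.e. roughly halved
```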
[5]
Microsoft Azure deploys world's first NVIDIA GB300 NVL72 supercomputer for OpenAI
Microsoft has announced the launch on Azure of the world's first cluster built on the NVIDIA GB300 NVL72 system, designed for OpenAI's artificial intelligence applications. The platform brings together more than 4,600 NVIDIA Blackwell Ultra graphics processors, interconnected via the Quantum-X800 InfiniBand network, to deliver exceptional computing power. Each rack combines 72 GPUs and 36 Grace CPUs, delivering up to 1.44 exaflops of FP4 performance per virtual machine. According to Nidhi Chappell, vice president of Microsoft Azure AI Infrastructure, the project illustrates "the shared commitment of Microsoft and NVIDIA to optimize the entire modern data center." This supercomputer reinforces US leadership in next-generation AI infrastructure.
Microsoft Azure has deployed the industry's first supercomputing-scale production cluster using NVIDIA's GB300 NVL72 systems, specifically designed for OpenAI's demanding AI workloads [1][2]. This infrastructure, known as the NDv6 GB300 VM series, marks a significant milestone in AI computing capabilities and is built to handle OpenAI's most demanding AI inference workloads.

The new cluster boasts over 4,600 NVIDIA Blackwell Ultra GPUs, interconnected via the NVIDIA Quantum-X800 InfiniBand networking platform [2]. Each rack in this system contains 72 NVIDIA Blackwell Ultra GPUs and 36 NVIDIA Grace CPUs, delivering an astounding 1.44 exaflops of FP4 Tensor Core performance per VM [1].

The cluster's impressive performance is made possible by its advanced networking architecture. Within each rack, the fifth-generation NVIDIA NVLink Switch fabric provides 130 TB/s of direct, all-to-all bandwidth between the 72 Blackwell Ultra GPUs [2]. This creates a unified accelerator with a shared memory pool of 37 terabytes of fast memory, crucial for handling massive, memory-intensive AI models [3].

To enable communication across the entire cluster, Microsoft Azure employs the NVIDIA Quantum-X800 InfiniBand platform. This provides 800 Gb/s of bandwidth per GPU, ensuring seamless interaction among all 4,608 GPUs [2]. The implementation of a full fat-tree, non-blocking network architecture allows for efficient scaling of AI training across tens of thousands of GPUs while minimizing communication delays [3].

This new infrastructure is set to transform AI model development and deployment. Microsoft claims that the cluster will enable model training in weeks instead of months and support the training of models with hundreds of trillions of parameters [3][4]. The system is optimized for reasoning models, agentic AI systems, and multimodal generative AI workloads, paving the way for more advanced and responsive AI applications.

To support such a high-performance system, Microsoft has implemented custom liquid cooling solutions. The cluster uses standalone heat exchanger units along with facility cooling to reduce water usage while maintaining thermal stability [3]. Additionally, new power distribution models have been developed to support the high energy density and dynamic load balancing required by these advanced GPU clusters [4].

This deployment marks a significant milestone in the development of AI infrastructure, reinforcing the United States' leadership in next-generation AI technologies [5]. As Microsoft Azure aims to scale to hundreds of thousands of NVIDIA Blackwell Ultra GPUs across various data centers globally, we can expect further innovations and breakthroughs in AI capabilities, particularly from customers like OpenAI [2].