



6 Sources
[1]

Microsoft deploys world's first 'supercomputer-scale' GB300 NVL72 Azure cluster -- 4,608 GB300 GPUs linked together, with each rack forming a single, unified accelerator capable of 1,440 PFLOPS of FP4 inference
Microsoft has just upgraded its Azure cloud platform with Nvidia's Blackwell Ultra, deploying what it calls the world's first large-scale GB300 NVL72 supercomputing cluster. The cluster comprises 64 racks housing exactly 4,608 GB300 GPUs: within each rack, the GPUs are connected by the NVLink 5 switch fabric, and the racks are tied together by Nvidia's Quantum-X800 InfiniBand networking fabric. This gives a single NVL72 rack a total of 130 TB/s of all-to-all NVLink bandwidth, while each GPU gets 800 Gb/s of cross-rack interconnect bandwidth.

The 4,608 figure, specified by Nvidia, works out to 64 GB300 NVL72 systems, since each rack holds 72 Blackwell GPUs and 36 Grace CPUs (2,592 Arm cores per rack). That still leaves the deployment well short of a full hyperscale build-out, but it marks a significant milestone for Nvidia's Grace Blackwell GB300, which recently set new benchmark records in inference performance. Microsoft says this cluster will be dedicated to OpenAI workloads, allowing advanced reasoning models to run even faster and enabling model training in "weeks instead of months."

At the rack level, each NVL72 system is said to deliver 1,440 petaflops of FP4 Tensor performance, backed by 37 terabytes of unified "fast memory," which breaks down to 20 TB of HBM3E for the GPUs and 17 TB of LPDDR5X for the Grace CPUs. This memory is pooled over NVLink 5 so that each rack works as a single, unified accelerator with 130 TB/s of direct bandwidth; the memory throughput is one of the most impressive parts of the GB300 NVL72. Beyond the rack, the Quantum-X800 InfiniBand platform gives each of the 4,608 interconnected GPUs 800 Gb/s of rack-to-rack bandwidth, so every GPU is connected to every other, within racks and across them. The GB300 NVL72 cluster is liquid-cooled, using standalone heat exchangers and facility loops designed to minimize water usage under intense workloads.
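The composition figures above can be checked directly. A minimal sketch, using only the numbers quoted in the article:

```python
# Sanity-check of the cluster composition figures quoted above,
# using only the article's own numbers.

GPUS_PER_RACK = 72          # Blackwell Ultra GPUs per GB300 NVL72 rack
GRACE_CPUS_PER_RACK = 36    # Grace CPUs per rack
CORES_PER_GRACE = 72        # Arm cores per Grace CPU
TOTAL_GPUS = 4_608          # GPUs across the whole Azure cluster

racks = TOTAL_GPUS // GPUS_PER_RACK
arm_cores_per_rack = GRACE_CPUS_PER_RACK * CORES_PER_GRACE

print(racks)               # 64 NVL72 racks
print(arm_cores_per_rack)  # 2592 Arm cores per rack
```

Both outputs match the figures Nvidia quotes: 64 NVL72 systems and 2,592 Arm cores per rack.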
Nvidia says Microsoft needed to reimagine every layer of its data center for this deployment, and Microsoft happily points out that this is just the first of many clusters to come that will spread GB300 across the globe, taking it to full hyperscale potential. OpenAI and Microsoft already use GB200 clusters for training models, so this serves as a natural extension of their exclusive partnership. Nvidia itself is heavily invested in OpenAI: the two recently signed a Letter of Intent (LoI) for a major strategic partnership that will see the chipmaker progressively invest $100 billion in OpenAI. In turn, OpenAI will use Nvidia GPUs for its next-gen AI infrastructure, deploying at least 10 gigawatts (GW) worth of accelerators, starting with Vera Rubin next year. So this GB300 NVL72 supercluster can be viewed as a precursor that almost materializes that investment, since Microsoft is deploying the cluster for OpenAI using Nvidia hardware.
[2]

Microsoft Azure Unveils World's First NVIDIA GB300 NVL72 Supercomputing Cluster for OpenAI
New state-of-the-art platform enables next-generation AI model development and deployment while furthering American leadership in AI.

Microsoft Azure today announced the new NDv6 GB300 VM series, delivering the industry's first supercomputing-scale production cluster of NVIDIA GB300 NVL72 systems, purpose-built for OpenAI's most demanding AI inference workloads. This supercomputer-scale cluster features over 4,600 NVIDIA Blackwell Ultra GPUs connected via the NVIDIA Quantum-X800 InfiniBand networking platform. Microsoft's unique systems approach applied radical engineering to memory and networking to provide the massive scale of compute required to achieve high inference and training throughput for reasoning models and agentic AI systems.

Today's achievement is the result of years of deep partnership between NVIDIA and Microsoft, purpose-building AI infrastructure for the world's most demanding AI workloads and delivering infrastructure for the next frontier of AI. It marks another leadership moment, ensuring that leading-edge AI drives innovation in the United States.

"Delivering the industry's first at-scale NVIDIA GB300 NVL72 production cluster for frontier AI is an achievement that goes beyond powerful silicon -- it reflects Microsoft Azure and NVIDIA's shared commitment to optimize all parts of the modern AI data center," said Nidhi Chappell, corporate vice president of Microsoft Azure AI Infrastructure. "Our collaboration helps ensure customers like OpenAI can deploy next-generation infrastructure at unprecedented scale and speed."

Inside the Engine: The NVIDIA GB300 NVL72

At the heart of Azure's new NDv6 GB300 VM series is the liquid-cooled, rack-scale NVIDIA GB300 NVL72 system. Each rack is a powerhouse, integrating 72 NVIDIA Blackwell Ultra GPUs and 36 NVIDIA Grace CPUs into a single, cohesive unit to accelerate training and inference for massive AI models.
The system provides a staggering 37 terabytes of fast memory and 1.44 exaflops of FP4 Tensor Core performance per VM, creating a massive, unified memory space essential for reasoning models, agentic AI systems and complex multimodal generative AI. NVIDIA Blackwell Ultra is supported by the full-stack NVIDIA AI platform, including collective communication libraries that tap into new formats like NVFP4 for breakthrough training performance, as well as inference frameworks like NVIDIA Dynamo for the highest inference performance in reasoning AI.

The NVIDIA Blackwell Ultra platform excels at both training and inference. In the recent MLPerf Inference v5.1 benchmarks, NVIDIA GB300 NVL72 systems delivered record-setting performance using NVFP4. Results included up to 5x higher throughput per GPU on the 671-billion-parameter DeepSeek-R1 reasoning model compared with the NVIDIA Hopper architecture, along with leadership performance on all newly introduced benchmarks, such as the Llama 3.1 405B model.

The Fabric of a Supercomputer: NVLink Switch and NVIDIA Quantum-X800 InfiniBand

To connect over 4,600 Blackwell Ultra GPUs into a single, cohesive supercomputer, Microsoft Azure's cluster relies on a two-tiered NVIDIA networking architecture designed for both scale-up performance within the rack and scale-out performance across the entire cluster. Within each GB300 NVL72 rack, the fifth-generation NVIDIA NVLink Switch fabric provides 130 TB/s of direct, all-to-all bandwidth between the 72 Blackwell Ultra GPUs. This transforms the entire rack into a single, unified accelerator with a shared memory pool -- a critical design for massive, memory-intensive models. To scale beyond the rack, the cluster uses the NVIDIA Quantum-X800 InfiniBand platform, purpose-built for trillion-parameter-scale AI. Featuring NVIDIA ConnectX-8 SuperNICs and Quantum-X800 switches, NVIDIA Quantum-X800 provides 800 Gb/s of bandwidth per GPU, ensuring seamless communication across all 4,608 GPUs.
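The per-rack figures above can be aggregated into cluster-wide peaks with simple arithmetic. A sketch, illustrative only, since delivered throughput always depends on the workload:

```python
# Aggregating the per-rack figures quoted above into cluster-wide peaks
# (illustrative arithmetic only; real delivered throughput depends on workload).

RACKS = 4608 // 72              # 64 GB300 NVL72 racks in the cluster
FP4_PFLOPS_PER_RACK = 1_440     # peak FP4 Tensor Core performance per rack
HBM3E_TB = 20                   # GPU memory per rack
LPDDR5X_TB = 17                 # Grace CPU memory per rack

fast_memory_tb = HBM3E_TB + LPDDR5X_TB                # the quoted 37 TB pool
cluster_fp4_exaflops = RACKS * FP4_PFLOPS_PER_RACK / 1_000

print(fast_memory_tb)        # 37 TB of fast memory per rack
print(cluster_fp4_exaflops)  # 92.16 peak FP4 exaflops across all 64 racks
```

The 92-exaflop figure is a theoretical FP4 peak derived from the quoted per-rack number, not a benchmarked result.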
Microsoft Azure's cluster also uses NVIDIA Quantum-X800's advanced adaptive routing, telemetry-based congestion control and performance isolation capabilities, as well as NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) v4, which accelerates operations to significantly boost the efficiency of large-scale training and inference.

Driving the Future of AI

Delivering the world's first production NVIDIA GB300 NVL72 cluster at this scale required a reimagination of every layer of Microsoft's data center -- from custom liquid cooling and power distribution to a reengineered software stack for orchestration and storage. This latest milestone marks a big step forward in building the infrastructure that will unlock the future of AI. As Azure scales to its goal of deploying hundreds of thousands of NVIDIA Blackwell Ultra GPUs, even more innovations are poised to emerge from customers like OpenAI. Learn more about this announcement on the Microsoft Azure blog.
[3]

Microsoft Azure upgraded to NVIDIA GB300 'Blackwell Ultra' with 4,600 GPUs connected together
TL;DR: Microsoft has deployed its first large-scale Azure cluster featuring over 4,600 NVIDIA GB300 "Blackwell Ultra" GPUs, cutting AI model training from months to weeks. This high-performance system supports multitrillion-parameter models with advanced NVLink bandwidth and InfiniBand interconnects, setting a new standard in AI infrastructure.

Microsoft has just announced that its first at-scale production cluster of NVIDIA's new GB300 "Blackwell Ultra" GPUs has been installed. The new large-scale production cluster packs over 4,600 GPUs based on NVIDIA's new GB300 NVL72 architecture, connected through a next-gen InfiniBand interconnect fabric. The deployment allows Microsoft to scale to hundreds of thousands of Blackwell Ultra GPUs throughout datacenters across the planet, all working on one workload: AI.

Microsoft says its new Azure cluster powered by NVIDIA GB300 NVL72 "Blackwell Ultra" GPUs can reduce training times from months down to weeks, unlocking the way for training models with hundreds of trillions of parameters. The new Microsoft Azure ND GB300 v6 VMs are optimized for reasoning models, agentic AI systems, and multimodal generative AI workloads. Each rack hosts 18 VMs with 72 GPUs per rack; here are some of the specification highlights:

* 72 NVIDIA Blackwell Ultra GPUs (with 36 NVIDIA Grace CPUs)
* 800 gigabits per second (Gb/s) of cross-rack scale-out bandwidth per GPU via next-generation NVIDIA Quantum-X800 InfiniBand (2x that of GB200 NVL72)
* 130 terabytes per second (TB/s) of NVIDIA NVLink bandwidth within the rack
* 37 TB of fast memory
* Up to 1,440 petaflops (PFLOPS) of FP4 Tensor Core performance

Ian Buck, Vice President of Hyperscale and High-Performance Computing at NVIDIA, said: "Microsoft Azure's launch of the NVIDIA GB300 NVL72 supercluster is an exciting step in the advancement of frontier AI.
This co-engineered system delivers the world's first at-scale GB300 production cluster, providing the supercomputing engine needed for OpenAI to serve multitrillion-parameter models. This sets the definitive new standard for accelerated computing". Nidhi Chappell, corporate vice president of Microsoft Azure AI Infrastructure, said: "Delivering the industry's first at-scale NVIDIA GB300 NVL72 production cluster for frontier AI is an achievement that goes beyond powerful silicon - it reflects Microsoft Azure and NVIDIA's shared commitment to optimize all parts of the modern AI data center. Our collaboration helps ensure customers like OpenAI can deploy next-generation infrastructure at unprecedented scale and speed".
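Two further figures are implied by the spec list above and can be derived from its own numbers. A sketch; note that the 4-GPUs-per-VM split is inferred from the rack counts, not taken from an official spec sheet:

```python
# Deriving two figures implied by the published per-rack numbers
# (the VM-level split is inferred, not an official spec).

GPUS_PER_RACK = 72
VMS_PER_RACK = 18
SCALE_OUT_GBITS_PER_GPU = 800   # Quantum-X800 InfiniBand bandwidth per GPU

gpus_per_vm = GPUS_PER_RACK // VMS_PER_RACK

# Aggregate cross-rack bandwidth per rack, converted from Gb/s to TB/s:
rack_scale_out_tb_per_s = GPUS_PER_RACK * SCALE_OUT_GBITS_PER_GPU / 8 / 1000

print(gpus_per_vm)              # 4 GPUs per ND GB300 v6 VM
print(rack_scale_out_tb_per_s)  # 7.2 TB/s of aggregate bandwidth leaving each rack
```

In other words, each rack can push roughly 7.2 TB/s into the InfiniBand fabric, which is the scale-out budget the fat-tree network has to carry without blocking.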
[4]

Microsoft Azure Debuts Nvidia GB300 Cluster for OpenAI AI Workloads
Each rack in the cluster houses a total of 72 Nvidia Blackwell Ultra GPUs

Microsoft Azure unveiled the new NDv6 GB300 virtual machine (VM) series on Thursday. Claimed to be the industry's first supercomputing-scale production cluster of the Nvidia GB300 NVL72 systems, it will be made available for OpenAI's "most demanding artificial intelligence (AI) inference workloads." The Redmond-based tech giant says these VMs are optimised for reasoning models, agentic AI systems, and multimodal generative AI workflows. Notably, with the new architecture, Azure has upgraded from the ND GB200 v6 VMs, which were introduced less than a year ago.

Microsoft Azure Upgrades Cloud Computing Stack With New Nvidia Hardware

In a blog post, Microsoft's cloud division, Azure, announced the creation of its new virtual machines. The cluster is powered by more than 4,600 Nvidia Blackwell Ultra GPUs in GB300 NVL72 systems, connected via Nvidia's InfiniBand network. Microsoft claims that the cluster will enable model training in weeks instead of months and deliver high throughput for inference workloads. It is said to support training models with "hundreds of trillions of parameters."

Breaking down the system, the cloud division has implemented a rack-scale architecture, where each rack contains 18 virtual machines with a total of 72 GPUs and 36 Nvidia Grace CPUs. Each GPU can communicate at 800Gb/s across racks via Nvidia's Quantum-X800 InfiniBand, double the per-GPU bandwidth of the GB200 NVL72 generation. Inside each rack, the chips are connected with ultra-fast links that can move 130TB of data per second, and there is a huge 37TB pool of very fast memory to handle massive calculations. Overall, each rack can perform up to 1,440 petaflops (PFLOPS) of AI calculations using FP4 Tensor Cores, making this one of the fastest systems in the world for AI tasks.
Within each rack, NVLink and NVSwitch, special high-speed connections that let GPUs talk to each other extremely quickly, allow 37TB of memory to exchange data at up to 130TB per second. This tight integration means AI models can process larger tasks faster, handle longer sequences of information, and run complex agentic (AI that can make decisions on its own) or multimodal (AI that can process multiple types of data like text, images, and audio together) workloads with minimal delays.

To expand beyond a single rack, Microsoft says, Azure uses a full fat-tree, non-blocking network, a networking design that ensures all racks can communicate without slowdowns, powered by InfiniBand. This allows AI training to scale efficiently across tens of thousands of GPUs while keeping communication delays minimal. By reducing synchronisation overhead (the time GPUs spend waiting for each other), GPUs spend more time computing, helping researchers train massive AI models faster and at lower cost. Azure's co-designed stack combines custom protocols, collective libraries, and in-network computing to ensure the network is reliable and fully utilised.

Additionally, Microsoft's cooling systems use standalone heat exchanger units along with facility cooling to reduce water use. On the software side, the company says it has reengineered its stacks for storage, orchestration, and scheduling.
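The synchronisation-overhead point can be made concrete with a toy model: in a ring all-reduce, each GPU moves roughly 2*(N-1)/N times the gradient size over the wire per step, so the communication time is set almost entirely by per-GPU link bandwidth rather than GPU count. A minimal sketch with hypothetical gradient and bandwidth numbers (nothing here is a published Azure figure):

```python
# Toy model of all-reduce communication time in data-parallel training.
# The 2*(N-1)/N factor is the standard ring all-reduce traffic per GPU;
# the gradient size and effective bandwidth below are hypothetical.

def ring_allreduce_seconds(grad_bytes: float, n_gpus: int, gbytes_per_s: float) -> float:
    """Time for one ring all-reduce, given per-GPU effective bandwidth."""
    traffic = 2 * (n_gpus - 1) / n_gpus * grad_bytes  # bytes on the wire per GPU
    return traffic / (gbytes_per_s * 1e9)

# Hypothetical example: 100 GB of gradients across 4,608 GPUs, with
# 100 GB/s of effective per-GPU bandwidth (an 800 Gb/s line rate).
t = ring_allreduce_seconds(100e9, 4608, 100)
print(f"{t:.2f} s per synchronization step")  # ~2.00 s
```

The takeaway matches the article's point: because per-GPU traffic saturates at about twice the gradient size regardless of N, faster links and in-network reduction, not fewer GPUs, are what shrink the time GPUs spend waiting.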
[5]

Microsoft Azure Gets An Ultra Upgrade With NVIDIA's GB300 "Blackwell Ultra" GPUs: 4,600 GPUs Connected Together to Run Multitrillion-Parameter AI Models
NVIDIA GB300 "Blackwell Ultra" crunches through AI models with hundreds of trillions of parameters within Microsoft's latest Azure platform.

Microsoft's Azure has received the Blackwell Ultra upgrade. The latest large-scale production cluster has been integrated with over 4,600 GPUs based on NVIDIA's GB300 NVL72 architecture, all connected using the next-generation InfiniBand interconnect fabric. This deployment paves the way for Microsoft to scale to hundreds of thousands of Blackwell Ultra GPUs across datacenters around the globe, all tackling one workload: AI.

According to Microsoft, the Azure cluster with NVIDIA GB300 NVL72 "Blackwell Ultra" GPUs can reduce training times from months to weeks and unlocks the way for training models with hundreds of trillions of parameters. NVIDIA also leads the way with its strong inference performance, demonstrated countless times in MLPerf benchmarks and in the most recent InferenceMAX AI tests.

The new Microsoft Azure ND GB300 v6 VMs are optimized for reasoning models, agentic AI systems, and multimodal generative AI workloads. Each rack features a total of 18 VMs and 72 GPUs. The main spec highlights are:

* 72 NVIDIA Blackwell Ultra GPUs with 36 NVIDIA Grace CPUs per rack
* 800 Gb/s of cross-rack scale-out bandwidth per GPU via NVIDIA Quantum-X800 InfiniBand
* 130 TB/s of NVLink bandwidth within the rack
* 37 TB of fast memory
* Up to 1,440 PFLOPS of FP4 Tensor Core performance

At the rack level, NVLink and NVSwitch reduce memory and bandwidth constraints, enabling up to 130TB per second of intra-rack data transfer connecting 37TB of fast memory. Each rack becomes a tightly coupled unit, delivering higher inference throughput at reduced latencies on larger models and longer context windows, empowering agentic and multimodal AI systems to be more responsive and scalable than ever.

To scale beyond the rack, Azure deploys a full fat-tree, non-blocking architecture using NVIDIA Quantum-X800 800 Gb/s InfiniBand, the fastest networking fabric available today.
This ensures that customers can scale up training of ultra-large models efficiently to tens of thousands of GPUs with minimal communication overhead, thus delivering better end-to-end training throughput. Reduced synchronization overhead also translates to maximum utilization of GPUs, which helps researchers iterate faster and at lower cost despite the compute-hungry nature of AI training workloads. Azure's co-engineered stack, including custom protocols, collective libraries, and in-network computing, ensures the network is highly reliable and fully utilized by applications. Features like NVIDIA SHARP accelerate collective operations and double effective bandwidth by performing math in the switch, making large-scale training and inference more efficient and reliable.

Azure's advanced cooling systems use standalone heat exchanger units and facility cooling to minimize water usage while maintaining thermal stability for dense, high-performance clusters like GB300 NVL72. "We also continue to develop and deploy new power distribution models capable of supporting the high energy density and dynamic load balancing required by the ND GB300 v6 VM class of GPU clusters," Microsoft notes.

As per NVIDIA, the Microsoft Azure partnership marks a leadership moment for the United States in the AI race. The newest Azure VMs are now deployed and ready for use by customers.
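The in-network computing mentioned above can be illustrated with a toy software model of SHARP-style aggregation, in which each switch element-wise sums the vectors arriving from its children and forwards a single reduced vector upward. This is a conceptual sketch only; the real SHARP v4 performs these reductions on InfiniBand packets in switch hardware:

```python
# Conceptual model of in-network (SHARP-style) reduction: each switch
# element-wise sums the gradient vectors arriving on its child ports and
# forwards one reduced vector upward, so the fabric carries a single copy
# of the data per tree level instead of one copy per GPU.

from typing import List

def switch_reduce(child_vectors: List[List[float]]) -> List[float]:
    """Element-wise sum of the vectors from all child ports."""
    return [sum(vals) for vals in zip(*child_vectors)]

# Two leaf switches, each aggregating two GPUs, feeding one spine switch:
leaf_a = switch_reduce([[1.0, 2.0], [3.0, 4.0]])  # GPUs 0 and 1
leaf_b = switch_reduce([[5.0, 6.0], [7.0, 8.0]])  # GPUs 2 and 3
spine = switch_reduce([leaf_a, leaf_b])           # reduced inside the fabric

print(spine)  # [16.0, 20.0] -- the fully reduced gradient
```

Because the summation happens inside the tree, each link carries one reduced vector instead of every GPU's copy, which is the mechanism behind the "double effective bandwidth" claim for collective operations.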
[6]

Microsoft Azure deploys world's first NVIDIA GB300 NVL72 supercomputer for OpenAI
Microsoft has announced the launch on Azure of the world's first cluster built on the NVIDIA GB300 NVL72 system, designed for OpenAI's artificial intelligence applications. The platform brings together more than 4,600 NVIDIA Blackwell Ultra graphics processors, interconnected via the Quantum-X800 InfiniBand network, to deliver exceptional computing power. Each rack combines 72 GPUs and 36 Grace CPUs, delivering up to 1.44 exaflops per virtual machine. According to Nidhi Chappell, corporate vice president of Microsoft Azure AI Infrastructure, the project illustrates "the shared commitment of Microsoft and NVIDIA to optimize the entire modern data center." This supercomputer reinforces US leadership in next-generation AI infrastructure.
Microsoft has deployed a groundbreaking supercomputer-scale cluster of NVIDIA GB300 NVL72 systems on its Azure platform, featuring over 4,600 Blackwell Ultra GPUs. This state-of-the-art infrastructure aims to accelerate AI model training and inference for OpenAI's most demanding workloads.
Microsoft has unveiled a revolutionary upgrade to its Azure cloud platform, deploying the world's first large-scale NVIDIA GB300 NVL72 supercomputing cluster [1][2]. This cutting-edge infrastructure, featuring over 4,600 NVIDIA Blackwell Ultra GPUs, is set to redefine the landscape of AI model training and inference [3].
The new Azure NDv6 GB300 VM series boasts impressive specifications [2][3]:

* 72 NVIDIA Blackwell Ultra GPUs and 36 NVIDIA Grace CPUs per rack
* 800 Gb/s of cross-rack bandwidth per GPU via NVIDIA Quantum-X800 InfiniBand
* 130 TB/s of NVLink bandwidth within each rack
* 37 TB of fast memory per rack
* Up to 1,440 PFLOPS of FP4 Tensor Core performance per rack
This powerful configuration allows the cluster to function as a single, unified accelerator, capable of handling massive AI workloads with unprecedented efficiency [1].
Microsoft claims that this new infrastructure can dramatically reduce AI model training times from months to weeks [3]. The cluster is optimized for reasoning models, agentic AI systems, and multimodal generative AI workloads, supporting the training of models with hundreds of trillions of parameters [4].

The deployment of this cluster is the result of a deep partnership between NVIDIA and Microsoft, aimed at building AI infrastructure for the world's most demanding workloads [2]. It's particularly noteworthy that this cluster will be dedicated to OpenAI workloads, furthering the collaboration between these tech giants [1].

To achieve this level of performance, Microsoft had to reimagine every layer of its data center [2]. Key innovations include custom liquid cooling with standalone heat exchangers, new power distribution models, and a reengineered software stack for storage, orchestration, and scheduling [1][5][4].
This deployment marks a significant milestone in the advancement of AI infrastructure. Ian Buck, VP of Hyperscale and High-performance Computing at NVIDIA, stated that this system "sets the definitive new standard for accelerated computing" [3]. As Microsoft aims to scale to hundreds of thousands of NVIDIA Blackwell Ultra GPUs globally, it's poised to unlock new frontiers in AI research and development [5].

The introduction of Microsoft's NVIDIA GB300 NVL72 supercomputing cluster represents a leap forward in AI infrastructure. As this technology scales and evolves, it promises to accelerate AI innovation across various sectors, potentially leading to breakthroughs in fields ranging from scientific research to commercial applications.