NVIDIA and Google Cloud unveil massive AI infrastructure scaling to nearly 1 million GPUs


NVIDIA and Google Cloud announced an expanded AI collaboration at Google Cloud Next in Las Vegas, introducing next-generation infrastructure for agentic AI. The partnership brings NVIDIA Vera Rubin-powered A5X instances that can scale to up to 960,000 GPUs, delivering up to 10x lower inference cost per token and 10x higher token throughput per megawatt than the prior generation.

NVIDIA and Google Cloud Expand Partnership to Advance Agentic AI

NVIDIA and Google Cloud announced a major expansion of their decade-long partnership at Google Cloud Next in Las Vegas this week, unveiling next-generation infrastructure designed to accelerate agentic and physical AI from development into production [1]. The expanded collaboration introduces NVIDIA Vera Rubin-powered A5X bare-metal instances capable of scaling to 80,000 NVIDIA Rubin GPUs within a single-site cluster and up to 960,000 NVIDIA Rubin GPUs in a multisite cluster [3]. This infrastructure represents a significant leap in computing capacity, enabling customers to run their largest AI workloads on performance-optimized systems.

Source: NVIDIA


Next-Generation Infrastructure Delivers 10x Performance Gains

The new A5X instances pair NVIDIA ConnectX-9 SuperNICs with Google's next-generation Virgo networking, delivering up to 10x lower inference cost per token and 10x higher token throughput per megawatt compared with the prior generation [1]. This dramatic improvement stems from extreme codesign across chips, systems and software within the NVIDIA Vera Rubin NVL72 rack-scale systems. Mark Lohmeyer, vice president and general manager of AI and computing infrastructure at Google Cloud, emphasized that "the next decade of AI will be shaped by customers' ability to run their most demanding workloads on a truly integrated, AI-optimized infrastructure stack" [1].

Google Unveils AI Hypercomputer With 8th-Gen TPUs and NVIDIA Blackwell GPUs

Google formally announced its AI Hypercomputer at Google Cloud Next, a high-performance computing datacenter that unifies performance-optimized hardware for compute, storage and networking with open software and ML frameworks [2]. The system houses Google's latest custom TPUv8 series alongside Axion CPUs and NVIDIA Blackwell GPUs. The TPU 8t training chip offers 121 exaflops of FP4 compute per pod, a 2.84x increase over the previous Ironwood generation, while the TPU 8i inference chip delivers 331.8 exaflops per pod, a 6.74x improvement [2].
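As a rough sanity check on the figures above, the generational multipliers can be used to back out the implied per-pod throughput of the prior Ironwood generation. This is an illustrative calculation only, assuming the 2.84x and 6.74x multipliers are measured against Ironwood's equivalent per-pod FP4 figures (the article does not state Ironwood's numbers directly):

```python
# Back out the implied prior-generation (Ironwood) per-pod throughput
# from the quoted TPUv8 figures and generational multipliers.
# Assumption: multipliers are relative to equivalent FP4 per-pod compute.
tpu8t_exaflops = 121.0    # TPU 8t training pod, FP4
tpu8i_exaflops = 331.8    # TPU 8i inference pod, FP4

implied_ironwood_train = tpu8t_exaflops / 2.84
implied_ironwood_infer = tpu8i_exaflops / 6.74

print(f"Implied Ironwood training pod:  {implied_ironwood_train:.1f} exaflops")
print(f"Implied Ironwood inference pod: {implied_ironwood_infer:.1f} exaflops")
```

Both divisions land in the mid-40s of exaflops per pod, which is at least internally consistent between the training and inference claims.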

Source: Wccftech


OpenAI and Leading Labs Deploy on New Infrastructure

OpenAI is already running large-scale inference workloads for ChatGPT and other demanding applications on NVIDIA GB300 and GB200 NVL72 systems on Google Cloud [3]. Thinking Machines Lab is scaling its Tinker API on A4X Max VMs with GB300 NVL72 systems to accelerate training [1]. Google Cloud's comprehensive NVIDIA Blackwell portfolio ranges from A4 VMs with NVIDIA HGX B200 systems to rack-scale A4X VMs with NVIDIA GB200 NVL72 systems and A4X Max VMs with NVIDIA GB300 NVL72 systems, extending to fractional G4 VMs with NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs [1].

Secure AI With Confidential Computing and Google Gemini Models

Google Gemini models running on NVIDIA Blackwell and Blackwell Ultra GPUs are now in preview on Google Distributed Cloud, enabling customers to deploy frontier models wherever sensitive data resides [1]. The companies also introduced Confidential G4 VMs with NVIDIA RTX PRO 6000 Blackwell GPUs, marking the first confidential-computing offering of NVIDIA Blackwell GPUs in the cloud [3]. This allows Google Gemini to run in protected environments where prompts and fine-tuning data stay encrypted, helping regulated industries adopt AI without compromising security or performance.

Agentic AI Tools and Physical AI Applications

The partnership extends to agentic AI development, with NVIDIA Nemotron 3 Super now available on the Gemini Enterprise Agent Platform [3]. Google Cloud and NVIDIA also introduced a managed reinforcement learning API built with NVIDIA NeMo RL to accelerate training at scale. CrowdStrike uses NVIDIA NeMo open libraries to generate synthetic data and fine-tune Nemotron for cybersecurity applications on Managed Training Clusters [3]. For physical AI, NVIDIA Omniverse libraries and the NVIDIA Isaac Sim robotics simulation framework are now available on Google Cloud Marketplace, enabling development of digital twins and robotics applications [3]. Major customers including the US Department of Energy, Boston Dynamics, Citadel Securities and Axia Energy are already using the AI Hypercomputer for their training clusters and inference workloads [2].
