Nvidia unveils Vera Rubin platform with seven chips to power agentic AI infrastructure

Reviewed by Nidhi Govil


Nvidia announced the Vera Rubin platform at GTC 2026, integrating seven chips including the newly acquired Groq 3 LPU to accelerate AI inference. The platform promises 10x higher inference throughput per watt at one-tenth the cost per token compared to Blackwell systems. OpenAI, Anthropic, and major cloud providers have committed to deploying the infrastructure.

Nvidia Launches Seven-Chip AI Platform for Next-Generation Computing

Nvidia unveiled its Vera Rubin platform at GTC 2026, marking what CEO Jensen Huang described as a generational leap in AI infrastructure designed to power the shift toward agentic AI [1]. The platform brings together seven chips now in full production: the Rubin GPU, Vera CPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, Spectrum-6 Ethernet switch, and the newly integrated Groq 3 LPU [3]. These components work together as a unified supercomputer designed to handle every phase of AI development, from massive-scale pretraining and post-training to real-time agentic inference.

Source: Wccftech


The announcement comes with backing from major AI companies including OpenAI, Anthropic, and Meta, along with commitments from Amazon Web Services, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure to offer the platform [4]. Sam Altman, CEO of OpenAI, stated that "with Nvidia Vera Rubin, we'll run more powerful models and agents at massive scale and deliver faster, more reliable systems to hundreds of millions of people" [3]. Dario Amodei, CEO of Anthropic, emphasized that the platform provides the compute, networking, and system design needed to advance safety and reliability for increasingly complex reasoning and agentic workflows [4].

Groq 3 LPU Delivers Low-Latency Inference Through SRAM Architecture

The Groq 3 LPU represents a significant addition to Nvidia's arsenal, addressing the growing demand for low-latency inference in trillion-parameter large language models [1]. Unlike traditional AI accelerators that rely on HBM, each Groq 3 LPU incorporates 500 MB of SRAM, delivering 150 TB/s of bandwidth compared to the 22 TB/s offered by HBM4 on Rubin GPUs [1]. This massive bandwidth advantage makes the chip ideal for bandwidth-sensitive AI decode operations.
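The size of that advantage can be sanity-checked with quick arithmetic. The per-chip bandwidth figures below are taken from the article; the ratio is our own calculation, not a number Nvidia published:

```python
# Per-chip memory-bandwidth figures cited in the article (TB/s).
LPU_SRAM_TBS = 150   # Groq 3 LPU: 500 MB on-chip SRAM
GPU_HBM4_TBS = 22    # Rubin GPU: HBM4

ratio = LPU_SRAM_TBS / GPU_HBM4_TBS
print(f"per-chip bandwidth advantage: ~{ratio:.1f}x")  # ~6.8x
```

Roughly a 7x per-chip edge, which is what makes SRAM attractive for decode, where every generated token re-reads the model's weights.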

Source: The Register


Nvidia will deploy these chips in Groq 3 LPX racks comprising 256 Groq 3 LPUs, offering 128 GB of SRAM with 40 PB/s of bandwidth for inference acceleration and connecting the chips with a dedicated 640 TB/s scale-up interface per rack [1]. When deployed alongside Vera Rubin NVL72 systems, the LPUs function as decode accelerators while Rubin GPUs handle compute-intensive prefill processing [2]. Ian Buck, VP of Hyperscale and HPC at Nvidia, explained that the LPU boosts decode performance at "every layer of the AI model on every token" [1]. The combined platform delivers up to 35x higher inference throughput per megawatt and up to 10x more revenue opportunity for trillion-parameter models [3].
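The rack-level figures follow directly from the per-chip numbers quoted earlier. A quick check (our arithmetic, assuming simple multiplication across 256 chips) suggests the cited "40 PB/s" is a round-up of the exact aggregate:

```python
# Per-chip figures from the article, scaled to one Groq 3 LPX rack.
CHIPS_PER_RACK = 256
SRAM_PER_CHIP_MB = 500
BW_PER_CHIP_TBS = 150

total_sram_gb = CHIPS_PER_RACK * SRAM_PER_CHIP_MB / 1000  # 128.0 GB, matching the article
total_bw_pbs = CHIPS_PER_RACK * BW_PER_CHIP_TBS / 1000    # 38.4 PB/s, rounded to "40 PB/s"
print(total_sram_gb, total_bw_pbs)
```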

Vera Rubin NVL72 Rack Transforms Training and Inference Economics

The flagship Vera Rubin NVL72 rack integrates 72 Rubin GPUs and 36 Vera CPUs connected by NVLink 6, along with ConnectX-9 SuperNICs and BlueField-4 DPUs [3]. This rack-scale system trains large mixture-of-experts models with one-fourth the number of GPUs required by the Blackwell platform while achieving up to 10x higher inference throughput per watt at one-tenth the cost per token [3]. The system scales with Quantum-X800 InfiniBand and Spectrum-X Ethernet to sustain high utilization across massive GPU clusters [3].

Source: NVIDIA


For reinforcement learning and agentic AI workloads, Nvidia introduced the Vera CPU Rack, which packs 256 Vera CPUs into a single liquid-cooled cluster [5]. These racks provide scalable infrastructure for the CPU-based environments needed to test and validate results generated by GPU systems, delivering results twice as efficiently and 50% faster than traditional CPUs [3]. The platform also includes the BlueField-4 STX storage rack, which acts as "context memory" for maintaining coherence during massive multi-turn interactions, potentially increasing inference throughput by up to five times [5].

Strategic Shift Toward AI Factories and Multi-Agent Systems

Nvidia positions the Vera Rubin platform as essential infrastructure for the emerging era of multi-agent systems, where AI agents communicate with each other rather than solely serving human users through chatbot interfaces [1]. Buck explained that a rate of 100 tokens per second, reasonable for human interaction, becomes glacial for agent-to-agent communication; the combination of Rubin GPUs and Groq LPUs enables throughput of 1,500 tokens per second or more [1].
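Converting those rates to per-token latency makes the gap concrete. The two rates are from the article; the millisecond figures are our conversion:

```python
# Tokens-per-second figures from the article, expressed as per-token latency.
HUMAN_RATE = 100    # tokens/s: comfortable for a human chat user
AGENT_RATE = 1500   # tokens/s: claimed for Rubin GPU + Groq LPU decode

print(f"{1000 / HUMAN_RATE:.1f} ms vs {1000 / AGENT_RATE:.2f} ms per token")
# 10.0 ms vs 0.67 ms per token
```

At sub-millisecond per-token latency, a chain of agents exchanging long intermediate outputs spends far less wall-clock time waiting on each other.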

The platform represents Nvidia's strategic evolution from selling discrete chips and standalone servers toward fully integrated rack-scale systems, POD-scale deployments, and complete AI factories [3]. This shift addresses the infrastructure buildout Jensen Huang characterized as historic, supported by an ecosystem of more than 80 MGX partners with global supply chain capabilities [3]. The addition of Groq technology, acquired for $20 billion, helps Nvidia compete in low-latency inference, where companies like Cerebras have challenged its GPU-centric approach [1]. Buck indicated that the Groq 3 LPU integration may reduce the role of the previously announced Rubin CPX inference accelerator: both chips target similar workloads, but the Groq LPU does not require the large amounts of GDDR7 memory that Rubin CPX modules need [1]. With inference providers potentially charging as much as $45 per million tokens generated using this technology, compared with roughly $15 per million output tokens for current top-tier models, the economic implications for the AI industry are substantial [2].
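The implied price premium is simple to quantify. Both prices are from the article; the multiple is our calculation:

```python
# Per-million-token prices mentioned in the article (USD).
SPECULATED_LPU_PRICE = 45.0   # $/M tokens, speculated for LPU-accelerated serving
CURRENT_TOP_TIER = 15.0       # $/M output tokens for today's top-tier models

print(f"~{SPECULATED_LPU_PRICE / CURRENT_TOP_TIER:.0f}x price premium")  # ~3x
```

A roughly 3x premium for low-latency output is what underpins the "10x more revenue opportunity" framing, once throughput gains are factored in.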


TheOutpost.ai


© 2026 Triveous Technologies Private Limited