Nvidia Unveils Rubin CPX: A New GPU Optimized for Long-Context AI Inference

Reviewed by Nidhi Govil


Nvidia announces the Rubin CPX, a GPU designed for long-context AI workloads, featuring 128GB of GDDR7 memory and 30 petaFLOPs of NVFP4 compute power. This new chip is part of Nvidia's 'disaggregated inference' strategy, aimed at improving AI performance for tasks like video generation and software development.

Nvidia Introduces Rubin CPX: A New Era in AI Inference

Nvidia, the leading GPU manufacturer, has unveiled its latest innovation in AI hardware: the Rubin CPX GPU. Announced at the AI Infrastructure Summit, the new chip is designed specifically for long-context AI workloads, marking a significant advance in AI processing [1].

Source: Benzinga


Technical Specifications and Performance

The Rubin CPX boasts impressive specifications:

  • 30 petaFLOPs of NVFP4 compute power
  • 128GB of GDDR7 memory
  • Hardware-accelerated attention, 3x faster at attention processing than the GB300 NVL72
  • Four NVENC and four NVDEC units for video acceleration [2][5]
Source: Analytics India Magazine


Disaggregated Inference: A New Approach

Nvidia's Rubin CPX is part of a broader strategy called 'disaggregated inference'. This approach splits an AI inference workload into two phases, each served by hardware matched to its bottleneck:

  1. Context (prefill) phase: compute-bound, handled by compute-optimized GPUs like the Rubin CPX
  2. Generation (decode) phase: memory-bandwidth-bound, managed by bandwidth-optimized GPUs like the standard Rubin [2]
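The two-phase split above can be sketched as a toy routing loop. The class and method names below are illustrative, not Nvidia's API; the point is simply that the one-shot, compute-heavy prefill pass and the repeated, cache-reading decode steps go to different device pools:

```python
# Toy model of disaggregated inference: prefill (context) work goes to a
# compute-optimized pool, decode (generation) to a bandwidth-optimized pool.
# All names here are illustrative stand-ins, not Nvidia's API.

class PrefillPool:
    """Stands in for compute-optimized GPUs (Rubin-CPX-class)."""
    def prefill(self, prompt_tokens):
        # One compute-heavy pass over the full prompt builds the KV cache.
        return {"kv": list(prompt_tokens)}

class DecodePool:
    """Stands in for memory-bandwidth-optimized GPUs (standard Rubin)."""
    def decode_step(self, kv_cache):
        # Each step re-reads the whole cache (bandwidth-bound) and emits a token.
        token = len(kv_cache["kv"])  # dummy "next token"
        kv_cache["kv"].append(token)
        return token, kv_cache

def run_inference(prompt_tokens, max_new_tokens, prefill_pool, decode_pool):
    kv_cache = prefill_pool.prefill(prompt_tokens)   # context phase
    output = []
    for _ in range(max_new_tokens):                  # generation phase
        token, kv_cache = decode_pool.decode_step(kv_cache)
        output.append(token)
    return output

tokens = run_inference([1, 2, 3], 4, PrefillPool(), DecodePool())
print(tokens)  # [3, 4, 5, 6]
```

In a real deployment the KV cache built during prefill must also be transferred between the two pools; that interconnect cost is part of what rack-scale designs like the NVL144 CPX are meant to absorb.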

This strategy aims to improve efficiency and performance for AI tasks that require extensive context processing, such as video generation and software development [3].

Applications and Industry Impact

The Rubin CPX is designed to excel in scenarios where AI models must process massive amounts of context:

  • Video generation: processing up to 1 million tokens for an hour of video content
  • Code generation: handling large codebases for AI-assisted programming
  • Research and high-definition video workflows [4]

Nvidia claims that a $100 million investment in Rubin CPX-based systems could generate $5 billion in token revenue, underscoring the economic stakes it attaches to the technology [4].
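Taken at face value, that claim is a simple ratio. A quick sanity check (the figures are Nvidia's projections, not measured results):

```python
# Back-of-envelope check of Nvidia's projected return. Both figures are
# Nvidia's marketing projections, not measured results.
capex = 100e6            # $100 million invested in Rubin CPX systems
projected_revenue = 5e9  # $5 billion in projected token revenue

multiple = projected_revenue / capex
print(multiple)  # 50.0 — i.e. a claimed 50x revenue-to-investment ratio
```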

Deployment and Future Roadmap

The Rubin CPX will be available as part of Nvidia's Vera Rubin NVL144 CPX rack, which includes:

  • 144 Rubin CPX GPUs
  • 144 standard Rubin GPUs
  • 36 Vera CPUs
  • 100TB of high-speed memory
  • 1.7PB/s of memory bandwidth [5]

The entire system is capable of delivering 8 exaFLOPs of NVFP4 compute power. Shipments are expected to begin in late 2026 [1][5].
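The rack-level figure can be cross-checked against the per-GPU numbers: 144 Rubin CPX GPUs at 30 petaFLOPs each account for 4.32 exaFLOPs, so the stated 8-exaFLOP total implies the remainder comes from the 144 standard Rubin GPUs (that per-GPU remainder is an inference from the published totals, not a stated spec):

```python
# Cross-check the NVL144 CPX rack's 8-exaFLOP NVFP4 figure against per-GPU specs.
cpx_count, cpx_pflops = 144, 30   # Rubin CPX: 30 PFLOPs NVFP4 each (stated)
rack_total_eflops = 8             # stated rack total

cpx_eflops = cpx_count * cpx_pflops / 1000
print(cpx_eflops)                 # 4.32 EF from the CPX GPUs alone

# Remainder implied for the 144 standard Rubin GPUs (inferred, not a spec):
rubin_eflops = rack_total_eflops - cpx_eflops
print(round(rubin_eflops * 1000 / 144, 1))  # ~25.6 PFLOPs per standard Rubin
```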

Source: The Register


Looking ahead, Nvidia's roadmap includes:

  • Rubin Ultra: Expected in 2027
  • Feynman: Slated for 2028

These future iterations promise even higher-density modules, HBM4E memory, and faster networking [5].

TheOutpost.ai

© 2025 Triveous Technologies Private Limited