7 Sources
[1]
Nvidia unveils new GPU designed for long-context inference | TechCrunch
At the AI Infrastructure Summit on Tuesday, Nvidia announced a new GPU called the Rubin CPX, designed for context windows larger than 1 million tokens. Part of the chip giant's forthcoming Rubin series, the CPX is optimized for processing large sequences of context and is meant to be used as part of a broader "disaggregated inference" infrastructure approach. For users, the result will be better performance on long-context tasks like video generation or software development. Nvidia's relentless development cycle has resulted in enormous profits for the company, which brought in $41.1 billion in data center sales in its most recent quarter. The Rubin CPX is slated to be available at the end of 2026.
[2]
Nvidia unveils AI chips for video, software generation
Sept 9 (Reuters) - Nvidia (NVDA.O) said on Tuesday it would launch a new artificial intelligence chip by the end of next year, designed to handle complex functions such as creating videos and software. The chips, dubbed "Rubin CPX", will be built on Nvidia's next-generation Rubin architecture -- the successor to its latest "Blackwell" technology that marked the company's foray into providing larger processing systems. As AI systems grow more sophisticated, tackling data-heavy tasks such as "vibe coding", or AI-assisted code generation, and video generation, the industry's processing needs are intensifying. AI models can take up to 1 million tokens to process an hour of video content -- a challenging feat for traditional GPUs, the company said. Tokens refer to the units of data processed by an AI model. To remedy this, Nvidia will integrate various steps of the drawn-out processing sequence -- such as video decoding, encoding, and inference, when AI models produce an output -- into its new chip. Investing $100 million in these new systems could help generate $5 billion in token revenue, the company said, as Wall Street increasingly focuses on the return from pouring hundreds of billions of dollars into AI hardware. The race to develop the most sophisticated AI systems has made Nvidia the world's most valuable company, commanding a dominant share of the AI chip market with its pricey, top-of-the-line processors. Reporting by Arsheeya Bajwa in Bengaluru; Editing by Shilpi Majumdar
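The two headline figures above (1 million tokens per hour of video, and $5 billion in token revenue per $100 million invested) can be sanity-checked with simple arithmetic. A minimal sketch follows; the tokens-per-frame and frame-sampling rates are hypothetical illustrations chosen to land near Nvidia's quoted figure, not published tokenizer numbers.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumption (hypothetical): a video tokenizer emitting roughly
# 280 tokens per sampled frame, at 1 frame per second, lands near
# the 1-million-token-per-hour figure Nvidia cites.

def video_tokens(hours: float, tokens_per_frame: int, frames_per_second: float) -> int:
    """Estimate the total tokens needed to represent a video clip."""
    frames = hours * 3600 * frames_per_second
    return int(frames * tokens_per_frame)

# One hour sampled at 1 fps, ~280 tokens/frame -> ~1M tokens.
print(video_tokens(1, 280, 1.0))   # 1008000

# Nvidia's claimed return: $5B in token revenue per $100M invested.
investment = 100e6
token_revenue = 5e9
print(token_revenue / investment)  # 50.0 (a 50x multiple)
```

The 50x multiple computed here matches the "30x to 50x return on investment" range Nvidia quotes elsewhere on this page.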
[3]
Nvidia previews Rubin CPX graphics card for disaggregated inference - SiliconANGLE
Nvidia Corp. today previewed an upcoming chip, the Rubin CPX, that will power artificial intelligence appliances with 8 exaflops of performance. AI inference involves two main steps. First, an AI model analyzes the information on which it will draw to answer the user's prompt. Once the analysis is complete, the algorithm generates its response one token at a time. Today, the two tasks are usually done using the same hardware. Nvidia plans to take a different approach with its future AI systems. Instead of performing both steps of the inference workflow on the same graphics card, it plans to assign each step to a different chip. The company calls this approach disaggregated inference. Nvidia's upcoming Rubin CPX chip is optimized for the initial, so-called context phase of the two-step inference workflow. The company will use it to power a rack-scale system called the Vera Rubin NVL144 CPX. Each appliance will combine 144 Rubin CPX chips with 144 Rubin GPUs, upcoming processors optimized for both phases of the inference workflow. The accelerators will be supported by 36 central processing units. Nvidia says the upcoming system will provide 8 exaflops of computing capacity. One exaflop corresponds to a quintillion, or a million trillion, computing operations per second. That's more than seven times the performance of the top-end GB300 NVL72 appliances currently sold by Nvidia. Under the hood, the Rubin CPX is based on a monolithic die design with 128 gigabytes of integrated GDDR7 memory. Nvidia also included components optimized to run the attention mechanism of large language models. An LLM's attention mechanism enables it to identify and prioritize the most important parts of the text snippet it's processing. According to Nvidia, the Rubin CPX can perform the task three times faster than its current-generation silicon.
"We've tripled down on the attention processing," said Ian Buck, Nvidia's vice president of hyperscale and high-performance computing. The executive detailed that video processing workloads will receive a speed boost as well. The Rubin CPX includes hardware-level support for video encoding and decoding. That's the process of compressing a clip before it's transmitted over the network to save bandwidth and then restoring the original file. According to Nvidia, the Rubin CPX will enable AI models to process prompts with one million tokens' worth of data. That corresponds to tens of thousands of lines of code or one hour of video. In many cases, increasing the amount of data an AI model can consider while generating a prompt response boosts its output quality.
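The two-phase split described above can be sketched in a few lines. This is a minimal illustration of the disaggregated-inference idea, not Nvidia's implementation: the `Request` type, function names, and hardware-pool labels are hypothetical, and real systems would move the KV cache between accelerator pools over a fast interconnect.

```python
# Minimal sketch of "disaggregated inference": the compute-bound
# context (prefill) phase and the bandwidth-bound generation (decode)
# phase run on different accelerator pools instead of sharing one GPU.
# All names here are hypothetical illustrations.

from dataclasses import dataclass, field

@dataclass
class Request:
    prompt_tokens: int
    kv_cache: list = field(default_factory=list)
    output: list = field(default_factory=list)

def context_phase(req: Request) -> Request:
    """Would run on context-optimized hardware (a CPX-class pool):
    process the entire prompt once and build the KV cache."""
    req.kv_cache = [f"kv_{i}" for i in range(req.prompt_tokens)]
    return req

def generation_phase(req: Request, max_new_tokens: int) -> Request:
    """Would run on generation-optimized hardware: emit tokens one
    at a time, reading the KV cache built in the context phase."""
    for i in range(max_new_tokens):
        req.output.append(f"token_{i}")
    return req

req = generation_phase(context_phase(Request(prompt_tokens=5)), max_new_tokens=3)
print(len(req.kv_cache), len(req.output))  # 5 3
```

The point of the split is that the context phase is dominated by raw compute over the whole prompt, while the generation phase is dominated by memory bandwidth for repeated KV-cache reads, so each phase benefits from differently balanced silicon.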
[4]
NVIDIA Launches Rubin CPX GPU for Million-Token AI Workloads
The semiconductor giant also introduced its AI Factory reference designs along with MLPerf Inference v5.1 results showing record performance for its Blackwell Ultra GPUs. NVIDIA has introduced Rubin CPX, a new class of GPU designed to process massive AI workloads such as million-token coding and long-form video applications. The launch took place at the AI Infra Summit in Santa Clara, where the company also shared new benchmark results from its Blackwell Ultra architecture. The system is scheduled for availability at the end of 2026. According to NVIDIA, every $100 million invested in Rubin CPX infrastructure could generate up to $5 billion in token revenue. "Just as RTX revolutionised graphics and physical AI, Rubin CPX is the first CUDA GPU purpose-built for massive-context AI, where models reason across millions of tokens of knowledge at once," NVIDIA chief Jensen Huang said.
What Rubin CPX Offers
Rubin CPX accelerates attention mechanisms.
[5]
Nvidia Launches New Chip To Boost AI Coding And Video Tools - NVIDIA (NASDAQ:NVDA)
On Tuesday, Nvidia (NVDA) unveiled the Rubin CPX (Core Partitioned X-celerator) GPU, a new processor class built for massive-context artificial intelligence workloads such as million-token coding and generative video, at the AI Infra Summit. The chip integrates long-context inference, video processing, and NVFP4 computing power to deliver faster, more efficient performance for next-generation AI applications. The Rubin CPX works with Nvidia Vera CPUs and Rubin GPUs inside the Vera Rubin NVL144 CPX platform, which delivers 7.5x more AI performance than prior systems, 100TB of memory, and 1.7 petabytes per second of bandwidth in a single rack. Rubin CPX seeks to transform coding assistants, accelerate generative video, and unlock new opportunities for AI developers and creators by enabling large-scale context reasoning. Nvidia's stock has gained 25% year-to-date, topping the NASDAQ Composite Index's 13% return. Analysts set a consensus price forecast of $209.57 for Nvidia based on 37 ratings. The three most recent ratings came from Citigroup on August 28, 2025, JP Morgan on September 4, 2025, and Citigroup again on September 8, 2025; their average forecast of $208.33 implies a 24.32% upside for Nvidia shares. However, Citi analyst Atif Malik cut Nvidia's price forecast to $210 from $220, warning that rising AI chip competition will weigh on future sales. He cited Broadcom's (AVGO) $10 billion custom chip order and growing traction for Alphabet's (GOOGL) Google Tensor Processing Units (TPUs) as key risks, predicting Nvidia's 2026 GPU sales could fall about 4% below prior estimates. Malik flagged Nvidia's reliance on its top two clients for 39% of quarterly revenue as another concern, while noting that Broadcom's XPU momentum with partners like Meta Platforms (META), OpenAI, and Oracle (ORCL) could further erode Nvidia's dominance.
Price Action: NVDA stock is trading lower by 0.15% to $168.05 at last check on Tuesday.
[6]
NVIDIA Rubin CPX GPU Is Designed For Super AI Tasks Including Million-Token Coding & GenAI, Up To 128 GB GDDR7 Memory, 30 PFLOPs of FP4
NVIDIA is unveiling new details of its next-gen Rubin AI platform, which will feature Vera CPUs alongside a new Rubin CPX chip with up to 128 GB of GDDR7 memory.
NVIDIA Rubin AI Platform Doubles Down On AI With Groundbreaking Speed & Efficiency, Rubin CPX GPUs Offer Up To 128 GB GDDR7 Memory
NVIDIA has already disclosed a lot of information about its next-gen Rubin AI platform and has even teased its next-next-gen Feynman platform. Today, NVIDIA is providing additional information on its Rubin GPUs and the respective platform, which will feature a range of new technologies such as Vera CPUs and ConnectX-9 SuperNICs. NVIDIA today announced NVIDIA Rubin CPX, a new class of GPU purpose-built for massive-context processing. This enables AI systems to handle million-token software coding and generative video with groundbreaking speed and efficiency. Rubin CPX works hand in hand with NVIDIA Vera CPUs and Rubin GPUs inside the new NVIDIA Vera Rubin NVL144 CPX platform. This integrated NVIDIA MGX system packs 8 exaflops of AI compute to provide 7.5x more AI performance than NVIDIA GB300 NVL72 systems, as well as 100TB of fast memory and 1.7 petabytes per second of memory bandwidth in a single rack. A dedicated Rubin CPX compute tray will also be offered for customers looking to reuse existing Vera Rubin 144 systems. NVIDIA Rubin CPX enables the highest performance and token revenue for long-context processing -- far beyond what today's systems were designed to handle. This transforms AI coding assistants from simple code-generation tools into sophisticated systems that can comprehend and optimize large-scale software projects. To process video, AI models can take up to 1 million tokens for an hour of content, pushing the limits of traditional GPU compute.
Rubin CPX integrates video decoders and encoders, as well as long-context inference processing, in a single chip for unprecedented capabilities in long-format applications such as video search and high-quality generative video. Built on the NVIDIA Rubin architecture, the Rubin CPX GPU uses a cost-efficient, monolithic die design packed with powerful NVFP4 computing resources and is optimized to deliver extremely high performance and energy efficiency for AI inference tasks. The brand-new addition to the Rubin family is a new class of GPUs purpose-built for AI tasks such as million-token software coding and GenAI, which NVIDIA says will deliver "groundbreaking" speed and efficiency. The NVIDIA Rubin CPX chips will sit alongside NVIDIA's next-gen Vera CPUs, the successor to the Grace CPU, inside the Vera Rubin NVL144 CPX platform. This is an MGX system that offers up to 8 exaflops of AI compute, a 7.5x uplift over the Grace Blackwell GB300 NVL72 platform. The system will also offer 100 TB of fast memory, 1.7 petabytes per second of memory bandwidth, and 3x higher attention performance than GB300 NVL72. Some features of the NVIDIA Vera Rubin CPX platform versus the Grace Blackwell platform:
* 7.5x higher AI compute (8 exaflops NVFP4)
* 3.0x higher bandwidth (1.7 PB/s)
* 4.0x higher memory (150 TB in GDDR7)
As for each chip, the NVIDIA Rubin CPX GPU will offer 30 PFLOPs of NVFP4 AI compute and pack up to 128 GB of GDDR7 memory. GDDR7 memory on a data center platform is an interesting choice; NVIDIA says it chose GDDR7 over HBM for Rubin CPX because of its cost efficiency. The chips also come with 4x the NVENC and NVDEC capabilities, and these expanded video capabilities will help greatly in GenAI tasks. NVIDIA expects the first Rubin CPX systems to be available by the end of 2026, while Vera Rubin itself is expected to enter production soon, with a launch planned by GTC 2026.
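The rack-level numbers quoted above can be cross-checked against the per-chip figure. A small sketch under stated assumptions: it takes the published per-CPX 30 PFLOPs NVFP4 figure and the 7.5x uplift claim at face value and derives the implied contributions; the remainder of the rack's 8 exaflops would come from the 144 Rubin GPUs, whose per-chip figure is not given here.

```python
# Cross-checking the rack-level numbers quoted above.
PFLOP = 1e15
EXAFLOP = 1e18

cpx_per_rack = 144          # Rubin CPX GPUs per Vera Rubin NVL144 CPX rack
cpx_nvfp4_pflops = 30       # NVFP4 PFLOPs per Rubin CPX GPU, as quoted
rack_total_exaflops = 8     # total NVFP4 compute per rack, as quoted
uplift_vs_gb300 = 7.5       # claimed uplift over GB300 NVL72

# CPX chips alone contribute 144 x 30 PFLOPs = 4.32 exaflops;
# the rest comes from the 144 Rubin GPUs in the same rack.
cpx_contribution = cpx_per_rack * cpx_nvfp4_pflops * PFLOP / EXAFLOP
print(round(cpx_contribution, 2))                       # 4.32

# The 7.5x claim implies a GB300 NVL72 baseline of ~1.07 exaflops NVFP4.
print(round(rack_total_exaflops / uplift_vs_gb300, 2))  # 1.07
```

Note that the numbers are internally consistent only if the Rubin GPUs supply the remaining ~3.7 exaflops, which the sources above do not break out explicitly.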
[7]
NVIDIA Unveils Its Newest 'Rubin CPX' AI GPUs, Featuring 128 GB GDDR7 Memory & Targeted Towards High-Value Inference Workloads
NVIDIA has surprisingly unveiled a new class of AI GPUs, featuring the Rubin CPX AI chip, which offers immense inferencing power when combined into a rack-scale cluster. NVIDIA's Rubin CPX GPU will be available in a rack-scale configuration, scaling to new performance levels. Team Green has realized that AI inferencing is probably the next place to focus its computing capabilities, and the firm has now announced a new class of AI chips under the 'CPX' lineup, with the initial debut coming in the Rubin series. Announced at the AI Infra Summit, the Rubin CPX GPU is targeted towards long-context AI and, more importantly, will co-exist alongside Rubin GPUs and Vera CPUs. NVIDIA claims that the chip will bring a 'revolution' in performing AI inference efficiently. In terms of specifications, the Rubin CPX features 30 petaFLOPs of NVFP4 compute and 128 GB of GDDR7 memory, and will feature in the NVIDIA Vera Rubin NVL144 CPX rack, which integrates 144 Rubin CPX GPUs, 144 Rubin GPUs, and 36 Vera CPUs to deliver eight exaFLOPs of NVFP4 compute. This figure alone is 7.5x higher than Blackwell Ultra, and with technologies such as Spectrum-X Ethernet, NVIDIA plans to deliver million-token-context AI inference workloads, scaling to new levels of performance. The platform is claimed to deliver a "30x to 50x return on investment", and the Vera Rubin NVL144 CPX rack will break the computing barriers involved in "building the next generation of generative AI applications". Rubin CPX will also be available in other configurations, which are yet to be announced; the chip is seen as a relatively low-cost solution given its integration of GDDR7 memory rather than HBM. Team Green is covering all corners of the AI industry, leaving competitors little room to outpace it.
NVIDIA has now swiftly shifted its focus towards inferencing, and with the next-gen Rubin AI lineup arriving next year, we can expect a huge leap in computing capabilities.
Nvidia announces the Rubin CPX, a new GPU designed for processing AI workloads with over 1 million tokens. Set for release in late 2026, this chip promises to revolutionize video generation, software development, and other long-context AI tasks.
Nvidia has unveiled the Rubin CPX GPU, engineered for AI workloads demanding context windows exceeding 1 million tokens. This next-generation chip, part of the new Rubin architecture, represents a significant leap for long-context inference, crucial for evolving AI applications like video generation and complex software development [1]. Source: Benzinga
The Rubin CPX features a monolithic die with 128GB of GDDR7 memory, components optimized for large language model attention mechanisms, and hardware support for video encoding and decoding. Key performance gains include 3x faster attention processing and the ability to handle one million tokens of data [3]. Nvidia's "disaggregated inference" approach further optimizes AI processing by separating input analysis from response generation, with the CPX focusing on the initial context phase [3]. Source: Analytics India Magazine
This innovation aims to revolutionize AI-driven tasks, with Nvidia projecting that a $100 million investment in Rubin CPX infrastructure could generate up to $5 billion in token revenue [2]. Despite Nvidia's market dominance, competition from Broadcom and Google's TPUs is rising [5]. The Rubin CPX, slated for release in late 2026 as part of the Vera Rubin NVL144 CPX system, promises 8 exaflops of computing capacity, more than seven times that of current top-end systems, to meet the escalating demands of sophisticated AI applications [3]. Source: SiliconANGLE
Summarized by Navi