26 Sources
[1]
Nvidia launches powerful new Rubin chip architecture | TechCrunch
Today at the Consumer Electronics Show, Nvidia CEO Jensen Huang officially launched the company's new Rubin computing architecture, which he described as the state of the art in AI hardware. The new architecture is currently in production and is expected to ramp up further in the second half of the year. "Vera Rubin is designed to address this fundamental challenge that we have: The amount of computation necessary for AI is skyrocketing," Huang told the audience. "Today, I can tell you that Vera Rubin is in full production." The Rubin architecture, which was first announced in 2024, is the latest result of Nvidia's relentless hardware development cycle, which has transformed Nvidia into the most valuable corporation in the world. The Rubin architecture will replace the Blackwell architecture, which, in turn, replaced the Hopper and Lovelace architectures. Rubin chips are already slated for use by nearly every major cloud provider, including high-profile Nvidia partnerships with Anthropic, OpenAI, and Amazon Web Services. Rubin systems will also be used in HPE's Blue Lion supercomputer and the upcoming Doudna supercomputer at Lawrence Berkeley National Lab. Named for the astronomer Vera Florence Cooper Rubin, the Rubin architecture consists of six separate chips designed to be used in concert. The Rubin GPU stands at the center, but the architecture also addresses growing bottlenecks in storage and interconnection with new improvements in the Bluefield and NVLink systems, respectively. The architecture also includes a new Vera CPU, designed for agentic reasoning. Explaining the benefits of the new storage, Nvidia's senior director of AI infrastructure solutions Dion Harris pointed to the growing cache-related memory demands of modern AI systems. "As you start to enable new types of workflows, like agentic AI or long-term tasks, that puts a lot of stress and requirements on your KV cache," Harris told reporters on a call, referring to a memory system used by AI models to condense inputs. "So we've introduced a new tier of storage that connects externally to the compute device, which allows you to scale your storage pool much more efficiently." As expected, the new architecture also represents a significant advance in speed and power efficiency. According to Nvidia's tests, the Rubin architecture will operate three and a half times faster than the previous Blackwell architecture on model-training tasks and five times faster on inference tasks, reaching as high as 50 petaflops. The new platform will also support eight times more inference compute per watt. Rubin's new capabilities come amid intense competition to build AI infrastructure, which has seen both AI labs and cloud providers scramble for Nvidia chips as well as the facilities necessary to power them. On an earnings call in October 2025, Huang estimated that between $3 trillion and $4 trillion will be spent on AI infrastructure over the next five years.
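Harris's KV-cache point is easy to put in concrete terms. The back-of-the-envelope sketch below (Python; the model dimensions are illustrative assumptions for a generic transformer, not figures from Nvidia or any vendor) shows why long contexts strain GPU memory:

```python
# Back-of-the-envelope KV-cache sizing. All model dimensions here are
# illustrative assumptions for a generic transformer, not Nvidia figures.

def kv_cache_bytes(layers, kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Bytes of KV cache for one sequence: two tensors (K and V) per layer,
    each of shape [kv_heads, context_tokens, head_dim]; the default of
    2 bytes per value assumes FP16/BF16 storage."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value

# Hypothetical frontier-scale model: 80 layers, 8 KV heads (grouped-query
# attention), head dimension 128.
for tokens in (8_000, 100_000, 1_000_000):
    gib = kv_cache_bytes(80, 8, 128, tokens) / 2**30
    print(f"{tokens:>9,} tokens -> {gib:7.1f} GiB per sequence")
```

At 100,000 tokens this hypothetical model already needs roughly 30 GiB of cache per sequence, and at a million tokens the cache alone would outgrow the 288 GB of HBM4 on a single Rubin GPU, which is the pressure an external, rack-shared storage tier is meant to relieve.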
[2]
Jensen Huang Says Nvidia's New Vera Rubin Chips Are in 'Full Production'
Nvidia CEO Jensen Huang says that the company's next-generation AI superchip platform, Vera Rubin, is on schedule to begin arriving to customers later this year. "Today, I can tell you that Vera Rubin is in full production," Huang said during a press event on Monday at the annual CES technology trade show in Las Vegas. Rubin will cut the cost of running AI models to about one-tenth of Nvidia's current leading chip system, Blackwell, the company told analysts and journalists during a call on Sunday. Nvidia also said Rubin can train certain large models using roughly one-fourth as many chips as Blackwell requires. Taken together, those gains could make advanced AI systems significantly cheaper to operate and make it harder for Nvidia's customers to justify moving away from its hardware. Nvidia said on the call that two of its existing partners, Microsoft and CoreWeave, will be among the first companies to begin offering services powered by Rubin chips later this year. Two major AI data centers that Microsoft is currently building in Georgia and Wisconsin will eventually include thousands of Rubin chips, Nvidia added. Some of Nvidia's partners have already started running their next-generation AI models on early Rubin systems, the company said. The semiconductor giant also said it's working with Red Hat, which makes open source enterprise software for banks, automakers, airlines, and government agencies, to offer more products that will run on the new Rubin chip system. Nvidia's latest chip platform is named after Vera Rubin, an American astronomer who reshaped how scientists understand the properties of galaxies. The system includes six different chips, including the Rubin GPU and a Vera CPU, both of which are built using Taiwan Semiconductor Manufacturing Company's 3 nanometer fabrication process and the most advanced high-bandwidth memory technology currently available. Nvidia's sixth-generation interconnect and switching technologies link the various chips together. Each part of this chip system is "completely revolutionary and the best of its kind," Huang proclaimed during the company's CES press conference. Nvidia has been developing the Rubin system for years, and Huang first announced the chips were coming during a keynote speech in 2024. Last year, the company said that systems built on Rubin would begin arriving in the second half of 2026. It's unclear exactly what Nvidia means by saying that Vera Rubin is in "full production." Typically, production for chips this advanced -- which Nvidia is building with its longtime partner TSMC -- starts at low volume while the chips go through testing and validation and ramps up at a later stage.
[3]
Nvidia launches Vera Rubin AI computing platform at CES 2026
Nvidia claims the Rubin GPU is capable of delivering five times as much AI training compute as Blackwell. The Vera Rubin architecture as a whole can train a large "mixture of experts" (MoE) AI model in the same amount of time as Blackwell while using a quarter of the GPUs and at one-seventh the token cost. The Rubin launch was originally expected for late this year. Its early arrival today comes just a couple of months after Nvidia reported record high data center revenue, up 66 percent over the prior year. That growth was driven by demand for Blackwell and Blackwell Ultra GPUs, which have set a high bar for Rubin's success and served as a bellwether for the "AI bubble". Products and services running on Rubin will be available from Nvidia's partners starting in the second half of 2026.
[4]
Why Nvidia's new Rubin platform could change the future of AI computing forever
The first platforms will roll out to partners later in the year. The last several years have been stupendous for Nvidia. When generative AI became all the rage, demand for the tech giant's hardware skyrocketed as companies and developers scrambled for its graphics cards to train their large language models (LLMs). During CES 2026, Nvidia held a press conference to unveil its latest innovation in the AI space: the Rubin platform. Nvidia announced what the technology can do, and it's all pretty dense, so to keep things concise, I'm only focusing on the highlights. Rubin is an AI supercomputing platform designed to make "building, deploying, and securing the world's largest and most advanced AI systems at the lowest cost" possible. According to Nvidia, the platform can deliver up to a 10x reduction in inference token costs and requires four times fewer graphics cards to train mixture-of-experts (MoE) models compared to the older Blackwell platform. The easiest way to think about Nvidia Rubin is to imagine Blackwell, but on a much grander scale. The goal with Rubin is to accelerate mainstream adoption of advanced AI models, particularly in the consumer space. One of the biggest hurdles holding back widespread adoption of LLMs is cost. As models grow larger and more complex, the hardware and infrastructure required to train and support the models become astronomically expensive. By sharply reducing those token costs via Rubin, Nvidia hopes to make large-scale AI deployment more practical. Nvidia said that it used an "extreme codesign" approach when developing the Rubin platform, creating a single AI supercomputer made up of six integrated chips. At the center is an Nvidia Vera CPU, an energy-efficient processor for large-scale AI factories, built with 88 custom Olympus cores, full Armv9.2 compatibility, and fast NVLink-C2C connectivity to deliver high performance. Working alongside the CPU is the Nvidia Rubin GPU, serving as the platform's primary workhorse. Sporting a third-generation Transformer Engine, it is capable of delivering up to 50 petaflops of NVFP4 computational power. Connecting everything together is the Nvidia NVLink 6 Switch, enabling ultra-fast GPU-to-GPU communication. Nvidia's ConnectX-9 SuperNIC handles high-speed networking, while the Bluefield-4 DPU offloads some of the workload from the CPU and GPU so they can focus more on AI models. Rounding everything out is the company's Spectrum-6 Ethernet switch to provide next-gen networking for AI data centers. Rubin will be available in multiple configurations, such as the Nvidia Vera Rubin NVL72. This combines 36 Nvidia Vera CPUs, 72 Nvidia Rubin GPUs, an Nvidia NVLink 6 switch, multiple Nvidia ConnectX-9 SuperNICs, and Nvidia BlueField-4 DPUs. Judging from all the news, I don't think these supercomputing platforms will be something that the average person can buy from Best Buy. Nvidia said that the first of these Rubin platforms will roll out to partners sometime in the second half of 2026. Among the first will be Amazon Web Services, Google Cloud, and Microsoft. If Nvidia's gamble pays off, these computers could usher in a new era of AI computing where scale is much more manageable.
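For readers who haven't met mixture-of-experts models before, the toy sketch below (Python with NumPy; all sizes are illustrative assumptions, not taken from any Nvidia or model-vendor spec) shows the core mechanism: a router activates only a few expert networks per token, so total parameter count can grow far faster than per-token compute:

```python
import numpy as np

# Minimal mixture-of-experts routing sketch. Sizes are illustrative only.
rng = np.random.default_rng(0)

d_model, n_experts, top_k = 64, 8, 2
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_layer(x):
    """Route one token through its top-k experts and mix their outputs."""
    logits = x @ router                      # one router score per expert
    top = np.argsort(logits)[-top_k:]        # pick the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the chosen experts
    # Only top_k of the n_experts weight matrices are touched for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
print(moe_layer(token).shape)  # (64,) -- same output shape, ~top_k/n_experts of the FLOPs
```

At data-center scale those experts are sharded across many GPUs, so every routed token triggers cross-GPU traffic; that is the communication pattern the platform's interconnect bandwidth is aimed at.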
[5]
Nvidia's focus on rack-scale AI systems is a portent for the year to come -- Rubin points the way forward for company, as data center business booms
Those who tuned into Nvidia's CES keynote on January 5 may have found themselves waiting for a familiar moment that never arrived. There was no GeForce reveal and no tease of the next RTX generation. For the first time in roughly five years, Nvidia stood on the CES stage without a new GPU announcement to anchor the show. That absence was no accident. Rather than refresh its graphics lineup, Nvidia used CES 2026 to talk about the Vera Rubin platform and launch its flagship NVL72 AI supercomputer, both slated for production in the second half of 2026 -- a reframing of what Nvidia now considers its core product. The company is no longer content to sell accelerators one card at a time; it is selling entire AI systems instead. Vera Rubin is not being positioned as a conventional GPU generation, even though it includes a new GPU architecture. Nvidia describes it as a rack-scale computing platform built from multiple classes of silicon that are designed, validated, and deployed together. At its center are Rubin GPUs and Vera CPUs, joined by NVLink 6 interconnects, BlueField 4 DPUs, and Spectrum 6 Ethernet switches. Each rack integrates 72 Rubin GPUs and 36 Vera CPUs into a single logical system. Nvidia says each Rubin GPU can deliver up to 50 PFLOPS of NVFP4 compute for AI inference using low-precision formats, roughly five times the throughput of its Blackwell predecessor in similar inference workloads. Memory capacity and bandwidth scale accordingly, with HBM4 pushing hundreds of gigabytes per GPU and aggregate rack bandwidth measured in the hundreds of terabytes per second. These monolithic Vera Rubin systems are designed to reduce the cost of inference by an order of magnitude compared with Blackwell-based deployments. That claim rests on several pillars: higher utilization through tighter coupling, reduced communication overhead via NVLink 6, and architectural changes that target the realities of large language models rather than traditional HPC workloads. One of those changes is how Nvidia handles model context. BlueField 4 DPUs introduce a shared memory tier for long-context inference, storing key-value data outside the GPU frame buffer and making it accessible across the rack. As models push toward million-token context windows, memory access and synchronization increasingly dominate runtime. Nvidia seems to be taking the view that treating context as a first-class system resource, rather than a per-GPU issue, will unlock more consistent scaling. This emphasis on pre-integrated systems reflects how Nvidia's largest customers now buy hardware. Hyperscalers and AI labs deploy accelerators in standardized blocks, often measured in racks or data halls rather than individual cards. By delivering those blocks as finished products, Nvidia shortens deployment timelines and reduces the tuning work customers must do themselves. CES became the venue to outline that vision, even if it meant leaving traditional GPU announcements off the agenda. Suddenly, the lack of a new GeForce announcement becomes a whole lot easier to explain. Nvidia's current consumer line-up of 50-series GPUs is still pretty fresh, and it continues to command prices in excess of $3,500 per unit. Introducing an interim refresh would carry higher costs at a time when memory pricing is at all-time highs, and supply remains tight. The company has also leaned more heavily on software updates, particularly DLSS and other AI-assisted rendering techniques, to extend the useful life of existing GPUs.
From a purely commercial perspective, consumer GPUs now represent a smaller (and, unfortunately, shrinking) share of Nvidia's revenue and focus than they did even two years ago, let alone five. Data center products tied to AI training and inference account for the majority of growth, and those customers need system-level gains, not incremental improvements in graphics performance. Lisa Su, during her AMD keynote, said it best: "There's never been a technology like AI." CES, once a showcase for new PC hardware, has become a stage for AI announcements. This does not mean Nvidia -- or AMD for that matter -- is abandoning gaming or professional graphics. Rather, it suggests a lengthening cadence between major GPU architectures. When the next GeForce generation arrives, it is likely to incorporate lessons from Rubin, particularly around memory hierarchy and interconnect efficiency, rather than simply increasing shader counts. Nvidia's system-centric approach inevitably invites comparison with rivals pursuing similar strategies. AMD is pairing its Instinct accelerators with EPYC CPUs in tightly coupled server designs, while Intel is attempting to unify CPUs, GPUs, and accelerators under a common programming model. Apple has taken vertical integration even further in consumer devices, designing CPUs, GPUs, and neural engines as a single system on a chip. What distinguishes Nvidia is the depth of its software stack. CUDA, TensorRT, and the company's AI frameworks remain deeply entrenched in research and production environments. By extending that stack everywhere it can, Nvidia increases the switching cost for customers who might otherwise consider alternative silicon. There are risks to this approach: large customers are increasingly exploring in-house accelerators to reduce dependence on a single vendor, and complex rack-scale systems raise the stakes for manufacturing or design issues. Because of this, Nvidia's ability to deliver Rubin on schedule will matter just as much as the performance metrics presented at CES. Still, the decision to use CES 2026 to spotlight Vera Rubin rather than a new GPU points to where Nvidia sees its future. Let's face it: We, and Nvidia, all know that the next phase of computing will be defined less by individual chips and more by how effectively those chips are integrated into scalable systems. Nvidia is therefore aligning itself with where the demand and investment are, even if that means placing less emphasis on the hardware that defined the company for decades.
[6]
Nvidia CEO Says New Rubin Chips Are on Track, Helping Speed AI
The Rubin processor is 3.5 times better at training and five times better at running AI software than its predecessor, Blackwell, and customers including Microsoft will be among the first to deploy the new hardware in the second half of the year. Nvidia Corp. Chief Executive Officer Jensen Huang said that the company's highly anticipated Rubin data center processors are in production and customers will soon be able to try out the technology. All six of the chips for a new generation of computing equipment -- named after astronomer Vera Rubin -- are back from manufacturing partners and on track for deployment by customers in the second half of the year, Huang said at the CES trade show in Las Vegas Monday. "Demand is really high," he said. The growing complexity and uptake of artificial intelligence software is placing a strain on existing computer resources, creating the need for much more, Huang said. Nvidia, based in Santa Clara, California, is seeking to maintain its edge as the leading maker of artificial intelligence accelerators, the chips used by data center operators to develop and run AI models. Some on Wall Street have expressed concern that competition is mounting for Nvidia -- and that AI spending can't continue at its current pace. Data center operators also are developing their own AI accelerators. But Nvidia has maintained bullish long-term forecasts that point to a total market in the trillions of dollars. Rubin is Nvidia's latest accelerator and is 3.5 times better at training and five times better at running AI software than its predecessor, Blackwell, the company said. A new central processing unit has 88 cores -- the key data-crunching elements -- and provides twice the performance of the component that it's replacing. The company is giving details of its new products earlier in the year than it typically does -- part of a push to keep the industry hooked on its hardware, which has underpinned an explosion in AI use. Nvidia usually dives into product details at its spring GTC event in San Jose, California. Even while talking up new offerings, Nvidia said previous generations of products are still performing well. The company also has seen strong demand from customers in China for the H200 chip that the Trump administration has said it will consider letting the chipmaker ship to that country. License applications have been submitted, and the US government is deciding what it wants to do with them, Chief Financial Officer Colette Kress told analysts. Regardless of the level of license approval, Kress said, Nvidia has enough supply to serve customers in the Asian nation without affecting the company's ability to ship to customers elsewhere in the world. For Huang, CES is yet another stop on his marathon run of appearances at events, where he's announced products, tie-ups and investments all aimed at adding momentum to the deployment of AI systems. His counterpart at Nvidia's closest rival, Advanced Micro Devices Inc.'s Lisa Su, was slated to give a keynote presentation at the show later Monday.
The new hardware, which also includes networking and connectivity components, will be part of its DGX SuperPod supercomputer while also being available as individual products for customers to use in a more modular way. The step-up in performance is needed because AI has shifted to more specialized networks of models that not only sift through massive amounts of inputs but need to solve particular problems through multistage processes. The company emphasized that Rubin-based systems will be cheaper to run than Blackwell versions because they'll return the same results using smaller numbers of components. Microsoft Corp. and other large providers of remote computing will be among the first to deploy the new hardware in the second half of the year, Nvidia said. For now, the majority of spending on Nvidia-based computers is coming from the capital expenditure budgets of a handful of customers, including Microsoft, Alphabet Inc.'s Google Cloud and Amazon.com Inc.'s AWS. Nvidia is pushing software and hardware aimed at broadening the adoption of AI across the economy, including robotics, health care and heavy industry. As part of that effort, Nvidia announced a group of tools designed to accelerate development of autonomous vehicles and robots.
[7]
Nvidia unpacks Vera Rubin rack system at CES
CES used to be all about consumer electronics, TVs, smartphones, tablets, PCs, and - over the last few years - automobiles. Now, it's just another opportunity for Nvidia to peddle its AI hardware and software -- in particular its next-gen Vera Rubin architecture. The AI arms dealer boasts that, compared to Blackwell, the chips will deliver up to 5x higher floating point performance for inference, 3.5x for training, along with 2.8x more memory bandwidth and an NvLink interconnect that's now twice as fast. But don't get too excited just yet. It's not like the chips are launching earlier than previously expected. They're still expected to arrive in the second half of the year, just like Blackwell and Blackwell Ultra did. Nvidia normally holds off until GTC in March to reveal its next-gen chips. Perhaps AMD's aggressive rack scale roadmap has Nvidia's CEO Jensen Huang nervous. Announced at Advancing AI late last spring and expected later this year, AMD's double-wide Helios racks promise to deliver performance on par with Vera Rubin NVL72 while offering customers 50 percent more HBM4. Nvidia has also been teasing the Vera Rubin platform for nearly a year now, to the point where there's not much we didn't already know about the platform. But even though you won't be able to get your hands on Rubin for a few more months, it's never too early for a closer look at what the multi-million dollar machines will buy you. The flagship system for Nvidia's Vera Rubin CPU and GPU architectures is once again its NVL72 rack systems. At first blush, the machine doesn't look all that different from its Blackwell and Blackwell Ultra-based siblings. But under the hood, Nvidia has been hard at work refining the architecture for better serviceability and telemetry. Switch trays can now be serviced without taking down the machine first. Nvidia also has new reliability, availability, and serviceability features which enable customers to check in on the health of the GPUs without dropping them from the cluster first. These health checks can now run between training checkpoints or jobs, Ian Buck, Nvidia's VP and General Manager of Hyperscale and HPC, tells El Reg. At the heart of the rack is the Vera Rubin superchip, which, if history tells us anything, should bear the VR200 code name. Much like Blackwell, the Vera Rubin superchip features two dual-die Rubin GPUs, each capable of churning out 50 petaFLOPS of inference performance or 35 petaFLOPS for training. Both of those numbers refer to peak performance achievable when using the NVFP4 data type. According to Buck, for this generation, Nvidia is using a new adaptive compression technique that's better suited to generative AI and mixture of experts (MoE) model inference to achieve the 50 petaFLOP claim rather than structured sparsity. As you may recall, while structured sparsity did have benefits for certain workloads, it didn't offer many, if any, advantages for LLM inference. We've asked Nvidia about higher precision data types, like FP8 and BF16, which remain relevant for vision language model inference, image generation, fine tuning, and training workloads; we'll let you know if we hear back. The GPUs are fed by 288 GB of HBM4 memory -- 576 GB per superchip -- which, despite delivering the same capacity as the Blackwell Ultra-based GB300, is now 2.8x faster at 22 TB/s per socket (44 TB/s per superchip). If that number seems a little high, that's because Nvidia initially targeted 13 TB/s of HBM4 bandwidth when it first teased Rubin last year.
Buck tells us that the jump to 22 TB/s was attained entirely through silicon and doesn't rely on techniques like memory compression. The two Rubin GPUs are paired to Nvidia's new Vera CPU via a 1.8 TB/s NvLink-C2C interconnect. The CPU contains 88 of Nvidia's custom Arm-based Olympus cores and is paired with 1.5 TB of LPDDR5x memory -- 3x that of the GB200. We guess we know why memory is in such short supply these days. Actually it's more complicated than that, but this certainly isn't helping the situation. However, one of the most important features Vera brings to the table is support for confidential computing across the system's NvLink domain, something that previously was only available on x86-based HGX systems. Nvidia's Vera Rubin NVL72 racks feature 72 Rubin GPUs, 20.7 TB of HBM4, 36 Vera CPUs, and 54 TB of LPDDR5x, all spread across 18 compute blades interconnected by nine NvSwitch 6 blades, which deliver 3.6 TB/s of bandwidth to each GPU -- twice that of last gen. Nvidia isn't ready to say how much power that additional compute and bandwidth will require. However, Buck tells us that while it will be higher, we shouldn't expect power to double. If you're scratching your head wondering "didn't Nvidia say this thing was supposed to have 144 GPUs?" you wouldn't be the only one. At GTC 2025, Huang announced that they were changing the way they counted GPUs from the package to the dies on board. In that sense, the Blackwell-based NVL72s also had 144 GPUs, but Nvidia was going to wait for Vera Rubin to make the switch to the new convention. It seems Nvidia has since changed its mind and is sticking with the established naming convention. Having said that, we may yet see Nvidia racks with at least 144 GPUs on board before long. The Rubin GPUs we've talked about up to this point are actually just one of two accelerators announced so far. Rubin CPX is the other. Unveiled in September, the chip is a more niche product, designed specifically to accelerate the compute-intense prefill phase of LLM inference. Since prefill isn't bandwidth-bound, CPX doesn't need HBM and can instead make do with slower DRAM. Each CPX accelerator will be capable of churning out 30 petaFLOPS of NVFP4 compute and will sport 128 GB of GDDR7 memory. In a graphic shared this summer, Nvidia showed an NVL144 CPX blade with four 288 GB Rubin SXM modules and eight Rubin CPX prefill accelerators for a total of 12 GPUs per node. The complete rack system would only need 12 compute blades for the thing to have 144 GPUs, though only 48 of them would be connected via NVLink. As with past Nvidia rack systems, eight NVL72 racks form a SuperPOD with the GPU slinger's Spectrum-X Ethernet and/or Quantum-X InfiniBand the glue used to stitch them together. Multiple SuperPODS can then be combined to form larger compute environments for training or distributed inference. If you aren't ready to make the switch to Nvidia's rack-scale kit, don't worry. Eight-way (NVL8) HGX systems based around the Rubin platform are still available, but we're told liquid cooling is no longer a suggestion, but a requirement. These smaller systems, 64 to be exact, can also be combined to form a SuperPOD with 512 GPUs -- just shy of the more powerful NVL72 SuperPOD at 576. For this generation, Nvidia also has two new NICs, which it teased on a few occasions over the last year. At GTC DC, Nvidia showed off the ConnectX-9, a 1.6 Tbps "superNIC" designed for high-speed distributed computing, which we sometimes call the backend network.
For storage, management, and security, Nvidia is pushing its BlueField-4 data processing units (DPUs), which feature an integrated 800 Gbps ConnectX-9 NIC and a 64-core Grace CPU on board. This, we should note, isn't the same Grace CPU found in the GB200, but a newer version based on Arm's Neoverse V3 core architecture. The beefier CPU is designed to offload software defined networking, storage, and security, and can also run hypervisors for virtualized environments. Cramming 64 Grace cores onto a NIC might seem like overkill, but Nvidia has a specific reason for wanting that much compute hanging off the machine like a computer in front of a computer. Alongside all its shiny new hardware, Nvidia showed off what it's describing as a "new class of memory between the GPU and storage," designed to offload key value (KV) caches. The basic idea isn't new. KV caches store the model's state. You can think of this like its short-term memory. Calculating the key value vectors is one of the more compute-intensive aspects of inference. Because inference workloads often involve passing over the same info multiple times, it makes sense to cache the computed vectors in memory. By doing this, only changes need to be computed and data in the cache can be reused. This sounds simple, but, in practice, KV caches can be quite large, easily consuming tens of gigabytes in order to keep track of 100,000 or so tokens. That might sound like a lot, but a single user running a code assistant or agent can blow through that rather quickly. As we understand it, Nvidia's Inference Context Storage platform will work with storage platforms from multiple partner vendors, and will take advantage of the BlueField-4 DPU and NIXL GPU direct storage libraries to optimize KV cache offloading for maximum performance and efficiency. Combined with technologies like Rubin CPX, this kind of high-performance KV offloading should allow the GPUs to spend more time generating tokens and less time waiting on data to be shuffled about and recomputed. Nvidia's decision to "launch" Rubin -- again, it isn't actually shipping in volume yet -- betrays an increasingly competitive compute landscape. As we mentioned earlier, AMD's Helios rack systems promise to deliver floating point performance roughly equivalent to Nvidia's Vera Rubin NVL72, at 2.9 exaFLOPS versus 2.5-3.6 exaFLOPS of FP4, respectively. For applications that can't take advantage of Nvidia's adaptive compression tech, Helios is, at least on paper, faster. However, with Nvidia planning to ship faster memory on Rubin than initially planned, AMD no longer has a bandwidth advantage. It does still have a capacity lead with 432 GB of HBM4 per GPU socket compared to 288 GB on Rubin. In theory, this should allow the AMD-based system to serve 50 percent larger MoE models on a single double-wide rack. In practice, the real-world performance is going to depend heavily on how well tunneling Ultra Accelerator Link (UALink) over Broadcom's Tomahawk 6 Ethernet switches actually works. AMD's MI450-series GPUs appear very well positioned to compete against Rubin, but as we've seen repeatedly with Amazon and Google, the ability to scale that compute often makes a bigger difference than the chip's individual performance. AMD is also having to play catch up on the software ecosystem front. The company's HIP and ROCm libraries have certainly come a long way since the MI300X made its debut at the end of 2023, but the company still has a ways to go.
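To make the KV-cache reuse trade described a few paragraphs up concrete, here is a toy sketch (Python; a stand-in, not Nvidia's implementation, with `fake_kv` replacing the real attention math). With a cache, each follow-up request only pays for the tokens it hasn't seen:

```python
# Toy key-value cache: fake_kv stands in for the expensive attention math.
# Everything here is an illustrative sketch, not Nvidia's implementation.

def fake_kv(token):
    """Stand-in for computing a token's key/value vectors (the costly step)."""
    return (hash(("k", token)), hash(("v", token)))

class KVCache:
    def __init__(self):
        self.entries = []   # one (key, value) pair per token already seen

    def extend(self, tokens):
        """Compute K/V only for tokens beyond what is already cached."""
        new = tokens[len(self.entries):]
        self.entries += [fake_kv(t) for t in new]
        return len(new)     # how much work we actually did

cache = KVCache()
prompt = ["write", "a", "sort", "function"]
print(cache.extend(prompt))            # 4 -- full prefill on the first pass
print(cache.extend(prompt + ["in"]))   # 1 -- the follow-up only pays for the new token
```

Spilling `cache.entries` to a slower tier instead of discarding them between requests is the essence of the offloading play: data movement is traded for recomputation, which is why the DPU's bandwidth matters.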
Nvidia certainly isn't making the situation any easier for AMD. At CES, the GPU giant unveiled a slew of new software frameworks aimed at enterprises, robotics devs, and the automotive industry. This includes the development of new foundation models for domain specific applications like retrieval augmented generation, safety, speech, and autonomous driving. The latter, called Alpamayo, is a relatively small "reasoning vision language action" model designed to help level-4 autonomous vehicles better handle unique and fast-evolving road conditions. Level-4 capable vehicles can drive fully autonomously, without supervision, in specific environments like highways or urban settings. Nvidia's autonomous driving stack is due to hit US roads late this year with the level-2++ capable Mercedes-Benz CLA. This class of autonomous vehicle is capable of driving itself in similar conditions to level-4, but requires the supervision of a human operator. With Nvidia kicking off the New Year with Rubin -- a chip we hadn't expected to get a good look at for another three months -- we're left to wonder what we'll see at GTC, which is slated to run from March 16-19 in San Jose, California. In addition to the regular mix of software libraries and foundation models, we expect to get a lot more details on the Kyber racks that'll underpin the company's Vera Rubin Ultra platform starting in 2027. As you might have noticed, Nvidia, AMD, AWS, and others have gotten in the habit of pre-announcing products well in advance of them shipping or becoming generally available. As the saying goes: enterprises don't buy products, they buy roadmaps. In this case, however, it's really about ensuring they have somewhere to put them. Nvidia's Kyber racks are expected to pull 600 kilowatts of power, which means datacenter operators need to start preparing now if they want to deploy them on day one. We don't yet have a full picture of what Vera Rubin Ultra will offer, but we know it'll feature four reticle-sized Rubin Ultra GPUs, 1TB of HBM4e, and will deliver 100 petaFLOPS of FP4 performance. As things currently stand, Nvidia plans to cram 144 of these GPU packages (576 GPU dies) into a single NvLink domain which is expected to deliver 15 exaFLOPS of FP4 inference performance or 10 exaFLOPS for training. ®
[8]
Nvidia launches Vera Rubin NVL72 AI supercomputer at CES -- promises up to 5x greater inference performance and 10x lower cost per token than Blackwell, coming 2H 2026
AI is everywhere at CES 2026, and Nvidia GPUs are at the center of the expanding AI universe. Today, during his CES keynote, CEO Jensen Huang shared his plans for how the company will remain at the forefront of the AI revolution as the technology reaches far beyond chatbots into robotics, autonomous vehicles, and the broader physical world. First up, Huang officially launched Vera Rubin, Nvidia's next-gen AI data center rack-scale architecture. Rubin is the result of what the company calls "extreme co-design" across six types of chips: the Vera CPU, the Rubin GPU, the NVLink 6 switch, the ConnectX-9 SuperNIC, the BlueField-4 data processing unit, and the Spectrum-6 Ethernet switch. Those building blocks all come together to create the Vera Rubin NVL72 rack. Demand for AI compute is insatiable, and each Rubin GPU promises much more of it for this generation: 50 PFLOPS of inference performance with the NVFP4 data type, 5x that of Blackwell GB200, and 35 PFLOPS of NVFP4 training performance, 3.5x that of Blackwell. To feed those compute resources, each Rubin GPU package has eight stacks of HBM4 memory delivering 288GB of capacity and 22 TB/s of bandwidth. Per-GPU compute is just one building block in the AI data center. As leading large language models have shifted from dense architectures that activate every parameter to produce a given output token to mixture-of-experts (MoE) architectures that only activate a portion of the available parameters per token, it has become possible to scale up those models relatively efficiently. However, communication among those experts within models requires vast amounts of inter-node bandwidth. Vera Rubin introduces NVLink 6 for scale-up networking, which boosts per-GPU fabric bandwidth to 3.6 TB/s (bi-directional). Each NVLink 6 switch boasts 28 TB/s of bandwidth, and each Vera Rubin NVL72 rack has nine of these switches for 260 TB/s of total scale-up bandwidth. The Nvidia Vera CPU implements 88 custom Olympus Arm cores with what Nvidia calls "spatial multi-threading," for up to 176 threads in flight. The NVLink C2C interconnect used to coherently connect the Vera CPU to the Rubin GPUs has doubled in bandwidth, to 1.8 TB/s. Each Vera CPU can address up to 1.5 TB of SOCAMM LPDDR5X memory with up to 1.2 TB/s of memory bandwidth. To scale out Vera Rubin NVL72 racks into DGX SuperPods of eight racks each, Nvidia is introducing a pair of Spectrum-X Ethernet switches with co-packaged optics, all built up from its Spectrum-6 chip. Each Spectrum-6 chip offers 102.4 Tb/s of bandwidth, and Nvidia is offering it in two switches. The SN688 boasts 409.6 Tb/s of bandwidth for 512 ports of 800G Ethernet or 2048 ports of 200G. The SN6810 offers 102.4 Tb/s of bandwidth that can be channeled into 128 ports of 800G or 512 ports of 200G Ethernet. Both of these switches are liquid-cooled, and Nvidia claims they're more power-efficient, more reliable, and offer better uptime, presumably against hardware that lacks silicon photonics. As context windows grow to millions of tokens, Nvidia says that operations on the key-value cache that holds the history of interactions with an AI model become the bottleneck for inference performance. To break through that bottleneck, Nvidia is using its next-gen BlueField 4 DPUs to create what it calls a new tier of memory: the Inference Context Memory Storage Platform. 
The company says this tier of storage is meant to enable efficient sharing and reuse of key-value cache data across AI infrastructure, resulting in better responsiveness and throughput and predictable, power-efficient scaling of agentic AI architectures. For the first time, Vera Rubin also expands Nvidia's trusted execution environment to the entire rack by securing the chip, fabric, and network level, which Nvidia says is key to ensuring secrecy and security for AI frontier labs' precious state-of-the-art models. All told, each Vera Rubin NVL72 rack offers 3.6 exaFLOPS of NVFP4 inference performance, 2.5 exaFLOPS of NVFP4 training performance, 54 TB of LPDDR5X memory connected to the Vera CPUs, and 20.7 TB of HBM4 offering 1.6 PB/s of bandwidth. To keep those racks productive, Nvidia highlighted several reliability, availability, and serviceability (RAS) improvements at the rack level, such as a cable-free modular tray design that enables much quicker swapping of components versus prior NVL72 racks, improved NVLink resiliency that allows for zero-downtime maintenance, and a second-generation RAS engine that allows for zero-downtime health checks. All of this raw compute and bandwidth is impressive on its face, but the total cost of ownership picture is likely most important to Nvidia's partners as they ponder massive investments in future capacity. With Vera Rubin, Nvidia says it takes only 1/4 the number of GPUs to train MoE models versus Blackwell, and that Rubin can cut the cost per token for MoE inference down by as much as 10x across a broad range of models. If we invert those figures, it suggests that Rubin can also increase training throughput and deliver vastly more tokens in the same rack space. Nvidia says it's gotten all six of the chips it needs to build Vera Rubin NVL72 systems back from the fabs and that it's pleased with the performance of the workloads it's running on them. The company expects that it will ramp into volume production of Vera Rubin NVL72 systems in the second half of 2026, which remains consistent with its past projections regarding Rubin availability.
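As a sanity check, the rack-level totals quoted above fall straight out of the per-component figures reported in these articles; the short Python below reproduces them (small rounding differences, such as 2.52 versus the quoted 2.5 exaFLOPS, are expected):

```python
# Rack-level aggregates derived from per-component figures reported above.
GPUS, CPUS = 72, 36

hbm4_per_gpu_tb    = 0.288   # 288 GB HBM4 per Rubin GPU
hbm_bw_per_gpu_tbs = 22      # 22 TB/s HBM4 bandwidth per GPU
lpddr_per_cpu_tb   = 1.5     # 1.5 TB LPDDR5X per Vera CPU
nvfp4_inf_pflops   = 50      # NVFP4 inference petaflops per GPU
nvfp4_train_pflops = 35      # NVFP4 training petaflops per GPU

print(f"HBM4 capacity : {GPUS * hbm4_per_gpu_tb:.1f} TB")              # ~20.7 TB
print(f"HBM4 bandwidth: {GPUS * hbm_bw_per_gpu_tbs / 1000:.2f} PB/s")  # ~1.58 PB/s, quoted as 1.6
print(f"LPDDR5X       : {CPUS * lpddr_per_cpu_tb:.0f} TB")             # 54 TB
print(f"Inference     : {GPUS * nvfp4_inf_pflops / 1000:.1f} EFLOPS")  # 3.6
print(f"Training      : {GPUS * nvfp4_train_pflops / 1000:.2f} EFLOPS")# 2.52, quoted as 2.5
```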
[9]
Nvidia unveils Vera Rubin early, signaling a faster AI hardware cycle
Looking ahead: Nvidia kicked off the year with an unusual move: unveiling its next-generation AI computing architecture months ahead of schedule. At CES 2026 in Las Vegas, CEO Jensen Huang used his keynote to introduce the company's Vera Rubin server systems - a clear signal that Nvidia intends to press its advantage as demand for ever-larger AI models accelerates. The Rubin launch, now slated for mid-2026 availability, marks a shift in Nvidia's traditional rollout cadence. The company typically reserves major chip announcements for its spring developer conference, but Huang said the pace of AI development is forcing the entire semiconductor industry to move faster. "The amount of computing necessary for AI is skyrocketing," Huang told the audience. "The race is on for AI. Everyone is trying to get to the next frontier." Vera Rubin represents Nvidia's largest architectural leap since the Blackwell generation. Rather than a single chip, the platform is built around a tightly integrated system of six components: the Vera CPU, Rubin GPU, a sixth-generation NVLink switch, ConnectX-9 networking, the BlueField-4 data processing unit, and the Spectrum-X 102.4-terabit-per-second co-packaged optical interconnect. Nvidia executives describe the result as "six chips that make one AI supercomputer." Each part is designed to reduce bottlenecks across both AI training and inference. Nvidia claims the Rubin GPU delivers roughly five times the training compute of Blackwell. When applied to large mixture-of-experts models - now a standard approach for frontier-scale systems - the company says Rubin can match Blackwell's training time using one-quarter the number of GPUs and at roughly one-seventh the cost per processed token. Huang framed the architecture as a response to deeper shifts in how AI workloads are evolving, particularly around inference. In his view, inference is no longer a simple pattern-matching task, but a "thinking process," as models increasingly need to reason over long sequences, multiple data types, and real-world context. That idea feeds into Nvidia's broader vision of simulation-driven AI, where virtual environments train systems to operate in the physical world. The Vera Rubin platform is designed to support the massive compute and memory demands of these workloads, especially for robotics, autonomous vehicles, and digital twins. Huang said Nvidia's goal is to deliver "the entire stack," from silicon to networking to software, so developers can focus on building applications rather than stitching infrastructure together. The announcement also underscores how far Nvidia has expanded beyond GPUs alone. With Rubin, the company has fused compute, networking, memory, and security into a single rack-scale platform, aiming to eliminate the bottlenecks that increasingly define AI performance. Huang argued that this level of integration effectively positions Nvidia as both the world's largest networking hardware company and the top chipmaker for AI computing. For inference tasks, Rubin promises a 10-fold cost reduction compared with Blackwell, according to Nvidia. The platform supports third-generation confidential computing and will be the first rack-scale trusted computing system upon full deployment.
The early unveiling follows Nvidia's record data center revenue, which rose 66% year-over-year in the last quarter, driven largely by demand for Blackwell and Blackwell Ultra GPUs. That success has set high expectations for Rubin. Analysts view the ahead-of-schedule announcement as a signal that development and manufacturing remain on track, and that Nvidia intends to move quickly as the next wave of AI infrastructure spending ramps up.
[10]
NVIDIA DGX SuperPOD Sets the Stage for Rubin-Based Systems
NVIDIA DGX Rubin systems unify the latest NVIDIA breakthroughs in compute, networking and software to deliver up to 10x reduction in inference token cost compared with the NVIDIA Blackwell platform -- accelerating any AI workload, from inference and training to long-context reasoning. NVIDIA DGX SuperPOD is paving the way for large-scale system deployments built on the NVIDIA Rubin platform -- the next leap forward in AI computing. At the CES trade show in Las Vegas, NVIDIA today introduced the Rubin platform, comprising six new chips designed to deliver one incredible AI supercomputer, and engineered to accelerate agentic AI, mixture‑of‑experts (MoE) models and long‑context reasoning. The Rubin platform unites six chips -- the NVIDIA Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU and Spectrum-6 Ethernet Switch -- through an advanced codesign approach that accelerates training and reduces the cost of inference token generation. DGX SuperPOD remains the foundational design for deploying Rubin‑based systems across enterprise and research environments. The NVIDIA DGX platform addresses the entire technology stack -- from NVIDIA computing to networking to software -- as a single, cohesive system, removing the burden of infrastructure integration and allowing teams to focus on AI innovation and business results. "Rubin arrives at exactly the right moment, as AI computing demand for both training and inference is going through the roof," said Jensen Huang, founder and CEO of NVIDIA.
New Platform for the AI Industrial Revolution
The Rubin platform used in the new DGX systems introduces five major technology advancements designed to drive a step‑function increase in intelligence and efficiency:
* Sixth‑Generation NVIDIA NVLink -- 3.6TB/s per GPU and 260TB/s per Vera Rubin NVL72 rack for massive MoE and long‑context workloads.
* NVIDIA Vera CPU -- 88 NVIDIA custom Olympus cores, full Armv9.2 compatibility and ultrafast NVLink-C2C connectivity for industry-leading efficient AI factory compute.
* NVIDIA Rubin GPU -- 50 petaflops of NVFP4 compute for AI inference featuring a third-generation Transformer Engine with hardware‑accelerated compression.
* Third‑Generation NVIDIA Confidential Computing -- Vera Rubin NVL72 is the first rack-scale platform delivering NVIDIA Confidential Computing, which maintains data security across CPU, GPU and NVLink domains.
* Second‑Generation RAS Engine -- Spanning GPU, CPU and NVLink, the NVIDIA Rubin platform delivers real-time health monitoring, fault tolerance and proactive maintenance, with modular cable-free trays enabling 3x faster servicing.
Together, these innovations deliver up to 10x reduction in inference token cost compared with the previous generation -- a critical milestone as AI models grow in size, context and reasoning depth.
DGX SuperPOD: The Blueprint for NVIDIA Rubin Scale‑Out
Rubin-based DGX SuperPOD deployments will integrate:
* NVIDIA DGX Vera Rubin NVL72 or DGX Rubin NVL8 systems
* NVIDIA BlueField‑4 DPUs for secure, software‑defined infrastructure
* NVIDIA Inference Context Memory Storage Platform for next-generation inference
* NVIDIA ConnectX‑9 SuperNICs
* NVIDIA Quantum‑X800 InfiniBand and NVIDIA Spectrum‑X Ethernet
* NVIDIA Mission Control for automated AI infrastructure orchestration and operations
NVIDIA DGX SuperPOD with DGX Vera Rubin NVL72 unifies eight DGX Vera Rubin NVL72 systems, featuring 576 Rubin GPUs, to deliver 28.8 exaflops of FP4 performance and 600TB of fast memory.
Each DGX Vera Rubin NVL72 system -- combining 36 Vera CPUs, 72 Rubin GPUs and 18 BlueField‑4 DPUs -- enables a unified memory and compute space across the rack. With 260TB/s of aggregate NVLink throughput, it eliminates the need for model partitioning and allows the entire rack to operate as a single, coherent AI engine. NVIDIA DGX SuperPOD with DGX Rubin NVL8 systems combines 64 DGX Rubin NVL8 systems featuring 512 Rubin GPUs. NVIDIA DGX Rubin NVL8 systems bring Rubin performance into a liquid-cooled form factor with x86 CPUs to give organizations an efficient on-ramp to the Rubin era for any AI project in the develop‑to‑deploy pipeline. Powered by eight NVIDIA Rubin GPUs and sixth-generation NVLink, each DGX Rubin NVL8 delivers 5.5x NVFP4 FLOPS compared with NVIDIA Blackwell systems.
Next‑Generation Networking for AI Factories
The Rubin platform redefines the data center as a high-performance AI factory with revolutionary networking, featuring NVIDIA Spectrum-6 Ethernet switches, NVIDIA Quantum-X800 InfiniBand switches, BlueField-4 DPUs and ConnectX-9 SuperNICs, designed to sustain the world's most massive AI workloads. By integrating these innovations into the NVIDIA DGX SuperPOD, the Rubin platform eliminates the traditional bottlenecks of scale, congestion and reliability.
Optimized Connectivity for Massive-Scale Clusters
The next-generation 800Gb/s end-to-end networking suite provides two purpose-built paths for AI infrastructure, ensuring peak efficiency whether using InfiniBand or Ethernet:
* NVIDIA Quantum-X800 InfiniBand: Delivers the industry's lowest latency and highest performance for dedicated AI clusters. It utilizes Scalable Hierarchical Aggregation and Reduction Protocol (SHARP v4) and adaptive routing to offload collective operations to the network.
* NVIDIA Spectrum-X Ethernet: Built on the Spectrum-6 Ethernet switch and ConnectX-9 SuperNIC, this platform brings predictable, high-performance scale-out and scale-across connectivity to AI factories using standard Ethernet protocols, optimized specifically for the "east-west" traffic patterns of AI workloads.
Engineering the Gigawatt AI Factory
These innovations represent an extreme codesign with the Rubin platform. By mastering congestion control and performance isolation, NVIDIA is paving the way for the next wave of gigawatt AI factories. This holistic approach ensures that as AI models grow in complexity, the networking fabric of the AI factory remains a catalyst for speed rather than a constraint.
NVIDIA Software Advances AI Factory Operations and Deployments
NVIDIA Mission Control -- AI data center operation and orchestration software for NVIDIA Blackwell-based DGX systems -- will be available for Rubin-based NVIDIA DGX systems to enable enterprises to automate the management and operations of their infrastructure. NVIDIA Mission Control accelerates every aspect of infrastructure operations, from configuring deployments to integrating with facilities to managing clusters and workloads. With intelligent, integrated software, enterprises gain improved control over cooling and power events for NVIDIA Rubin, as well as infrastructure resiliency. NVIDIA Mission Control enables faster response with rapid leak detection, unlocks access to NVIDIA's latest efficiency innovations and maximizes AI factory productivity with autonomous recovery. NVIDIA DGX systems also support the NVIDIA AI Enterprise software platform, including NVIDIA NIM microservices, such as for the NVIDIA Nemotron-3 family of open models, data and libraries.
DGX SuperPOD: The Road Ahead for Industrial AI
DGX SuperPOD has long served as the blueprint for large‑scale AI infrastructure. With the arrival of the Rubin platform, it will become the launchpad for a new generation of AI factories -- systems designed to reason across thousands of steps and deliver intelligence at dramatically lower cost, helping organizations build the next wave of frontier models, multimodal systems and agentic AI applications. NVIDIA DGX SuperPOD with DGX Vera Rubin NVL72 or DGX Rubin NVL8 systems will be available in the second half of this year.
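The SuperPOD figures in the release compose directly from the per-rack numbers stated above; this quick check (Python, using only figures from the release) recovers the GPU counts, aggregate compute, and the roughly 600TB "fast memory" figure as HBM4 plus LPDDR5X:

```python
# SuperPOD scale-out math from the figures stated in the release above.
racks = 8
gpus_per_rack, eflops_per_rack = 72, 3.6   # NVFP4 per Vera Rubin NVL72 rack
hbm4_tb, lpddr_tb = 20.7, 54               # per-rack memory capacities

print(f"{racks * gpus_per_rack} Rubin GPUs")                 # 576
print(f"{racks * eflops_per_rack:.1f} exaflops NVFP4")       # 28.8
print(f"{racks * (hbm4_tb + lpddr_tb):.1f} TB fast memory")  # 597.6, quoted as ~600TB

nvl8_systems, gpus_per_nvl8 = 64, 8
print(f"{nvl8_systems * gpus_per_nvl8} GPUs per NVL8 SuperPOD")  # 512
```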
[11]
Nvidia's New Rubin Platform Shows Memory Is No Longer an 'Afterthought' in AI
A boom in AI demand and the accompanying shortage in memory supply is all anyone in the industry is talking about. At CES 2026 in Las Vegas, Nevada, it was also at the heart of Nvidia's latest major product releases. On Monday, the company officially launched the Rubin platform, made up of six chips that combine into one AI supercomputer, which company officials claim is more efficient than the Blackwell models and boasts increases in compute and memory bandwidth. "Rubin arrives at exactly the right moment, as AI computing demand for both training and inference is going through the roof," Nvidia CEO Jensen Huang said in a press release. Rubin-based products will be available from Nvidia partners in the second half of 2026, company executives said, naming AWS, Anthropic, Google, Meta, Microsoft, OpenAI, Oracle, and xAI among the companies expected to adopt Rubin. "The efficiency gains in the NVIDIA Rubin platform represent the kind of infrastructure progress that enables longer memory, better reasoning, and more reliable outputs," Anthropic CEO Dario Amodei said in the press release. GPUs have become an expensive and scarce commodity as rapidly scaling data center projects drain the global memory chip supply. According to a recent report from Tom's Hardware, gigantic data center projects required roughly 40% of the global DRAM chip output. The shortage has gotten to such a point that it is causing price hikes in consumer electronics and is rumored to impact GPU prices as well. According to a report from South Korean news agency Newsis, chipmaker AMD is expected to raise the prices of some of its GPU offerings later this month, and Nvidia will allegedly follow suit in February. Nvidia's focus has been on evading this chip bottleneck. Just last month, the tech giant made its largest purchase ever with Groq, a chipmaker that specializes in inference. Now, with a product that promises high levels of inference and the ability to train complex models with fewer chips at a lower cost, the company might be hoping to ease some of those shortage-driven worries in the industry. Company executives shared that Rubin delivers up to a tenfold reduction in inference token costs and a fourfold reduction in the number of GPUs needed to train models that rely on an AI architecture called mixture of experts (MoE), like DeepSeek. On top of that, the company is also unveiling a new class of AI-native storage infrastructure designed specifically for inference, called the Inference Context Memory Storage Platform. Agentic AI, the tech world's hot new thing for the last year or so, has put an increased importance on AI memory. Rather than simply responding to single questions, AI systems are now expected to remember much more information about earlier interactions to autonomously carry out some tasks, which means there is more data to be managed during the inference stage. The new platform aims to solve that by adding a new tier of memory for inference, to store some context data and extend the GPU's memory capacity. "The bottleneck is shifting from compute to context management," Nvidia's senior director of HPC and AI hyperscale infrastructure solutions Dion Harris said. "To scale, storage can no longer be an afterthought." "As inference scales to giga-scale, context becomes a first-class data type, and the new Nvidia inference context memory storage platform is ideally positioned to support it," Harris claimed.
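Mechanically, the "new tier of memory" amounts to a capacity hierarchy: hot context stays in scarce GPU memory while colder context spills to a larger, slower pool and is pulled back on reuse. The sketch below (Python; the class name, capacities, and eviction policy are all illustrative assumptions, not Nvidia's design) shows that spill-and-fetch pattern:

```python
from collections import OrderedDict

class TieredContextStore:
    """Toy two-tier KV-cache store: a small fast tier spills its
    least-recently-used entries to a large slow tier. Illustrative only."""
    def __init__(self, hbm_slots=3):
        self.hbm = OrderedDict()   # stand-in for scarce GPU HBM
        self.external = {}         # stand-in for the big external pool
        self.hbm_slots = hbm_slots

    def put(self, seq_id, kv_blob):
        self.hbm[seq_id] = kv_blob
        self.hbm.move_to_end(seq_id)                 # mark as most recent
        while len(self.hbm) > self.hbm_slots:        # spill the coldest session
            cold, blob = self.hbm.popitem(last=False)
            self.external[cold] = blob

    def get(self, seq_id):
        if seq_id not in self.hbm:                   # pull back on reuse
            self.put(seq_id, self.external.pop(seq_id))
        else:
            self.hbm.move_to_end(seq_id)             # refresh recency
        return self.hbm[seq_id]

store = TieredContextStore()
for user in ("a", "b", "c", "d"):                    # four concurrent sessions
    store.put(user, f"kv-for-{user}")
print(sorted(store.external))   # ['a'] -- the coldest session was spilled
print(store.get("a"))           # 'kv-for-a' -- fetched back from the slow tier
```

The design choice is the same one the articles describe at rack scale: paying a data-movement cost on a cache miss is cheaper than recomputing the context from scratch.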
Time will tell if efficiency can successfully address some of the bottlenecks brought about by the intense chip demand. But even if the memory problem is resolved, the AI industry will continue to face other bottlenecks in its unprecedented growth, most notably the immense strain that data centers put on the U.S. power grid.
[12]
NVIDIA unveils Rubin six-chip system for next-gen AI at CES 2026
NVIDIA used the CES 2026 stage today to formally launch its new Rubin computing architecture, positioning it as the company's most advanced AI hardware platform to date. CEO Jensen Huang said Rubin has already entered full production and will scale further in the second half of the year, signaling NVIDIA's confidence in demand. Huang framed Rubin as a direct response to the explosive growth in AI workloads, particularly large-scale training and long-horizon reasoning tasks. He told the audience that AI computation must continue to rise at an unprecedented pace.
[13]
Nvidia's new Vera Rubin chips: 4 things to know
Nvidia CEO Jensen Huang announced at CES 2026 in Las Vegas this week that its new superchip platform, dubbed Vera Rubin, was on schedule and set to be released later this year. The news was one of the key takeaways from the highly anticipated keynote from Huang. Nvidia is the dominant player powering the AI industry, so a new line of chips is obviously a big deal. Here are four things to know as we await Vera Rubin's drop later this year. Nvidia introduced six chips on the Rubin platform, including the Vera Rubin superchip, which combines one Vera CPU and two Rubin GPUs. "Rubin arrives at exactly the right moment, as AI computing demand for both training and inference is going through the roof," Huang said in a statement. "With our annual cadence of delivering a new generation of AI supercomputers -- and extreme codesign across six new chips -- Rubin takes a giant leap toward the next frontier of AI." Massive AI companies will look to package different parts of this new line of chips together to make massive supercomputers that power their products. "These huge systems are what hyperscalers like Microsoft, Google, Amazon, and social media giant Meta are spending billions of dollars to get their hands on," wrote Yahoo. Nvidia assured the public the chips were set to be released this year, but when, exactly, remains unclear. "Typically, production for chips this advanced -- which Nvidia is building with its longtime partner TSMC -- starts at low volume while the chips go through testing and validation and ramps up at a later stage," wrote Wired. There had been rumors of delays, so the announcement at CES seems aimed at quelling those fears. Nvidia has promised the Vera Rubin superchips are powerful and more efficient, which should, in turn, make AI products relying on them more efficient. That's why major companies will likely be lining up to purchase the new line of products. Huang said the Rubin chips could generate tokens -- the units used to measure output -- ten times more efficiently. We're still waiting to get all the details -- and to see when the chips actually hit the market -- but the announcement certainly was a major bit of AI news out of CES.
[14]
Nvidia debuts Rubin chip with 336B transistors and 50 petaflops of AI performance - SiliconANGLE
Nvidia Corp. today announced a new flagship graphics processing unit, Rubin, that provides five times the inference performance of Blackwell. The GPU made its debut at CES alongside five other data center chips. Customers can deploy them together in a rack called the Vera Rubin NVL72, which Nvidia says ships with 220 trillion transistors, more bandwidth than the entire internet and real-time component health checks.

Rubin includes 336 billion transistors that provide 50 petaflops of performance when processing NVFP4 data. Blackwell, Nvidia's previous-generation GPU architecture, provided up to 10 petaflops. Rubin's training speed, meanwhile, is 250% faster at 35 petaflops. Some of the chip's computing power is provided by a module called the Transformer Engine that also shipped with Blackwell. According to Nvidia, Rubin's Transformer Engine is based on a newer design with a performance-boosting feature called hardware-accelerated adaptive compression. Compressing a file reduces the number of bits it contains. That decreases the amount of data AI models have to crunch and thereby speeds up processing.

"Rubin arrives at exactly the right moment, as AI computing demand for both training and inference is going through the roof," said Nvidia Chief Executive Officer Jensen Huang. "With our annual cadence of delivering a new generation of AI supercomputers -- and extreme codesign across six new chips -- Rubin takes a giant leap toward the next frontier of AI."

Nvidia plans to ship its new silicon as part of an appliance called the Vera Rubin NVL72. It will combine 72 Rubin chips with 36 of the company's new Vera central processing units, which also made their debut at CES. Vera includes 88 cores based on a custom design called Olympus. They're compatible with Armv9.2, a widely used version of Arm Holdings plc's CPU instruction set architecture.

The Vera Rubin NVL72 keeps its chips in modules called trays. According to Nvidia, the trays have a cable-free design that cuts assembly and servicing times by a factor of up to 18 compared with Blackwell-based appliances. The RAS Engine, a subsystem that the company's GPU racks use to automate certain maintenance tasks, has been upgraded as well. It provides fault tolerance features and performs real-time health checks to verify that the hardware is working as expected.

Nvidia says the Vera Rubin NVL72 provides 260 terabytes per second of aggregate bandwidth, which is more than the traffic of the entire internet. The appliance processes AI models' traffic with the help of three different chips: the NVLink 6 Switch, Spectrum-6 and ConnectX-9. All three were announced at CES today. The NVLink 6 Switch enables multiple GPUs inside a Vera Rubin NVL72 rack to exchange data with one another at once. That data exchange is needed to coordinate the GPUs' work while they're running distributed AI models. The Spectrum-6, in turn, is an Ethernet switch that facilitates connections between GPUs installed in different racks. Nvidia's third new networking chip, the ConnectX-9, is what's known as a SuperNIC. It's a hardware interface that a server can use to access the network of the host data center. ConnectX-9 performs networking tasks that were historically carried out by a server's CPU, which leaves more processing capacity for AI workloads.

Rounding out the list of chips that Nvidia debuted today is the BlueField-4. It's a DPU, or data processing unit.
A DPU offloads work from a server's main processor much like a SuperNIC, but it does so across a broader range of tasks. The BlueField-4 can perform not only networking-related computations but also certain cybersecurity and storage management operations. The BlueField-4 powers a new storage system that Nvidia calls the Inference Context Memory Storage Platform. According to the company, it will help optimize large language models' key-value cache. An LLM's attention mechanism, the component it uses to determine which data points to use and how, often repeats the same calculations. A key-value cache allows an LLM to perform a frequently recurring calculation only once, save the results and then reuse those results. That's more hardware-efficient than calculating the same output from scratch every time it's needed.

The Vera Rubin NVL72 will ship alongside a smaller appliance called the DGX Rubin NVL8 that includes eight Rubin GPUs instead of 72. The two systems form the basis of the DGX SuperPOD, a new reference architecture for building AI clusters. It combines Nvidia's latest chips with a software platform called Mission Control that companies can use to manage their AI infrastructure.
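The key-value caching pattern described above is easy to see in miniature. The sketch below is a generic illustration of how a decoder reuses cached keys and values at each generation step, not Nvidia's implementation; the dimensions and weights are invented.

```python
import numpy as np

# Minimal single-head decoder sketch of a key-value (KV) cache: at each
# generation step, the keys and values of all earlier tokens are read
# from the cache instead of being recomputed from scratch.

rng = np.random.default_rng(1)
D = 8
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))

k_cache, v_cache = [], []        # grows by one entry per generated token

def decode_step(x_t):
    """Attend from the newest token over all cached keys/values."""
    q = x_t @ Wq
    k_cache.append(x_t @ Wk)     # compute K and V once for this token...
    v_cache.append(x_t @ Wv)     # ...then reuse them at every later step
    K, V = np.stack(k_cache), np.stack(v_cache)
    scores = K @ q / np.sqrt(D)
    w = np.exp(scores - scores.max()); w /= w.sum()
    return w @ V                 # attention output for this step

for step in range(5):
    out = decode_step(rng.normal(size=D))
print(len(k_cache), "cached K/V entries after 5 steps")  # 5
```

Without the cache, step N would recompute keys and values for all N earlier tokens; with it, each step does the projection for one new token only, which is exactly the recurring-calculation savings the article describes.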
[15]
NVIDIA officially unveils Rubin: its next-gen AI platform with huge upgrades, next-gen HBM4
TL;DR: At CES 2026, NVIDIA CEO Jensen Huang unveiled the Rubin AI platform, a six-chip, extreme-codesigned system delivering 50 petaflops and cutting AI token costs to one-tenth of its predecessor. Rubin integrates GPUs, CPUs, advanced networking, and AI-native storage to accelerate large-scale AI innovation economically and efficiently.

NVIDIA founder and CEO Jensen Huang proudly took the stage at CES 2026, unveiling the company's next-generation Rubin AI platform. The new Rubin AI platform is the successor to NVIDIA's dominant Blackwell AI chips, and it is the company's first extreme-codesigned, six-chip AI platform, with Jensen adding that it's now in full production. NVIDIA is aiming to "push AI to the next frontier" with Rubin, not just offering far more computing power, but slicing the cost of generating tokens to around 1/10 of Blackwell, making large-scale AI "far more economical to deploy".

Extreme codesign means designing all of the components together, because scaling AI to gigascale requires tighter integration between chips, trays, racks, networking, storage, and software to remove bottlenecks. This massively reduces the costs of training and inference, added Huang.

Jensen also unveiled AI-native storage with the NVIDIA Inference Context Memory Storage Platform, an AI-native KV-cache tier that accelerates long-context inference with 5x higher tokens per second, 5x higher performance per TCO dollar, and 5x better power efficiency. All of these innovations come together in the new Rubin AI platform, which NVIDIA says promises to dramatically accelerate AI innovation, delivering AI tokens at 1/10 the cost.

Huang said: "The faster you train AI models, the faster you can get the next frontier out to the world. This is your time to market. This is technology leadership". Huang said: "Computing has been fundamentally reshaped as a result of accelerated computing, as a result of artificial intelligence. What that means is some $10 trillion or so of the last decade of computing is now being modernized to this new way of doing computing". Huang continued: "Every single six months, a new model is emerging, and these models are getting smarter and smarter. Because of that, you could see the number of downloads has exploded".
[16]
Nvidia Just Shared Details About Its Next Big Business Move
Nvidia is gearing up to release its newest Vera Rubin superchip, designed to drastically boost AI efficiency. The chip, currently in production, is slated for launch in the latter half of 2026, the company announced at the CES tech conference in Las Vegas on January 5. The next-generation superchip is meant to power massive AI models and drive the imminent transition from AI chatbots to agents. Vera Rubin brings together one Vera CPU and two Rubin GPUs within a single processor. It will function as part of the Rubin platform, along with four additional networking and storage chips. Altogether, the full platform packs 72 GPUs into one rack-scale system, which can then be combined with others to form a "massive AI supercomputer," according to Yahoo Finance. Rubin uses just a quarter of the GPUs that older Blackwell systems needed to train the same model. Customers will see a boost in efficiency, as they'll be able to use the extra units for other tasks.
[17]
Nvidia Introduces Vera Rubin as Successor to Blackwell AI Platform
Vera Rubin is said to deliver up to 10x reduction in inference token cost. Nvidia kickstarted the Consumer Electronics Show (CES) 2026 on Monday with several artificial intelligence (AI) announcements. Among them, the biggest introduction was Vera Rubin, the Santa Clara-based tech giant's newest AI platform, which replaces Blackwell. The company also unveiled six new chipsets and a supercomputer built on the new architecture, expanded its catalogue of open-source AI models, and shared the advancements it has made in the physical AI space. All of these announcements were made during Nvidia CEO Jensen Huang's keynote session.

Nvidia Introduces Vera Rubin AI Platform

During his keynote address, Huang introduced the Vera Rubin platform. Just like its predecessor, Blackwell, the new architecture will become the standard for the upcoming chipsets aimed at AI workflows, enterprise systems, and supercomputers. Interestingly, the new AI platform is named after American astronomer Vera Florence Cooper Rubin, who is known for providing evidence for dark matter by studying galaxy rotation curves. "Rubin arrives at exactly the right moment, as AI computing demand for both training and inference is going through the roof. With our annual cadence of delivering a new generation of AI supercomputers -- and extreme codesign across six new chips -- Rubin takes a giant leap toward the next frontier of AI," said Huang.

The core idea behind Vera Rubin is extreme co-design, meaning Nvidia engineered the platform's components from the ground up to share data quickly, reduce costs, and improve efficiency for training and running AI models. The company also introduced six key chipset families that will be bundled into rack-scale systems called Vera Rubin NVL servers. These include the Nvidia Vera CPU, Nvidia Rubin GPU, Nvidia NVLink 6 Switch, Nvidia ConnectX-9 SuperNIC, Nvidia BlueField-4 data processing unit (DPU) and the Nvidia Spectrum-6 Ethernet Switch. As per the company's press release, the new architecture will accelerate agentic AI, advanced reasoning, and large-scale mixture-of-experts (MoE) model inference. Compared to Blackwell, it is said to offer up to 10x lower cost and up to 4x fewer GPUs to run the same tasks. Nvidia also mentioned some of the companies that will adopt Vera Rubin-based chipsets in the coming months. These include Amazon Web Services (AWS), Anthropic, Dell Technologies, Google, HPE, Lenovo, Meta, Microsoft, OpenAI, Oracle, Perplexity, Thinking Machines Lab, and xAI.

Nvidia's Open Models, Data and Tools

Alongside its system architecture, Nvidia detailed a suite of open models and data tools intended to accelerate AI across industries. Among the releases is the Nvidia Alpamayo family, a set of open, large-scale reasoning models and simulation frameworks designed to support safe, reasoning-based autonomous vehicle development. The family includes a reasoning-capable vision-language-action (VLA) model, simulation tools such as AlpaSim, and Physical AI Open Datasets that cover rare and complex driving scenarios. Alpamayo is part of what Huang called a "ChatGPT moment for physical AI," where machines begin to understand, reason and act in the real world, including explaining their decisions. The open nature of the models, simulation frameworks and datasets is intended to encourage transparency and faster progress among industry developers and researchers working on Level 4 advanced driver assistance systems (ADAS).
Apart from this, Nvidia's Nemotron family for agentic AI, Cosmos platform for physical AI, Isaac GR00T for robotics, and Clara for biomedical AI have also been made available to the open community.
[18]
ETtech Explainer: What's Nvidia's Rubin platform, and why it matters for AI - The Economic Times
The Rubin platform moves Nvidia from a seller of powerful GPUs to a provider of fully integrated AI computing systems. Rubin is made up of six chips, comprising tightly connected processors and networking components: the Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-6 Ethernet Switch.

At the Consumer Electronics Show (CES) in Las Vegas on Monday, Nvidia unveiled its Rubin artificial intelligence (AI) platform, which is expected to ship in the second half of 2026, with early adoption planned by tech giants including Microsoft, Amazon, Meta and Google. The announcement marks a shift in Nvidia's position, from a seller of powerful GPUs to a provider of fully integrated AI computing systems. "Rubin arrives at exactly the right moment, as AI computing demand for both training and inference is going through the roof," said Jensen Huang, founder and CEO of Nvidia, at CES 2026. "With our annual cadence of delivering a new generation of AI supercomputers and extreme codesign across six new chips, Rubin takes a giant leap toward the next frontier of AI." Extreme codesign is a holistic approach where the different components (hardware, software, networking, algorithms, etc.) are engineered simultaneously and collaboratively. ET breaks down why it matters for the AI ecosystem and how Nvidia stacks up against rivals in the AI infra race.

What is the Rubin platform?

Rubin is Nvidia's next-generation AI computing system designed to run the most advanced AI models efficiently at scale. Instead of a single chip, Rubin is made up of six, including tightly connected processors and networking components: the Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-6 Ethernet Switch. While the Vera CPU works closely with AI accelerators, handling system-level tasks, the Rubin GPU is the main AI engine responsible for training large models. The NVLink 6 Switch allows GPUs to share data, the ConnectX-9 speeds up communication, the BlueField-4 offloads networking, security, and storage tasks, and the Spectrum-6 ensures optimal performance.

How is it different from the Blackwell platform?

Rubin treats AI infrastructure as one coordinated system, unlike earlier generations, where Nvidia mainly sold graphics processing units (GPUs) plugged into standard servers. Even in the recent Nvidia Blackwell platform, the GPU was the centrepiece, with CPUs, networking, and storage often supplied by different vendors. Nvidia said Rubin delivers up to a 10X reduction in inference token cost and a 4X reduction in the number of GPUs required to train AI models, compared with the Blackwell platform. Nvidia claims its Spectrum-X Ethernet Photonics switch systems deliver 5X improved power efficiency and uptime. While each GPU can move data quickly, Rubin as a whole can move about 260 TB per second, more than what the entire internet handles at once. Nvidia also introduced the Nvidia Inference Context Memory Storage Platform, a new class of AI-native storage infrastructure designed to handle inference context at gigascale. Nvidia says Rubin is built for agentic and reasoning-heavy AI workloads. It can work across multiple steps, keep track of long conversations, and run continuously, reducing the time and energy spent moving data around the system.

How does Nvidia stack up against rivals?

Rivals such as AMD have narrowed the gap in accelerator performance, but Rubin moves the contest into system-level integration, where fewer companies can match Nvidia's scale.
Hyperscalers are also fielding their own silicon. Google's Tensor Processing Units (TPUs) are built to support both training and real-time AI workloads, linking up to 9,216 chips in one pod. Google claims that Ironwood, its latest TPU, is nearly 30x more power efficient than its first Cloud TPU from 2018. Similarly, AWS created its own AI training hardware, the advanced 3-nanometre Trainium3 chips and the upcoming Trainium4, to support large-scale AI workloads. AWS says Trainium3 is over four times faster than its predecessor, offers four times more memory, and uses about 40% less energy. Trainium competes on price and efficiency within a single cloud, while Rubin focusses less on individual chip speed and more on end-to-end efficiency at scale. Emerging AI hardware startups such as Graphcore, Cerebras, and SambaNova offer alternative approaches, from Graphcore's intelligence processing unit (IPU) to Cerebras's wafer-scale single-chip systems.

Rubin is positioned as a general-purpose platform that can cater to a broad range of customers, from cloud providers to enterprises, and even AI startups. Where Nvidia may face limitations is in custom workloads optimised for specific cloud providers. Google's TPUs or AWS Trainium chips, for example, are extremely efficient for the services and models they are designed for, and Rubin's general-purpose design may not always outperform these specialised solutions in niche scenarios. Additionally, the high cost and complexity of deploying fully integrated Rubin systems could limit adoption among smaller AI startups that cannot invest in full-stack Nvidia infrastructure.

Rubin's clients

Microsoft's next-generation Fairwater AI superfactories will feature Nvidia Vera Rubin NVL72 rack-scale systems, scaling to hundreds of thousands of Nvidia Vera Rubin Superchips. CoreWeave is among the first to offer Nvidia Rubin, operated through CoreWeave Mission Control for flexibility and performance. Rubin is further expected to be adopted by a wide range of leading AI labs, cloud providers, hardware manufacturers, and startups, including AWS, Anthropic, Black Forest Labs, Cisco, Cohere, Cursor, Dell Technologies, Google, Harvey, HPE, Lambda, Lenovo, Meta, Mistral AI, Nebius, Nscale, OpenAI, OpenEvidence, Oracle Cloud Infrastructure, Perplexity, Runway, Supermicro, Thinking Machines Lab, and xAI, the company said in a media release.

Looking ahead

Nvidia's vertical integration spanning compute, networking, data processing, and system software increases performance predictability for large-scale deployments, while also raising migration and porting costs for customers building on Nvidia's platform. Nvidia may have raised the stakes for rivals that must now match not only its performance but also the coherence and breadth of its full AI infrastructure stack.
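To put the headline ratios in perspective, here is a toy cost model. The hourly price and throughput figures below are invented placeholders, not Nvidia or cloud-provider numbers; only the 10x token-cost and 4x GPU-count ratios come from the articles above.

```python
# Back-of-envelope model of the claims: 10x lower inference token cost
# and 4x fewer GPUs to train the same MoE model. Prices and throughput
# are made-up placeholders; only the ratios come from the coverage.

def cost_per_million_tokens(gpu_hourly_usd, tokens_per_gpu_per_sec):
    tokens_per_gpu_per_hour = tokens_per_gpu_per_sec * 3600
    return gpu_hourly_usd / tokens_per_gpu_per_hour * 1e6

blackwell = cost_per_million_tokens(gpu_hourly_usd=10.0, tokens_per_gpu_per_sec=500)
rubin = blackwell / 10          # the claimed 10x token-cost reduction

print(f"Blackwell: ${blackwell:.2f}/M tokens, Rubin: ${rubin:.2f}/M tokens")

# Training side: a job sized for 1,000 Blackwell GPUs would need ~250
# Rubin GPUs under the claimed 4x reduction for MoE models.
print(f"GPUs for the same MoE training job: 1000 -> {1000 // 4}")
```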
[19]
Nvidia CEO Jensen Huang Says Blackwell Successor Vera Rubin Is In 'Full Production' At CES 2026: Here Is Everything You Need To Know - NVIDIA (NASDAQ:NVDA)
At CES 2026, Nvidia Corp (NASDAQ:NVDA) CEO Jensen Huang outlined a sweeping vision for AI's next computing cycle, confirming that the company's next-generation Vera Rubin platform is already in full production.

AI Enters A New Computing Cycle, Huang Says

Taking the stage at a packed Fontainebleau Las Vegas venue, Huang said the computing industry undergoes a major transformation roughly every 10 to 15 years, with each shift ushering in a new platform -- and this time, two transitions are happening simultaneously: applications are increasingly built directly on AI, and the process of building software itself is being fundamentally redefined. He made a fashionably late appearance in his signature leather jacket -- shinier than usual -- greeted the crowd with a New Year's wish and immediately launched into Nvidia's success in scaling AI, pushing it toward agentic systems, teaching it the laws of nature and beyond.

AI Is Moving Beyond Chatbots

Huang traced AI's evolution from early neural networks to transformers and today's large language models, arguing that the next phase extends well beyond text-based systems. He emphasized the rise of agentic AI: models capable of planning, reasoning and acting autonomously over time. "Large language models isn't the only form of information," Huang said. "Wherever the universe has information, wherever the universe has structure, we could teach a large language model." That includes what Nvidia calls physical AI -- systems trained to understand and interact with the real world using the laws of physics.

Open Models Are Catching Up Fast

A major theme of the keynote was Nvidia's bet on open AI ecosystems. Huang said open models are now roughly six months behind proprietary frontier models and continue to close the gap. According to Huang, about 80% of startups are building on open models, and a significant share of AI usage across developer platforms now relies on open-source systems. Nvidia is releasing not only its models, but also the data and lifecycle tools used to train, evaluate and deploy them.

Physical AI Powers Robots And Autonomous Vehicles

Nvidia highlighted its Cosmos world foundation model, which generates realistic simulations and synthetic data to train robots and autonomous systems. Huang said Nvidia uses Cosmos internally for self-driving development. The company also unveiled Alpamayo, an open-source reasoning and decision-making AI for autonomous driving. Huang said it allows vehicles to learn from limited real-world data and handle unfamiliar scenarios. Nvidia's open models are also rapidly closing the gap with frontier systems, topping leaderboards in areas like OCR, PDF comprehension, natural language search and more.

Next Era Of Robotics Systems Is Going To Be... Robots

At one point during the keynote, Star Wars' BDX droids entered the stage -- fully autonomous, powered by Nvidia Cosmos. Huang was clearly enjoying a lively back-and-forth with the droids. Nvidia is also teaming up with Siemens, signaling a major leap for manufacturing as Nvidia's physical AI uses synthetic data from digital factory twins to train next-generation robotics.
Vera Rubin Enters Full Production

Huang confirmed that Vera Rubin, Nvidia's next-generation AI supercomputing platform and successor to Blackwell, is now in full production. The system delivers up to five times the performance of Blackwell while improving efficiency, memory bandwidth and interconnect speeds. Vera Rubin integrates advanced GPUs, custom CPUs, high-speed networking and full-stack encryption, and is designed to address what Huang called AI's next major constraint: context and data movement. "NVIDIA Rubin platform, the successor to the record-breaking NVIDIA Blackwell architecture and the company's first extreme-codesigned, six-chip AI platform, is now in full production," the company also said in a blog post.

Vera Rubin also integrates NVIDIA's ConnectX-9 Spectrum-X SuperNIC and can be assembled in just five minutes, compared with roughly two hours for previous systems. The entire platform is water-cooled. An NVLink 6 Switch enables all GPUs within Vera Rubin to communicate simultaneously. Huang said the system is fully encrypted for enhanced security and uses hot-water cooling at around 45°C -- a counterintuitive approach that, he noted, significantly reduces energy costs.

At the beginning of Vera Rubin's presentation, Huang also referenced astronomer Vera Rubin, who observed that the outer edges of galaxies were rotating nearly as fast as their centers -- a breakthrough that led to the discovery of dark matter -- and said Nvidia would name its next computer in her honor.

Price Action: Nvidia shares were down 0.39% during Monday's regular session and slipped another 0.069% in after-hours trading, according to Benzinga Neuro.
[20]
NVIDIA Rubin Is The Most Advanced AI Platform On The Planet: Up To 50 PFLOPs With HBM4, Vera CPU With 88 Olympus Cores, And Delivers 5x Uplift Vs Blackwell
NVIDIA is formally announcing its Rubin AI platform today, which will be the heart of next-gen data centers, with a 5x upgrade over Blackwell. The announcement comes as a surprise, because an update had been expected at the company's already-announced GTC event; with the exciting developments in the AI segment and all the AI talk going around at CES, NVIDIA decided to unveil its grand AI platform a little early.

NVIDIA's Rubin platform is made up of a total of six chips, all of which are back from the fabs and in NVIDIA's labs for testing. Together, these chips bring the Rubin platform to life inside a range of DGX, HGX, and MGX systems. At the heart of each data center is the NVIDIA Vera Rubin Superchip, featuring two Rubin GPUs, one Vera CPU, and massive amounts of memory in HBM4 and LPDDR5X configurations.

Starting with the Rubin GPU, this chip features two reticle dies, each with lots of compute and tensor cores. The chip itself is designed purely for AI-intensive workloads, offering 50 PFLOPs of NVFP4 inference and 35 PFLOPs of NVFP4 training performance, a 5x and 3.5x increase over Blackwell, respectively. The chip is also equipped with HBM4 memory, offering up to 22 TB/s of bandwidth per chip, a 2.8x increase vs Blackwell, and 3.6 TB/s of NVLink bandwidth per GPU, a 2x increase vs Blackwell.

For the Vera CPU, NVIDIA has designed its next-gen custom Arm architecture, codenamed Olympus. The chip packs 88 cores, 176 threads (with NVIDIA Spatial Multi-Threading), a 1.8 TB/s NVLink-C2C coherent memory interconnect, 1.5 TB of system memory (3x Grace), 1.2 TB/s of memory bandwidth with SOCAMM LPDDR5X, and rack-scale confidential compute. These combine to offer 2x data processing, compression and CI/CD performance versus Grace.

NVLink 6 switches provide the networking fabric of the Rubin platform, with 400G SerDes, 3.6 TB/s of per-GPU all-to-all bandwidth, 28.8 TB/s of total bandwidth, 14.4 TFLOPS of FP8 compute in-network, and a 100% liquid-cooled design. Networking is powered by the latest ConnectX-9 and BlueField-4 modules. The ConnectX-9 SuperNIC offers 1.6 TB/s of bandwidth with 200G PAM4 SerDes and a programmable RDMA and data path accelerator, provides top-level security, and is optimized for massive-scale AI. The BlueField-4 is an 800G DPU that serves as a SmartNIC and storage processor. It integrates a 64-core Grace CPU with ConnectX-9 and offers 2x the networking capability of BlueField-3, 6x the compute, and 3x the memory bandwidth.

All of these come together in the NVIDIA Vera Rubin NVL72 rack, which offers some impressive uplifts versus Blackwell. NVIDIA is also announcing its Spectrum-X Ethernet Co-Packaged Optics solution, which offers a 102.4 Tb/s scale-out switch infrastructure with co-packaged 200G silicon photonics, and delivers 95% of effective bandwidth at scale. The system is 5 times more efficient, 10 times more reliable, and offers 5 times higher application runtime.

For its Rubin SuperPOD, NVIDIA is also unveiling the Inference Context Memory Storage platform, which is built for gigascale inference and is fully integrated with NVIDIA software solutions such as Dynamo, NIXL and DOCA. To wrap it all up, NVIDIA will be putting its Rubin platform in its bleeding-edge DGX SuperPOD with 8 Vera Rubin NVL72 racks. And that isn't all: there's also the NVIDIA DGX Rubin NVL8 for mainstream data centers.
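As a quick sanity check, the per-chip figures above scale directly to the rack-level totals quoted elsewhere in this roundup (3.6 exaflops of NVFP4 inference, roughly 2.5 exaflops of training, 54 TB of LPDDR5x). A few lines of arithmetic, using only numbers already reported in these articles:

```python
# Scaling the quoted per-chip specs to a Vera Rubin NVL72 rack
# (72 Rubin GPUs + 36 Vera CPUs) reproduces the rack-level totals
# reported elsewhere in this roundup.

GPUS, CPUS = 72, 36

inference_pflops_per_gpu = 50    # NVFP4 inference, per GPU
training_pflops_per_gpu = 35     # NVFP4 training, per GPU
lpddr5x_tb_per_cpu = 1.5         # Vera CPU system memory

print(GPUS * inference_pflops_per_gpu / 1000, "EF inference")  # 3.6 EF
print(GPUS * training_pflops_per_gpu / 1000, "EF training")    # 2.52 EF, reported as ~2.5
print(CPUS * lpddr5x_tb_per_cpu, "TB LPDDR5x")                 # 54.0 TB
```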
With all of these advancements, NVIDIA Rubin offers a 10x reduction in inference token cost and a 4x reduction in the number of GPUs needed to train MoE models versus Blackwell GB200. The Rubin ecosystem is backed by a diverse range of partners and is in full production, with customers getting the first chips later this year.
[21]
Nvidia Touts New Storage Platform, Confidential Computing For Vera Rubin NVL72 Server Rack
The AI infrastructure giant used the CES 2026 keynote by Nvidia CEO Jensen Huang to mark the launch of its Rubin GPU platform, the highly anticipated follow-up to its fast-selling Blackwell Ultra products. Availability from partners is set to begin in the second half of this year.

Nvidia on Monday revealed a new "context memory" storage platform, "zero downtime" maintenance capabilities, rack-scale confidential computing and other new features for its forthcoming Vera Rubin NVL72 server rack for AI data centers. But while the company said Rubin is in "full production," related products won't be available from partners until the second half of this year.

Huang and other Nvidia officials in recent months have pushed back against fears that the massive AI data center build-out represents a bubble by stating that the company expects to make $500 billion from Blackwell and Rubin products between the start of last year and the end of this year, citing ongoing demand for generative, agentic and physical AI solutions. In promoting Rubin, Nvidia touted support from a wide range of large and influential tech companies, including Amazon Web Services, Microsoft, Google Cloud, CoreWeave, Cisco, Dell Technologies, HPE, Lenovo and many more.

The Santa Clara, Calif.-based company plans to initially make Rubin available in two ways: through the Vera Rubin NVL72 rack-scale platform, which connects 72 Rubin GPUs and 36 of its custom, Arm-compatible Vera CPUs, and through the HGX Rubin NVL8 platform, which connects eight Rubin GPUs for servers running on x86-based CPUs. Both of these platforms will be supported by Nvidia's DGX SuperPod clusters.

The rack-scale platform was originally called Vera Rubin NVL144 when it was revealed at Nvidia's GTC 2025 event last March, with the 144 number meant to reflect the number of GPU dies in each server rack. But the company eventually decided against this, instead opting to stick with the NVL72 nomenclature used for the Grace Blackwell rack-scale platforms to reflect the number of GPU packages, each of which contains two GPU dies connected through a high-speed die-to-die interconnect. "Essentially we're just being consistent with how we've deployed and talked about it for Blackwell, and we're carrying that forward for Vera Rubin as well," said Dion Harris, senior director of high-performance computing and AI infrastructure solutions at Nvidia, in a briefing with journalists and analysts on Sunday.

Harris said the Rubin platform, with the Vera Rubin NVL72 rack as its flagship product, consists of the Rubin GPU, the Vera CPU -- Nvidia's first CPU with custom, Arm-compatible cores -- and four other new chips the company has co-designed to "meet the needs of the most advanced models and drive down the cost of intelligence." The company provided a litany of specs and features for the Rubin platform, some of which have been shared at previous events. Each Vera CPU features 88 custom Olympus cores, 176 threads with Nvidia's new spatial multi-threading technology, 1.5 TB of system LPDDR5x memory, 1.2 TBps of memory bandwidth and confidential computing capabilities.
It also features a 1.8 TBps NVLink chip-to-chip interconnect to support coherent memory with the GPUs. In the briefing, Harris said the CPU's confidential computing feature allows Vera Rubin to deliver the "first rack-scale Trusted Execution Environment, maintaining data security across CPU, GPU and the NVLink domain [to protect] the world's largest proprietary models, training data and inference workloads." Compared to Nvidia's Grace CPU, which is based on Arm's off-the-shelf Neoverse V2 microarchitecture, Vera offers double the performance for data processing, compression and code compilation, according to the company.

The Rubin GPU, on the other hand, is capable of 50 petaflops for inference computing using Nvidia's NVFP4 data format, which is five times faster than Blackwell, the company said. It can also perform 35 petaflops for NVFP4 training, which is 3.5 times faster than its predecessor. The bandwidth for its HBM4 high-bandwidth memory is 22 TBps, 2.8 times faster, while the NVLink bandwidth per GPU is 3.6 TBps, two times faster.

The platform also includes the liquid-cooled NVLink 6 Switch for scale-up networking. This switch features 400G SerDes, 3.6 TBps of per-GPU bandwidth for communication between all GPUs, a total bandwidth of 28.8 TBps and 14.4 teraflops of FP8 in-network computing. In addition, the Rubin platform makes use of Nvidia's ConnectX-9 SuperNIC and BlueField-4 DPU to take scale-out networking to the next level, according to the company.

All of these parts go into the Vera Rubin NVL72 platform, which is capable of 3.6 exaflops of NVFP4 inference performance, five times greater than the Blackwell-based iteration, Nvidia said. Training performance with the NVFP4 format reaches a purported 2.5 exaflops, which is 3.5 times higher than the predecessor. Vera Rubin also features 54 TB of LPDDR5x capacity, 2.5 times higher than Blackwell, and 20.7 TB of HBM4 capacity, 50 percent more than the predecessor, Nvidia said. HBM4 bandwidth reaches 1.6 PBps, 2.8 times greater, and scale-up bandwidth hits 260 TB per second, double that of the Blackwell NVL72 platform. "That's more bandwidth than the entire global internet," Harris said.

Nvidia said Vera Rubin also features the third generation of its NVL72 rack resiliency technologies, which includes a cable-free modular tray design that allows for 18 times faster assembly and service. Other features include NVLink Intelligent Resiliency, which the company claims will allow for maintenance of servers with "zero downtime." "The NVLink switch trays now feature zero-downtime maintenance and fault tolerance, allowing racks to remain operational while switch trays are removed or partially populated," Harris said. There's also a second-generation RAS Engine for reliability, availability and serviceability needs, which Nvidia said will enable GPU diagnostics without taking the rack offline. "All of these features increase system uptime and goodput, which further drives down the cost of training and inference," Harris said.

With agentic AI workloads generating massive amounts of context data, Nvidia is introducing a new storage platform it said will provide a significant boost in inference performance and power efficiency for such applications.
Harris said the new technology, called the Nvidia Inference Context Memory Storage Platform, uses BlueField-4 and Spectrum-X Ethernet to create "AI-native storage infrastructure for storing KV cache" -- a data structure that is key to optimizing the way large language models generate tokens, or provide responses. Compared to traditional network storage options for storing inference context data, the new platform delivers up to five times higher tokens per second, five times better performance per dollar and five times better power efficiency, according to Harris. "That translates directly into higher throughput, lower latency and more predictable behavior," he said. "And it really matters for the workloads we've been talking about: large-context applications like multi-turn chat, retrieval-augmented generation and agentic AI multi-step reasoning. These workloads stress how efficiently context can be stored, reused and shared across the entire system." Harris said Nvidia is "working closely with our storage partners to bring a new tier of inference context memory to the Rubin platform so customers can deploy it as part of a complete, integrated AI infrastructure."
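The "new tier" Harris describes behaves, conceptually, like a cache hierarchy for context: hot KV entries stay in fast device memory, and overflow spills to a larger external tier from which it can be reloaded on reuse instead of being recomputed. The sketch below illustrates that general spill-and-reload idea with a least-recently-used policy; the class, capacities, and API are invented for illustration and are not Nvidia's actual platform or software.

```python
from collections import OrderedDict

# Conceptual sketch of a tiered KV cache: a small fast tier (stand-in
# for GPU HBM) holds the hottest context, and evicted entries spill to
# a larger external tier (stand-in for the storage platform) instead
# of being thrown away and recomputed.

class TieredKVCache:
    def __init__(self, hbm_capacity=4):
        self.hbm = OrderedDict()     # fast tier, LRU-ordered
        self.external = {}           # big, slower tier
        self.cap = hbm_capacity

    def put(self, seq_id, kv_block):
        self.hbm[seq_id] = kv_block
        self.hbm.move_to_end(seq_id)
        while len(self.hbm) > self.cap:          # evict least-recently-used
            old_id, old_kv = self.hbm.popitem(last=False)
            self.external[old_id] = old_kv       # spill instead of recompute

    def get(self, seq_id):
        if seq_id in self.hbm:
            self.hbm.move_to_end(seq_id)
            return self.hbm[seq_id]
        kv = self.external.pop(seq_id)           # reload into the fast tier
        self.put(seq_id, kv)
        return kv

cache = TieredKVCache()
for i in range(6):                               # 6 sessions, capacity 4: two spill
    cache.put(f"chat-{i}", kv_block=[i])
print(len(cache.hbm), "in HBM,", len(cache.external), "external")  # 4, 2
cache.get("chat-0")                              # context reused, not recomputed
```

The payoff the vendor claims (higher tokens per second at lower power) comes from this substitution: a storage fetch of an old session's context is far cheaper than re-running prefill over the whole conversation.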
[22]
NVIDIA's 'Revolutionary' Rubin AI Chips Enter Full Production Well Ahead of Schedule, Proving Jensen's Pace Is Unmatched
NVIDIA's next-generation Rubin chips are currently in full production, well ahead of the original H2 2026 timeline, indicating that Jensen's AI strategy centers on being 'fast and lethal'. The Rubin AI lineup is poised to be a significant leap forward for NVIDIA in terms of architectural advancements, given that we are seeing upgrades across multiple elements. I'll dive into the improvements later on, but one of the most significant announcements by NVIDIA at CES 2026 was that Rubin is in full production as of Q1 2026, almost two quarters earlier than the anticipated timeline. According to the company, development on Rubin was initiated three years ago, and production was underway alongside Blackwell.

NVIDIA nominally operates on an annual product cadence; however, when you actually examine the timeline of generational launches, you'll realize that the cycle is slightly shorter than twelve months. NVIDIA's Blackwell ramp-up initiated in H2 2025, while Blackwell Ultra mass production started in Q3 2025. Now, given that Rubin is in full production, NVIDIA's pace of generational launches is unmatched, and it clearly demonstrates the company's commitment to staying ahead of the pack. Team Green is known to have major commitments for its Rubin AI lineup, with OpenAI officially disclosed, and we can expect hyperscalers and neoclouds to rush to get their hands on the newer architecture. NVIDIA's CFO previously revealed that Rubin mass production was slated for H2 2026, and now that the timeline has been pulled forward, we could see customer shipments pan out by H2 2026, which means Rubin will become the mainstream revenue driver alongside the ongoing Blackwell Ultra shipments.

NVIDIA Rubin-based products will be available from partners in the second half of 2026. Among the first cloud providers to deploy Vera Rubin-based instances in 2026 will be AWS, Google Cloud, Microsoft and OCI, as well as NVIDIA Cloud Partners CoreWeave, Lambda, Nebius and Nscale. - NVIDIA

NVIDIA's Rubin platform will consist of a total of six chips, all of which are back from the fabrication fabs and ready for volume production. We already have the Rubin announcement live here, and it is simply amazing to see NVIDIA's pursuit of dominance in the AI infrastructure race; with Rubin, the company is set to maintain its lead in the training segment.
[23]
CES 2026: Nvidia Launches Rubin Chip Architecture, Alpamayo AI Models for Vehicles
With these launches Nvidia aims to corner a large chunk of the AI infrastructure spending over the next 2-3 years. As expected, Nvidia CEO Jensen Huang hogged the limelight at the Consumer Electronics Show (CES 2026) with two new launches. The first was the launch of Alpamayo, a new series of open-source AI models that would make vehicles think like humans. The second was the launch of the Rubin computing architecture, which meets the skyrocketing processing-power demands of AI.

The new open-source AI models, simulation tools, and databases will be used to train physical robots and vehicles and are designed to help autonomous vehicles reason through complex driving situations. Huang described the launch as "the ChatGPT moment for physical AI - where machines begin to understand, reason, and act in the real world." "Alpamayo brings reasoning to autonomous vehicles, allowing them to think through rare scenarios, drive safely in complex environments, and explain their driving decisions," he said in the keynote address while launching Alpamayo 1, a 10-billion-parameter reasoning-based vision language action (VLA) model.

Ali Kani, Nvidia's vice president of automotive, referred to recent incidents of autonomous vehicles stalling when traffic signals at intersections went off, noting that the VLA tech allows a vehicle to navigate such challenges without previous experience. "It does this by breaking down problems into steps, reasoning through every possibility, and then selecting the safest path," he told the media. Huang said Alpamayo takes sensor inputs and activates the steering wheel, brakes and acceleration, and also reasons about the immediate actions it wants to take. In fact, it tells the rider what action is forthcoming, the reasons for it and the trajectory of movement. The company said the AI models' underlying code is available on Hugging Face for developers to tune into smaller, faster versions for vehicle development and for training drive systems.

Rubin architecture chip to speed up AI further

Announced in 2024 by Nvidia, the Rubin architecture is currently in production and will ramp up in the second half of 2026. "Vera Rubin is designed to address this fundamental challenge that we have: The amount of computation necessary for AI is skyrocketing. Today, I can tell you that Vera Rubin is in full production," Huang announced. The latest launch forms part of Nvidia's frenetic hardware development cycle that has turned the company into the most valuable enterprise in the world. The Rubin architecture replaces the Blackwell architecture, which in turn replaced the Hopper and Lovelace architectures. Nvidia has already signed up deals with several high-profile AI companies such as Anthropic and OpenAI, besides AWS and others. In fact, Rubin chips will be in use at almost all major cloud providers. The chip is named after renowned astronomer Vera Florence Cooper Rubin. Nvidia shared the details of the Rubin architecture: it comprises six separate chips designed to be used in tandem. The Rubin GPU holds the centrepiece position, but the architecture also resolves bottlenecks in storage and interconnection by improving the BlueField and NVLink systems respectively. Additionally, there is a Vera CPU that is designed specifically for agentic reasoning, Huang said.
According to Dion Harris, Nvidia senior director for AI infrastructure, as one starts to enable new types of workflows, like agentic AI or long-term tasks, a lot of stress is placed on the KV cache. "So, we've introduced a new tier of storage that connects externally to the compute device, which allows you to scale your storage pool much more efficiently," he told the media. Moreover, the new architecture also provides a significant push in speed and power efficiency, with Nvidia claiming that in its tests the Rubin architecture operates three and a half times faster than the previous Blackwell architecture on model training and five times faster on inference tasks. These two announcements are in line with what Huang estimated on an earnings call in October last year, when he suggested that between $3 trillion and $4 trillion would be spent on AI infrastructure over the next five years. It looks like Rubin and Alpamayo will help Nvidia corner a lion's share of those revenues.
[24]
NVIDIA's Vera Rubin Signals the Next Leap in AI Computing
The platform tightly integrates multiple components, including a next-generation Rubin GPU, a custom Vera CPU, high-bandwidth NVLink interconnects, networking chips, and data-processing units, all co-designed to work as a single system. NVIDIA notes that AI workloads are transforming at a rapid pace: training giant models is no longer the only challenge, as inference and long-context reasoning now demand highly efficient communication between chips. According to the company, Vera Rubin can dramatically reduce inference costs and the number of GPUs required for specific workloads compared to the previous generation. This makes it better suited for always-on 'AI factories.'
[25]
Nvidia announces mass production of its new Vera Rubin AI platform
"I can tell you that is in full production," Huang said at the CES technology trade show, which officially kicks off on Tuesday. The platform combines several chips, including the Rubin GPU and the Vera CPU, forming a supercomputer specialized in AI capable of executing advanced models with great speed and efficiency. According to the executive, this architecture offers a maximum inference performance up to five times greater than previous generations, with remarkable energy efficiency, and is designed to run advanced AI models in data centers and large-scale applications. "Rubin arrives at exactly the right moment, as AI computing demand for both training and inference is going through the roof," Huang added. During the presentation, he highlighted 's plans to boost physical AI and introduced , a new family of AI models aimed at improving the safety and development of autonomous vehicles. The first car with this technology is expected to debut on US roads in the first quarter, followed by its arrival in European markets. " does something thats really special. Not only does it take sensor input and activate steering wheel, brakes and acceleration, it also reasons about what action is about to take," the CEO said. Huang also highlighted other physical AI developments, such as Nemotron, aimed at developing intelligent agents with reasoning and decision-making capabilities, and Cosmos, focused on AI models for understanding and simulating real-world environments. Thanks to its strategic alliances, continues to set the course for technological innovation, said , leader of the future strategy team at , during his presentation on Monday, in which he highlighted the collaboration with the American chip company. The presentation by Huang, who was accompanied on stage by robots, was part of a media day that highlighted the presence of artificial intelligence-powered devices, such as CLOiD and the new Atlas humanoid robot from Boston Dynamics, owned by Hyundai. The American chip manufacturer has achieved a high valuation on and has driven the AI ??revolution in the stock market.
[26]
Nvidia launches Vera Rubin platform, comprising 6 new chips designed to deliver one AI supercomputer
NVIDIA Corporation is the world leader in the design, development, and marketing of programmable graphics processors. The group also develops associated software. Net sales break down by family of products as follows: - computing and networking solutions (89%): data center platforms and infrastructure, Ethernet interconnect solutions, high-performance computing solutions, platforms and solutions for autonomous and intelligent vehicles, solutions for enterprise artificial intelligence infrastructure, crypto-currency mining processors, embedded computer boards for robotics, teaching, learning and artificial intelligence development, etc.; - graphics processors (11%): for PCs, game consoles, video game streaming platforms, workstations, etc. (GeForce, NVIDIA RTX, Quadro brands, etc.). The group also offers laptops, desktops, gaming computers, computer peripherals (monitors, mice, joysticks, remote controls, etc.), software for visual and virtual computing, platforms for automotive infotainment systems and cloud collaboration platforms. Net sales break down by industry between data center (88.3%), gaming (8.7%), professional visualization (1.4%), automotive (1.3%) and other (0.3%). Net sales are distributed geographically as follows: the United States (46.9%), Singapore (18.2%), Taiwan (15.8%), China and Hong Kong (13.1%) and other (6%).
Nvidia CEO Jensen Huang announced the Vera Rubin AI computing platform is in full production at CES 2026. The next-generation AI platform delivers five times faster AI inference than the Blackwell architecture while cutting costs by up to 10x. Major cloud providers including Microsoft, AWS, and Google Cloud will deploy Rubin systems starting in the second half of 2026.

Nvidia CEO Jensen Huang officially launched the company's Vera Rubin chip architecture at CES 2026, declaring the next-generation AI platform is already in full production [1]. The announcement marks a shift in how Nvidia positions its products, moving away from individual GPU sales toward complete rack-scale AI systems designed to address the skyrocketing computational demands of modern AI models [5].

"Vera Rubin is designed to address this fundamental challenge that we have: The amount of computation necessary for AI is skyrocketing," Huang told the audience at the Consumer Electronics Show [1]. The Rubin platform will replace the Blackwell architecture as Nvidia's flagship AI computing solution, with production expected to ramp up further in the second half of 2026.

The Rubin platform delivers substantial improvements in both speed and cost efficiency compared to its predecessor. According to Nvidia's tests, the Rubin GPU operates three and a half times faster than Blackwell on model-training tasks and five times faster on AI inference tasks, reaching as high as 50 petaflops of NVFP4 computational power [1][4]. The platform can train large mixture-of-experts AI models using roughly one-fourth as many chips as Blackwell requires while delivering up to a 10x reduction in inference token costs [2][4].

Power efficiency represents another major advance, with the new platform supporting eight times more inference compute per watt [1]. These gains could make advanced AI systems significantly cheaper to operate and make it harder for Nvidia's customers to justify moving away from its hardware, analysts note [2].

Named after astronomer Vera Florence Cooper Rubin, the architecture consists of six separate chips designed to work in concert [1]. At the center sits the Rubin GPU, but the system also includes a Vera CPU built with 88 custom Olympus cores designed specifically for agentic reasoning [4]. Both the Rubin GPU and Vera CPU are manufactured using Taiwan Semiconductor Manufacturing Company's 3-nanometer fabrication process with the most advanced high-bandwidth memory technology currently available [2].

The architecture addresses growing bottlenecks in storage and interconnection through improvements in the BlueField DPU and NVLink systems. Dion Harris, Nvidia's senior director of AI infrastructure solutions, explained that new workflows like agentic AI place significant stress on KV cache memory systems. "We've introduced a new tier of storage that connects externally to the compute device, which allows you to scale your storage pool much more efficiently," Harris told reporters [1]. The BlueField-4 DPU introduces a shared memory tier for long-context inference, treating context as a first-class system resource rather than a per-GPU issue [5]. Nvidia's sixth-generation NVLink interconnects and Spectrum-6 Ethernet switch provide the networking backbone, while the ConnectX-9 SuperNIC handles high-speed networking [4].

Rubin chips are already slated for use by nearly every major cloud provider. Microsoft and CoreWeave will be among the first companies to begin offering services powered by Rubin systems later this year [2]. Two major AI data centers that Microsoft is currently building in Georgia and Wisconsin will eventually include thousands of Rubin chips [2]. Other confirmed partners include Amazon Web Services, Google Cloud, Anthropic, and OpenAI [1][4]. Rubin systems will also power HPE's Blue Lion supercomputer and the upcoming Doudna supercomputer at Lawrence Berkeley National Lab [1]. Nvidia is also working with Red Hat to offer more products that will run on the new system for banks, automakers, airlines, and government agencies [2].

The Rubin platform launch signals a fundamental shift in Nvidia's business strategy. For the first time in roughly five years, Nvidia stood on the CES stage without a new consumer GPU announcement [5]. The company is no longer content to sell accelerators one card at a time; it is selling entire AI systems instead, reflecting how hyperscalers and AI labs now deploy hardware in standardized blocks measured in racks or data halls [5]. The flagship Nvidia Vera Rubin NVL72 configuration combines 36 Nvidia Vera CPUs, 72 Nvidia Rubin GPUs, NVLink 6 switches, multiple ConnectX-9 SuperNICs, and BlueField-4 DPUs into a single logical system [4]. This emphasis on pre-integrated systems shortens deployment timelines and reduces the tuning work customers must do themselves [5].

The launch comes amid intense competition to build AI infrastructure. On an earnings call in October 2025, Huang estimated that between $3 trillion and $4 trillion will be spent on AI infrastructure over the next five years [1]. Nvidia recently reported record-high data center revenue, up 66 percent over the prior year, driven by demand for Blackwell and Blackwell Ultra GPUs [3]. The goal with the Rubin platform is to accelerate mainstream adoption of advanced large language models, particularly in the consumer space, by sharply reducing the astronomical costs that have held back widespread AI deployment [4].