Nvidia launches Vera Rubin platform at CES 2026, promising 10x cost reduction for AI computing

Reviewed by Nidhi Govil


Nvidia CEO Jensen Huang announced at CES 2026 that the Vera Rubin AI computing platform is in full production. The next-generation AI platform delivers five times faster AI inference than the Blackwell architecture while cutting costs by up to 10x. Major cloud providers including Microsoft, AWS, and Google Cloud will deploy Rubin systems starting in the second half of 2026.

Nvidia Unveils Vera Rubin Platform in Full Production

Nvidia CEO Jensen Huang officially launched the company's Vera Rubin chip architecture at CES 2026, declaring the next-generation AI platform is already in full production [1]. The announcement marks a shift in how Nvidia positions its products, moving away from individual GPU sales toward complete rack-scale AI systems designed to address the skyrocketing computational demands of modern AI models [5].

Source: Tom's Hardware


“Vera Rubin is designed to address this fundamental challenge that we have: The amount of computation necessary for AI is skyrocketing,” Huang told the audience at the Consumer Electronics Show [1]. The Rubin platform will replace the Blackwell architecture as Nvidia's flagship AI computing solution, with production expected to ramp up further in the second half of 2026.

Dramatic Performance Gains Over Blackwell Architecture

The Rubin platform delivers substantial improvements in both speed and cost efficiency compared to its predecessor. According to Nvidia's tests, the Rubin GPU operates three and a half times faster than Blackwell on model training tasks and five times faster on AI inference tasks, reaching as high as 50 petaflops of NVFP4 computational power [1][4]. The platform can train large mixture-of-experts AI models using roughly one-fourth as many chips as Blackwell requires while delivering up to a 10x reduction in inference token costs [2][4].

Power efficiency represents another major advance, with the new platform supporting eight times more inference compute per watt [1]. These gains could make advanced AI systems significantly cheaper to operate and make it harder for Nvidia's customers to justify moving away from its hardware, analysts note [2].
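To put the reported multipliers in concrete terms, the sketch below works through the cost-per-token arithmetic. The 5x inference speedup and "up to 10x" cost figures are the numbers Nvidia cites; the baseline throughput and hourly cost are invented placeholders, so only the relative scaling is meaningful, not the dollar amounts.

```python
# Illustrative back-of-the-envelope arithmetic only. The 5x inference speedup and
# "up to 10x" cost figures are from Nvidia's announcement; the baseline Blackwell
# throughput and hourly cost below are invented placeholders.

BASELINE_TOKENS_PER_SEC = 1_000        # assumed Blackwell throughput per GPU
BASELINE_COST_PER_GPU_HOUR = 10.0      # assumed all-in hourly cost per GPU (USD)
RUBIN_INFERENCE_SPEEDUP = 5.0          # reported vs. Blackwell

def cost_per_million_tokens(tokens_per_sec: float, cost_per_hour: float) -> float:
    """Dollars to generate one million tokens at a given throughput and hourly cost."""
    tokens_per_hour = tokens_per_sec * 3600
    return cost_per_hour / tokens_per_hour * 1_000_000

blackwell = cost_per_million_tokens(BASELINE_TOKENS_PER_SEC, BASELINE_COST_PER_GPU_HOUR)

# A 5x throughput gain at flat hourly cost cuts token cost 5x; reaching the reported
# "up to 10x" would additionally require cheaper operation per GPU-hour, e.g. via the
# claimed 8x inference compute per watt lowering the energy share of the bill.
rubin = cost_per_million_tokens(BASELINE_TOKENS_PER_SEC * RUBIN_INFERENCE_SPEEDUP,
                                BASELINE_COST_PER_GPU_HOUR)

print(f"Blackwell (assumed baseline): ${blackwell:.2f} per 1M tokens")
print(f"Rubin (throughput gain only): ${rubin:.2f} per 1M tokens")
```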

Six-Chip Architecture Addresses AI Bottlenecks

Named after astronomer Vera Florence Cooper Rubin, the architecture consists of six separate chips designed to work in concert [1]. At the center sits the Rubin GPU, but the system also includes a Vera CPU built with 88 custom Olympus cores designed specifically for agentic reasoning [4]. Both the Rubin GPU and Vera CPU are manufactured using Taiwan Semiconductor Manufacturing Company's 3-nanometer fabrication process and paired with the most advanced high-bandwidth memory technology currently available [2].

Source: Wccftech


The architecture addresses growing bottlenecks in storage and interconnection through improvements in the BlueField DPU and NVLink systems. Dion Harris, Nvidia's senior director of AI infrastructure solutions, explained that new workflows like agentic AI place significant stress on KV cache memory systems. "We've introduced a new tier of storage that connects externally to the compute device, which allows you to scale your storage pool much more efficiently," Harris told reporters [1]. The BlueField-4 DPU introduces a shared memory tier for long-context inference, treating context as a first-class system resource rather than a per-GPU issue [5].
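To make the external KV-cache tier idea more concrete, here is a deliberately simplified sketch of a two-tier context store. This is not BlueField-4's or Nvidia's actual interface; the class name, the LRU eviction policy, and the block-level granularity are all assumptions chosen only to illustrate spilling long-context state from fast per-GPU memory into a larger shared pool.

```python
# Conceptual sketch only: illustrates treating long-context KV state as a shared
# resource outside per-GPU memory. The API and eviction policy are invented for
# illustration and do not correspond to any Nvidia or BlueField interface.
from collections import OrderedDict

class TieredKVCache:
    """Keeps the most recently used KV blocks in fast (GPU-like) memory and
    spills the rest to a larger external pool shared across accelerators."""

    def __init__(self, fast_capacity_blocks: int):
        self.fast_capacity = fast_capacity_blocks
        self.fast_tier = OrderedDict()   # block_id -> KV data (hot, on-device)
        self.external_tier = {}          # block_id -> KV data (shared pool)

    def put(self, block_id, kv_block):
        self.fast_tier[block_id] = kv_block
        self.fast_tier.move_to_end(block_id)
        # Evict least-recently-used blocks to the external tier instead of
        # recomputing them, which is the point of a shared context store.
        while len(self.fast_tier) > self.fast_capacity:
            old_id, old_block = self.fast_tier.popitem(last=False)
            self.external_tier[old_id] = old_block

    def get(self, block_id):
        if block_id in self.fast_tier:
            self.fast_tier.move_to_end(block_id)
            return self.fast_tier[block_id]
        # Cache miss in fast memory: fetch from the external pool and promote it.
        kv_block = self.external_tier.pop(block_id)
        self.put(block_id, kv_block)
        return kv_block

# Tiny usage example: with room for two hot blocks, older blocks spill outward.
cache = TieredKVCache(fast_capacity_blocks=2)
for i in range(4):
    cache.put(i, f"kv-block-{i}")
assert 0 in cache.external_tier and 3 in cache.fast_tier
```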

Nvidia's sixth-generation NVLink interconnects and Spectrum-6 Ethernet switch provide the networking backbone, while ConnectX-9 SuperNIC handles high-speed networking [4].

Major Cloud Providers Commit to Deployment

Rubin chips are already slated for use by nearly every major cloud provider. Microsoft and CoreWeave will be among the first companies to begin offering services powered by Rubin systems later this year [2]. Two major AI data centers that Microsoft is currently building in Georgia and Wisconsin will eventually include thousands of Rubin chips [2]. Other confirmed partners include Amazon Web Services, Google Cloud, Anthropic, and OpenAI [1][4].

Rubin systems will also power HPE's Blue Lion supercomputer and the upcoming Doudna supercomputer at Lawrence Berkeley National Lab [1]. Nvidia is working with Red Hat to offer more products that will run on the new system for banks, automakers, airlines, and government agencies [2].

Strategic Shift Toward System-Level Solutions

The Rubin platform launch signals a fundamental shift in Nvidia's business strategy. For the first time in roughly five years, Nvidia stood on the CES stage without a new consumer GPU announcement [5]. The company is no longer content to sell accelerators one card at a time; it is selling entire AI systems instead, reflecting how hyperscalers and AI labs now deploy hardware in standardized blocks measured in racks or data halls [5].

Source: Tom's Hardware


The flagship Nvidia Vera Rubin NVL72 configuration combines 36 Nvidia Vera CPUs, 72 Nvidia Rubin GPUs, NVLink 6 switches, multiple ConnectX-9 SuperNICs, and BlueField-4 DPUs into a single logical system [4]. This emphasis on pre-integrated systems shortens deployment timelines and reduces the tuning work customers must do themselves [5].
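As a rough illustration of the rack-level arithmetic, the sketch below encodes the reported NVL72 composition and derives two simple figures from it. The 36/72 CPU and GPU counts and the 50-petaflop NVFP4 per-GPU peak come from the reporting above; the field names, the naive linear aggregation, and the omission of the "multiple" SuperNICs and DPUs are simplifications, not an official rack specification.

```python
# Minimal data sketch of the reported NVL72 rack composition. Field names and the
# naive aggregate are our own; the counts and per-GPU peak are the reported figures.
from dataclasses import dataclass

@dataclass
class RackConfig:
    vera_cpus: int
    rubin_gpus: int
    nvlink_generation: int
    nvfp4_petaflops_per_gpu: float   # reported peak per Rubin GPU

    @property
    def gpus_per_cpu(self) -> float:
        return self.rubin_gpus / self.vera_cpus

    @property
    def rack_nvfp4_exaflops(self) -> float:
        # Naive linear sum across GPUs, ignoring interconnect and utilization effects.
        return self.rubin_gpus * self.nvfp4_petaflops_per_gpu / 1000

nvl72 = RackConfig(vera_cpus=36, rubin_gpus=72, nvlink_generation=6,
                   nvfp4_petaflops_per_gpu=50.0)
print(f"GPUs per CPU: {nvl72.gpus_per_cpu:.0f}")                 # 2
print(f"Naive rack NVFP4 peak: ~{nvl72.rack_nvfp4_exaflops:.1f} exaflops")  # ~3.6
```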

The launch comes amid intense competition to build AI infrastructure. On an earnings call in October 2025, Huang estimated that between $3 trillion and $4 trillion will be spent on AI infrastructure over the next five years [1]. Nvidia recently reported record-high data center revenue, up 66 percent over the prior year, driven by demand for Blackwell and Blackwell Ultra GPUs [3]. The goal with the Rubin platform is to accelerate mainstream adoption of advanced large language models, particularly in the consumer space, by sharply reducing the astronomical costs that have held back widespread AI deployment [4].
