8 Sources
[1]
Nvidia Groq 3 LPU and Groq LPX racks join Rubin platform at GTC -- SRAM-packed accelerator boosts 'every layer of the AI model on every token'
Nvidia's Vera Rubin platform is poised to massively power up the next generation of AI data centers, or "factories," as CEO Jensen Huang calls them, when those systems start arriving later this year. Today, during his GTC keynote, Huang revealed how Nvidia is using the IP it acquired from Groq last year to expand Rubin's capabilities. The Rubin platform now includes a new chip, the Nvidia Groq 3 LPU, an inference accelerator that bolsters these systems' ability to deliver tokens in volume and at low latency for high interactivity at the leading edge of AI models.

Recall that the Rubin platform already includes six chips from which Nvidia builds up rack-scale systems and scales them out into AI factories: the Rubin GPU itself, the Vera CPU, NVLink 6 scale-up switches, the ConnectX-9 SuperNIC, the BlueField-4 data processing unit, and the Spectrum-X scale-out switch with co-packaged optics. The Groq 3 LPU becomes another building block for Rubin at scale.

Unlike most AI accelerators, which rely on HBM as their working memory tier, each Groq 3 LPU incorporates 500 MB of SRAM, the same memory used for ultra-high-speed caches on CPUs and GPUs. That's paltry compared to the vastly more capacious 288 GB of HBM4 on each Rubin GPU, but that SRAM delivers 150 TB/s of bandwidth, far more than the 22 TB/s of that HBM. For bandwidth-sensitive AI decode operations, the massive bandwidth boost of the Groq 3 chip offers tantalizing benefits for inference applications.

In turn, Nvidia will build Groq 3 LPX racks comprising 256 Groq 3 LPUs. Each rack offers 128 GB of SRAM with 40 PB/s of bandwidth for inference acceleration, and it joins those chips together with a dedicated scale-up interface delivering 640 TB/s per rack.
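The per-chip numbers quoted above scale straightforwardly to the rack level; a quick back-of-the-envelope check, using only figures from the article, reproduces Nvidia's stated rack specs, with the 40 PB/s figure evidently rounded up:

```python
# Aggregate the per-chip Groq 3 LPU figures quoted above to rack level.
lpus_per_rack = 256
sram_per_lpu_gb = 0.5          # 500 MB of on-chip SRAM per LPU
sram_bw_per_lpu_tbps = 150.0   # 150 TB/s of SRAM bandwidth per LPU

rack_sram_gb = lpus_per_rack * sram_per_lpu_gb              # 128 GB per rack
rack_bw_pbps = lpus_per_rack * sram_bw_per_lpu_tbps / 1000  # 38.4 PB/s, quoted as "40 PB/s"

print(f"Rack SRAM: {rack_sram_gb:.0f} GB")
print(f"Aggregate SRAM bandwidth: {rack_bw_pbps:.1f} PB/s")
```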
Nvidia envisions Groq LPX as a co-processor for Rubin that will boost decode performance at "every layer of the AI model on every token," according to Nvidia hyperscale VP Ian Buck. It positions Rubin to serve the next frontier of AI: multi-agent systems that need to deliver interactive performance while inferencing models of trillions of parameters with context windows of millions of tokens.

As the AI agents in those multi-agent systems begin talking more and more to other AIs rather than to humans looking at chatbot windows, the frontier for responsiveness requirements also shifts. What might seem like a reasonable rate of tokens generated per second for a human is glacial for an AI agent. In the future of multi-agent systems that Buck describes, the combination of Rubin GPUs and Groq LPUs moves us from a world where 100 tokens per second is a reasonable throughput to one of 1,500 TPS or more for AI agent intercommunication.

The addition of the Groq 3 LPU to the Rubin arsenal could help the platform fend off challengers on the low-latency inference frontier. Cerebras, whose wafer-scale engines fuse massive amounts of SRAM and compute for low-latency inference with advanced models, has frequently needled Nvidia regarding the perceived disadvantages of its GPUs for that purpose, and customers as large as OpenAI have signed up for Cerebras capacity to serve some of their state-of-the-art models with the favorable latency characteristics of that platform.

Buck also hinted that the Groq 3 LPU might lead to a reduced role for the Rubin CPX inference accelerator, saying that the company is currently focused on integrating the Groq 3 LPX rack with Rubin. While he didn't offer more details, that focus shift would make sense in today's memory-constrained world, since the two chips are meant to offer similar enhancements for inference performance and the Groq LPU doesn't require the large amount of GDDR7 memory that each Rubin CPX module does.
We're on the ground at GTC this week, and we'll be exploring what the fusion of Groq and Nvidia IP means for the future of AI inference through conversations and sessions at the event. Stay tuned.
[2]
Nvidia slaps Groq into new LPX racks for faster AI response
GPUzilla's $20B acquihire paves the way to AI agents that hallucinate faster than ever

GTC Nvidia will use Groq's language processing units (LPUs), a technology it paid $20 billion for, to boost the inference performance of its newly announced Vera Rubin rack systems, CEO Jensen Huang revealed during his GTC keynote on Monday. Using this technology, the GPU giant can now serve massive trillion-parameter large language models (LLMs) at hundreds or even thousands of tokens a second per user, Ian Buck, VP of Hyperscale and HPC at Nvidia, told press ahead of Huang's keynote on Sunday.

Until now, ultra-low-latency inference has been dominated by a handful of boutique chip slingers like Cerebras, SambaNova, and of course, Groq, the latter of which Nvidia all but absorbed as part of an acquihire late last year. Demand for these so-called premium tokens has grown over the past year. OpenAI is using Cerebras' dinner-plate-sized accelerators to achieve nearly instantaneous code generation for models like GPT-5.3 Codex-Spark. By combining its GPUs with Groq's LPUs, Nvidia wagers inference providers will be able to charge as much as $45 per million tokens generated. To put that in perspective, OpenAI currently charges about $15 per million output tokens for API access to its top GPT-5.4 model.

To be clear, LPUs won't replace Nvidia's GPUs but rather augment them. LLM inference encompasses two stages: the compute-heavy prefill phase, in which the prompt is processed, and the bandwidth-heavy decode phase, during which a response is generated. With up to 50 petaFLOPS each, Nvidia's newly announced Rubin GPUs aren't hurting for compute, but with 22 TB/s of HBM4 memory bandwidth apiece, they trail Groq's latest chip tech, which achieves 150 TB/s, nearly 7x faster. This makes Groq's LPU an ideal decode accelerator. Nvidia plans to cram 256 of the chips into a new LPX rack system that'll be connected via a custom Spectrum-X interconnect to a neighboring Vera Rubin NVL72 rack system.
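The prefill/decode split described above can be framed as a simple roofline estimate: during decode, each generated token requires streaming the model's active weights from memory once, so memory bandwidth sets a hard ceiling on single-stream tokens per second. The sketch below uses the bandwidth figures from the article, but the 100-billion-active-parameter FP8 model is an illustrative assumption, not something Nvidia has specified:

```python
def max_decode_tps(bandwidth_tbps: float, active_params_billion: float,
                   bytes_per_param: float = 1.0) -> float:
    """Bandwidth-bound ceiling on tokens/s for one decode stream.

    Decode must read every active weight once per generated token,
    so tokens/s <= memory bandwidth / bytes read per token.
    """
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return bandwidth_tbps * 1e12 / bytes_per_token

# Hypothetical 100B-active-parameter model at FP8 (1 byte per weight):
hbm_ceiling = max_decode_tps(22.0, 100.0)    # Rubin GPU HBM4: 22 TB/s
sram_ceiling = max_decode_tps(150.0, 100.0)  # Groq 3 LPU SRAM: 150 TB/s

print(f"HBM-bound:  {hbm_ceiling:.0f} tok/s")   # 220 tok/s
print(f"SRAM-bound: {sram_ceiling:.0f} tok/s")  # 1500 tok/s
```

The roughly 6.8x gap between the two ceilings is the "nearly 7x" bandwidth advantage the article cites, and it is why the LPU targets decode rather than the compute-bound prefill phase.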
The GPUs will handle the compute-intensive prompt processing, while the LPUs spew out tokens. The GPU giant needs that many chips because, while SRAM may be fast, the chips are neither capacious nor compute-dense. Each Groq 3 LPU is capable of 1.2 petaFLOPS of FP8 and contains 500 MB of onboard memory. That's about 1/500th of the capacity of Nvidia's Rubin GPU.

"The LPU is optimized strictly for that extreme, low-latency token generation, offering token rates in the 1,000s of tokens per second. The trade-off, of course, is that you need many chips in order to perform that kind of performance," Buck explained. "The tokens per second per chip is actually quite low."

In other words, to do anything interesting, Nvidia is going to need a lot of them. Even with 256 chips per rack, that's only 128 GB of ultra-fast memory, which is nowhere near enough for trillion-parameter models like Kimi K2. At 4-bit precision you'd need at least 512 GB of memory, or about a thousand LPUs, to hold a 1-trillion-parameter model in memory. Nvidia says multiple LPX racks can be ganged together to support these larger models.

The integration of Groq's latest LPUs into Nvidia's LPX racks represents a bit of a course correction for the AI infrastructure magnate. Nvidia had previously announced a dedicated prefill processor called Rubin CPX at Computex last year. The basic idea was to use GDDR7-equipped Rubin CPX processors for prefill processing and HBM-equipped Rubin GPUs for decode. However, that project appears to have been abandoned in favor of Groq's LPU-based decode accelerators. "Integrating LPU and LPX into our Rubin platform to optimize the decode, that's where we're focused right now," Buck said.

Nvidia isn't the only one looking to fuse its compute-heavy AI accelerators to an SRAM-heavy architecture like Groq's. On Friday, Amazon Web Services (AWS) announced a collaboration with Cerebras to develop a combined inference platform, not unlike Nvidia's Groq 3 LPX.
In this case, the platform will use AWS' Trainium 3 accelerators for prompt processing and Cerebras' WSE-3 ASICs, each of which packs 44 GB of SRAM onto a wafer-sized chip, to generate low-latency tokens.

Nvidia's Groq-based LPX systems are expected to ship alongside its Vera Rubin rack systems later this year, though it appears both access and software support may be somewhat limited. At least initially, Nvidia is focusing on model builders and service providers that need to serve trillion-plus-parameter models at high token rates. Buck also notes that while Nvidia is using Groq's ASICs to accelerate its inference platform, they don't support CUDA natively just yet.

"There are no changes to CUDA at this time. We are leveraging the LPU as an accelerator to the CUDA that's running on the Vera NVL72 platform," he explained. ®
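The capacity arithmetic The Register walks through (a 1-trillion-parameter model at 4-bit precision needs roughly 500 GB of weights, or about a thousand 500 MB LPUs) generalizes to a one-line sizing helper. This is a sketch of the article's own math, not an Nvidia sizing tool:

```python
import math

def lpus_needed(params_trillion: float, bits_per_weight: int,
                sram_per_lpu_gb: float = 0.5) -> int:
    """Minimum number of Groq 3 LPUs (500 MB SRAM each) whose combined
    SRAM can hold a model's weights, ignoring activation/KV overhead."""
    weight_gb = params_trillion * 1e12 * bits_per_weight / 8 / 1e9
    return math.ceil(weight_gb / sram_per_lpu_gb)

print(lpus_needed(1.0, 4))  # 1000 LPUs, i.e. about four 256-LPU LPX racks
```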
[3]
NVIDIA Vera Rubin Opens Agentic AI Frontier
Seven New Chips in Full Production to Scale the World's Largest AI Factories With Configurable AI Infrastructure Optimized for Every Phase of AI, From Pretraining, Post-Training and Test-Time Scaling to Agentic Inference

GTC -- NVIDIA today announced the NVIDIA Vera Rubin platform is opening the next frontier of agentic AI, with seven new chips now in full production to scale the world's largest AI factories. The platform brings together the NVIDIA Vera CPU, NVIDIA Rubin GPU, NVIDIA NVLink™ 6 Switch, NVIDIA ConnectX-9 SuperNIC, NVIDIA BlueField-4 DPU and NVIDIA Spectrum™-6 Ethernet switch, as well as the newly integrated NVIDIA Groq 3 LPU. Designed to operate together as one incredible AI supercomputer, the chips power every phase of AI -- from massive-scale pretraining, post-training and test-time scaling to real-time agentic inference.

"Vera Rubin is a generational leap -- seven breakthrough chips, five racks, one giant supercomputer -- built to power every phase of AI," said Jensen Huang, founder and CEO of NVIDIA. "The agentic AI inflection point has arrived with Vera Rubin kicking off the greatest infrastructure buildout in history."

"Enterprises and developers are using Claude for increasingly complex reasoning, agentic workflows and mission-critical decisions. That demands infrastructure that can keep pace," said Dario Amodei, CEO and cofounder of Anthropic. "NVIDIA's Vera Rubin platform gives us the compute, networking and system design to keep delivering while advancing the safety and reliability our customers depend on."

"NVIDIA infrastructure is the foundation that lets us keep pushing the frontier of AI," said Sam Altman, CEO of OpenAI. "With NVIDIA Vera Rubin, we'll run more powerful models and agents at massive scale and deliver faster, more reliable systems to hundreds of millions of people."
Shift to POD-Scale Systems

AI infrastructure is rapidly evolving -- from discrete chips and standalone servers to fully integrated rack-scale systems, POD-scale deployments, AI factories and sovereign AI. These advances are driving dramatic gains in performance, improving cost efficiency for organizations of all sizes and across industries -- from startups and mid-sized businesses to public-private institutions and enterprises -- while helping democratize access to AI and improving energy efficiency to power the world's most demanding workloads. Through deep codesign across compute, networking and storage, supported by an ecosystem of more than 80 NVIDIA MGX partners with a global supply chain, NVIDIA Vera Rubin offers the most extensive NVIDIA POD-scale platform -- a supercomputer where multiple racks purpose-built for AI work together as one massive, coherent system.

NVIDIA Vera Rubin NVL72 Rack

Integrating 72 Rubin GPUs and 36 Vera CPUs connected by NVLink 6, along with ConnectX-9 SuperNICs and BlueField-4 DPUs, Vera Rubin NVL72 delivers breakthrough efficiency -- training large mixture-of-experts models with one-fourth the number of GPUs compared with the NVIDIA Blackwell platform and achieving up to 10x higher inference throughput per watt at one-tenth the cost per token. Designed for hyperscale AI factories worldwide, NVL72 scales seamlessly with NVIDIA Quantum-X800 InfiniBand and Spectrum-X Ethernet to sustain high utilization across massive GPU clusters while reducing time to train and total cost of ownership.

NVIDIA Vera CPU Rack

Reinforcement learning and agentic AI workloads rely on large numbers of CPU-based environments to test and validate the results generated by models running on GPU systems. The NVIDIA Vera CPU Rack delivers dense, liquid-cooled infrastructure built on NVIDIA MGX, integrating 256 Vera CPUs to provide scalable, energy-efficient capacity with world-class single-threaded performance, unlocking agentic AI at scale.
Integrated with Spectrum-X Ethernet networking, Vera CPU racks keep CPU environments tightly synchronized across the AI factory. Together with GPU compute racks, they provide the CPU foundation for large-scale agentic AI and reinforcement learning -- with Vera delivering results twice as efficiently and 50% faster than traditional CPUs.

NVIDIA Groq 3 LPX Rack

NVIDIA Groq 3 LPX marks a milestone in accelerated computing. Designed for the low-latency and large-context demands of agentic systems, LPX and Vera Rubin unite the extreme performance of both processors to deliver up to 35x higher inference throughput per megawatt and up to 10x more revenue opportunity for trillion-parameter models. At scale, a fleet of LPUs functions as a giant single processor for fast, deterministic inference acceleration. The LPX rack with 256 LPU processors features 128GB of on-chip SRAM and 640 TB/s of scale-up bandwidth. Deployed with Vera Rubin NVL72, Rubin GPUs and LPUs boost decode by jointly computing every layer of the AI model for every output token. Optimized for trillion-parameter models and million-token context, the codesigned LPX architecture pairs with Vera Rubin to maximize efficiency across power, memory and compute. The additional throughput per watt and token performance unlocks a new tier of ultra-premium, trillion-parameter, million-context inference, expanding revenue opportunity for all AI providers. Fully liquid cooled and built on MGX infrastructure, LPX integrates seamlessly into next-generation Vera Rubin AI factories and will be available in the second half of this year.

NVIDIA BlueField-4 STX Storage Rack

The NVIDIA BlueField-4 STX rack-scale system is an AI-native storage infrastructure that extends GPU memory seamlessly across the POD.
Powered by BlueField-4 -- combining the NVIDIA Vera CPU and NVIDIA ConnectX-9 SuperNIC -- STX delivers a high-bandwidth shared layer optimized for storing and retrieving the massive key-value cache data generated by large language models and agentic AI workflows. NVIDIA DOCA Memos™ -- a new DOCA framework that supercharges BlueField-4 storage -- enables dedicated KV cache storage processing to boost inference throughput by up to 5x while significantly improving power efficiency compared with general-purpose storage architectures. The result is POD-wide context that delivers faster multi-turn interactions with AI agents, more scalable AI services and higher overall infrastructure utilization.

"The NVIDIA BlueField-4 STX rack-scale context memory storage system will enable a critical performance boost needed to exponentially scale our agentic AI efforts," said Timothée Lacroix, cofounder and chief technology officer of Mistral AI. "By delivering a new storage tier purpose-built for AI agents' memory, STX is ideally positioned to ensure that our models can maintain coherence and speed when reasoning across massive datasets."

NVIDIA Spectrum-6 SPX Ethernet Rack

Spectrum-6 SPX Ethernet is engineered to accelerate east-west traffic across AI factories. Configurable with either Spectrum-X Ethernet or NVIDIA Quantum-X800 InfiniBand switches, it delivers low-latency, high-throughput rack-to-rack connectivity at scale. Spectrum-X Ethernet Photonics with co-packaged optics achieves up to 5x greater optical power efficiency and 10x higher resiliency compared with traditional pluggable transceivers.

Improving Resiliency and Energy Efficiency

NVIDIA, along with over 200 data center infrastructure partners, announced the NVIDIA DSX platform for Vera Rubin. This includes DSX Max-Q to enable dynamic power provisioning across the entire AI factory, resulting in the deployment of 30% more AI infrastructure within a fixed-power data center.
The new DSX Flex software enables AI factories to be grid-flexible assets, unlocking 100 gigawatts of stranded grid power. NVIDIA also today released the Vera Rubin DSX AI Factory reference design, a blueprint for codesigned AI infrastructure that maximizes tokens per watt and overall goodput, improving system resiliency and accelerating time to first production. By tightly integrating compute, networking, storage, power and cooling, the architecture increases energy efficiency and ensures AI factories can scale reliably under continuous, high-intensity workloads with maximum uptime.

Broad Ecosystem Support

Vera Rubin-based products will be available from partners starting in the second half of this year. This includes leading cloud providers Amazon Web Services, Google Cloud, Microsoft Azure and Oracle Cloud Infrastructure, along with NVIDIA Cloud Partners CoreWeave, Crusoe, Lambda, Nebius, Nscale and Together AI. Global system manufacturers Cisco, Dell Technologies, HPE, Lenovo and Supermicro, as well as Aivres, ASUS, Foxconn, GIGABYTE, Inventec, Pegatron, Quanta Cloud Technology (QCT), Wistron and Wiwynn, are expected to deliver a wide range of servers based on Vera Rubin products. AI labs and frontier model developers including Anthropic, Meta, Mistral AI and OpenAI are looking to use the NVIDIA Vera Rubin platform to train larger, more capable models and to serve long-context, multimodal systems at lower latency and cost than with prior GPU generations.
[4]
Nvidia introduces Vera Rubin, a seven-chip AI platform with OpenAI, Anthropic and Meta on board
Nvidia on Monday took the wraps off Vera Rubin, a sweeping new computing platform built from seven chips now in full production -- and backed by an extraordinary lineup of customers that includes Anthropic, OpenAI, Meta and Mistral AI, along with every major cloud provider. The message to the AI industry, and to investors, was unmistakable: Nvidia is not slowing down.

The Vera Rubin platform claims up to 10x more inference throughput per watt and one-tenth the cost per token compared with the Blackwell systems that only recently began shipping. CEO Jensen Huang, speaking at the company's annual GTC conference, called it "a generational leap" that would kick off "the greatest infrastructure buildout in history." Amazon Web Services, Google Cloud, Microsoft Azure and Oracle Cloud Infrastructure will all offer the platform, and more than 80 manufacturing partners are building systems around it.

"Vera Rubin is a generational leap -- seven breakthrough chips, five racks, one giant supercomputer -- built to power every phase of AI," Huang declared. "The agentic AI inflection point has arrived with Vera Rubin kicking off the greatest infrastructure buildout in history."

In any other industry, such rhetoric might be dismissed as keynote theater. But Nvidia occupies a singular position in the global economy -- a company whose products have become so essential to the AI boom that its market capitalization now rivals the GDP of mid-sized nations. When Huang says the infrastructure buildout is historic, the CEOs of the companies actually writing the checks are standing behind him, nodding.

Dario Amodei, the chief executive of Anthropic, said Nvidia's platform "gives us the compute, networking and system design to keep delivering while advancing the safety and reliability our customers depend on."
Sam Altman, the chief executive of OpenAI, said that "with Nvidia Vera Rubin, we'll run more powerful models and agents at massive scale and deliver faster, more reliable systems to hundreds of millions of people."

The Vera Rubin platform brings together the Nvidia Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, Spectrum-6 Ethernet switch and the newly integrated Groq 3 LPU -- a purpose-built inference accelerator. Nvidia organized these into five interlocking rack-scale systems that function as a unified supercomputer.

The flagship NVL72 rack integrates 72 Rubin GPUs and 36 Vera CPUs connected by NVLink 6. Nvidia says it can train large mixture-of-experts models using one-quarter the GPUs required on Blackwell, a claim that, if validated in production, would fundamentally alter the economics of building frontier AI systems.

The Vera CPU rack packs 256 liquid-cooled processors into a single rack, sustaining more than 22,500 concurrent CPU environments -- the sandboxes where AI agents execute code, validate results and iterate. Nvidia describes the Vera CPU as the first processor purpose-built for agentic AI and reinforcement learning, featuring 88 custom-designed Olympus cores and LPDDR5X memory delivering 1.2 terabytes per second of bandwidth at half the power of conventional server CPUs.

The Groq 3 LPX rack, housing 256 inference processors with 128 gigabytes of on-chip SRAM, targets the low-latency demands of trillion-parameter models with million-token contexts. The BlueField-4 STX storage rack provides what Nvidia calls "context memory" -- high-speed storage for the massive key-value caches that agentic systems generate as they reason across long, multi-step tasks. And the Spectrum-6 SPX Ethernet rack ties it all together with co-packaged optics delivering 5x greater optical power efficiency than traditional transceivers.
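The Vera CPU rack figures above are internally consistent if you assume one agent sandbox per core, an assumption on our part; Nvidia doesn't spell out the mapping:

```python
vera_cpus_per_rack = 256
olympus_cores_per_cpu = 88

# One sandboxed agent environment per core (assumed mapping, not stated by Nvidia):
environments = vera_cpus_per_rack * olympus_cores_per_cpu
print(environments)  # 22528, matching "more than 22,500 concurrent CPU environments"
```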
Why Nvidia is betting the future on autonomous AI agents -- and rebuilding its stack around them

The strategic logic binding every announcement Monday into a single narrative is Nvidia's conviction that the AI industry is crossing a threshold. The era of chatbots -- AI that responds to a prompt and stops -- is giving way to what Huang calls "agentic AI": systems that reason autonomously for hours or days, write and execute software, call external tools, and continuously improve.

This isn't just a branding exercise. It represents a genuine architectural shift in how computing infrastructure must be designed. A chatbot query might consume milliseconds of GPU time. An agentic system orchestrating a drug discovery pipeline or debugging a complex codebase might run continuously, consuming CPU cycles to execute code, GPU cycles to reason, and massive storage to maintain context across thousands of intermediate steps. That demands not just faster chips, but a fundamentally different balance of compute, memory, storage and networking.

Nvidia addressed this with the launch of its Agent Toolkit, which includes OpenShell, a new open-source runtime that enforces security and privacy guardrails for autonomous agents. The enterprise adoption list is remarkable: Adobe, Atlassian, Box, Cadence, Cisco, CrowdStrike, Dassault Systèmes, IQVIA, Red Hat, Salesforce, SAP, ServiceNow, Siemens and Synopsys are all integrating the toolkit into their platforms. Nvidia also launched NemoClaw, an open-source stack that lets users install its Nemotron models and OpenShell runtime in a single command to run secure, always-on AI assistants on everything from RTX laptops to DGX Station supercomputers.

The company separately announced Dynamo 1.0, open-source software it describes as the first "operating system" for AI inference at factory scale.
Dynamo orchestrates GPU and memory resources across clusters and has already been adopted by AWS, Azure, Google Cloud, Oracle, Cursor, Perplexity, PayPal and Pinterest. Nvidia says it boosted Blackwell inference performance by up to 7x in recent benchmarks.

The Nemotron coalition and Nvidia's play to shape the open-source AI landscape

If Vera Rubin represents Nvidia's hardware ambition, the Nemotron Coalition represents its software ambition. Announced Monday, the coalition is a global collaboration of AI labs that will jointly develop open frontier models trained on Nvidia's DGX Cloud. The inaugural members -- Black Forest Labs, Cursor, LangChain, Mistral AI, Perplexity, Reflection AI, Sarvam and Thinking Machines Lab, the startup led by former OpenAI executive Mira Murati -- will contribute data, evaluation frameworks and domain expertise. The first model will be co-developed by Mistral AI and Nvidia and will underpin the upcoming Nemotron 4 family.

"Open models are the lifeblood of innovation and the engine of global participation in the AI revolution," Huang said.

Nvidia also expanded its own open model portfolio significantly. Nemotron 3 Ultra delivers what the company calls frontier-level intelligence with 5x throughput efficiency on Blackwell. Nemotron 3 Omni integrates audio, vision and language understanding. Nemotron 3 VoiceChat supports real-time, simultaneous conversations. And the company previewed GR00T N2, a next-generation robot foundation model that it says helps robots succeed at new tasks in new environments more than twice as often as leading alternatives, currently ranking first on the MolmoSpaces and RoboArena benchmarks.

The open-model push serves a dual purpose.
It cultivates the developer ecosystem that drives demand for Nvidia hardware, and it positions Nvidia as a neutral platform provider rather than a competitor to the AI labs building on its chips -- a delicate balancing act that grows more complex as Nvidia's own models grow more capable.

From operating rooms to orbit: how Vera Rubin's reach extends far beyond the data center

The vertical breadth of Monday's announcements was almost disorienting. Roche revealed it is deploying more than 3,500 Blackwell GPUs across hybrid cloud and on-premises environments in the U.S. and Europe -- the largest announced GPU footprint in the pharmaceutical industry. The company is using the infrastructure for biological foundation models, drug discovery and digital twins of manufacturing facilities, including its new GLP-1 facility in North Carolina. Nearly 90 percent of Genentech's eligible small-molecule programs now integrate AI, Roche said, with one oncology molecule designed 25 percent faster and a backup candidate delivered in seven months instead of more than two years.

In autonomous vehicles, BYD, Geely, Isuzu and Nissan are building Level 4-ready vehicles on Nvidia's Drive Hyperion platform. Nvidia and Uber expanded their partnership to launch autonomous vehicles across 28 cities on four continents by 2028, starting with Los Angeles and San Francisco in the first half of 2027. The company introduced Alpamayo 1.5, a reasoning model for autonomous driving already downloaded by more than 100,000 automotive developers, and Nvidia Halos OS, a safety architecture built on ASIL D-certified foundations for production-grade autonomy.

Nvidia also released the first domain-specific physical AI platform for healthcare robotics, anchored by Open-H -- the world's largest healthcare robotics dataset, with over 700 hours of surgical video. CMR Surgical, Johnson & Johnson MedTech and Medtronic are among the adopters. And then there was space.
The Vera Rubin Space Module delivers up to 25x more AI compute for orbital inferencing compared with the H100 GPU. Aetherflux, Axiom Space, Kepler Communications, Planet Labs and Starcloud are building on it. "Space computing, the final frontier, has arrived," Huang said, deploying the kind of line that, from another executive, might draw eye-rolls -- but from the CEO of a company whose chips already power the majority of the world's AI workloads, lands differently.

The deskside supercomputer and Nvidia's quiet push into enterprise hardware

Amid the spectacle of trillion-parameter models and orbital data centers, Nvidia made a quieter but potentially consequential move: it launched the DGX Station, a deskside system powered by the GB300 Grace Blackwell Ultra Desktop Superchip that delivers 748 gigabytes of coherent memory and up to 20 petaflops of AI compute performance. The system can run open models of up to one trillion parameters from a desk. Snowflake, Microsoft Research, Cornell, EPRI and Sungkyunkwan University are among the early users. DGX Station supports air-gapped configurations for regulated industries, and applications built on it move seamlessly to Nvidia's data center systems without rearchitecting -- a design choice that creates a natural on-ramp from local experimentation to large-scale deployment.

Nvidia also updated DGX Spark, its more compact system, with support for clustering up to four units into a "desktop data center" with linear performance scaling. Both systems ship preconfigured with NemoClaw and the Nvidia AI software stack, and support models including Nemotron 3, Google Gemma 3, Qwen3, DeepSeek V3.2, Mistral Large 3 and others.

Adobe and Nvidia separately announced a strategic partnership to develop the next generation of Firefly models using Nvidia's computing technology and libraries.
Adobe will also build a cloud-native 3D digital twin solution for marketing on Nvidia Omniverse and integrate Nemotron capabilities into Adobe Acrobat. The partnership spans creative tools including Photoshop, Premiere Pro, Frame.io and Adobe Experience Platform.

Building the factories that build intelligence: Nvidia's AI infrastructure blueprint

Perhaps the most telling indicator of where Nvidia sees the industry heading is the Vera Rubin DSX AI Factory reference design -- essentially a blueprint for constructing entire buildings optimized to produce AI. The reference design outlines how to integrate compute, networking, storage, power and cooling into a system that maximizes what Nvidia calls "tokens per watt," along with an Omniverse DSX Blueprint for creating digital twins of these facilities before they are built. The software stack includes DSX Max-Q for dynamic power provisioning -- which Nvidia says enables 30 percent more AI infrastructure within a fixed-power data center -- and DSX Flex, which connects AI factories to power-grid services to unlock what the company estimates is 100 gigawatts of stranded grid capacity.

Energy leaders Emerald AI, GE Vernova, Hitachi and Siemens Energy are using the architecture. Nscale and Caterpillar are building one of the world's largest AI factories in West Virginia using the Vera Rubin reference design. Industry partners Cadence, Dassault Systèmes, Eaton, Jacobs, Schneider Electric, Siemens, PTC, Switch, Trane Technologies and Vertiv are contributing simulation-ready assets and integrating their platforms. CoreWeave is using Nvidia's DSX Air to run operational rehearsals of AI factories in the cloud before physical delivery.

"In the age of AI, intelligence tokens are the new currency, and AI factories are the infrastructure that generates them," Huang said. It is the kind of formulation -- tokens as currency, factories as mints -- that reveals how Nvidia thinks about its place in the emerging economic order.
What Nvidia's grand vision gets right -- and what remains unproven

The scale and coherence of Monday's announcements are genuinely impressive. No other company in the semiconductor industry -- and arguably no other technology company, period -- can present an integrated stack spanning custom silicon, systems architecture, networking, storage, inference software, open models, agent frameworks, safety runtimes, simulation platforms, digital twin infrastructure and vertical applications from drug discovery to autonomous driving to orbital computing.

But scale and coherence are not the same as inevitability. The performance claims for Vera Rubin, while dramatic, remain largely unverified by independent benchmarks. The agentic AI thesis that underpins the entire platform -- the idea that autonomous, long-running AI agents will become the dominant computing workload -- is a bet on a future that has not yet fully materialized. And Nvidia's expanding role as a provider of models, software and reference architectures raises questions about how long its hardware customers will remain comfortable depending so heavily on a single supplier for so many layers of their stack.

Competitors are not standing still. AMD continues to close the gap on data center GPU performance. Google's TPUs power some of the world's largest AI training runs. Amazon's Trainium chips are gaining traction inside AWS. And a growing cohort of startups is attacking various pieces of the AI infrastructure puzzle. Yet none of them showed up at GTC on Monday with endorsements from the CEOs of Anthropic and OpenAI. None of them announced seven new chips in full production simultaneously. And none of them presented a vision this comprehensive for what comes next.

There is a scene that repeats at every GTC: Huang, in his trademark leather jacket, holds up a chip the way a jeweler holds up a diamond, rotating it slowly under the stage lights. It is part showmanship, part sermon.
But the congregation keeps growing, the chips keep getting faster, and the checks keep getting larger. Whether Nvidia is building the greatest infrastructure in history or simply the most profitable one may, in the end, be a distinction without a difference.
[5]
Nvidia ups the stakes for AI infra with turbocharged Vera Rubin platform launch - SiliconANGLE
Nvidia Corp. is throwing down the gauntlet to the rest of the artificial intelligence chip industry with the launch of its next-generation Vera Rubin platform. Announced at GTC 2026 today, it consists of no fewer than seven new chips designed to power what Chief Executive Jensen Huang said is the "greatest infrastructure buildout in history." Vera Rubin is named after the pioneering astronomer who first discovered evidence for dark matter, and it's much more than just a simple refresh of its previous-generation graphics processing units. The company said it's a complete architectural overhaul aimed at powering the enterprise shift toward "agentic AI" - a world of autonomous AI agents that can reason, use third-party software tools and execute complex workloads on behalf of humans. The Vera Rubin platform is anchored by the new Rubin GPU and Vera central processing units, but that's not all. The platform also consists of Nvidia's NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 data processing unit and the Spectrum-6 Ethernet switch, plus the new Nvidia Groq 3 language processing unit that's designed to support the deterministic, low-latency requirements of trillion-parameter model inference. Huang promised that Vera Rubin will deliver a "generational leap" in AI compute performance. "Seven breakthrough chips, five racks, one giant supercomputer, built to power every phase of AI" was how Huang described it. "The agentic AI inflection point has arrived." Nvidia said it wants to move away from selling discrete chips and standalone servers and move toward selling complete "AI factories," made up of fully integrated rack-scale systems and pod-scale deployments to support sovereign AI deployments.
At the heart of this strategy is the new Vera Rubin NVL72, which is a liquid-cooled rack-scale system made up of 72 Rubin GPUs and 36 Vera CPUs connected over its high-speed NVLink 6 interconnects. The system also integrates the new ConnectX-9 SuperNICs and BlueField-4 DPUs to achieve "breakthrough efficiency." For instance, Nvidia said the Vera Rubin NVL72 platform can be used to train large mixture-of-experts models using just one-fourth of the number of GPUs compared to what would be required with its previous-generation Blackwell chips. In terms of inference, the company said Vera Rubin will deliver 10 times greater throughput at just a tenth of the cost per token. For agentic reasoning workloads, Nvidia has introduced the Vera CPU Rack, which consists of 256 CPUs in a single cluster. It's aimed at reinforcement learning and agentic workloads that require heavy CPU-based simulation to validate GPU-generated results, the company explained. According to Nvidia, these racks are 50% faster and twice as efficient as traditional x86-based CPU servers at reasoning tasks. Meanwhile, the BlueField-4 STX storage rack is meant to act like a dedicated "context memory" tier, which AI agents can use to maintain coherence during massive, multi-turn interactions, Nvidia said. By offloading cache data to the BlueField-4 chips, companies can increase their inference throughput by up to five times. Finally, there's the Nvidia Groq LPX Rack, which is meant to set new standards for accelerated computing. It's aimed at low-latency workloads and the large context demands of agentic systems, and combines the performance of Vera Rubin with Nvidia's custom LPUs to accelerate inference throughput per megawatt by 35 times. When paired with the Vera Rubin GPUs, they will boost performance by jointly computing each layer of the underlying AI model for every output token, Nvidia said. OpenAI Group PBC and Anthropic PBC CEOs Sam Altman and Dario Amodei both heaped praise on the new platform.
"Nvidia infrastructure is the foundation that lets us keep pushing the frontier of AI," Altman said. "With Nvidia Vera Rubin, we'll run more powerful models and agents at massive scale and deliver faster, more reliable systems to hundreds of millions of people." The new chips aren't just about raw performance - they also tackle two of the major problems with AI infrastructure, namely power consumption and heat. With the new Vera Rubin DSX AI Factory Reference Design, Nvidia has unveiled a comprehensive blueprint for data center operators to build out multiple, massive clusters of Vera Rubin chips. The DSX stack is powered by Nvidia's DSX Max-Q software, which uses dynamic power provisioning to squeeze 30% more infrastructure into a fixed power envelope, the company explained. Meanwhile DSX Flex helps AI factories interact with the power grid to unlock "stranded" energy. Companies including Dassault Systèmes SA and Cadence Inc. say they have already integrated the blueprint into their respective Systems Engineering and Reality Data Center Digital Twin platforms. In addition, Nvidia rolled out the Nvidia Omniverse DSX Blueprint, which allows customers such as Schneider Electric Co. and Siemens AG to build "physically accurate digital twins" of their AI factories. By simulating airflow, power utilization, network topologies and thermal behavior virtually, those companies will be better able to optimize their AI infrastructure and squeeze out more performance at lower costs. Nvidia said customers won't have to wait long to get their hands on the new Vera Rubin platform. It's expected to ship via cloud infrastructure partners like Amazon Web Services Inc., Google Cloud and Microsoft Corp., as well as hardware manufacturers such as Dell Technologies Inc. and Supermicro Computer Inc. in the second half of the year.
[6]
NVIDIA Unveils Vera Rubin With Groq's LPX to Break Into Inference, a Market Where It Has Never Been First
NVIDIA's Groq partnership is now formalizing, as Jensen unveils a hybrid compute tray featuring Groq's third-generation LPU units in a Rubin rack. The debate over what NVIDIA would do with Groq has been ongoing for quite some time, and we have maintained a key lead on developments. At GTC 2026, NVIDIA unveiled a new Vera Rubin hybrid compute tray, the Groq 3 LPX, which features eight of the previously unannounced Groq 3 units, which we'll discuss below. According to NVIDIA, LPX and Rubin together deliver unprecedented inference performance, enabling a 35x increase in inference throughput per megawatt, which is why Groq's solution was key to NVIDIA unlocking the inference market. At full rack scale, we are looking at 256 LPUs, bringing 128GB of on-chip SRAM and 640 TB/s of scale-up bandwidth. This is NVIDIA's answer to what Cerebras and competitors are doing in the realm of inference, and by essentially combining Rubin GPUs with LPUs, NVIDIA targets both the prefill and decode stages of inference, allowing the company to become competitive in a market where 'they aren't the first ones'. For an individual Groq 3 chip, you are looking at 500 MB of SRAM, 150 TB/s of SRAM bandwidth, and 1.2 PFLOPs (FP8). When you combine Rubin and Groq's LPX tray, NVIDIA's CEO says that the total AI inference compute reaches up to 315 PFLOPs. Optimized for trillion-parameter models and million-token context, the codesigned LPX architecture pairs with Vera Rubin to maximize efficiency across power, memory and compute. The additional throughput per watt and token performance unlocks a new tier of ultra-premium, trillion-parameter, million-context inference, expanding revenue opportunity for all AI providers. The idea is that Groq's LPU units will play a role similar to Mellanox's in networking, and that this hybrid architecture will give NVIDIA a head start on latency-sensitive workloads.
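The rack-level figures quoted here follow directly from the per-chip numbers. A quick back-of-envelope check (our arithmetic, not Nvidia's methodology):

```python
# Aggregate the per-chip Groq 3 figures quoted above (500 MB SRAM,
# 150 TB/s SRAM bandwidth) across a 256-LPU rack. Illustrative only.

LPUS_PER_RACK = 256
SRAM_PER_LPU_MB = 500          # on-chip SRAM per Groq 3 LPU
SRAM_BW_PER_LPU_TBS = 150      # SRAM bandwidth per LPU, TB/s

rack_sram_gb = LPUS_PER_RACK * SRAM_PER_LPU_MB / 1000   # MB -> GB
rack_bw_pbs = LPUS_PER_RACK * SRAM_BW_PER_LPU_TBS / 1000  # TB/s -> PB/s

print(f"Rack SRAM capacity: {rack_sram_gb:.0f} GB")        # matches the 128GB figure
print(f"Aggregate SRAM bandwidth: {rack_bw_pbs:.1f} PB/s")  # 38.4 PB/s
```

256 x 150 TB/s works out to 38.4 PB/s, which lines up with the "40 PB/s" figure Nvidia rounds to elsewhere in its materials.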
With agentic AI becoming the next 'inflection' point for the industry, it is essential for NVIDIA to keep up with the compute demands, which is why Groq's partnership came at a vital time for Team Green.
[7]
Nvidia Puts Groq LPU, Vera CPU And Bluefield-4 DPU Into New Data Center Racks
Announced at Nvidia's GTC 2026 event, the AI infrastructure giant's new Groq-based inference server rack, called the Nvidia Groq 3 LPX, will be available alongside the Vera Rubin NVL72 rack, Vera CPU rack and BlueField-4 STX storage rack in the second half of the year. Nvidia said Monday that it's adding one more processor to the six-chip Vera Rubin platform it has heralded as the next big leap in AI computing: the Groq language processing unit. At its GTC 2026 event in San Jose, Calif., the AI infrastructure giant revealed that it plans this year to release a server rack with a new generation of the language processing unit (LPU) it licensed from AI chip startup Groq as part of a non-exclusive deal last December. [Related: Analysis: Nvidia's AI Dominance Expands To Networking As It Makes Bigger CPU Push] The Santa Clara, Calif.-based company also revealed three other new racks: a server rack packed with Nvidia's custom Vera CPUs, a storage rack reference architecture featuring its BlueField-4 DPUs and a networking rack with its Spectrum-6 Ethernet switches. However, Nvidia indicated that one previously announced Vera Rubin product, an NVL server rack powered by the Rubin CPX GPU, is on hold, at least for now. The expanded platform, which Nvidia CEO Jensen Huang was expected to detail during his GTC keynote on Monday, is part of the vendor's push to enable a new wave of AI agents that interact with each other to carry out complex tasks. And it's coming as the company faces increased competition from pure-play rivals like AMD and Qualcomm as well as major customers such as Amazon Web Services who are developing their own AI chips. Vera Rubin is the much-anticipated successor to the Grace Blackwell platform, which played a major role in the company finishing its 2026 fiscal year with a record $215.9 billion in revenue. 
In a briefing with journalists, Ian Buck, Nvidia's vice president of hyperscale and high-performance computing, said the underlying chips, including the Rubin GPUs in the Vera Rubin NVL72 rack, are "designed to operate together as one incredible AI supercomputer." The seven chips will "power every phase of AI, from massive scale pre-training to post-training, test-time scaling and real-time agentic inference," the latter of which represents what the company now views as the fourth AI scaling law, Buck added. The executive said the new Groq-based rack, called the Nvidia Groq 3 LPX, will be available alongside the Vera Rubin NVL72 in the second half of the year. Other products based on the Vera Rubin platform are expected to become available starting in that timeframe. "The Vera Rubin platform is going to expand the entire AI factory revenue opportunity and open the next frontier in agentic AI, with seven new chips now in full production to scale across the world's largest AI factories," Buck said in the Sunday briefing. Buck said Nvidia will offer the Groq 3 LPX alongside its Vera Rubin NVL72 to boost inference performance for premium, trillion-parameter AI models by several orders of magnitude, significantly increasing the revenue AI model providers can generate. During his presentation, the executive claimed that the two server racks combined can boost throughput for a 1-trillion-parameter GPT model by 35 times compared to the previous-generation Blackwell NVL72. The claim was based on the combined racks enabling 300 tokens per second for every megawatt consumed by Nvidia's rack-scale platforms, with the model serving 500 tokens per second for every user, the latter of which Buck said amounts to an opportunity of $45 generated by AI model providers for every million tokens.
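Taking Buck's serving figures at face value, the per-user economics work out as follows (our derivation, using only the quoted numbers):

```python
# Worked arithmetic from the figures Buck cited: 500 tokens/s served
# per user, at $45 per million tokens. The per-user-hour revenue rate
# is our extrapolation, not an Nvidia-published figure.

tokens_per_sec_per_user = 500
price_per_million_tokens = 45.0

tokens_per_hour = tokens_per_sec_per_user * 3600
revenue_per_user_hour = tokens_per_hour / 1e6 * price_per_million_tokens

print(f"{tokens_per_hour:,} tokens/hour -> ${revenue_per_user_hour:.2f} per user-hour")
```

At sustained full throughput that is roughly $81 of metered tokens per user-hour, which gives a sense of the "ultra-premium" serving opportunity Buck describes.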
With the Vera Rubin NVL72 and Groq 3 LPX racks enabling major improvements in performance and efficiency, the executive said this will enable AI model providers to generate 10 times more revenue from trillion-parameter models than the Blackwell NVL72. "We'll be working deeply with the AI labs and the AI frontier model builders who are deploying these trillion-parameter models to offer the next generation of premium and ultra-premium model serving," Buck said. Packing 256 Groq 3 LPUs, the Groq 3 LPX rack will be liquid cooled and connect through a custom Spectrum-X Ethernet interconnect to the Vera Rubin NVL72, which contains 36 Vera CPUs and 72 Rubin GPUs, to boost decode performance. Decode is a critical process for agentic AI models that allows them to produce complex, multi-step responses. Between these two racks, the Groq 3 LPUs and Rubin GPUs will work together "at every layer of the AI model on every token," according to Buck. He said Nvidia turned to Groq's chip technology because while GPUs feature large memory capacity and "amazing floating point performance" to offer high throughput for AI systems sold in volume, LPUs are "optimized strictly for that extreme low-latency token generation, offering token rates" of up to thousands of tokens per second. The LPU's low latency is made possible by its use of SRAM memory. While each chip only features 500 MB of SRAM in contrast to the 288 GB of HBM4 memory of the Rubin GPU, the LPU's SRAM bandwidth is a whopping 150 TBps, which is seven times faster than the 22 TBps HBM4 bandwidth of the Rubin GPU, according to Buck. With the Groq 3 LPX's 256 LPUs, the rack will feature a total SRAM capacity of 128GB and a total SRAM bandwidth of 40 PBps, according to Nvidia. Data centers using the platform will be able to scale to more than 1,000 LPUs across multiple racks.
"We offload parts of the computation for every token to the LPU, primarily the FFN layers, to take advantage of the high bandwidth that the LPU has to offer while the attention math and the rest of the model is still being run on the GPU," he said. Buck said Nvidia started developing the Groq 3 LPX after it gained a non-exclusive license to Groq's technology and hired members of the startup's team, including its founders, last December to implement the technology in Nvidia's platforms. The deal was reportedly worth $20 billion, the most it has ever paid for technology and personnel. To quickly build a rack around the Groq 3 LPU, Nvidia took advantage of its modular MGX rack architecture that the company uses for its NVL72 platforms, according to the executive. "It's been a real privilege to have them and their team join Nvidia, and the collaboration between the two teams has been excellent," he said. Asked by CRN about potential Groq 3 LPX availability from OEMs, Buck indicated that the company is focused on direct engagements with AI developers who are providing trillion-parameter, high-token-rate models with low latency. "Those will be more focused and exciting opportunities that we'll be able to share more later this year," he said. With Buck calling Nvidia's Vera the "best CPU for agentic AI workloads," the company plans to offer the chip in its first CPU-only server rack in addition to the Vera Rubin NVL72. The liquid-cooled CPU rack will contain 256 Vera CPUs, up to 400 TB of LPDDR5X memory capacity, 300 TBps of memory bandwidth and 64 BlueField-4 DPUs. The rack will be able to support more than 22,500 concurrent CPU environments, according to the company. Compared to a rack with Nvidia's previous-generation Grace CPU, the Vera CPU rack can deliver two times greater performance across various workloads, including scripting, text conversion, code compilation, data analytics and graph analytics, according to Nvidia.
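The per-layer offload Buck describes above, with attention staying on the GPU and the bandwidth-bound feed-forward (FFN) block going to the LPU on every token, can be sketched roughly as follows. This is our conceptual illustration; the device classes are stand-ins, not Nvidia APIs:

```python
# Conceptual sketch (ours) of splitting each transformer layer across
# two device types during decode: attention on the GPU, FFN on the LPU.
# The classes below are toy stand-ins so the control flow is runnable.

class GPU:
    def attention(self, hidden):
        return [h * 1.0 for h in hidden]   # placeholder for attention math

class LPU:
    def ffn(self, hidden):
        return [h + 1.0 for h in hidden]   # placeholder for the FFN block

def decode_token(hidden, num_layers, gpu, lpu):
    """One decode step: every layer's work is split across devices."""
    for _ in range(num_layers):
        hidden = gpu.attention(hidden)     # compute-heavy: stays on the GPU
        hidden = lpu.ffn(hidden)           # bandwidth-heavy: offloaded to LPU SRAM
    return hidden

out = decode_token([0.0, 0.0], num_layers=3, gpu=GPU(), lpu=LPU())
print(out)  # each layer's FFN placeholder adds 1.0 -> [3.0, 3.0]
```

The point of the split is that the FFN weights dominate memory traffic per token, so they benefit most from the LPU's SRAM bandwidth, while attention keeps the GPU's compute and large KV-cache capacity busy.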
Buck said customers are expected to deploy the CPU rack "at scale" alongside Nvidia's NVL72 racks, storage racks and networking racks for agentic AI workloads. The executive said CPUs are important for such workloads because GPUs rely on them to "do the tool calling, SQL query and compilation of code." "This sandbox execution is a critical part of both training and deploying agents across the data centers, and those CPUs need to be fast. We want to make sure that they could actually do the tool calling as quickly as possible to keep the GPU and the entire data center fully utilized," Buck said. From a competitive perspective, Vera has three times more memory bandwidth per core, double the energy efficiency and 50 percent more single-threaded performance than "today's modern x86 CPUs," the executive said without providing more specifics. Marking Nvidia's first server CPU to use custom, Arm-compatible cores, Vera features 88 of these custom cores, 176 threads with Nvidia's new spatial multi-threading technology, 1.5 TB of system LPDDR5X memory, 1.2 TBps of memory bandwidth and confidential computing capabilities. It also features a 1.8 TBps NVLink chip-to-chip interconnect to support coherent memory with the GPUs. Vera will become available in the second half of the year from a wide range of cloud service providers, including Lambda, Oracle Cloud and Nebius, as well as many OEMs, including Dell Technologies, HPE, Cisco, Lenovo and Supermicro. In addition to the Groq 3 LPX and Vera CPU racks, Nvidia revealed a storage rack reference architecture powered by BlueField-4 DPUs to speed up agentic AI workloads. Called BlueField-4 STX, the modular reference architecture is designed to let storage providers build infrastructure solutions that significantly improve the rate at which data can be accessed by agentic AI applications. "Agentic AI demands real time access to data and contextual working memory to keep the conversations fast and coherent.
and as that context grows and AIs get smarter, traditional storage and data paths can slow AI inference and reduce GPU utilization," Buck said. Nvidia said the first rack-scale implementation of STX will include the new Nvidia CMX context memory storage platform, which the company revealed in January. This platform "expands GPU memory with a high-performance context layer for scalable inference and agentic systems," according to the company. Nvidia claimed that this, in turn, will enable AI agents to provide up to five times more tokens per second compared with traditional storage. The company also said that the STX architecture provides four times greater energy efficiency than "traditional CPU architectures for high-performance storage" and can "ingest two more pages per second for enterprise AI data." STX-based solutions are expected to become available in the second half of this year from storage vendors, including Dell, HPE, IBM, NetApp, Hitachi Vantara, DDN, Everpure, Nutanix, Cloudian, Weka, Vast Data and MinIO. The Spectrum-6 SPX Ethernet rack, on the other hand, is "engineered to accelerate east-west traffic across AI factories," taking advantage of Nvidia's Spectrum-X Ethernet or Quantum-X800 InfiniBand switches to deliver "low-latency, high-throughput rack-to-rack connectivity at scale," according to the company. Last September, Nvidia revealed a "new class of GPU" called Rubin CPX that was designed to speed up complex AI applications, including software coding and generative video. When it was announced, the company said that the Rubin CPX and the associated Vera Rubin NVL144 CPX rack-scale platform would debut by the end of this year. However, Nvidia did not mention the Rubin CPX in its Sunday briefing with journalists. A statement by an Nvidia spokesperson indicated that the company has put Rubin CPX-based products on the backburner to focus on the Groq-based LPX platform.
"Delivering accelerated token generation with LPX into our portfolio and platform to optimize the decode is where we're focused right now, and we're excited to be bringing this to market in [the second half] of 2026," the representative told CRN in an email. The Rubin CPX was meant to speed up the performance of "massive context" AI applications by serving as the dedicated GPU for context and prefill computation, the first of two steps in Nvidia's disaggregated inferencing serving process. The vanilla Rubin GPU, on the other hand, would handle the second step: generation and decode computation. The GPU's platform, the Vera Rubin NVL144 CPX, was expected to contain four Rubin CPX GPUs, four Rubin GPUs and two Vera CPUs in each of the rack's 18 compute trays. The platform was named before Nvidia changed the way it counts GPUs in January, which resulted in the regular Vera Rubin platform's suffix changing from NVL144 to NVL72.
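The disaggregated serving process described here, one device pool handling context/prefill and another handling generation/decode, can be sketched as a simple two-phase router. This is our illustration of the general technique; the pool classes and method names are hypothetical stand-ins, not Nvidia software:

```python
# Sketch (ours) of disaggregated inference serving: prefill is
# compute-bound work over the whole prompt, decode is bandwidth-bound
# token-by-token generation, and each phase runs on its own hardware pool.

class PrefillPool:
    """Stand-in for context/prefill hardware (the Rubin CPX role)."""
    def prefill(self, prompt_tokens):
        return {"kv_len": len(prompt_tokens)}   # build the KV cache once

class DecodePool:
    """Stand-in for decode hardware (the Rubin GPU / LPU role)."""
    def decode_step(self, kv_cache):
        kv_cache["kv_len"] += 1                 # cache grows one entry per token
        return "tok"

def serve_request(prompt_tokens, prefill_pool, decode_pool, max_new_tokens):
    kv_cache = prefill_pool.prefill(prompt_tokens)   # phase 1: prefill
    return [decode_pool.decode_step(kv_cache)        # phase 2: decode loop
            for _ in range(max_new_tokens)]

out = serve_request(["a", "b", "c"], PrefillPool(), DecodePool(), 4)
print(len(out))  # 4 generated tokens
```

Separating the two phases lets each pool be provisioned for its bottleneck, which is the rationale behind both the shelved Rubin CPX and the Groq 3 LPX that supersedes it on the decode side.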
[8]
Nvidia unveils Vera Rubin AI platform with seven new chips By Investing.com
SAN JOSE, Calif. - Nvidia (NASDAQ:NVDA) announced today the Vera Rubin platform, featuring seven chips now in full production designed for AI infrastructure deployment, according to a company press release. The $4.43 trillion semiconductor giant continues to dominate the AI infrastructure market with revenue reaching $215.94 billion over the last twelve months, up 65% year-over-year. The platform integrates the Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, Spectrum-6 Ethernet switch, and the Groq 3 LPU. The chips are designed to operate together for AI training and inference workloads. The Vera Rubin NVL72 rack incorporates 72 Rubin GPUs and 36 Vera CPUs connected by NVLink 6. The company states it trains large mixture-of-experts models with one-fourth the number of GPUs compared with its Blackwell platform. The efficiency gains align with Nvidia's industry-leading gross profit margin of 71% and return on assets of 75%. The Vera CPU rack integrates 256 Vera CPUs for reinforcement learning and agentic AI workloads. The Groq 3 LPX rack contains 256 LPU processors with 128GB of on-chip SRAM and 640 TB/s of scale-up bandwidth. The BlueField-4 STX storage rack combines the Vera CPU and ConnectX-9 SuperNIC to provide storage infrastructure for large language model data. The Spectrum-6 SPX Ethernet rack is configurable with either Spectrum-X Ethernet or Quantum-X800 InfiniBand switches. Nvidia announced the DSX platform for Vera Rubin, which includes DSX Max-Q for dynamic power provisioning. The company stated this enables deployment of 30% more AI infrastructure within fixed-power data centers. Products based on Vera Rubin will be available from partners starting in the second half of this year. Cloud providers including Amazon Web Services, Google Cloud, Microsoft Azure and Oracle Cloud Infrastructure are listed as partners. 
System manufacturers Cisco, Dell Technologies, HPE, Lenovo and Supermicro are expected to deliver servers based on Vera Rubin products. AI developers including Anthropic, Meta, Mistral AI and OpenAI stated they plan to use the platform. The product launch comes as 33 analysts have revised their earnings upwards for the upcoming period, according to InvestingPro data. Trading at a P/E ratio of 37.81 and a PEG ratio of 0.55, the stock appears undervalued based on InvestingPro's Fair Value analysis. Investors seeking deeper insights can access Nvidia's comprehensive Pro Research Report, one of 1,400+ available reports that transform complex Wall Street data into actionable intelligence. In other recent news, Nvidia announced the BlueField-4 STX reference architecture, aimed at enhancing storage infrastructure for AI applications with long-context reasoning. The new architecture includes the CMX context memory storage platform, which reportedly delivers up to 5x token throughput compared to traditional systems and offers improvements in energy efficiency and data ingestion rates. Additionally, Nvidia introduced ComfyUI's App View, a simplified interface for generative AI tools, and RTX Video Super Resolution, which enables real-time 4K video upscaling with significant speed improvements. Ahead of Nvidia's annual GPU Tech Conference, Truist Securities reiterated its Buy rating with a $283.00 price target, anticipating updates on various technological trends. Similarly, BofA Securities maintained a Buy rating, setting a price target of $300.00, and highlighted expectations for updates on Nvidia's product pipeline and other innovations. Meanwhile, Seaport's Chief Equity Strategist noted potential upside opportunities for Nvidia, Broadcom, and AMD, as software and semiconductor valuations have compressed recently. These developments reflect Nvidia's ongoing efforts to advance its technological offerings and maintain investor interest.
This article was generated with the support of AI and reviewed by an editor. For more information see our T&C.
Nvidia announced the Vera Rubin platform at GTC 2026, integrating seven chips including the newly acquired Groq 3 LPU to accelerate AI inference. The platform promises 10x higher inference throughput per watt at one-tenth the cost per token compared to Blackwell systems. OpenAI, Anthropic, and major cloud providers have committed to deploying the infrastructure.
Nvidia unveiled its Vera Rubin platform at GTC 2026, marking what CEO Jensen Huang described as a generational leap in AI infrastructure designed to power the shift toward agentic AI [1]. The platform brings together seven chips now in full production: the Rubin GPU, Vera CPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, Spectrum-6 Ethernet switch, and the newly integrated Groq 3 LPU [3]. These components work together as a unified supercomputer designed to handle every phase of AI development, from massive-scale pretraining and post-training to real-time agentic inference.
The announcement comes with backing from major AI companies including OpenAI, Anthropic, and Meta, along with commitments from Amazon Web Services, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure to offer the platform [4]. Sam Altman, CEO of OpenAI, stated that "with Nvidia Vera Rubin, we'll run more powerful models and agents at massive scale and deliver faster, more reliable systems to hundreds of millions of people" [3]. Dario Amodei, CEO of Anthropic, emphasized that the platform provides the compute, networking, and system design needed to advance safety and reliability for increasingly complex reasoning and agentic workflows [4].

The Groq 3 LPU represents a significant addition to Nvidia's arsenal, addressing the growing demand for low-latency inference in trillion-parameter large language models [1]. Unlike traditional AI accelerators that rely on HBM memory, each Groq 3 LPU incorporates 500 MB of SRAM, delivering 150 TB/s of bandwidth compared to the 22 TB/s offered by HBM4 on Rubin GPUs [1]. This massive bandwidth advantage makes the chip ideal for bandwidth-sensitive AI decode operations.
Nvidia will deploy these chips in Groq 3 LPX racks comprising 256 Groq 3 LPUs, offering 128GB of SRAM with 40 PB/s of bandwidth for inference acceleration and connecting the chips with a dedicated scale-up interface of 640 TB/s per rack [1]. When deployed alongside Vera Rubin NVL72 systems, the LPUs function as decode accelerators while Rubin GPUs handle compute-intensive prefill processing [2]. Ian Buck, VP of Hyperscale and HPC at Nvidia, explained that the LPU boosts decode performance at "every layer of the AI model on every token" [1]. The combined platform delivers up to 35x higher inference throughput per megawatt and up to 10x more revenue opportunity for trillion-parameter models [3].

The flagship Vera Rubin NVL72 rack integrates 72 Rubin GPUs and 36 Vera CPUs connected by NVLink 6, along with ConnectX-9 SuperNICs and BlueField-4 DPUs [3]. This rack-scale system delivers breakthrough efficiency, training large mixture-of-experts models with one-fourth the number of GPUs compared to the Blackwell platform while achieving up to 10x higher inference throughput per watt at one-tenth the cost per token [3]. The system scales seamlessly with Quantum-X800 InfiniBand and Spectrum-X Ethernet to sustain high utilization across massive GPU clusters [3].
For reinforcement learning and agentic AI workloads, Nvidia introduced the Vera CPU Rack, which packs 256 Vera CPUs into a single liquid-cooled cluster [5]. These racks provide scalable infrastructure for the CPU-based environments needed to test and validate results generated by GPU systems, delivering results twice as efficiently and 50% faster than traditional CPUs [3]. The platform also includes the BlueField-4 STX storage rack, which acts as "context memory" for maintaining coherence during massive multi-turn interactions, potentially increasing inference throughput by up to five times [5].
Nvidia positions the Vera Rubin platform as essential infrastructure for the emerging era of multi-agent systems, where AI agents communicate with each other rather than solely serving human users through chatbot interfaces [1]. Buck explained that what seems like a reasonable rate of 100 tokens per second for human interaction becomes glacial for AI agent intercommunication, and the combination of Rubin GPUs and Groq LPUs enables throughput of 1500 tokens per second or more [1].

The platform represents Nvidia's strategic evolution from selling discrete chips and standalone servers toward fully integrated rack-scale systems, POD-scale deployments, and complete AI factories [3]. This shift addresses the infrastructure buildout Jensen Huang characterized as historic, supported by an ecosystem of more than 80 MGX partners with global supply chain capabilities [3]. The addition of Groq technology, acquired for $20 billion, helps Nvidia compete in the low-latency inference frontier where companies like Cerebras have challenged its GPU-centric approach [1]. Buck indicated that the Groq 3 LPU integration may reduce the role of the previously announced Rubin CPX inference accelerator, as both chips target similar enhancements but the Groq LPU doesn't require the large amounts of GDDR7 memory that Rubin CPX modules need [1]. With inference providers potentially charging as much as $45 per million tokens generated using this technology, compared to approximately $15 per million output tokens for current top-tier models, the economic implications for the AI industry are substantial [2].

Summarized by Navi