Nvidia unveils Vera CPU with 88 Arm cores to power next-generation AI infrastructure

Reviewed by Nidhi Govil


Nvidia introduced its Vera CPU at GTC 2026, featuring 88 custom Olympus Arm cores designed for agentic AI workloads. The processor delivers 1.2 TB/s of memory bandwidth and 50% higher per-core performance than standard CPUs. Nvidia now offers liquid-cooled rack systems with 256 Vera CPUs, competing directly against Intel and AMD in the data center CPU market for the first time.

Nvidia Positions Vera CPU as Core Component of AI Infrastructure

Nvidia unveiled its Vera CPU at the GPU Technology Conference (GTC) 2026, marking a strategic shift in how the company approaches AI infrastructure [1]. The processor features 88 custom Olympus Arm cores built on the Armv9.2-A architecture, representing a significant upgrade from the 72 Arm Neoverse cores found in Nvidia's previous Grace processor [4]. Unlike Grace, which primarily served as a companion to GPUs, the Vera CPU is positioned as a general-purpose data center CPU designed to compete directly with offerings from Intel and AMD [2].

Source: TweakTown

"Vera is arriving at a turning point for AI. As intelligence becomes agentic -- capable of reasoning and acting -- the importance of the systems orchestrating that work is elevated," said Jensen Huang, founder and CEO of Nvidia [5]. The chip addresses a critical need in modern AI workloads, where agents must execute code, perform SQL queries, and handle tool calling: tasks that cannot run on GPUs alone [2].

Technical Architecture Delivers Breakthrough Memory Bandwidth

The Vera CPU introduces several architectural innovations that differentiate it from traditional server processors. Each chip delivers 1.2 TB/s of memory bandwidth through support for up to 1.5 TB of SOCAMM LPDDR5X memory [2]. This represents approximately 13.6 GB/s per core under full load, with the fabric capable of delivering up to 80 GB/s per core when other cores are not fully saturated [4]. Nvidia claims this provides 3x more memory bandwidth compared to contemporary x86 processors from Intel and AMD [2].
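The per-core figures follow directly from the chip-level numbers; a quick back-of-the-envelope check in Python, assuming only the stated 1.2 TB/s aggregate and 88 cores:

```python
# Sanity-check the quoted per-core memory bandwidth for Vera.
# Assumes the stated chip-level figures: 1.2 TB/s aggregate, 88 cores.

AGGREGATE_BW_GBS = 1200  # 1.2 TB/s expressed in GB/s
CORES = 88

per_core_full_load = AGGREGATE_BW_GBS / CORES
print(f"Per-core bandwidth at full load: {per_core_full_load:.1f} GB/s")
# ≈ 13.6 GB/s, matching the quoted figure
```

The 80 GB/s single-core figure, by contrast, reflects fabric headroom rather than a fair share: one core can burst well past its 1/88th slice when the others are idle.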

Source: SiliconANGLE

The Olympus Arm cores feature a distinctive spatial multi-threading design that physically partitions execution units, caches, and register files between two hardware threads [4]. This approach enables 176 threads to run concurrently without time-slicing shared resources, delivering more predictable performance in multi-tenant environments. The core complex operates as a single coherent domain using Nvidia's Scalable Coherency Fabric, avoiding the NUMA topology challenges common in high-core-count x86 designs [4].

The execution pipeline includes a 10-wide instruction decode block and a neural branch predictor capable of handling two branch predictions per cycle [2]. Nvidia has also integrated a PyTorch-optimized instruction buffer and a custom prefetch engine designed for graph analytics, targeting the irregular control-flow patterns common in AI frameworks and data analytics workloads [4].

Liquid-Cooled Rack System Enables Massive AI Deployments

Nvidia introduced a high-density liquid-cooled rack system that packs 256 Vera CPUs alongside 64 BlueField-4 data processing units [2]. This rack-scale architecture delivers more than 22,500 CPU cores and 400 TB of memory, providing approximately 300 TB/s of aggregate memory bandwidth [4]. The system is designed specifically for agentic AI frameworks, reinforcement learning, and AI training techniques that require substantial CPU resources [2].
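The rack-level aggregates can be cross-checked against the per-chip numbers given earlier. Note that 256 sockets at the stated 1.5 TB each comes to 384 TB, slightly below the quoted 400 TB, so the rack figure presumably includes rounding or memory attached elsewhere in the system:

```python
# Cross-check the rack-level aggregates against the per-chip Vera figures.
# Assumes 256 CPUs per rack and the stated per-CPU numbers:
# 88 cores, 1.5 TB memory, 1.2 TB/s bandwidth.

CPUS_PER_RACK = 256
CORES_PER_CPU = 88
MEM_PER_CPU_TB = 1.5
BW_PER_CPU_TBS = 1.2

cores = CPUS_PER_RACK * CORES_PER_CPU           # 22,528 -> "more than 22,500"
memory_tb = CPUS_PER_RACK * MEM_PER_CPU_TB      # 384 TB (quoted as 400 TB)
bandwidth_tbs = CPUS_PER_RACK * BW_PER_CPU_TBS  # 307.2 TB/s -> "~300 TB/s"

print(cores, memory_tb, bandwidth_tbs)
```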

Ian Buck, VP of Hyperscale and HPC at Nvidia, explained that agents depend on CPUs for critical tasks: "Agents don't operate on GPUs alone. They need CPUs in order to do their work, whether we're training agentic models or serving them, GPUs today actually call out to CPUs in order to do the tool calling, SQL queries and the compilation of code" [2]. This sandboxed execution capability is essential for both training and deploying agents across data centers.
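The CPU-side work Buck describes is the dispatch loop agent runtimes already perform today. A minimal illustrative sketch in Python; the tool names and dispatcher below are hypothetical, not an Nvidia API — only the division of labor (GPU-resident model proposes a tool call, CPU executes it) comes from the article:

```python
import sqlite3

# Hypothetical sketch of the CPU-side tool dispatch an agent runtime performs
# between GPU inference steps: the model emits a structured tool call, and the
# CPU runs the SQL query or compiles/executes the generated code.

def run_sql(query: str) -> list:
    """Execute a SQL query on the CPU (tiny in-memory DB for the demo)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metrics (name TEXT, value REAL)")
    conn.execute("INSERT INTO metrics VALUES ('latency_ms', 4.2)")
    return conn.execute(query).fetchall()

def run_code(source: str) -> dict:
    """Compile and run model-generated code in a scratch namespace."""
    scope: dict = {}
    # Real agent runtimes execute this inside a hardened sandbox.
    exec(compile(source, "<agent>", "exec"), {}, scope)
    return scope

TOOLS = {"sql": run_sql, "python": run_code}

def dispatch(tool_call: dict):
    """CPU-side dispatcher for tool calls emitted by the model."""
    return TOOLS[tool_call["tool"]](tool_call["arg"])

rows = dispatch({"tool": "sql", "arg": "SELECT value FROM metrics"})
print(rows)  # [(4.2,)]
```

Every iteration of such a loop alternates GPU inference with CPU-bound work, which is why serving agents stresses the CPU far more than classic one-shot inference does.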

The Vera CPU will be available in both single- and dual-socket configurations from ODM and OEM partners including Foxconn, Wistron, Dell Technologies, Lenovo, and HPE [2]. Nvidia's NVL8 HGX systems, which traditionally used x86 processors from Intel, will now offer Vera CPU configurations for the Rubin generation [2].

Vera Rubin Platform Integrates Seven Chips for Complete AI Factory

The Vera CPU forms a critical component of Nvidia's broader Vera Rubin platform, which integrates seven chips designed to operate as a single co-designed system [3]. The platform includes the Rubin GPU built on TSMC's 3nm process with 288 GB of HBM4 memory, the Groq 3 LPU for low-latency inference, NVLink 6 switches, ConnectX-9 SuperNICs, and Spectrum-6 Ethernet switches [3].

Source: Tom's Hardware

The Vera CPU connects to Rubin GPUs via NVLink-C2C at 1.8 TB/s of coherent bandwidth, seven times faster than PCIe Gen 6 [3]. This high-speed interconnect enables the CPU to handle orchestration tasks including scheduling workloads, routing KV cache data, managing context, and running the control plane for agentic AI workflows [3]. The platform supports confidential computing across both CPU and GPU domains, enabling encrypted execution and isolation that extend into GPU memory and multi-socket nodes [4].
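The "seven times faster" comparison is consistent with the bidirectional throughput of a PCIe 6.0 x16 link; a quick check, assuming the commonly cited 256 GB/s bidirectional (128 GB/s each way) for Gen 6 x16:

```python
# Compare NVLink-C2C coherent bandwidth with a PCIe 6.0 x16 link.
# Assumes 256 GB/s bidirectional for PCIe Gen 6 x16 (128 GB/s per direction).

NVLINK_C2C_GBS = 1800    # 1.8 TB/s, as stated for Vera-to-Rubin links
PCIE_GEN6_X16_GBS = 256  # bidirectional

ratio = NVLINK_C2C_GBS / PCIE_GEN6_X16_GBS
print(f"NVLink-C2C vs PCIe Gen 6 x16: {ratio:.1f}x")  # ~7.0x
```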

Major Cloud Providers Commit to Vera Deployments

Ahead of the Vera CPU's debut in the second half of 2026, major cloud and AI infrastructure providers have already committed to deployments. Alibaba, ByteDance, Meta, Oracle, CoreWeave, Lambda, Nebius, and NScale have all announced plans to integrate the chips into their data centers [2]. Meta recently revealed plans to deploy Nvidia's standalone Grace CPUs at scale, suggesting strong demand for Nvidia's CPU offerings beyond GPU-attached configurations [2].

Nvidia claims the Vera CPU delivers 1.5x the per-core performance of standard CPUs at twice the efficiency [5]. For customers, this translates into the ability to run more AI workloads per rack while reducing power consumption, a critical consideration as AI infrastructure scales. The company's roadmap also includes the Rosa CPU in 2028, which will target peak single-thread performance and shorten Nvidia's CPU development cycle from four years to two, matching the cadence of AMD and Intel [1]. This aggressive timeline signals Nvidia's long-term commitment to the data center CPU market, potentially reshaping a competitive landscape that x86 architectures have dominated for decades.

TheOutpost.ai

© 2026 Triveous Technologies Private Limited