Nvidia unveils BlueField-4 STX storage architecture to solve agentic AI data bottleneck


Nvidia introduced BlueField-4 STX at GTC 2026, a modular reference architecture designed to eliminate storage bottlenecks in agentic AI systems. The platform claims 5x faster token throughput and 4x better energy efficiency by bypassing traditional CPU-based storage paths. Partners including Dell, HPE, IBM, and Oracle Cloud are building STX-based systems for launch in late 2026.

Nvidia Targets Critical AI Storage Bottleneck at GTC 2026

Nvidia announced BlueField-4 STX at GTC 2026 on March 16, introducing a modular reference architecture specifically designed to address the data access bottleneck that limits agentic AI inference performance. The platform delivers up to five times the token throughput, four times better energy efficiency, and twice the page ingestion speed compared with traditional CPU-based storage architectures. "AI systems that reason across massive context and continuously learn require a new class of storage," Jensen Huang, founder and CEO of Nvidia, said at the conference [3].

Source: SiliconANGLE

Solving the KV Cache Bottleneck in Modern LLM Systems

The core problem BlueField-4 STX addresses is KV cache management during transformer inference. As an LLM generates a response, the attention mechanism computes key-value pairs for every token in context, which must be stored and retrieved for each subsequent generation step. As context windows expand into hundreds of thousands of tokens, the KV cache outgrows GPU HBM capacity, forcing systems to offload to host DRAM or NVMe storage. Both routes pass through the CPU, adding latency that compounds with context length and stalls GPU execution while data transits. "Traditional data centers provide high-capacity, general-purpose storage, but generally lack the responsiveness required for interaction with AI agents that need to work across many steps, tools and different sessions," Ian Buck, Nvidia's vice president of hyperscale and high-performance computing, explained [2].
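Some back-of-envelope arithmetic shows why the cache outgrows HBM. The sketch below uses hypothetical model parameters (a 70B-class transformer with grouped-query attention in FP16, not figures from the announcement) to size the per-sequence KV cache at different context lengths:

```python
# Back-of-envelope KV cache sizing, illustrating why long contexts
# outgrow GPU HBM. The model shape below is an illustrative assumption
# (70B-class transformer, grouped-query attention, FP16 values).
def kv_cache_bytes(context_len, n_layers=80, n_kv_heads=8,
                   head_dim=128, bytes_per_elem=2):
    # Two tensors (K and V) per layer, each [context_len, n_kv_heads, head_dim]
    return 2 * n_layers * context_len * n_kv_heads * head_dim * bytes_per_elem

GIB = 1024 ** 3
for ctx in (8_192, 131_072, 1_000_000):
    per_seq = kv_cache_bytes(ctx) / GIB
    print(f"{ctx:>9} tokens -> {per_seq:7.1f} GiB per sequence")
```

Under these assumptions a single 131K-token sequence already needs roughly 40 GiB of KV cache, and a million-token context exceeds the HBM of any current GPU on its own, before weights or batching are counted, which is why offload to DRAM or NVMe becomes unavoidable.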

How BlueField-4 DPU and ConnectX-9 SuperNIC Bypass Traditional Storage Paths

Built around a new storage-optimized BlueField-4 DPU and ConnectX-9 SuperNIC, the architecture bypasses the host CPU by routing data through a dedicated accelerated storage layer via RDMA over Spectrum-X Ethernet [3]. The BlueField-4 DPU manages NVMe SSDs directly and handles data integrity and encryption for the KV cache, keeping context accessible at the storage processor rather than transiting the host. This approach inserts a dedicated context memory layer between GPUs and traditional storage, fundamentally changing how AI storage systems operate [2].
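A simple transfer-time model makes the appeal of skipping the host concrete. The bandwidth figures and hop structure below are illustrative assumptions for the two paths described above, not Nvidia specifications:

```python
# Rough transfer-time model for restoring a KV-cache block to the GPU,
# comparing a CPU-mediated path against a direct RDMA path.
# All bandwidths, overheads, and the block size are assumed values.
BLOCK_GIB = 4.0  # hypothetical KV-cache block for one long-context request

def transfer_ms(size_gib, bandwidth_gib_s, overhead_ms=0.0):
    """Milliseconds to move size_gib at bandwidth_gib_s plus fixed overhead."""
    return size_gib / bandwidth_gib_s * 1000 + overhead_ms

# CPU path: NVMe read into host DRAM, CPU-staged copy, then PCIe to the GPU.
cpu_path = (transfer_ms(BLOCK_GIB, 14)          # NVMe Gen5-class read
            + transfer_ms(BLOCK_GIB, 50, 0.1)   # host staging copy + software
            + transfer_ms(BLOCK_GIB, 55))       # PCIe Gen5 x16 to GPU

# Direct path: storage processor streams straight over RDMA, no host staging.
rdma_path = transfer_ms(BLOCK_GIB, 90, 0.01)    # 800 Gb/s-class fabric

print(f"CPU-mediated path: {cpu_path:6.1f} ms")
print(f"Direct RDMA path:  {rdma_path:6.1f} ms")
```

Even with generous host-side bandwidth, the staged path pays for every extra hop and copy, and that penalty recurs on every cache restore, which is the stall behavior the direct RDMA path is meant to avoid.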

Source: VentureBeat

The first rack-scale implementation built on STX is the Nvidia CMX context memory storage platform, which extends GPU memory with a high-performance context layer designed specifically for storing and retrieving KV cache data generated by large language models during inference [2].

Partner Ecosystem Spans Storage Vendors and AI Cloud Providers

Nvidia is distributing BlueField-4 STX as a reference architecture to its storage partner ecosystem rather than selling it directly as a product [2]. Storage and infrastructure vendors co-designing systems based on STX include DDN, Dell Technologies, HPE, IBM, NetApp, and VAST Data, alongside manufacturing partners AIC, Supermicro, and Quanta Cloud Technology. Eight cloud and AI providers -- including CoreWeave, Lambda, Mistral AI, and Oracle Cloud Infrastructure -- committed to early adoption for context memory storage. Buck confirmed that STX ships with a software reference platform alongside the hardware architecture, with Nvidia expanding DOCA to include a new component called DOCA Memo [2]. STX-based platforms are expected from partners in the second half of 2026.

Why This Matters for Enterprise AI Deployments

The combination of enterprise storage incumbents and AI-native cloud providers signals that Nvidia is positioning STX as the reference standard for AI storage infrastructure serving agentic workloads, a category likely to include most enterprise AI deployments running multi-step inference at scale within the next two to three years [2]. IBM, which sits on both sides of the announcement as a storage provider co-designing STX-based infrastructure, demonstrated real-world impact with a production proof of concept at Nestlé that cut data refresh cycles from 15 minutes to three, achieving 83% cost savings and a 30x price-performance improvement [2]. As context windows continue expanding and agentic AI systems become more prevalent, the ability to maintain coherent working memory across sessions, tool calls and reasoning steps without storage-induced GPU underutilization will determine which organizations can deploy these systems at production scale.
