Nvidia’s introduction of its BlueField-4 STX reference architecture is perhaps one of the more under-appreciated announcements coming out of GTC 2026. While much of the attention remains focused on GPUs, LPUs, and NemoClaws for safety, STX signals a significant structural evolution in that Nvidia is extending its control deeper into the AI infrastructure stack – this time into storage.
As AI shifts from training to inference and GPU performance continues to progress at unprecedented levels, new bottlenecks and challenges emerge beyond compute. Balancing system performance and addressing cost pressures is increasingly a priority for Nvidia as memory constraints, data movement inefficiencies, and the absolute cost of serving tokens at scale become more pressing. In our view, Nvidia’s STX architecture is a direct response to these pressures, effectively redefining storage as an active player in the AI pipeline.
The implications for storage vendors, hyperscalers, neo-clouds and end customers are profound in our view, necessitating a new mental model for the role storage plays in the AI value chain. In this special Breaking Analysis we explain our view on what was announced by Nvidia and what it means to the ecosystem and its end customers.
Nvidia has announced a new blueprint for AI storage
STX is a reference architecture that defines how storage systems should be built for AI-native workloads. It brings together four key components:
- BlueField-4 DPUs to offload data movement and storage management tasks from the CPU
- ConnectX-9 SuperNICs and Spectrum-X Ethernet to enable RDMA and bypass traditional CPU and OS bottlenecks
- A redesigned data path that moves data directly between storage and GPUs at low latency
- CMX, a rack-scale implementation optimized specifically for key-value (KV) cache storage
Nvidia is reinventing the AI storage stack by offloading data movement with BlueField DPUs, bypassing CPUs with direct memory access and optimizing the way it handles KV cache. By doing so it can push data through the system much faster. Nvidia is claiming up to 5x improvement in token processing performance and 4x gains in energy efficiency, driven largely by eliminating inefficiencies in the data path and optimizing for AI-specific workloads.
Storage now becomes part of the inference engine
We’ve long used the bromide that storage vendors need to think “outside the box.” We believe the most important shift Nvidia is driving is profound: traditional external storage that serves general purpose applications is giving way to a new conceptual model. Storage has always been about optimizing systems to be: 1) rock solid; 2) lightning fast; and 3) dirt cheap. Nvidia is blowing away the concept of a box by making storage part of the inference engine itself and adding a fourth dimension – i.e. intelligence. Specifically, our view is that Nvidia is making storage workload-aware, with specialized intelligence that makes AI run better.
A major emphasis of STX is how it optimizes KV cache. Large language models rely heavily on KV cache, which stores intermediate data used to maintain context during inference. Think of KV cache as a dynamic, searchable, high-speed, memory-based store designed for the rapid lookup of context. It uses a key, which is a fast index to find the value – i.e. the data it points to. As context windows expand dramatically, KV cache is becoming both a cost driver and a performance bottleneck. Traditionally, this data has lived in expensive GPU memory or moved inefficiently through CPU-centric architectures.
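To make the key/value idea concrete, here is a minimal Python sketch of the lookup pattern. This is a toy model of our own, not Nvidia's implementation: real KV caches hold per-layer attention tensors in GPU memory, and the hashing scheme below is purely illustrative.

```python
import hashlib

class ToyKVCache:
    """Toy illustration of the KV cache lookup pattern: a fast key
    (an index derived from the token prefix) points to a value (the
    precomputed attention state for that context)."""

    def __init__(self):
        self._store = {}  # key -> cached attention state

    @staticmethod
    def key_for(prefix_tokens):
        # A cheap, fixed-size index computed from the context window.
        raw = ",".join(map(str, prefix_tokens)).encode()
        return hashlib.sha256(raw).hexdigest()

    def put(self, prefix_tokens, attention_state):
        self._store[self.key_for(prefix_tokens)] = attention_state

    def get(self, prefix_tokens):
        # A hit lets the model skip recomputing attention for this
        # prefix; a miss forces an expensive prefill recomputation.
        return self._store.get(self.key_for(prefix_tokens))
```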
Nvidia’s approach disaggregates this layer by moving KV cache into high-speed flash managed by DPUs, while using RDMA to bypass the CPU entirely. The result is a system where storage, networking, and compute are tightly coupled and designed together to improve inference performance.
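The disaggregation concept can be sketched as a simple two-tier cache. Again, this is our own simplified illustration under stated assumptions: plain Python dictionaries stand in for scarce GPU memory and for the DPU-managed, RDMA-attached flash tier that STX describes.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier KV cache: a small 'GPU memory' tier backed by a
    larger 'flash' tier. STX-class systems would move these entries
    over RDMA via the DPU; dicts stand in for both tiers here."""

    def __init__(self, gpu_capacity):
        self.gpu = OrderedDict()   # hot tier: scarce, expensive HBM
        self.flash = {}            # capacity tier: NVMe-class flash
        self.gpu_capacity = gpu_capacity

    def put(self, key, value):
        self.gpu[key] = value
        self.gpu.move_to_end(key)
        while len(self.gpu) > self.gpu_capacity:
            # Evict the least recently used entry to flash rather than
            # discarding it: flash is slower than HBM but far cheaper,
            # and a flash hit beats a full prefill recomputation.
            cold_key, cold_val = self.gpu.popitem(last=False)
            self.flash[cold_key] = cold_val

    def get(self, key):
        if key in self.gpu:
            self.gpu.move_to_end(key)  # refresh recency on a hot hit
            return self.gpu[key]
        if key in self.flash:
            # Promote from flash back into GPU memory -- the step a
            # DPU-managed, RDMA data path is designed to accelerate.
            self.put(key, self.flash.pop(key))
            return self.gpu[key]
        return None  # full miss: the prefix must be recomputed
```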
Similar to VMware’s storage API efforts but far more significant
We’re reminded of the 2010s when storage became a major bottleneck for VMware customers. When performance tanked, it was often an IO problem and tuning the system was cumbersome. VMware created a set of APIs (e.g. VAAI, VASA, VADP), which standardized how storage integrated into virtualized environments and, over time, somewhat commoditized parts of the storage stack. However, Nvidia’s approach is more prescriptive and far more powerful in our view.
Unlike VMware, Nvidia controls not just the interface, but the underlying silicon and data path. Specifically:
- Nvidia GPUs define the compute layer
- Nvidia DPUs manage data movement
- Nvidia NICs and switches define the network fabric
- STX defines how storage plugs into this system
Whereas VMware’s effort was more about integration to enable interoperability, Nvidia with STX is standardizing the architecture of the AI factory itself. By way of example, think of VMware’s VASA as “tell me how you do space efficient snapshots and I’ll orchestrate them.” Think of STX as “here is a blueprint for how storage must be built for AI factories.”
Implications for the Ecosystem
This move has implications across the ecosystem, but importantly, it does not create a simple divide between winners and losers. Nvidia is simultaneously pulling partners into its ecosystem while also redefining where differentiation lives. So winners and losers will depend on how the ecosystem responds and how quickly. Here’s a quick rundown of how we see the effects on the players.
Vendors being abstracted (margin pressure over time)
We believe vendors and architectures described by the attributes below face the greatest risk to their margin model, and possibly their survival, as Nvidia standardizes the AI data path through STX. To be clear, Nvidia is shifting value away from the underlying infrastructure and toward the Nvidia-defined architecture. This will, in our view, further accelerate the move away from general purpose architectures toward parallel computing.
- Traditional storage providers whose differentiation is rooted in data path performance and hardware architecture are at risk if they don’t respond
- Systems that rely on general purpose CPU-mediated I/O paths will be increasingly bypassed by RDMA and DPU-driven designs
- Offerings that are not optimized for AI inference, context memory, and KV cache workloads will become outdated and expensive relative to modern systems
Vendors being pulled into Nvidia’s ecosystem (near-term beneficiaries, long-term tension)
Notably, many of the same companies affected by this change – Dell, HPE, IBM, NetApp, VAST, WEKA, Nutanix – are explicitly partnering with Nvidia to build STX-based systems.
- These vendors benefit from:
  - Early alignment with Nvidia’s roadmap
  - Participation in AI infrastructure buildouts
  - Access to rapidly growing AI demand
- However, this creates a structural tension:
  - Differentiation shifts above the STX layer
  - Core architecture becomes increasingly defined by Nvidia
Differentiation moves to the “context layer”
The key battleground is shifting to three important areas:
- KV cache / context memory management
- Data orchestration across inference pipelines
- Global namespace and data services across clusters and clouds – in other words, AI factories scale up, scale out and scale across, and a unified, logical namespace means data will be accessible irrespective of where it sits
Different architectural approaches will emerge. Some vendors will choose to optimize serving workloads with best-of-breed offerings in block, file and object. Others will attempt to unify all data types in a single architecture. Both approaches can be effective. We’re arguing here that the key will be the ability to access that data across a global namespace, as the sketch below illustrates.
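To illustrate what we mean by a unified, logical namespace, consider the following toy sketch. All names, sites and locators here are hypothetical; the point is that a caller addresses data by one stable logical name regardless of where the bytes physically sit.

```python
class GlobalNamespace:
    """Toy global namespace: one logical path space that resolves to
    whichever cluster or cloud actually holds the data. Everything
    here is an illustrative assumption, not any vendor's API."""

    def __init__(self):
        self._catalog = {}  # logical path -> (site, physical locator)

    def register(self, logical_path, site, locator):
        self._catalog[logical_path] = (site, locator)

    def resolve(self, logical_path):
        # Callers use one stable name; the namespace hides whether the
        # data lives on-prem, in a neo-cloud, or in a hyperscaler.
        return self._catalog.get(logical_path)

ns = GlobalNamespace()
ns.register("/corpus/support-tickets", "onprem-dc1", "nvme://pool7/obj42")
ns.register("/corpus/web-crawl", "neocloud-eu", "s3://bucket/key")
print(ns.resolve("/corpus/support-tickets"))  # same call, any location
```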
In addition, storage is increasingly becoming a data business. As such, our view is advantage will accrue to vendors that position themselves as data platforms rather than storage systems. This will allow further differentiation up the stack and enable greater value delivery to customers.
Startups and neo-clouds: fastest to adopt, best aligned
Nvidia’s early adopter short list includes CoreWeave, Crusoe, Lambda, Nebius, OCI and Vultr. Here’s our quick take:
- GPU cloud providers are:
  - Architecturally aligned with Nvidia
  - Free of legacy constraints
  - Highly sensitive to inference economics
We believe these players are best positioned to operationalize STX quickly and capture near-term gains in throughput, utilization and cost efficiency. STX also expands their TAM. The caveat is that, with the exception of Oracle, their stacks are relatively immature.
AI-native storage players are both advantaged and challenged
Companies such as VAST Data and WEKA have long argued that storage must be re-architected for AI. This gives them early positioning and a head start around disaggregated architectures, high performance AI data pipelines and intelligent workloads.
The risk is that STX effectively defines a new baseline for AI storage and creates a “jump ball” effect. These emergent players must continue to prove their value above Nvidia’s architecture by focusing on data intelligence and orchestration, not only within the data path but potentially higher up the stack. This brings many of these firms into new competitive territory, and with more limited resources they will need to pick their spots.
Hyperscalers will adopt the concepts but not the architecture
Hyperscalers such as AWS, Azure, and Google Cloud are unlikely to adopt STX directly in our view. Instead, they will most likely internalize the principles of RDMA, disaggregated memory and KV cache optimization and build their own proprietary implementations.
The advantage of hyperscalers remains their full stack control (including custom silicon) and of course their massive scale. The question is whether they can keep up with Nvidia’s pace of innovation, especially in silicon and networking.
The bottom line for all these ecosystem groups is that Nvidia is not trying to displace the storage vendors, but it is redefining the architecture and shifting value away from how data is stored to how context is served. Optimizing for that requirement becomes the new battleground.
The big picture is control of the AI data path
Stepping back, this announcement fits into a broader industry theme. Nvidia is methodically expanding its footprint across every critical layer of the AI stack including compute (GPUs, CPUs, LPUs), networking (NVLink, Spectrum-X), data movement (DPUs) and now storage (STX).
The connective tissue is control of the AI data path – i.e. how data moves, where it resides, and how efficiently it feeds the model during inference.
Action Items
In our view, Nvidia’s STX announcement is a signal that the AI data path is becoming the primary battleground for value creation. As storage is redefined from a system of record to a performance layer for inference, different players in the ecosystem must respond with speed and clarity.
For storage vendors, the priority is to reposition differentiation above the data path. The traditional focus on raw performance and hardware efficiency is being standardized by Nvidia’s architecture, which means value must shift toward data intelligence, orchestration, and context management. Vendors should align with STX to remain relevant in AI infrastructure buildouts, but they must avoid being reduced to interchangeable components within Nvidia’s blueprint. This requires meaningful investment in KV cache and context memory capabilities, treating context as a first-class data type. At the same time, vendors must rethink resiliency models, recognizing that not all data requires maximum durability, and instead adopt workload-aware approaches that balance performance, cost, and persistence. Building AI-native control planes that enable policy-driven data services will be critical to sustaining differentiation.
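As a simple illustration of what workload-aware resiliency could look like, consider a toy policy engine that assigns durability and tier by data class. The classes and policies below are our own assumptions for the sketch, not any vendor’s defaults: KV cache is reconstructible via a prefill recomputation, so it arguably does not justify maximum durability.

```python
from dataclasses import dataclass

@dataclass
class Placement:
    tier: str        # e.g. "hbm", "flash", "object"
    replicas: int    # durability spend

def place(data_class):
    """Toy policy engine: match durability and tier to the workload
    instead of applying one resiliency model to everything. The
    classes and policies here are illustrative assumptions."""
    policies = {
        # Reconstructible context: favor speed and cost over durability.
        "kv_cache": Placement(tier="flash", replicas=1),
        # Checkpoints are expensive to regenerate: modest redundancy.
        "checkpoint": Placement(tier="flash", replicas=2),
        # Systems of record keep the traditional resiliency posture.
        "system_of_record": Placement(tier="object", replicas=3),
    }
    return policies[data_class]

print(place("kv_cache"))  # Placement(tier='flash', replicas=1)
```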
For AI factory operators – including hyperscalers, neo-clouds, and large enterprises – the focus must shift to inference economics. Token throughput, latency, and GPU utilization are emerging as the key metrics of success, surpassing traditional measures of compute capacity. Operators should adopt disaggregated memory architectures that leverage high-speed flash and DPU-based context layers to reduce reliance on expensive GPU memory. Eliminating CPU bottlenecks through RDMA-enabled, GPU-direct data paths will be essential to maximizing performance. While reference architectures like STX can accelerate deployment, operators must remain disciplined in understanding where they can differentiate versus where they are simply adopting a standardized model. Ultimately, the efficiency of the entire data pipeline – not just the model – must be optimized.
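A back-of-envelope calculation shows why token throughput dominates inference economics. The figures below are purely illustrative assumptions, not Nvidia’s numbers:

```python
def cost_per_million_tokens(gpu_hour_cost, tokens_per_second, gpus=1):
    """Back-of-envelope inference economics: dollars of GPU time
    consumed per one million tokens served."""
    tokens_per_hour = tokens_per_second * 3600 * gpus
    return gpu_hour_cost * gpus / tokens_per_hour * 1_000_000

# Hypothetical numbers: a $4/hour GPU serving 500 tokens/second.
base = cost_per_million_tokens(4.00, 500)        # ~$2.22 per 1M tokens
faster = cost_per_million_tokens(4.00, 500 * 5)  # ~$0.44 per 1M tokens
print(round(base, 2), round(faster, 2))
```

Under these assumptions, a 5x gain in token throughput translates directly into a 5x reduction in cost per token, which is why data path efficiency, not just model quality, moves the P&L.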
For end customers building AI applications, the emphasis should be on aligning infrastructure decisions with the demands of inference-driven workloads. You will be increasingly buying intelligence in the form of tokens through APIs. This means prioritizing context performance and cost over raw storage capacity and evaluating systems based on how efficiently they serve and manage context in real time. Customers should demand workload-aware infrastructure that can dynamically adapt across training, inference, and analytics use cases. It is also critical to interrogate vendor claims of differentiation, ensuring that value is being delivered above Nvidia’s architecture rather than simply integrated into it. As architectures evolve toward disaggregated, AI-native data paths, customers must avoid lock-in to legacy designs that cannot support the requirements of agentic AI and long-context models.
These observations suggest that as the industry shifts from training to inference, competitive advantage will accrue to those who can most effectively control and optimize the data path. Nvidia has made clear its intent to define that path. Its probability of success is high in our estimation.
The rest of the ecosystem must now determine how – and where – it will compete.
