Nvidia’s introduction of its BlueField-4 STX reference architecture is perhaps one of the more under-appreciated announcements coming out of GTC 2026. While much of the attention remains focused on GPUs, LPUs, and NemoClaws for safety, STX signals a significant structural evolution in that Nvidia is extending its control deeper into the AI infrastructure stack – this time into storage.
As AI shifts from training to inference and GPU performance continues to progress at unprecedented levels, new bottlenecks and challenges emerge beyond compute. Balancing system performance and addressing cost pressures is increasingly a priority for Nvidia as memory constraints, data movement inefficiencies, and the absolute cost of serving tokens at scale become more pressing. In our view, Nvidia’s STX architecture is a direct response to these pressures, effectively redefining storage as an active player in the AI pipeline.
The implications for storage vendors, hyperscalers, neo-clouds and end customers are profound in our view, necessitating a new mental model for the role storage plays in the AI value chain. In this special Breaking Analysis we explain our view on what was announced by Nvidia and what it means to the ecosystem and its end customers.
Nvidia has announced a new blueprint for AI storage
STX is a reference architecture that defines how storage systems should be built for AI-native workloads. It brings together four key components:
- BlueField-4 DPUs to offload data movement and storage management tasks from the CPU
- ConnectX-9 SuperNICs and Spectrum-X Ethernet to enable RDMA and bypass traditional CPU and OS bottlenecks
- A redesigned data path that moves data directly between storage and GPUs at low latency
- CMX, a rack-scale implementation optimized specifically for key-value (KV) cache storage
Nvidia is reinventing the AI storage stack by offloading data movement with BlueField DPUs, bypassing CPUs with direct memory access and optimizing the way it handles KV cache. By doing so it can push data through the system much faster. Nvidia is claiming up to 5x improvement in token processing performance and 4x gains in energy efficiency, driven largely by eliminating inefficiencies in the data path and optimizing for AI-specific workloads.
Storage now becomes part of the inference engine
We’ve long used the bromide that storage vendors need to think “outside the box.” We believe the most important shift Nvidia is driving is profound: traditional external storage that serves general purpose applications is giving way to a new conceptual model. Storage has always been about optimizing systems to be: 1) rock solid; 2) lightning fast; and 3) dirt cheap. Nvidia is blowing away the concept of a box by making storage part of the inference engine itself and adding a fourth dimension – i.e. intelligence. Specifically, our view is that Nvidia is making storage workload-aware, with specialized intelligence that makes AI run better.
A major emphasis of STX is how it optimizes KV cache. Large language models rely heavily on KV cache, which stores intermediate data used to maintain context during inference. Think of KV cache as a dynamic, searchable, high-speed, memory-based store designed for the rapid lookup of context. It uses a key, which is a fast index to find the value – i.e. the data it points to. As context windows expand dramatically, KV cache is becoming both a cost driver and a performance bottleneck. Traditionally, this data has lived in expensive GPU memory or moved inefficiently through CPU-centric architectures.
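To make the key/value idea concrete, here is a minimal Python sketch of the lookup pattern. This is a toy model of our own, not Nvidia's implementation: real KV caches hold per-layer attention tensors in GPU memory, and the hashing scheme below is purely illustrative.

```python
import hashlib

class ToyKVCache:
    """Toy illustration of the KV cache lookup pattern: a fast key
    (an index derived from the token prefix) points to a value (the
    precomputed attention state for that context)."""

    def __init__(self):
        self._store = {}  # key -> cached attention state

    @staticmethod
    def key_for(prefix_tokens):
        # A cheap, fixed-size index computed from the context window.
        raw = ",".join(map(str, prefix_tokens)).encode()
        return hashlib.sha256(raw).hexdigest()

    def put(self, prefix_tokens, attention_state):
        self._store[self.key_for(prefix_tokens)] = attention_state

    def get(self, prefix_tokens):
        # A hit lets the model skip recomputing attention for this
        # prefix; a miss forces an expensive prefill recomputation.
        return self._store.get(self.key_for(prefix_tokens))
```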
Nvidia’s approach disaggregates this layer by moving KV cache into high-speed flash managed by DPUs, while using RDMA to bypass the CPU entirely. The result is a system where storage, networking, and compute are tightly coupled and designed together to improve inference performance.
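The disaggregation concept can be sketched as a simple two-tier cache. Again, this is our own simplified illustration under stated assumptions: plain Python dictionaries stand in for scarce GPU memory and for the DPU-managed, RDMA-attached flash tier that STX describes.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier KV cache: a small 'GPU memory' tier backed by a
    larger 'flash' tier. STX-class systems would move these entries
    over RDMA via the DPU; dicts stand in for both tiers here."""

    def __init__(self, gpu_capacity):
        self.gpu = OrderedDict()   # hot tier: scarce, expensive HBM
        self.flash = {}            # capacity tier: NVMe-class flash
        self.gpu_capacity = gpu_capacity

    def put(self, key, value):
        self.gpu[key] = value
        self.gpu.move_to_end(key)
        while len(self.gpu) > self.gpu_capacity:
            # Evict the least recently used entry to flash rather than
            # discarding it: flash is slower than HBM but far cheaper,
            # and a flash hit beats a full prefill recomputation.
            cold_key, cold_val = self.gpu.popitem(last=False)
            self.flash[cold_key] = cold_val

    def get(self, key):
        if key in self.gpu:
            self.gpu.move_to_end(key)  # refresh recency on a hot hit
            return self.gpu[key]
        if key in self.flash:
            # Promote from flash back into GPU memory -- the step a
            # DPU-managed, RDMA data path is designed to accelerate.
            self.put(key, self.flash.pop(key))
            return self.gpu[key]
        return None  # full miss: the prefix must be recomputed
```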
Similar to VMware’s storage API efforts but far more significant
We’re reminded of the 2010s when storage became a major bottleneck for VMware customers. When performance tanked, it was often an IO problem and tuning the system was cumbersome. VMware created a set of APIs (e.g. VAAI, VASA, VADP), which standardized how storage integrated into virtualized environments and, over time, somewhat commoditized parts of the storage stack. However, Nvidia’s approach is more prescriptive and far more powerful in our view.
Unlike VMware, Nvidia controls not just the interface, but the underlying silicon and data path. Specifically:
- Nvidia GPUs define the compute layer
- Nvidia DPUs manage data movement
- Nvidia NICs and switches define the network fabric
- STX defines how storage plugs into this system
Whereas VMware’s effort was more about integration to enable interoperability, Nvidia with STX is standardizing the architecture of the AI factory itself. By way of example, think of VMware’s VASA as “tell me how you do space efficient snapshots and I’ll orchestrate them.” Think of STX as “here is a blueprint for how storage must be built for AI factories.”
Implications for the Ecosystem
This move has implications across the ecosystem, but importantly, it does not create a simple divide between winners and losers. Nvidia is simultaneously pulling partners into its ecosystem while also redefining where differentiation lives. So winners and losers will depend on how the ecosystem responds and how quickly. Here’s a quick rundown of how we see the effects on the players.
Vendors being abstracted (margin pressure over time)
We believe vendors and architectures described by the attributes below face the greatest risk to their margin model, and possibly their survival, as Nvidia standardizes the AI data path through STX. To be clear, Nvidia is shifting value away from the underlying infrastructure and toward the Nvidia-defined architecture. This will, in our view, further accelerate the move away from general purpose architectures toward parallel computing.
- Traditional storage providers whose differentiation is rooted in data path performance and hardware architecture are at risk if they don’t respond
- Systems that rely on general purpose CPU-mediated I/O paths will be increasingly bypassed by RDMA and DPU-driven designs
- Offerings that are not optimized for AI inference, context memory, and KV cache workloads will become outdated and expensive relative to modern systems
Vendors being pulled into Nvidia’s ecosystem (near-term beneficiaries, long-term tension)
Notably, many of the same companies affected by this change – Dell, HPE, IBM, NetApp, VAST, WEKA, Nutanix – are explicitly partnering with Nvidia to build STX-based systems.
- These vendors benefit from:
  - Early alignment with Nvidia’s roadmap
  - Participation in AI infrastructure buildouts
  - Access to rapidly growing AI demand
- However, this creates a structural tension:
  - Differentiation shifts above the STX layer
  - Core architecture becomes increasingly defined by Nvidia
Differentiation moves to the “context layer”
The key battleground is shifting to three important areas:
- KV cache / context memory management
- Data orchestration across inference pipelines
- Global namespace and data services across clusters and clouds – in other words, AI factories scale up, scale out and scale across, and a unified, logical namespace means data will be accessible irrespective of where it sits
Different architectural approaches will emerge. Some vendors will choose to optimize serving workloads with best-of-breed offerings in block, file and object. Others will attempt to unify all data types in a single architecture. Both approaches can be effective. We’re arguing here that the key will be the ability to access that data across a global namespace, as the sketch below illustrates.
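To illustrate what we mean by a unified, logical namespace, consider the following toy sketch. All names, sites and locators here are hypothetical; the point is that a caller addresses data by one stable logical name regardless of where the bytes physically sit.

```python
class GlobalNamespace:
    """Toy global namespace: one logical path space that resolves to
    whichever cluster or cloud actually holds the data. Everything
    here is an illustrative assumption, not any vendor's API."""

    def __init__(self):
        self._catalog = {}  # logical path -> (site, physical locator)

    def register(self, logical_path, site, locator):
        self._catalog[logical_path] = (site, locator)

    def resolve(self, logical_path):
        # Callers use one stable name; the namespace hides whether the
        # data lives on-prem, in a neo-cloud, or in a hyperscaler.
        return self._catalog.get(logical_path)

ns = GlobalNamespace()
ns.register("/corpus/support-tickets", "onprem-dc1", "nvme://pool7/obj42")
ns.register("/corpus/web-crawl", "neocloud-eu", "s3://bucket/key")
print(ns.resolve("/corpus/support-tickets"))  # same call, any location
```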
In addition, storage is increasingly becoming a data business. As such, our view is advantage will accrue to vendors that position themselves as data platforms rather than storage systems. This will allow further differentiation up the stack and enable greater value delivery to customers.
Startups and neo-clouds: fastest to adopt, best aligned
Nvidia’s early adopter short list includes CoreWeave, Crusoe, Lambda, Nebius, OCI and Vultr. Here’s our quick take:
- GPU cloud providers are:
  - Architecturally aligned with Nvidia
  - Free of legacy constraints
  - Highly sensitive to inference economics
We believe these players are best positioned to operationalize STX quickly and capture near-term gains in throughput, utilization and cost efficiency. STX also expands their TAM. The caveat is that, with the exception of Oracle, their stacks are relatively immature.
AI-native storage players are both advantaged and challenged
Companies such as VAST Data and WEKA have long argued that storage must be re-architected for AI. This gives them early positioning and a head start around disaggregated architectures, high performance AI data pipelines and intelligent workloads.
The risk is that STX effectively defines a new baseline for AI storage and creates a “jump ball” effect. These emergent players must continue to prove their value above Nvidia’s architecture by focusing on data intelligence and orchestration, not only within the data path but potentially higher up the stack. This brings many of these firms into new competitive territory, and with more limited resources they will need to pick their spots.
Hyperscalers will adopt the concepts but not the architecture
Hyperscalers such as AWS, Azure, and Google Cloud are unlikely to adopt STX directly in our view. Instead, they will most likely internalize the principles of RDMA, disaggregated memory and KV cache optimization and build their own proprietary implementations.
The advantage of hyperscalers remains their full stack control (including custom silicon) and of course their massive scale. The question is whether they can keep up with Nvidia’s pace of innovation, especially in silicon and networking.
The bottom line for all these ecosystem groups is that Nvidia is not trying to displace the storage vendors, but it is redefining the architecture and shifting value away from how data is stored to how context is served. Optimizing for that requirement becomes the new battleground.
The big picture is control of the AI data path
Stepping back, this announcement fits into a broader industry theme. Nvidia is methodically expanding its footprint across every critical layer of the AI stack including compute (GPUs, CPUs, LPUs), networking (NVLink, Spectrum-X), data movement (DPUs) and now storage (STX).
The connective tissue is control of the AI data path – i.e. how data moves, where it resides, and how efficiently it feeds the model during inference.
Action Items
In our view, Nvidia’s STX announcement is a signal that the AI data path is becoming the primary battleground for value creation. As storage is redefined from a system of record to a performance layer for inference, different players in the ecosystem must respond with speed and clarity.
For storage vendors, the priority is to reposition differentiation above the data path. The traditional focus on raw performance and hardware efficiency is being standardized by Nvidia’s architecture, which means value must shift toward data intelligence, orchestration, and context management. Vendors should align with STX to remain relevant in AI infrastructure buildouts, but they must avoid being reduced to interchangeable components within Nvidia’s blueprint. This requires meaningful investment in KV cache and context memory capabilities, treating context as a first-class data type. At the same time, vendors must rethink resiliency models, recognizing that not all data requires maximum durability, and instead adopt workload-aware approaches that balance performance, cost, and persistence. Building AI-native control planes that enable policy-driven data services will be critical to sustaining differentiation.
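As a simple illustration of what workload-aware resiliency could look like, consider a toy policy engine that assigns durability and tier by data class. The classes and policies below are our own assumptions for the sketch, not any vendor’s defaults: KV cache is reconstructible via a prefill recomputation, so it arguably does not justify maximum durability.

```python
from dataclasses import dataclass

@dataclass
class Placement:
    tier: str        # e.g. "hbm", "flash", "object"
    replicas: int    # durability spend

def place(data_class):
    """Toy policy engine: match durability and tier to the workload
    instead of applying one resiliency model to everything. The
    classes and policies here are illustrative assumptions."""
    policies = {
        # Reconstructible context: favor speed and cost over durability.
        "kv_cache": Placement(tier="flash", replicas=1),
        # Checkpoints are expensive to regenerate: modest redundancy.
        "checkpoint": Placement(tier="flash", replicas=2),
        # Systems of record keep the traditional resiliency posture.
        "system_of_record": Placement(tier="object", replicas=3),
    }
    return policies[data_class]

print(place("kv_cache"))  # Placement(tier='flash', replicas=1)
```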
For AI factory operators – including hyperscalers, neo-clouds, and large enterprises – the focus must shift to inference economics. Token throughput, latency, and GPU utilization are emerging as the key metrics of success, surpassing traditional measures of compute capacity. Operators should adopt disaggregated memory architectures that leverage high-speed flash and DPU-based context layers to reduce reliance on expensive GPU memory. Eliminating CPU bottlenecks through RDMA-enabled, GPU-direct data paths will be essential to maximizing performance. While reference architectures like STX can accelerate deployment, operators must remain disciplined in understanding where they can differentiate versus where they are simply adopting a standardized model. Ultimately, the efficiency of the entire data pipeline – not just the model – must be optimized.
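A back-of-envelope calculation shows why token throughput dominates inference economics. The figures below are purely illustrative assumptions, not Nvidia’s numbers:

```python
def cost_per_million_tokens(gpu_hour_cost, tokens_per_second, gpus=1):
    """Back-of-envelope inference economics: dollars of GPU time
    consumed per one million tokens served."""
    tokens_per_hour = tokens_per_second * 3600 * gpus
    return gpu_hour_cost * gpus / tokens_per_hour * 1_000_000

# Hypothetical numbers: a $4/hour GPU serving 500 tokens/second.
base = cost_per_million_tokens(4.00, 500)        # ~$2.22 per 1M tokens
faster = cost_per_million_tokens(4.00, 500 * 5)  # ~$0.44 per 1M tokens
print(round(base, 2), round(faster, 2))
```

Under these assumptions, a 5x gain in token throughput translates directly into a 5x reduction in cost per token, which is why data path efficiency, not just model quality, moves the P&L.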
For end customers building AI applications, the emphasis should be on aligning infrastructure decisions with the demands of inference-driven workloads. You will be increasingly buying intelligence in the form of tokens through APIs. This means prioritizing context performance and cost over raw storage capacity and evaluating systems based on how efficiently they serve and manage context in real time. Customers should demand workload-aware infrastructure that can dynamically adapt across training, inference, and analytics use cases. It is also critical to interrogate vendor claims of differentiation, ensuring that value is being delivered above Nvidia’s architecture rather than simply integrated into it. As architectures evolve toward disaggregated, AI-native data paths, customers must avoid lock-in to legacy designs that cannot support the requirements of agentic AI and long-context models.
These observations suggest that as the industry shifts from training to inference, competitive advantage will accrue to those who can most effectively control and optimize the data path. Nvidia has made clear its intent to define that path. Its probability of success is high in our estimation.
The rest of the ecosystem must now determine how – and where – it will compete.
