Special Breaking Analysis: CoreWeave’s Vera Rubin Bet Shows AI Infrastructure Is Becoming a Full-Stack Game

By David Vellante | July 02, 2026

The first Vera Rubin rack shipping is a major industry milestone because it signals that the next phase of AI infrastructure will be defined by extending GPUs into reliable, highly utilized, production-scale AI factories. In our opinion, the key takeaway from CoreWeave’s discussion with theCUBE is that the company is trying to differentiate from so-called neoclouds (GPU clouds) and is positioning itself more as a purpose-built AI hyperscaler. The theme company execs are emphasizing is differentiated engineering across data centers, power, cooling, networking, storage, orchestration and inference software.

This premise is based on discussions with CoreWeave executives Peter Salanki and Corey Sanders. The pair laid out a view of the market in which demand remains “insatiable,” supply is still constrained and the winners will be determined by operational velocity. CoreWeave said it has 45-plus data centers live, more than a gigawatt of power online and a pipeline that remains capacity-constrained. The company’s claim that it can move from power-on to GPUs in customers’ hands in close to single-digit days is notable. In AI infrastructure, time-to-cluster is becoming as critical as time-to-market in software.

Our research indicates that this is where CoreWeave is attempting to separate itself from traditional hyperscalers. The company is not merely installing accelerated compute inside conventional cloud infrastructure. It is redesigning the facility, the rack, the cooling loop, the network topology and the software control plane around the operating characteristics of AI workloads.

Watch the full conversation.

The bigger picture: Vera Rubin raises the infrastructure stakes

Vera Rubin represents the next major architectural step after Blackwell, and the above conversation suggests that CoreWeave sees it as both a training and inference platform. As we’ve often reported, the AI market is shifting. Frontier training still consumes enormous capacity, but inference is becoming the new battlefield. Every AI application, agent, workflow and enterprise automation loop ultimately becomes a token-generation workload.

Corey Sanders framed the Vera Rubin opportunity around doing more work with fewer resources, lowering cost and enabling inference at a scale that was previously uneconomic. The headline number we discussed was roughly one-tenth the cost per million tokens with the newest Nvidia system. We believe that cost compression is the most important commercial implication of Vera Rubin. Lower inference cost does not reduce demand; it expands the market. As inference becomes cheaper, enterprises will run more data through models, deploy more AI workflows and ask models to reason more frequently across larger corpora.

That is classic Jevons’ paradox applied to AI. As token economics improve, consumption grows.
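
A minimal sketch of the arithmetic behind that claim, assuming a constant-elasticity demand curve (an illustrative model, not something presented in the discussion). The one-tenth price ratio comes from the conversation; the elasticity values are hypothetical. Total spend on tokens rises after a price cut only when elasticity is greater than 1.

```python
def volume_ratio(price_ratio: float, elasticity: float) -> float:
    """New token volume divided by old volume.

    Assumes constant-elasticity demand: Q = Q0 * (P / P0) ** (-elasticity).
    """
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    return price_ratio ** (-elasticity)


def spend_ratio(price_ratio: float, elasticity: float) -> float:
    """New total spend divided by old spend (spend = price * volume)."""
    return price_ratio * volume_ratio(price_ratio, elasticity)


if __name__ == "__main__":
    price_ratio = 0.1  # roughly one-tenth the cost per million tokens
    for elasticity in (0.5, 1.0, 1.5):  # hypothetical elasticities
        print(
            f"elasticity={elasticity}: "
            f"volume x{volume_ratio(price_ratio, elasticity):.1f}, "
            f"spend x{spend_ratio(price_ratio, elasticity):.2f}"
        )
```

Under this assumed model, an elasticity of 1.0 leaves total spend unchanged, while anything above 1.0 means the market for inference grows in dollar terms even as the unit price collapses.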

CoreWeave’s argument is the GPU is only the beginning

The most important part of the discussion was not just the GPU. It was the surrounding system. Salanki repeatedly emphasized that there is no single “secret sauce.” CoreWeave’s advantage, in his telling, comes from a flywheel of operational expertise across every layer of the stack.

That includes modular data center design, physical cabling workflows, automation, burn-in testing, telemetry, liquid cooling control, network topology awareness and job-level orchestration. The point is simple: AI infrastructure is a highly distributed system.

The following table summarizes the key points of the conversation across the disciplines in which CoreWeave is trying to differentiate.

Focus Area	CoreWeave’s Stated Approach	Value Proposition
Data center buildout	Move from power-on to customer-ready GPUs in days	Accelerates time-to-revenue and customer deployment
Liquid cooling	End-to-end liquid cooling for Vera Rubin-class density	Supports 250 kW racks, better energy efficiency and lower costs
Control systems	Valvey, Racky and Mission Control (derived from the Weights and Biases acquisition)	Enables real-time cooling, power and failure response and sets up anticipatory remediation workflows
Networking	Scale-up copper inside the rack; scale-out InfiniBand/RoCE across racks	Balances bandwidth, latency and distance constraints; attacks the AI bottlenecks to drive maximum performance and resource utilization
Orchestration	Topology-aware Kubernetes and Slurm-on-Kubernetes	Helps workloads exploit rack-level and cluster-level performance
Inference	Query routing, distributed KV caching and model placement	Improves utilization and token economics
Asset lifecycle	Use older GPUs for batch inference, speculative decoding and pre-fill	Extends useful life of prior-generation infrastructure

Table 1. CoreWeave Key Focus Areas and Differentiation

Liquid cooling becomes strategic infrastructure

One of the clearest technical themes was around liquid cooling. With Blackwell, liquid cooling became mainstream for the highest-density AI systems. With Vera Rubin, CoreWeave is moving toward end-to-end liquid cooling, where cooling is no longer an accessory but a core architectural system.

Salanki described Vera Rubin racks reaching up to 250 kilowatts per rack, compared with the roughly 16-kilowatt racks that were common only several years ago. That density changes the economic equation. Facilities not purpose-built for AI may be able to physically place the racks, but they cannot necessarily power, cool, monitor and operate them efficiently.
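
To give a sense of scale, here is a rough back-of-the-envelope sketch of the heat load those figures imply. The 250 kW and 16 kW numbers are from the discussion; the coolant properties (plain water) and the 10 K temperature rise across the rack are assumptions chosen for illustration, not CoreWeave design parameters. It uses the standard relation: mass flow = heat load / (specific heat x temperature rise).

```python
WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0  # approximate value for liquid water


def density_ratio(new_rack_kw: float, old_rack_kw: float) -> float:
    """How many times more power one new rack draws than one old rack."""
    return new_rack_kw / old_rack_kw


def coolant_mass_flow_kg_s(heat_load_kw: float, delta_t_k: float) -> float:
    """Coolant mass flow needed to carry away heat_load_kw.

    Assumes all rack power ends up as heat in the liquid loop and that the
    coolant behaves like water (both are simplifying assumptions).
    """
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    return heat_load_kw * 1000.0 / (WATER_SPECIFIC_HEAT_J_PER_KG_K * delta_t_k)


if __name__ == "__main__":
    print(f"density ratio: {density_ratio(250, 16):.1f}x")
    flow = coolant_mass_flow_kg_s(250, delta_t_k=10)  # assumed 10 K rise
    # For water, 1 kg is roughly 1 liter, so kg/s * 60 approximates L/min.
    print(f"250 kW rack: {flow:.2f} kg/s (about {flow * 60:.0f} L/min)")
```

Under these assumptions a single 250 kW rack needs on the order of six kilograms of coolant per second, which helps explain why the cooling loop has to be designed into the facility rather than added afterward.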

CoreWeave’s Valvey and Racky systems are important innovations here. Valvey manages liquid cooling at the valve and sensor level, dynamically redirecting fluid based on real-time demand. Racky acts as a local rack or pod manager, ingesting telemetry from GPUs, power systems, leak sensors and building management systems. Above both sits Mission Control, which uses these signals to identify underperforming components, route around problems and remediate issues automatically.

We believe this is a major part of CoreWeave’s differentiation. The company is arguing that high utilization does not come from overprovisioning redundancy. It comes from observing failures quickly, isolating problems precisely and recovering workloads fast enough that customers still get a reliable experience.

The networking story is copper inside, optics outside

The conversation also highlighted a critical design reality in AI clusters: scale-up and scale-out networking are fundamentally different problems.

Inside a Vera Rubin rack, CoreWeave can use copper to achieve extremely high bandwidth because distances are short. Copper is power-efficient and avoids the heat and energy penalties of converting electrical signals into optical ones. But copper cannot scale across a football-field-sized data center. Signal loss becomes a main constraint.

For scale-out, CoreWeave described using Nvidia Quantum InfiniBand and Spectrum-X RoCE, with Spectrum-6-based designs for Vera Rubin. The key architectural concept is a multiplanar, two-tier network that can connect very large GPU clusters while keeping communication within two switch hops. That means AI training and inference workloads, which are highly sensitive to network latency and topology, can be optimized.
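
As a rough illustration of why switch radix matters in a two-tier design, the sketch below computes the maximum endpoint count of a generic non-blocking leaf-spine fabric. This is textbook fabric arithmetic under stated assumptions, not a description of CoreWeave’s or Nvidia’s actual topology, and the radix values are illustrative rather than the specifications of any particular switch.

```python
def two_tier_endpoints(radix: int) -> int:
    """Max endpoints in a non-blocking two-tier leaf-spine fabric.

    Assumes every switch has `radix` ports of equal speed. Each leaf uses
    half its ports for endpoints and half for uplinks (one per spine), and
    each spine port connects to a distinct leaf, so there are at most
    `radix` leaves.
    """
    if radix < 2 or radix % 2 != 0:
        raise ValueError("radix must be a positive even number")
    endpoints_per_leaf = radix // 2
    max_leaves = radix
    return max_leaves * endpoints_per_leaf


if __name__ == "__main__":
    for radix in (64, 128, 256):  # illustrative radix values
        print(f"radix {radix}: up to {two_tier_endpoints(radix)} endpoints per plane")
```

In a multiplanar design, one common reading (assumed here) is that each endpoint connects to several such fabrics in parallel, which adds bandwidth per endpoint without adding tiers.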

Corey Sanders added that CoreWeave’s Kubernetes layer and SUNK, its Slurm-on-Kubernetes layer, are topology-aware. This means customers do not have to manually understand every rack boundary, network plane and latency profile. The platform exposes the infrastructure in a way that allows jobs to land where they run best.
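
The idea of topology-aware placement can be sketched in a few lines. The function below is a simplified, hypothetical scheduler policy, not SUNK’s or Kubernetes’ actual logic: it keeps a job inside one rack when possible and otherwise spans as few racks as it can. Rack names and GPU counts are made up for illustration.

```python
from typing import Dict, Optional


def place_job(free_gpus_by_rack: Dict[str, int], gpus_needed: int) -> Optional[Dict[str, int]]:
    """Return a {rack: gpu_count} allocation, or None if the job cannot fit."""
    if gpus_needed <= 0:
        raise ValueError("gpus_needed must be positive")

    # Prefer a single rack, choosing the tightest fit to limit fragmentation.
    fitting = [(free, rack) for rack, free in free_gpus_by_rack.items() if free >= gpus_needed]
    if fitting:
        _, rack = min(fitting)
        return {rack: gpus_needed}

    # Otherwise span the fewest racks by taking the emptiest racks first.
    allocation: Dict[str, int] = {}
    remaining = gpus_needed
    for rack, free in sorted(free_gpus_by_rack.items(), key=lambda kv: (-kv[1], kv[0])):
        if remaining == 0:
            break
        if free == 0:
            continue
        take = min(free, remaining)
        allocation[rack] = take
        remaining -= take
    return allocation if remaining == 0 else None


if __name__ == "__main__":
    racks = {"rack-a": 72, "rack-b": 16, "rack-c": 40}  # hypothetical free GPUs
    print(place_job(racks, 32))
    print(place_job(racks, 100))
```

A production scheduler would also weigh network planes, failure domains and job priority; the point of the sketch is only that the platform, not the customer, carries the knowledge of rack boundaries.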

Inference will drive the next wave of infrastructure innovation

The discussion moved beyond training into the architecture of inference itself. That is where the analysis gets more interesting.

Salanki described a future in which inference is disaggregated into specialized pipeline stages. A smaller speculative decoding model may run first to predict likely outputs. Larger models may handle more complex steps. Older GPUs may be used for batch inference, pre-fill or latency-insensitive tasks. Newer Vera Rubin systems may serve trillion-parameter models at high speed.

This is a very different model from the early generative AI era, where a single model replica might run on a uniform block of H100s. The future is heterogeneous, in our view. The right infrastructure will route the right query to the right hardware, in the right location, at the right cost point.

That has two important implications:

First, depreciation fears around older GPUs may be overstated. Prior-generation GPUs can remain valuable when the inference stack knows how to place workloads intelligently; and
Second, AI clouds have so far differentiated on access to the newest GPUs. Increasingly, they will need to compose old and new accelerators into the most efficient token factory.
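
The routing idea described above, sending each query to the right hardware at the right cost point, can be sketched as a simple policy: pick the cheapest pool that still meets the request’s latency budget. The pool names, prices and latencies below are placeholders invented for illustration; they are not measured figures and do not describe CoreWeave’s router.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Pool:
    name: str
    cost_per_million_tokens: float  # hypothetical price
    p95_latency_ms: float           # hypothetical latency


def route(pools: List[Pool], max_latency_ms: float) -> Optional[Pool]:
    """Return the cheapest pool meeting the latency budget, or None."""
    eligible = [p for p in pools if p.p95_latency_ms <= max_latency_ms]
    if not eligible:
        return None
    return min(eligible, key=lambda p: p.cost_per_million_tokens)


if __name__ == "__main__":
    pools = [
        Pool("batch-pool", cost_per_million_tokens=0.20, p95_latency_ms=5000),
        Pool("interactive-pool", cost_per_million_tokens=0.60, p95_latency_ms=300),
    ]
    for budget_ms in (500, 10_000):
        chosen = route(pools, budget_ms)
        print(budget_ms, chosen.name if chosen else "no eligible pool")
```

A real router would also account for cache locality, model placement and queue depth, but even this toy version shows how latency-tolerant work can be steered toward capacity that would otherwise sit idle.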

Storage, KV cache and the AI application loop

The discussion also reinforced a broader thesis that we’ve been developing. Specifically, AI infrastructure is absorbing traditional general purpose IT functions – today dominated by x86 architectures. Storage, networking, CPUs, databases, caches, sandboxes and evaluation systems are being redesigned around accelerated computing. We believe traditional functions will, in the fullness of time, be consumed by AI-optimized systems.

CoreWeave discussed LOTA cache, object storage, distributed KV caching and the ability to move model weights to pockets of available capacity around the world. This is not a trivial feature, according to technologists. Inference workloads increasingly depend on memory hierarchy, cache locality and fast access to model state. As agents become more common, the application loop will require not just GPUs but also CPUs for tool execution, sandboxes for safe code deployment, databases for state and evaluation systems for quality control.

Our view is that this is where the “AI factory” concept becomes tangible. The GPU is the furnace, but the factory also needs logistics, telemetry, quality control, routing, storage and automation. CoreWeave is making the case that it has built that factory from the ground up.

The hyperscaler comparison

The most pointed strategic discussion came near the end. Salanki argued that traditional hyperscalers were built for a different era and a different workload model. His claim is that their infrastructure was designed for mission-critical enterprise applications, where redundancy often meant reserving or duplicating large amounts of compute capacity. That model works for banks, exchanges and enterprise systems that value interruption avoidance above utilization.

AI workloads are different. Training and inference customers want to use as much of the available compute as possible. They can tolerate certain failures if the platform detects, isolates and recovers quickly. CoreWeave’s philosophy is closer to supercomputing, meaning push the system hard, expect some components to fail, and engineer the software to route around those failures.

We believe this is the central distinction between an AI hyperscaler and a general-purpose hyperscaler. The AI hyperscaler is not simply a cloud provider with GPUs. It is a company whose operating model, telemetry, reliability systems and economics are built around accelerated computing from day one.

What this means for enterprises

For enterprises, the message is that owning GPUs is not the same as operating an AI factory. The hardest problems are increasingly outside the chip. These include power density, liquid cooling, cluster networking, job scheduling, storage throughput, inference routing, utilization and failure recovery.

Enterprises that attempt to build large-scale AI infrastructure on their own will need to understand whether they can achieve competitive utilization and economics while keeping pace with the industry’s developments. Most enterprises won’t be able to do this alone. A GPU cluster running at poor utilization is not merely inefficient; it is economically disastrous. CoreWeave’s pitch is that its purpose-built stack can deliver higher utilization, faster access to capacity and better token economics than customers can achieve alone. It sounds like the early days of cloud, when costs were much more attractive in the cloud and the agility that drove time-to-value was the fundamental value proposition.
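
The utilization point comes down to simple arithmetic: the cost of each useful GPU-hour is the all-in hourly cost divided by utilization. The sketch below uses a normalized cost of 1.0 and hypothetical utilization levels; none of the numbers come from CoreWeave.

```python
def cost_per_useful_gpu_hour(hourly_cost: float, utilization: float) -> float:
    """Effective cost of one GPU-hour of useful work.

    `utilization` is the fraction of paid-for hours doing useful work (0 to 1].
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_cost / utilization


if __name__ == "__main__":
    for utilization in (0.3, 0.6, 0.9):  # hypothetical utilization levels
        effective = cost_per_useful_gpu_hour(1.0, utilization)
        print(f"utilization {utilization:.0%}: {effective:.2f}x the nominal hourly cost")
```

At the assumed 30% utilization, every useful hour costs three times what it does at 90%, before counting the power and cooling spent on idle capacity.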

The data suggests that as Vera Rubin-class systems arrive, this gap between mainstream enterprise capabilities and those of AI clouds could widen. Rack densities are rising. Cooling is becoming more complex. Networks are becoming more specialized. Software orchestration is becoming more workload-aware. In this environment, infrastructure expertise becomes a competitive moat.

The risks to CoreWeave and other AI clouds

CoreWeave’s story is compelling, but it is not risk-free. The company is operating in a capital-intensive market where supply chains, power availability, customer concentration, GPU roadmap timing and forecast accuracy all come together. The same forces that create rapid growth will also create execution pressure.

There is also a strategic question around equilibrium. Today, demand exceeds supply. If supply eventually catches up, differentiation will shift from raw availability to cost, reliability, utilization, geographic reach, developer experience and platform services. CoreWeave appears to understand that transition, which is why its emphasis on Mission Control, topology-aware orchestration, inference routing and observability data is so important. Beyond these attributes, services that allow customers to become token generators will be appealing in our view.

CoreWeave’s long-term advantage will be measured by how efficiently it can convert capital, power and silicon into customer-visible intelligence.

Our take

CoreWeave is making a strong argument that the AI era requires a new kind of cloud. Vera Rubin is a catalyst, but the real story is the system around it. The company’s differentiation comes from the integration of facility design, liquid cooling, power management, networking, storage, orchestration and inference software.

We believe the most important insight from this discussion is that AI infrastructure has become a full-stack engineering discipline. The next competitive frontier is not only faster chips, but also time-to-deployment, higher utilization, lower token cost, better failure recovery and smarter workload placement.

In that context, CoreWeave’s Vera Rubin strategy should be viewed as a statement about where the AI infrastructure market is heading – i.e. toward purpose-built, software-defined, liquid-cooled, topology-aware AI factories optimized for the economics of intelligence production.

Action Item for Chief AI Officers

Chief AI Officers should take a cue from CoreWeave and immediately shift their AI infrastructure strategy from GPU procurement to AI factory optimization. We believe the priority is to build a workload-level operating model that maps training, inference, agentic workloads, storage, networking, cooling, and utilization into one integrated architecture. Competitive advantage at the infrastructure level will increasingly come from the ability to route the right AI workload to the right place at the right cost point, while maintaining high utilization, rapid recovery, and efficient token economics.

In our opinion, the near-term mandate for Chief AI Officers is to treat AI infrastructure as a strategic production system, not an experimental technology stack. Organizations that get the substrate right will be able to create alpha by building a new AI operating model on top with proprietary data, workflows, process knowledge and a deep understanding of how the tacit knowledge of their enterprise drives value. Understand token economics, architect for heterogeneous compute and operationalize AI workloads with rigor, and you will better support revenue-generating digital platforms.

Disclaimer

All statements made regarding companies or securities are strictly beliefs, points of view and opinions held by SiliconANGLE Media, Enterprise Technology Research, other guests on theCUBE and guest writers. Such statements are not recommendations by these individuals to buy, sell or hold any security. The content presented does not constitute investment advice and should not be used as the basis for any investment decision. You and only you are responsible for your investment decisions.

Disclosure: Many of the companies cited in Breaking Analysis are sponsors of theCUBE and/or clients of Wikibon. None of these firms or other companies have any editorial control over or advance viewing of what’s published in Breaking Analysis.

David Vellante

David Vellante is co-CEO of SiliconANGLE Media, as well as co-founder and Chief Analyst at theCUBE Research, the world’s leading open source technology research community. Dave is a long-time tech industry analyst, entrepreneur, writer and speaker. As co-host of theCUBE – “The ESPN of Tech,” Vellante has interviewed over 5,000 experts since 2010. He is also a co-founder of CrowdChat, an angel funded startup based in Palo Alto using big data techniques to extract business value from social data. Prior to these exploits, Dave founded a CIO consultancy and spent a decade growing and managing IDC’s largest business unit. He lives in Massachusetts with his wife and four children where he is active in town activities including serving as the president of his town’s local “Kiddie Sports” association. Dave holds a B.S. in Applied Mathematics from Union College.
