Executive Summary
As enterprises accelerate AI-driven initiatives, development and platform engineering teams are increasingly constrained by data pipeline complexity inside Kubernetes environments. Streaming ingestion, metadata management, security, and pipeline orchestration are frequently delivered through loosely coupled stacks of storage, streaming platforms, vector databases, metadata catalogs, and orchestration layers, resulting in longer time-to-value and increased operational risk. Infrastructure consolidation at the data platform layer, not further abstraction, has become the most effective lever for accelerating developer productivity and reducing pipeline fragility.
Survey results from the Future of Data Platforms Summit confirm that these challenges are widespread: 65.4% of organizations cite scaling AI as their top challenge, and 51.6% identify data quality and metadata issues as persistent blockers. Together, these findings suggest that platforms that natively integrate storage, streaming, metadata, and pipeline execution within Kubernetes are increasingly critical to enterprise success.
This brief examines why VAST Data is structurally well-positioned to shorten time-to-value for Kubernetes-based development teams and where this approach diverges meaningfully from traditional best-of-breed architectures.
The Kubernetes Data Pipeline Problem Is No Longer About Tools
Across enterprise and cloud-native environments, Kubernetes has become the default control plane for application delivery. However, data infrastructure has lagged behind in adopting the same operational model. We see that organizations rarely struggle to deploy Kubernetes itself; the struggle lies in coordinating the services and sidecars around it, especially the data services.
Typical Kubernetes-native data pipelines now involve:
- Object storage for raw and intermediate data
- Kafka or equivalent streaming platforms
- Separate metadata catalogs and governance tools
- External vector databases for AI workloads
- Independent observability and audit systems
- CI/CD pipelines for data services themselves
Each layer introduces its own security model, metadata representation, scaling constraints, and operational dependencies.
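To make that fragmentation concrete, consider a minimal, hypothetical sketch of what a developer touches just to assemble one record's worth of context across such a stack. Every endpoint, credential, topic, and token below is a placeholder we invented for illustration, not a reference to any specific deployment or vendor API; the point is simply that each layer arrives with its own client, its own credentials, and its own failure modes.

```python
# Hypothetical sketch: one logical task, four independently configured systems.
# All endpoints, credentials, buckets, and topic names are placeholders.
import boto3
import requests
from kafka import KafkaConsumer

# 1. Object storage: its own endpoint, keys, and IAM model.
s3 = boto3.client(
    "s3",
    endpoint_url="https://objects.example.internal",
    aws_access_key_id="OBJECT_STORE_KEY",
    aws_secret_access_key="OBJECT_STORE_SECRET",
)
raw = s3.get_object(Bucket="raw-data", Key="events/2025/11/30/batch.json")["Body"].read()

# 2. Streaming platform: separate brokers, SASL credentials, and ACLs.
consumer = KafkaConsumer(
    "derived-events",
    bootstrap_servers=["kafka.example.internal:9093"],
    security_protocol="SASL_SSL",
    sasl_mechanism="PLAIN",
    sasl_plain_username="pipeline-svc",
    sasl_plain_password="STREAMING_SECRET",
)

# 3. Metadata catalog: yet another API, token, and representation of the same datasets.
catalog = requests.get(
    "https://catalog.example.internal/api/datasets/raw-data",
    headers={"Authorization": "Bearer CATALOG_TOKEN"},
    timeout=10,
)

# 4. Vector database: a fourth endpoint and auth scheme for the AI workload.
matches = requests.post(
    "https://vectors.example.internal/query",
    json={"vector": [0.1] * 768, "top_k": 5},
    headers={"Authorization": "Bearer VECTOR_DB_TOKEN"},
    timeout=10,
)
```

Each of those four clients also implies its own secret rotation, network policy, and observability hooks, which is exactly the coordination cost described above.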
Our survey data reinforces this fragmentation:
- 51.8% of organizations report running 4–6 data platform vendors, with another 30.8% operating 7 or more (Future of Data Platforms Summit survey, November 2025, theCUBE Research).
- Despite 96% believing their environments are “well integrated,” over 34% still cite persistent data silos, a clear signal that integration is often conceptual, not operational (Future of Data Platforms Summit survey, November 2025, theCUBE Research).
The result is slower experimentation, brittle pipelines, and extended onboarding cycles for development teams.
Why Streaming and Metadata Are the Real Bottlenecks
While storage performance and AI readiness often dominate architectural discussions, and our survey data reflects that emphasis, streaming data and metadata coordination are where pipelines most often fail to scale.
Traditional Kafka-centric designs introduce several friction points:
- Separate infrastructure stacks for streaming and storage
- High operational overhead for long-retention or replay-heavy workloads
- Inconsistent access controls between source data and derived streams
- Metadata duplication across catalogs, vector stores, and audit systems
We see that VAST’s approach, which effectively abstracts the storage layer into a database by embedding a Kafka-compatible event broker directly into the data platform, can change this model. Streaming events are stored as first-class objects, inheriting the same security, replication, snapshot, and lifecycle controls as all other data in the VAST platform.
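Because the broker is Kafka-compatible, existing client code should not need to change; only the bootstrap address does. The sketch below is illustrative only, assuming a standard Kafka client library (kafka-python here) and placeholder endpoint, topic, and group names, not a documented VAST configuration.

```python
# Hypothetical sketch: standard Kafka clients pointed at a Kafka-compatible endpoint
# exposed by the data platform itself, so events land under the same security,
# snapshot, and lifecycle controls as any other data. Endpoint and names are placeholders.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers=["data-platform.example.internal:9092"],  # placeholder endpoint
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("sensor-events", {"device_id": "dev-42", "reading": 21.7})
producer.flush()

# Replay-heavy consumers read the same retained events without a separate tiering layer.
consumer = KafkaConsumer(
    "sensor-events",
    bootstrap_servers=["data-platform.example.internal:9092"],
    auto_offset_reset="earliest",   # long retention makes full replay practical
    group_id="feature-backfill",
)
for record in consumer:
    print(record.offset, record.value)
    break
```

The design point worth noting is that long retention and replay stop being a separate tiering problem: the retained events sit on the same platform, under the same access and snapshot controls, as the rest of the data.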
This matters because metadata is no longer an afterthought. As Pernsteiner notes, when metadata, audit logs, pipeline telemetry, and even vector embeddings live on the same platform, as VAST aims to do, it removes entire categories of synchronization, crawling, reconciliation, and security-permissions work that typically sits outside the critical path but still delays delivery to the data consumer.
This aligns closely with survey findings showing storage and management layers deliver the highest ROI across the data platform stack, outranking analytics and consumption layers (Future of Data Platforms Summit survey, November 2025, theCUBE Research).
Unified Metadata as a Prerequisite for AI-Scale Pipelines
Modern application architectures routinely include AI pipelines, and those pipelines are collapsing the traditional boundaries between structured, unstructured, and streaming data.
RAG pipelines, agentic workflows, and real-time inference introduce new requirements:
- Continuous ingestion triggers
- Vector indexing at massive scale
- Metadata-driven access controls
- Auditable lineage from source to inference output
We are seeing a shift from data storage systems to data engines, where ingestion, indexing, triggering, and execution are integrated. What is typically a complex set of activities, one that often results in multiple copies of the same data, can instead be performed in place or even in flight.
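As an illustration of what “integrated in place” implies for a RAG ingestion step, here is a deliberately simplified, self-contained sketch. The trigger signature, toy embedding function, and in-memory index are stand-ins of our own, not VAST APIs; the point is that chunking, embedding, indexing, and lineage capture happen in one handler fired by a data change, rather than across several systems and copies.

```python
# Illustrative sketch only: an event-driven ingest step for a RAG pipeline that
# chunks, embeds, indexes, and records lineage in one place. embed() and the
# in-memory index are placeholders for whatever the platform actually provides.
import hashlib
from dataclasses import dataclass, field

@dataclass
class VectorIndex:
    """Toy in-memory stand-in for a platform-native vector store."""
    entries: dict = field(default_factory=dict)

    def upsert(self, key: str, vector: list[float], metadata: dict) -> None:
        self.entries[key] = (vector, metadata)

def embed(text: str, dim: int = 8) -> list[float]:
    """Deterministic placeholder embedding; a real pipeline would call a model."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:dim]]

def on_object_created(source_uri: str, text: str, owner: str, index: VectorIndex) -> None:
    """Trigger body: chunk, embed, index, and keep lineage alongside the vectors."""
    for i, chunk in enumerate(text.split("\n\n")):
        index.upsert(
            key=f"{source_uri}#chunk-{i}",
            vector=embed(chunk),
            metadata={"source": source_uri, "owner": owner},  # lineage + access-control hints
        )

index = VectorIndex()
on_object_created("s3://raw/reports/q3.txt", "First paragraph.\n\nSecond paragraph.", "analytics-team", index)
print(len(index.entries), "chunks indexed")
```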
Survey data strongly supports this direction:
- 88% of organizations believe they have strong governance and metadata practices, yet 51.6% still report data quality disruptions, a gap that suggests governance tools are often detached from execution platforms (Future of Data Platforms Summit survey, November 2025, theCUBE Research).
- 87% say open formats are critical to reducing lock-in, reinforcing the need for protocol-level compatibility rather than proprietary abstraction (Future of Data Platforms Summit survey, November 2025, theCUBE Research).
VAST’s model, which treats metadata catalogs, vector stores, audit logs, and streams as database-native constructs, directly aims to address this disconnect.
Kubernetes-Native Pipelines Without Kubernetes-Native Complexity
A key insight from our research is that developers want the benefits of Kubernetes without the friction.
Here is how we see VAST’s Data Engine approach addressing the friction platform engineering teams experience:
- Functions and pipelines are defined at the VAST platform layer
- Kubernetes is used as the execution substrate, not the developer interface
- Event-driven triggers based on data changes, not cron jobs
- Consistent observability across storage, streaming, and compute
Importantly, this does not require abandoning existing Kubernetes investments. VAST integrates with customer-managed clusters today and enables future consolidation for organizations that lack operational depth in Kubernetes.
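The operational difference between “event-driven triggers based on data changes” and a scheduled job is easy to understate, so here is a small, hypothetical Python sketch of the two models. The register_trigger helper and the in-process dispatch loop are invented stand-ins for this comparison, not a VAST or Kubernetes API.

```python
# Illustrative contrast only; register_trigger() and TRIGGERS are invented stand-ins.
import time
from typing import Callable

def cron_style_job(list_new_objects: Callable[[], list[str]],
                   process: Callable[[str], None]) -> None:
    """Schedule-driven model: wake up, re-list state, work through the backlog."""
    while True:
        for obj in list_new_objects():   # listing and reconciliation work on every tick
            process(obj)
        time.sleep(300)                  # latency is bounded by the schedule, not the data

TRIGGERS: dict[str, list[Callable[[str], None]]] = {}

def register_trigger(event: str, handler: Callable[[str], None]) -> None:
    """Event-driven model: the platform invokes the handler when data actually changes."""
    TRIGGERS.setdefault(event, []).append(handler)

register_trigger("object.created", lambda uri: print("processing", uri))

# In the event-driven model, a dispatch like this happens on the write itself, inside
# the platform, rather than in a job the developer schedules and operates.
for handler in TRIGGERS["object.created"]:
    handler("s3://raw/invoices/2025-11-30.parquet")
```

In the scheduled model, the developer owns the polling loop, its failure handling, and the Kubernetes CronJob wiring around it; in the trigger model, that work moves to the platform, which is the consolidation argument being made here.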
This aligns with survey signals showing:
- 93% plan to increase investment in management tools
- 65.4% identify scaling AI as their top challenge, not model availability or frameworks (Future of Data Platforms Summit survey, November 2025, theCUBE Research)
Simplifying the platform, rather than adding more tools, is clearly the most significant factor driving return on investment.
Our ANGLE
Data platforms are the linchpin of agentic AI. How they work within the cloud-native ecosystem is a key factor in how quickly an organization will see ROI from its AI investments. From an independent perspective, VAST’s differentiation does not rest on any single feature. It lies in architectural convergence:
- Streaming is data, not a separate infrastructure stack
- Metadata is operational, not merely descriptive
- Pipelines are platform services, not externally glued-on components
- Kubernetes is an execution fabric, not a burden for developers and platform engineers
The model VAST Data has built into its platform directly aims to address the top inhibitors surfaced in the Future of Data Platforms survey (scaling AI, data quality, skills shortages, and cost control) by removing entire classes of integration and operational work from the developer lifecycle.
While best-of-breed strategies will continue to dominate vendor selection decisions (favored by 82% of respondents), the definition of “best-of-breed” is evolving. Platforms that reduce pipeline sprawl without enforcing proprietary lock-in are increasingly viewed as enablers rather than compromises.
The next phase of Kubernetes-native data platforms will not be defined by more services, but by fewer seams.
We see VAST Data’s unified approach to storage, streaming, metadata, and pipelines as a pragmatic response to the realities faced by modern development and platform engineering teams. For organizations seeking to shorten time-to-value while scaling AI workloads responsibly, this architecture merits serious consideration.
Here is a related video discussion with Andy Pernsteiner, Field CTO at VAST Data:
Feel free to reach out and stay connected via robs@siliconangle.com, rob@smuget.us, or @realstrech on x.com, and comment on our LinkedIn posts.

