AI Data Readiness Is Becoming the Real Enterprise Bottleneck

Enterprises Are Discovering That AI Failure Is Usually a Data Failure

The AI industry has spent the last several years focused on models, GPUs, and infrastructure scale. Organizations raced to acquire compute capacity, experiment with foundation models, and launch pilot projects across the business.

Yet despite unprecedented investment, an estimated 60% to 80% of AI initiatives still fail to reach production. The problem is increasingly clear: most enterprises do not have a model problem. They have a data problem.

As organizations move from experimentation to implementation, the primary challenge is no longer generating AI outputs. It is finding, governing, preparing, and operationalizing the data required to make AI useful at scale.

In this episode of AppDevANGLE, I spoke with Sam Newnam, VP of AI Solutions and Business Development at Hammerspace, about enterprise AI readiness, data fragmentation, infrastructure strategy, and why data orchestration has become the limiting factor for AI ROI.

Our conversation explored why organizations continue to overinvest in compute while underinvesting in data readiness, how fragmented enterprise environments slow AI adoption, and why successful AI initiatives increasingly depend on platforms that can simplify data access, governance, and movement across hybrid environments.

The AI Value Gap Starts With Data Fragmentation

One of the most important themes from the discussion is that enterprise data environments were never designed for AI.

Over the past two decades, organizations accumulated data across on-premises infrastructure, public clouds, SaaS applications, backup systems, and specialized storage platforms. While these environments successfully supported business operations, they created significant barriers to AI adoption.

As Newnam explained, organizations often assume they already possess the data necessary to succeed with AI.

“The real value is inside the data,” he said. “People think they have the data, but they don’t realize how hard the data component is.”

The challenge becomes particularly visible when enterprises attempt to move beyond proof-of-concept projects.

Small pilot initiatives may only require a few hundred or a few thousand files. Production AI systems often require access to millions of files distributed across multiple locations, storage platforms, and security domains.

“You’ve got this 26-year-old data scientist that shows up and says, ‘Hey, I need god-level access to everything,’” Newnam joked. “And every team in the enterprise says, ‘Wait a minute, that’s my data.’”

The result is a growing collision between AI ambitions and the reality of fragmented enterprise infrastructure.

Data Gravity Has Become an AI Problem

The concept of data gravity is not new, but AI has dramatically increased its importance.

Historically, organizations could tolerate data silos because applications were often tied to specific systems of record. AI changes that equation because models need access to large, diverse, and continuously updated datasets.

According to Newnam, many organizations underestimate the complexity of assembling data for AI workloads.

“It’s really easy to start a proof of concept with the first 500 or 1,000 files,” he explained. “Connecting to millions of files in different locations becomes a problem that nobody thought about.”

What makes this particularly challenging is that valuable data often resides in places enterprises are not actively using. Historical archives, backup repositories, departmental file systems, and legacy storage environments may contain highly relevant information that AI initiatives require.

The issue is not only access. Teams must also understand what data exists, where it resides, who owns it, how it can be used, and whether it can be governed appropriately. Answering those questions creates significant operational overhead before model development even begins.
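To make the "what exists, where, and who owns it" question concrete, here is a minimal Python sketch of a file inventory. It is an illustration under stated assumptions, not a description of any vendor's product: the names `FileRecord`, `build_inventory`, and `summarize_by_owner` are hypothetical, it only covers a single locally mounted file tree, and it uses the numeric owner ID from the file system as a stand-in for real data ownership.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FileRecord:
    """One row of a hypothetical data inventory."""
    path: str
    size_bytes: int
    modified_utc: datetime
    owner_uid: int  # numeric owner ID; reported as 0 on Windows


def build_inventory(root: str) -> list[FileRecord]:
    """Walk a directory tree and record basic facts about every file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                st = os.stat(full_path)
            except OSError:
                # Skip files that vanish or cannot be read during the scan.
                continue
            records.append(
                FileRecord(
                    path=full_path,
                    size_bytes=st.st_size,
                    modified_utc=datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
                    owner_uid=st.st_uid,
                )
            )
    return records


def summarize_by_owner(records: list[FileRecord]) -> dict[int, tuple[int, int]]:
    """Return {owner_uid: (file_count, total_bytes)}."""
    totals: dict[int, tuple[int, int]] = {}
    for record in records:
        count, size = totals.get(record.owner_uid, (0, 0))
        totals[record.owner_uid] = (count + 1, size + record.size_bytes)
    return totals
```

Even this toy version hints at the scale problem Newnam describes: a scan that is trivial for a pilot's few hundred files would have to be repeated, reconciled, and kept current across every storage platform and security domain in a production environment.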

AI Readiness Requires Simplifying the “Messy Middle”

Another recurring theme throughout the discussion was the growing complexity gap between traditional IT operations and AI deployment.

Many enterprise storage and infrastructure teams possess deep expertise in managing systems of record. However, AI introduces entirely new technologies, workflows, and operational requirements.

“A traditional storage admin doesn’t understand Helm charts and Kubernetes and ingest pipelines,” Newnam noted.

The result is an expanding operational gap between infrastructure teams and AI teams. Organizations must now coordinate:

  • Storage infrastructure
  • Data governance
  • Security policies
  • Metadata management
  • Data movement pipelines
  • Vector databases
  • Foundation models
  • Retrieval systems

The challenge is that most organizations lack the specialized skills necessary to integrate all these components effectively.

According to Newnam, successful AI adoption increasingly depends on abstracting this complexity away from operational teams.

“It’s really about how do I obscure that stuff,” he explained. “How do I create a true platform that isn’t a bunch of Legos laid out on the table?”

This reflects a broader trend occurring throughout enterprise IT. The market is moving away from assembling individual AI components and toward integrated platforms that reduce operational friction.

Governance Cannot Be Separated From Data Mobility

One of the biggest misconceptions surrounding AI readiness is that organizations can simply extract data from legacy systems and move it into AI pipelines. In reality, governance requirements often make this far more complicated.

Many enterprise systems were designed with embedded security controls, access permissions, compliance frameworks, and regulatory safeguards. When data moves, those protections must move with it.

“The security teams are really where we see a lot of failures,” Newnam explained. “Their job is to protect the organization from risk.”

This creates tension between AI teams seeking broad data access and security teams responsible for maintaining governance. The challenge is not simply moving files. Organizations must preserve metadata, permissions, classifications, lineage, and contextual information throughout the entire AI workflow.

“Metadata is king,” Newnam said. By maintaining metadata alongside data movement, enterprises can preserve governance policies while still enabling AI access to distributed information sources.
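As a rough illustration of keeping metadata alongside data movement, the Python sketch below copies a file into a staging directory and writes a small JSON "sidecar" next to it. This is an assumption-laden example, not how Hammerspace or any other platform implements the idea: the function name `copy_with_metadata`, the `.meta.json` suffix, and the `classification` and `owner` fields are all hypothetical, and a real system would carry far richer permissions and lineage.

```python
import hashlib
import json
import shutil
from pathlib import Path


def copy_with_metadata(src: Path, dest_dir: Path, classification: str, owner: str) -> Path:
    """Copy src into dest_dir and write a sidecar file describing it.

    Returns the path of the sidecar. All field names are illustrative.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name

    # shutil.copy2 copies the contents and also tries to preserve
    # file metadata such as permission bits and timestamps.
    shutil.copy2(src, dest)

    # A checksum lets downstream consumers verify the copy is unchanged.
    digest = hashlib.sha256(dest.read_bytes()).hexdigest()

    sidecar = dest.with_name(dest.name + ".meta.json")
    sidecar.write_text(
        json.dumps(
            {
                "source_path": str(src),           # simple lineage pointer
                "classification": classification,  # e.g. "confidential"
                "owner": owner,                    # owning team or person
                "sha256": digest,
            },
            indent=2,
        )
    )
    return sidecar
```

The design point is that the governance context travels with the copy rather than staying behind in the source system, so a downstream pipeline can check the classification before using the file.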

This capability is increasingly becoming a prerequisite for production-scale AI deployments.

AI ROI Depends More on Data Readiness Than Infrastructure Spending

The conversation also highlighted a growing disconnect between where organizations spend AI budgets and where actual value is created.

Many enterprises continue investing heavily in GPUs, storage infrastructure, and model experimentation. While these investments remain important, Newnam argues that organizations frequently underestimate the importance of data readiness.

“There’s this fear of making a massive investment in GPUs and storage and infrastructure and hoping they get an outcome,” he explained.

Instead of viewing AI as a large-scale infrastructure project, Newnam advocates for a more incremental approach. Organizations should focus on solving specific business problems, proving measurable outcomes, and building scalable data foundations that support future expansion.

“I think the way enterprises really start to see ROI is project-based AI,” he said.

This mirrors successful application development practices, in which teams start with narrowly defined use cases, validate outcomes, and expand gradually rather than attempting enterprise-wide transformations immediately.

The difference is that AI success increasingly depends on whether the underlying data infrastructure can scale alongside those initiatives.

Analyst Take

The enterprise AI conversation is undergoing an important shift. For the past two years, organizations have largely focused on acquiring models, infrastructure, and compute capacity. Those investments were necessary, but they are proving insufficient.

The emerging bottleneck is data readiness. What makes this challenge particularly difficult is that it spans multiple organizational functions simultaneously. Infrastructure teams manage systems of record. Security teams manage governance. Data scientists build models. Business units define outcomes.

AI success increasingly depends on bringing all of these groups together around a common operational framework. The organizations that succeed will not necessarily be the ones with the largest GPU clusters or the newest models. They will be the ones that can identify, govern, mobilize, and operationalize data faster than their competitors.

The market is beginning to realize that AI readiness is fundamentally a data orchestration problem. Models may generate intelligence, but data determines whether that intelligence creates business value.
