
Why Data Is the Real Bottleneck in Enterprise AI

The GPU Idle Problem

Organizations spent the last two years racing to acquire GPUs, convinced that compute power was the key to unlocking AI’s potential. But there’s a problem: those expensive GPUs are sitting idle, waiting for data that isn’t ready.

According to theCUBE Research’s 2025 study, 86% of organizations are prioritizing data unification as a core part of their AI strategies. The shift represents a fundamental reckoning: AI isn’t primarily a compute challenge; it’s a data challenge.

In a recent episode of AppDevANGLE, I sat down with A.B. Periasamy, co-founder and co-CEO of MinIO, to discuss why data has become the ultimate differentiator in the AI era and what enterprises must do to capitalize on this shift.

From Compute-First to Data-First

The initial AI gold rush focused on acquiring the most powerful GPUs available. Organizations assumed that more compute would automatically translate to better AI outcomes. They were wrong.

“In the past two years, they jumped into AI right away. They focused on compute and then quickly came to the realization that those GPUs were idling out when the data was not ready,” A.B. explained. “It’s not like these organizations didn’t have data at all. They even embraced data practices in the past, but that data was not GPU-world data.”

The problem wasn’t the absence of data; it was fragmentation. Data sat scattered across different sources, trapped in silos, stored in incompatible formats, and governed by inconsistent policies. This is what A.B. calls the “data swamp” problem, where previous data lake initiatives failed because they couldn’t deliver the unified, accessible data that AI models demand.

Why Enterprises Are Playing Catch-Up

To understand where enterprise AI is headed, look at the consumer technology market. Companies like Instagram, WhatsApp, and TikTok have built empires on data, delivering services at massive scale while generating enormous profits.

“Look at Instagram, WhatsApp, any of those apps, TikTok. How can they possibly give service of that magnitude? I pay like $100, $200 a month for my telephone bill. Here, WhatsApp is doing free international calls, video calls, and it’s free. How is that even possible?” A.B. asked. “The data updates for everything multiple times over and gives them insane amounts of profit.”

Consumer companies figured out years ago that data is the asset. Enterprises are only now catching up to this reality, driven by AI’s ability to unlock value from previously underutilized information.

The gap between the consumer and enterprise worlds is closing, and A.B. predicts they’ll eventually blur together, with the consumer world continuing to lead on innovation velocity.

Data Unification Is More Than Just API Access

Early attempts at data unification focused on making APIs available across different systems, the idea behind the data mesh concept. On paper, it sounded promising: connect HubSpot, Salesforce, Zendesk, Office 365, and on-premises systems through a unified API layer.

It didn’t work.

“The problem was that the data, while they had the data, was not accessible,” A.B. noted. “Just having APIs available is not enough. GPUs are hungry. They need more data. The more data you can give to it, the more unification, more context you give to it, the more intelligence you get out.”

APIs create a drip-feed problem. Data arrives too slowly, in inconsistent formats, and without the necessary context. AI models need something fundamentally different: unified data pools where all relevant information, including call center logs, audio recordings, financial transactions, customer history, and even external data like weather or geographic information, exists in one accessible location.

This is where open table formats like Apache Iceberg come into play, providing the standardization and open APIs needed to make unified data pools a reality.
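To make that concrete, here is a minimal sketch of querying one table in a unified, Iceberg-backed data pool using the PyIceberg library. The catalog endpoint, credentials, and table name are placeholder assumptions, not a prescribed setup.

```python
from pyiceberg.catalog import load_catalog
from pyiceberg.expressions import EqualTo

# Hypothetical REST catalog backed by S3-compatible object storage;
# the endpoint, keys, and table name below are placeholders.
catalog = load_catalog(
    "lake",
    **{
        "uri": "http://localhost:8181",
        "s3.endpoint": "http://localhost:9000",
        "s3.access-key-id": "minioadmin",
        "s3.secret-access-key": "minioadmin",
    },
)

# Call center logs, transactions, and external feeds all live as
# tables in one catalog instead of behind per-system APIs.
table = catalog.load_table("support.call_logs")

# Filters are pushed down against Iceberg metadata rather than
# drip-fed through an application API.
df = table.scan(row_filter=EqualTo("region", "EMEA")).to_pandas()
print(df.head())
```

Because the table format, not any single engine, owns the metadata, Spark, Trino, DuckDB, or a pandas script can all read the same table without making copies.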

Consolidation Creates Risk and Opportunity

Bringing all enterprise data under one roof creates an obvious problem: what happens to governance, compliance, and security controls when data leaves its original system?

“When data was in different systems and different APIs, every system had their own policy mechanism, their own security mechanism. It was a mess,” A.B. explained. “But now this topic has become more important because now we are having all of the data unified in one place. It’s very easy to have a data breach that’s of the largest scale you’ve ever seen.”

The paradox is that consolidation simultaneously increases risk and makes governance easier. With unified data, organizations can finally implement consistent policies across their entire data estate. One policy engine. One identity system. One set of access controls.

“Now that the data is unified, you can have one policy that tells you have access to this warehouse data, this namespace data, that table data,” A.B. said. “And one policy, simple definition. The policy can only work if you have simple identity, simple policies that anyone can understand.”
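As a rough illustration of “one policy, simple definition,” here is what a single declarative policy can look like when applied at the storage layer with the MinIO Python SDK. The endpoint, credentials, and bucket name are assumptions for the sketch.

```python
import json

from minio import Minio

# Placeholder endpoint and credentials for a local object store.
client = Minio(
    "localhost:9000",
    access_key="minioadmin",
    secret_key="minioadmin",
    secure=False,
)

# One short policy: read-only access to the warehouse namespace.
# Everything under warehouse/ inherits it; no per-system rules.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": ["*"]},
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::analytics/warehouse/*"],
        }
    ],
}
client.set_bucket_policy("analytics", json.dumps(policy))
```

The appeal is exactly what A.B. describes: the policy is short enough for anyone to read, reason about, and audit.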

Our research confirms this is a critical concern: organizations cite complexity and skills gaps as the main challenges in managing modern data platforms, making simplicity in governance essential.

Why Data Size Matters

One of the most compelling insights from my conversation with A.B. centered on the concept of emergent behavior: the idea that intelligence doesn’t scale linearly with data volume but instead exhibits sudden leaps at certain thresholds.

“The more data you have, it actually gives you emergent behavior, just like we saw with the models themselves,” A.B. explained. “The GPUs five years before and the GPUs today, they are significantly better. If we had known that we would have gotten human-like intelligence, NVIDIA or Intel or AMD would have produced the same GPU we have today even 10 years before, 20 years before. We simply did not know we will get this emergent behavior beyond a certain size.”

The same pattern applies to data. Consumer companies discovered this years ago: 10 million users don’t create meaningful network effects; 100 million is better. But somewhere between 500 million and a billion users, something fundamental changes. Data that seemed like noise becomes an essential part of the value proposition.

“Try deleting some cat videos out of TikTok and Meta, Instagram. It can make headlines,” A.B. noted. “The data, once it reaches a certain capacity, that’s when it starts to show the emergent behavior, emergent property of the information held inside the system.”

Most enterprises haven’t grasped this concept yet. They’re still thinking about data minimization and storage costs rather than data maximization and intelligence gains.

The European vs. American Approach

The conversation naturally turned to how organizations should balance AI innovation with risk management, a debate playing out differently across geographies.

“Europe is thinking harder about the risks. The US is thinking harder about innovation,” A.B. observed. “And you can see that OpenAI was born here. Why? Because we focused on innovation and not risk. If you thought about risk, OpenAI won’t be here. Anthropic won’t be here.”

But this doesn’t mean risk should be ignored. A.B. advocates for making risk a function rather than a barrier: “You should actually embrace innovation, but then it does not mean that risks are not important. You have to make risk a function. Make risk easier to implement. Don’t make risk come in the way.”

He drew a parallel to autonomous vehicles, where 1% failure rates can be fatal, yet the industry continues to innovate because the alternative is unacceptable.

From an application development perspective, this balance is being codified in regulations like the EU Cyber Resilience Act (CRA), which requires products with digital elements sold into the EU market to comply by December 2027. This isn’t a choice between innovation and risk; it’s about innovating with purpose and building compliance into the foundation.

The Data Center Renaissance

A.B. made a provocative observation: we’ve always called them data centers, but for most of computing history, they were really compute centers. That’s changing.

“We are moving into an era where data centers now have a real purpose,” he said. “The birth of computing was actually around data, but for a long part of the history, it was about compute. We should have called them compute centers, but now it has become more than ever. If you look into any of the modern enterprise, think about an app like Uber, calling a cab to an app, a single Postgres database can handle all of the data.”

The question becomes: why do companies like Uber or Walmart need exabytes of data when their core operational data could fit in a single database?

The answer: because that historical data, every transaction, every interaction, every data point collected over time, is what actually runs the business. AI has dramatically increased the ability to extract value from this accumulated data, making data infrastructure more critical than ever.

What Organizations Should Do Now

Based on our research and this conversation, here are the key actions enterprises should prioritize:

1. Audit Your Data Fragmentation: Identify how many systems, silos, and formats your data currently lives in. Map the dependencies and understand the true cost of this fragmentation (see the sketch after this list).

2. Invest in Unified Data Platforms: Move beyond API integration strategies and invest in technologies that create true data unification, such as object storage, data lakehouse architectures, and open table formats like Iceberg.

3. Simplify Governance: Consolidate identity management and policy engines. Make security and compliance a function that enables rather than blocks innovation.

4. Think Big on Data: Stop thinking about data minimization and start thinking about data maximization. The emergent intelligence you’re looking for may only appear once you cross certain scale thresholds.

5. Make Risk a Function, Not a Barrier: Build compliance and security into your data foundation from day one, but don’t let risk concerns prevent experimentation and innovation.
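For the audit in step 1, even a crude inventory beats guessing. The sketch below, assuming S3-compatible storage reachable through boto3 at a placeholder endpoint, walks every bucket and tallies file formats as a rough proxy for fragmentation.

```python
from collections import Counter

import boto3

# Placeholder endpoint; point this at your S3-compatible store.
s3 = boto3.client("s3", endpoint_url="http://localhost:9000")

formats = Counter()
for bucket in s3.list_buckets()["Buckets"]:
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket["Name"]):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            # Treat the file extension as a stand-in for the format.
            ext = key.rsplit(".", 1)[-1].lower() if "." in key else "(none)"
            formats[ext] += 1

# A long tail of formats across many buckets is a strong hint
# of fragmentation.
for fmt, count in formats.most_common(10):
    print(f"{fmt}: {count}")
```

A real audit would also cover databases and SaaS systems, but a tally like this is often enough to start the conversation about consolidation.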

The Bottom Line

The AI revolution isn’t being won by whoever has the most GPUs. It’s being won by whoever has the most accessible, unified, and governed data.

As A.B. put it: “More data, more intelligence, and you will get there.”

Organizations that continue to treat data as a secondary concern will find their expensive GPU investments sitting idle, waiting for data that never arrives in the right format, at the right time, with the right governance.

The enterprises that win will be those that recognize data as their most valuable asset and build their entire AI strategy around making that data accessible, unified, and ready to feed the hungry models that depend on it.
