The conventional security playbook of restricting access to protect sensitive data is a limiting factor for enterprise AI. AI systems need high-fidelity data to produce real business value, so organizations that over-protect data degrade its utility and limit AI outcomes before a single model is trained.
This emerging requirement to balance data protection with data accessibility was a central topic of a theCUBE Research interview with Vince Goveas, director of product management at Capital One Software. Goveas argues that data protection must shift from perimeter defense to a data-centric model where sensitive data is “self-protecting” as it moves through pipelines and into AI systems, preserving usability without expanding exposure.
Capital One Software’s approach is enterprise tokenization, delivered through Capital One Databolt. The product case is grounded in a direct comparison. In research conducted with PwC, Capital One tested tokenization, masking, and clear text across AI use cases. On structured data, masking achieved 50% predictive accuracy relative to baseline. Tokenization achieved 99.7%, while still protecting sensitive fields. Goveas drew the implication clearly: data protection materially impacts AI performance and, by extension, business results.
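One way to see why the two techniques can diverge so sharply on structured data is a toy comparison. The sketch below is illustrative only and is not Capital One Databolt's method: it assumes a deterministic, HMAC-based token as a stand-in for tokenization, and the key, function names and records are hypothetical. It shows that tokens keep distinct values distinct, so grouping still works, while masking collapses them.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical key for illustration only; a real system would manage keys securely.
SECRET_KEY = b"demo-key-not-for-production"

def tokenize(value: str, key: bytes = SECRET_KEY) -> str:
    """Deterministic token: the same input always maps to the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]

def mask(value: str) -> str:
    """Masking: overwrite every character, erasing differences between values."""
    return "*" * len(value)

# Toy records: (account_id, purchase_amount)
records = [("acct-1001", 20), ("acct-2002", 35), ("acct-1001", 15), ("acct-3003", 50)]

def spend_per_account(rows, protect):
    """Sum purchase amounts per (protected) account identifier."""
    totals = Counter()
    for account, amount in rows:
        totals[protect(account)] += amount
    return totals

clear_totals = spend_per_account(records, lambda v: v)
token_totals = spend_per_account(records, tokenize)
mask_totals = spend_per_account(records, mask)

print(len(clear_totals), len(token_totals), len(mask_totals))
```

In this toy case the tokenized data yields the same per-account totals as clear text, while the masked data merges every account into a single group. The accuracy figures cited above come from the Capital One and PwC research, not from this sketch.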
Why the stakes are higher now
Recent market developments sharpen the relevance of this discussion. Enterprises are increasingly evaluating how AI systems will participate in operational workflows, make decisions, coordinate work across systems and, ultimately, execute actions on behalf of the business.
This shift moves the challenge beyond protecting datasets. Organizations must now consider how sensitive information is governed throughout AI-driven processes, including retrieval, reasoning, orchestration and execution. The relevant questions are no longer limited to who can access a dataset. They extend to what an AI system or agent can retrieve and act upon at runtime, how sensitive information is exposed during multi-step reasoning, how policies are enforced across systems, and how decisions can be monitored, explained and audited after the fact.
Recent industry frameworks reinforce this broader perspective. Regulatory expectations continue to mature, while guidance from organizations such as NIST and OWASP increasingly addresses AI risk across the full operational lifecycle. At the same time, emerging enterprise AI operating models, including concepts such as systems of intelligence, agent control loops and AI-mediated business processes, highlight that securing AI requires more than protecting data. It requires protecting the context, policies and business logic that govern how data is used to drive decisions and actions.
Viewed through this lens, Goveas’ core argument remains highly relevant. Beyond being a prerequisite for model training, the ability to protect sensitive data while preserving its analytical value is a foundational requirement for trustworthy AI operations.
Five strategic implications
1. Tokenization as agentic access control. The original framing, “can a data scientist safely use this dataset?”, is no longer sufficient. The operative question is whether an agent can safely retrieve, reason over and act on sensitive data without overexposing context or accumulating permissions beyond its task. Tokenization should be positioned as a core component of agentic access architecture, not just a data pipeline control.
2. Data utility as a system property. The interview’s utility thesis extends to the broader challenge of governed intelligence systems. Enterprises need a harmonized layer of meaning, rules and state so agents can operate coherently across data silos. Tokenization protects sensitive fields; semantic grounding determines whether the AI system understands what those fields mean and what policies govern their use.
3. Cyber resilience, not just compliance. As AI becomes embedded in operational decision-making, recovery planning has to expand. Organizations must be able to restore not just data, but business state, policy context, agent reasoning traces and authorization history after disruption. This reframes data protection as operational resilience infrastructure.
4. Secure self-service at agent speed. Goveas’ point that developers and analysts should not wait 30-plus days for data access applies with even more force to agents. Agentic workflows require approved data products, policy-aware retrieval, runtime monitoring and automated access revocation. The security team can no longer be the chokepoint; the controls have to run autonomously.
5. From point control to trust architecture. The strongest positioning for tokenization is not as a standalone solution but as a foundational layer within a broader AI trust fabric, alongside classification, discovery, identity, policy enforcement, confidential computing, model-context controls and auditability. The goal is not just protecting data in motion; it is enabling trusted, governed action at machine speed.
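The first and fourth implications, task-scoped agent access and automated revocation, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any vendor's product: `Grant`, `PolicyGateway`, the field names and the HMAC-based token are all hypothetical. The idea is that an agent sees clear text only for fields its current task requires, receives tokens for every other sensitive field, loses access when its grant expires, and leaves an audit record either way.

```python
import hashlib
import hmac
from dataclasses import dataclass, field

KEY = b"demo-key-not-for-production"  # hypothetical key for illustration only

def tokenize(value: str) -> str:
    """Deterministic stand-in token (an assumption, not a specific product's scheme)."""
    return "tok_" + hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

@dataclass
class Grant:
    agent_id: str
    clear_fields: frozenset  # sensitive fields the agent may read in clear text for this task
    expires_at: float        # epoch seconds; access lapses automatically after this time

@dataclass
class PolicyGateway:
    sensitive_fields: frozenset
    audit_log: list = field(default_factory=list)

    def retrieve(self, record: dict, grant: Grant, now: float) -> dict:
        """Return a task-scoped view of the record and log the decision."""
        if now >= grant.expires_at:
            self.audit_log.append((now, grant.agent_id, "denied: grant expired"))
            raise PermissionError("grant expired")
        view = {}
        for name, value in record.items():
            if name in self.sensitive_fields and name not in grant.clear_fields:
                view[name] = tokenize(str(value))  # protected by default
            else:
                view[name] = value
        self.audit_log.append((now, grant.agent_id, "served"))
        return view

# Usage: the agent's task needs the email address but not the SSN.
gateway = PolicyGateway(sensitive_fields=frozenset({"ssn", "email"}))
grant = Grant("agent-42", frozenset({"email"}), expires_at=1000.0)
record = {"ssn": "123-45-6789", "email": "a@example.com", "balance": 250}
view = gateway.retrieve(record, grant, now=500.0)
```

Passing `now` explicitly keeps the sketch deterministic. A production design would also need identity verification, key management and tamper-resistant logging, none of which is modeled here.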
Analyst takeaway
The core thesis holds and has become more urgent: AI value depends on usable, trusted data. But the definition of “trusted” has expanded. In the generative era, trust meant the model would not leak sensitive information. In the agentic era, trust means the system can act on sensitive data reliably, within policy, at speed, and with a recoverable audit trail if something goes wrong.
Enterprises that under-protect sensitive data increase regulatory and privacy risk. Those that over-protect it by destroying utility cap their AI ambitions before they start. The organizations building durable AI advantage are treating data protection not as a constraint on AI deployment but as the architecture that makes scalable, governed AI possible.
That reframe, from security bottleneck to enabling infrastructure, is what makes this conversation worth continuing.

