
Amazon S3 at 20: How “Storage for the Internet” Became the Foundation of the AI Data Era

Abstract: Twenty years ago, on Pi Day, Amazon Web Services launched Amazon Simple Storage Service (S3) with a bold premise: provide “storage for the internet” through simple APIs that could scale indefinitely. What started as a developer-friendly object store has evolved into one of the most foundational infrastructure layers in modern computing.

Today, S3 stores more than 500 trillion objects and processes over a quadrillion requests per year, supporting everything from SaaS applications and media streaming to genomics research and large-scale analytics.

In conversations with AWS VP and Distinguished Engineer Andy Warfield, through my years of experience building on AWS, and in a recent AnalystANGLE discussion with Dave Vellante, we explored how S3 transformed storage, became the underlying data substrate for modern applications, and is now positioning itself as a foundational layer for the emerging AI data stack.

From Storage Silos to API-Driven Infrastructure

When S3 launched in 2006, enterprise storage looked very different. Organizations were buying arrays, building disaster recovery sites, and managing storage silos across multiple vendors and data centers. In fact, both Andy and I were doing exactly that.

Before S3, infrastructure teams had to build and operate their own storage environments for every application or service.

“Amazon had teams building internet-facing applications that kept running into the same problem—every team was standing up their own storage systems. There was a lot of repeated, undifferentiated management.”— Andy Warfield, AWS

S3 addressed this problem with a radically simple idea: expose storage through REST APIs and make it effectively infinite in scale.

From a customer perspective, this changed the economics and operational model of storage. Teams no longer had to worry about:

  • Purchasing hardware
  • Planning capacity
  • Building DR environments
  • Managing replication

Developers could simply call GET / PUT / DELETE operations against an API endpoint and rely on AWS to handle the rest.
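The entire programming model reduces to a handful of verbs. As a minimal illustration of those semantics (an in-memory stand-in, not the real service; boto3 exposes the equivalent calls as put_object, get_object, and delete_object):

```python
# Illustrative in-memory stand-in for S3's three core operations.
# The real service exposes these over HTTPS as REST calls.
class ObjectStore:
    def __init__(self):
        self._objects = {}  # key -> bytes

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data  # PUT overwrites, as in S3

    def get(self, key: str) -> bytes:
        return self._objects[key]  # a missing key ~ HTTP 404 NoSuchKey

    def delete(self, key: str) -> None:
        self._objects.pop(key, None)  # DELETE is idempotent, as in S3


store = ObjectStore()
store.put("logs/2026-03-14.json", b'{"event": "launch"}')
assert store.get("logs/2026-03-14.json") == b'{"event": "launch"}'
store.delete("logs/2026-03-14.json")
```

The point of the sketch is how little surface area there is: no volumes, no mount points, no capacity planning, just keys and bytes.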

In many ways, S3 did for storage what virtualization did for compute—it abstracted the infrastructure away.

Cloud-Native Storage by Design

One of the most significant architectural breakthroughs behind S3 was that it was designed cloud-native from day one.

Traditional storage systems were built around single systems, clusters, or tightly coupled hardware architectures. S3 instead adopted a distributed architecture built around multiple availability zones, enabling unprecedented durability and availability.

“S3 rethought storage as a regional service built across physically separate infrastructure—three availability zones with independent power and connectivity.”— Andy Warfield

This design made it possible for S3 to deliver its famous 11 nines of durability (99.999999999%), a design target that dramatically shifted how enterprises think about disaster recovery and data protection.

For many organizations, S3 effectively replaced the need for secondary data centers.

When Object Storage Became an Application Platform

For the first few years of its existence, object storage was largely viewed as archival storage.

However, something interesting happened as developers began building applications on top of S3. The tipping point came with the rise of big data analytics frameworks.

The introduction of the Hadoop S3A connector, which enabled Hadoop workloads to run directly on S3 rather than HDFS, was a pivotal moment.

“That was one of the moments when we realized data in S3 had become primary application data—not just archival storage.”— Andy Warfield

From there, the use cases expanded rapidly:

  • SaaS platforms storing customer data
  • Media streaming and content delivery
  • Genomics and scientific research
  • Large-scale analytics and data lakes

Today, AWS reports over one million data lakes running on S3, reinforcing its role as the foundation for modern analytics architectures.

Removing Friction for Developers

Another defining characteristic of S3’s evolution has been AWS’s focus on removing complexity for developers.

Many of the most impactful S3 innovations were not flashy new products but improvements that simplified how applications interact with data.

Examples include:

  • Strong consistency
  • Multipart uploads
  • Event notifications
  • CloudFront integration
  • Lifecycle management
  • Intelligent tiering

“About 80% of what the team does is improving and scaling the core system—removing sharp edges so developers can build faster.”— Andy Warfield

This philosophy enabled developers to focus on building applications instead of managing infrastructure. It became a cornerstone of the modern cloud operating model.
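Several of the features above boil down to declarative policy rather than new infrastructure to operate. Lifecycle management, for instance, is just a rules document attached to a bucket. A sketch of such a policy (the rule ID, prefix, and day thresholds are hypothetical; with boto3, a dict like this would be passed to put_bucket_lifecycle_configuration):

```python
# Hypothetical lifecycle policy: tier log objects to cheaper storage
# classes as they age, then expire them after a year.
lifecycle_configuration = {
    "Rules": [{
        "ID": "tier-then-expire",
        "Status": "Enabled",
        "Filter": {"Prefix": "logs/"},
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access
            {"Days": 90, "StorageClass": "GLACIER"},      # archival
        ],
        "Expiration": {"Days": 365},
    }]
}
```

Once the rule is attached, S3 applies the transitions and expiration automatically; there is no batch job for the application team to run.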

The Next Phase: New Data Primitives for the AI Era

As workloads evolve, S3 is also expanding beyond traditional object storage.

Two recent innovations illustrate this shift:

  • S3 Tables for structured data using open table formats such as Apache Iceberg
  • S3 Vectors for AI and embedding workloads

These capabilities signal a move toward treating S3 as a multi-modal data layer that supports structured, unstructured, and vector data.

“We’re adding new data primitives so developers can build on the same shared data substrate using whatever tools they want.”— Andy Warfield

This approach aligns with broader industry trends toward open data architectures and composable data platforms.

Economics Still Matter

Despite the new capabilities, one principle has remained constant: lowering the cost of storage.

Since its launch, the price of S3 storage has dropped approximately 84%, while services such as Intelligent-Tiering have saved customers billions of dollars.

“Price is another form of friction for building applications. Lowering that friction allows customers to keep more data and extract more value from it.”— Andy Warfield

This is particularly important in the AI era, where long-term datasets are becoming valuable assets for training models and building new applications.
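The cited decline is easy to sanity-check against list prices: S3 Standard launched at $0.15 per GB-month in 2006, and the first-tier us-east-1 price has been $0.023 per GB-month in recent years (check current pricing before relying on these figures):

```python
launch_price = 0.15    # USD per GB-month, S3 Standard at 2006 launch
current_price = 0.023  # USD per GB-month, S3 Standard first tier (us-east-1)

drop = (launch_price - current_price) / launch_price
print(f"{drop:.0%}")   # about 85%, consistent with the ~84% figure above
```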

So What?

Looking back over the past two decades, the most important contribution of Amazon S3 isn’t simply that it created a highly scalable storage service.

It fundamentally changed how developers think about data infrastructure—or more accurately, how they don’t think about infrastructure.

This principle has become the north star for almost every infrastructure company, from cloud providers to enterprise software platforms.

S3 demonstrated that storage could and should be:

  • API-driven
  • Infinitely scalable
  • Durable by default
  • Integrated directly into application architectures

In doing so, it helped enable several major technology shifts:

1. The rise of cloud-native architectures

Applications could rely on externalized infrastructure services instead of tightly coupled systems.

2. The emergence of the data lake

Object storage became the economic and scalable foundation for analytics and machine learning.

3. The evolution of the modern data platform

S3 now serves as the underlying storage layer for many data platforms, lakehouses, and analytics engines.

4. The data substrate for AI systems

AI pipelines—from training data to model checkpoints—depend on scalable and durable data layers.

In other words, S3 is not a data platform in the traditional sense.

But it has become a core component of nearly every modern data platform architecture.

And as AI workloads grow, that role is only expanding.

Looking ahead, S3 will likely continue moving closer to applications, supporting higher-performance workloads and deeper integration with compute and AI tooling.

What started as “storage for the internet” is increasingly becoming something more important:

The durable data substrate for the AI economy.

Feel free to reach out and stay connected through robs@siliconangle.com or rob@smuget.us, follow @realstrech on x.com, or comment on my LinkedIn posts.

Full video with Andy –

Full AnalystANGLE –
