Abstract: Twenty years ago, on Pi Day, Amazon Web Services launched Amazon Simple Storage Service (S3) with a bold premise: provide “storage for the internet” through simple APIs that could scale indefinitely. What started as a developer-friendly object store has evolved into one of the most foundational infrastructure layers in modern computing.
Today, S3 stores more than 500 trillion objects and processes over a quadrillion requests per year, supporting everything from SaaS applications and media streaming to genomics research and large-scale analytics.
In conversations with AWS VP and Distinguished Engineer Andy Warfield, through my years of experience building on AWS, and in a recent AnalystANGLE discussion with Dave Vellante, we explored how S3 transformed storage, became the underlying data substrate for modern applications, and is now positioning itself as a foundational layer for the emerging AI data stack.
From Storage Silos to API-Driven Infrastructure
When S3 launched in 2006, enterprise storage looked very different. Organizations were buying arrays, building disaster recovery sites, and managing storage silos across multiple vendors and data centers. In fact, both Andy and I were doing exactly that.
Before S3, infrastructure teams had to build and operate their own storage environments for every application or service.
S3 addressed this problem with a radically simple idea: expose storage through REST APIs and make it effectively infinite in scale.
From a customer perspective, this changed both the economics and the operational model of storage. Entire categories of work effectively disappeared:
- Purchasing hardware
- Planning capacity
- Building DR environments
- Managing replication
Developers could simply call GET, PUT, and DELETE operations against an API endpoint and rely on AWS to handle the rest.
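The programming model is simple enough to sketch in a few lines. The class below is a toy in-memory illustration of S3's bucket/key/object semantics, not real S3 code; production applications would use an AWS SDK such as boto3 against the actual REST API.

```python
# Toy in-memory sketch of S3's object model: GET, PUT, and DELETE
# keyed by (bucket, key). Illustrative only -- real code uses an SDK.

class ObjectStore:
    def __init__(self):
        self._objects = {}  # maps (bucket, key) -> bytes

    def put(self, bucket: str, key: str, body: bytes) -> None:
        """Store an object, overwriting any existing copy."""
        self._objects[(bucket, key)] = body

    def get(self, bucket: str, key: str) -> bytes:
        """Retrieve an object; raises KeyError if it does not exist."""
        return self._objects[(bucket, key)]

    def delete(self, bucket: str, key: str) -> None:
        """Remove an object if present (S3 deletes are idempotent)."""
        self._objects.pop((bucket, key), None)

store = ObjectStore()
store.put("my-bucket", "reports/2006.csv", b"hello,s3")
print(store.get("my-bucket", "reports/2006.csv"))  # b'hello,s3'
store.delete("my-bucket", "reports/2006.csv")
```

The bucket name and key here are hypothetical; the point is that the entire interface is three verbs over HTTP, with no volumes, LUNs, or file systems to manage.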
In many ways, S3 did for storage what virtualization did for compute—it abstracted the infrastructure away.
Cloud-Native Storage by Design
One of the most significant architectural breakthroughs behind S3 was that it was designed cloud-native from day one.
Traditional storage systems were built around single systems, clusters, or tightly coupled hardware architectures. S3 instead adopted a distributed architecture built around multiple availability zones, enabling unprecedented durability and availability.
This design made it possible for S3 to deliver its famous 11 nines (99.999999999%) of durability, a guarantee that dramatically shifted how enterprises think about disaster recovery and data protection.
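To put 11 nines in perspective, AWS's own illustration is that a customer storing 10 million objects can expect to lose a single object roughly once every 10,000 years. The back-of-the-envelope arithmetic:

```python
# Back-of-the-envelope math for S3's 11-nines durability design target.
durability = 0.99999999999          # 11 nines, per object per year
annual_loss_prob = 1 - durability   # chance a given object is lost in a year

objects_stored = 10_000_000
expected_losses_per_year = objects_stored * annual_loss_prob  # ~0.0001

years_per_single_loss = 1 / expected_losses_per_year
print(round(years_per_single_loss))  # roughly 10,000 years per lost object
```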
For many organizations, S3 effectively replaced the need for secondary data centers.
When Object Storage Became an Application Platform
For the first few years of its existence, object storage was largely viewed as archival storage.
However, something interesting happened as developers began building applications on top of S3. The tipping point came with the rise of big data analytics frameworks.
The introduction of the Hadoop S3A connector, which enabled Hadoop workloads to run directly on S3 rather than HDFS, was a pivotal moment.
“That was one of the moments when we realized data in S3 had become primary application data—not just archival storage.” — Andy Warfield
From there, the use cases expanded rapidly:
- SaaS platforms storing customer data
- Media streaming and content delivery
- Genomics and scientific research
- Large-scale analytics and data lakes
Today, AWS reports over one million data lakes running on S3, reinforcing its role as the foundation for modern analytics architectures.
Removing Friction for Developers
Another defining characteristic of S3’s evolution has been AWS’s focus on removing complexity for developers.
Many of the most impactful S3 innovations were not flashy new products but improvements that simplified how applications interact with data.
Examples include:
- Strong consistency
- Multipart uploads
- Event notifications
- CloudFront integration
- Lifecycle management
- Intelligent tiering
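Multipart uploads are a good example of a sharp edge S3 smooths over: large objects are split into parts (at least 5 MiB each, with at most 10,000 parts per upload) that can be uploaded in parallel and retried independently. The sketch below shows the kind of part-planning arithmetic an SDK performs behind the scenes; the `plan_parts` helper and its default part size are illustrative, not an AWS API.

```python
import math

MIN_PART_SIZE = 5 * 1024 * 1024   # S3 minimum part size (except the last part)
MAX_PARTS = 10_000                # S3 limit on parts per multipart upload

def plan_parts(object_size: int, part_size: int = 8 * 1024 * 1024):
    """Return (offset, length) byte ranges covering object_size bytes.

    Grows part_size when the object would otherwise exceed MAX_PARTS.
    """
    part_size = max(part_size, MIN_PART_SIZE, math.ceil(object_size / MAX_PARTS))
    parts = []
    offset = 0
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((offset, length))
        offset += length
    return parts

# A 100 MiB object with the default 8 MiB part size needs 13 parts
# (12 full parts plus a smaller final part).
print(len(plan_parts(100 * 1024 * 1024)))  # 13
```

Each part can fail and be retried on its own, which is what makes large uploads over unreliable networks practical.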
“About 80% of what the team does is improving and scaling the core system—removing sharp edges so developers can build faster.” — Andy Warfield
This philosophy enabled developers to focus on building applications instead of managing infrastructure. It became a cornerstone of the modern cloud operating model.
The Next Phase: New Data Primitives for the AI Era
As workloads evolve, S3 is also expanding beyond traditional object storage.
Two recent innovations illustrate this shift:
- S3 Tables for structured data using open table formats such as Apache Iceberg
- S3 Vectors for AI and embedding workloads
These capabilities signal a move toward treating S3 as a multi-modal data layer that supports structured, unstructured, and vector data.
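Vector workloads boil down to storing embeddings and retrieving the nearest ones by similarity. The snippet below is a plain-Python illustration of a cosine-similarity search, not the S3 Vectors API itself; the index contents are made-up values, included only to show the core operation such a store optimizes at scale.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# A toy "vector index": object keys mapped to hypothetical embeddings.
index = {
    "doc-a": [0.9, 0.1, 0.0],
    "doc-b": [0.0, 1.0, 0.0],
    "doc-c": [0.7, 0.7, 0.1],
}

def query(embedding, k=2):
    """Return the k keys whose vectors are most similar to the query."""
    ranked = sorted(index,
                    key=lambda key: cosine_similarity(index[key], embedding),
                    reverse=True)
    return ranked[:k]

print(query([1.0, 0.0, 0.0]))  # ['doc-a', 'doc-c']
```

A real vector store replaces the linear scan with approximate nearest-neighbor indexing, but the contract is the same: embeddings in, nearest keys out.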
“We’re adding new data primitives so developers can build on the same shared data substrate using whatever tools they want.” — Andy Warfield
This approach aligns with broader industry trends toward open data architectures and composable data platforms.
Economics Still Matter
Despite the new capabilities, one principle has remained constant: lowering the cost of storage.
Since its launch, the price of S3 storage has dropped approximately 84%, while services such as Intelligent-Tiering have saved customers billions of dollars.
This is particularly important in the AI era, where long-term datasets are becoming valuable assets for training models and building new applications.
So What?
Looking back over the past two decades, the most important contribution of Amazon S3 isn’t simply that it created a highly scalable storage service.
It fundamentally changed how developers think about data infrastructure—or more accurately, how they don’t think about infrastructure.
This principle has become the north star for almost every infrastructure company, from cloud providers to enterprise software platforms.
S3 demonstrated that storage could and should be:
- API-driven
- Infinitely scalable
- Durable by default
- Integrated directly into application architectures
In doing so, it helped enable several major technology shifts:
1. The rise of cloud-native architectures
Applications could rely on externalized infrastructure services instead of tightly coupled systems.
2. The emergence of the data lake
Object storage became the economic and scalable foundation for analytics and machine learning.
3. The evolution of the modern data platform
S3 now serves as the underlying storage layer for many data platforms, lakehouses, and analytics engines.
4. The data substrate for AI systems
AI pipelines—from training data to model checkpoints—depend on scalable and durable data layers.
In other words, S3 is not a data platform in the traditional sense.
But it has become a core component of nearly every modern data platform architecture.
And as AI workloads grow, that role is only expanding.
Looking ahead, S3 will likely continue moving closer to applications, supporting higher-performance workloads and deeper integration with compute and AI tooling.
What started as “storage for the internet” is increasingly becoming something more important:
The durable data substrate for the AI economy.
Feel free to reach out and stay connected: email robs@siliconangle.com or rob@smuget.us, follow @realstrech on x.com, or comment on my LinkedIn posts.
Full video with Andy –
Full AnalystANGLE –

