ABSTRACT: With GTC and AWS Pi Day in the rearview mirror, we examine the advancements and innovation delivered with Amazon SageMaker and Amazon S3 Tables, revolutionizing data management and AI development. The introduction of Amazon SageMaker Unified Studio, a more comprehensive platform, provides a single environment to streamline data analytics and AI workflows, integrating seamlessly with AWS analytics and AI/ML services. Meanwhile, Amazon S3 Tables, the first cloud object store with native Apache Iceberg support, simplifies tabular data storage and analytics. These innovations empower organizations to accelerate time-to-value, enhance collaboration, and unify data access across cloud environments.
AWS Pi Day and GTC: Oh AI
The momentum from AWS Pi Day 2025, which showcased the next generation of Amazon SageMaker, SageMaker Lakehouse, and Amazon S3 Tables, continued seamlessly into NVIDIA GTC 2025, where AWS deepened its long-standing partnership with NVIDIA to accelerate generative AI innovation. Pi Day focused on delivering a unified, open, and secure data foundation through capabilities like SageMaker Unified Studio, S3 Tables with native Apache Iceberg support, and zero-ETL integrations, all designed to simplify and speed up data-to-AI workflows. These capabilities laid the groundwork for broader AI acceleration, empowering customers to build, govern, and scale AI applications with greater agility and confidence.
At GTC, AWS and NVIDIA unveiled a series of joint initiatives that extend this vision. These include the integration of Amazon SageMaker with NVIDIA NIM inference microservices for optimized GPU performance, and the upcoming availability of NVIDIA’s powerful Blackwell platform on Amazon EC2 and DGX Cloud. AWS will also provide the infrastructure for Project Ceiba, a groundbreaking AI supercomputer powered by over 20,000 NVIDIA Grace Blackwell Superchips. Together, these advancements reinforce the synergy between AWS’s cloud-native ML development environments and NVIDIA’s high-performance AI computing, enabling customers to build and run next-generation AI models at scale, securely and efficiently.
SageMaker Unified: A Comprehensive Data and AI Development Environment
SageMaker Unified Studio combines functionalities from Amazon EMR, AWS Glue, Amazon Athena, Amazon Redshift, Amazon Bedrock, and Amazon SageMaker AI into a single experience. Users can discover, prepare, and collaborate on data assets within a governed environment. By leveraging Amazon SageMaker Lakehouse, a unified data platform built on Apache Iceberg, organizations can access diverse data sources, including Amazon S3 data lakes, Redshift data warehouses, and third-party databases, all from a single interface.
Addressing the Challenges of AI-driven Organizations
Data is the lifeblood of AI, and as AI continues to transform industries, businesses face new challenges, such as siloed data, complex pipelines, and inconsistent access controls. These hurdles make collaboration difficult for data teams and hinder AI-powered decision-making. In a conversation with AWS VP of Analytics Sirish Chandrasekaran, he identified convergence, governance, and agents as the three major themes driving AI transformation.
- Convergence: The distinction between structured and unstructured data, data lakes and data warehouses, and even data analysts and data scientists is rapidly disappearing. Organizations are looking for a unified platform that allows them to integrate their diverse data sources and analytics tools seamlessly.
- Governance: AI governance is about more than just compliance; it is about confidence. Organizations must be able to trust that their AI models are trained on reliable, well-governed data and that they comply with responsible AI policies.
- Agents: Organizations increasingly seek multi-agent AI applications, where multiple AI-driven agents collaborate to solve complex problems. One example cited: companies like BMW leverage AI agents for real-time root cause analysis, significantly reducing the time engineers spend investigating issues.
Key Enhancements in SageMaker Unified Studio
AWS has expanded SageMaker Unified Studio with two significant updates:
- Amazon Q Developer Integration: Now embedded throughout the studio, Amazon Q Developer serves as a powerful AI assistant that helps users discover data, generate SQL queries, automate ETL jobs, and build generative AI applications using natural language.
- Amazon Bedrock Integration: The latest update to SageMaker Unified Studio now includes Claude 3.7 Sonnet, DeepSeek, and latency-optimized inference for models from Amazon, Meta, and Anthropic, bringing low-latency generative AI capabilities to the platform.
Additionally, SageMaker Unified Studio now supports zero-ETL integrations with over 15 data sources, including Amazon Aurora, DynamoDB, and Amazon Redshift, plus eight applications such as Salesforce. This allows seamless data access without requiring costly and time-consuming ETL processes.
A Redefined Approach to Data Cataloging and Governance
AWS has introduced SageMaker Catalog as the control layer for the entire AI and data stack. Unlike traditional data catalogs, SageMaker Catalog offers:
- AI-powered metadata generation, enabling users to easily discover and understand their data assets.
- Real-time monitoring and compliance, ensuring AI models are trained on governed data.
- Bedrock Guardrails for automated policy enforcement, reducing hallucinations, bias, and toxicity in generative AI applications.
AWS also applies automated reasoning, the use of mathematically verifiable proofs (currently all the rage), to enhance governance across AI workflows, much as it has validated IAM policies and S3 security mechanisms for years.
Amazon S3 Tables: Revolutionizing Data Storage and Analytics
The Power of Apache Iceberg in Cloud Storage
Amazon S3 Tables, announced at re:Invent 2024, is the first cloud object store with native Apache Iceberg support. This advancement eliminates the need for self-managed table storage, offering up to 3x faster query throughput and up to 10x higher transaction rates, according to AWS. S3 Tables facilitate seamless data integration across AWS analytics services, including Amazon Athena, Amazon EMR, AWS Glue, and Amazon Redshift.
Addressing the Limitations of Parquet Storage
For years, Apache Parquet has been the dominant format for storing tabular data in Amazon S3. While Parquet offers efficient storage and optimized query performance, it lacks built-in table management capabilities, making tasks like schema evolution, data mutations, and time travel more complex.
S3 Tables build on the strengths of Parquet while overcoming these challenges by integrating Apache Iceberg, which provides:
- Schema evolution: The ability to add or remove columns without disrupting existing workflows.
- Transactional consistency: Support for ACID transactions to ensure reliable updates and deletes.
- Snapshotting and time travel: Enabling users to query historical versions of their data without complex ETL processes.
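To make the snapshot and time-travel idea concrete, here is a minimal Python sketch of the concept, not Iceberg's actual implementation: every commit produces an immutable snapshot, and a time-travel query simply selects the latest snapshot at or before the requested timestamp. The class and timestamps are illustrative assumptions.

```python
class ToyIcebergTable:
    """Conceptual sketch of Iceberg-style snapshots and time travel."""

    def __init__(self):
        # List of (commit_timestamp, immutable rows) pairs, oldest first.
        self.snapshots = []

    def commit(self, ts, rows):
        # Every write produces a new immutable snapshot; old ones are kept.
        self.snapshots.append((ts, tuple(rows)))

    def read_latest(self):
        return self.snapshots[-1][1]

    def read_as_of(self, ts):
        # Time travel: latest snapshot at or before the requested timestamp.
        eligible = [s for s in self.snapshots if s[0] <= ts]
        if not eligible:
            raise ValueError("no snapshot at or before that timestamp")
        return max(eligible, key=lambda s: s[0])[1]


table = ToyIcebergTable()
table.commit(ts=100, rows=[("order-1", 10)])
table.commit(ts=200, rows=[("order-1", 10), ("order-2", 25)])

print(table.read_latest())    # current state: both orders
print(table.read_as_of(150))  # state as of ts=150: only order-1
```

In real Iceberg, the snapshots are metadata files pointing at Parquet data files, so old versions stay queryable without duplicating data or running ETL.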
Unified Data Access with SageMaker Lakehouse
With the general availability of S3 Tables integration with SageMaker Lakehouse, users can now query S3 Tables directly from SageMaker Unified Studio. This enables:
- Secure, fine-grained access control to tabular data across multiple engines.
- The ability to join S3 Table data with Redshift data warehouses and third-party sources like PostgreSQL and Amazon DynamoDB.
- Seamless analytics and ML model development without requiring complex ETL processes.
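The fine-grained access control mentioned above can be pictured as column- and row-level filtering applied at query time. The sketch below is purely conceptual, not the SageMaker or Lake Formation API; the principal name, policy shape, and data are hypothetical.

```python
# Conceptual sketch of fine-grained access control over tabular data.
# NOT the SageMaker/Lake Formation API; the policy format is invented
# to illustrate column- and row-level permissions enforced per principal.

ROWS = [
    {"region": "us-east-1", "customer": "acme", "revenue": 120},
    {"region": "eu-west-1", "customer": "globex", "revenue": 340},
]

# Hypothetical policy: which columns a principal may see, plus a row filter.
POLICIES = {
    "analyst": {
        "columns": {"region", "revenue"},
        "row_filter": lambda r: r["region"].startswith("us-"),
    },
}


def query(principal, rows):
    """Return only the rows and columns the principal is allowed to see."""
    policy = POLICIES[principal]
    visible = [r for r in rows if policy["row_filter"](r)]
    return [
        {k: v for k, v in r.items() if k in policy["columns"]}
        for r in visible
    ]


print(query("analyst", ROWS))  # US rows only, without the customer column
```

The point of centralizing this in the lakehouse is that the same policy applies no matter which engine (Athena, Redshift, EMR, and so on) issues the query.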
Bringing Analytics and Storage Closer Together
The launch of S3 Tables marks a fundamental shift in AWS data architecture, bridging the gap between object storage and structured data analytics. AWS VP Andy Warfield explains that S3 Tables transform S3 from a traditional object store into a first-class table storage service, providing structured data capabilities previously only available through external database engines.
AWS has optimized S3 Tables for both high-performance analytics and scalable data management. Key enhancements include:
- Automatic compaction: Reducing the overhead of managing small Parquet files for improved query performance.
- Support for the Iceberg REST Catalog API: Ensuring seamless integration with modern query engines such as DuckDB, Apache Spark, and Polars.
- Expanded global availability: Deployments across multiple AWS Regions, enabling organizations to scale workloads efficiently.
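Why does automatic compaction matter? Each small Parquet file is a separate object read, so a query over thousands of tiny files pays heavy per-object overhead. Compaction bin-packs small files into fewer, larger ones. The sketch below illustrates the idea with a simple greedy packer; the 128 MB target and file sizes are illustrative assumptions, not AWS's actual algorithm or defaults.

```python
# Conceptual sketch of small-file compaction (not AWS's implementation).
# Many small files mean many object reads per scan; compaction packs them
# into fewer files near a target size so engines open fewer objects.

def compact(file_sizes_mb, target_mb=128):
    """Greedily pack small files into groups no larger than target_mb."""
    groups, current, current_size = [], [], 0
    for size in sorted(file_sizes_mb):
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups


small_files = [4, 8, 16, 16, 32, 60, 100]  # file sizes in MB
groups = compact(small_files, target_mb=128)
print(len(small_files), "files ->", len(groups), "compacted files")
```

With S3 Tables this housekeeping runs automatically in the background, which is exactly the kind of table maintenance teams previously had to script themselves.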
Simplified Data Management and Querying
AWS has introduced new features for easier S3 Tables management:
- Create and query tables directly from the Amazon S3 console using Amazon Athena.
- Schema definition support and increased scalability with up to 10,000 tables per S3 table bucket.
- Deep integration with SageMaker Unified Studio, allowing data teams to interact with S3 Tables through a single interface.
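To give a flavor of what creating and querying an Iceberg table through Athena looks like, here are Athena-style SQL statements, shown as Python strings so the example is self-contained. The database, table, and bucket names are hypothetical, and the exact catalog naming for S3 Tables may differ, so treat this as a sketch and consult the AWS documentation for specifics.

```python
# Athena-style SQL for an Iceberg table. All names are hypothetical;
# the exact catalog/namespace wiring for S3 Tables is set up separately.

create_ddl = """
CREATE TABLE sales_db.orders (
    order_id   string,
    amount     double,
    order_date date
)
LOCATION 's3://example-bucket/warehouse/orders/'
TBLPROPERTIES ('table_type' = 'ICEBERG')
"""

# Iceberg time travel in Athena: query the table as of a past timestamp.
time_travel_query = """
SELECT order_id, amount
FROM sales_db.orders
FOR TIMESTAMP AS OF TIMESTAMP '2025-03-01 00:00:00'
"""

print(create_ddl.strip())
print(time_travel_query.strip())
```

Because S3 Tables expose the Iceberg REST Catalog API, roughly the same SQL works from Spark, DuckDB, or other Iceberg-aware engines, which is the portability argument in a nutshell.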
As Warfield highlights, AWS continuously innovates based on customer feedback, ensuring that S3 Tables evolve to meet the growing needs of data-driven enterprises. By combining high-performance storage, seamless analytics integration, and simplified governance, S3 Tables and SageMaker Lakehouse together create a next-generation data foundation.
Our ANGLE
AWS is really stepping up innovation within its services in areas such as AI, data platforms, and data management. As organizations pursue agentic systems, as in the BMW example, we see those systems being applied to make organizations more productive and deliver real ROI. If you know my take on this, it might not surprise you to hear me so positive on the announcements. I have been saying for over a decade, bring customers solutions, not services, and SageMaker and S3 are delivering on that promise. The next generation of Amazon SageMaker and S3 Tables is reshaping how organizations approach AI, ML, and analytics. We believe that by providing a unified, open, and secure environment from the start, these innovations significantly reduce time-to-value for data-driven projects.
We see many of these advancements as key to helping organizations know how and where to get started with AI projects. The expanded zero-ETL support in SageMaker is also critical: it limits how much data has to move, depending on an application's SLA, since organizations will not move all their data to the cloud. And by embracing open standards like Apache Iceberg, AWS is making pipeline building significantly easier and less costly. In the case of S3 Tables, organizations can truly pick their favorite compute layer for the data platform, though AWS makes a very compelling argument for keeping it in the family with SageMaker Unified Studio.
We also agree that governance will be critically important and can only be achieved with the convergence of technical and business metadata. In fact, if regulation around AI does come about in the United States, this convergence will be key to meeting it. Think of it as the regulatory approach the FinServ industry has known for decades, but on steroids. This is why I am very bullish on SageMaker Catalog, especially its AI-powered metadata generation, monitoring and compliance capabilities, and Bedrock Guardrails. The catalog and metadata are a significant battleground.
As a product wonk, I appreciate Sirish and Andy sharing how their teams continued to innovate between the announcements at AWS re:Invent and Pi Day. The pieces brought to bear from customer feedback will help organizations get more out of these services. Whether you're a data engineer, analyst, or AI/ML practitioner, these tools offer a robust foundation for seamless collaboration, enhanced governance, and scalable AI development.