The 2025 Real-Time Analytics Summit marked a pivotal moment in the evolution of data analytics—one that transcends the speed of data processing and enters a new era where AI agents are not just data consumers but active decision-makers.
From Dashboards to AI-Driven Agents
As the summit keynote insightfully framed it, real-time analytics has long been compared to racing cars: faster and more powerful than ever. Yet these systems have traditionally been limited by their drivers: humans tasked with interpreting dashboards and metrics, who inevitably become the bottleneck. This human dependency caps the potential of real-time data, no matter how advanced the underlying technology.
Today, for the first time, AI agents capable of instant reasoning over streaming data are poised to eliminate that bottleneck. This leap fundamentally changes the analytics paradigm: AI now seamlessly consumes, interprets, and acts on data in real time.
The Scale and Untapped Potential of Data
To appreciate this shift, consider the scale we’re dealing with: over 100 zettabytes of data generated annually worldwide, according to IDC and corroborated by theCUBE Research’s recent market analysis. Despite this explosion, only a small fraction of data is actively used in decision-making. The widening chasm between data creation and actionable insight is one of the biggest inefficiencies in modern enterprises.
Apache Pinot and StarTree: Building the Foundation for AI-Driven Analytics
A central theme of the summit was the platform evolution required to support this AI-driven future. StarTree and Apache Pinot are emerging as critical enablers, pioneering open-source, real-time analytics platforms designed from the ground up to power both human and AI consumption.
The real-world examples shared at the event underline this momentum:
- LinkedIn Talent Insights, built on Apache Pinot, evolved from monthly Hadoop reports to interactive dashboards that surface hiring trends and talent flows in near real-time. The next frontier is conversational AI interfaces that distill these insights further, enabling users to query talent shifts naturally and receive context-rich, actionable summaries without parsing dense dashboards.
- Use cases across industries, from fraud detection at Nubank to real-time ad analytics at DoorDash and merchant insights at Stripe, showcase the scalability and flexibility of these platforms, which together process over 13 billion queries weekly on petabytes of data using tens of thousands of CPU cores.
theCUBE Research's Perspective: Market Trends and the Shift Toward AI-Integrated Analytics
Our latest research highlights the rise of AI-augmented analytics platforms as the fastest-growing segment in the analytics market, with a projected CAGR exceeding 30% through 2028. Organizations increasingly prioritize systems that surface insights faster and can autonomously act on anomalies, customer behaviors, and operational risks.
This shift corresponds with a broader architectural transition — from monolithic analytics suites toward disaggregated, modular stacks. This design philosophy empowers enterprises to select best-of-breed tools for instrumentation, data transport, processing, storage, and visualization. This modularity is critical for meeting the diverse needs of modern use cases, from sub-second freshness in personalization engines to high concurrency demands in observability platforms.
The New Drivers of Data
The most compelling insight from the summit is the transformative potential of AI agents acting as autonomous drivers in the analytics journey. Unlike legacy rule-based systems, AI agents:
- Learn and adapt continuously, eliminating the need for manually coded triggers.
- Interpret natural language queries, lowering the barrier for non-technical users.
- Take automated corrective actions, such as rolling back faulty deployments or dynamically adjusting customer offers.
- Scale across multiple domains and workloads, enhancing multi-tenancy and operational efficiency.
The result is a system in which language becomes the interface, and AI agents reason through complex data patterns at scale—a radical departure from static dashboards and reports.
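To make the corrective-action pattern concrete, here is a minimal, hypothetical Python sketch of an agent loop that watches a streaming error-rate metric and rolls back a deploy when it crosses a threshold. Both integration points are stubs standing in for whatever metrics store and CI/CD tooling an organization actually runs.

```python
import random
import time

# Hypothetical integration points: in production these would call your metrics
# store (e.g., a Pinot query) and your deployment tooling. Both are stubs here.
def error_rate_last_minute(service: str) -> float:
    """Stub: return the service's error rate over the trailing minute."""
    return random.uniform(0.0, 0.1)  # simulated value for illustration

def rollback_latest_deploy(service: str) -> None:
    """Stub: trigger a rollback through a CI/CD system."""
    print(f"rolling back the latest deploy of {service}")

ERROR_THRESHOLD = 0.05  # act when more than 5% of requests fail

def watch(service: str, poll_seconds: float = 1.0, max_polls: int = 60) -> None:
    """Poll a fresh metric and take corrective action without a human in the loop."""
    for _ in range(max_polls):
        if error_rate_last_minute(service) > ERROR_THRESHOLD:
            rollback_latest_deploy(service)
            return
        time.sleep(poll_seconds)

if __name__ == "__main__":
    watch("checkout-api")
```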
For example, Praveen Nepali Naga, CTO of Mobility and Delivery at Uber, expressed his enthusiasm for the StarTree Real-Time Analytics Summit and discussed how Uber's technology and marketplace have evolved over the past decade. Starting with just Uber Black and UberX, Uber has since expanded its transportation modes to include reserved rides, airport pickups, hailable taxis, shared rides (Uber Pool), shuttles, and even services for teens. On the delivery front, Uber began with food delivery and expanded into drinks, groceries, alcohol, retail, and enterprise solutions. Today, Uber's mission is to enable users to "go anywhere" and "get anything": it operates in over 70 countries and 10,000 cities, handles 33 million trips daily, supports over 8 million earners and 170 million consumers, and recently reached a milestone of one million concurrent trips. This scale poses immense engineering challenges around concurrency and system fan-out.
Uber is described as the epitome of a real-time network, where pressing the app button triggers a complex real-world and digital interaction: matching riders with drivers in real time globally, with safety as a core concern. Real-time data plays a crucial role here, as the “value of data” decays rapidly over time. Uber categorizes the value of data into seconds (critical for ETAs, pricing, and matching), minutes (to assess supply-demand balance in geographic areas), and days (for historical data used in personalization and product development). Different technologies are used to serve these needs.
Uber leverages data-driven experimentation for product development, running hundreds of experiments and iterating based on real-time insights. For example, real-time ETA data was used to introduce “faster” badges in the app that improved user conversion, alongside badges based on historical trends like “popular when you travel.”
Praveen highlighted two key platforms powered by real-time data: the Marketplace Indexing Platform, which indexes drivers, couriers, and trips in real time for effective matching, and the Mapping Platform, which integrates historical data, real-time traffic, and driver inputs to provide accurate ETAs crucial for routing and pricing.
The massive scale of Uber's data ecosystem was emphasized: ingesting about 8 trillion Kafka events daily, maintaining a data lake nearing an exabyte (900 petabytes), and running roughly 200 million queries daily on Pinot, its real-time analytics infrastructure. Data from apps, sensors, telemetry, backend systems, and third-party sources flows into Kafka, is then processed and stored in Pinot for real-time decision-making, and also feeds into a data lake built on Apache Hudi for transactional and historical analysis.
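Uber's internal schemas are not public, but as a rough illustration of what serving real-time queries from Pinot looks like, here is a minimal sketch using the community pinotdb Python client against a hypothetical trip_events table. The broker host and all column names are assumptions.

```python
from pinotdb import connect  # community Python client for Apache Pinot

# Connect to a Pinot broker (host and port are deployment-specific).
conn = connect(host="pinot-broker.example.com", port=8099,
               path="/query/sql", scheme="http")
cur = conn.cursor()

# Hypothetical table and columns: count trips per city over the last 5 minutes.
# ago() is a built-in Pinot function that converts an ISO-8601 duration
# into an epoch-milliseconds timestamp relative to now.
cur.execute("""
    SELECT city, COUNT(*) AS trips
    FROM trip_events
    WHERE eventTimeMillis > ago('PT5M')
    GROUP BY city
    ORDER BY trips DESC
    LIMIT 10
""")
for city, trips in cur:
    print(city, trips)
```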
Shifting focus to autonomous vehicles (AVs), Praveen explained Uber’s vision of a hybrid marketplace that combines human drivers with AVs and integrates multiple transportation modes (UberX, Uber Black, taxis, Uber Moto, shared rides, shuttles, and services for teens).
He described the challenge of managing heterogeneous supply and demand patterns, highlighting the inefficiencies of fixed AV fleets that can be underutilized during low-demand periods. Uber’s flexible hybrid model allows dynamic matching of AVs and human drivers to balance utilization and meet fluctuating demand, creating economic efficiencies.
Having completed tens of thousands of AV trips, Uber partners with 14 AV providers worldwide, including Waymo. The hybrid marketplace is built on three pillars:
- Marketplace – Seamlessly stitching AVs and human drivers to optimize dispatch and reduce wait times.
- Fleet Management Platform – Enabling AV providers to manage vehicle utilization in real time, including charging and routing logistics.
- User Experience – Offering riders a seamless experience where they may not even know if they are getting an AV or a human-driven car, with features like unlocking and rating rides integrated into the Uber app.
Praveen emphasized sustainability, shared mobility, and autonomous technology as key components shaping Uber’s future of mobility. He concluded by acknowledging the critical role of the technology community in Uber’s journey and encouraging attendees to stay connected through Uber’s social channels and blog.
What This Means for the Industry
For CIOs and data leaders, building analytics platforms that support AI-native workflows is no longer optional but essential to remain competitive. The combination of streaming data at scale, modular platforms, and intelligent AI agents will unlock new business agility, deeper customer understanding, and faster risk mitigation.
From theCUBE Research's perspective, I recommend that organizations:
- Embrace open, disaggregated stacks to retain flexibility.
- Invest in platforms supporting vector embeddings and indexing for AI workloads.
- Democratize data access by leveraging conversational AI interfaces.
- Prioritize platform features like tiered storage, multi-tenancy, and operational ease to support diverse analytics workloads.
All these capabilities are crucial, but one key consideration is the need to analyze and convert incoming data into a format Pinot understands. In practice, this means making trade-offs that depend on the use case: for some scenarios, you might prioritize one approach over another. With the evolving demands of AI, it becomes essential to convert incoming data into vector embeddings and to support vector indexing. Moreover, the platform supports the Model Context Protocol (MCP), allowing agents to access the system and interact with one another, which adds another layer of flexibility.
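As a concrete sketch of that embedding step (not StarTree's actual ingestion pipeline), the transform below attaches a vector to each incoming event using an off-the-shelf open-source model; the event fields and the model choice are assumptions for illustration.

```python
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

# Any embedding model works here; all-MiniLM-L6-v2 is a small, common choice.
model = SentenceTransformer("all-MiniLM-L6-v2")

def enrich_event(event: dict) -> dict:
    """Attach an embedding to an incoming event before it is written to the
    analytics store, so a vector index can serve similarity queries later.
    The field names are hypothetical."""
    event["description_embedding"] = model.encode(event["description"]).tolist()
    return event

enriched = enrich_event({"id": 1, "description": "late delivery, cold food"})
print(len(enriched["description_embedding"]))  # embedding dimension (384 for MiniLM)
```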
At the core of this system lies a classic trade-off between compute and storage — where is your compute happening relative to your storage? Is it local or remote? If data is remote, you incur latency and additional costs, whereas local compute offers much faster performance. Similarly, the choice of data format matters. Formats like Parquet offer sequential access, while Pinot’s optimized format supports random access. Each comes with different performance and cost characteristics. On the indexing side, the choice between coarse- or fine-grained indexing impacts application performance directly. Ultimately, the key is to select the right balance based on the specific application’s needs.
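A quick back-of-envelope model makes the trade-off tangible. The latency constants below are rough assumed figures, not benchmarks, but they show why random-access formats pair naturally with local storage while remote object stores favor coarse, sequential reads.

```python
# Back-of-envelope model of the compute-vs-storage trade-off described above.
# All constants are rough, assumed figures for illustration, not benchmarks.

LOCAL_SSD_RANDOM_READ_S = 0.0001   # ~100 microseconds per random read
REMOTE_GET_FIRST_BYTE_S = 0.05     # ~50 ms time-to-first-byte per object-store GET

def lookup_latency_s(num_random_reads: int, remote: bool) -> float:
    """Total latency if a query resolves to N independent random reads."""
    per_read = REMOTE_GET_FIRST_BYTE_S if remote else LOCAL_SSD_RANDOM_READ_S
    return num_random_reads * per_read

# A fine-grained index narrows a query to 200 random reads:
print(f"local : {lookup_latency_s(200, remote=False) * 1000:,.0f} ms")  # ~20 ms
print(f"remote: {lookup_latency_s(200, remote=True) * 1000:,.0f} ms")   # ~10,000 ms
```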
Different applications have different priorities. For instance, ad hoc analytics performed internally can tolerate higher latency and prioritize cost savings, while user-facing applications demand low latency and the best possible performance. The platform has evolved to support the full spectrum of storage and performance options to address these diverse requirements. With the introduction of tiered storage, data can reside in remote stores like S3 but still achieve optimized performance, allowing users to trade off between cost and speed seamlessly.
Historically, Pinot relied heavily on its optimized format for maximum speed. However, customers frequently asked if the platform could also support formats like Iceberg, which integrates well with data lakes but is comparatively slower. With the upcoming Iceberg support, users can toggle indexing features on or off depending on their data sets, further refining their control over performance and cost trade-offs. This granular control extends to time-based policies — for example, specifying that data be kept local for the most recent six hours for performance, then moved to cheaper, remote storage thereafter.
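To make the time-based policy concrete, here is a sketch in the shape of open-source Pinot's tierConfigs table-config section, expressed as a Python dict. StarTree's S3-backed tiered storage has its own configuration, so treat this only as an illustration of the six-hour example above; the tier name and server tag are invented.

```python
# A sketch of a time-based tiering policy, modeled on open-source Pinot's
# "tierConfigs" table-config section. StarTree's S3-backed tiered storage is
# configured differently; this only illustrates the concept.
table_config_fragment = {
    "tierConfigs": [
        {
            "name": "remoteTier",                # hypothetical tier name
            "segmentSelectorType": "time",       # select segments by age
            "segmentAge": "6h",                  # segments older than 6 hours move off hot servers
            "storageType": "pinot_server",       # OSS Pinot moves segments between server tags
            "serverTag": "cold_OFFLINE",         # hypothetical tag for cheaper servers
        }
    ]
}
```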
Excitingly, the platform also incorporates native AI capabilities with features like auto-vector embedding. As data arrives, embeddings can be generated in real time by querying any preferred cloud or custom model. In addition to supporting queries over Parquet and Iceberg, Pinot now includes a natural language interface that automatically translates queries into Pinot’s native query language, calls the system, and uses large language models (LLMs) to summarize results. This creates a seamless experience where data ingestion, querying, and AI-powered insights coexist on a unified platform.
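The translate-execute-summarize loop described above can be sketched in a few lines. This is not StarTree's implementation: the model name, broker host, and schema hint are all assumptions, and any LLM provider and Pinot client would do.

```python
from openai import OpenAI   # any LLM provider works; OpenAI shown as one option
from pinotdb import connect

llm = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
conn = connect(host="pinot-broker.example.com", port=8099,
               path="/query/sql", scheme="http")

def ask(question: str, schema_hint: str) -> str:
    # 1. Translate the natural-language question into a Pinot SQL query.
    sql = llm.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[{"role": "user",
                   "content": f"Schema: {schema_hint}\n"
                              f"Write one Pinot SQL query for: {question}. "
                              f"Return only the SQL."}],
    ).choices[0].message.content

    # 2. Execute the generated query against the Pinot broker.
    cur = conn.cursor()
    cur.execute(sql)
    rows = cur.fetchall()

    # 3. Have the LLM summarize the result set for the user.
    return llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Question: {question}\nRows: {rows}\n"
                              f"Summarize the answer briefly."}],
    ).choices[0].message.content
```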
The architecture also includes an enricher component, which integrates with external systems to generate embeddings, as well as full MCP support, enabling rich agent-to-agent communication. Notably, the system preserves structured data alongside embeddings, supporting diverse use cases that require both types of information.
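For a feel of what exposing analytics to agents over MCP looks like, here is a minimal sketch using the official MCP Python SDK. The server name and tool are hypothetical, and the query body is stubbed rather than wired to a real broker.

```python
from mcp.server.fastmcp import FastMCP  # official MCP Python SDK: pip install mcp

mcp = FastMCP("pinot-analytics")  # hypothetical server name

@mcp.tool()
def run_query(sql: str) -> str:
    """Run a SQL query against the analytics store and return rows as text.
    Stubbed here; a real tool would call the Pinot broker and format results."""
    return f"executed: {sql}"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio so any MCP-capable agent can call it
```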
On the deployment side, responding to customer demand, the platform now supports a "bring your own Kubernetes" model, allowing users to deploy it within their own Kubernetes clusters on platforms like EKS or AKS. This provides complete control and flexibility while still benefiting from StarTree's monitoring and SLA guarantees. Customers can choose from dedicated SaaS, bring-your-own-cloud, or bring-your-own-Kubernetes deployment options across all major clouds, widening accessibility and adoption.
None of these advancements would be possible without the vibrant Apache Pinot community, whose contributions continue to drive innovation and expand use cases. The platform has come a long way from the early days of monthly reports to dashboards and now AI-powered real-time analytics agents. The future is no longer just about faster queries or prettier charts — it’s about making AI-powered real-time analytics the foundation for the next generation of intelligent systems. We hope you embrace this transformation and make it a key part of your journey.
Related Customer Case Studies: Real-Time Analytics and Marketplace Observability
Uber’s emphasis on real-time data, marketplace indexing, and hybrid fleet management reflects a broader industry trend where leading companies leverage advanced real-time analytics platforms to optimize operations and enhance user experiences. Several notable customer case studies illustrate similar challenges and innovative solutions:
- Grab's Real-Time Metrics Platform for Marketplace Observability
Grab, a leading Southeast Asian super app, developed a real-time metrics platform for deep marketplace observability. This platform enables Grab to monitor dynamic supply and demand patterns across its ride-hailing and delivery services, similar to Uber's marketplace indexing and matching systems. By harnessing real-time data, Grab can respond quickly to shifting market conditions, improve resource allocation, and enhance customer experience. (Read more: How Grab Built a Real-Time Metrics Platform for Marketplace Observability)
- CrowdStrike's Scaling of Real-Time Analytics with Apache Pinot
CrowdStrike, a cybersecurity leader, scaled its real-time analytics infrastructure using Apache Pinot to deliver rapid, high-volume data insights critical to its threat detection and response capabilities. Like Uber's use of Pinot for 200 million daily queries, CrowdStrike's platform demonstrates how efficient, scalable real-time analytics can power decision-making and operational effectiveness at massive scale. (Read more: How CrowdStrike Scaled Real-Time Analytics with Apache Pinot)
- Nubank's Management of Real-Time Data Complexity and Cloud Cost Reduction
Nubank, a leading digital bank in Latin America, tackled the complexity of real-time data processing by adopting Apache Pinot, which helped it cut cloud infrastructure costs by $1 million. Nubank's use case highlights how real-time data platforms enable enhanced customer personalization and product innovation while delivering significant operational efficiencies, paralleling Uber's need to manage trillions of Kafka events and a vast data lake for marketplace optimization. (Read more: Nubank Tames Real-Time Data Complexity with Apache Pinot, Cuts Cloud Costs by $1M)
Additional examples:
- AWS & Apache Pinot: Powering Real-Time GenAI Pipelines. Apache Pinot on AWS drives real-time context retrieval for GenAI applications, enabling fast vector search and retrieval-augmented generation (RAG).
- 7SIGNAL: Strategic Migration from Apache Druid to Apache Pinot. This case study covers the technical and business motivations behind 7SIGNAL's migration from Apache Druid to Apache Pinot for network observability and performance monitoring.
- Life360: Scaling Family Safety with Real-Time Geospatial Analytics. Life360 uses Apache Pinot to deliver precise, real-time geospatial insights at scale, powering its family safety features.
These examples demonstrate the critical importance of real-time analytics platforms in enabling complex marketplaces and data-driven enterprises to scale effectively, optimize utilization, and deliver superior customer experiences—challenges that Uber addresses through its sophisticated hybrid marketplace and real-time data infrastructure.
Future Outlook
The 2025 Real-Time Analytics Summit painted an inspiring vision: a future where real-time analytics is no longer constrained by human interpretation but accelerated by AI-powered agents capable of making decisions at machine speed. With platforms like Apache Pinot and StarTree leading the way, and an industry-wide architectural shift underway, the race for AI-driven analytics is on — and the finish line promises transformative impact for businesses worldwide.
Learn more and watch the replay of RTA Summit 2025 here.