SUMMARY
Interview with Madan Thangavelu, Sr. Director of Engineering for Uber Rider App
The platform demonstrates how enterprise applications can scale beyond traditional data analytics to handle complex real-time operations while maintaining consistency and reliability at global scale
Uber’s Rider app has evolved from basic ride-hailing to a multi-service platform handling everything from package delivery to train bookings, demonstrating how data-driven applications can expand beyond traditional boundaries
The app’s unique architecture separates real-time operational data from presentation layers, allowing it to handle a million events per second while maintaining consistency across millions of devices globally
Core innovations include the “Riblets” pattern, which replaced a traditional MVC architecture and enables multiple teams to independently develop and deploy features that share the same screen real estate without conflicts
Uber’s approach to data management bridges transactional and analytical systems by using pre-computed ML models combined with real-time data, while maintaining state consistency across rider and driver applications
Backend services are structured in layers with core transaction logic isolated in base services, freeing upper layers from complex distributed system concerns while maintaining high throughput and reliability
Looking ahead, Uber envisions moving beyond pre-aggregated data and pre-trained models to leverage historical data more directly in real-time, enabling richer, more contextual user experiences
System observability and monitoring have evolved into sophisticated autonomous systems that can detect anomalies across thousands of microservices and identify potential issues within seconds
Transcript
Introduction and background of Uber’s Rider app development
George Gilbert Madan Thangavelu runs engineering for the flagship Rider app at Uber. Over 10 years, he’s been a key part of expanding it beyond just rides. In this interview, he describes how developers can take data-driven applications far beyond analytics and the modern data stack.
Madan, give us a little of your background, how you got to Uber and how you came to be responsible for the Rider app.
Overview of Uber’s growth and app evolution
Madan Thangavelu I’ve been at Uber for almost 10 years now, so it’s been a long journey. When I started, it was a very tiny startup, and it has definitely grown to running millions of app instances on devices across the world.
I run our Rider engineering team, which is responsible for the flagship app that you download. And over the last couple of years we’ve really transitioned from an app that’s only an A-to-B travel app to an app that gives you much more: package delivery, renting a car, and now we can even do train bookings. The app complexity has definitely grown over the last few years.
Exploring separation of concerns in Uber’s architecture
George Gilbert Okay, great. And that’s what we want to explore. Now, as I’ve said to you offline, a lot of our audience is really familiar with lake houses and the modern data stack, but they’re moving towards being able to build apps like Uber and what you’ve done. So one of the first things I was hoping we could go into is how Uber chose this separation of concerns between what’s in the Rider app and what’s in the back-end platform.
Real-time data management and app architecture
Madan Thangavelu That’s a very interesting question, and it actually sets the Uber app slightly apart from other apps. Typically, a lot of apps are ones where the user interacts with static content, sifting through their friends’ posts or whatnot. It’s very high-scale.
But in the case of the Uber app, that data is very real-time. Things happen on your app that somebody else changed: the driver can cancel, or your order is now ready, and all those updates are showing up on your app even though they originated somewhere else entirely. That real-time nature is what sets the Uber app apart.
And to your question about how we separate the data, I think to start with, we need to think about what data the user can create, which lives in your app: the location, your position, your interest, where you want to go, what you’re tapping. And then there are things on the server, data the server knows before you know, which is, “All right, there are no cars in this neighborhood, or this driver canceled, or a driver accepted.”
So from an interaction standpoint, we definitely try to have the system that has the data be the one that initiates. A lot of the data pushes come from the server, and the app pushes data up as the user makes these things happen. And from a separation-of-concerns perspective, we definitely look at where all these things have to come together, meaning the Uber Driver app and the Rider app cannot independently operate on their own data layers and microservices. They all ultimately have to converge, because they have to meet together at the same time.
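To make the two directions of data flow concrete, here is a minimal Go sketch, using hypothetical type names rather than Uber’s actual protocol: client-originated data goes up, and server-initiated pushes (state the backend learns first) come down and are applied to the device’s local view.

```go
package main

import "fmt"

// RiderUpdate is data only the device knows first: location, taps, destination.
type RiderUpdate struct {
	RiderID string
	Lat     float64
	Lng     float64
	Intent  string // e.g. "request_ride", "cancel"
}

// ServerPush is state the backend learns first: driver cancelled, order ready.
type ServerPush struct {
	Kind    string // e.g. "driver_cancelled", "order_ready"
	TripID  string
	Payload map[string]string
}

// applyPush is the app-side handler: whichever side owns the data initiates,
// and the other side only reacts to keep its local view consistent.
func applyPush(localTripState map[string]string, p ServerPush) {
	switch p.Kind {
	case "driver_cancelled":
		localTripState["status"] = "cancelled"
	case "order_ready":
		localTripState["status"] = "ready_for_pickup"
	}
}

func main() {
	state := map[string]string{"status": "en_route"}
	applyPush(state, ServerPush{Kind: "driver_cancelled", TripID: "t-123"})
	fmt.Println(state["status"]) // cancelled
}
```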
Understanding real-time system synchronization
George Gilbert So it sounds like because of the real-time nature of the whole system, not just the Rider app, but the Driver app and the fact that you’re matching real-time activities, that the separation of concerns was very much driven by making sure everyone is up-to-date in real-time.
Confirming real-time data convergence
Madan Thangavelu Exactly, yeah. And all your offline data has to eventually also combine together on that single thread where the interaction happens, which could be a trip or something else.
Common entities and backend platform integration
George Gilbert Okay. So let’s talk about some of those entities, like a trip, like a fare. Some of these things, I assume, are common to both, say, the Rider app and the Driver app. If they’re common, is that then something that gets put into a shared backend platform, which is then responsible for propagating the real-time updates to the different apps?
Domain architecture and service layers
Madan Thangavelu Definitely. I think there are some core entities like you mentioned, the fulfillment order and the state and what’s happening, then there are systems around fares. We say, “Okay, who pays who and for what?” There are pricing systems that determine how much to charge.
So from a layer-abstraction perspective, these are fundamental microservice domains, so to speak. There are domains which have multiple internal services, but they all represent fares; there are multiple services that represent all the fulfillment states, which combine the Rider states, but that’s a domain by itself.
So these are very core domains which sit at the lowest layer. Then at the app level, very similarly, you build libraries that are common and shared: “Hey, these are the libraries that serve your fares and order states.” And then you build your application code on top of those libraries. So the domain that holds the true state in the backend and the library that’s on the app track each other to make sure it’s the same view for all parties involved, and there’s application code on top.
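A minimal sketch of what this layering can look like, under assumed names rather than Uber’s actual libraries: a core domain service owns the true state, and a shared library is what upper layers and app code program against.

```go
package main

import (
	"context"
	"fmt"
)

// OrderState is part of the core fulfillment domain, the single source of truth.
type OrderState string

const (
	OrderRequested OrderState = "requested"
	OrderMatched   OrderState = "matched"
	OrderCompleted OrderState = "completed"
)

// FulfillmentDomain is the lowest-layer service that owns the true state.
type FulfillmentDomain interface {
	GetOrder(ctx context.Context, orderID string) (OrderState, error)
}

// OrderLibrary is the shared library the app and upper layers program against:
// callers never touch the domain directly, and the library tracks the backend
// so every party sees the same view.
type OrderLibrary struct {
	domain FulfillmentDomain
	cache  map[string]OrderState
}

func (l *OrderLibrary) Current(ctx context.Context, orderID string) (OrderState, error) {
	if s, ok := l.cache[orderID]; ok {
		return s, nil
	}
	s, err := l.domain.GetOrder(ctx, orderID)
	if err != nil {
		return "", err
	}
	l.cache[orderID] = s
	return s, nil
}

// fakeDomain stands in for the real fulfillment service in this sketch.
type fakeDomain struct{}

func (fakeDomain) GetOrder(ctx context.Context, id string) (OrderState, error) {
	return OrderMatched, nil
}

func main() {
	lib := &OrderLibrary{domain: fakeDomain{}, cache: map[string]OrderState{}}
	state, _ := lib.Current(context.Background(), "order-1")
	fmt.Println(state) // matched
}
```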
Understanding app logic and microservices
George Gilbert So would it be fair to say that even if it’s a backend microservice, it could be part of the Rider app? In other words, we shouldn’t think of the Rider app just as presentation. There’s logic in there?
Integration of app logic and messaging
Madan Thangavelu 100%. When a driver cancels, a push has to be sent to the Rider to let them know the cancellation has happened. And then when you want to do analysis after that event, the event from the earner or Driver app has to line up with the fact that a message was delivered to the Rider, so you can do BI or analysis or ML, and all of that has to be stitched together.
Microservices implementation and technology stack
George Gilbert Okay. So let’s just drop in for a sec to how you implement some of this business process logic. In microservices, can you implement this in different languages, different technologies, and that’s all just hidden from the consumer of the microservice?
Standardization and transaction management
Madan Thangavelu Yeah. To do that, there’s some standardization, at least from the language perspective: we have libraries in Golang and Java. And for these microservices at the bottom that are handling the data and the transactions, those capabilities are not handed off to consumers as a transaction library.
So the consumer does not say, “I want transactions across these things.” They call very standard, crystallized APIs saying, “I want to place an order and these are my order parameters.” Now, in the deepest system within that microservice, we implement the logic to initiate the transaction, change all these entities, and then close the transaction, making sure everything is consistent, and we do some distributed transactions as well with entities that are not part of the same RDBMS.
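A hedged Go sketch of that idea, with hypothetical table names and API rather than Uber’s actual service: the caller supplies crystallized order parameters, and the base service owns the transaction boundary internally.

```go
package orders

import (
	"context"
	"database/sql"
)

// PlaceOrderParams is the crystallized API surface: callers describe the order,
// they never ask for "a transaction across these tables".
type PlaceOrderParams struct {
	RiderID   string
	ProductID string
	FareQuote string
}

type OrderService struct {
	db *sql.DB
}

// PlaceOrder owns the transaction boundary. Upper layers stay effectively
// stateless; rollback and consistency live here at the base layer.
func (s *OrderService) PlaceOrder(ctx context.Context, p PlaceOrderParams) (orderID string, err error) {
	tx, err := s.db.BeginTx(ctx, nil)
	if err != nil {
		return "", err
	}
	defer func() {
		if err != nil {
			tx.Rollback() // any failure undoes every entity touched below
		}
	}()

	if err = tx.QueryRowContext(ctx,
		`INSERT INTO orders (rider_id, product_id) VALUES ($1, $2) RETURNING id`,
		p.RiderID, p.ProductID).Scan(&orderID); err != nil {
		return "", err
	}
	if _, err = tx.ExecContext(ctx,
		`INSERT INTO fare_holds (order_id, quote) VALUES ($1, $2)`,
		orderID, p.FareQuote); err != nil {
		return "", err
	}
	if err = tx.Commit(); err != nil {
		return "", err
	}
	return orderID, nil
}
```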
Transaction logic in modern data systems
George Gilbert So this is interesting. And the reason I want to touch on this is, again, our audience has grown up on mostly analytics, and for most of them, all the operational workloads are upstream, but the two are coming together. So what you’re saying is you’re trying to relieve the application developer who’s calling the logic of the transaction semantics and all the nuts and bolts of the transaction, and all that transaction logic is implemented in this base-level shared service.
Transaction management and error handling
Madan Thangavelu Correct, correct. And I think if you keep the transaction logic, the onus of storing the data and keeping it accurate, and the onus of emitting your automatic BI events at the lowest level, then the layers of microservices sitting above, and even the app itself, can just assume they’re mostly stateless, interacting with this common thing that makes sure things are not cross-wired.
Base-level transaction services
George Gilbert So you have to think carefully about the base transactional services, and then you’re essentially insulating the upper level from having to worry about transaction logic? They just say, “Do this,” and all the underlying logic, and therefore any errors or retries or compensation, lives in the lower level?
Error handling and user experience
Madan Thangavelu Yes. So the lowest level has to inform what the user can do. As an example, when you place an order, let’s say that you are already on a trip and we don’t allow you to take another trip. The fact that you should be prevented lives at the lowest layer. Or you’re trying to cancel a trip you’re not on; that would also be at the lowest layer.
But that lowest layer has to inform the layers above, to say, “Okay, this is the reason we couldn’t take your trip, or we had to cancel.” Then that needs to propagate back to the app, which has to show a view that corresponds to that.
So in a case like a payment issue, say you chose the wrong card and it has expired. The app has to decide, “Okay, for this error, I need to show you a pop-up that says, ‘Do you want to switch your payment?’” Instead of just dropping the error, the app has to understand the error semantics, but it doesn’t have to worry that something wrong will go through the system, because that guardrail exists at the lowest part of the stack.
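A minimal Go sketch, with invented error codes, of how structured error semantics from the lowest layer can map to a UI decision in the app.

```go
package main

import (
	"errors"
	"fmt"
)

// OrderError is the structured error the lowest layer returns. The guardrail
// (e.g. "already on a trip", "card expired") is enforced down there; upper
// layers only interpret the code.
type OrderError struct {
	Code    string // e.g. "ALREADY_ON_TRIP", "PAYMENT_EXPIRED"
	Message string
}

func (e *OrderError) Error() string { return e.Code + ": " + e.Message }

// uiActionFor is app-side: map error semantics to a view, not to raw text.
func uiActionFor(err error) string {
	var oe *OrderError
	if errors.As(err, &oe) {
		switch oe.Code {
		case "PAYMENT_EXPIRED":
			return "show_switch_payment_sheet"
		case "ALREADY_ON_TRIP":
			return "show_active_trip_banner"
		}
	}
	return "show_generic_retry"
}

func main() {
	err := &OrderError{Code: "PAYMENT_EXPIRED", Message: "card has expired"}
	fmt.Println(uiActionFor(err)) // show_switch_payment_sheet
}
```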
Understanding transaction states
George Gilbert So in other words, whoever’s calling the transaction just has to understand the different states: it could go wrong, and this is what I present if it goes wrong.
Confirmation of state handling
Madan Thangavelu Exactly.
Platform and app architecture
George Gilbert Okay. So then would it be fair to say that the Rider app includes some upper-level microservices, and it just calls on the lower-level shared services, and that’s the platform? We shouldn’t think of the Rider app as just this nice UI; the Rider app is Rider logic interacting with shared logic, where the shared logic could be things the Driver app also has to call on?
Shared logic implementation
Madan Thangavelu Yeah, we definitely talk about this shared logic being in the backend systems, but there’s also shared logic at the app layer, the foundation layer. There are libraries that’ll do your authentication or your accounts page. If you go to the accounts page in the primary Uber app, and you go to the accounts page in the Uber Eats app, they will look exactly the same, because we’ve built, again, layers of common platform in the app code, in the mobile code itself, that can serve these functionalities.
Payment profiles, again, are very similar: you go to your Eats app, you go to your Rider app, they all have the same experience. So we’ve done a lot of app-level platformization to share code, and we’ve done a lot of back-end platform-level standardization to present these transactions in a single place.
Historical and transactional systems integration
George Gilbert Okay. So for the part of the audience that’s used to dealing with analytic databases like a data warehouse or a data lake, what are some foundational services that have to call on a historical system of truth for context, but also have to execute a transaction? What would be some examples, and how do you bridge those?
AI and ML integration examples
Madan Thangavelu I’ll maybe touch on two, and maybe a third example that’s very interesting to me. So let’s talk about an AI and ML use case specifically. Let’s say you’re on the app and we want to show you a card that says, “Right now there is a lot of demand, so you should probably pick this other product,” or some recommendation like that.
Now the request will come over this common system that’s going to do the matching and dispatching. But we also need to make an inference about whether we need to show that card. So there’s data in these transactional systems that can tell you whether you can have an order. Then there’s an ML system that has this offline-trained model, which now gets combined with the data you’ve got here in real-time.
And then you make the call whether you show that card or not. And you don’t have to show it every time. You may have to show it sometimes when the user is sensitive to that information. So that’s where the ML piece comes in. It’s not static and dry. You’re making choices on whether to show it or not.
So imagine in this situation, the way Uber built some of these ML inferences is that the request itself is not carrying all the parameters, because if you allow for that, you can make a mistake; you may not have all the historic data. So all of that has to be pre-computed, and kept fresh in near real-time, in these offline systems, so to speak, which are not your primary live hot database.
Real-time data processing and ML integration
Madan Thangavelu But you need to take the features from the request and push them into this ML inference system, which has the near real-time, pre-computed data from your Spark or other pipelines that keep it fresh. So it is going to join your newest feature that just came in with the newest feature that was just computed offline, and create your input vector. That is what goes into the model that gives you the inference.
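A rough Go sketch of that join, assuming a hypothetical feature store and model interface: live request features are combined with near-real-time pre-computed features into one input vector before inference.

```go
package main

import (
	"context"
	"fmt"
)

// RequestFeatures arrive on the live request path.
type RequestFeatures struct {
	RiderID  string
	Lat, Lng float64
	SurgeNow float64
}

// FeatureStore serves near-real-time, pre-computed features (e.g. kept fresh
// by Spark or streaming pipelines), separate from the hot transactional DB.
type FeatureStore interface {
	RiderFeatures(ctx context.Context, riderID string) (tripsLast7d, everReserved float64, err error)
}

// Model scores an input vector; the training happened offline.
type Model interface {
	Score(vector []float64) float64
}

// shouldShowCard joins the freshest request features with the freshest
// pre-computed features, builds the input vector, and thresholds the score.
func shouldShowCard(ctx context.Context, fs FeatureStore, m Model, rf RequestFeatures) (bool, error) {
	trips7d, everReserved, err := fs.RiderFeatures(ctx, rf.RiderID)
	if err != nil {
		return false, err // fail closed: no card rather than a wrong card
	}
	vector := []float64{rf.Lat, rf.Lng, rf.SurgeNow, trips7d, everReserved}
	return m.Score(vector) > 0.7, nil
}

// fakeStore and fakeModel stand in for real systems in this sketch.
type fakeStore struct{}

func (fakeStore) RiderFeatures(ctx context.Context, id string) (float64, float64, error) {
	return 4, 0, nil
}

type fakeModel struct{}

func (fakeModel) Score(v []float64) float64 { return 0.9 }

func main() {
	show, _ := shouldShowCard(context.Background(), fakeStore{}, fakeModel{},
		RequestFeatures{RiderID: "r1", Lat: 37.77, Lng: -122.42, SurgeNow: 1.4})
	fmt.Println(show) // true
}
```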
Pattern recognition and implementation
George Gilbert That sounds like a pretty universal pattern that a lot of microservices would use. I assume that’s the context it’s used in. What are some examples?
Clarifying examples needed
Madan Thangavelu So when you say examples, are you thinking about specific product use cases, or are you thinking about something different?
Product use case examples request
George Gilbert Yeah, no, product use cases. So someone in the audience says, “Oh, that’s how this part of the app works.” Where it’s informed both by the real-time context and the near-real-time data you were talking about, and maybe it’s personalizing the feed or something like that.
Real-time personalization examples
Madan Thangavelu Absolutely. Yeah, if you open the app, we instantly… For example, typically there is a button that says, “Okay, where do you want to go?” But right underneath it, let’s say you have a promotion, or your account is starting to expire, or you’ve never taken a Reserve trip and we want to promote reservations as a capability.
So just at the moment when the home screen loads, we get your app lat-long and we send it to this backend system. That backend system is going to have just these two or three parameters, then it’s going to call out to the ML system. The ML system is going to take a lot of your parameters that are near real-time: how many trips have you taken? Have you taken a trip in the last seven days? Have you ever taken a reservation as a trip?
So it has to capture all of that in real-time, and then infer whether we want to show you this card that is trying to engage you with this concept of reserving a trip. Now let’s say you do that and you end up taking the trip. You get dropped off, and the next second you open the app and you want to do it again. We don’t want to show you that reservation card, because you just did it, which means all the data has to catch up so that card doesn’t show up and look stale to you.
Database architecture and data management
George Gilbert All right. So this is a great example. So under the covers… Now I’m thinking at the database level. So the most recent state of the trips, where do you keep the history? And then what does the logic look like that stitches all that together and presents the right feed, the right personalized view?
Data stitching and tracing systems
Madan Thangavelu I think that stitching is very critical. So there are a few different techniques we use. One is when the request originates. This is very standard in the distributed systems backend community: you start adding a trace header to every call that happens, and then every system it interacts with has that trace ID.
And the second thing we do, in a customized way, is create a session UUID for a user that is specific to the duration for which you use the app, say 20 or 30 minutes. Now these IDs are automatically propagated to all microservices. They’re automatically propagated to all logs and backend systems, and to these offline and near real-time systems.
And then when you want to essentially combine and get a view of all these things, then these trace IDs and these session IDs will help you combine these data together to put together the big picture of, “Okay, for this session and trace, what happened on their driver’s side, what happened on the rider’s side?”
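A small Go sketch, using assumed header names, of how a trace ID and session UUID can ride along on the request context and be copied onto every outgoing call, so downstream logs and events can be joined back to the same session.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
)

type ctxKey string

const (
	traceIDKey   ctxKey = "trace_id"
	sessionIDKey ctxKey = "session_id"
)

// withIDs attaches the per-request trace ID and the per-app-session UUID so
// every downstream call, log line, and emitted event carries both.
func withIDs(ctx context.Context, traceID, sessionID string) context.Context {
	ctx = context.WithValue(ctx, traceIDKey, traceID)
	return context.WithValue(ctx, sessionIDKey, sessionID)
}

func idFrom(ctx context.Context, key ctxKey) string {
	if v, ok := ctx.Value(key).(string); ok {
		return v
	}
	return "unknown"
}

// propagate copies the IDs onto an outgoing request, so the next microservice
// (and its logs and emitted events) can be joined back to the same session.
func propagate(ctx context.Context, req *http.Request) {
	req.Header.Set("x-trace-id", idFrom(ctx, traceIDKey))
	req.Header.Set("x-session-id", idFrom(ctx, sessionIDKey))
}

func main() {
	ctx := withIDs(context.Background(), "trace-abc", "session-123")
	req, _ := http.NewRequestWithContext(ctx, http.MethodGet, "http://example.internal/trips", nil)
	propagate(ctx, req)
	fmt.Println(req.Header.Get("x-trace-id"), req.Header.Get("x-session-id"))
}
```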
Database infrastructure and streaming systems
George Gilbert And so underneath, I’m just thinking at the plumbing level, are you using Spanner as your transactional database? Are you using a streaming system to get the user telemetry, and then the historical context is in some lakehouse?
Technical infrastructure implementation
Madan Thangavelu Yeah, that’s a great question. So the way we do that, you’re right, the trip state machine is entirely backed by Spanner. Now, in order to keep a copy of this historical context or offline analysis, at the nuts and bolts level, we have built frameworks, think of them as state machines.
And what we have done, at the framework level, at the state machines, is create ways to emit events and metrics. So an individual developer is not trying to keep these offline systems and this historic data in our warehouse and our Hive tables accurate. Instead, automatically as a state transition happens, the framework ends up emitting these events and metadata into Kafka, into your message bus.
And from there we automatically end up indexing that into our raw tables in Hive. We did have a choice at some point to actually listen to events from Spanner. We’ve attempted it and we have not gone there yet. Instead we do it at the framework level; it’s a transaction anyway, and all the data goes through that single choke point, so we’re able to do it there.
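A simplified Go sketch of a state-machine framework that emits an event on every transition, so individual feature developers never have to remember to publish to the bus. The sink here is a hypothetical stand-in for a Kafka producer.

```go
package main

import (
	"fmt"
	"time"
)

// StateEvent is what the framework, not the feature developer, emits on every
// transition. Downstream it would land on the message bus and be indexed into
// raw warehouse tables.
type StateEvent struct {
	EntityID  string
	From, To  string
	At        time.Time
	TraceID   string
	SessionID string
}

// EventSink abstracts the bus; a real implementation would publish to Kafka.
type EventSink interface {
	Emit(e StateEvent)
}

// StateMachine is the framework choke point: every state change flows through
// Transition, so emission happens exactly once per change, automatically.
type StateMachine struct {
	id      string
	current string
	allowed map[string][]string
	sink    EventSink
}

func (m *StateMachine) Transition(to, traceID, sessionID string) error {
	for _, next := range m.allowed[m.current] {
		if next == to {
			e := StateEvent{EntityID: m.id, From: m.current, To: to,
				At: time.Now(), TraceID: traceID, SessionID: sessionID}
			m.current = to
			m.sink.Emit(e)
			return nil
		}
	}
	return fmt.Errorf("illegal transition %s -> %s", m.current, to)
}

type printSink struct{}

func (printSink) Emit(e StateEvent) { fmt.Printf("emit %+v\n", e) }

func main() {
	trip := &StateMachine{
		id:      "trip-1",
		current: "requested",
		allowed: map[string][]string{"requested": {"matched"}, "matched": {"completed"}},
		sink:    printSink{},
	}
	_ = trip.Transition("matched", "trace-abc", "session-123")
}
```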
Logic and data system integration
George Gilbert Okay. So it’s that logic that brings all those data systems together and is responsible for creating a coherent view?
System coherence confirmation
Madan Thangavelu Correct.
Transaction orchestration discussion
George Gilbert Okay. So we’ve talked about how you’re abstracting the transaction nuts and bolts by putting them into the responsibility of a low-level microservice. When you need to orchestrate multiple transactions, typically people think of that as the responsibility of a workflow orchestrator. Is that then just fixed logic in a microservice that calls multiple other transactions?
Workflow and transaction management
Madan Thangavelu Yeah, so typically people do these in workflows. Workflows are not a very well-suited paradigm for a high-throughput, real-time system like this. That’s where we had to hand-roll our own ways of doing these transactions, and we have a few versions and generations of this tech.
So I’ll describe the first version of it and the more recent version. In some of the first versions, we would essentially have a number of workers hold all the data in memory, and we would operate by routing requests to the right worker, which has a kind of queue in memory. Which means once everything is serialized, your transactions are easy, because when these three parties have to change the same thing, in memory you just have one buffer and you’re changing it in the sequence the requests came in.
That used to be the case about two or three years back. Over time, with Spanner and the abilities it has provided, we’ve actually pushed some of the serialization further into the database. But once you want to do a transaction across another system that has its own database, we’ve created this concept of a two-phase commit even across… This is very typical in database systems, and we were able to implement some of it at the application layer, where these two services can operate and do the transaction rollback by implementing similar API patterns across services.
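A minimal application-layer two-phase-commit sketch in Go, with invented interfaces rather than Uber’s internal APIs: each service exposes prepare, commit, and abort, and a coordinator drives them.

```go
package main

import (
	"context"
	"fmt"
)

// Participant is the API pattern each service implements so a coordinator can
// run a two-phase commit at the application layer, across separate databases.
type Participant interface {
	Prepare(ctx context.Context, txID string) error // validate and hold resources
	Commit(ctx context.Context, txID string) error  // make the change durable
	Abort(ctx context.Context, txID string) error   // release everything held in Prepare
}

// runTwoPhase prepares every participant; if any prepare fails, it aborts the
// ones that had already prepared, otherwise it commits them all.
func runTwoPhase(ctx context.Context, txID string, parts ...Participant) error {
	prepared := make([]Participant, 0, len(parts))
	for _, p := range parts {
		if err := p.Prepare(ctx, txID); err != nil {
			for _, q := range prepared {
				_ = q.Abort(ctx, txID) // best-effort rollback
			}
			return fmt.Errorf("prepare failed: %w", err)
		}
		prepared = append(prepared, p)
	}
	for _, p := range prepared {
		if err := p.Commit(ctx, txID); err != nil {
			// a real coordinator would retry commits until they succeed
			return fmt.Errorf("commit failed: %w", err)
		}
	}
	return nil
}

type okService struct{ name string }

func (s okService) Prepare(ctx context.Context, id string) error { fmt.Println(s.name, "prepared", id); return nil }
func (s okService) Commit(ctx context.Context, id string) error  { fmt.Println(s.name, "committed", id); return nil }
func (s okService) Abort(ctx context.Context, id string) error   { fmt.Println(s.name, "aborted", id); return nil }

func main() {
	if err := runTwoPhase(context.Background(), "tx-42", okService{"fares"}, okService{"fulfillment"}); err != nil {
		fmt.Println("rolled back:", err)
	}
}
```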
Evolution of the Rider app
George Gilbert Okay. This is all interesting, because these are things that more mainstream developers are going to have to face, and hopefully the rising tide of abstractions will make it easier. So let’s dive into the Rider app more directly. Now, I know you and Uber have published a lot about how this app evolves very quickly, and you talked a little bit at the beginning about how it started out as just matching a rider with a driver, and now there’s a whole range of services. So talk about how it’s grown over time in what you can do, and then how that had to be reflected in how you built the app.
App architecture evolution and componentization
Madan Thangavelu Definitely. I think about five to six years back, we did some major re-architecting of the app. Prior to that, it used very standard MVC development patterns, which are very common: you have your screens, you interact with them, you have a model, and you have logic that determines what to show in the view.
But as the number of engineers grows and as the feature set grows, it is extremely hard to keep up. So as an example, when you land on the homepage, one team, from safety, wants to pop up and say, “Hey, you’re a verified driver.” Another team wants to say, “Your credit card is expiring.” Another team wants to say, “I want to sell you a reservation as a trip and engage you in that.” And then the user wants to take a trip. So you can see many things can happen in a single small piece of real estate.
Now if you do just MVC, then all these developers in different parts of the company, thousands of them across regions and time zones, are all touching the same parts of the code, the same files. So a fundamental shift had to happen toward frameworks that allow people to operate independently.
So the mental model I would draw a parallel to is: instead of having a screen, then clicking and going to another screen, Uber created new patterns. And I’m not saying frameworks per se; I’m starting with patterns, because you want to compartmentalize each of these. The safety team that wants to pop something up should write its code separately. Somebody else wanting to show another card should write theirs separately, but ultimately they all come together as a single UI.
And my mental model is think of it as a single page application where different pieces say, “I want to show this on the view.” Then there’s a coordinator that ultimately takes it and determines the actual rendering and view.
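Uber’s actual RIBs/Riblets framework lives in the mobile codebases (Swift, Kotlin, Java); purely to illustrate the compositional idea, here is a sketch in Go, where each team’s component contributes its own card and a coordinator composes the single screen.

```go
package main

import "fmt"

// Card is what one component contributes to the shared screen real estate.
type Card struct {
	Owner    string // which team/component produced it
	Priority int    // the coordinator decides placement, not the component
	View     string // stand-in for a real view description
}

// Component is the unit each team owns end to end: it fetches its own data
// and decides what (if anything) it wants to show. Teams never touch each
// other's files.
type Component interface {
	Cards() []Card
}

type safetyComponent struct{}

func (safetyComponent) Cards() []Card {
	return []Card{{Owner: "safety", Priority: 1, View: "Verification complete"}}
}

type paymentsComponent struct{ cardExpiring bool }

func (p paymentsComponent) Cards() []Card {
	if !p.cardExpiring {
		return nil
	}
	return []Card{{Owner: "payments", Priority: 2, View: "Your credit card is expiring"}}
}

// composeHome is the coordinator: it gathers every component's contribution
// and determines the actual rendering for the single home screen.
func composeHome(components ...Component) []Card {
	var screen []Card
	for _, c := range components {
		screen = append(screen, c.Cards()...)
	}
	// a real coordinator would sort by priority and resolve conflicts here
	return screen
}

func main() {
	for _, card := range composeHome(safetyComponent{}, paymentsComponent{cardExpiring: true}) {
		fmt.Printf("[%s] %s\n", card.Owner, card.View)
	}
}
```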
Understanding component architecture
George Gilbert So to be clear, it’s sort of like the application you could think of as a screen, and the components of the app are responsible for elements of the screen real estate. And then this Riblet… I don’t know if framework is the right word, is what composes everything and makes it work together. And that then allows the Rider app engineering team to work on features rather independently, almost like they’re microservices?
Component reusability
Madan Thangavelu Exactly, exactly. And the good part about it is you can take some of those ribs that have been built in the Rider app, remember I talked a little bit about platformization. The idea is that because your component gets its data from the backend and your component tells the view what to render for just your piece, we can take parts of it and put them into the Eats app, or take parts of the Driver app. So your logins are not rebuilt; they are like microservices being reused somewhere else.
Component flexibility discussion
George Gilbert Oh, so they can be composed differently?
Confirmation of component modularity
Madan Thangavelu Absolutely. They are logic with a presentation that are reusable.
Component structure clarification
Madan Thangavelu To give you an idea, let’s say you’re on home, that’s a home rib, and within home you can have, say, two buttons. Each button could be its own rib and node, which gets its own data. Which means if I want to move a button somewhere, I just yank that part of the code and module and stick it somewhere else.
Architecture integration questions
George Gilbert All right, let’s try and start tying some of this together. So would these ribs, like components within the presentation part of the Rider app, would these ribs have their own logic independent of some of the microservices that the rider app manages on the backend?
Component and domain model relationships
Madan Thangavelu Yes, I think that’s the art and science of it, because you cannot expose your backend transactional models or business models as-is. So a very typical pattern is that people will need some presentation models which speak very directly to the UI layout, and then there are these domain models which represent the deep core entities that are almost invariant.
And there’s a case to be made where you can build layers above these microservices to finally say, “Everything goes through presentation.” Which means every domain model is translated and only the presentation is exposed. And Uber did some of this with layering, where we have a presentation layer, we have a mid-tier, and we have these core models.
But ultimately where we arrived is that we can’t only do presentation, because remember the interaction in a marketplace is two-sided, three-sided, or sometimes four-sided, which means all those entities have to be sent to a single app regardless of whether that app owns that entity or not. So the Rider app doesn’t own only the rider entity; driver data needs to come too.
So where we are is a blend: there are presentation models, and then there are entity models that are actually exposed directly in the app, but the app is very careful not to modify those. It uses them as-is, in a read pattern, but it is free to do the UI rendering in whichever shape or form it deems fit.
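A small sketch of that split, with made-up fields: a read-only domain entity coming from the backend, and a presentation model the app derives from it for one particular view.

```go
package main

import "fmt"

// TripEntity is a core domain model. The app receives it (it may describe the
// driver's side of the marketplace, not just the rider's) and treats it as
// read-only.
type TripEntity struct {
	ID         string
	State      string
	DriverName string
	EtaSeconds int
}

// TripCardViewModel is a presentation model: shaped for the UI layout, free to
// change as the screen changes, derived from the entity rather than replacing it.
type TripCardViewModel struct {
	Title    string
	Subtitle string
}

func presentTrip(t TripEntity) TripCardViewModel {
	return TripCardViewModel{
		Title:    fmt.Sprintf("%s is on the way", t.DriverName),
		Subtitle: fmt.Sprintf("Arriving in %d min", t.EtaSeconds/60),
	}
}

func main() {
	vm := presentTrip(TripEntity{ID: "t-1", State: "en_route", DriverName: "Asha", EtaSeconds: 240})
	fmt.Println(vm.Title, "-", vm.Subtitle)
}
```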
Future direction and observability
George Gilbert Okay. So yeah, part of what is so unique is that this is a real-time system. And so tell us more, then, about where you could take this in the future. It started out as, “Get me a ride,” but it’s more and more services. What has to be in the architecture to allow that sort of adaptability and what are some of the things that we should think about in the future that we might be able to call on?
Future capabilities and data handling
Madan Thangavelu I’ll touch on two things. One is the interesting concept of observability when you have such high-cardinality services in an app, and how data relates to it. It’s very important. And second, I’ll talk a little bit about app UX architecture, which starts to flex because there are so many use cases.
In the app, we do a lot of things, in the sense that in some regions of the world the Uber app might actually work on negotiation of price. So today in the US you would look at a price, you press a button, you get it. In some places in the world you might press it, a driver can counter with a price, and you might accept.
In some cases you might be able to send a package, so now you’re having to enter a PIN to verify. The complexity of the different A-to-B movement use cases is quite a lot: there are 80 that I can spell out from memory, and obviously there are more.
Data analysis and system monitoring
George Gilbert You mentioned that there’s an enormous data volume. It sounds like, I think you said a million events a second from the global system. So this is no longer like someone in the back room trying to manage the system for say performance tuning sometime every few weeks. This is how do you keep a real-time system live and responsive? And I assume that it’s almost like a whole different set of applications themselves.
Real-time monitoring and response systems
Madan Thangavelu 100%, yeah. It’s not purely about “sync this volume of data somewhere.” You have to react to the data that’s coming, which means you have to have it in near real-time, and that’s how you determine whether the app is performing accurately in all these dimensions, everywhere.
Even getting that data from the phone to the backend is challenging, because as a user puts the app in the background or brings it to the foreground, different things might happen, and you might have data loss. So obviously there’s that. Then once it comes in, you have, say, five seconds to react to it and build a point of view on whether that stream of events represents good functioning or bad functioning of the app for a particular flow, because nobody’s sitting there watching this monitoring.
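A toy Go sketch of the kind of automated check he describes: compute a per-flow error rate over a short window of events and flag flows that cross a threshold, with no human watching a dashboard. The flow names and threshold are illustrative, not Uber’s.

```go
package main

import "fmt"

// FlowEvent is one app-health signal (e.g. "request_ride" succeeded or failed)
// arriving on the near-real-time stream.
type FlowEvent struct {
	Flow string
	OK   bool
}

// windowHealth builds a per-flow error rate over a short window (say the last
// five seconds of events) and flags flows that cross a threshold.
func windowHealth(events []FlowEvent, threshold float64) []string {
	total := map[string]int{}
	failed := map[string]int{}
	for _, e := range events {
		total[e.Flow]++
		if !e.OK {
			failed[e.Flow]++
		}
	}
	var anomalies []string
	for flow, n := range total {
		if float64(failed[flow])/float64(n) > threshold {
			anomalies = append(anomalies, flow)
		}
	}
	return anomalies
}

func main() {
	window := []FlowEvent{
		{"request_ride", true}, {"request_ride", false}, {"request_ride", false},
		{"reserve_trip", true}, {"reserve_trip", true},
	}
	fmt.Println(windowHealth(window, 0.5)) // [request_ride]
}
```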
Final thoughts on future development
George Gilbert Madan, anything that you think we should impart to the viewers, again, who are coming from the modern data stack, but who aspire to build apps, like what you’ve built? Any parting thoughts?
Vision for future app development
Madan Thangavelu The bridge between these systems is closing more and more, and what I foresee is being able to build app experiences that are only possible by referencing large amounts of data in real-time. I think that hasn’t fully happened yet.
We get the best of it today through some level of pre-aggregation or some level of ML model generation, which creates this pseudo facade that makes it look like the system understands your past and present and is looking up all the data. But I do expect that as we bridge that gap of having access to historic data, large volumes of data, it will start to play into app experiences in a more direct way rather than through an indirection, which is where the systems stand today in the community.
Concluding remarks
George Gilbert That’s actually a fascinating thought, because the ML model is pre-baked with offline data and you feed a little bit in in real-time. But as our systems and infrastructure technology allow, for instance, very large storage-class memory in columnar form, we don’t have to pre-aggregate quite so much. We could do the real-time analysis on much larger datasets, and you’re saying that’s how, in real-time, we can provide context that includes all the history without semi-baking it?
Madan Thangavelu Yeah, exactly.
Interview conclusion
George Gilbert All right. Food for our next conversation. Madan, thanks for joining us today. That was a real treat.
Closing response
Madan Thangavelu Thank you, George. Pleasure to talk to you.