George Gilbert and Savannah Peterson host Madan Thangavelu from Uber on this episode. Madan takes us through the evolution and importance of API gateways, particularly through the lens of Uber’s experience. Madan is the architect of Uber’s API platform and explains how API gateways emerged around 2014 as companies moved from monolithic software to a microservices architecture. The gateway evolved through three major generations: first a routing and orchestration layer, then a configuration-driven system, and finally a sophisticated platform handling multiple protocols and real-time data streams. Toward the end of that evolution, the focus shifted from managing incoming requests (ingress) to also handling outgoing requests (egress), which has become particularly relevant in the age of AI integration, when external models or partners need to be called.
Key points from the interview:
- API gateways initially emerged as a necessity when companies broke down monolithic software into microservices, serving as a central entry point that evolved from simple routing to handling complex functionalities like authentication, authorization, and rate limiting. Each piece required careful coordination to maintain system efficiency.
- The second generation of API gateways at Uber moved from a code-based to a configuration-based approach, allowing for better scalability and management of thousands of APIs without creating another monolith. This transformation reduced complexity while maintaining full functionality.
- Real-time functionality presented unique challenges at Uber, requiring the integration of streaming technologies with API capabilities to handle location updates and driver-rider matching with minimal latency. This implementation demanded precise handling of GPS data and complex synchronization of moving components.
- The emergence of AI services has shifted focus toward egress (outbound) API management, as companies increasingly need to govern and monitor their interactions with external AI services. This includes managing prompts, controlling token usage, and implementing AI governance at the gateway level.
- The future of API gateways points toward a dual role of managing both internal services and external AI interactions, with increased emphasis on business-aware transformations and data augmentation capabilities to support AI operations.
[Savannah Peterson]
Good afternoon everyone, and welcome to a very special podcast edition with your favorite CUBE analyst, George Gilbert and a recent star, Madan. We’re so grateful to have him back from Uber today. He is the architect of the Uber API platform, and if you missed the first episode he did with George, it is full of insights and I am super excited to be learning about all of the trends of the API gateway world and how that applies to our AI future and the lessons we can learn for orchestrating the agentic future that lies ahead of us. So without further ado, George, thank you for curating this convo. I appreciate you.
[George Gilbert]
Okay. Well, Savannah, it’s good to have you join the show. So Madan, let’s set the context. And for those who haven’t been working in the world of microservices, explain why we need API gateways. Set some context for us.
[Madan Thangavelu]
Sure. First of all, George, Savannah, thanks for having me. Super excited to be here. And this is a great topic to discuss, especially with what’s happening in the industry right now, so I’m definitely looking forward to it. So around 2014, companies started realizing that they cannot keep running software as one giant monolithic piece. As companies get larger and there are so many engineers, you have to break down your single piece of software into smaller pieces so that teams can upgrade them and change them over time. And this was a trend called microservices. You may have heard that back in the day it was a big deal. And when that happens, your end user, meaning your web browser or your app, shouldn’t have to worry about which piece of software serves the button you’re clicking or the page you want to view.
So all of that has to streamline into a single entry point into a company, and that entry point became this concept of a gateway to all the small pieces of software hosted behind the scenes. That’s how the gateway as a concept came to be. Initially it was a very slim, pure routing construct, but over the years really important functionality has amassed in this layer called the API gateway, which is really what led to the growth of microservices as a mainstream strategy in the industry.
[Savannah]
Madan, I think that was one of the most digestible explanations of microservices and API gateways I’ve ever heard, so congratulations. I just thought of little Lego pieces lining up to go through a little chute into a central hub. That was <inaudible>. So you’ve been working on this, and Uber’s API gateway in particular, for quite some time. How has that evolved since you’ve been there?
[Madan]
Yeah, it’s taken multiple journeys, because as I said, when this concept initially came about, it was a pure necessity more than an innovative idea. It was introduced in the industry as a necessity, because you needed the central hub to know where the software pieces are. There were tools like NGINX and the Apache routers, which were pure routers. They were even termed routers, they were not called API gateways. All they did was routing, right? Now once people started doing that, they realized they needed fundamental functionality in those things. As an example, who can even call these pieces of software in a company?
So you need authentication, and you need authorization, whether they truly have access to your profile page. Then slowly observability came into place: I want to see all the people who accessed this particular piece of information over the last three months, six months, for security reasons. Then people went into reliability: what if 1,000 mobile apps suddenly call us every second for no reason, how do we block them? So then came reliability and DoS-protection type approaches. The gateway itself started to accrue more and more functionality beyond just, “Let me route your request to the right software component.” That took its own path of evolution over the last few years in the industry, and that’s where we are. I would say at this point we are maybe more mainstream than ever with respect to adopting API gateways.
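[Editor’s note] To make the accretion concrete, here is a minimal Go sketch (not Uber’s code) of a pure router, a reverse proxy, gradually wrapped in the authentication and observability middleware Madan describes. The backend URL and header check are illustrative assumptions.

```go
// A minimal sketch of functionality accreting around a pure router.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// authenticate: who can even call these pieces of software?
func authenticate(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") == "" {
			http.Error(w, "unauthenticated", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// observe: log who accessed what, for the security audits mentioned above.
func observe(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		log.Printf("access: %s %s from %s", r.Method, r.URL.Path, r.RemoteAddr)
		next.ServeHTTP(w, r)
	})
}

func main() {
	// The original core: "let me route your request to the right component."
	backend, _ := url.Parse("http://profile-service.internal:8080") // hypothetical
	proxy := httputil.NewSingleHostReverseProxy(backend)

	// Functionality accrues as layers around that router.
	log.Fatal(http.ListenAndServe(":8443", observe(authenticate(proxy))))
}
```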
[George]
So maybe start to delineate for us how the functionality grew and receded. Break it down now into another level of detail. I think you’ve in the past told us it grew into a monolith in its first incarnation, but that made it somewhat of a bottleneck and hard to upgrade. Explain the thinking that led to that and then how you walked back from that functionality.
[Madan]
For sure. What’s interesting is that it’s still somewhat like that in many companies that haven’t made the transition. But the way it started out was that people started breaking the single piece of software into smaller pieces, but something needs to sit in front, and all this functionality has to be baked in. A natural thought process for anybody <inaudible> let’s put another single microservice in front that will do this federation, and that’s where we will implement our authentication, rate limiting, and all the security features that I talked about. So teams started creating a very slim router with these shared functionalities, but that itself became code, and over time that itself became a monolith. If you think about companies with 3,000 or 5,000 engineers, every team got their own piece of software to iterate on and grow. But the moment your app needs to access your functionality, there’s this system sitting in between, yet another layer, technically slim, yes, but you have to go write code to explain what your service or request is, and that started accumulating a lot of code. So eventually we broke the monolith down, but ended up with a different type of monolith in front of the original one. That’s what happened.
[George]
Just to be clear, what you’re saying is in order to abstract maybe what started as hundreds of microservices and later grew to thousands, you had to code some knowledge of their functionality into this gateway. In other words, you’re somehow duplicating the functionality and it’s tightly coupled, which made it a bottleneck, and that defeats the purpose of the modularity you were creating in the backend microservices. Is that essentially what was going on?
[Madan]
That is a very fair way to say it. To give you a little more intuition: when a request originates from your app, let’s say you have a weather app. The weather app says, “I want to know the current weather.” And all the app can send is the latitude and longitude that comes from your device. So it reaches an API gateway, and then it could go to one system behind the scenes that can tell you, “Okay, for this latitude and longitude, I’m going to give you, whatever, 70 degrees.” But imagine that another system exists in the company that only translates latitude and longitude to a city. Then one service is able to give the weather, and another service is able to name the city. Now all of a sudden people will say, “Oh, that’s your system.
This is my system.” “Look, there’s a layer up above where you can orchestrate. So let me call the system that gives me the city, and then I will call your system with the city name instead of latitude and longitude, so you can give me the weather, and then I’m going to return it back to the phone, which can present the whole thing together.” So suddenly what was originally a monolith of all three functionalities got split: one system does lat/long to city, another does city to weather, and a third system started to put everything together. And that third system is how the single monolith continued to grow despite us trying to break everything apart.
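[Editor’s note] A toy Go sketch of the orchestration pattern Madan walks through: the gateway calls a hypothetical lat/long-to-city service, then feeds the city to a hypothetical weather service, and stitches the answers together. All URLs and response shapes are invented for illustration.

```go
// A toy orchestrating gateway: lat/long -> city -> weather -> combined reply.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

type cityResp struct {
	City string `json:"city"`
}

type weatherResp struct {
	TempF float64 `json:"temp_f"`
}

func getJSON(url string, out any) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return json.NewDecoder(resp.Body).Decode(out)
}

// weatherHandler is the "third system" that stitches the other two together.
func weatherHandler(w http.ResponseWriter, r *http.Request) {
	lat, lng := r.URL.Query().Get("lat"), r.URL.Query().Get("lng")

	// Step 1: translate lat/long to a city (hypothetical internal service).
	var c cityResp
	if err := getJSON(fmt.Sprintf("http://geo.internal/city?lat=%s&lng=%s", lat, lng), &c); err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}

	// Step 2: ask the weather service about that city.
	var wr weatherResp
	if err := getJSON("http://weather.internal/current?city="+c.City, &wr); err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}

	// Step 3: return the combined answer to the phone.
	json.NewEncoder(w).Encode(map[string]any{"city": c.City, "temp_f": wr.TempF})
}

func main() {
	http.HandleFunc("/v1/weather", weatherHandler)
	http.ListenAndServe(":8080", nil)
}
```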
[George]
So this monolith, it’s the thing that’s composing all the pieces, but it’s getting bigger and bigger all the time?
[Madan]
Yes.
At its peak, as we were working on our own internal one, we got it to almost 2,000 such APIs, 1.5 million lines of code, and 3,000 microservices behind the scenes. It was insane.
[Savannah]
I was actually going to wonder about that in terms of volume, so I’m glad you answered that question. 3,000 microservices is a lot. There’s a lot going on all at once. How do you prioritize what gets access? How did you figure out how to operate when it got to be too big of a monolith?
[Madan]
Yeah, so typically by the time API gateways or such monoliths grow to a significant size, you usually have a team in the company whose sole purpose is to ensure that it builds, deploys, and is not creating outages, because at that point you’re in a multi-tenant system. One team’s function can immediately break the entire build, so you cannot release and roll out. So it gets complicated. In our first generation of that gateway, we anticipated some of this, and what we had done was a concept where you can break that single giant gateway, this 1.5 million lines of code, but we structured our coding framework within it so that it’s limiting: a single API, even though it’s doing orchestration, is part of a small sandbox, if you will, in the way you write code.
And then when we deployed that to production, it was deployed as smaller cohorts. So take, as an example, an app that has a profile settings page and a homepage: we could deploy the APIs related to home as one deployment, and the APIs related to your profile and settings page as its own deployment. And we had to throw in a routing-layer microservice in front of it, which did not have any code; all it did was pure routing. So then it became another two-layer system within that, but we had planned for it, and so we could take the 2,000 APIs and millions of lines of code and still be able to deploy sandboxes, if you will, small Legos doing this orchestration for the systems underneath.
[George]
So it sounds like you were trying to create some modularity in the gateway that’s supposed to be the router, and you did this with some form of sandbox, and then you had to put a router on top of that. So maybe describe, when you decided to do the next generation of that, how did you make it more compositional so that you had greater modularity?
[Madan]
So it was pretty clear at that point, as this thing grew, we had tamed the scale at which it grew, we were able to handle it. This was still 2015, 2016, and it was very clear we had to do something different. And at that point, very early in the industry, there were ideas where we said, “Okay, once you provide a place for people to write code and do freewheeling software development, it usually grows pretty quickly.” So the only way to truly tame this is a router with no free-code zone: it should not have any code written by anybody, and then all the functionalities around security, authentication, schema validation, rate limiting, and authorization should still function, but without code, and then some orchestration.
So a response may come from service A and service B, but both of them need to be combined and transformed before the system can respond back to the app. For all these functionalities, we couldn’t keep providing a space where people have to write code to get them. So we needed to move our API gateway from a code-friendly zone to a no-code zone. And that’s where we started working on a concept around configuration. A configuration you can think of as dormant, with no functionality by itself; it’s just a representation, and some system has to take it and bring it alive by understanding the configuration and making a thing out of it. So what we did was take all this functionality and represent it as configuration, saying, “If you want this functionality, this is how the configuration should look.
If you want that functionality, this is how the configuration should look.” And we created a new generation of API gateway around 2016, early 2017, which would function only based on configuration and did not allow for any code writing in that space. So we could take this two-layer first-generation system, push it out, and then put in a new system that was just config and had the same functionality as the previous system.
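[Editor’s note] A sketch of what “functionality as configuration” might look like in Go: each endpoint is declared as inert data, and a generic engine would bring it alive. The field names are illustrative guesses, not Uber’s actual schema.

```go
// Endpoints as dormant data: no code per API, just a description.
package main

import "fmt"

// EndpointConfig declares behavior; a generic engine interprets it.
type EndpointConfig struct {
	Path            string   // public path, e.g. /v1/weather
	Backend         string   // which microservice serves it
	RequireAuth     bool     // authentication on/off
	RateLimitPerSec int      // 0 means unlimited
	ValidateSchema  string   // name of a registered request schema
	Middlewares     []string // extra behaviors, chosen from a fixed menu
}

var endpoints = []EndpointConfig{
	{
		Path:            "/v1/weather",
		Backend:         "weather-service",
		RequireAuth:     true,
		RateLimitPerSec: 1000,
		ValidateSchema:  "WeatherRequest",
		Middlewares:     []string{"observability", "dos-protection"},
	},
}

func main() {
	// A real gateway engine would build live handlers from this data;
	// here we only show that the config alone describes the behavior.
	for _, e := range endpoints {
		fmt.Printf("%s -> %s (auth=%v, limit=%d/s, schema=%s, mw=%v)\n",
			e.Path, e.Backend, e.RequireAuth, e.RateLimitPerSec,
			e.ValidateSchema, e.Middlewares)
	}
}
```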
[George]
So configuration meant you could use metadata and settings to describe each of the backend capabilities, but it meant you had to anticipate, in the design of the metadata, all the configuration settings so that you did not need anything embodied in procedural code?
[Madan]
Absolutely, absolutely. And that was the biggest part of the effort, and the most complexity was in doing this mapping from all the functionality to what it could look like in terms of configuration, because it needs to be simple. If the configuration became complex, it would itself start to look like code.
[Savannah Peterson]
I was just thinking that, because you’re essentially keeping the functionality but decreasing the complexity by an order of magnitude, and that’s actually way harder to do than the inverse of trying to map all the possibilities there. That’s interesting. Over what duration of time is this all transpiring?
[Madan]
So the first generation was somewhere around 2014, 2015, or early 2016. That’s when the massive growth of APIs happened for us. And then from late 2016 to early 2017 is where we deployed the new API gateway, and then we took about a year, year and a half, to move the company from the previous system to this new system with 2,000 APIs. And more recently we’ve exploded that further into 3,500 to 4,000 APIs just facing Uber’s public apps.
[George]
When you say this latest, now are you talking about a third incarnation of the gateway?
[Madan]
So the second incarnation continued to stay; the second incarnation was the config plus a system built to understand the config. The further evolution since then is support for additional protocols. Protocols, think of them as just languages. A system inside Uber can understand a protocol called TChannel, or gRPC, or HTTP. These are just protocols. And the app on your phone may not understand all these protocols. It might pick one as the primary and continue to talk to the backend servers with it. And when that request comes in, we have to translate between these protocols. Sometimes we can evolve the protocol on the app itself so that it understands directly. So since our second incarnation, the direction we’ve been going is still continuing with our config-driven strategy, because that’s worked really well.
However, we’ve introduced support for more protocols within the same concepts. And we talked about a lot of functionality we added previously in our API gateway; we have pretty much doubled the amount of functionality, and I can go into it if you’re interested, but these are more reliability-focused functionalities in our API gateways.
[George]
Maybe elaborate a bit more on the protocols. So the configuration, it sounds like, describes the capabilities of the individual microservices, and then is it that the protocols allow you different ways to communicate with the microservice and invoke functionality and retrieve data? Is that what’s going on?
[Madan]
Yeah, protocols are just communication languages. Every time you type a website into your browser, it effectively uses a protocol called HTTP, in which the browser says, “I’m looking for this website. This is the data I need.” The backend understands that protocol called HTTP and says, “Okay, I understand what you’re asking, so let me get that back to you.” Over the years other protocols have emerged, like gRPC. And the difference between HTTP with JSON and gRPC is that one is text-based, meaning if you intercepted it, you would be able to read the lines in HTTP. In gRPC you wouldn’t, because it’s a binary protocol; it’s encoded differently. So when the backend receives that request, it needs to understand what it is, because now it’s encoded differently.
So that’s what a protocol is. And a lot of internal systems in large companies do use a binary protocol because it’s much smaller: you can say the same thing in far fewer bytes, so it’s faster, and it’s better for communicating internally. As these protocols have evolved, the app side wouldn’t evolve, because your browser still continues to talk HTTP, since that’s what is deployed everywhere. However, if we control the app, like the Uber app or any other company having their own app, we can make a choice whether to use HTTP or have gRPC called directly from the app, which we haven’t done yet, but we have the ability to take your HTTP and call an internal protocol. In the case of Uber, we have our own internal protocol called TChannel, our own homegrown version of a protocol. So these are just languages to communicate, and those are the abilities we’ve added to the gateway now, to understand more languages.
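[Editor’s note] A small runnable illustration of the text-versus-binary point: the same record encoded as human-readable JSON and as a hand-packed binary layout, which stands in here for a real binary wire protocol like gRPC’s protobuf or TChannel framing. The struct and the resulting sizes are illustrative.

```go
// Same data, two encodings: readable-but-verbose vs opaque-but-compact.
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/json"
	"fmt"
)

type LocationUpdate struct {
	DriverID string
	Lat, Lng float64
	Seq      uint64
}

func main() {
	u := LocationUpdate{DriverID: "d-123", Lat: 37.77, Lng: -122.42, Seq: 42}

	// Text-based: anyone intercepting it can read the field names and values.
	j, _ := json.Marshal(u)

	// Binary: fixed-width fields, no names on the wire, not human-readable.
	var b bytes.Buffer
	binary.Write(&b, binary.BigEndian, u.Lat)
	binary.Write(&b, binary.BigEndian, u.Lng)
	binary.Write(&b, binary.BigEndian, u.Seq)
	b.WriteString(u.DriverID)

	fmt.Printf("JSON:   %d bytes: %s\n", len(j), j)
	fmt.Printf("binary: %d bytes (opaque)\n", b.Len())
}
```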
[George]
Okay. So there’s richness of configuration, so you don’t need code to describe the capabilities, and then you can communicate with different protocols depending on the preference of the microservice that’s serving that request?
[Madan]
Yeah. If I could take one more minute on the clarity of it. In a config, you can say: if a request comes from this third-party company (companies do have third-party partners who access their APIs), the maximum number of requests you want to allow from this company would be 1,000 requests every second. That’s the max. So you can go to the config and represent that: “When a request comes, this is how you find the name of the company that’s calling you.” That’s part of a header. In the header you’d say, “Company ID ABC,” and then you can say max limit 1,000 per second. That’s a configuration. Now when the company does call you, they might call using one of the protocols, and the API gateway needs to parse that protocol, understand how to look for this company ID in that protocol’s language, and say, “Okay, now I see the company ID.
And then it’ll go back to the config and say, “Now, you might have requested 1,000 a second; you’ve done three a second thus far, so I’m going to allow you to make the call. If not, I’m going to block.” And to make the call, it continues in that protocol downstream. The config can even touch the protocol; it can have details about the protocol. You can even rewrite things. Let’s say the external company calls you with the name company ID ABC; internally you can forward that as company ID ABC-preferred, for whatever reason. You can make those changes. The what to do is in the config; the how to do is in the protocol handling and the system built around the API gateway.
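[Editor’s note] A compact Go sketch of the exact flow described above: the config declares the what (which header names the caller, the caller’s limit, the internal rewrite), and generic gateway code supplies the how. Header names, limits, and the crude one-second window are all illustrative.

```go
// Config-driven per-caller rate limiting with an internal header rewrite.
package main

import (
	"net/http"
	"sync"
	"time"
)

type CallerPolicy struct {
	MaxPerSecond int
	RewriteAs    string // internal name to forward, e.g. "ABC-preferred"
}

// The "what": per-company policy, keyed by the Company-ID header value.
var policies = map[string]CallerPolicy{
	"ABC": {MaxPerSecond: 1000, RewriteAs: "ABC-preferred"},
}

var (
	mu     sync.Mutex
	counts = map[string]int{}
	window = time.Now()
)

// The "how": parse the request, find the caller, consult the config.
func gateway(w http.ResponseWriter, r *http.Request) {
	id := r.Header.Get("Company-ID")
	p, ok := policies[id]
	if !ok {
		http.Error(w, "unknown caller", http.StatusForbidden)
		return
	}

	mu.Lock()
	if time.Since(window) > time.Second { // crude fixed one-second window
		counts = map[string]int{}
		window = time.Now()
	}
	counts[id]++
	over := counts[id] > p.MaxPerSecond
	mu.Unlock()

	if over {
		http.Error(w, "limit exceeded", http.StatusTooManyRequests)
		return
	}

	// Rewrite the caller's name before forwarding downstream.
	r.Header.Set("Company-ID", p.RewriteAs)
	// ... forward to the backend service here ...
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.ListenAndServe(":8443", http.HandlerFunc(gateway))
}
```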
[George]
Oh, okay. So that’s the separation: the what in the config, the how in the protocol. Okay, that’s-
[Madan]
Control plane and data plane. The control plane is the configuration side, and the data plane is where the data actually flows back and forth.
[George]
Okay. Okay. So this is the core evolution into this third generation of this API gateway?
[Madan]
Yeah.
[George]
And so tell us more about maybe some of the functionality improvements that you made in the third generation.
[Madan]
Yeah, I would say the approach… there are different parts of it. The first one I’ll touch on is reliability; I think that’s a very important one. This central system in any company is pretty much the interface between anything external to the company and anything internal, so it plays a very important role in reliability and security. I’ll start with security, it’s the easiest one. Most traffic routing systems from the early days of 2014, 2015, and even routers now, don’t log the details of the request itself. Because in this concept of communication, when an app calls some server, there’s a notion called headers, which are readable to every party in the entire call.
And typically when you make a call from your browser, your phone, whatever (as a user you don’t make API calls, but the app is doing it on your behalf), it passes through at least 20 different systems before it reaches the final system that is able to respond to the request. Now, all those 20 systems in the call path will be able to read the part of that request called headers, which is transparent and visible to anybody. Nobody puts sensitive information in these things, that’s just industry standard, but headers are visible to all these 20 hops. And then there is the part called the body of the request, which is typically encrypted if you’re on HTTPS, so nobody in the intermediary 20 systems can read it; only the API gateway and anything behind it can read it.
So when this happens, the first place that has the richest information about your request is the API gateway. The request passes through all these 20 hops across the internet, and everybody can only read the headers. A router can just read the headers and the path of where you’re trying to go, and it’ll route to service A, service B, service C. But the API gateway is the first place where you can say, “Okay, I understand the details of the body as well.” So anything special you want to do with that, you need to do there. One example: let’s say somebody hacked your account and got your access token. Now they say: how about I access trip history or your friends list on Facebook or whatever, but I ask for George’s account information with that access.
So I’m saying, “I have access, but give me information about George’s trip history, even though I have a token for Savannah’s account,” because I hacked your account. This is called an IDOR attack. At that point you have one person’s token, but you’re trying to get information about another person, and if you do security right, it should not be allowed. The API gateway will shut it down, because it’ll see that somewhere in the payload you’re asking about one person, but the token authorization you got was for another person. And the gateway is able to do that because it has access to the details of the body, what exactly you’re asking. And on the response side it can sanitize things as well. Let’s say you’re a company and you ask, “Give me all 20 employees I have and give me the details about them.” If by policy we want to restrict some parts of that access, you could restrict it with some of these tokens.
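[Editor’s note] A hedged Go sketch of the IDOR defense Madan describes: because the gateway can read the decrypted body, it can check that the user the token was issued to matches the user whose data the payload requests. Token parsing is stubbed; all names are hypothetical.

```go
// Body-aware authorization at the gateway: block token/payload mismatches.
package main

import (
	"encoding/json"
	"net/http"
)

// subjectFromToken would validate the auth token and return the user it was
// issued to (e.g. verify a JWT signature and read its "sub" claim). Stubbed.
func subjectFromToken(r *http.Request) (string, bool) {
	return "savannah", r.Header.Get("Authorization") != ""
}

type tripHistoryReq struct {
	UserID string `json:"user_id"` // whose trips the caller is asking about
}

func tripHistory(w http.ResponseWriter, r *http.Request) {
	subject, ok := subjectFromToken(r)
	if !ok {
		http.Error(w, "unauthenticated", http.StatusUnauthorized)
		return
	}

	var req tripHistoryReq
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}

	// The IDOR check: token says Savannah, payload asks for George -> block.
	if req.UserID != subject {
		http.Error(w, "token does not authorize this user's data", http.StatusForbidden)
		return
	}

	// ... forward to the trip-history service ...
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.ListenAndServe(":8443", http.HandlerFunc(tripHistory))
}
```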
[George]
So basically you’re saying that by understanding the header and the body, you can put business rules into the gateway. Because through the gateway you’re essentially serving up the capabilities of your company’s service. Forget that it’s Uber and mobility, there are all sorts of services, but you want to encode rules as to who can invoke them and how, and that goes into the gateway?
[Madan]
Yep.
[George]
Okay. And that’s this third generation. So this gateway is now the menu of capabilities of the company?
[Madan]
Yeah, the menu for sure, and within that menu you have access rules as to who can access what. And every time an access happens, the API gateway is also able to log, into a secure channel, who accessed what and the exact data they got, and all that metadata lives in one central place, even though your company may have 3,000 systems. So in the previous example, we talked about the APIs of the weather app, right? If you want to concretely answer how many times users truly asked for weather data: the system that was only giving you that <inaudible> can’t tell you which city they asked for or which lat/long they asked for, and the other system only knows about city to weather. But the one single place where you would know is the API gateway.
So <inaudible> has a lot of use cases. We have traffic management: let’s say people are trying to DoS your system and are suddenly accessing a lot more, you can rate limit things at the gateway. Observability: let’s say a part of your company’s functionality goes down. The one single place where you can tell whether, overall, the user is able to make progress in what they’re trying to do with your app is again the API gateway. There may be 3,000 APIs, but maybe 200 are the most important ones. And a single API call may internally translate to 20, 50, 100 API calls, and half of them might fail, but whether a user is ultimately succeeding or not can only be seen either on the client side or in a single place, which is the API gateway. So observability is another area we improved quite a lot <inaudible>.
[George]
So just to be clear then, the observability is the foundation for, I imagine, the analytics: what’s going on, who called this stuff, how much did they use? Then from that we might decide how we’re going to bill them, or run diagnostics on our operations: how did we perform and how can we improve? So this gateway becomes the central collection point, not just for requests, but for pricing and for operational improvement, things like that?
[Madan]
Yeah, so analytics for sure, and then the other part of it is more real-time, where in real time you can actually block things. And as you say, in software, once you create one layer of indirection, you can solve any number of problems, because now you have a single choke point you can load up with whatever functionality. I think overall API gateways have started providing that to so many companies. And for us, we still back it with configuration, which is what keeps us scaling this beyond a certain number.
[Savannah]
Go ahead, George. Go ahead.
[George]
Well, I was just going to say one of the things that’s unique about Uber is that this is a real-time service. So maybe explain how you had to adapt the API gateway to accommodate real-time functionality.
[Madan]
Yeah. So real-time was super unique for Uber even back in the day, because our app is very real-time in nature. And real-time is a very overloaded word, but the specific part of real-time I’m talking about is: two people are on the streets, one is constantly moving, a driver or a food courier or whatnot, and the other person, who’s receiving, is standing on the road trying to get into a car. The interaction between them has to be almost instantaneous. It can’t be that the driver submitted a location now, and then 20 seconds or a minute later the rider receives it, “Here’s where the car is,” because you will most likely never end up meeting your driver as they’re taking turns trying to pick you up. And similarly, let’s say a driver is driving on a freeway and they need to take an exit. Unless we know exactly where they are, the Uber server can’t give navigation instructions to the app, or even do matching.
So let’s say you’re pretty close to a freeway exit and there’s a pickup order. If we knew exactly where you were, we could ensure you have enough time to take the exit. But if you miss that exit and then we assign that order to you, you have to go through a whole loop and come back. So it ties directly to system efficiency. The interaction between a client and a server, and then another client that needs the update about this other person, needed quite a lot of interaction of a truly real-time nature. Some technologies existed for this in the past, called WebSockets, and some companies do raw TCP with constantly streaming data. We adopted another protocol called server-sent events, where the server can constantly send data to the app at a high frequency.
So when the location upload also happens in a streaming way, we can stream it right back to the rider and say, “This is where the location is.” And then we get locations fast enough, and as cars move around, all the apps look very consistent, because they all have their most recent data. To do this, you can’t keep doing the traditional HTTP way, which goes, “I want this information. Let me ask the server. The server gives it back. Okay, great. Now let me ask again.” That’s the concept called polling. Instead, we create a pipe and say, “This is the information I’m interested in,” and you just sit there and the server constantly keeps sending you data, because we know you need that data.
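[Editor’s note] A minimal Go sketch of the server-sent-events pattern contrasted with polling: the client opens one long-lived connection and the server keeps pushing location events down it. The one-second tick and simulated coordinates are illustrative.

```go
// Server-sent events: one persistent pipe instead of repeated polling.
package main

import (
	"fmt"
	"net/http"
	"time"
)

func driverLocation(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}

	// Standard SSE headers: keep the pipe open, send events as text.
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	w.Header().Set("Connection", "keep-alive")

	ticker := time.NewTicker(time.Second) // push once a second (illustrative)
	defer ticker.Stop()

	lat, lng := 37.7749, -122.4194
	for {
		select {
		case <-r.Context().Done(): // rider closed the app
			return
		case <-ticker.C:
			lat += 0.0001 // pretend the car is moving
			// One SSE event is "data: <payload>" followed by a blank line.
			fmt.Fprintf(w, "data: {\"lat\": %.4f, \"lng\": %.4f}\n\n", lat, lng)
			flusher.Flush() // send immediately instead of buffering
		}
	}
}

func main() {
	http.HandleFunc("/v1/driver/location/stream", driverLocation)
	http.ListenAndServe(":8080", nil)
}
```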
[George]
Okay. So you’re not constantly creating connections and doing request-response. There’s a persistent connection with the real-time data flowing in both directions, and that’s how you know where riders and drivers are and can send real-time-
[Madan]
What we did with the API platform, which is a more request-and-response pattern, is we integrated this streaming tech and this API tech. So all the control plane functionalities we talked about apply with the same semantics as well. The streaming and the persistent connection is about the ability to deliver, but the ability to generate that payload and generate that API response continues to live in our API gateway. So we put the streaming right next to our API gateway: the delivery channel is streaming, but all the API functionality continues to exist even there, which is rather unique in how we’ve done it.
[George]
[George]
Okay.
[Savannah]
You took it exactly the direction I was going to take it, George. So great minds. I’m curious, we’re essentially talking about wait time here for the viewer who’s watching. And I feel like that’s one of the more emotional parts of the Uber experience, wait time, whether that’s for food or delivery or a car or whatever it might be. I realize you may or may not want to disclose this, but if I’m opening up the app and looking at the cars, how accurate is that map of vehicles I’m seeing? Because I would imagine a lot of those are getting called simultaneously, particularly in cities like San Francisco, where we live.
[Madan]
Yeah, there’s always quite a lot of interaction happening, and especially when you have a driver assigned to you who’s coming to pick you up, that data needs to be as accurate as possible, because that’s how you can actually meet them at the right time. So it has to be extremely robust. A good thing to know is that GPS data on devices is itself a very low-fidelity thing. If you open Google Maps, it might show your last location for the first three seconds before it updates itself. These things are very common. In other social media apps this doesn’t matter, but for apps that are very reliant on location it’s very important, and the device OS itself doesn’t provide that high fidelity at the instant you look for it. So it has lots of challenges; especially if you go into cities, GPS can keep bouncing around.
So we have to do a lot of tech to really pin it down to a road and ensure it’s the most recent position. The app can also send locations out of order: the app says, “This is the first batch of locations, this is the second batch,” but because of how the internet works, the second batch can arrive before the first, so it has to be reconciled. It’s a very fascinating engineering problem just dealing with locations.
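[Editor’s note] A toy Go version of the reordering problem: if location batches carry sequence numbers, a late-arriving older batch must not overwrite a newer position. The types and sequencing scheme are assumptions for illustration; Uber’s actual reconciliation is surely more involved.

```go
// Discard stale location batches that the network delivered out of order.
package main

import "fmt"

type LocationBatch struct {
	Seq      uint64
	Lat, Lng float64
}

// DriverState keeps only the newest batch seen so far.
type DriverState struct {
	lastSeq  uint64
	Lat, Lng float64
}

// Apply ignores any batch older than what we've already applied.
func (s *DriverState) Apply(b LocationBatch) bool {
	if b.Seq <= s.lastSeq {
		return false // stale: e.g. batch 1 arrived after batch 2
	}
	s.lastSeq, s.Lat, s.Lng = b.Seq, b.Lat, b.Lng
	return true
}

func main() {
	var s DriverState
	// Batch 2 arrives first, then batch 1: the internet reordered them.
	fmt.Println(s.Apply(LocationBatch{Seq: 2, Lat: 37.78, Lng: -122.41})) // true
	fmt.Println(s.Apply(LocationBatch{Seq: 1, Lat: 37.77, Lng: -122.42})) // false, discarded
	fmt.Printf("current: %.2f, %.2f\n", s.Lat, s.Lng)                    // still batch 2
}
```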
[Savannah]
Yeah, I can imagine. Well, and everyone’s moving around. If you’re inside a hotel or an office building, a lot of the time it doesn’t think you’re actually near where you’ve requested your car. It’s not your fault, it’s GPS. It’s just the nature of the beast. So it’s a very interesting-
[George]
Customers would love to never see that bad experience. That’s what we’re all tuned to now, because everything needs to be perfect.
[Savannah]
Well, and you spoiled us with speedy rides on demand, so we’ve come to expect perfection, which is an interesting situation, which actually I’m curious. So here you’ve had all this experience and shaped along with the team, the evolution of the API gateway of Uber up to this point largely with microservices and a lot of fascinating things that you’ve just discussed over the last 40 minutes. I’m curious, how do you see a lot of this work that you’ve done in the past helping teach your team and the rest of us about how to prepare for our… or integrate. I shouldn’t say prepare, it’s already here, our AI and agentic future?
[Madan]
Yeah, I think it’s very clear that there is value in this indirection layer of the API gateway, which has served the microservices world pretty well. Now if I were to draw parallels to what’s happening, there are two ways in which a gateway can be called. Your app calls some company’s data center, which I’ll just term ingress. And then your company could depend on APIs that other companies offer, so in order to use those, your systems will call outside your company’s data center into some other company to get data. As an example, if you want reviews from Yelp and they happen to offer an API, you might get a request from your customer and your app, but you may make an API call to Yelp for some of that data.
So there’s that egress part, and the industry hasn’t yet developed a true API gateway for egress. Many people still have fragmented ways to call other companies’ APIs. Most companies haven’t had a heavy need to regulate that or put these functionalities on the egress side of the API gateway. However, that’s changing with AI. With AI, the way to think about it is: your app might call your data center, but your systems will not have the LLM capabilities. You are effectively going to call outside companies, OpenAI and the likes of Anthropic and whatnot.
So most of your function is going to rely on data that comes from an external company in the AI world, for all that gen AI and LLM stuff. There your company really needs to put up a central gateway that becomes that choke point, and that is where you would start your AI governance, AI observability, your prompt guarding or prompt augmentation, or controlling how many tokens you use. You have to rate limit not the ingress anymore; you have to rate limit the egress, because that’s what’s important for your business now. So that’s where we’re starting to see API gateways play a big part on the egress side of the ecosystem.
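[Editor’s note] A sketch, under stated assumptions, of an egress gateway for LLM traffic in Go: every outbound model call passes one choke point that charges an estimated token cost against a per-team budget. The provider URL, the X-Team header, and the word-count token estimate are all invented placeholders.

```go
// An egress choke point that rate limits by tokens, not requests.
package main

import (
	"bytes"
	"io"
	"net/http"
	"strings"
	"sync"
)

var (
	mu      sync.Mutex
	used    = map[string]int{}                      // tokens spent per team
	budgets = map[string]int{"search-team": 100000} // per-team token budgets
)

// estimateTokens is a crude stand-in (~1 token per word).
func estimateTokens(prompt string) int { return len(strings.Fields(prompt)) }

func egress(w http.ResponseWriter, r *http.Request) {
	team := r.Header.Get("X-Team") // which internal team is calling out
	body, _ := io.ReadAll(r.Body)
	cost := estimateTokens(string(body))

	mu.Lock()
	over := used[team]+cost > budgets[team]
	if !over {
		used[team] += cost
	}
	mu.Unlock()

	if over {
		http.Error(w, "team token budget exhausted", http.StatusTooManyRequests)
		return
	}

	// Forward to the external provider (endpoint is a placeholder).
	req, _ := http.NewRequest("POST", "https://llm.example.com/v1/complete", bytes.NewReader(body))
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	io.Copy(w, resp.Body) // relay the model's answer back inside
}

func main() {
	http.ListenAndServe(":9090", http.HandlerFunc(egress))
}
```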
[George]
Okay. So this is like you want to have a federation of services, you want to access some services that are outside your company. You need to govern who can call, what they can call, maybe you’re throttling on tokens or you’re throttling maybe on a metered price, but this is the new vision of companies working together programmatically.
[Madan]
Yep.
[George]
But this gateway is something that an agent is respecting and communicating through, is that how it works?
[Madan]
Yeah, so you could run an agent within your company, but ultimately when you make calls from that agent to other LLMs hosted outside, that egress has to be monitored and governed, and that goes through this API gateway. And if you have some orchestration there, let’s say you have external vector databases that you use, you have other LLMs that you use, and you want to do some basic coordination. Again, those kinds of things will start coming into this AI gateway as we-
[George]
So this AI gateway: the key is, what you described for us originally was an API gateway to internal Uber services, and now we need a gateway to access external services, and we’re beginning to think about how that’s constructed, because you have to worry about permissions, you have to worry about pricing. There are maybe contracts that have to be set up in advance.
[Madan]
Yeah, these requirements existed even previously, but most companies did not worry too much about them because you did not need that central governance per se. For example, payment providers were always a thing: companies used to call external companies for payment processing. Companies used to call external companies for location services, like Google and whatnot. But each of these were independent concerns, so decentralized teams could govern them just fine. But once you get into the AI mechanism, let’s say your app or your service has 30 different places where AI comes into play, and 30 different teams are using gen AI. Now all of them are typically backing the experience with data that’s common to your company.
So by the time they all make calls to LLMs outside of your company, the need for governance grows quite a lot, because the use cases are numerous and they all have very, very common patterns. You can’t govern that and keep tabs on it in a decentralized way anymore; you need a centralized way. And that happens through an API gateway with functionalities that are very catered to AI interactions.
[George]
Okay. Just recap for us how that gateway, when it’s organized for AI-based interactions, is different from the more traditional procedural interactions that we had with the microservices. Just distill a comparison again.
[Madan]
Sure. So to start with, with an ingress API gateway you are largely trying to protect your own systems, and with the <inaudible> gateway, it’s more egress: you are trying to stop leakage and prevent bad things from going out.
[George]
Okay. So <inaudible> from coming in, and here-
[Madan]
It’s bad things from going out. Second, because the functionality is very catered to AI, the egress gateway can use all the information you have within your company and really provide functionality that’s fully aware of the request body. Remember we talked about the request body? Here, it can make full use of that. For example, if you want to augment the query and ground it based on your company’s facts and true data, you can do that at the API gateway; the API gateway will ground that query. You might ask the LLM, “Generate this based on… this is what I want you to do, write a poem.” But the API gateway can say, “Let me pull some more data about past poems written within this company, or just data that should be accessible outside.”
As part of that prompt, it can augment it and then give it to the external LLM. Now all the different use cases within our company don’t have to think about that augmentation, because augmentation is taken care of by the API gateway. So we will see a lot more business-aware transformation and augmentation in the AI API gateway than we saw in traditional API gateways.
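[Editor’s note] A final Go sketch of gateway-side prompt grounding: before a request leaves the company, the egress gateway prepends approved context so individual teams don’t each reimplement augmentation. The context lookup is stubbed; a real system might query a vector store or docs index.

```go
// Centralized prompt augmentation at the egress gateway.
package main

import (
	"bytes"
	"encoding/json"
	"io"
	"net/http"
)

type llmRequest struct {
	Prompt string `json:"prompt"`
}

// companyContext would pull externally shareable facts (from a vector DB,
// a docs index, etc.). Hard-coded here for the sketch.
func companyContext(prompt string) string {
	return "Context (approved for external use): ..."
}

// augment grounds every outbound prompt once, centrally, for all teams.
func augment(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var req llmRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		req.Prompt = companyContext(req.Prompt) + "\n\n" + req.Prompt

		// Hand the grounded request body to the forwarding handler.
		grounded, _ := json.Marshal(req)
		r.Body = io.NopCloser(bytes.NewReader(grounded))
		r.ContentLength = int64(len(grounded))
		next(w, r)
	}
}

// forwardToLLM would relay the grounded request to the external provider;
// it echoes the body here so the sketch is self-contained.
func forwardToLLM(w http.ResponseWriter, r *http.Request) {
	body, _ := io.ReadAll(r.Body)
	w.Write(body)
}

func main() {
	http.ListenAndServe(":9090", augment(forwardToLLM))
}
```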
[George]
Okay. So we’re going to see… Basically for the last 10 years, it was all about organizing and protecting ingress to your internal systems, and now we have a new generation about egress and organizing access to external systems.
[Madan]
Yeah.
[George] Savannah, with that, why don’t you bring us home?
[Savannah]
Well, I thought that was awesome, Madan. Lots of sound bites there. Also making some rather chewy concepts very digestible for all of us. Really appreciate it. I’m excited to hear more about what you continue to do, and we’ll keep this conversation going as the landscape evolves. I can imagine your job is only going to get more complex, but it seems like you enjoy the ride if you’ve been hanging out over there for that long. I’m also smiling ironically, because I actually just had an Uber Eats order delivered while we were talking today. So this was conveniently timed for my lunch. Also, dog fooding myself on the experience, and that actually got here faster than I was expecting because obviously I wouldn’t have had it scheduled to be delivered during this call. So you surprised and delighted on the other side. George, thanks for all the prep and your great questions today. Anything else anybody wants to add before we bid adieu to our audience?
[Madan]
Love the conversation. Thanks for having me again.
[George]
All right, thanks, Madan.
[Savannah]
Yeah, absolutely, Madan. And thank all of you, wherever you might be tuning in. I believe the three of us are remote near San Francisco, California. My name is Savannah Peterson, joined by George Gilbert. You’re watching theCUBE, the leading source for enterprise tech news.