Conf42 Cloud Native 2024 - Online

Scalability and Cloud Native Architecture on AWS


I’ll show you how to maximize scalability on AWS through well-designed systems. I will discuss the right questions to ask before designing systems. You’ll learn to match the right AWS service to the right job and accept trade-offs, thus building a reliable, efficient, and flexible data system.


  • Gaurav: Thank you for joining me for my talk on scalability on AWS. More specifically, scalability and cloud native architecture on AWS. I'm going to try to keep this talk at more or less a beginner to intermediate level.
  • The talk is specifically about scalability. Throughout the presentation, I would like to weave business outcomes with security and technology. I will try to make it as business-value focused as possible. A question at the bottom of each slide shows what that slide is trying to answer, and a cheat sheet at the end gives the two key takeaways.
  • I'm the author of the book Security and Microservice Architecture on AWS. Throughout the presentation, I like to keep my examples consistent. That way I don't have to go back and explain what the setting is.
  • An ecommerce website needs to be efficient and scalable. An architect can help you figure out how to make trade offs. The whole point of the talk is to look at every application from an architect's perspective.
  • Cloud providers say you can hand over to them some of the responsibilities that traditionally were yours. But then you lose the flexibility that is associated with running your own application. As an architect, you evaluate what your business cares about the most. This talk focuses on scalability.
  • You can achieve scalability in two ways. Vertical scalability is just increasing the size of a server. The second way is horizontal scalability, where you add an extra server to the mix. The stage your application and your company are at will decide whether you want horizontal or vertical scalability.
  • If you want to scale vertically, all you can do is increase the RAM or the CPU or the throughput of the instance. I wanted to go a little bit into the cost implications of that. RAM does become expensive on the cloud, and you need to figure out how you can optimize your application.
  • Most people don't have horizontal scalability by default. Just because you add an extra server to the equation doesn't mean your application can start running faster, better, or process more requests. There are two ways of achieving scalability. The first one is using what is called auto scaling.
  • There are three scenarios, and a different mode is the best option in each. If your demand is sustained and fairly stable, a flat line, you don't need auto scaling. If your demand is bursty, serverless is the best option for you. You need to make the right trade-off while trying to achieve scalability.


This transcript was autogenerated. To make changes, submit a PR.
Good morning, good afternoon, good evening, folks. Thank you for listening to me. My name is Gaurav. I am a director of engineering, I have worked at a lot of New York City based companies, and thank you for joining me for my talk on scalability on AWS. More specifically, scalability and cloud native architecture on AWS. Now, I know I'm trying to hit as many buzzwords as possible, but I'm going to try to keep this talk at more or less a beginner to intermediate level. So I'm going to try to get as much into the basics as possible, and hopefully, if you like it and are interested, you can then move on and search the online literature to get into more advanced topics.

So without further delay, I'm going to start off by talking about how this presentation is going to unfold. To begin with, of course, I'm going to give you an introduction. I'll talk about who I am and what I do for a living, and as I've said, I am an architect. I'll talk about why you might want to have an architect in your company. You can think about it as advertising for my profession. Then, because this talk is specifically about scalability, I will talk about the problem that currently exists in all the businesses that I have worked with as an architect. What is it that causes you to start looking at scalability as a solution? I will then talk about what exactly scalability tries to solve. What does a focus on scalability give you? And then I will talk about how focusing on scalability can solve the problem that currently exists in the industry. Now, this is very abstract, so I would also like to go into some of the tools that AWS gives you in order to make your application more scalable. As with any architecture, there's always a pro and a con. There's always a trade-off. So I'll talk about some of the trade-offs that you'll have to make in order to make your application scalable on AWS.
And then I'll talk about some of the metrics that you can use to decide how much you want to give up in order to have a more scalable application. Then I will go into some of the best practices that you can follow. What are the rules of thumb that you can follow in order to design your system so that you can make the right trade-offs and the right choices for your application from a scalability perspective? And finally, I am aware of the fact that people have short attention spans in these days of TikTok and all. So I'll give you a cheat sheet, a TL;DR, if I may, where I will talk about just two things. If you were to miss the entire presentation, what are the two things that I would like you to walk away with, that I think you would benefit from? These might be things that you already know, or they may not be, but I would like to reiterate them. And throughout the presentation, I would like to weave business outcomes with security and technology. So it won't just be a very technical presentation. I will try to make it as business-value focused as possible.

One last structural point. Throughout the presentation, I will have this question at the bottom left corner of the screen. This will be the question that I'm trying to answer while I'm going through each and every slide. So if you think that I'm rambling too much on a particular slide, or if you think, what exactly is he trying to say, hopefully looking at this question will help you reorient and figure out what it is that I want you to take away from the slide. I'll try to have just one takeaway from each slide.

So with that said, again, as I said, I'm the author of the book Security and Microservice Architecture on AWS. It's an O'Reilly Media publication from 2021. So if you want, please do grab a copy. It's available on Amazon. By day, I work as a director of engineering, sometimes as an architect, and as a consultant for various companies, and I've worked at a lot of companies throughout my career in New York City. So please do find me on LinkedIn. At night, I work as a research scholar and a doctoral student at Rutgers University. My thesis is on international business, on corruption and international business, something that I'm very passionate about. So do feel free to follow some of my work. Apart from that, I have an MBA in finance from NYU Stern School of Business. So I try to merge finance into any tech-related talk, any tech-related discussion I have. That's just something that I like to do. And I also have a Master of Science in AI from Rochester Institute of Technology. This was before the whole AI hype, so somehow I managed to be ahead of the curve there.

Another structural part of the presentation: throughout the presentation, I like to keep my examples consistent. That way I don't have to go back and explain what the setting is. In this particular case, the setting that I would like to use is that of an ecommerce website. So you can assume that throughout this presentation I run a company that has an ecommerce website, and this website gives you the ability to search for products. You can search by using simple keywords, like "I want sunglasses." Or you can use an advanced search, where you can say, okay, I want to find sunglasses which cost between $5 and $15 and have a rating of four stars and above. So that's the kind of website that you can assume I'm running. And any example that I give related to scalability will pertain to this website. That way we are all on the same page when it comes to examples.
Of course, you can let your imagination run wild on how this application is running, but you can just assume that the problem I'm trying to solve is that I want to make this application more scalable for all the different use cases that my end users are going to use it for.

With that said, and with that background, I'd like to go into, first of all, architecture. What exactly is architecture? Anytime I look at software, I look at six different aspects of it. I always think that these are the six points that I want to focus on. I feel an application, in our case the ecommerce website, has to be efficient. I don't want to pay money to a cloud provider or whoever for an inefficient system. I want to extract as much juice as I can out of the code that runs in this application. I want it to be scalable. Again, this is the whole point of the talk. So anytime the number of requests jumps up, I need to be able to add a server to the mix, increase the resources that I have, and handle the new load that my application gets. If my application is an overnight success, I don't want to send people back saying, hey, I know you visited my website, I know you wanted to buy something, but we just don't know how to handle so many requests at the same time. I want my application to be available all the time. I don't want any downtime. I don't want it going down at 11:00 at night because of some network latency or something like that. I want my application to be secure. Security is important in this day of cyber attacks and everything. I don't want one piece of malware to infect the entire application and bring everything down. Of course, I might be running some kind of a venture-backed application. So I want it to be as cost efficient as possible, so that I can convince my investors that I can handle the cost aspect of it. And finally, I want my application to be simple. A simple application is easier to maintain.
I can hire more efficiently, after hiring people can get onboarded more easily, I can document it better, and I can expand it in a better way. So I want all of these things. And this is where I realized that I can't have all of them. In my life, I've realized you have to pick at most four of these six points, if not three. Sometimes you might even get just one or two. And that's where you need an architect. An architect is someone who can come in and say, okay, these are the things that you need to focus on, these are some of the needs that you can give up, and this is how your application will run most efficiently for the scale and the level of growth that your business expects. And that's the whole point of this talk. The whole point of the talk is to look at every application from an architect's perspective and figure out how to make trade-offs, where to make trade-offs, and what factors will decide where these trade-offs should be made. What are the tools that are available to you in order to make these trade-offs?

One of the tools that I always like to begin with is the AWS shared responsibility model. Back in the day, when you used to run the application on your own servers (Amazon started in a garage), you were responsible for every single aspect of the application. Your server should be running. It should be able to scale up and scale down. You should make sure that no one can just jump into that garage where you are running the server and steal your servers. At the same time, you are responsible for availability and all of the six aspects that I talked about. That's not the world we live in today. That's why you have cloud services. The cloud provider says you can hand over to us some of the responsibilities that traditionally were yours. We will handle those responsibilities. In return, of course, you pay us extra, and you might lose some of the flexibility that you have when running those applications on your own back end.
As a way of distinguishing them, I always use this example by Albert Barron, which is about pizza as a service. I like to have pizza. I'm thinking of having pizza for dinner today. There are a few ways I can do that. I could go out to a restaurant, and then I don't have to worry about anything. I don't have to worry about where the cheese comes from. I don't have to worry about doing the dishes after eating. I don't have to worry about the temperature of the oven and all sorts of things. But at the same time, if I want a customization, if I want a gluten-free base or something like that, I don't have that option anymore. On the other hand, if I want everything customized, if I want a gluten-free base, if I want pizza dough of a specific type, if I want the cheese to be French mozzarella made from buffalo milk or whatever it is, I can make everything at home. But then after doing everything, I'm now responsible for the dishes. I'm responsible for setting the dining table, the oven, the heat, et cetera. So I have to decide what it is that I care about most. The third option is, of course, something in the middle, where I can order takeout, or I can order on Seamless or Uber Eats or something like that, where I still have to do the dishes, but I don't have to worry about the pizza itself.

In the same way, on the cloud you can have an à la carte menu of responsibilities. I could have everything on premises, or I could have something where everything, including scalability, availability, and security, is taken care of by AWS, but then I lose the flexibility that is associated with running my own application. Or I could find something somewhere in the middle and decide how much control to give up. Again, it's my responsibility as an architect to evaluate what my business cares about the most. I would like to focus on the scalability aspect because that's what the presentation is about.
More specifically, I want to figure out what it is that I will give up while trying to attain more scalability, and what it is that I will probably have to give up as far as scalability is concerned if I don't want to be flexible on the other aspects of my application. In order to do that, I want to take a step back and first go into what exactly scalability is. The reason is, I've noticed scalability is often confused with a kind of cousin of it called efficiency, because they are both trying to solve a very similar problem.

So let's assume our ecommerce application is an overnight success. All of a sudden you start getting millions of requests per minute, and you realize that your server is bursting at the seams. Your server CPU utilization is high, your RAM usage is through the roof. You can solve it in two ways. You can add resources: you can increase the size of the RAM that you have (if you are hitting 80%, you add another 20% to it), or you can add more servers to the equation. Or you could make your code run in such a way that it can handle more of the requests that come in. You can add caching to the equation. That way you don't have to make a round trip to the database or do I/O or whatever it is. Either way, the outcome is the same. You can now handle more requests. In the first one, you add more resources. In the second one, you keep the number of resources the same. You just get more out of whatever you have running. The first one, where you add more resources, is the scalability that we want to talk about in this talk. The second one, where you keep the number of resources the same and just get more output out of them, is efficiency. In most cases you want the system to be as efficient as possible, but soon you start hitting limits. You have diminishing returns as far as efficiency goes.
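To make the efficiency side of that distinction concrete, here is a minimal sketch of the caching idea, assuming a hypothetical `query_database` function that stands in for a slow database round trip. The catalog contents and function names are illustrative and not from the talk. The same "server" answers repeated searches without doing more work, which is efficiency rather than scalability.

```python
from functools import lru_cache

# Counter kept in a dict so we can observe how often the "database" is hit.
stats = {"db_round_trips": 0}

def query_database(keyword: str) -> list:
    """Hypothetical stand-in for a slow database round trip."""
    stats["db_round_trips"] += 1
    catalog = {
        "sunglasses": ["aviator", "wayfarer"],
        "phones": ["model-a"],
    }
    return catalog.get(keyword, [])

@lru_cache(maxsize=1024)
def search_products(keyword: str) -> tuple:
    # Return a tuple so callers cannot mutate the cached result.
    return tuple(query_database(keyword))

# Three identical searches cost only one database round trip.
for _ in range(3):
    search_products("sunglasses")
```

A cache like this helps only while requests repeat. Once every request is unique, or the single server's memory fills up, you are back to needing more resources.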
And that's why you need to focus on scalability, because your business will be a success, and the last thing you want to do is fail to serve customers because you don't have the resources. Or rather, you have resources that you could have added. You just don't know how to add them.

An example here. I've tried to create this caricature in ChatGPT, where we are sitting in this group of developers. We suddenly hit success, and then how do we scale? Well, we scale by adding these servers. And what happens once you add those servers? For starters, you lose the simplicity. You have created this monster, a Frankenstein application with all these cords going everywhere. And look at these people, they've all done it. It's all a scaled system, but at what cost? You have a Frankenstein that you suddenly have to tame. So that's where you need to focus.

Having discussed scalability, let's talk about the ways you can achieve it. Well, you can do it in two ways. The first one is vertical scalability. Vertical scalability, going back to the textbook definition, is just increasing the size of the server. If you're hitting the RAM ceiling, say you have a 4 GB computer and your RAM is close to running out, you increase your RAM to 8 GB. Or for the CPU, you can go from an i5 to an i7 to an i9 or whatever it is. That's vertical scalability, where you just improve the size of the server that is holding your application. The advantage of doing it is that you don't have to make any changes to your code. It's the same code, it just runs on a better system. The second way of doing it is horizontal scalability, where you add an extra server to the mix. So if you're running a cluster, and this is a big if, if you are running your application as part of a cluster, there are multiple resources within that cluster that run your application.
Horizontal scalability means you can just add extra resources to that cluster, and then you can start achieving more. So what are the advantages and disadvantages? Well, for starters, if you don't have your application running as a cluster, it's just easier to achieve vertical scalability, right? Especially in the world of cloud, you can just increase the size within 20 minutes and you're done. The disadvantage, though, is that there are limits to how big your instance can be. On EC2 you can't go beyond T 24 x. Well, you shouldn't ever reach that point, but there are limits. Secondly, vertical scalability is very easy to achieve initially. You don't have to hire new engineers, you don't have to change the code, but it starts hitting diminishing returns very quickly. And horizontal scalability is the way to go beyond a certain point. So these are the two factors. You need to figure out what stage your application and your company are at, and that will decide whether you want horizontal or vertical scalability.

Now, getting a little more into vertical scalability. As I mentioned, you can have an EC2 instance that is running, and if you want to scale vertically, all you can do is increase the RAM or the CPU or the throughput of the instance. And then suddenly you have a more scaled system. The other thing you can do as far as achieving vertical scalability goes applies if you have a general purpose instance running, say an instance with two gigs of RAM and one virtual CPU. If you suddenly want to increase the CPU size, AWS gives you memory optimized and compute optimized instances that you can go for, where you give up some of the RAM for an extra CPU, or you give up an extra CPU for some more RAM. So if you hit one of the two limits, say the RAM limit, you can give up some of the CPU for extra RAM at the same cost. So that's the other way of achieving vertical scalability.
And finally, a slightly less intuitive way applies if your application runs on a shared server. If you have a database running on a shared database cluster, for example, you can move it to its own instance. You can have a dedicated server, and that way you get more processing power.

I wanted to go a little bit into the cost implications of that. For this I've collected all the on-demand prices for the T4 instances on EC2. You can see that a micro instance, which has 1 GB of RAM, costs you zero point. It has two virtual CPUs. If you go from 1 GB to 2 GB, it takes 0.8 extra cents. And now for zero point $0.16 you get the same number of vCPUs, but you get 2 GB of RAM. 0336 increase, but you get 4 GB of RAM. So you can see that for each gigabyte increase you're spending approximately zero. Now look at the CPU side of things. You see the CPU count jump from two to four between T4 large and T4 extra large. Ironically, AWS doesn't seem to be charging extra for that CPU jump, even though they say that CPU is expensive and all. I've noticed that memory is how they're pricing their tiers. So as an architect, if you have a demand that you are projecting, you should keep that in mind and figure out how this insight can help you. At some point RAM does become expensive on the cloud, and you need to figure out how you can optimize your application based on this kind of an equation. I've written a few blogs around cloud economics, but that's something that I always keep in mind when it comes to projections.

With that being said, I would like to go into the more complicated, according to me, way of achieving scalability. That is horizontal scalability. For starters, why is it that most people don't have horizontal scalability by default? Well, because your application needs to be ready to have horizontal scalability.
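Returning briefly to the pricing observation above: one way to check whether a price list tracks memory rather than CPU is to compute the marginal cost per extra gigabyte between adjacent tiers. The sketch below assumes a hypothetical price table. The tier names and dollar figures are placeholders chosen for illustration, not real AWS prices and not the figures from the slide.

```python
def marginal_cost_per_gib(tiers):
    """Compare adjacent tiers of an instance family.

    tiers: list of (name, vcpus, ram_gib, hourly_usd), sorted by size,
    with RAM strictly increasing from one tier to the next.
    """
    steps = []
    for smaller, larger in zip(tiers, tiers[1:]):
        _, cpu_a, ram_a, price_a = smaller
        name_b, cpu_b, ram_b, price_b = larger
        steps.append({
            "to": name_b,
            "extra_vcpus": cpu_b - cpu_a,
            "usd_per_extra_gib": (price_b - price_a) / (ram_b - ram_a),
        })
    return steps

# Hypothetical placeholder prices, NOT real AWS list prices.
example_tiers = [
    ("small", 2, 2, 0.02),
    ("medium", 2, 4, 0.04),
    ("large", 2, 8, 0.08),
    ("xlarge", 4, 16, 0.16),
]

for step in marginal_cost_per_gib(example_tiers):
    print(step)
```

In this made-up table the cost per extra gigabyte stays flat even on the step where the vCPU count doubles, which is the pattern described in the talk. Running the same function on current published prices would show whether that pattern still holds for a given instance family.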
Just because you add an extra server to the equation doesn't mean your application can suddenly start running faster or better, or process more requests. A fun joke one of my old bosses used to tell me was that nine couples can't have one baby in one month if the baby takes nine months to be born. That's how long it takes. That's how some applications might be. So you need to redesign the application in order to have horizontal scalability, where that extra server makes a difference.

With that said, let's assume you have an application that can actually do that. One way of doing it is, of course, to over provision your entire system. Let's say you expect to suddenly get a million requests per minute. You can assume that's what is going to happen, and right from day one you can provision your servers in such a way that you have enough servers to handle 1 million requests. So if you hit that kind of a level, it is not going to be a problem. What's the disadvantage of doing it that way? For starters, on day one, when you're not getting a million requests, you have all these servers running and you're paying money for no reason. They are just running there without handling any requests. Then your load starts picking up. Day 50, day 100, whatever it is. Then each server starts becoming useful, even though this server here is still being wasted. But some of the others are now useful, until you hit the limit that you have provisioned. Once you cross this limit, you are under provisioned, because you don't have any server above this that you could provision. Now suddenly you might end up running to the market trying to buy new servers. I mean, you live in the world of cloud, so you're provisioning a new server, but that's still a problem. You have to do something. You need to have someone monitoring the utilization before a new server is added. And that's the problem that horizontal scalability might have. I have a chart at the bottom.
This is something that I found online. I've tried to give attribution wherever possible. If this is how your demand is going to increase, the blue line is how you want your servers to be provisioned. But in reality that's not how the world works, right? There are always these step increases. Anytime you go and get a new server, you have to jump up a step. You might have to make a capital expenditure if it is on premises, or in the cloud you can just provision a new server. Finally, your traffic can also start decreasing, and you might have to get rid of servers. Again, that is something that you have to worry about. If you are under provisioned, you might lose customers because you don't have the processing power to accept the requests. If you're over provisioned, your servers are running for no reason and you are losing money.

A final point I would like to make is that of elasticity. Elasticity is very similar to scalability, but in a very temporary setting. So what if, normally speaking, you get, say, 5000 requests a day? That number is usually going to be the same. But on the day of the Super Bowl, suddenly you get like a million requests. For that temporary spike, you also want the ability to handle the traffic that comes in, because you don't want to turn away customers, even if it is on a temporary basis. So sometimes while designing for scalability, you might also want to consider elasticity, where you want the ability to add extra servers for the one hour where you have that extra spike. So what you want to achieve is to have servers added in the way this blue line goes. When the demand spikes, you want more servers. When it goes down, you want fewer servers. There are two ways. The first one is by using what is called auto scaling. What AWS allows you to do, as part of the shared responsibility model, is to teach it how to add new servers.
By that I mean you can specify events which, when triggered, will add an extra server to your cluster. These events are known as auto scaling triggers, and they can be something simple, like each time my CPU utilization hits 90%, I want an extra server to be added. Or each time my RAM hits 85%, I want an extra server to be added to the cluster. By RAM I mean the average RAM utilization of the cluster. This way you can have logic that teaches Amazon to add extra servers, and thus your application now starts scaling up or scaling down automatically. And that way you don't have to over provision or under provision. The possibility is still there, because it's still a stepwise increase. You can see that there are still these pockets where you have under or over provisioning. If your auto scaling logic is not aggressive enough, it will under provision, and then you will have lost customers. Or you can provision aggressively, but then you might have more over provisioning or overcapacity.

The second disadvantage of auto scaling is, of course, the fact that you are adding extra logic to the equation. That's going to increase the complexity of your application. There is no free lunch. You still have to figure out how to add this logic. Sometimes this logic itself can be complicated. If it's as simple as RAM or CPU, that's one thing. But you might want to say, okay, each time I see an upward sloping curve in the number of requests, I want you to add a server, and that's where your logic starts getting more and more complicated. So that's a disadvantage of auto scaling.

The other way of doing it is a serverless application. Serverless is the full service option that we talked about in the shared responsibility model, where Amazon says we will handle everything for you as far as scalability goes, as long as you make your application run in the very specific way that the serverless application is designed to run.
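The threshold triggers described above, and the pockets of under and over provisioning they leave behind, can be sketched as a small simulation. This is a toy model under stated assumptions, not how AWS implements auto scaling: the function name, the 90% and 50% thresholds, the one-server-per-step rule, and the capacity of 100 requests per server are all illustrative.

```python
def simulate_autoscaling(demand, capacity_per_server=100.0,
                         scale_out_at=0.90, scale_in_at=0.50, min_servers=1):
    """Replay a demand series against a simple threshold policy.

    At each step, utilization = demand / (servers * capacity_per_server).
    Above scale_out_at, one server is added for the next step.
    Below scale_in_at, one server is removed for the next step.
    Returns a list of (servers, unmet_demand, idle_capacity) per step.
    """
    servers = min_servers
    history = []
    for load in demand:
        capacity = servers * capacity_per_server
        unmet = max(0.0, load - capacity)   # under provisioning: lost requests
        idle = max(0.0, capacity - load)    # over provisioning: wasted money
        history.append((servers, unmet, idle))
        utilization = load / capacity
        if utilization > scale_out_at:
            servers += 1
        elif utilization < scale_in_at and servers > min_servers:
            servers -= 1
    return history

# A quiet period, a sudden jump to 250 requests, then a drop.
for step in simulate_autoscaling([50, 95, 250, 250, 250, 40]):
    print(step)
```

Because the policy reacts one step at a time, the jump to 250 requests leaves 50 requests unserved for one step before the third server arrives, and the drop at the end leaves three servers mostly idle. Making the thresholds more aggressive trades one kind of pocket for the other.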
Of course, there are limits. You can't have like a trillion requests a second. But within limits, what Amazon says is, we will increase the capacity and match your load as long as your application runs the way it is designed to run. One example is Lambda. Lambda is something that runs a function for you. These are supposed to be very small functions, generally asynchronous functions. Even though they can now run for 15 minutes, you're supposed to run small asynchronous functions. But if you can extract this function out and run it on AWS Lambda, then within reason, within limits, Amazon will automatically provision resources for you, provision servers, and your function will run on those servers, so you don't have to think about scalability, adding auto scaling logic, et cetera. Again, I keep repeating this like a broken record, as long as your application conforms to the way Lambdas are supposed to run. So that then becomes a very effective way of scaling, especially if you have big spikes, jumps up and down. Adding auto scaling logic might be hard, but Lambdas come to the rescue.

Aurora PostgreSQL is another example. Aurora on AWS gives you the option of running your database in a serverless mode, where Amazon handles all the scaling up and scaling down for you. I'm actually going to talk about Aurora in the next slide, where I'll talk about how Aurora can run in the serverless mode, in the auto scaling mode, or in the fixed capacity mode, all three modes. So it's a wonderful case study for differentiating between all the different aspects of scaling. DynamoDB is another example. If you have a key-value lookup, DynamoDB gives you a serverless lookup. So as I said, Aurora is an example of a service on AWS that can run in the mode shown at the bottom. That's the base mode, where you can just say, okay, I just want one cluster. Or you can have it run in an auto scaling mode, or it can also run serverless.
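As a rough illustration of what "a small function that conforms to the way Lambdas run" might look like for the ecommerce search example, here is a minimal handler sketch. It assumes the function sits behind an API Gateway proxy integration, which is where the `queryStringParameters` field and the `statusCode`/`body` response shape come from. The in-memory `CATALOG` and the parameter names `q`, `min_price`, `max_price`, and `min_rating` are hypothetical.

```python
import json

# Hypothetical in-memory catalog standing in for a real product store.
CATALOG = [
    {"name": "aviator sunglasses", "price": 12.0, "rating": 4.5},
    {"name": "designer sunglasses", "price": 120.0, "rating": 4.8},
    {"name": "budget sunglasses", "price": 6.0, "rating": 3.2},
    {"name": "phone case", "price": 9.0, "rating": 4.7},
]

def lambda_handler(event, context):
    """Stateless search: everything needed arrives in the event."""
    params = event.get("queryStringParameters") or {}
    keyword = params.get("q", "").lower()
    min_price = float(params.get("min_price", 0))
    max_price = float(params.get("max_price", "inf"))
    min_rating = float(params.get("min_rating", 0))

    matches = [
        item for item in CATALOG
        if keyword in item["name"]
        and min_price <= item["price"] <= max_price
        and item["rating"] >= min_rating
    ]
    return {"statusCode": 200, "body": json.dumps(matches)}
```

The handler keeps no state between invocations, which is what lets the platform run as many copies in parallel as the load requires. Locally it can be exercised by calling `lambda_handler({"queryStringParameters": {"q": "sunglasses"}}, None)`.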
So I want to talk about the three different scenarios where each one of these modes is the best option to go for, and how you as architects can make that decision. The first scenario is where you have demand that is sustained and fairly stable. It's a flat line, and you probably don't need auto scaling. That's unnecessary complexity. You are going to get five requests per minute. That's it. No more, no less. In such a situation, you want the option at the bottom, because it's cost efficient, it's extremely simple, and you don't need to worry about anything. Aurora Serverless also gives you most of that. But with Aurora Serverless, you have to pay a premium. AWS doesn't give you serverless for free. You pay a premium for the auto scaling that it does on your behalf. And that is unnecessary in this situation, because your demand is constant.

The next scenario is where you have fairly constant demand, except at certain times of the day when you need to add more resources. Let's say at lunchtime, suddenly people want to sit at their desks and order, I don't know, cell phones or sunglasses from your website. That's where you have sustained demand with spikes that you can predict. In such a situation, first of all, your regular single-instance Aurora is not going to work, because at lunchtime there is a spike and you can't just turn customers away, I'm assuming, at that point. So you can either auto scale or you can use serverless. Serverless will still work, but as I said, you are paying an unnecessary premium to AWS when you can just have time-based auto scaling. You can just teach AWS to scale up at 11:00 a.m. and then scale down at 1:30, and that's the best option you have.

The third scenario: what if your demand is bursty? You just can't predict when you're suddenly going to get a million requests and when you're not going to get anything.
The lunchtime spike was fun, but it's lunchtime somewhere in the world at any given time of the day. In such a situation, well, you need to go for serverless, because Amazon will handle the ups and downs, and you don't need to have complicated logic that will scale up and down for you. Even though you could invest time and money in doing that, you don't want to reinvent the wheel. So in such a situation, serverless is the best option for you. The point of me doing this is that, as an architect, I think about these kinds of trade-offs every day, and the answer is different depending on the problem that you're trying to solve. And that's just one way of achieving scalability.

So as I said, there are many, many ways you can handle scalability. As an architect, your job is to make sure that you pick the right tool for the job. As I mentioned, scalability is not where you extract more out of the same number of resources. You do have to spend money on resources. You make a decision that it is better to allocate a resource to your application because it's important for the business, and this way your application can handle more traffic and more load as your company becomes more and more popular and your product sells more and more. Secondly, as I mentioned, scalability does not come for free. First of all, you have to spend money on the resources, plus it might lead to complexity. It can lead to some loss of security or of any of the other aspects. So you need to make the right trade-off while trying to achieve scalability. With that said, I hope that I was able to show some newer aspects of scalability and shine some light on some of its more confusing aspects. Thank you again for your time, and I hope you learned something.
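The three scenarios from the talk can be summarized as a small decision helper, together with the time-based rule from the lunchtime example. This is only a sketch of the rule of thumb, assuming hypothetical names (`DemandPattern`, `recommend_capacity_mode`, `scheduled_capacity`) and illustrative server counts. A real decision would also weigh cost, security, and the other aspects discussed earlier.

```python
from datetime import time
from enum import Enum

class DemandPattern(Enum):
    STEADY = "steady"                          # flat line, e.g. five requests a minute
    PREDICTABLE_SPIKES = "predictable_spikes"  # e.g. a lunchtime rush
    BURSTY = "bursty"                          # spikes you cannot predict

def recommend_capacity_mode(pattern: DemandPattern) -> str:
    """Map a demand pattern to the mode the talk recommends for it."""
    if pattern is DemandPattern.STEADY:
        return "fixed capacity"            # simplest and cheapest, no scaling logic
    if pattern is DemandPattern.PREDICTABLE_SPIKES:
        return "scheduled auto scaling"    # avoids paying the serverless premium
    return "serverless"                    # let the provider absorb the bursts

def scheduled_capacity(now: time, base_servers: int = 1, peak_servers: int = 3,
                       peak_start: time = time(11, 0),
                       peak_end: time = time(13, 30)) -> int:
    """Time-based rule: scale up at 11:00 a.m., scale back down at 1:30 p.m."""
    return peak_servers if peak_start <= now < peak_end else base_servers

print(recommend_capacity_mode(DemandPattern.BURSTY))
print(scheduled_capacity(time(12, 0)))
```

The mapping is deliberately blunt. Its only purpose is to show that the answer changes with the shape of the demand, which is the trade-off the talk asks an architect to evaluate.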

Gaurav Raje

Engineering Manager @ MoneyLion

