Conf42 Chaos Engineering 2024 - Online

Building Resilient Systems with Serverless Web Development

Abstract

Embark on a journey through the serverless frontier! Join my talk to uncover the power of serverless web development, exploring resilience, fault tolerance, and Chaos Engineering. Discover best practices for building robust applications in this paradigm shift.

Summary

  • Olumide Akinremi will walk you through building resilient systems with serverless web development. In this discussion, we are going to talk about chaos engineering. All systems are bound to fail at some point. But what is important is building confidence in your system when a failure happens.
  • The front end poses greater challenges compared to every other environment. Adding additional features to your front end, or your code generally, doesn't make it resilient. It's important to build with chaos engineering in mind and to try to catch and fix issues before they arise.
  • So let's talk about handling failures and building resilient systems in serverless web development. If your authentication service goes down, for example, you need to think about how your system is going to work. We will talk about some tools that can be used to implement chaos engineering.
  • Next, let's take a deep dive into a LinkedIn use case. The most important part of this application is the user profile. Focus on what will fail and how it should fail. These are tools that we can use to create chaos.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi everyone, and thank you for joining my session. I'm super excited to walk you through building resilient systems with serverless web development. I am Olumide Akinremi and I work as a technical team lead at Sabi. In this discussion, we are going to talk about chaos engineering, chaos on the front end, building resilient systems, and handling failures.
First, what is chaos engineering? I know a lot of people might have asked about chaos engineering and some of its advantages, but it's important to have a quick breakdown of chaos engineering and why it is so important to software engineering. Wikipedia defines chaos engineering as the discipline of experimenting on a system in order to build confidence in the system's capability to withstand turbulent conditions in production. So basically, you have systems, and it's important to build for scenarios where your system will fail. All systems are bound to fail at some point: your AWS server, your Azure deployment, or your services. Everything will fail at some point. But what is important is building confidence in your system when a failure happens. And this can only be done by ensuring you are prepared for this scenario and know what to do in it. It's more like seeing the issue before it happens, or knowing that this is going to happen and being prepared for it, rather than not being prepared at all and then, when it happens, having no idea why your system failed.
Think of chaos engineering as having a car and deciding to go on a road trip with your friend. On the road trip there are a lot of failures that might happen, like running out of gas or getting a flat tire. Having a spare tire is important because you can have a flat tire on the road. So you should consider all the kinds of failure that might happen on your road trip and try to prepare ahead for them.
So things are bound to happen in the tech world. Maybe as we speak, someone's system is currently down and they are trying to resolve it. This happens all the time. So it's smart to find a fix for problems before they arrive, because they can cause trouble and give you a hard time when they pop up. It's more like checking your car to ensure that it can take you on this road trip, and that if different situations occur, you have the right things in place to keep moving and not get stranded.
So let's talk about chaos on the front end. The front end is a very crazy environment because a lot of these failures are not dependent on you as a front end engineer or as a full stack engineer. They depend on different situations which are not in your control. Imagine that you have an application where the back end is supposed to return some data that you are going to use to render a particular page, but at some point the server goes down, the back end can't return the data that you need, and then everything is breaking. The user can't see anything in your application, the user complains, and as a result you might lose some users. You might get a call from your CTO or your CEO that the application is not working, or the customer support team or users in general leave feedback on the application: this is not working, this is crap. So it's important to deliberately introduce issues into your front end application to observe potential problems and assess how your application responds to them. And keep at the back of your mind that it's important to have a lot of things in place for your front end application, because this is what the user sees and this is how they interact with your system.
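As a rough illustration of deliberately introducing issues into a front end, here is a minimal TypeScript sketch of a wrapper that sometimes fails or delays a back end call. It is not a tool from the talk: the names `ChaosConfig`, `pickFault`, and `withChaos`, and the idea of passing in the random `roll` and a `sleep` function, are all assumptions made for this example.

```typescript
// Hypothetical fault-injection helper; none of these names come from the talk.
type Fault =
  | { kind: "none" }
  | { kind: "error"; message: string }
  | { kind: "latency"; delayMs: number };

interface ChaosConfig {
  enabled: boolean;     // keep this off outside of experiments
  errorRate: number;    // probability (0..1) of failing the call
  latencyRate: number;  // probability (0..1) of delaying the call
  delayMs: number;      // how long a delayed call should wait
}

// Decide which fault (if any) to inject. `roll` is a number in [0, 1),
// passed in so that an experiment can be replayed deterministically.
function pickFault(config: ChaosConfig, roll: number): Fault {
  if (!config.enabled) return { kind: "none" };
  if (roll < config.errorRate) {
    return { kind: "error", message: "chaos: injected failure" };
  }
  if (roll < config.errorRate + config.latencyRate) {
    return { kind: "latency", delayMs: config.delayMs };
  }
  return { kind: "none" };
}

// Wrap any async call (for example a request to the back end) so that it
// sometimes fails or slows down according to the config.
async function withChaos<T>(
  call: () => Promise<T>,
  config: ChaosConfig,
  sleep: (ms: number) => Promise<void>,
  roll: number = Math.random(),
): Promise<T> {
  const fault = pickFault(config, roll);
  if (fault.kind === "error") throw new Error(fault.message);
  if (fault.kind === "latency") await sleep(fault.delayMs);
  return call();
}
```

Running the application with a wrapper like this switched on in a test environment shows whether each page still renders something sensible when a request fails or arrives late.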
The users are not going to see the back end application. They see your front end application and they interact with it, and different situations can make you lose users. If your app fails to render on the initial load, users complain and leave. If they click on a particular CTA and it is not responsive, they give you feedback that it's not working: I clicked on this button, nothing happened. In fact, if the user has a network connectivity issue and a particular request times out, they complain that, oh, this doesn't work, just because you didn't handle those failures and faults. So it's important to build with chaos engineering in mind and try to catch and fix issues before they arise. Be prepared for a situation like that.
Introducing additional features to your front end, or your code generally, doesn't make it resilient. In fact, it might add potential risks and points of failure to the application, because adding new features means that there are more features and more users interacting with your system, and in that case they can try to interact with the new feature you built and find that the one they were using before is broken. So it's important to be prepared for situations where this happens and be ahead of the users.
Another important point is that the front end poses greater challenges compared to every other environment because of the different things we need to deal with: JavaScript engines, plugins, accessibility, styling, latency, viewport. All of these are not 100% in your control, but they are things you should be prepared for. Imagine an application that works end to end on Chrome and is mobile responsive, but on Internet Explorer or Mozilla a particular feature doesn't work the way it should, just because the JavaScript engine or the browser doesn't support a particular style or a particular function that you've used.
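To make the "I clicked on this button, nothing happened" case concrete, the sketch below maps each kind of request failure to feedback the user can actually see. The failure categories, the `toUserFeedback` name, and the message wording are assumptions for illustration; a real application would derive the categories from its own HTTP client.

```typescript
// Hypothetical failure categories for a front end request.
type RequestFailure =
  | { kind: "timeout"; afterMs: number }
  | { kind: "offline" }
  | { kind: "server"; status: number };

interface UserFeedback {
  message: string;
  canRetry: boolean;
}

// Every failure maps to a visible message, so a click never ends in silence.
function toUserFeedback(failure: RequestFailure): UserFeedback {
  switch (failure.kind) {
    case "timeout":
      return {
        message: `The request took longer than ${failure.afterMs / 1000}s. Please try again.`,
        canRetry: true,
      };
    case "offline":
      return {
        message: "You appear to be offline. Check your connection and try again.",
        canRetry: true,
      };
    case "server":
      // Assumption: 5xx responses are treated as transient, anything else is not.
      return failure.status >= 500
        ? { message: "Something went wrong on our side. Please try again.", canRetry: true }
        : { message: "This action could not be completed.", canRetry: false };
  }
}
```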
So it's important to have tested, and to be ahead of the users in situations like that, to ensure that it works on all browsers.
So let's talk about handling failures and building resilient systems in serverless web development. Take a look at this diagram. This is a music streaming platform that has an authentication service, a movie service, a recommendation service, and then a service that keeps track of your watch history. This last one is connected to a cache, so you can see it really fast compared to the others, which need to connect to the database. The database, on the other end, feeds all the other services, because, for example, authentication needs to go to the database to retrieve user information. I believe this is a basic microservice setup that most people use. Then we have a front end service that talks to whatever front end you built, which can be a React application, an Angular application, or a mobile app.
At this point, think of a situation where your database goes down, meaning that none of these services will be able to talk to the front end service. So your React application, your Angular application, and your mobile application suffer from this. Or think of a situation where your authentication service is down, meaning that users won't be able to log in. We can talk about different scenarios of other services going down and what will happen. But what is important here is knowing what will fail and handling the failure. If your authentication service goes down, for example, you need to think about how your system is going to work. Does my application depend on the authentication service to fully function? Based on your answer, you should decide how you are going to build and react to this failure.
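One way to answer the question "does my application depend on this service to fully function?" is to write the dependencies down explicitly. The sketch below does that for the services in the diagram. The feature names and the dependency table are assumptions made for this example, not details given in the talk.

```typescript
// Services from the diagram described above.
type Service = "database" | "auth" | "movies" | "recommendations" | "watchHistory";

// Hypothetical user-facing features of the front end.
type Feature = "login" | "streaming" | "recommendations" | "watchHistory";

// Assumed dependency table: which services each feature needs in order to work.
const requiredServices: Record<Feature, Service[]> = {
  login: ["auth", "database"],
  streaming: ["movies", "database"],
  recommendations: ["recommendations", "database"],
  watchHistory: ["watchHistory", "database"],
};

// Given the services that are currently down, return the features that can
// still be offered to the user.
function availableFeatures(down: Service[]): Feature[] {
  const features = Object.keys(requiredServices) as Feature[];
  return features.filter((feature) =>
    requiredServices[feature].every((service) => down.indexOf(service) === -1),
  );
}
```

With this table, an authentication outage only removes `login`, while a database outage removes every feature, which matches the two scenarios described above.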
If your authentication service goes down, then the user should still be able to access the application, still stream movies, and still see the recommendation part of the application, because your application is not solely dependent on this service. It's a microservice architecture in which every service is independent. With this, you can do some testing scenarios in terms of chaos engineering to simulate each service failing, see how your system depends on them as a whole, and try to react to the failures that might happen. We will talk about some tools that can be used to implement the chaos engineering we've been talking about, to simulate failures and know how your system reacts to them. But overall, it's important to understand how your system works, to know what will fail and how it will fail, and finally how you react to it.
Next, let's take a deep dive into a LinkedIn use case. We have this LinkedIn profile, and in this profile we have different views. We have the user profile section, the feed section, the recent activity section, and the post section. In each of these sections we have views, and they do different things. All these little views are what form this profile page, and there is a lot of personalization happening here in terms of recommended or suggested posts, the recent activities that this profile has performed, and the people the user can follow. So let's take this, for example. I blurred it out, but this is supposed to be the profile picture and the username of who you want to follow. Let's think about a situation where we can't retrieve that information. Is it really necessary to show the follow button, or to tell the user to follow this profile, when they can't see the information about who to follow? So it's important to know this little detail to know how you react to this.
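The follow button decision can be sketched as a small function: the card, and with it the button, is only rendered when both the name and the picture were retrieved. The `FollowSuggestion` shape and the `followCardView` name are hypothetical and only serve this illustration.

```typescript
// Hypothetical shape of one "people to follow" suggestion from the back end.
// Fields are optional because a partial failure may leave them missing.
interface FollowSuggestion {
  profileId: string;
  name?: string;
  pictureUrl?: string;
}

type CardView =
  | { kind: "hidden" }
  | { kind: "card"; name: string; pictureUrl: string; showFollowButton: true };

// Only show the card (and its follow button) when the user can see who
// they would be following; otherwise hide the whole card.
function followCardView(suggestion: FollowSuggestion | null): CardView {
  if (!suggestion || !suggestion.name || !suggestion.pictureUrl) {
    return { kind: "hidden" };
  }
  return {
    kind: "card",
    name: suggestion.name,
    pictureUrl: suggestion.pictureUrl,
    showFollowButton: true,
  };
}
```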
That's what I'm thinking: it is irrelevant to show this follow button if the user can't see the profile picture and the name of who to follow, because it's confusing. At this point the user will be concerned: I don't even know who I'm following. So that is one way to handle failures in this case. Another way is knowing how the parts of your system, or the sections of your page, depend on each other. The most important bit of a page depends on your application and what you are building for your own use case. But for this use case, the most important part of the page is the user profile, which is here, because the user profile is what lets the back end know the recommendations of who you want to follow, the suggested posts you want to read, and the recent activity. So in situations where we can't retrieve the user profile, it's irrelevant to display any of this other information, because we don't have any user profile here. Our page can fail gracefully so as not to confuse the user any further. But if we can view the user profile and we can't retrieve other information, such as the suggested posts, who to follow, or the recent activity, we can still display information on this page, because what is most important to the user is currently being displayed. With this example, we can better streamline what we want the user to see at any point based on the failures that might happen from our back end.
So, as I said, focus on what is going to fail and how it should fail. Focus on your application, think about what is going to fail, and ensure you are prepared for how it is going to fail if such a failure happens.
So these are tools that we can use to create chaos.
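Going back to the profile page example, the rule "without the user profile nothing else is worth showing, but the profile alone is still worth showing" can be written as a small planning function. This is a sketch under assumptions: the section names follow the description above, while `LoadResult`, `PagePlan`, and `planProfilePage` are hypothetical names.

```typescript
type Section = "profile" | "suggestedPosts" | "whoToFollow" | "recentActivity";

// Which sections loaded successfully. In a real page this would come from
// the results of the individual requests.
type LoadResult = Record<Section, boolean>;

type PagePlan =
  | { kind: "errorPage"; reason: string }
  | { kind: "page"; sections: Section[] };

// The profile is the critical section: without it the page fails gracefully
// as a whole. Every other section is optional and is dropped if it failed.
function planProfilePage(loaded: LoadResult): PagePlan {
  if (!loaded.profile) {
    return { kind: "errorPage", reason: "profile unavailable" };
  }
  const optional: Section[] = ["suggestedPosts", "whoToFollow", "recentActivity"];
  const sections: Section[] = ["profile", ...optional.filter((s) => loaded[s])];
  return { kind: "page", sections };
}
```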
These are the ones that I've created that can help you create chaos and failure in your application. Thank you for listening. You can reach me on LinkedIn or Twitter if you have any questions for me. Bye everyone.

Olumide Akinremi

Technical Team Lead @ Sabi

