Conf42 Chaos Engineering 2024 - Online

Chaos multi-domain scenarios and its business impacts

Abstract

Multi-domain impacts of chaos engineering theory and its implementation are currently the real challenge for SREs. We tend to apply the same theory in different contexts but fail to understand the business gains. I will emphasize benchmarking chaos tools and gauging the business importance of chaos use cases.

Summary

  • Tejendra Bhandari presents the business impacts of multi-domain chaos engineering use cases. This is his second presentation at Conf42. Many organizations find it difficult to gauge the impact of chaos engineering experiments.
  • He starts with an introduction to chaos engineering and himself, then covers the session details, and ends with questions and answers. Questions can be posted on the Slack channel, and he will be happy to take them.
  • Use cases can come from open source projects or from your own learning. There are multiple ways to try these experiments and create value from them. Which open source or licensed tool would help me or my organization? That depends solely on your experiments.
  • Predicting these experiments and their outcomes helps you reduce the outages and issues you face in your landscape. Revenue realization, converting your use cases into revenue, is the most important part. Make the experiments easy for any user across the organization to take up without any pain.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello, good day everyone. My name is Tejendra Bhandari, and today I'm going to present the business impacts of multi-domain chaos engineering use cases. I hope you are all doing well, and welcome to Conf42. This is my second presentation at Conf42, and I'm glad to meet you all virtually. So let's start and dig into the business impacts, which help presales people, architects, and anyone presenting chaos engineering to organizations that are new to it and find it difficult to gauge the impact and redeem the benefits of chaos engineering experiments. Let's start with the agenda. Today I'm going to give an introduction to chaos engineering and myself. Then we'll focus on the session details, and finally questions and answers. You can post your questions on the Slack channel, and I'll be happy to take them. So let's start. When we say business, the question is how we gauge the impact created by chaos experiments, and how an organization that is willing to invest can see the output in business terms: what the actual savings are, or what revenue has come from the chaos tools that have been implemented. To start with, there are a lot of impactful cases that can be applied across domains, and you can gain insights from multiple experiments, and also from a single experiment across multiple domains. So first of all, create a strategy in which you pool similar platforms together, AWS or any other platform. For example, say you have used AWS as your chaos experiment platform and have gained insights from it.
You can then multiply those insights across your AWS estate, treat similar platforms as a single bucket, build a strategy out of it, and show which areas are strategically good places to implement chaos engineering experiments. Now, to drive the experiments: we are always in a rush to decide how to drive them and how the business case is being created. You have to drive the experiments based on the business requirements. I'm sure a lot of businesses do not know their requirements, but you have to start somewhere. So we have to understand what the business wants: is it resiliency, is it reducing outages, what impact do they actually want from chaos engineering? Then work with teams that have the full picture. Many times you do not get a team that has the full organizational picture, but you have to find the team that works with a lot of other teams and serves as a single platform owner for them, gain insights, interact with them, and learn what technical problems and challenge areas they are actually facing. In my experience this happens in all domains. I have worked in media, medical, delivery, and multiple other domains within technology on chaos implementation, and I found it very useful to interact with people who are engaged with infrastructure and with writing services across the organization or multiple service lines. Then it helps to get started with the most painful area. If they can point you to the most painful area, and I'm sure they can, because there are a lot of painful areas impacting the organization and its day-to-day runs, or where they are failing a lot of cases internally, that is where you start.
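The bucketing strategy (pool similar platforms together) and the advice to start with the most painful area can be sketched as a small data model. This is a minimal illustration under stated assumptions, not the speaker's tooling: the ChaosUseCase type, the platform labels, and the numeric pain_score (a rating you might collect from platform-owner teams) are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ChaosUseCase:
    """One candidate experiment. All field values here are illustrative."""
    name: str
    platform: str    # hypothetical platform label, e.g. "aws" or "azure"
    pain_score: int  # hypothetical 1-10 rating gathered from platform-owner teams


def build_buckets(use_cases):
    """Group use cases into one bucket per platform, most painful first."""
    buckets = defaultdict(list)
    for uc in use_cases:
        buckets[uc.platform].append(uc)
    for bucket in buckets.values():
        # Highest pain first, so each bucket leads with its strongest business case.
        bucket.sort(key=lambda uc: uc.pain_score, reverse=True)
    return dict(buckets)


def starting_point(buckets):
    """Return the most painful use case across all buckets: where to start."""
    return max(
        (uc for bucket in buckets.values() for uc in bucket),
        key=lambda uc: uc.pain_score,
    )


if __name__ == "__main__":
    cases = [
        ChaosUseCase("ec2-instance-stop", "aws", 6),
        ChaosUseCase("alb-failover", "aws", 9),
        ChaosUseCase("vm-reboot", "azure", 4),
    ]
    buckets = build_buckets(cases)
    print([uc.name for uc in buckets["aws"]])  # ['alb-failover', 'ec2-instance-stop']
    print(starting_point(buckets).name)        # alb-failover
```

Keeping one bucket per platform means an insight gained on one AWS service can be replayed on the others in the same bucket.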
There will surely be a lot of areas where you can apply the chaos experiments you have designed or want to design, so that is a very helpful place to start. Often we create a lot of use cases to portray the challenges and their outcomes, so start converting those cases into a common pool. Where you have no business contact to interact with, or no information on the business, platform, or infrastructure, you can start experimenting in an environment where the customer or organization wants you to start. Use the most common and easiest experiments to gain insights into the infrastructure or network you want to target. This is helpful across multiple domains. Next, artifacts. As I said, they are the most important way to create awareness and traction for what you are doing. So create an impact artifact and publish it across the organization, multiple times. Reach out to teams that are actually using your domain. By using your domain I mean they use AWS or any cloud platform on a daily basis, but they are not aware of the challenges they can face. So publish, publish, and make them aware of what you are doing and whether there is a result or not. Starting to apply experiments with these teams and learning from them is a very meaningful exercise. Based on your past experiences, or on the strategy you have defined for any platform or organization, you can systemize these insights into experiments and then curate a set of experiments for that particular organization or service line. So this is all about how you create an impact across multiple domains. Now, how do you create value for the business using these experiments?
I'm sure a lot of people have run experiments and got some insights from them, but they fail to convert those insights into a business use case. At the end you have to win the business, and only once you win it will the traction and organizational benefits become known to other people who want to run these experiments on their services too. So you have to understand the organizational landscape you want to present to, how you want to present it, and which business outcomes you want to leverage. Then create an experiment bucket that is agile enough to run on multiple services, even services serving a different domain within the same organization. The experiments should be so adaptable that someone can make a small change to an experiment and use it on their own services. As I said, gaining insights is the most powerful and useful part. When you gain insights, document them so people understand the technical terms and insights you have provided. They can then correlate them with their own services and allow you into their area to experiment. Next, acceptance of the use cases across the organization. As I said, create a pool of use cases that are generic in nature and generic to the organization where you are implementing them. Then gradually start running these experiments on services, first manually, and then turn them into automation-pipeline-based use cases, so people do not have to edit them; they just run the use case and get the output. The easier you make the use cases for users, the more they will appreciate running them, and the fewer conflicts there will be with their environment, because the use cases run internally in their environment and they get the insights.
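The idea of a generic experiment that other teams adapt with a small change, and then run unattended in a pipeline, can be sketched as a frozen template plus per-service overrides. This is a hypothetical sketch: ExperimentTemplate, its fields (including the max_error_rate abort threshold), for_service, and run_pipeline are illustrative names, and the executor callable stands in for whatever chaos tool actually injects the fault; no real tool's API is assumed.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ExperimentTemplate:
    """A generic, reusable experiment from the shared pool (fields are hypothetical)."""
    action: str                   # hypothetical fault name, e.g. "kill-pod"
    target_service: str
    duration_s: int = 60
    max_error_rate: float = 0.05  # hypothetical abort threshold


def for_service(template: ExperimentTemplate, service: str, **overrides) -> ExperimentTemplate:
    """Adapt the shared template to one service with a small change; the original is untouched."""
    return replace(template, target_service=service, **overrides)


def run_pipeline(
    experiments: List[ExperimentTemplate],
    executor: Callable[[ExperimentTemplate], str],
) -> Dict[str, str]:
    """Run every experiment without manual edits and collect one outcome per service.

    `executor` is a placeholder for the real fault-injection step.
    """
    return {exp.target_service: executor(exp) for exp in experiments}


if __name__ == "__main__":
    base = ExperimentTemplate(action="kill-pod", target_service="template")
    pool = [for_service(base, "checkout", duration_s=30), for_service(base, "search")]
    # Fake executor for illustration: reports which fault ran and for how long.
    outcomes = run_pipeline(pool, lambda e: f"{e.action} for {e.duration_s}s")
    print(outcomes)  # {'checkout': 'kill-pod for 30s', 'search': 'kill-pod for 60s'}
```

Making the template immutable is one way to keep the shared pool generic: teams derive their own variant instead of editing the original use case.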
Gradually this becomes a pool of use cases from which you start getting insights. Then comes the most important part: converting these use cases into revenue. Business is about revenue, so how do you convert your use cases into revenue? You have to gauge how much you have saved, or what you have found. For example, say you have an application running behind a load balancer. The load balancers appear intact, the services are running fine, and you see no challenges. But when you run the use case you defined as an experiment on the load balancer or the underlying infrastructure, you find that transactions are being rejected internally and the teams have not been able to detect it. You create an outcome report from this run and tell them: 30% of your transactions are rejected while your HA (high availability) setup is switching over, and that 30% means some x amount of dollars you would probably have lost. Or, during a Big Billion Day sale, your revenue would be majorly impacted by these failures. How is this x amount calculated? It is based on the failures that may happen and the time and effort needed to rectify them. For an outage in production or during a Big Billion Day, you can calculate the man-hours and man-days it would take to rectify the issue. These terms give you a revenue figure, and these numbers can then be mapped to your use cases and outcomes.
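The revenue conversion described above is simple arithmetic: revenue lost to rejected transactions plus the effort cost of fixing the problem during an outage. The sketch below is one possible way to compute it, under assumptions: FailoverFinding and its fields are hypothetical, the yearly failover count and per-transaction value are inputs you would have to estimate, and the example numbers are illustrative, not data from the talk. Counting both terms as value assumes the chaos run let you fix the issue before it hit production.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverFinding:
    """What one chaos run revealed about HA switchover. All inputs are estimates you supply."""
    transactions_per_failover: int  # transactions in flight during one switchover (assumption)
    rejection_rate: float           # share rejected during switchover, e.g. 0.30
    avg_transaction_value: float    # revenue per transaction, in your currency (assumption)
    failovers_per_year: int         # expected switchovers per year (assumption)


def revenue_at_risk(f: FailoverFinding) -> float:
    """Yearly revenue lost to rejected transactions if the issue goes unfixed."""
    return (f.transactions_per_failover * f.rejection_rate
            * f.avg_transaction_value * f.failovers_per_year)


def remediation_cost(man_days: float, cost_per_man_day: float) -> float:
    """Effort cost of rectifying the issue during a production outage."""
    return man_days * cost_per_man_day


def business_value(f: FailoverFinding, outage_man_days: float, cost_per_man_day: float) -> float:
    """Value attributed to the chaos run: lost revenue plus outage effort that was avoided."""
    return revenue_at_risk(f) + remediation_cost(outage_man_days, cost_per_man_day)


if __name__ == "__main__":
    finding = FailoverFinding(
        transactions_per_failover=10_000,
        rejection_rate=0.30,
        avg_transaction_value=50.0,
        failovers_per_year=4,
    )
    print(f"{business_value(finding, outage_man_days=5, cost_per_man_day=800):,.0f}")  # 604,000
```

For a peak event such as a Big Billion Day, you could rerun the same calculation with that day's transaction volume to show the sharper impact.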
So the best approach is to try, try, and try again across the landscapes of your other organizations, and create a pool of use cases that run automatically in a CI/CD pipeline. When these pipelines produce outcomes, as I said, you can map them to man-days or revenue terms and use them to win a use case with any business. Moving on: how do you create value with this technology? The use cases can come from open source, from your own learning, or from experiments that were tried and tested but failed before. There are multiple ways to try these experiments and create value from them. Working on experiments can be challenging because often you have no idea where and how to start. So I would recommend going to open source tools and communities, finding the relevant experiments that have been run in the past or are currently available, curating your experiment based on them, and starting to hit the infrastructure, the API, or whatever layer you need, creating your own experiments with that background knowledge. Then you can use any automation tool to put them into a CI/CD pipeline and get an experiment result every day, on every run, or whenever it is required. That way you learn how your system behaves and confirm you are hitting the system in the right way. Once you hit the system and learn from it, you can use these experiments to create an in-depth experiment, for example a series of experiments: first you hit the infrastructure, then the network layer, and then the API or application layer. That gives you an end-to-end result and lets you create in-depth experiments for an organization. So yes, creating experiments with technology will help.
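The series of experiments described above (infrastructure, then network, then application) can be sketched as an ordered list of layered steps that produces one end-to-end report. This is a hypothetical sketch: run_series and the Step shape are illustrative, each experiment callable is a placeholder that should inject a fault and return whether the steady state held, and stopping at the first broken layer is an assumed design choice, not something stated in the talk.

```python
from typing import Callable, List, Tuple

# One step of the series: (layer name, experiment). The experiment injects its fault
# and returns True if the system's steady state held. The callables are placeholders
# for whatever chaos tool you actually use.
Step = Tuple[str, Callable[[], bool]]


def run_series(steps: List[Step]) -> List[Tuple[str, bool]]:
    """Run layered experiments in order and return an end-to-end report.

    Assumed design choice: stop at the first layer whose steady state breaks,
    because results from deeper layers would be confounded by that failure.
    """
    report = []
    for layer, experiment in steps:
        held = experiment()
        report.append((layer, held))
        if not held:
            break
    return report


if __name__ == "__main__":
    series = [
        ("infrastructure", lambda: True),  # e.g. stop one instance
        ("network", lambda: False),        # e.g. add latency between services
        ("application", lambda: True),     # e.g. fail an API dependency
    ]
    print(run_series(series))  # [('infrastructure', True), ('network', False)]
```

If you would rather see every layer's result in one run, remove the early break; the report then covers all layers even after a failure.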
A lot of people have asked me which open source or licensed tool would be helpful for them or their organization. It depends solely on your experiments. First run them manually, then gauge whether the technology landscape you are using calls for a license-based tool, or whether you can go to an open source community and get your experiments created there. Then, gradually, once you have a pool of experiments, you can decide whether to turn it into a service or a license-based investment, or whether what you are investing in right now is giving you the output you wanted. So again, it depends on the business requirements and the technology landscape you are already using and invested in. Moving on, I won't spend much time here, since we have already talked about this at length. Experiments in depth: as I said, create your experiments, build a series of experiments, and push them into the areas where you think they would help. Predicting these experiments and their outcomes will help you reduce the outages and issues you face in your landscape. Then, revenue realization is the most important part. As I said, convert your use cases into revenue, which helps you map a revenue figure to an outage, or to the time and effort saved, for any chaos engineering outcome. Infrastructure, network, and application changes that cause any kind of issue or outage help you realize which are the major risk areas in any application, infrastructure, or organizational landscape. You can dig in and create a lot of chaos internally with small changes that may or may not have been thoroughly tested, and you will be the person who gets the insights out.
Last but most important is ease of use: letting any user across the organization take up your chaos engineering experiments without any pain and run them without disrupting their currently running services. That is the major business impact you can create, and you can map these small impacts to your revenue. Based on multiple runs, usage, and outcomes, you can drive business changes and revenue impacts across the organization. Please post your questions; I would be happy to help. I have been working with presales, sales, and practices to implement this, and have been interacting with a lot of CTOs, CXOs, and DevSecOps people who want to implement security as well as resiliency in their application, infrastructure, or network layers. Thanks for your patience and for listening to me. I hope this session gives you a lot of insight that will really help you and your organization face the challenges of implementing chaos engineering. Thank you so much. Have a good day.

Tejendra Bhandari

Cloud security architect @ Nokia
