0:09 Miko Pawlikowski

Hello and welcome to Conf42Cast Episode Three Planet Chaos Native. My name is Miko Pawlikowski, and today with me is our guest, Uma Mukkara, co-founder and CEO at Chaos Native. Uma, it's a real pleasure to have you here today. How are you doing?

0:25 Uma Mukkara

Great, Miko, excited to be here, looking forward to having a great conversation.

0:30 Miko Pawlikowski

Okay, so we have this little tradition that we start all of our guests off with the question about pets. If you could have any animal in the world as a pet, what would you pick?

0:40 Uma Mukkara

I do have one at home. It's a Pomeranian, an Indian breed of dog, a cute one. So a dog is our favorite pet.

0:49 Miko Pawlikowski

Okay, that's a safe bet. I'm not sure what it looks like, do you have it around somewhere? Maybe we can link to a photo. We have a lot of dog lovers and they share, so that's great. One thing that you know, a person notices when they go on your LinkedIn is that you're basically co-founding a third company right now. Does that make you a serial entrepreneur?

1:09 Uma Mukkara

I would say yes in some sense, and not really in another, because my second startup was more of a pivot to open source on my first technology. And I would say Chaos Native is my real second startup, which I actually co-founded as part of the previous company. So I have not had any exit yet. If you're talking about how many exits I've made: not yet, so that doesn't count me as a serial entrepreneur. But in terms of co-founding and running companies: yes, I've had a fantastic experience in the last 10 years, and I'm more confident. So I think I have the general experience that you get as a serial entrepreneur. So I'm feeling pretty good about what we want to do at Chaos Native.

1:57 Miko Pawlikowski

Yeah, that's definitely very impressive. But before we dive into what Chaos Native is, and, you know, your previous companies: what are your most and least favorite things about being a co-founder and CEO right now at Chaos Native?

2:12 Uma Mukkara

I'm pretty excited about the opportunity here. Right now, the favorite thing is the opportunity itself. For your audience in this show: I was a co-founder in the previous company. We built this technology, Litmus, and then I chose to spin it off and focus on Chaos Native. So I'm very excited about the opportunity and what we can build - the next big thing in chaos engineering. So that's the best thing; the market is pretty encouraging. And the least favorite thing, I would say, is that as I take on this new role, I will have less time to spend on the actual technology, Litmus, at the code level, or at the documentation level, or even to spend a lot of time with the community users. So I would say that's a little bit of a sad thing. But, you know, when you take on more responsibilities, you need to spend time on the execution side as well.

3:07 Miko Pawlikowski

Right. Because on the previous one, you were the CTO, so you had a more technical role. And now you're running the show.

3:15 Uma Mukkara

I did run the previous company for a few months, but I reached out to more experienced people for help in running it. And then, you know, watching as a co-founder, I did get enough experience. So this time around, I steer the ship. I think we've got everything where we want it.

3:34 Miko Pawlikowski

Awesome. Yeah, I know a lot of people in the chaos engineering realm and ecosystem are following you closely. So let's talk a bit more about Litmus. Because, you know, obviously I've been following that for a while now. But not everybody's familiar with Litmus. What's Litmus?

3:50 Uma Mukkara

Litmus is a chaos engineering framework written in a cloud native way: a cloud native application that is meant to provide a platform not just to set up experiments, but to do your chaos engineering end to end. In fact, I started Litmus as a way to test OpenEBS, back in 2018. We were one year down the line, and I was looking at how to chaos-test OpenEBS. We had put it in production for our own SaaS platform at that time. So I was pretty intrigued at that time about chaos-testing OpenEBS, and ended up writing Litmus just for that purpose. But we open-sourced it, for fun really, at that time, in 2018, at KubeCon. You want to open-source something, you know, that's useful. There was a great response back then. So it was not really a planned startup, right, I would say, Chaos Native. It just happened. Yeah, it grew very naturally. And then the community was growing faster than what we expected. So since last year, we've been focusing on it a lot more, with more resources. To answer your question, Litmus is really that platform that can scale well for your chaos engineering needs. And it's built with open source at its heart and the community in the driving seat, right, for chaos experiments. We are through a good chunk of our journey. We are now at 2.0 beta and will go GA in a couple of months. But I think, you know, we are basically feature complete for what's required for doing chaos engineering. There's a lot more to go, but that's what Litmus is right now. And it is a CNCF sandbox project. We have applied for incubation, and we'll be moving to that stage, hopefully in the next couple of months.

5:38 Miko Pawlikowski

That's awesome. It always helps to have the official stamp of approval. But I feel like you're kind of underselling Litmus here. Because the feature completeness is great, but it doesn't necessarily help if it's difficult to use. And I think one of the selling points is that Litmus and Chaos Native are supposed to be fairly easy to use, right? You have the hub with, like, ready-made experiments. Is that the right impression? Is that the unique selling point here?

6:07 Uma Mukkara

Yeah, so the unique selling point: we wanted to build for larger requirements, right? So the first step that we took was, let's get the architecture right. Let's make it easy to use, with the Helm charts that we used, but that's not the end of it, right? We wanted to get the chaos operators, the chaos hub, all those basic elements to be available just like that, and then towards Litmus 2.0, we started focusing on how easy it is to actually use. There are many aspects to ease of use. One is how easily you can construct and run the chaos experiment. The other one is how easily you can actually see the result and debug your entire system. And then how easy it is to see your logs, you know, in a complex chaos experiment. So we did focus on ease of use with Litmus 2.0. So right now, starting one chaos experiment and creating and managing a very complex chaos scenario are both very, very easy. So that's definitely another unique thing about it. So: a cloud native application, very easy to get started with, and scale from there.

7:17 Miko Pawlikowski

Okay, so if I was brand new to that, and I wanted to get started, I'm guessing I go to your website. What's the easiest way to get cracking? Do I take one of the experiments from the hub and take it from there?

7:31 Uma Mukkara

The easiest and simplest way is you pick up the Helm chart, and then you install the latest version of Litmus with it. You actually get a portal to help you get started, a simple web service, a UI service that comes up with a basic username and password. And we have thrown in ready-made chaos workflows for you, right there on the web. For example, a simple pod delete, CPU hog, memory hog, all those. And also, we have constructed a couple of end-to-end applications, including chaos as well. For example, Sock Shop and Podtato-head, right? So with these workflows, which use Argo, you just install and click a few buttons. It actually installs your Sock Shop, sets up the monitoring, sets up the chaos experiments, and you can run it and then go and use it. And then you get used to how you do chaos engineering at the basic level. And you can use that template and start changing the experiments to your liking and start tuning them. We also have the capability to add your private hubs, but that's a little bit of a day 2 operation, right? So you can take things from the public ChaosHub on day one, and the private ChaosHub comes when you really want to install it for your real needs.
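For readers who want a feel for what a simple pod delete experiment looks like once it is written down, here is a minimal sketch that builds a Litmus `ChaosEngine` manifest as a Python dictionary and prints it as JSON (which `kubectl apply -f` also accepts). The field names follow the `litmuschaos.io/v1alpha1` ChaosEngine schema as best understood at the time of Litmus 2.0 and should be checked against the current Litmus documentation. The engine name, the `name=carts` label and the `pod-delete-sa` service account are assumptions made for illustration, not values from the conversation.

```python
import json


def pod_delete_engine(app_namespace, app_label, service_account,
                      duration_s=30, interval_s=10):
    """Build a ChaosEngine manifest that runs the pod-delete experiment.

    All argument values are supplied by the caller; nothing here talks to a
    cluster. The manifest only describes which application to target and
    how long to keep deleting pods.
    """
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        "metadata": {
            "name": "pod-delete-demo",  # hypothetical engine name
            "namespace": app_namespace,
        },
        "spec": {
            "engineState": "active",
            # Which application the experiment is aimed at.
            "appinfo": {
                "appns": app_namespace,
                "applabel": app_label,
                "appkind": "deployment",
            },
            # Service account with permission to delete pods (assumed to exist).
            "chaosServiceAccount": service_account,
            "experiments": [
                {
                    "name": "pod-delete",
                    "spec": {
                        "components": {
                            "env": [
                                # Total time the chaos lasts, in seconds.
                                {"name": "TOTAL_CHAOS_DURATION",
                                 "value": str(duration_s)},
                                # Pause between successive pod deletions.
                                {"name": "CHAOS_INTERVAL",
                                 "value": str(interval_s)},
                            ]
                        }
                    },
                }
            ],
        },
    }


manifest = pod_delete_engine("sock-shop", "name=carts", "pod-delete-sa")
print(json.dumps(manifest, indent=2))
```

Tuning an experiment "to your liking", as described above, would then mostly mean changing the target label and the environment values rather than writing new code.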

8:45 Miko Pawlikowski

Yeah, I'm asking primarily because there's still a lot of stigma around doing chaos engineering, and in a lot of people's minds it's still something that's very risky and very unorthodox. And I'm wondering, as a fellow chaos engineering practitioner, where do you think we are right now on, like, the adoption curve? Like you have for a smartphone: you have early adopters, then you have the early majority, and it basically becomes mainstream afterwards. Where do you think we are right now, say, in April 2021?

9:17 Uma Mukkara

We are much further ahead. The adoption curve has been pretty steep in the last three to four quarters, I would say. I would still say we are in early adoption. If you take that Crossing the Chasm graph, we are in the first part, moving to the second. I would think the mainstream adoption of chaos engineering, especially in cloud native, will happen probably in '23. It's primarily driven by Kubernetes itself. So Kubernetes crossed the chasm maybe a few years ago, a couple of years ago, and people are now moving to production with Kubernetes with larger applications. So the challenge obviously comes in: there's a lot of dynamism. With chaos engineering, there are a lot of preachers now. Right, so shows like this, talks like this, conferences: they are bringing it into the limelight. And there's a lot of advocacy happening. All these are going to help adoption get faster. But I still see there is a culture thing that's still in play, right? People are waiting for more success stories, and word of mouth has to happen. And the more we make things easier for adoption, the more things will pick up. There is still reluctance and a bit of inertia, in my opinion.

10:33 Miko Pawlikowski

So do you see like, more and more competitors appear on the market selling, you know, services around that? Which is probably a good thing, right? In the grand scheme of things, it legitimizes the practice as a whole.

10:46 Uma Mukkara

Yeah, no, definitely. We never wanted to be a chaos engineering company by design, right. So I was into data. It seems that there are many opportunities to build some cool things and create a business of your own. And also technology, right. So we see there are, including yesterday, when one more project was accepted into CNCF, three CNCF projects dedicated to chaos engineering. That was not the case three years ago. So, we see many companies coming forward in the next couple of years. And a lot of investment will also happen in the reliability space. Chaos engineering is one of the ways to achieve reliability. I see a lot of things changing in this space. And we will be driving some of those changes.

11:35 Miko Pawlikowski

I'm hopeful for that, too. And, you know, that reminds me of the Chaos Carnival that you organized and it looked like the interest seems to be growing. How was it received?

11:44 Uma Mukkara

It was a fantastic show, in my opinion. Generally, after this pandemic started, there is this virtual webinar fatigue, right. So I was a little bit nervous about it: you know, I am going to help start yet another virtual conference, what will be the probability of success? We started with that mindset, but the fantastic thing about it was the response. The CFPs were pouring in, and we had close to 1100 registrations. So for the amount of marketing that we did, I would say it was all pretty organic. And the speakers were awesome. I'm really thankful for all that great stuff. So to me, it was proof of the adoption of chaos engineering, or of the enthusiasm and curiosity about chaos engineering in the market today, right? So I see more shows happening now, and similar successes are happening. This all points to a need for tools that help you achieve better reliability faster and with more ease of use. Yeah, no, Chaos Carnival was good. And I wanted to do it in a little bit more of an open-source way, right? So the Litmus community was driving the organizing of the show rather than, you know, giving it to more professionals to run. We wanted to let the Litmus team drive it.

13:03 Miko Pawlikowski

Yeah, it's great. Even though we probably should have scheduled these events a bit better, because I think there were like two weeks between this and Conf42 Chaos Engineering. Let's talk a little bit more about the future then. This is where we are right now. Where do you see this going? Chaos engineering in general, let's say in the kind of medium term perspective. Let's say three to five years, you know, what do you expect to happen? And what would you like to happen?

13:29 Uma Mukkara

Chaos engineering, you know, if you want to go a little bit back into the past, right? It was a fantastic effort by the team at Netflix to introduce this at some scale. There was a lot of knowledge that was spread around. But what happened in the last year or two is a slew of tools that make it easy for various needs. Not every tool is the same, and not every platform is the same. People are building for their needs, and some are open source and some are trying to do business with it. But what we have right now is a more focused approach by various people to bring chaos engineering to the next stage. I think we, as a larger team of various people in the market today, have been successful in taking the first step. I think today, if you look at the various tools, it's easy to start with chaos engineering. You can pick things up and then start doing it. Within a day you can run some chaos experiments. So where we are going with this has two angles to it. One is the culture itself. The market and the target users of chaos engineering will start seeing it as something that is a must and helpful in their DevOps ecosystem. That's culture: people are more used to the idea that you have to have a CI pipeline in your system. Similarly, where is the chaos stage or chaos strategy, the chaos first principle, as it is called? So that's a culture that is bound to change, and that will drive the adoption. The other angle is the technology itself, right? People will start using chaos engineering for more complex use cases or more complex scenarios. As they start using it, your first-level bugs are all ruled out in production, and double and triple failures are going to be the drivers of your chaos strategy. And your mean time to failure will increase. But, you know, there is going to be a failure, right? And when such a big failure happens, what's your recovery? So we're going to get into those complex scenarios, and one thing that strikes you at that time is: what are the observability criteria, right? 
So observability is something that's not common. There are various tools that have been born in the last decade. But Prometheus is one of the observability platforms that has been found to be common in modern systems. So Prometheus-based observability has to have an idea of chaos, right? So one of the chaos engineering principles that we want to try within our architecture is what we call open observability. We're not creating new standards for observability. But Prometheus is an open metric format for observability, so what is a chaos metric that can get built into it, right? So open observability is going to evolve, in my opinion. In future, the use case that I would think of at that time is, you know, take a complex system, and automate. Build a complex chaos scenario within a few days, and then put it into use and then observe what's going on, right? Right now, to do that, it takes about six weeks to a quarter for any solution architect to convince people and then get that done. Whereas in three to five years, that may be a matter of one week, right? So that's the change that I'm looking forward to.
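To make the idea of a chaos metric living alongside ordinary Prometheus metrics a little more concrete, here is a minimal sketch that renders one gauge in the Prometheus text exposition format. The metric name `chaos_experiment_running` and its labels are hypothetical, chosen only for illustration; they are not the metric names Litmus actually exports, and the sketch assumes label values that need no escaping.

```python
def render_gauge(name, help_text, samples):
    """Render one gauge in the Prometheus text exposition format.

    samples is a list of (labels, value) pairs, where labels is a dict.
    Label values are assumed to contain no quotes, backslashes or newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        # Sort labels so the output is stable from run to run.
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


text = render_gauge(
    "chaos_experiment_running",  # hypothetical metric name
    "1 while a chaos experiment is injecting faults, else 0.",
    [({"experiment": "pod-delete", "namespace": "sock-shop"}, 1)],
)
print(text)
```

With a series like this scraped next to the application's own metrics, a dashboard could shade the interval where chaos was active and show how latency or error rate behaved during it.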

14:23 Miko Pawlikowski

Yeah, that makes sense. Kind of going back to the culture aspect: I think, you know, one of the things that works best with people is a good story. Can you share a good story of an interesting or funny bug or outage that you experienced, that people can gossip about? Something fairly memorable?

17:10 Uma Mukkara

Sure. I mean, I've experienced many in my own SaaS platform operations. But one of the things that I want to share here is how a simple pod delete experiment can bring out something that's been overlooked in your application. With a pod delete, people think, you know, you go and do a pod delete and Kubernetes spins it back up. So what's the big deal about it? So in this specific application, with one of our users where we were working closely, the application was monolithic, and it was converted to microservices, you know, obviously, using all this Docker, Kubernetes, pods, everything, and its init time was more than what it's supposed to be, right? Basically, it takes more time for the pod to come up and be operational, and it's deployed in a fairly redundant fashion, so you've got more pods. So when you delete a pod, it's really taking time to come back up. Kubernetes did its job, okay, it spun it back up, and there was more load on the other pods. Then what happens when you have more load? Your Horizontal Pod Autoscaler, HPA, kicks in. And before this pod comes up, you see, actually, you know, more nodes have been spun up, right? So, you know, more pods, and then usually more. So just a pod delete resulted in things going haywire, right? So, such things, right? So it goes back to your application guys. And then: how did I not catch it? Because you never tested at scale, at this performance level. And it is only possible to test it in this environment. It's not like a fun story, but a real story. Simple experiments tested in different environments and at different stages can result in different outcomes. And that's what the randomness is. And your Horizontal Pod Autoscaler can also have a bug, right? You expect it to work, and then, you know, if it doesn't work after an upgrade, something else will happen. So science cannot help here, your PhDs and all. You just need to be practical, and then, you know, keep doing it. 
Expect that, you know, things will not work. Your response is what matters, right? So that's a story that I thought is worth sharing here.

19:25 Miko Pawlikowski

That's a good one. And I'm guessing the CFO must have loved, you know, seeing all the new instances. Tell me, you know, obviously Kubernetes is kind of still eating the world, or at least the software world, and it's still on the uptick. What's the next big thing do you think we're going to see afterwards? And when the next big thing arrives, do you think it's going to, like, replace Kubernetes anytime soon? Or is it, like, too ingrained now, with too much invested in it, so that it's here to stay for a while?

19:58 Uma Mukkara

My experience tells me that this itself is the big thing. Kubernetes has not reached saturation, right? We've just started seeing large enterprises get in. And there is virtualization that is being built underneath this Kubernetes thing. I see the next big things happening within Kubernetes itself. What we've seen in the last few years is that the technology came in with Kubernetes. But there are various new areas that started emerging, right: service meshes, GitOps, how you do DevOps, right, and chaos engineering. And we will see new types of cloud services. It's not just the three big clouds that you have that are going to be driving a lot of innovation. There are going to be new services or new cloud platforms that are going to emerge. So with all of these things, the next big thing really is how you do things rather than the technology itself, right? So I see kind of an evolution in the services that are going to be available. This ride will continue for at least the next five years and maybe ten years. There are enough challenges, in my opinion. There's no room for the next super big thing, unless it's a totally, totally different area, like cryptocurrency and other things. But technology-wise, containers are going to be there for quite some time.

Miko Pawlikowski

I like that. So I'm picturing Chaos Native surfing the cloud native wave, but are there any plans to make, you know, Litmus and Chaos Native grow outside of the Kubernetes world too? And kind of, you know, spill into the legacy stuff?

Uma Mukkara

Yes, that's one of the reasons why, you know, all this also happened. We started talking to the enterprise users. So the use cases really are: yeah, we started moving to cloud native, right? Obviously, this is great. But what I need is a story around resilience for my application, which is going to move on to the cloud native space, right? Litmus is a Kubernetes application, cloud native, and it can scale well. It's a platform. And we expect one Litmus for the entire enterprise, taking care of the needs of collaboration between various users and, you know, experiment collaboration, policy collaboration, chaos strategy collaboration, all that stuff. But there have got to be chaos workflows directed towards non-Kubernetes platforms as well. Kubernetes is just one platform; what happens to my legacy platform, where I have all that stuff? So you need to target a strategy of chaos rather than chaos experiments. So in my opinion, chaos experiments are going to become a commodity, right? You won't be able to sell anything, or differentiate yourself, based on the experiments. It's going to take two or three years more. With this kind of hub approach, chaos experiments are going to be readily available. So you need to use them and start writing new chaos experiments only for your legacy stuff. That's where more work is going to be involved. For us? Yes, you know, we have included some new features to start chaos on Kubernetes recently, and we have examples of executing chaos against non-Kubernetes platforms. But we are going to invest more and more as we grow, supporting our enterprise users. It's still Kubernetes that drives, but there is going to be a hybrid environment.

23:23 Miko Pawlikowski

I'm definitely going to follow what you guys do. It's really awesome. Sorry, we're running out of time. So I just want you to spill some beans and share some golden nuggets of wisdom with our viewers towards the end. Two questions. One is: if you were to pick, like, one single thing that you did that provided the highest return on investment for your career, and it could be any kind of investment, you know, a technology, a course that you took, or whatever it is, what would it be? And why would you recommend doing that?

23:58 Uma Mukkara

I was not an open source guy four years ago, right? Four and a half years ago, right? So I pivoted into open source with a lot of advice coming from the investor community primarily, and also from some big CAOs, right? So the big decision was, let's start doing things in open source. So we had to leave everything that we had built for five years and then start fresh, right, in open source. And we were early in that space; we took on Kubernetes. The big decision to move to open source has really led to great results. So I would say that was it. I think open source is going to drive many things in future as well. Moving to a commercial open source software approach for building businesses has been one of my greatest investments in terms of decisions.

24:54 Miko Pawlikowski

I love that. I couldn't agree more. There's just so much sense in that. And the last one then. Imagine that there's a kid, somewhere at school, that sees you doing that and thinks, "okay, this chaos engineering and open source stuff is really cool. I want to be doing what you're doing when I grow up". What would you tell them? What would be your advice?

25:13 Uma Mukkara

Chaos engineering will not be famous at that time, when you grow up, right? So one thing that I have observed or learned is that there are opportunities that keep coming, right? Technology keeps changing. The next big thing is a matter of either three years or five years, right? It keeps coming. But it's the set of people that you work with that drives this big change, right? They could be anywhere. Most of the time, they're not within your team. Maybe you have worked with them in the past, or talked to them on a plane, wherever, right? So I think if I had to go back 20 years, I would have spent more time building deeper relationships with people. I would have created more opportunities now, if I had built more relationships in the past. I would say focus on the human aspects of it. Be true to yourself, you know, empathize and appreciate what others are doing. And that way you actually create opportunities for yourself. And when the next big thing comes in, the journey will be much easier because you've got a lot of friends out there who are open to giving feedback, right? But failing early is a good thing in any journey, so that you can get back up and then move fast.

26:26 Miko Pawlikowski

And if the pandemic taught us anything, it is that it's worth spending time and effort on the human aspect. And it's not that obvious when all of a sudden you can't go and sit with people in person. Uma, this has been a real blast. Thank you so much for coming. And thanks for being part of the Conf42 experience. Thank you for your time.

26:47 Uma Mukkara

Thank you Miko. It's been a pleasure being here.

26:50 Miko Pawlikowski

Cheers.
