Conf42 Chaos Engineering 2022 - Online

When Gremlins Play with Cockroaches: A Chaos Experiment

Abstract

What happens when Gremlins play with resilient cockroaches?

Using Chaos Engineering to improve system resilience, Gremlin’s “Failure as a Service” makes it easy to find weaknesses in a system while CockroachDB is the SQL database for building global, scalable cloud services that survive disasters. While chaos engineering is about collecting, designing, implementing, orchestrating and scaling the faults in systems, CockroachDB is famously known to “survive anywhere”. This talk explores a series of Gremlin experiments to help CockroachDB evolve.

Summary

  • Rain Leander is a developer advocate with Cockroach Labs. Curious about how chaos engineering could be applied to existing projects, Leander decided to try out Gremlin, a tool that gives real-time feedback into distributed systems. The aim was to replicate a published Gremlin experiment on CockroachDB as cheaply and simply as possible.
  • So let's talk about the next steps. I'd like to try those original platforms that I had in mind: Netlify, Vercel, Heroku. And then ultimately, the next step is to present at Conf42 Chaos Engineering and ask for help.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Real-time feedback into the behavior of your distributed systems, and observing changes, exceptions, and errors in real time, allows you to not only experiment with confidence, but respond instantly to get things working again.
Good afternoon, good evening, good night, wherever you are joining us today. Thank you for joining us at Conf42 Chaos Engineering 2022. My name is Rain Leander. I am a developer advocate with Cockroach Labs, and full disclosure, this is a talk about failure, mistakes, lessons learned. I want to start by recognizing the two major reasons why people come to chaos engineering. One is that, like me, you're curious about this chaos engineering thing, and you'd like to apply it to projects that you already have or projects that you're planning to have. And you think, you know what, I'm going to get my hands dirty. I'm going to start playing with the tools that are out there. In this experiment, I played with Gremlin. Yes, there might be a part two to this experiment, and you'll understand why soon enough.
The other reason why people might come to chaos engineering is because some catastrophic event has happened in production, and they realize that they want to set up sufficient checks and balances to prevent that from ever happening again. I will say, thank goodness I have not had that experience directly. I have worked for companies that have had that experience, and I have not enjoyed being the one on pager duty. But in this case, for this experiment, I was the curious student making a mess, learning from things and whatnot. That's where I'm coming from. And I think that's important to acknowledge first, because if you're watching this and you've experienced a catastrophic event, maybe you want to watch this at one and a half speed or more, and really just start playing with your environment and the different tools available.
So let's start with what happened. Initially I had heard of chaos engineering. I work at Cockroach Labs, and I found this Medium post, which I should probably share the link to, which I will do later, basically about hardening CockroachDB. I read through it, and it was about taking Gremlin and, using the AWS platform, installing three instances of Ubuntu within a VPC so you can enable private and public networking. It was using CockroachDB dedicated, the open source version, which is important later, and Gremlin. So I signed up for a Gremlin free account. I played with it a little bit. I noticed a few things, which I'll go into later.
But then I decided, okay, here is my plan. Instead of using the AWS platform, I would kind of like to avoid that platform, because I don't have much experience with AWS. I wanted to keep it simple: maybe install it on virtual systems on my laptop, maybe use a platform that I'm used to working with, like Netlify or Vercel or Heroku, but basically not the AWS platform. Instead of installing CockroachDB on three Ubuntu systems, I wanted to use CockroachDB Serverless, which is a service where you don't have to worry about the actual installation or anything like that. It's a free service you can sign up for. Super easy. I already was skeptical based on playing with Gremlin, but I'll tell you why in a minute. And then I thought, you know what? Yeah, I'm going to use Gremlin, because I did go in, I played around, and I was like, yeah, this could work.
So what actually happened? First of all, CockroachDB Serverless. There are lots of benefits to it. It's made so that a developer like myself can spin up an application, point it to the database, and not worry about it: not worry about sharding or scalability or replication, nothing.
And because of that, it's kind of like when a Linux developer on Ubuntu, who's used to being able to go into the terminal and hack on nodes and whatnot, gets a Mac or a Windows system from the 90s, where that kind of access wasn't really as available. Unfortunately, within Serverless there's not a way for an administrator to go into the database and allow access to Gremlin, for example, because Gremlin requires a specific secret key, a team key, certain identifications, just like any API, so that it can tell what you are observing. So there was that.
So then I switched to CockroachDB dedicated, the open source version that I could download, and I downloaded it as a Docker image onto my Mac, which, by the way, is what I was using this whole time: my laptop, because I wanted to replicate this as cheaply and as simply as possible.
Number two was that I was like, okay, I've got CockroachDB installed in Docker on my system. Can't use Serverless, oh well, I've got CockroachDB installed. So then I go to the Gremlin aspect and I'm like, okay, how do I get Gremlin into my system? And it turns out that Gremlin doesn't like Macs, and I didn't notice that when I was setting up the project. So all of a sudden I now had to use Docker again to install Gremlin, which is fine. It worked like a dream, even had a demo. These docs, by the way, are available on cockroachlabs.com and gremlin.com respectively. It was pretty smooth, except that, again, the reason why I wasn't using AWS was because I wasn't that comfortable with it. And now suddenly I was using Docker, which I'm also not very comfortable with. All of this may have been avoided if I had more Docker experience. If you're watching this right now and you have that Docker experience, maybe it was as simple as combining a YAML file. But I did not know how to take my Docker images with CockroachDB and my Docker images with Gremlin and get them to observe each other or communicate in any way.
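The "combining a YAML file" idea could look roughly like the sketch below. It is not from the talk and has not been run against this setup. It assumes the public `cockroachdb/cockroach` and `gremlin/gremlin` images, Gremlin's `GREMLIN_TEAM_ID` and `GREMLIN_TEAM_SECRET` environment variables, and the capabilities Gremlin's Docker instructions ask for; the service names, the `chaosnet` network, and the helper functions are hypothetical. The script prints JSON, which is also valid YAML, so the output can be saved and used as a Compose file.

```python
import json

# Assumptions: image names, flags, capabilities, and environment variable
# names follow the CockroachDB and Gremlin Docker docs as recalled; check
# them against the current docs before relying on this.
NODES = ["roach1", "roach2", "roach3"]  # hypothetical service names


def cockroach_service(name, nodes):
    """One insecure CockroachDB node that joins the other listed nodes."""
    return {
        "image": "cockroachdb/cockroach",
        "hostname": name,
        "command": ["start", "--insecure", "--join=" + ",".join(nodes)],
        "networks": ["chaosnet"],  # hypothetical shared network
    }


def gremlin_service():
    """Gremlin daemon with the Docker socket mounted so it can see containers."""
    return {
        "image": "gremlin/gremlin",
        "command": ["daemon"],
        "network_mode": "host",
        "pid": "host",
        "cap_add": ["NET_ADMIN", "SYS_BOOT", "SYS_TIME", "KILL"],
        # Bare names pass the values through from the host environment.
        "environment": ["GREMLIN_TEAM_ID", "GREMLIN_TEAM_SECRET"],
        "volumes": ["/var/run/docker.sock:/var/run/docker.sock"],
    }


def build_compose(nodes=NODES):
    """Combine the database nodes and the Gremlin daemon into one definition."""
    services = {name: cockroach_service(name, nodes) for name in nodes}
    services["gremlin"] = gremlin_service()
    return {"services": services, "networks": {"chaosnet": {}}}


if __name__ == "__main__":
    print(json.dumps(build_compose(), indent=2))
```

A cluster started this way would still need a one-time `cockroach init --insecure` against one of the nodes, and whether the Gremlin daemon can actually target containers inside Docker on a Mac is exactly the open question the talk raises.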
The other aspect of that was that I found that with CockroachDB on Docker, one of the nodes was constantly failing. This may have been my version, because the open source version is a little bit older than the latest version. It could have been any number of reasons. Maybe my laptop just can't hang, it doesn't have enough resources on it. But for whatever reason, one node was constantly failing, and it didn't matter if I reinstalled or if I dedicated more resources; it was not there. So I decided to take out the Docker aspect completely and I switched to Kubernetes, specifically minikube. And that resolved the node issue with CockroachDB. I have no idea, to be clear, why it was failing on Docker but not minikube. But again, the docs for the minikube installation of CockroachDB are on cockroachlabs.com.
And then finally, because the talk was coming up, I decided to embrace AWS and see if I could get it to work exactly the way the original writer on Medium had gotten it to work. I went and created three instances on AWS and started to install CockroachDB. It was an incredibly painful 6 hours of my life, and I decided that instead of bashing my head against the things that I don't know, I would package everything up and bring it here to Conf42 Chaos Engineering.
Now here's what I learned. One: if the plan doesn't work, change the plan, not the goal. And two: losers quit; winners fail until they succeed. And those two things come down to: fall down seven times, get back up eight. Don't give up. So while I haven't given up on this project, I do have a deadline. This conference is on March 10, and as of when I recorded this, this is where I am. I also need to find out why that one node kept failing on Docker, and if that is a systemic thing that is maybe resolved in a more recent version of CockroachDB, or if it's something to do with my Docker setup. And then I realized I don't know Docker as well as I thought I did.
I have spun up Docker images before, I have put applications on Docker before, but apparently this was just beyond my knowledge, and so it was humbling.
So let's talk about the next steps. One, I'd like to try those original platforms that I had in mind: Netlify, Vercel, Heroku. I would like to spin up an application, a simple one, it can be just a leaderboard or a website, on one of these platforms that I'm more comfortable with and see if I can get Gremlin to recognize it and test it. This would require that Netlify, Vercel, or Heroku allow that kind of observation and testing. Netlify and Gremlin do have a brand new relationship, but not enough to have documentation yet. So I have hope.
There are other chaos engineering tools besides Gremlin. There's Chaos Toolkit, there's Jepsen, there's Litmus. They're probably at this conference. I'm going to hear about tons of other tools. It may be that those other tools are the answer. If you have experience with other chaos engineering tools that you think you could help me with, I would love to hear from you. Maybe there's something else about Docker. Maybe there's someone watching this who's like, I am a Docker expert and this sounds intriguing, and maybe I do just need to merge a YAML file. That's a possible next step. And then finally, ultimately, the next step is to present at Conf42 Chaos Engineering and ask for help.
I'm Rain Leander on most of the socials. My email address is rain at cockroachlabs.com and I would love to collaborate with you. Let's try these next steps. Or if you know of something, if you work for Gremlin, I would love to hear from you. I hope you have come along with me on this journey. I hope that when you run into issues you don't give up, especially if you're coming from a place of curiosity, of adventurous learning, rather than running from a catastrophic event. I hope you don't give up even though your motivation may not be exactly the same as if you were trying to fix something majorly wrong.
If you have any questions, I'm Rain on Discord. Thank you so much to Conf42, and to the organizers and sponsors of this conference. I will see you online.

Rain Leander

Developer Advocate @ Cockroach Labs
