Conf42 Cloud Native 2021 - Online

Building the Stonehenge using Gall’s law

Gall’s Law states that all complex systems that work evolved from simpler systems that also worked. Let me show you two examples of how to start small and simple and grow in complexity while improving your software application, based on the background of two different companies, with two different stacks and team formations.

Sharing how we identified the hints for improvement, the techniques to support it, and the tools that aided us along this path. Both cases go from tiny experiments to millions of transactions processed per hour.


  • Fabricio Buzeto will talk about building the Stonehenge using Gall’s law. He's been working with startups since 2011 and this has been his love since then. What he'll share today is a bit of his past experience and what he's learning about building these types of products.
  • Gall’s law states that a complex system that works is invariably found to have evolved from a simple system that also worked. Gall’s law also states that this simple system may or may not work. Knowing what works or not is key when you're selecting that simple system.
  • Software that works is software that fulfills its purpose. Gall’s law states that a complex system designed from scratch never works. That's why we should start over with a working simple system. Monoliths are simple to develop, simple to test and simple to deploy.
  • The Stonehenge is a distributed systems strategy just like the others. It's a simple, self-sufficient, context-focused, service-enabled application. What I'm trying to avoid here is not complexity per se, but dead code. The simplest solution is usually the best.


This transcript was autogenerated. To make changes, submit a PR.
Hi, I'm going to talk today about building the Stonehenge using Gall's law. My name is Fabricio Buzeto and I've been coding since 2002. I've worked my fair share in big companies and in small ones as well. I also did my time in academia and earned my PhD; that's why you're going to see a lot of references in my slides. I've been working with startups since 2011 and this has been my love since then. What I'm going to share today is a bit of my past experience and what I've been learning about building these types of products in this startup environment. So let's start by talking about what Gall's law is. Gall's law states that a complex system that works is invariably found to have evolved from a simple system that also worked. And a key word here is the word "evolved". How do we perceive a complex system evolving? Usually when we think about this, we think about adding features: starting with something that works, and just adding features and more features while it keeps working. But usually a complex system evolves from a very initial state that doesn't work, or that we don't know works. And that's where the second and important part of Gall's law comes in: what is this simple system? When I think about Gall's law, the first and best example I can find is the Game of Life, from John Conway. If you are not familiar with this classic example from computer science: you have just three simple rules, and with three simple rules you can create very complex behaviors. When you look at the results and play around with Conway's Game of Life, you learn that most of the initial states you try are not stable. They die out and you have nothing left. But if you work at it and try things out, you can find very complex and beautiful results. Gall's law also states that this simple system may or may not work.
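The "three simple rules" the speaker mentions can be sketched in a few lines of Python. This is a minimal, self-contained version of Conway's Game of Life, storing the world as a set of live-cell coordinates:

```python
# Conway's Game of Life, minimal sketch. The rules: a live cell with two
# or three live neighbours survives; a dead cell with exactly three live
# neighbours comes alive; every other cell is dead next generation.

def neighbours(cell):
    """The eight cells surrounding a given (x, y) cell."""
    x, y = cell
    return {(x + dx, y + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)}

def step(live):
    """Advance the world (a set of live cells) by one generation."""
    candidates = live | {n for c in live for n in neighbours(c)}
    return {c for c in candidates
            if (c in live and len(neighbours(c) & live) in (2, 3))
            or (c not in live and len(neighbours(c) & live) == 3)}

# A "blinker": three cells in a row oscillate between horizontal
# and vertical, one of the simplest stable (periodic) patterns.
blinker = {(0, 1), (1, 1), (2, 1)}
```

As the talk notes, most random starting sets die out after a few calls to `step`; patterns like the blinker above are the "simple systems that work".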
And that's something we should be familiar with, because knowing what works or not is key when you're selecting that simple system. Knowing what works starts with thinking about it. And when we talk about software, software that works is software that fulfills its purpose. And the purposes of software are many. Software can exist to improve sharing, or to prove something, or to help explore some context, some problem. Software can exist to help with politics within an organization: somebody trying to get a promotion, or trying to prove that something they believe is true is happening. But usually most software is focused on business, on helping the business advance or go forward. So I'm going to stick with that. Also, every piece of software has a client. This client can be the team that is using that software to help them with some task, it can be the company itself, it can be a sponsor that's paying for that software no matter what, but usually it is a user, somebody really interested in having that software solve their problem. Gall's law also states that a complex system designed from scratch never works and cannot be patched up to make it work. If that happens, you have to start over with a working simple system. That's why we should start simple. And I'll start with some examples of my own. I'll start with Coconut. Coconut was a startup I worked at back in 2012, where we did social TV analytics. Basically, what we did was watch the Twitter stream about TV shows and generate intelligence based on that data. When I joined the company, our infrastructure was very simple: we plugged into the Twitter API, processed the data using Django, and fed it into a MySQL database. It worked well. We had just a few shows, but we stumbled on our first barrier, which was the limits of the Twitter API. So we had to switch to a Twitter data reseller, which was DataSift.
It allowed us to access the whole Twitter conversation about a TV show. And since we had just a few shows, it also started small: our average was just 10k tweets per day, and our peaks were 5k. When we had these peaks, our application would just freeze, and that was not nice. I knew our infrastructure was not the best; it was the simplest we could do for our purpose. But it was also not viable for us to just start rewriting everything from scratch. So what we said was: let's just try to scale this. And we did. We scaled our Django application first, and this helped us increase our volume. Then our database started to get in the way, but we managed to increase again, doubling our capacity, until we knew we had reached our limit. It was impossible for us to grow any further without ending up with a very complex and hard-to-maintain database and code base. So we decided to go for a final infrastructure. We studied, we chose Apache Storm back then, we did some experiments, and we started small. We migrated one simple metric from our Django application to Apache Storm, and over six months we migrated one metric at a time to this new application, while the old application was maintained and kept working until the end. We managed to migrate everything, and we reached more than 1 million tweets per day. I remember our peak was 50k tweets per minute, and we never had any issues with load anymore. This got me thinking that this kind of approach, where I stretch the application and the architecture as much as I can until it hurts, was the best approach for this kind of situation. And this is how I started when I joined bxblue. I was with the company from the beginning, and our first MVP was just an Unbounce landing page pointing to a Google Doc. Of course, it wouldn't last for long.
When we started having too much load to handle by ourselves, I started building a Rails application that helped us handle the requests from our clients. This Rails application eventually replaced our Google spreadsheets, acting as an ERP to handle our clients' pipeline. This infrastructure grew, and eventually we had to have our own database to handle our clients' requests, and we chose MongoDB. So at each step the architecture changed, but not very much, just a little bit at a time. And the purpose of each step was to answer a question: first, whether the main purpose of the company could be fulfilled; then whether we could sell anything; whether we could do it faster; and finally whether we could do it properly, so we could have a bigger team handling these requests. What these two applications have in common is that both of them are monoliths. Why monoliths? The main reason: because they are simple. Monoliths are simple to develop, simple to test, simple to deploy and simple to use. They can be simple to scale as well. But usually when you think about monoliths, you think about the drawbacks. And the main drawbacks of a monolith are that they are hard to scale: hard to scale their tests; hard to scale the team, to avoid having many people working on the same thing; hard to scale the deploy, so you can deploy faster when you have a very large code base; hard to scale the stack, so you can have new technologies living alongside legacy ones; and hard to scale changes, when you have lots of changes happening at the same time. So let's talk about how bxblue handled this type of scale. Our Rails application was very simple, but it also relied on a lot of external services; we had more than 15 of them plugged into our application, helping us do our job. Over time, what we did was just go async: asynchronous communication was handled using Sidekiq for our jobs and our job queues. This allowed us to scale at a very fast pace.
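Sidekiq itself is a Ruby library, but the "go async" pattern the speaker describes is language-agnostic: the request path only enqueues a job, and a background worker drains the queue. A minimal sketch of that idea in Python (all names here are hypothetical, not Sidekiq's API):

```python
import queue
import threading

# Hypothetical sketch of the "go async" pattern (the role Sidekiq plays
# in a Rails app): requests enqueue cheap job descriptions; a background
# worker thread pops them off and does the slow work.

jobs = queue.Queue()
results = []

def enqueue(job_name, payload):
    """Called from the request path: cheap and non-blocking."""
    jobs.put((job_name, payload))

def worker():
    """Background worker: pops jobs and runs them one by one."""
    while True:
        job_name, payload = jobs.get()
        if job_name is None:          # sentinel: shut the worker down
            break
        results.append(f"{job_name} processed {payload}")

t = threading.Thread(target=worker)
t.start()
enqueue("NotifyClient", {"client_id": 42})
enqueue(None, None)  # stop the worker after the queue drains
t.join()
```

In a real deployment the in-memory `queue.Queue` would be replaced by durable storage (Sidekiq uses Redis), so jobs survive restarts and many worker processes can share the load.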
And what helped us was not this architecture per se, but how we did our development. The first thing we did was always automate our tools, because you shouldn't have to trust yourself. You have to trust your code, and trust that on every change you make, things are still working. So automate your tests, so that every time you deploy something, you know your code is still working. Automate your code quality checks, so that code reviewers are not checking the same things again and again, where they could miss something. Automate your deploys, so people don't have to think through their checklists over and over again. And automate your monitoring, so you know that if something goes wrong, you'll be notified. When you're done automating, what should you worry about? In your tests: your unit tests; the things your tests have in common, so you don't have to rewrite them every time; and well-maintained integration tests, not only internal but external as well, so that when some external tool changes, you know whether it broke something. And speed, because if your tests take too much time to run, you'll avoid running them. In your code quality: coverage control, so you know where your blind spots are; your linter and code quality checks, so your team isn't manually checking something a machine can do for them; and your security, so you know if you introduce something bad into your code. In your deployment: a very good, stable CI/CD pipeline, source control, and a cloud pipeline that controls all your servers. And finally, monitoring: monitor your errors, your servers, your logs, and your user journey. In our case, we have a Rails application, so we use most of the Rails ecosystem for that.
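As a tiny illustration of "trust the machine, not yourself": a pure function plus a test that runs automatically on every change, so a bad deploy is blocked by the pipeline instead of by a human checklist. The function and its name are hypothetical, loosely inspired by the loan domain bxblue works in:

```python
# Hypothetical example of an automated unit test: the CI pipeline runs
# this on every change, so nobody has to remember to check it by hand.

def monthly_installment(principal, monthly_rate, months):
    """Fixed-rate installment (standard annuity formula)."""
    if monthly_rate == 0:
        return principal / months
    factor = (1 + monthly_rate) ** months
    return principal * monthly_rate * factor / (factor - 1)

def test_monthly_installment():
    # zero interest: just split the principal evenly
    assert monthly_installment(1200, 0, 12) == 100
    # with interest, each installment must exceed the zero-interest one
    assert monthly_installment(1200, 0.02, 12) > 100

test_monthly_installment()
```

The point is not the formula but the habit: every behavior you care about gets a check a machine can run, which is what makes the monolith safe to change quickly.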
But we're a monolith. And unlike Coconut, our problem was not scaling the application to handle users, but scaling our application to handle its contexts. As our monolith grew, we kept adding more and more contexts to it, and this started to slow down our development. So that's where we spotted the hints for improvement. Before I talk about what we did, let me talk about the options we had if we wanted to avoid the monolith. Basically, what we have here is a distributed system, and a distributed system is mainly something that runs on multiple servers, with applications that manage some kind of data. We have many architectures available besides the monolith. We have microservices, which are simple, self-contained, loosely coupled, single-focused services connected to each other. We have the proposal of the Citadel, by DHH: a large self-contained monolith supported by small, single-focused, problem-specific services that handle what the monolith cannot handle by itself. We also have macroservices, kind of hungrier microservices that people at Uber started experimenting with: they are simple, self-contained and context-focused as well, but they are multipurpose services that try to engulf more context than a microservice does. And what did we do? So we had our architecture as I presented before; just for simplicity, I'll consider my application as this small box with Sidekiq and MongoDB, this square. When we had to add a new context, we decided to create a new application. So instead of building this new context into the same monolith we had before, we extracted it and built its own application to handle it. And it did well. The development time was great.
Integration with the legacy monolith was easy, and we started adding more services and more external applications to it. Then we did it again: a new context appeared, and we built a new application for that context, with more services aggregated to it. And it went so well that now we have more than six applications in our park. And that's what we call the Stonehenge. The Stonehenge is a distributed systems strategy just like the others; we see it as one step further than microservices. It's a set of simple, self-sufficient, context-focused, service-enabled applications. The difference here is that we don't think about services only; we are thinking about applications. They are self-sufficient because they work by themselves: they don't need the other applications to do their job. They are context-focused, which means they can handle their context very, very well. And they are service-enabled, which means that other applications can integrate with them. So they can scale, but they don't need to; they can work by themselves. And by the law of conservation of complexity, complexity has to go somewhere. So no matter which of these architectures I choose, every application has an inherent amount of complexity that cannot be removed or hidden. We chose the Stonehenge because it was the best way for us, but the complexity is still there, mingled in the applications and in the way the whole park is connected, just like with microservices or macroservices. What I'm trying to avoid here is not complexity per se, but dead code. When you look at the statistics, somewhere between 5% and 30% of what we code is never used. And when you think about startups, 70% of startups will fail, so the code they build will never be used again. What I'm trying to do is make sure that the code I'm building is the best for its purpose, that it helps the companies where it lives move forward, and that it is actually used.
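The Stonehenge properties the speaker lists, self-sufficient, context-focused, service-enabled, can be sketched as two toy applications: each owns its data and its context, one exposes an API the other MAY use but does not depend on. Everything here (class names, the loan/notification domains) is a hypothetical illustration, not bxblue's actual code:

```python
# Hypothetical sketch of the "Stonehenge" idea: each application is
# self-sufficient (owns its data, works alone), context-focused (one
# business context each), and service-enabled (offers an API others
# MAY call, but no application requires another to function).

class LoanRequests:
    """Context: client loan requests. Owns its own storage."""
    def __init__(self):
        self._db = {}                 # stands in for this app's own database
    def create(self, client, amount):
        self._db[client] = amount
    def api_get(self, client):
        """Service-enabled: a read API other applications can integrate with."""
        return self._db.get(client)

class Notifications:
    """Context: notifying clients. Works even with no LoanRequests around."""
    def __init__(self, loan_api=None):
        self.loan_api = loan_api      # optional integration, not a dependency
        self.sent = []
    def notify(self, client):
        amount = self.loan_api(client) if self.loan_api else None
        detail = f" about R${amount}" if amount is not None else ""
        self.sent.append(f"notified {client}{detail}")

loans = LoanRequests()
loans.create("alice", 5000)

# Integrated: the notification is enriched via the loans API.
integrated = Notifications(loan_api=loans.api_get)
integrated.notify("alice")

# Standalone: the same application still does its job on its own.
standalone = Notifications()
standalone.notify("bob")
```

The design choice this illustrates is the "can scale, but don't need to" property: because the integration is optional, each application can be deployed, tested, and kept running independently of the rest of the park.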
So, to sum up, if you take one thing from this conversation, it's that Gall's law works. A simple system may or may not work, but a complex system designed from scratch never works and cannot be patched up to make it work. A complex system that works is invariably found to have evolved from a simple system that worked. So if you are facing a complex problem, start small, start simple; that's the way to go. And don't trust yourself, trust the machine. Automate your tools: your tests, your code quality, your deploy, your monitoring. This is the best way to ensure that every time you change something, things stay stable, and that when things do break, you know they are broken and can fix them. So build a very good automated tool set to help you do that. And lastly, why not try the Stonehenge? The simplest solution is usually the best. In the end, everything is just distributed systems; decision making is hard and things will change, so this is another option in your tool set. Thanks, and I'm happy to hear your opinions on what I've shown here. If you have any comments, here on the video or on Twitter, just reach out to me.

Fabricio Buzeto

CTO @ bxblue

