Conf42 Site Reliability Engineering 2021 - Online

GitOps: yea or nay?

Abstract

GitOps is a paradigm, or a set of practices, that empowers developers to perform tasks which typically fall (only) under the purview of operations. It’s a way to do Kubernetes cluster management and application delivery by using Git as a single source of truth for declarative infrastructure and applications. With Git at the center of delivery pipelines, engineers use familiar tools to make pull requests that accelerate and simplify both application deployments and operations tasks on Kubernetes.

GitOps software agents (e.g. ArgoCD, Flux and Jenkins X) can alert on any divergence between Git and what’s running in a cluster, and if there’s a difference, Kubernetes reconcilers automatically update or roll back the cluster, depending on the case.

This talk will include a demo of ArgoCD/Flux/Jenkins X on how to configure and use it to accelerate and simplify application deployments.

Summary

  • We're going to see GitOps implementations in terms of software agents focused on delivering applications to Kubernetes. We'll see a demo of one of those agents in practice.
  • A Kubernetes cluster suddenly vanished. What is GitOps and how will it help us in these kinds of situations? The idea for GitOps is that the entire system is described declaratively. This means that the configuration is guaranteed by a set of facts instead of a set of instructions.
  • Flux is a tool for keeping Kubernetes clusters in sync with sources of configuration. Argo CD is a declarative GitOps continuous delivery tool. Jenkins X is the next evolution of Jenkins, built with cloud in mind. These are just three tools that we can use to quickly get up to speed on a GitOps automation workflow.
  • So here we're going to bootstrap Flux. Flux is connecting to GitLab, it has already cloned the repository, and it's installing components in the Flux namespace. We can see that it's already trying to apply the frontend application. Just by pointing Flux at a particular cluster, we actually deployed namespaces inside a Kubernetes cluster.
  • A question that usually arises is about secrets. Another question is GitOps versus infrastructure as code. Infrastructure as code is usually used to manage only infrastructure; it doesn't manage the whole cloud native stack. I hope this talk was informative for all of you.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Are you an SRE, a developer, a quality engineer who wants to tackle the challenge of improving reliability in your DevOps? You can enable your developers for reliability with ChaosNative. Create your free account at ChaosNative Litmus Cloud.
Hello, welcome to Conf42 Site Reliability Engineering edition. My name is Ricardo Castro and I'm a senior site reliability engineer at Farfetch. And today we're going to talk a little bit about GitOps. So what we'll be covering today: we're going to cover what GitOps is and some of the things that GitOps can do for us. We're going to see GitOps implementations in terms of software agents focused on delivery of applications to Kubernetes. We're going to see a demo of one of those agents in practice, and if we have some time we're going to see some extras.
So let us start with a story. It is a beautiful day. I decide to make a change on our production environment, so it's a little bit tricky. So as usual I submit my pull request and my colleague Joe decides to take a look and see if everything is supposed to work okay. He looks at it attentively, he looks around, seems fine. So everything's good, right? We press enter, some pipeline triggers, and all hell breaks loose. So we had a Kubernetes cluster that suddenly disappeared. So at this point this is us, right? We had a production cluster that suddenly vanished. We had applications running there. We have clients that are already complaining that they can't access our services. So we're in the midst of a big confusion.
Of course this story has a second part. Thankfully we have everything in Terraform, so we just need to relaunch the cluster and we're back to square one, right? So cool, we relaunched the cluster. Now comes the part that we need to figure out: we somehow need to deploy our applications and configure everything in there. But because we have a pipeline for mostly everything, we just need to trigger those pipelines.
Long story short, we started uncovering some things that weren't supposed to be like that. We have a pipeline that was deactivated, and we have no idea why, so we need to go into talks with a particular team to find out why that was. We also have another pipeline that said it was successful, but the application actually doesn't work. We don't have any idea why, so we need to talk to a specific team. Someone suddenly remembers that there were some manual changes that were done there that were needed for the application to actually work. So yeah, long story short, the idea is that everything wasn't fine. We had infrastructure as code, but there were some pieces here that actually didn't fit our model.
So what is GitOps and how will it help us in these kinds of situations? The idea for GitOps is that the entire system is described declaratively. What does this mean? Kubernetes is just one example of many modern cloud native tools that are declarative and treat everything as code. Declarative means that the configuration is guaranteed by a set of facts instead of a set of instructions. So this means that instead of me saying "launch this server, put this package here, start this service", I just declare state. And because we have everything declared in Git, we have a single source of truth. So our apps can easily be deployed and rolled back to and from a Kubernetes cluster. And even more importantly, when we have a disaster like the one that we just described, our cluster infrastructure can also be independently deployed and quickly reproduced.
Another advantage of GitOps is that the canonical desired state of the system is versioned in Git. With the declaration of our system stored in a version control system and serving as our canonical source of truth, we have a single source from which everything is derived and driven. This trivializes rollbacks.
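The difference between a set of facts and a set of instructions can be sketched in a few lines of Python. This is only an illustrative toy model, not how Kubernetes or any GitOps agent is implemented: the "system" is assumed to be a plain dictionary of service names to replica counts, and the function names and the `history` list (standing in for Git commits) are hypothetical.

```python
# Hypothetical toy model: the "system" is a dict mapping service name -> replica count.


def run_instruction(state, service):
    """Imperative: 'start one more replica'. The outcome depends on where we start."""
    new_state = dict(state)
    new_state[service] = new_state.get(service, 0) + 1
    return new_state


def apply_declared(state, desired):
    """Declarative: 'this is what should exist'. The outcome ignores where we start."""
    return dict(desired)


# Each entry stands in for the desired state recorded by one Git commit.
history = [
    {"frontend": 1, "backend": 1},  # first commit
    {"frontend": 3, "backend": 1},  # second commit scales the frontend
]

empty_cluster = {}
drifted_cluster = {"frontend": 7, "debug": 1}

# Applying the latest declared state gives the same result from any starting point.
from_empty = apply_declared(empty_cluster, history[-1])
from_drifted = apply_declared(drifted_cluster, history[-1])

# Repeating an instruction keeps changing the system.
once = run_instruction(empty_cluster, "frontend")
twice = run_instruction(once, "frontend")

# A rollback is just applying the state recorded by an earlier commit.
rolled_back = apply_declared(drifted_cluster, history[0])
```

In this model a rollback needs no special logic: it is the same apply step pointed at an older entry, which is the sense in which a versioned, declared state makes rollbacks trivial.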
We can use git revert to go back to a previous application state, and because of Git's excellent security guarantees, we can also use SSH keys to sign commits and enforce strong security guarantees about the authorship and provenance of our code.
Also, approved changes can be automatically applied to the system. So once we have a state declared and kept in Git, the next step is to allow those changes to be automatically applied to the system. What's significant about this is that we don't need cluster credentials to make a change. With GitOps, we can have specific agents that look at and interpret a desired state and know what to do in that particular system. And those are the software agents that will ensure correctness and alert on divergence. Once the state of our system is kept under version control, those software agents can inform us whenever reality doesn't match what is described. Those agents can alert us on something like Slack and then go the extra mile of actually reconciling the state.
So on a day-to-day basis, what can GitOps do for us? It can increase productivity. Continuous deployment automation with an integrated feedback control loop speeds up time to deployment. This means that our teams can ship a lot faster, with more and more changes, and that increases our overall output several times over. Also, we have an enhanced developer experience: we push code, not containers. Developers can use familiar tools like Git to manage updates and features to Kubernetes clusters more rapidly and without having to know the internals of Kubernetes, so newly onboarded developers can get up to speed and be productive within days instead of months.
We also gain improved stability. When we use Git workflows to manage our clusters, we automatically gain a convenient audit log of all the changes that were applied to a Kubernetes cluster.
An audit trail of who did what and when to a cluster can be used to meet compliance requirements. We also gain higher reliability. With Git's capability to revert, roll back or even fork, we gain stable and reproducible rollbacks. Because our entire system is versioned in Git, we have a single source of truth from which to recover after a meltdown, and that reduces our mean time to recovery.
We also gain consistency and standardization. Because GitOps provides one model for making infrastructure, app and Kubernetes add-on changes, we have a consistent end-to-end workflow for the entire organization. Not only are our continuous integration and continuous deployment pipelines all driven by pull requests, but our operations tasks are also fully reproducible through Git.
We also gain strong security guarantees. Git's strong correctness and security guarantees, backed by the strong cryptography used to track and manage changes, as well as the ability to sign those changes to prove authorship and origin, are key to a secure definition of the desired state of the cluster. And of course, with all of this we gain easier compliance and auditing. Since changes are tracked and logged in a secure manner, compliance and auditing are made trivial. We can use tools like kubediff, terradiff or ansiblediff to alert that something isn't actually in the state that is described in Git. And then agents can go the extra step and actually apply those changes.
So now we are going to focus a little bit on software agents for delivery to Kubernetes clusters. What do those agents need to have to actually work? They need to be declarative, so they need to actually find a Git repository and understand some syntax that is there that describes state. They also need to be automated, so whenever they see a change on a Kubernetes cluster, they can alert on the change.
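The alert-on-divergence behaviour described above, with an optional apply step, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the logic of Flux, Argo CD or any diff tool: desired and actual state are assumed to be plain dictionaries of resource name to version, and `diff_state`, `reconcile`, `alert` and `auto_apply` are hypothetical names chosen for illustration.

```python
def diff_state(desired, actual):
    """Return the actions needed to make `actual` match `desired`."""
    actions = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != desired[name]:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def reconcile(desired, actual, alert, auto_apply=False):
    """Alert on any divergence; only change the cluster if auto_apply is set."""
    actions = diff_state(desired, actual)
    if actions:
        alert(f"{len(actions)} divergence(s) from Git: {actions}")
    return dict(desired) if auto_apply else dict(actual)


alerts = []  # stands in for a chat channel or similar notification target
desired = {"frontend": "v2", "backend": "v1"}      # what Git says
actual = {"frontend": "v1", "debug-pod": "v0"}     # what is running

after_alert_only = reconcile(desired, actual, alerts.append)
after_apply = reconcile(desired, actual, alerts.append, auto_apply=True)
```

Keeping the diff separate from the apply step mirrors the two levels mentioned in the talk: an agent can first be run in alert-only mode and later be allowed to reconcile automatically.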
But we also need a way, if we want, to make them apply those changes automatically. They also need to be auditable, so we need a way to actually track what changes were applied to Kubernetes and then trace that back to the changes that were made in Git. For our particular use case, they need to be designed for Kubernetes, because that's what we are going to focus on. Also, they need to have out-of-the-box integrations. We don't want to be reinventing the wheel every time we want to do something. But of course we all have specific use cases on our infrastructure, so those tools need to provide a way for us to actually add custom things particular to our infrastructure.
So one of the first GitOps continuous deployment tools is Flux. Flux was developed by the company that coined the term GitOps, which is Weaveworks. Flux is a tool for keeping Kubernetes clusters in sync with sources of configuration, like Git for example, and for automatically updating configuration when there is new code to deploy. So basically Flux will be looking at one or more Git repositories. It has a particular syntax that it knows how to interpret, and then it knows how to interact with the Kubernetes API to make deployments. Flux knows how to interact with both Kustomize and Helm, so we can use our usual tools to actually manage that kind of workflow.
Another tool that is gaining a lot of traction recently is Argo CD, and Argo CD is a declarative GitOps continuous delivery tool for Kubernetes. At a very high level it works similarly to the way that Flux works. It knows how to look at one or more Git repositories and then knows how to interact with the Kubernetes cluster to actually make the changes in a Git repository a reality. It also has support for both Helm and Kustomize, and it has a lot of other features that Flux doesn't have: it has a nice UI and comes with a lot more integrations out of the box.
And last but not least, we have Jenkins. Jenkins has had a bad rep for several years, and Jenkins X is the next evolution of Jenkins, which was built with cloud in mind and with a GitOps approach. It's essentially pipeline automation with built-in GitOps and preview environments to help teams collaborate and accelerate their software delivery at any scale. So again, at a very high level, it works very similarly to the other two tools that we've just seen. It knows how to look at a Git repository, knows how to interpret those changes and then knows how to apply those changes to a Kubernetes cluster. So at a very high level, these are just three tools that we can use to quickly get up to speed on a GitOps automation workflow to deliver applications to a Kubernetes cluster.
So how would a GitOps pipeline work? Imagine we have a sample application. We make some changes, we open a PR, someone reviews that code and merges the request. Some pipeline will see that change, produce a container image and put it in a container registry. Then some tool, which could be running inside the Kubernetes cluster or outside, will automatically make a commit to a Git repository with that particular change, saying that there is a new version to be deployed. And then some agent that runs inside the Kubernetes cluster will eventually pick up that change and know: okay, so now I have a new application to deploy. I know what to do. The version is X. So I'm going to deploy this application automatically.
So let's see some of this in practice with a demo. We're going to be using a Git repository that has a sample application. We're going to do a quick overview of what's inside and what this tool will be using. We're going to use Flux to deploy an application called podinfo. It is an application that is widely available on the Internet. That application has Kustomize files to actually deploy applications and we're going to make use of that.
So inside Flux there's one component called the Kustomize controller that we are going to use to actually look at these Kustomizations and know what to do with them. So let's start by creating a cluster that is going to be used to deploy our service. This just creates a Kubernetes cluster that's going to run locally, and it is using [inaudible] to do the deployment of that cluster. It's just going to take a few seconds. So it should be running. Now we're just going to get the kubeconfig. If we do kubectl get nodes, we should have a Kubernetes cluster running. Okay, so we're almost there. In a few minutes we should have everything up and running. Next we're going to export a few variables just to have access to our cluster. I have done that already.
So we're going to start and see all of this in action. This first command will deploy the Flux agent to the cluster. It will take those variables so that the Flux agent can actually find the code and know what to do. We're telling Flux to look at the master branch and at a folder called staging cluster, and that's it.
So before we deploy that, let's see what exists inside this staging cluster folder. Inside this folder we have a few files. This folder here is just something that Flux uses to store some state if it requires. So let's start by seeing what this web app source is. Inside web app source we are declaring something that is called a GitRepository. And we are pointing to a Git repository on GitHub, which is the application that we've just seen and that we are going to deploy. So we're just telling Flux that there is a Git repository and that it is at this location. Next we have something called web app common, so it makes sense that we look at this first. Here we're declaring that we have a Kustomization so that Flux can look at it. And we're saying that it will use that Git repository.
And inside that Git repository it should look at deploy web app common. So let's go to that repository and see what we have. Deploy, web app, common. If we come here we start to see that we have normal Kubernetes files. Here we have the declaration of a namespace, here we have the declaration of a service account and so forth.
If we go back and look at the other files, we see that we have something here called backend. Inside the backend we once again have a Kustomization. We're again pointing at this repository, but now we're looking at a different folder. One curious thing to look at here is that we can specify a dependsOn. So we're saying that we just want to deploy this Kustomization once web app common has actually finished. If we go here inside web app and go into backend, we see that we have regular Kubernetes manifests. If we look here we have a deployment; it's going to deploy a specific application. We have here the container image that we want and a bunch of other settings.
And as expected, we have here a web app frontend, and that web app frontend is exactly the same thing. It's looking at the specific Git repository, and it is looking at the frontend folder. And for that frontend folder we have something called dependsOn, and this one depends on the backend. Again, just very quickly, we look at frontend and again it's the same thing: a deployment, a horizontal pod autoscaler and a service.
So let's see all of this in practice. Here we're going to bootstrap Flux. Here we go. So Flux is connecting to GitLab, it has already cloned the repository, and it's installing components in the Flux namespace. Because we didn't say which components we want, it's basically going to install all of them. So it's going to install the source controller, which is a component that actually pulls the Git repository and sees if there are changes.
And then we have other components like the Helm controller, the Kustomize controller and the notification controller. The only one that is going to be used is the Kustomize controller, because it's the one that understands what is inside that Git repository. So we're going to watch for Kustomizations, we're going to watch the logs for that particular component and we're going to see applications just showing up.
Here we can see that at this point we don't have anything yet, but Flux has already recognized that it needs to deploy something called web app common, web app backend and web app frontend. And as we see, it is already done with web app common. If we remember, web app common was creating a namespace and creating a service account. Next it moves on to the next component, which is web app backend. We can see here that the reconciliation is in progress, and here we can see in the log that it is actually applying. Eventually it finishes, the backend is actually deployed, and then in a few seconds it will start the frontend. Again, the same thing: it's doing the reconciliation. We can see that it's already trying to apply the frontend application. It's almost up, it should take just a few seconds to be up, and it is already up. So if we do a port forward here and go to our browser, we can see that we actually have our application. We see that it is working and it has a metrics endpoint; we can see that it exports metrics for Prometheus.
So just by pointing Flux at a particular cluster, we actually deployed namespaces inside a Kubernetes cluster and we deployed several applications with dependencies among them. So if we have a disaster recovery scenario, it's quite easy: we just spin up a Kubernetes cluster with the configuration that we want, point it to a specific Git repository, and it will do everything for us.
So now that we have our demo concluded, we have a few extras here to talk about.
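The ordering seen in the demo (common first, then the backend, then the frontend) follows from the dependsOn declarations. As a rough illustration of that idea, and not a description of how Flux resolves dependencies internally, the sketch below derives an apply order with a topological sort from Python's standard library. The resource names are hypothetical spellings of the ones mentioned in the talk, and `apply_order` is an invented helper.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Hypothetical names; each key lists the resources it depends on.
depends_on = {
    "webapp-common": [],
    "webapp-backend": ["webapp-common"],
    "webapp-frontend": ["webapp-backend"],
}


def apply_order(depends_on):
    """Return resource names so that every dependency comes before its dependents."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as err:
        # err.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"circular dependsOn: {err.args[1]}") from err


order = apply_order(depends_on)
```

With a simple chain like this one there is only one valid order, which matches what the controller logs showed during the demo.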
So a question that usually arises is about secrets. People usually come and say: okay, that's all well and good, I have everything in Git, but now I actually need secrets. If we are not fetching secrets directly from the application from somewhere, we somehow need to inject those secrets. There are several projects that actually deal with this. One is Sealed Secrets and the other one is SOPS. At a high level they work in a very similar way. We encrypt those secrets and put them alongside our code. They are encrypted. And then an agent that lives inside the cluster, once it receives that encrypted secret, knows how to decrypt it and knows what application it actually needs to deliver that secret to. Another option is the Vault Agent Sidecar Injector. It's a Vault agent that can run alongside our applications and that knows how to fetch secrets from Vault and actually deliver them to specific applications. So there are several options to actually deal with this.
Another question that usually arises is GitOps versus infrastructure as code. One of the main differences between infrastructure as code and GitOps is the use of immutable containers as deployable artifacts that can be converged on by a suitable orchestration tool, for example Kubernetes, as we've seen in the example. With GitOps, all the desired state is kept under source control. This isn't always the case with some infrastructure as code tools. Infrastructure as code implementations vary, and sometimes the source of truth is split between a Git repository and a database, and sometimes it is spread across a loosely linked union of multiple Git repositories. Also, infrastructure as code is usually used to manage only infrastructure; it doesn't manage the whole cloud native stack. With GitOps, here we've just seen the use case of deploying an application, but we can go a step further and use the GitOps principles to actually do this for the whole stack.
And another question that usually arises is between push and pull. We usually push changes, for example to a Kubernetes cluster: we say "deploy this application". But with a GitOps approach we are actually on a pull-based approach. We make some change to a Git repository, and then an agent pulls that change down and applies it. That has some advantages in terms of security, because now we don't have to actually open the cluster to an outside tool or an outside person to do a deployment.
And that's all for my part. I hope this talk was informative for all of you. It was a gentle introduction to what GitOps is. It also showed a few tools and showed one actually in practice doing a deployment. You can find me at those addresses. And don't hesitate to ping me if you want to discuss this topic further and get into more detail on how these tools work. So thank you very much, and thank you for being here and attending my talk.

Ricardo Castro

Senior SRE @ FARFETCH

