Conf42 Cloud Native 2021 - Online

Embracing change: Policy-as-code for Kubernetes with OPA and Gatekeeper


Abstract

Sometimes, RBAC is not enough: we need ways to define and enforce fine-grained policies for our clusters.

Gatekeeper and OPA make it easy to adopt policy-as-code practices in Kubernetes. You’ll learn how to adopt these techniques and how to integrate Gatekeeper with your existing tools.

Kubernetes provides a native role based access control (RBAC) authorization scheme, allowing cluster operators to define rules that determine which operations users or services can perform on a particular Kubernetes object. As more enterprises migrate to cloud native environments like Kubernetes, RBAC alone presents limitations, and the need for more scalable ways to define and enforce fine-grained policies increases: how can I limit the number of replicas of a pod for certain users? How can I ensure that all images come from trusted registries?

In this talk we will demo Gatekeeper for Kubernetes environments. You’ll learn how to adopt policy-as-code techniques and how you can integrate Gatekeeper with your existing tools.

Summary

  • Ara Pulido introduces a project called Gatekeeper and how you can use it to embrace policy as code in Kubernetes. In case you want to reach out after the conference, feel free to do so.
  • OPA is a completely open source cloud native project. The idea of OPA is to decouple policy decision making from policy enforcement. You can create policies not only for your Kubernetes cluster, but for all of your cloud native resources.
  • Gatekeeper comes with out-of-the-box observability, exposing metrics such as the number of constraint templates and constraints. Datadog also has an out-of-the-box integration, so you will get all your metrics back into Datadog.
  • In the demo, we use some of the templates from the Gatekeeper library, to show how easy it is to reuse the policies that already come out of the box.
  • Gatekeeper uses two things to answer a policy query: the policy written in Rego, and, optionally, data about the cluster. How can you add new data in Gatekeeper to be used for that decision making? Let's go to the examples.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, welcome to this Conf42 Cloud Native conference talk. My name is Ara Pulido, I'm a technical evangelist at Datadog, and today I'll be introducing a project called Gatekeeper and how you can use it to embrace policy as code in Kubernetes. That's my Twitter handle on the slide; in case you want to reach out after the conference, feel free to do so.

Let's get started by introducing Kubernetes. This is a cloud native conference, so many of you will know it already, but just in case: Kubernetes is a container orchestration platform that helps you run containerized applications in production, and it comes with a lot of nice features; it's completely API driven, with auto healing, auto scaling, et cetera. And Datadog: this is not a talk about Datadog itself, but just so you know, it's a monitoring and analytics platform that helps companies improve observability of their infrastructure and applications.

Today we are going to be talking about a sometimes slightly dull topic, which is policy. But what is policy when we are talking about software services? Policies are the rules that govern the behavior of a software service: basically, what you can and cannot do with it. In the case of Kubernetes, what you can and cannot do in your cluster.

Talking about what you can and cannot do in Kubernetes sounds a lot like RBAC. RBAC stands for role based access control, and that's basically what it does: it helps you define roles describing what a user or a service account can and cannot do in a Kubernetes cluster. Rules usually take the form of a subject, a Kubernetes API resource, and a verb. For example: my user ara, for the resource type pods, can create, get and watch those resources in a particular namespace (a sketch of such a rule appears below).

So if we already have this, why do we need something else, like Gatekeeper? The reason is that authorization, which is what RBAC tries to solve, is just a subset of the type of policy rules you want to create for your environment. A couple of examples of things you may want to enforce in your cluster: only run images coming from a particular registry, or make a set of labels mandatory for all your deployments and pods. These are things you may want to define somehow, and RBAC doesn't let you do so.

Besides, our cloud native environments don't contain only Kubernetes as an orchestrator: we may also have cloud resources, API gateways, service meshes, stateful applications, et cetera. So you want to create policies not only for your Kubernetes cluster, but for the rest of your cloud native resources too.
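(Going back to the RBAC rule mentioned above, here is a rough sketch of how it could be expressed; the role name and namespace are illustrative:)

    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: pod-reader       # illustrative name
      namespace: team-a      # hypothetical namespace
    rules:
      - apiGroups: [""]      # "" is the core API group, where pods live
        resources: ["pods"]
        verbs: ["create", "get", "watch"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: ara-pod-reader
      namespace: team-a
    subjects:
      - kind: User
        name: ara
        apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: Role
      name: pod-reader
      apiGroup: rbac.authorization.k8s.io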
Is there a way we can do all of that in a common way? That is exactly what OPA, the Open Policy Agent, tries to solve. OPA is a cloud native project, completely open source, and its core idea is to decouple policy decision making from policy enforcement. How does that work? OPA only receives a policy query in JSON format, so very standard and domain agnostic, and based on rules you have coded in a specific domain language called Rego, plus any data you may have stored, it computes a policy decision, also in JSON format. That's the only thing OPA does, and that's how it completely decouples decision making from any particular enforcement in a service.

So now you have a decision, but how do you enforce that decision in your service? With OPA, you do that by using any of the integrations it has. If you go to the OPA website, there is a full list of all the integrations (the slide shows just a screenshot, so there are many more), and because the interface is JSON, it's very easy to create new integrations; more are coming all the time.

Gatekeeper is exactly that: one of these integrations, enforcing OPA decisions for Kubernetes. Gatekeeper embeds OPA inside itself as a binary, and it enforces decisions by using something called admission controllers. When you make an API request against the Kubernetes API server, it goes through several steps: authentication (is this request authenticated or not?), then authorization, usually in the form of RBAC rules, and then a third step called admission controllers. Admission controllers are a set of binaries embedded in the Kubernetes API server that process the request and decide two things: whether it's a valid request, and whether you want to mutate it. Two particular ones are very helpful, the validating admission webhook and the mutating admission webhook: through those two you can hook any code, via webhooks, to act as an admission controller. That is exactly what Gatekeeper does. It hooks into the validating admission webhook for now (they are also working on supporting the mutating one), and once it has the policy decision from OPA, it enforces it at that point. So if the request you're making goes against one of the rules you have encoded using Rego, Gatekeeper is going to block that API request.

One of the great things about Gatekeeper is that it was created with Kubernetes in mind, so it's fully Kubernetes native. By Kubernetes native I mean that everything is created through new CRD objects, custom resource definitions. Custom resource definitions are a design pattern used all over Kubernetes to extend the Kubernetes API: you create new object types in the Kubernetes API and then have a controller that runs the classic Kubernetes reconciliation loop that we all love. So you create your policy by creating new Kubernetes objects, and the Gatekeeper controller performs the actions required for that to happen.

There are a few CRDs (three or four, I think) that get created, but the main two are the constraint template and the constraint. The constraint template is where you define your policy, and the good thing about constraint templates is that they can take parameters. So you can create a reusable policy by writing one constraint template and then instantiating it into as many constraints as you want. We are going to see how that's done in the demo, but first an example.
Let's imagine we want a rule that asks for a required set of labels on our objects. The template has a name, required labels, some parameters (the labels you are going to require for a particular object), and then some Rego code. We are not going to go into much detail on the Rego syntax in this talk, because the goal is for you to see that you can start using Gatekeeper straight away, even if you don't know Rego. Once we have that template, we can instantiate it into as many rules as we want. In this case, one constraint says all namespaces require the gatekeeper label, and another says all pods in the default namespace require the do-not-delete label (both are sketched below). As you can see, by writing the Rego code once you can reuse that policy many, many times; that makes reusing policies with Gatekeeper super simple.

And the good thing is that when you start creating policy in your Kubernetes cluster, you will probably want the same set of rules that many other people want too. Things like: images can only come from approved registries (that's a classic one); deployments require a set of labels; container images require a digest; all the containers you define must have CPU and memory limits set. These are very common things to enforce in Kubernetes. Obviously the values will differ from one company to another, but the generic rules are very similar. For that reason there is an open source project, part of the OPA organization, called the Gatekeeper library: the community is creating all these reusable policies that you can use out of the box, even if you don't understand Rego, so you can start getting value out of Gatekeeper very easily. There are many constraint templates already, and more are coming every day.

As an example of what you would encounter in that repo, one of the templates is about allowing only HTTPS ingresses, not HTTP. You have the template and its Rego code; you can try to understand the Rego, or use it to start learning Rego, but even if you don't, it's super simple to use because it has a description, a name, and parameters (or not). And in that same repo you get not only the template but also one or several examples, and those come in the form of an instantiation of the template (a constraint) plus, based on that constraint, an object (in this case an ingress object) that is going to fail validation. So you can very easily browse the examples in that repo and understand what each template does and what type of objects may fail it. This is the GitHub repo for the Gatekeeper library; it's part of the OPA organization, a CNCF project, completely open source and easy to use.

Another good thing about Gatekeeper that I like a lot is that it comes with out-of-the-box observability. It exposes a lot of metrics, like the number of constraint templates and constraints, the number of webhook requests and their latency, the number of violations, et cetera. At Datadog we also have an out-of-the-box integration, so if you're using Datadog as well, without having to do much you will get a ready-made dashboard to start with, and all your metrics back into Datadog; we will see that during the demo.
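To make the template/constraint split concrete, here is a minimal sketch modeled on the well-known required-labels example from the Gatekeeper documentation; the Rego is the standard example, and the constraint names are illustrative:

    apiVersion: templates.gatekeeper.sh/v1beta1
    kind: ConstraintTemplate
    metadata:
      name: k8srequiredlabels
    spec:
      crd:
        spec:
          names:
            kind: K8sRequiredLabels   # the new constraint kind this template creates
          validation:
            openAPIV3Schema:
              properties:
                labels:               # the template's parameter
                  type: array
                  items:
                    type: string
      targets:
        - target: admission.k8s.gatekeeper.sh
          rego: |
            package k8srequiredlabels
            violation[{"msg": msg}] {
              provided := {label | input.review.object.metadata.labels[label]}
              required := {label | label := input.parameters.labels[_]}
              missing := required - provided
              count(missing) > 0
              msg := sprintf("you must provide labels: %v", [missing])
            }
    ---
    # Instantiation 1: all namespaces require the "gatekeeper" label
    apiVersion: constraints.gatekeeper.sh/v1beta1
    kind: K8sRequiredLabels
    metadata:
      name: ns-must-have-gatekeeper
    spec:
      match:
        kinds:
          - apiGroups: [""]
            kinds: ["Namespace"]
      parameters:
        labels: ["gatekeeper"]
    ---
    # Instantiation 2: all pods in the default namespace require "do-not-delete"
    apiVersion: constraints.gatekeeper.sh/v1beta1
    kind: K8sRequiredLabels
    metadata:
      name: default-pods-must-have-do-not-delete
    spec:
      match:
        kinds:
          - apiGroups: [""]
            kinds: ["Pod"]
        namespaces: ["default"]
      parameters:
        labels: ["do-not-delete"]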
Good, so let's start with the demo. By the way, I have k as an alias for kubectl, so every time I type k, that's what it is. This is a single node cluster, minikube, very simple, but good for this demo. I'm already running Datadog here, sending its data to Datadog, and I'm not running anything else (apart from some pods in kube-system, obviously). We can have a look at it in Datadog as well: here are my deployments, my replica sets, et cetera.

So let's deploy Gatekeeper; that's the first thing we need to do. We are going to deploy it using the default gatekeeper.yaml that comes with the getting started guide, so everything here is by default. You can see a lot of stuff has been created: some CRDs (four in this case), plus a new namespace, gatekeeper-system, that basically contains two things: the controller, in this case with three replicas (since this is what validates your policy, it's always good to have more than one replica, but you can define how many), and an audit pod whose purpose we will explain later.

Once it's running, if we exec onto the Datadog pod, you can see that the Datadog agent has found the Gatekeeper pods straight away, so it's going to start sending all those metrics directly without us having to do anything else. Let's see if everything is running here. Cool, everything is now running, and we are sending those metrics as well.

Now that we have everything running, I'm going to use some of the templates from the Gatekeeper library. The reason I'm doing this is that I wanted to show you again how easy it is to reuse the things that already come out of the box. I just cloned the library from its GitHub repo, so nothing has changed there, and I have a lot to pick from. I'm going to use the required-labels one, and you can see there is a name for a new object, some properties to parameterize it, and some Rego code that is already tested and validated for me.

The first thing I'm going to do is apply that template, so I get the new CRD and the template is available for me. Let's find it: library, general, required labels, and then the template. I just have to apply it and it creates this new constraint template object. But it doesn't only do that: if I now list the CRDs, it has created a new object kind, required labels, that I can use to instantiate as many constraints as I need, using the same Kubernetes native format. Another good thing about these being CRDs is that you can store them with the rest of the configuration you have for your cluster, using GitOps, et cetera.

So let's do that. As I said, every one of these templates comes with examples, so let's check the example we have here: a constraint that says all namespaces require an owner label with a set of accepted values.
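That sample constraint looks roughly like this; note that the library's required-labels template parameterizes labels as objects with a key and an optional allowedRegex, slightly richer than the docs-style sketch shown earlier:

    apiVersion: constraints.gatekeeper.sh/v1beta1
    kind: K8sRequiredLabels
    metadata:
      name: all-must-have-owner
    spec:
      match:
        kinds:
          - apiGroups: [""]
            kinds: ["Namespace"]
      parameters:
        message: "All namespaces must have an `owner` label that points to your company username"
        labels:
          - key: owner
            allowedRegex: "^[a-zA-Z]+.agilebank.demo$"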
But instead of using this as-is, I'm going to copy it and change it a little bit, so you can see how easy it is to reuse these things. Going back to the terminal, I copy that sample, library/general/requiredlabels/samples/all-must-have-owner, and rename it for Conf42. Now I can edit it: I change the name to conf42; instead of namespaces, this one is going to be for pods; I change the error message I'll get when the label is missing; and instead of asking for a value, I just need the key, so I change the key to conf42 and remove the allowed values (the edited constraint is sketched below).

The next thing I have to do is apply it. Okay, it has been created. Basically this rule tells the cluster: all of our pods require this conf42 key. But we already had some pods running, the Datadog pods, the kube-system pods, the Gatekeeper pods, and none of those have that label. What happens in this case? This is where the audit pod comes in; remember that one of the pods created for Gatekeeper was the audit one. That pod checks for violations that appear when you create new rules, but instead of removing the offending pods, it just gives you a description so you can fix them afterwards. That makes it very easy to start creating new rules without altering your cluster right away.

So how do we know those violations are happening? If I describe the constraint type and the all-pods-must-have-conf42 constraint... something is wrong here. Let me see; I probably made a mistake. Yes: the kind in the match section has to be singular, which is important, otherwise it doesn't work. So it's Pod and not Pods. Let's do that again and apply the constraint; it has been configured. And now, hopefully, we will get the violations once the audit has synced a little bit. Let's just wait... okay, here we are. We now have all the violations: all the pods that are violating that rule, which is all the pods that were already running.

If we go now to Datadog, as we said, we are sending all that data back, so we can see the latency of the webhook requests, the number of requests, and also the number of violations: we had those 14 pods in violation. This is also a very nice way, if you're the person responsible for enforcing policy in your cluster, to check how many violations you have as soon as you create new rules, and to start reducing them by modifying the objects already running in your cluster.

Good, but that's for existing objects. What happens with new ones? We now have this rule in place that says all pods must have the conf42 label. We are going to create a pod for nginx, very simple, without adding the label required by my organization, because, say, I don't know about the rule. I try to create that object and it fails: I get an error coming from Gatekeeper explaining why my pod couldn't be created. If I now modify the pod and add conf42 (I can put any value, because we are not requiring one, so let's just put april) and try again, now it lets me create it. So you can see that I was already able to create policy for my pods by reusing one of the examples from the Gatekeeper library, which makes things very simple.
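After those edits, the constraint, together with a pod that satisfies it, would look roughly like this (april is just the arbitrary value used in the demo):

    apiVersion: constraints.gatekeeper.sh/v1beta1
    kind: K8sRequiredLabels
    metadata:
      name: all-pods-must-have-conf42
    spec:
      match:
        kinds:
          - apiGroups: [""]
            kinds: ["Pod"]      # singular kind; "Pods" would not match anything
      parameters:
        message: "All pods must have a `conf42` label"
        labels:
          - key: conf42         # key only, no allowedRegex, so any value is accepted
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
      labels:
        conf42: april           # without this label, admission is denied
    spec:
      containers:
        - name: nginx
          image: nginx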
Okay, let's go back quickly to the slides, and then we will do a second demo. What we've seen so far was simple: we said all pods must have this particular label, so the information in the object you were about to create was enough to answer the policy question. But let's imagine we want to enforce this other type of rule in our cluster: all hosts must be unique among all ingresses. If you have many ingresses, you have to make sure the hostnames are unique across all of them. For that, the information in the object you're about to create is not enough: you also need information about all the other ingress objects already in your cluster in order to answer the question.

Remember that when we talked about OPA, we said it uses two things to answer a policy query: the policy in Rego, but also, optionally, some stored data. So how am I able, in Gatekeeper, to add new data to be used for that decision making? It's super simple, because again it's Kubernetes native: basically what I need to do is create a config object (another CRD) and tell Gatekeeper which types of objects I want it to store as data for those decisions, in this case ingress objects (the sync config is sketched below).

So let's do a quick demo of that as well. We are going to use an example again: this one is unique ingress host. We have a template, as in any other case; this is the example I was talking about, all ingress hostnames must be unique. There is no parameter, because none is needed; there is just the Rego code, which again we don't need to understand. Okay, so let's create it: unique ingress host, and we have a template here, so let's apply that template first. Then I'm also going to use the sync object, which tells Gatekeeper what it needs to store to make those decisions. Again, this is part of the Gatekeeper library: even the sync objects that are required come with the example. So let's apply that one. As soon as I apply it, Gatekeeper starts storing any ingress object I create, so that it can then be used to make decisions.

Let's go to the examples. As usual, the examples come with a constraint; that's the first thing we need to create. In this case the constraint is pretty simple, because it doesn't have any parameters: it just applies to any ingress we create. And we also have examples of things that are not allowed: in this case you can see that we are going to try to create two ingress objects. For the first one, Gatekeeper checks that its host is unique, and because it's the first ingress object we have created, it is, so that's good. For the second one, we try to use the same hostname, so presumably it's going to fail at some point. Let's try that. Okay: the first one was created successfully, and even though it all happened super quickly (I created the first one and then the second right after), Gatekeeper already had that information synced. So it created the first one successfully, and when it tried to create the second one, it checked the ingress data it had stored and said: there is already an ingress with that same hostname, so I'm going to block this.
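For reference, the sync config and the parameterless constraint from the library's unique-ingress-host sample look roughly like this; the Ingress API group/version to sync depends on your cluster version:

    # Tell Gatekeeper to replicate ingress objects into its data cache
    apiVersion: config.gatekeeper.sh/v1alpha1
    kind: Config
    metadata:
      name: config
      namespace: gatekeeper-system
    spec:
      sync:
        syncOnly:
          - group: "extensions"
            version: "v1beta1"
            kind: "Ingress"
          - group: "networking.k8s.io"
            version: "v1beta1"
            kind: "Ingress"
    ---
    # Parameterless constraint: applies to every ingress
    apiVersion: constraints.gatekeeper.sh/v1beta1
    kind: K8sUniqueIngressHost
    metadata:
      name: unique-ingress-host
    spec:
      match:
        kinds:
          - apiGroups: ["extensions", "networking.k8s.io"]
            kinds: ["Ingress"]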
So again, you can create policies not only with the information about the object you are creating, but also in relation to the objects that are already in your cluster. That's all I had for this talk. I hope you learned about Gatekeeper if you didn't know it; it is a fantastic project that makes it super simple to start using policy as code in Kubernetes. So check it out, and thank you very much.
...

Ara Pulido

Developer Relations @ Datadog
