Conf42 DevSecOps 2021 - Online

Centralized Policy Management at Scale


Abstract

Systems are becoming more and more complex, built on microservices architecture, with large engineering organizations working together inter-dependently. In this new world order in engineering organizations, policy management has become a core piece in making this all operate more seamlessly.

Projects like Open Policy Agent (OPA) have brought policy management to the forefront and have provided one method for applying centralized policy at scale. This talk will review different methods for applying centralized policy at scale, demoing this through OPA as a policy operator and applying policies to Kubernetes config YAMLs, as a real-world example of how you can apply this to your services as well.

Summary

  • Today we're going to talk about centralized policy management at scale. It's like our second Conf42. Thank you very much for coming.
  • Today we're going to talk about how to manage centralized policy at scale. It can be applied not only to Kubernetes, but also if you're a serverless organization or any other organization. We have an open source CLI that runs policies and we just passed 5,000 GitHub stars.
  • OPA is a general-purpose policy engine. It gives services the ability to decouple decision-making logic from policy enforcement. To use OPA, it's very simple. Use OPA as an embedded package inside your project. This really empowers admins in the organization to have control over their system.
  • Defining the policies is very important. Number one, define the policy, for example: all workloads should have memory limits. Number two, define a granular policy. Start from the top with broad policies and then go deeper and deeper to more granular policies.
  • If you want to enforce those policies in the CI pipeline, we really recommend using conftest, which is also built on top of OPA. To make sure that your policies are also enforced and checked in the runtime environment of your Kubernetes cluster, you can use Gatekeeper.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello. Hi, Conf42. Thank you so much for coming. It's so great to be here. It's like our second Conf42. Yeah, it's the second Conf42, I guess. I don't know. Yeah. So thank you very much for coming. I'm Shimon. I'm Noaa. And yeah, let's get straight to it. So what are we going to talk about today? Today we're going to talk about centralized policy management at scale.
But first let's talk about us. So my name is Noaa Barki, I'm a full-stack developer. I'm a tech writer and also one of the leaders of the GitHub Israel community, which is the largest GitHub community in the whole universe. Amazing. And my name is Shimon, I'm one of the co-founders and the CEO of Datree. I'm an AWS Community Hero and I live and breathe DevOps and infrastructure, and this is why we're here.
So today we're going to talk about how to manage centralized policy at scale. We're going to show it to you through the eyes of a Kubernetes administrator who tries to manage the policies for the organization. But I think it can be applied not only to Kubernetes, but also if you're a serverless organization or any other organization. So at Datree, we see a lot of organizations that deal with policies, because we help prevent misconfigurations from reaching production. This is what we do, and policies are how we roll. We have an open source CLI that runs policies, and we just passed 5,000 GitHub stars. Yeah. Woohoo. Yeah. So we're really happy, but let's get into it.
So today we want to talk about how to avoid these misconfigurations. Yeah. And I really like to say that as a developer at Datree, we do policies for a living. But what I do is not only to understand Kubernetes, how it works and what policies we want; it's also to understand how you can blow up your own cluster. So I really understand why organizations need centralized policies, and this is what we are going to talk about today.
And the real question, the main question, is how you can prevent the next misconfiguration, the next production outage. And after a lot of thinking, a lot of thinking, 100 postmortems, if you saw that talk, we summed it up into three major steps. So step number one, or step number zero? Actually, the first step. How many fingers does a developer have? 4012. So the first step is: meet OPA. It's your policy engine.
OPA is a general-purpose policy engine. It gives services the ability to decouple decision-making logic from policy enforcement. You can basically think about OPA like a super engine. You can write your policies into it, you can publish your policies into it. So whenever you execute it with any JSON input, OPA will check if it violates any one of the policies that you published. You can talk to it using an API or by importing it as a library, and it will evaluate the business logic of whether it should allow a policy to pass or not. Yeah, and there are many different use cases, not just to check core configuration. There are many authorization use cases that OPA is used for, and other things.
Yeah, but the real beauty of OPA, the real magic behind OPA, is that OPA enables you to offload and unify all your decision-making logic into a dedicated server. Yeah. So you can decouple your business logic, your application logic, from the decision making of whether a user can perform, let's say, a delete action. So you offload it to a different service that does all of this calculation, and then you don't need to build this policy logic inside every one of your microservices. Yeah. And this really empowers admins in the organization to have control over their system. Yep. So moving forward, by the way, I'll just say that OPA is part of the CNCF. It is a graduated project. We really recommend using it. Yeah. So to use OPA, it's very simple.
First you need to integrate with OPA. You can use OPA as an embedded package inside your project if you're using Go, or as a host daemon. You write your policies in the Rego language, which we'll talk about later. And you query OPA by sending an HTTP request with the input, and OPA will do the rest. Or you can import it as a library and just call it directly.
Yeah, and this is an example of the Rego language. It's a declarative language. It's very easy to learn, from experience, I'll tell you that. And this is an example of a policy written for OPA which reports a violation if a Deployment resource doesn't have the app label. It's very straightforward, you can mark my words. Yeah, it's a nice declarative language and there aren't many loops and stuff like that. It's like what you see is what you get. This is, I guess, what they tried to do? Yeah, it's more like SQL than JavaScript, I always like to say.
But let's move forward to step number one, which is: define your policies. Cool. So defining the policies is very important. So I remember when I was an engineering manager for 400 developers, and one developer made a mistake and it propagated to production and we had an outage. And it's okay, it happens. I also make mistakes all the time. But I thought to myself, what can we do, how can we prevent the next outage? And we tried sending emails and stuff like that, which obviously doesn't work. So we said, okay, we need some sort of framework, some sort of guardrails so everyone will work by them, and we call them policies. Number one, you need to define what your policies are. For example, make sure that every Kubernetes workload has a memory limit and a CPU limit, and has a liveness probe and a readiness probe. And every Docker container has a health check. So that's a policy. And now you want all of your microservices to follow this policy. So number one, define the policy. Let's say all workloads should have memory limits. And number two, define a granular policy.
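
To make the HTTP query mentioned above concrete, here is a minimal Python sketch of how a client might call OPA's Data API, which takes a POST to `/v1/data/<path>` with the document under an `input` key. The address `localhost:8181` (OPA's default port) and the policy path `kubernetes/deny` are assumptions for illustration, not something shown in the talk.

```python
import json
import urllib.request

OPA_URL = "http://localhost:8181"  # assumption: OPA listening on its default port
POLICY_PATH = "kubernetes/deny"    # hypothetical package/rule path


def build_opa_query(resource: dict) -> urllib.request.Request:
    """Build a POST to OPA's Data API with the resource wrapped as `input`."""
    body = json.dumps({"input": resource}).encode("utf-8")
    return urllib.request.Request(
        url=f"{OPA_URL}/v1/data/{POLICY_PATH}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_opa(resource: dict) -> list:
    """Send the query and return the rule's result (empty if the rule is undefined)."""
    with urllib.request.urlopen(build_opa_query(resource)) as resp:
        return json.load(resp).get("result", [])


# A Deployment without an app label, as in the example policy described above.
deployment = {"kind": "Deployment", "metadata": {"name": "web", "labels": {}}}
request = build_opa_query(deployment)
print(request.get_method(), request.full_url)
```

Calling `query_opa(deployment)` needs a running OPA server with a matching policy loaded; building the request, as done here, does not.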
So just having a top-level thing that tries to narrow everything down, I don't think it's good enough, because let's say you put a four gigabyte memory limit, but then you have an AI workload that needs 50 GB. So start from the top with broad policies and then go deeper and deeper to more granular policies.
Amazing. So now that you have defined the policies in your organization and you know what you want to enforce, the real question is how you integrate the policies inside your pipeline. And this is very crucial. I really want you to think about where in the pipeline you want to enforce the policies. This decision will affect the developers and the DevOps engineers in your organization. So the first option would be to integrate the policies in the CI pipeline. If you want to enforce those policies in the CI pipeline, we really recommend using conftest, which is also built on top of OPA. So conftest is an open source library which helps you write tests against any structured file: XML, JSON, Dockerfiles, YAML, pretty much anything. Of course, as I said before, it's built on top of OPA, so all the policies should be written in Rego. And another really awesome thing about conftest is that it allows you to push and pull your policies to and from Docker registries. It's not only about containers anymore.
And using conftest is really straightforward: you download conftest, write the policies in Rego, and then execute the policies against a specific file using the conftest test command. And as you can see here, you will see the violation output. Yeah, you can really think about it as a unit testing library. And as you can see here, we used a GitHub Action just to hook conftest into our pipeline. We used docker pull to pull conftest, and then we ran conftest test with this path, and pretty much that's it. That's conftest. Straightforward? Yeah, very simple. I really like conftest as a developer. It made a lot of sense to me.
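
To make the broad-then-granular idea concrete, here is a minimal Python sketch of the kind of check a CI step could run against a manifest: a broad 4 GiB memory ceiling for everything, with a granular override for one named workload. It is a plain-Python stand-in for illustration, not Rego and not how conftest is implemented. The workload name `ai-training`, the use of `Gi`/`Mi` suffixes, and the helper names are all assumptions.

```python
# Broad policy: every container needs a memory limit of at most 4 GiB.
# Granular policy: specific workloads may carry a higher ceiling.
DEFAULT_MAX_MIB = 4 * 1024
OVERRIDES_MIB = {"ai-training": 50 * 1024}  # hypothetical workload name

UNITS_MIB = {"Mi": 1, "Gi": 1024}  # simplification: only these two suffixes


def to_mib(quantity: str) -> int:
    """Convert a quantity such as '512Mi' or '4Gi' to MiB."""
    for suffix, factor in UNITS_MIB.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {quantity}")


def violations(workload: dict) -> list:
    """Return one message per container that breaks the memory policy."""
    name = workload["metadata"]["name"]
    ceiling = OVERRIDES_MIB.get(name, DEFAULT_MAX_MIB)
    found = []
    for container in workload["spec"]["template"]["spec"]["containers"]:
        limit = container.get("resources", {}).get("limits", {}).get("memory")
        if limit is None:
            found.append(f"{name}/{container['name']}: missing memory limit")
        elif to_mib(limit) > ceiling:
            found.append(
                f"{name}/{container['name']}: memory limit {limit} exceeds {ceiling}Mi"
            )
    return found


def make_workload(name: str, memory: str = None) -> dict:
    """Build a minimal Deployment-shaped dict with a single container."""
    container = {"name": "main"}
    if memory:
        container["resources"] = {"limits": {"memory": memory}}
    return {
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {"template": {"spec": {"containers": [container]}}},
    }


print(violations(make_workload("web")))
print(violations(make_workload("ai-training", "50Gi")))
```

Keeping the override table separate from the default is one way to keep the broad rule readable while the exceptions stay explicit and reviewable.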
But what if we want to integrate our policies in the cluster? Yeah, so I'm a big believer in shift left, and I believe we as developers want to find problems as soon as possible in the pipeline. But then still, sometimes you want to make sure that your runtime is also secure. And I don't know, maybe someone kubectl'd something into your cluster, or I don't know what. So if you want to make sure that your policies are also enforced and checked in the runtime environment of your Kubernetes cluster, you can use Gatekeeper.
And Gatekeeper is a great utility, also part of the Open Policy Agent GitHub project, and it uses the admission controller webhook of Kubernetes. And it is much like the operating system webhook. So imagine you're an operating system, there is a process trying to run, then the operating system calls the antivirus and asks it, hey, Mr. Antivirus, can this executable run? And then it says yes or no. So it's the same thing. So you kubectl apply a resource to Kubernetes. Then this webhook admission controller calls Gatekeeper. Gatekeeper runs a policy check and says, you cannot push this, it does not have a memory limit. Go fix your deployment. And then it rejects the deployment, and then the developer has to fix it. And this way you achieve runtime protection. Yeah, Gatekeeper has a lot of other options that you can configure. You can have it on audit mode, you could have it on test mode. It's really cool, it's a really cool project.
So how does it work? So you define a constraint template, which says, okay, for every resource that comes in we have a constraint. For example, this one talks about labels and it's very simple. I won't go line by line with you, but you basically say, check the metadata and see if there is a label, and then some sort of label. Let's say you want to have a cost center, so you want to have a namespace and a label for every resource.
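
The admission flow described above can be sketched in Python as a function that takes a Kubernetes `AdmissionReview` request and returns a response with `allowed` set to true or false. This is only an illustration of the webhook contract under stated assumptions, not Gatekeeper's implementation (Gatekeeper evaluates Rego from constraint templates). The label key `cost-center` and the sample `uid` are hypothetical.

```python
REQUIRED_LABELS = ["cost-center"]  # hypothetical label key, echoing the cost center example


def review(admission_review: dict) -> dict:
    """Decide whether the incoming object may be admitted to the cluster."""
    request = admission_review["request"]
    labels = request["object"].get("metadata", {}).get("labels") or {}
    missing = [key for key in REQUIRED_LABELS if key not in labels]

    # The response must echo the request uid and state allowed true/false.
    response = {"uid": request["uid"], "allowed": not missing}
    if missing:
        response["status"] = {
            "message": f"missing required labels: {', '.join(missing)}"
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


incoming = {
    "apiVersion": "admission.k8s.io/v1",
    "kind": "AdmissionReview",
    "request": {
        "uid": "example-uid",  # hypothetical value
        "object": {
            "kind": "Deployment",
            "metadata": {"name": "web", "labels": {"app": "web"}},
        },
    },
}
decision = review(incoming)
print(decision["response"])
```

When `allowed` is false, the API server rejects the request and the status message is what the developer sees, which is the "go fix your deployment" step from the talk.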
And then what happens is that once you apply that, you write a policy and you send a resource to it. And then you say, okay, this resource and this policy with the constraint, can this be applied to the cluster or should it be rejected? And this is how it works. So it's very simple.
Yeah, but it's not the same policy. I mean, if you decide that you want to use both Gatekeeper and conftest, you will have to write the same policies, almost the same policies, twice: one for Gatekeeper, and the cluster will store those policies, and one for conftest. Yeah, so both of them are written in Rego. They're almost identical. They're a bit different. And if it's in Gatekeeper, Gatekeeper will run inside of your Kubernetes cluster and it will be there. But you can use almost the same policies. Yeah, but you have it twice. You have two instances of the same policy. Okay, amazing. So using conftest and using Gatekeeper, you can practically protect yourself, protect the entire pipeline. Totally. So it's great. You integrate it directly within your source control and you can be protected from dev to production.
Yeah. So the next step is how do you control, review and monitor what you've done? For instance, as I said, in my previous role we had 400 engineers and like 1,000 Git repos. So let's say you define the policy, which by itself requires you to think: which policies do we want? Which policies can there be? And then you need to integrate it into each one of your builds. But then let's say I want to make a change and I want to introduce a new policy. So what do I do? I open 1,000 pull requests? If you're using raw solutions like Gatekeeper and conftest, this is exactly what you need to do. If you use solutions like Datree, it comes built in. But just so you know, it is important to be able to dynamically change the policy.
It is important, number two, to have full control and visibility into which policies ran on which resource, what was rejected, what passed, what your status is. So it's sort of like having a command and control solution. And those are the things that don't really come built in with OPA and Gatekeeper and conftest. And this is where Datree is complementary. So we come with predefined, battle-tested rules that you can just use out of the box. You can write custom rules, and you can also have enterprise-grade control and management, so you can oversee everything that happens and dynamically assign other policies and change them on the fly without having to change the code itself.
So how does it work, Noaa? This is really straightforward. All you need to do is install Datree and execute datree test with the path of the files that you want to test, and Datree will show a full output with guidelines on how to solve every violation and where that violation occurred, as you can see here. And it's free, it's open source, and since it's my teammates' code and mine, I encourage you to submit a pull request, and we'll try to get to it in time, but we promise we read everything. Yeah. Cool. So thank you very much, Noaa. Thank you very much, Shimon. Thank you very much. Thank you. Conf42, bye.

Noaa Barki

Full-stack Developer @ Datree


Shimon Tolts

CEO @ Datree



