Conf42 Site Reliability Engineering 2022 - Online

Terraform apply secured by Open Policy Agent


Abstract

Terraform has unprecedented control over the mission-critical infrastructure for our businesses and organizations. Think about the last time a misconfiguration went unnoticed for long enough to impact customers or cause an outage. Everyone should have a second set of eyes when deploying code that has the potential to create a negative impact. Let Open Policy Agent (OPA) be that second set of eyes.

OPA is an open source general-purpose policy engine that is especially adept at working with configuration data like Terraform manifest files. Using OPA, we can write policies that will ensure that resources created by any team and any engineer are compliant with the organization’s rules and requirements.

Implementing policy can be challenging, but it doesn’t have to be. OPA comes paired with a purpose-built policy language called Rego. This talk will show how to get started by deploying OPA into your CI/CD pipeline and writing your first Rego policies to secure some of the primary AWS resources we use every day.

Summary

  • Peter Oneill Jr. is the community advocate for the Open Policy Agent project. The talk covers adding policy as code to your infrastructure as code, bringing these concepts into GitOps best practices and your CI/CD pipeline, and what a secure pipeline with Terraform looks like.
  • Terraform brought about the idea of defining your infrastructure as code, moving from an imperative style of programming to a declarative one. Using a driving analogy (driving yourself versus telling a rideshare where you want to go), the talk walks through the three phases of deploying infrastructure as code: write, plan, and apply.
  • Open Policy Agent is a general-purpose policy engine. It evaluates policies separately from the service that needs the policy decision, which gives you full control over the policy development lifecycle. OPA can also be part of your automated testing suite.
  • The demo shows how to authorize changes in a Terraform manifest file using OPA. The first test case creates 16 EC2 instances and fails; the second creates three and passes, based on the blast-radius policy.
  • The closing slides link to resources for getting started with OPA, including the opa exec command, Conftest (another tool built on OPA), and an integration with AWS CloudFormation Hooks for AWS shops. The talk ends with thanks to attendees and an invitation to connect.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi everyone. Welcome to Terraform apply secured by Open Policy Agent. Today we're going to talk about adding policy as code to your infrastructure as code. My name is Peter ONeill. I am the community advocate for the Open Policy Agent project. I'm also a digital nomad and a contributor to open source projects. You can find me on pretty much all social media platforms as Peter Oneill Jr.
On today's agenda: first, we're going to discuss Terraform and infrastructure as code, how these tools came about and what they do for us today. Then we're going to talk about bringing these concepts into GitOps best practices and putting them into your CI/CD pipeline. Next, we'll talk about how to authorize the changes that Terraform makes through your CI/CD pipeline and how to best control what happens next. After that, we'll talk about decoupling the policies, or decisions, in this pipeline so that you're no longer baking them into your services, but instead using a standalone tool like Open Policy Agent to handle the decision making. Lastly, we'll put the whole process together and look at what a secure pipeline with Terraform looks like, wrapping everything up with a quick demo of how it all works with Terraform, OPA, and GitHub Actions.
To start, Terraform brought about the idea of defining your infrastructure as code. With this idea came the shift from an imperative style of programming language to a declarative one. To create a simple analogy, let's talk about driving to a destination. You can hop in the car and drive yourself. Think of this as the imperative model, where you choose every twist and turn needed to get to the destination.
Or you can follow the declarative model, where you choose a destination, call a taxi or a rideshare app, and let them figure out the directions. You're no longer in control of the individual twists and turns; you just want to reach that end state, the destination. So imperative means you decide each twist and turn on how to get there, while declarative means you declare where you want to be and let the service handle the rest.
Using that same driving analogy, let's look at the three phases of deploying infrastructure as code. First, there's the coding phase, where we write the Terraform manifest files. Think of this as choosing the destination, or understanding the full picture of our infrastructure as we want it to look in the end state. Knowing the end state, we can then generate a plan. This is very similar to previewing the GPS route on your smartphone before taking the rideshare: you see the route and each twist and turn. With Terraform, we see each resource we want to create or modify, and each change that needs to happen to reach the end state. Once we have this plan, we can approve it and execute it. During the apply phase, if everything goes according to plan, there will be no errors and we'll end up exactly where we need to be. But if the infrastructure at apply time no longer matches the state it was in when the plan was made, this is where those errors will bubble up.
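To make the plan phase concrete, here is a minimal Python sketch of declarative reconciliation: compare the declared end state with what currently exists and list the actions needed to get from one to the other. It is a simplified illustration, not how Terraform actually computes plans; the state model, the make_plan function, and the resource addresses are all hypothetical.

```python
from typing import Dict, List, Tuple

# Simplified model (an assumption, not Terraform's internals): each resource
# is keyed by its address and described by a flat dict of attributes.
State = Dict[str, Dict[str, str]]


def make_plan(desired: State, current: State) -> List[Tuple[str, str]]:
    """Return (action, address) pairs that would move current toward desired."""
    plan: List[Tuple[str, str]] = []
    # Visit every address that appears in either state, in a stable order.
    for address in sorted(desired.keys() | current.keys()):
        if address not in current:
            plan.append(("create", address))   # declared but missing
        elif address not in desired:
            plan.append(("delete", address))   # exists but no longer declared
        elif desired[address] != current[address]:
            plan.append(("update", address))   # exists but attributes differ
        # Otherwise the resource already matches: no action needed.
    return plan


desired_state = {
    "aws_instance.web[0]": {"instance_type": "t3.micro"},
    "aws_instance.web[1]": {"instance_type": "t3.micro"},
}
current_state = {
    "aws_instance.web[0]": {"instance_type": "t2.micro"},
    "aws_instance.old": {"instance_type": "t2.micro"},
}

for action, address in make_plan(desired_state, current_state):
    print(action, address)
```

Running it prints one action per line: delete for aws_instance.old, update for aws_instance.web[0], and create for aws_instance.web[1]. Nothing is changed yet; like the GPS preview, the plan only describes the route.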
When an error occurs, you can either roll back or stop the apply where it is, fix the problems, and then continue creating the planned resources from that point. While running terraform apply directly from your laptop may be acceptable in a dev or test environment, you will typically want something more robust once you are in a shared team or production environment. That means running terraform apply as part of a continuous delivery or continuous deployment model.
With continuous delivery or continuous deployment, we want to apply the same GitOps best practices we use on the development side to our infrastructure as well. These practices are: having a single source of truth, which is a Git repo; defining what you want in a declarative language, which is HashiCorp's HCL; versioning every change, so any modification goes through a PR that shows how the underlying infrastructure is changing; and automating the whole process, so that no manual changes are made to your infrastructure that would make its current state drift from the versioned state.
Let's look at how this might work. On the left-hand side, you define your manifest files and push them to a repository with a git commit or a PR. Once the code is in the repository, some automated testing runs. After all the testing passes, the pipeline generates the plan, which then sits in a delivery state, waiting to be approved and deployed.
That is the manual authorization step. Alternatively, you can adopt a continuous deployment model, where you set up a more robust testing and policy suite in the middle so that you can go straight from pushing code to a repository to creating the underlying infrastructure. This is where policy becomes very important: it makes sure the changes you intend happen the way you expect them to, so you don't end up with runaway resources or unintentional actors doing things you wouldn't expect.
And speaking of unintentional or malicious actors, they shouldn't be the only consideration when thinking about authorization. Authorization should encompass protecting your resources from any changes, whether they are intentional or not, malicious or not. At this point, it's very common to store secrets in a secrets manager to protect yourself from unknown or malicious users. But even more important is protecting your resources from changes you don't expect. These can be unintentional changes or radical changes. For example, you may make an unintentional change where you delete all of the tags from a set of resources without realizing it. Or you may make a radical change with unexpected results, where you create 1,000 containers instead of 100 because of a typo. These kinds of changes can have serious effects on your infrastructure without the proper guardrails in place to protect it.
So let's talk about how to enact these guardrails with Open Policy Agent. Open Policy Agent, or OPA, is a general-purpose policy engine. It is general purpose because it works with any service, not just Terraform: it expects a query, or question, as a JSON blob, and it returns its response as JSON as well.
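Because OPA takes its question as JSON and answers in JSON, any service can ask it for a decision over HTTP. Below is a minimal Python sketch, using only the standard library, against OPA's REST Data API (a POST to /v1/data/ followed by the rule path, with the question under an "input" key). The server address (OPA's default port 8181), the rule path terraform/analysis/authz, and the helper function names are assumptions for illustration.

```python
import json
import urllib.request
from typing import Any, Optional


def build_opa_request(base_url: str, rule_path: str, input_doc: Any) -> urllib.request.Request:
    """Build a POST to OPA's Data API: the question is sent as JSON under "input"."""
    url = f"{base_url.rstrip('/')}/v1/data/{rule_path.strip('/')}"
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def extract_result(response_body: bytes) -> Optional[Any]:
    """OPA answers in JSON too; the decision is under "result".

    If the rule is undefined for this input, "result" is absent and None is returned.
    """
    return json.loads(response_body).get("result")


def query_opa(base_url: str, rule_path: str, input_doc: Any, timeout: float = 5.0) -> Optional[Any]:
    """Send the query to a running OPA server and return its decision."""
    request = build_opa_request(base_url, rule_path, input_doc)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_result(response.read())


# Building the request does not contact a server; query_opa would.
req = build_opa_request("http://localhost:8181", "terraform/analysis/authz", {"resource_changes": []})
print(req.get_method(), req.full_url)
```

The calling service only knows how to send JSON and read JSON back; what the rule actually decides lives entirely in OPA.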
This makes it a very flexible tool for evaluating policies. You can now evaluate policies separately from the service that needs the policy decision; in other words, the policies are decoupled from your service. Removing the policies from the service itself gives you finer-grained control over the specifics of each policy. OPA comes paired with a dedicated policy language called Rego, which is purpose-built for defining policies. Rego is also declarative, much like Terraform's HCL, which lets you follow the same GitOps best practices we discussed earlier: storing these policies in Git, versioning them, and deploying them automatically once they are submitted. This is a nice complement: your policy as code sits next to your infrastructure as code and uses the same or similar deployment methods.
With this model, you have full control over the policy development lifecycle. You can update and change your policies separately from the rest of your application and infrastructure. Whenever you need to modify what a policy states, you can roll out the change without restarting services or recreating resources, and the new policy is enforced from that point on.
Now let's bring it all together and talk about how to secure your infrastructure with Open Policy Agent. After you define your manifest files as you normally would, you can add an extra step where developers or operations folks run OPA on the command line to check that their manifest files are valid before shipping them. This gives them a bit more confidence in their changes before they even submit them to a Git repository.
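As an illustration of the kind of guardrail such a local check could enforce, here is a Python sketch of the "deleted all of the tags" scenario from earlier, evaluated over the JSON plan that terraform show -json produces (its resource_changes list, with each change's actions, before, and after values). In the workflow described in this talk the rule would be written in Rego and evaluated by OPA; the Python version, the function name, and the example plan fragment are hypothetical and only show the logic.

```python
import json
from typing import Any, Dict, List


def find_tag_removals(plan: Dict[str, Any]) -> List[str]:
    """Return addresses of resources whose planned update removes all tags."""
    flagged: List[str] = []
    for resource in plan.get("resource_changes", []):
        change = resource.get("change", {})
        # This sketch only inspects in-place updates of existing resources.
        if "update" not in change.get("actions", []):
            continue
        # before/after may be null (None) in the JSON plan, so default to {}.
        before_tags = (change.get("before") or {}).get("tags") or {}
        after_tags = (change.get("after") or {}).get("tags") or {}
        if before_tags and not after_tags:
            flagged.append(resource["address"])
    return flagged


example_plan = json.loads("""
{
  "resource_changes": [
    {"address": "aws_instance.web[0]",
     "change": {"actions": ["update"],
                "before": {"tags": {"team": "sre"}},
                "after": {"tags": {}}}},
    {"address": "aws_instance.web[1]",
     "change": {"actions": ["update"],
                "before": {"tags": {"team": "sre"}},
                "after": {"tags": {"team": "sre", "env": "prod"}}}},
    {"address": "aws_instance.new",
     "change": {"actions": ["create"], "before": null, "after": {"tags": null}}}
  ]
}
""")

print(find_tag_removals(example_plan))
```

Only aws_instance.web[0] is flagged: its update drops every tag, while the other update keeps its tags and the new instance is a create rather than an update.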
Then you submit these manifest files in a PR or a commit. OPA now becomes part of your automated testing suite, and this is the crux of your policy enforcement: it is where OPA catches any resources that don't meet your organization's policy requirements before they are actually deployed to the underlying cloud services or hardware.
With that, let's do a quick demo where I show how to authorize changes in a Terraform manifest file using OPA. In this demo we have three files. The first file is a Terraform manifest file. We're using the Amazon EC2 module to create EC2 instances, and we have two test cases. In the first test case we create 16 instances; in the second, we create three. Based on the policy we have, the first one will fail and the second one will pass. These are just standard AMI configurations.
Now let's pop over to our Rego policy. In the Rego policy, we've set a blast radius of 30. This means we give weighted values to the different actions Terraform can perform: a delete action is weighted at 10, a create action at 2, and a modify action at 1. Remember that our first test case creates 16 EC2 instances, which scores 32, just above our designated blast radius, so it will fail the workflow. Underneath that, we can see the actual policy. By default, we say authorization is false. We use the name authz here, but you can set this to anything: it's just the name of a rule, not a keyword defined by Rego. You could call it accept, deny, or anything that works for your policy.
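The weighting just described can be sketched in Python to check the arithmetic: 16 creates at weight 2 give 32, which exceeds the blast radius of 30, while 3 creates give 6. This mirrors the logic only; the demo's actual policy is written in Rego. Reading actions from each entry's change.actions list follows Terraform's JSON plan format, while the strict less-than comparison, counting a replacement as both a delete and a create, and the helper names are assumptions of this sketch.

```python
from collections import Counter
from typing import Any, Dict

# Values as described in the talk: blast radius 30; delete = 10, create = 2, modify = 1.
BLAST_RADIUS = 30
WEIGHTS = {"delete": 10, "create": 2, "modify": 1}


def count_actions(plan: Dict[str, Any]) -> Counter:
    """Count planned deletes, creates and modifies in a Terraform JSON plan."""
    counts: Counter = Counter()
    for resource in plan.get("resource_changes", []):
        actions = resource["change"]["actions"]
        # A replacement (["delete", "create"]) counts toward both actions.
        if "delete" in actions:
            counts["delete"] += 1
        if "create" in actions:
            counts["create"] += 1
        # Terraform calls an in-place modification "update".
        if "update" in actions:
            counts["modify"] += 1
    return counts


def score(plan: Dict[str, Any]) -> int:
    """Weighted sum of the planned actions."""
    counts = count_actions(plan)
    return sum(weight * counts[action] for action, weight in WEIGHTS.items())


def authz(plan: Dict[str, Any]) -> bool:
    """Authorize only when the weighted score stays below the blast radius."""
    return score(plan) < BLAST_RADIUS


def plan_creating(n: int) -> Dict[str, Any]:
    """Hypothetical helper: a plan fragment that creates n EC2 instances."""
    return {"resource_changes": [
        {"address": f"aws_instance.this[{i}]", "type": "aws_instance",
         "change": {"actions": ["create"]}}
        for i in range(n)
    ]}


for n in (16, 3):
    plan = plan_creating(n)
    print(n, "instances -> score", score(plan), "authorized:", authz(plan))
```

The loop prints a score of 32 (not authorized) for 16 instances and 6 (authorized) for 3, matching the two test cases in the demo.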
Underneath that, we see the various rules that implement the actual scoring system. Our last file is the GitHub Actions file, which runs once we push the code to GitHub. It checks out the code, installs Terraform, installs OPA, and then runs terraform fmt, init, validate, and plan. It then converts the Terraform plan, which comes out as a binary file, into a JSON file for OPA, and finally gives that JSON file to OPA to evaluate.
Let's look at our Terraform manifest file one more time. Say we are a good dev and want to check this ourselves beforehand. We run terraform plan to create the binary file and convert the binary file to JSON, and we see that we are creating those 16 resources. Now let's get the score for that blast radius. We run opa eval against the Terraform plan, using the Rego policy we defined in the policy folder (the Rego file we just looked at), and query the rule named score. That gives us the actual value of 32, showing that the weighted value of this change is 32, which we know will fail the evaluation.
Now let's say we did not run this check, didn't know it was going to create this many resources, and submit it anyway, with the commit message "blast radius 32". Send that off, and let's go over to our browser. There we see the "blast radius 32" run pop up. It takes a little time for GitHub to set up the container, install Terraform, install OPA, run the Terraform formatting commands, create the plan as we described, and convert it to JSON. Once the plan is evaluated by OPA for authorization, we can see that the step exited with a status code of one.
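The workflow steps described here can be sketched as a small Python driver, for example to run the same gate locally or from a CI step. The terraform and opa commands and flags shown are standard CLI usage, but the file names, the policy/ directory, the data.terraform.analysis.authz rule path, and treating anything other than a raw true result as a denial are assumptions; the talk's actual GitHub Actions file may differ.

```python
import subprocess
import sys
from typing import List

PLAN_BINARY = "tfplan.binary"
PLAN_JSON = "tfplan.json"
POLICY_DIR = "policy/"                        # assumed location of the Rego files
DECISION = "data.terraform.analysis.authz"    # hypothetical package and rule name

# Format check, init, validate and plan, in the order the workflow runs them.
TERRAFORM_STEPS: List[List[str]] = [
    ["terraform", "fmt", "-check"],
    ["terraform", "init", "-input=false"],
    ["terraform", "validate"],
    ["terraform", "plan", "-input=false", f"-out={PLAN_BINARY}"],
]


def opa_command() -> List[str]:
    """opa eval over the JSON plan; raw format prints just the decision value."""
    return ["opa", "eval", "--data", POLICY_DIR, "--input", PLAN_JSON,
            "--format", "raw", DECISION]


def is_authorized(opa_stdout: str) -> bool:
    # Anything other than a literal true (false, empty output, ...) is a denial.
    return opa_stdout.strip() == "true"


def main() -> int:
    for step in TERRAFORM_STEPS:
        subprocess.run(step, check=True)  # a failing step raises and stops the job
    # Convert the binary plan into JSON that OPA can read as input.
    shown = subprocess.run(["terraform", "show", "-json", PLAN_BINARY],
                           check=True, capture_output=True, text=True)
    with open(PLAN_JSON, "w", encoding="utf-8") as handle:
        handle.write(shown.stdout)
    decision = subprocess.run(opa_command(), check=True,
                              capture_output=True, text=True)
    if is_authorized(decision.stdout):
        print("authorized")
        return 0
    print("not authorized", file=sys.stderr)
    return 1  # a non-zero exit code fails the CI job
```

In a CI step you would call main() and pass its return value to sys.exit, so a denied plan ends the job with status 1, the same outcome as the failing run in the demo.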
So let's go back and modify the manifest file to create just three resources. Back in our code editor, let's comment out the 16-instance test case. Let's check this one more time locally: create the same output file and get the score. The score is now six, as expected. Then let's run the local eval and make sure it isn't returning anything. Everything is as expected, so let's submit it with the message "blast radius 6". Back in the browser, we see it spinning up a new container, so we have to wait a little bit, as this happens every single time: the same checks, installing Terraform, installing OPA, running the format commands, and so on down to terraform validate and terraform plan. We see it output the plan for only these three resources, and then it's authorized: the job completed and everything is green. Now we can hand this off to our continuous delivery or continuous deployment system to finish creating the resources. Cool.
With that, I'll end the demo and go back to the slideshow. In the slideshow, you'll see a couple of links to help you get started with OPA: the opa exec command, and another tool you could use, Conftest, which is built on OPA and Rego and does the same sort of manifest validation. We also have an integration with AWS CloudFormation Hooks if you are an AWS shop. On the right side, you can see resources for the Styra Academy, the OPA docs, and Styra DAS Free, to get some hands-on experience with OPA through a UI. Lastly, there's a link to this demo. So with that, thank you for joining, and I look forward to connecting with everyone. I hope you have a great rest of your conference.
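For the opa exec route mentioned in the closing slides, a CI step can read the JSON report OPA prints and fail if any input file was not authorized. The sketch below assumes an invocation like opa exec --decision terraform/analysis/authz --bundle policy/ tfplan.json and the output shape shown in OPA's Terraform tutorial (a "result" list with one "path" and "result" entry per input file). The decision path and function name are hypothetical, and the output format should be confirmed against your OPA version.

```python
import json
from typing import List


def unauthorized_paths(exec_output: str) -> List[str]:
    """Return the input files whose decision was not exactly true."""
    report = json.loads(exec_output)
    return [
        entry.get("path", "<unknown>")
        for entry in report.get("result", [])
        # An entry without a "result" value is treated as denied.
        if entry.get("result") is not True
    ]


sample_output = """
{
  "result": [
    {"path": "tfplan.json", "result": false}
  ]
}
"""

print(unauthorized_paths(sample_output))
```

Failing closed on a missing result keeps a misconfigured decision path from silently approving a plan.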

Peter ONeill

Community Advocate @ Styra
