Conf42 DevSecOps 2022 - Online

How to Contain Security in your Containers 


Developers must think about security for their containers before exposing them to the rest of the project. Adrian Gonzalez discusses how container security covers image definition, scanning images, storing images, and using CI/CD pipelines whose automatic scans allow for automation of container security.


  • Adrian Gonzalez is a principal software engineering lead at Microsoft. Today we're going to cover the four phases that I've experienced when working with security around containers. In this section we'll cover four topics and outline some examples.
  • Make sure that you're running the latest versions, or relatively recent versions, of the OS as well as the container environment. The next best practice is ensuring the use of non-root users. Make sure that when defining images, they are as lean as possible.
  • A private registry is nothing more than just a regular public registry, segmented by having stronger network security policies in place. Another important concept when working with registries is affiliating digital signatures with the registry.
  • Another best practice is around identity and access management (IAM) and role-based access management. Always consider minimizing highly privileged accounts and only grant the permissions that are required for each account: minimal privileges.
  • The next phase is around container DevSecOps operations. Consider creating separate account keys for different teams or individuals. The most highly privileged key is CI agent number one. It may be useful for a DevSecOps team to fully automate the provisioning and management of future repositories.
  • Next is talking about the CI stages involving containers and devsecops. Step two is running tests. Think of this as unit tests for your container. Everything is based on a container code change or code commit.
  • Tox is a virtual environment management and test CLI that relies on pytest. The next step is around scanning for vulnerabilities at the container level. In my experience, it's worthwhile to publish even the images that failed previous tests.
  • The next phase, and last phase, in our cycle is around best practices when securing the production environment that uses containers. There are three pieces to securing that environment: hardening, vulnerability assessment, and runtime threat protection for nodes.


This transcript was autogenerated.
Hello. Good morning, good afternoon, good evening. Thank you for joining today's talk on containing security when working with containers. My name is Adrian Gonzalez and I'm a principal software engineering lead here at Microsoft. As part of my role, I am accountable for ensuring that security is part of my team's engineering fundamentals and incorporated as part of all outputs, whether it's working with a customer or working on a product. Outside of work, I enjoy experiencing new cuisines, going on wine tastings, traveling the world, most outdoor activities, and playing and watching baseball. Throughout the presentation today we're going to cover the four phases that I've experienced when working with security around containers. First, we're going to explore the top quadrant here, which is around creating and updating container images. In this section we're going to cover four topics and outline some examples, as well as additional details around some of the best practices when creating and updating container images. The first practice I want to talk about is making sure that you're running the latest versions, or relatively recent versions, of the OS as well as the container environment. In this image and in the future slides, I'm going to be using Docker as an example. The steps and the syntax to perform this will vary based on your operating system and tool of choice, but typical steps include things like uninstalling old versions, updating any packages that are required to perform the commands on the CLI, adding GPG keys to ensure security when downloading dependencies, setting up repositories to determine where the downloads will take place, and then ultimately determining what version of either the OS or the container framework you're using. So in this case it would be a particular version of Docker Engine, containerd, and Docker Compose.
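The install flow described above can be sketched roughly like this. Note the distribution is my own assumption (Ubuntu with apt; the talk doesn't name one), and the exact package names follow Docker's apt-based instructions:

```shell
# Sketch only: assumes an Ubuntu host; adapt package names and
# repository URLs for your own distribution.

# 1. Uninstall old versions
sudo apt-get remove -y docker docker-engine docker.io containerd runc

# 2. Update packages required to run the install commands
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg

# 3. Add Docker's GPG key so downloaded dependencies are verified
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
  sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

# 4. Set up the repository the downloads will come from
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# 5. List available versions, then pin the specific one you want
sudo apt-get update
apt-cache madison docker-ce
sudo apt-get install -y docker-ce=<VERSION> docker-ce-cli=<VERSION> \
  containerd.io docker-compose-plugin
```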
And in the image here you see how I outline the command that I use to determine the recent versions of Docker, as well as what version I want to use for my dependencies. The next best practice is ensuring the use of non-root users, and in this particular case you'll see examples, both in the top left and top right images, where I define a particular username, give it a user id, and create a particular group for that user to be associated with. You'll see a command highlighted there, and I just want to share that, in my experience, it's okay to have certain non-root users be able to perform sudo operations, but you do want to limit how many users can, and also definitely segment the users, potentially as separate groups as well. The last line in the top left image is the effective piece, ensuring that when the container runs, it's using that username and not the root user. The bottom right image is effectively the same structure, but using a real example where I needed to download a Golang base image from Docker Hub, and then I had to go back and use the root user to install some dependencies. In this case, I removed most of those dependencies and just showed the run command where I would update dependencies. Once I finished that custom tailoring of my image, I completed the Dockerfile by switching back to that non-root user. Another best practice is making sure to use trusted sources or trusted registries. Here in this image I show the relationship between a registry, an image, and a container, where a registry is effectively nothing more than a repository of images, images are static representations of running containers, and a container is the actual instance that is running. And you'll see that when working with Docker.
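As a rough sketch of the non-root pattern described above (the username, IDs, and base image here are my own illustrative choices, not from the talk's slides):

```dockerfile
# Sketch of the non-root user pattern; names and IDs are illustrative.
FROM golang:1.19-alpine

# Root is still needed briefly to install or update dependencies
USER root
RUN apk update && apk upgrade

# Create a dedicated group and user with fixed IDs
RUN addgroup -g 1001 appgroup && \
    adduser -D -u 1001 -G appgroup appuser

# Switch back so the container never runs as root
USER appuser
```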
It follows that same convention when you see indications of the FROM command, where the first piece of that FROM command is a registry, followed by a slash, followed by the repository name as well as the image name, and then followed by a colon and the specific version of that image. Now, when using trusted sources and registries, know that they can be both on-prem as well as public cloud through other providers, whether that be Azure, AWS, Google, some other cloud, or JFrog Artifactory. Another best practice is making sure that when defining images, they are as lean as possible. That typically translates to only installing the dependencies that the container requires to perform its function. But it also includes another mindset that I outline in the top left image, which is separating build versus runtime dependencies. In this image we want to create some kind of a Node application that is a web app. The first part that I highlight here is all the steps required to build that solution, and that's performed by running that yarn run build command. And of course I had to install all the Node NPM packages to do that. Depending on the solution, those can get quite hefty and include a lot of dependencies. Now, the second piece that I highlight is a totally different Docker base image called nginx alpine. Here I'm copying only the outputs of the build command in the previous step and storing them in this new base image. And now I no longer have to worry about all the other NPM packages that were required to actually build the solution. My image is now leaner, has fewer dependencies, and is therefore less prone to vulnerabilities. Also, in the bottom right example is another extension of how I was able to extend a Golang image and a Python-based image.
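The build-vs-runtime split described above looks roughly like this as a multi-stage Dockerfile (image tags and paths are my own illustrative assumptions):

```dockerfile
# Sketch of the build/runtime separation; tags and paths are illustrative.

# --- Build stage: all the heavy NPM/yarn dependencies live here ---
FROM node:18-alpine AS build
WORKDIR /app
COPY package.json yarn.lock ./
RUN yarn install
COPY . .
RUN yarn run build

# --- Runtime stage: only the built output is carried over ---
FROM nginx:alpine
COPY --from=build /app/build /usr/share/nginx/html
```

Only the second stage ships; everything installed in the build stage is left behind, which is what makes the final image leaner and less exposed to vulnerabilities.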
And again, I removed all the custom dependencies I had to install, to keep this brief. But you see how in line ten I start using the Golang image, I download it, and then in line twelve I download the Python image. I would perform my custom dependencies and install them there. And then in line 18, all I do is copy everything from the Golang base image into my current working image, and that is what I would ultimately produce. The next phase in our cycle here is storing the container image in a registry, and I want to talk about a few concepts as outlined in this slide. First up is considering what a private registry is and why to use one. Well, really a private registry is nothing more than just a regular public registry, but it is segmented by having stronger network security policies in place. These can be things like firewalls, source IP range policies, or port policies. The importance here is security by not even knowing that a registry exists, and only granting individuals, or in this case networks, the ability to connect to your registry on a need-to-know basis. Another important concept when working with registries and securing a registry is affiliating a concept called digital signatures with that registry. I provide a couple of definitions of what that is here. To keep it brief, a digital signature really is nothing more than what it sounds like: a signature that creates trust and a chain of custody for any image, or any version of an image, that is available for you to consume. It gives you that additional sense of confidence that the image was produced by whoever said they produced it, and gives you that sense of accountability, so you know who to hold responsible and reach out to if there were any issues with that image. In the three images I have here, I show how I use Docker to first create a digital signature key: docker trust key generate. I'm not Jeff, but I would replace that with my name. In the second image, I add that key that was just generated.
And it's important to note that when generating the key, I do have to provide a password to be able to generate the private key. But once I set my password and I have a private key, I would affiliate that key with the registry. That's what the middle image command is performing, and you can see it's saying that it's adding Jeff to the registry, and it is prompting for that password so that it can successfully do so. In the bottom image, we're effectively running the command to publish that image to the registry, and you can see the command: docker trust sign, the name of the registry, the repository called admin, the image name called demo, and its version one. And again, I'm going to be asked for my password. Once I enter my password, that published Docker image in that registry will be digitally signed by myself, and any consumer can see that I was the one who signed it at a particular point in time. Another best practice is around identity and access management (IAM) and role-based access management. To start, let's define two important terms here. A role is nothing more than a unique id, a set of permissions, and the asset or assets those permissions are being granted to for that role. An account is a set of id and roles, and that account I'm visually representing as keys in this image. In the bottom left image you can see how the service Azure Container Registry, which is Azure's registry solution, has a total of six or seven different roles, and each role has different sets of permissions. Now, one best practice to consider here: always consider minimizing highly privileged accounts and only grant the permissions that are required for each account, minimal privileges. And so in this example, the least privileged would be a key to my image that may only contain, say, the ability to download that image. That would be the AcrPull role.
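The three commands described above look roughly like this; the registry, repository, and signer names are illustrative placeholders, and each command prompts for the key passphrase:

```shell
# Sketch of the Docker Content Trust signing flow; names are illustrative.

# 1. Generate a signing key pair for the signer
docker trust key generate jeff

# 2. Add the signer (with their public key) to the repository
docker trust signer add --key jeff.pub jeff registry.example.com/admin/demo

# 3. Sign and publish a specific image version
docker trust sign registry.example.com/admin/demo:1
```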
Another, more highly elevated privileged key would be the one on the far right, which is assigned to the my-helm-chart and the base-node repository. And that one, let's say for the sake of argument, is given the contributor role, so that whoever has that key can then perform all of the permissions and operations that are outlined in the left image. And the most highly privileged key is the one that is affiliated at the registry level. Even if it only grants the ability to, say, pull images for all repositories within that registry, I still consider that a highly privileged account, and it should be highly secured and restricted in terms of who can use it. The next phase is around container DevSecOps operations. And again, we're going to cover a few topics here, from ensuring that the CI agents have access to the registry, to what container scanning is, to the various CI stages. So first, when working with CI pipelines and container registries, it's important to make sure that your CI solution, which in this image is Azure DevOps, has the ability to connect via the network to that container registry. That may involve making some security network changes on the firewall or security policies. Second is to consider creating separate account keys for the different teams or individuals that will be using the CI pipeline platform solution. The same mindset as before applies, where we want to be as granular as possible. The most common account key I create, in my experience, is CI agent number three. So for each account, only have maybe one, or the minimal number of, repositories, and only grant that account AcrPull permissions. A little more elevated might be the CI agent number two key, which in this case visually has two repositories that grant that keyholder contributor and AcrImageSigner permissions. This is great for a pipeline that will be doing pushes to those repositories.
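Granting those least-privilege roles can be sketched with the Azure CLI like this (the subscription, resource group, registry, and principal IDs are placeholders; Azure RBAC role assignments on ACR apply at registry scope):

```shell
# Sketch: assign registry roles to CI service principals; IDs are placeholders.

# Least privilege: an agent that can only pull images
az role assignment create \
  --assignee <ci-agent-3-principal-id> \
  --role AcrPull \
  --scope /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.ContainerRegistry/registries/<registry>

# More privileged: an agent that can also sign the images it pushes
az role assignment create \
  --assignee <ci-agent-2-principal-id> \
  --role AcrImageSigner \
  --scope /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.ContainerRegistry/registries/<registry>
```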
That way the pipeline can digitally sign those images, and consumers know which pipelines, or which team that uses that pipeline, produced said images. And again, the most highly privileged key is CI agent number one. This would be very limited, especially at a CI level. So definitely be very cautious about having a CI agent hold this account key, because as owner it has complete control over that entire registry and all its underlying repositories. But it may be useful for a DevSecOps team to fully automate the provisioning and management of future repositories. Next is talking about the CI stages involving containers and DevSecOps. We're going to go through each step here shortly. But just like all CI pipelines, everything is based on a container code change or code commit, specifically those that pertain to the container definition, like the image or Dockerfile in this case. Step one: build. Don't worry about the syntax from here on out. This is Azure DevOps, and I want to make sure we focus more on the actual Docker commands or the tools that I'm going to be showing you all. So in this case the build step is relatively simple. It's just running docker build with a particular tag, using the image name variable that would be passed in as part of the CI pipeline, putting in all the build arguments as part of the docker build process, and then a parameter called dockerfile name that tells Docker which Dockerfile to look at and build a container image from. Step two is running tests. Think of this as unit tests for your container. The first thing I highlight here is a CLI command that you may not be familiar with, called tox, and then the rest of that command: the testinfra environment and a make target environment. That's basically just a way to distinguish whether the container is suited for dev or all the way up to production, and then the parameter for image name as before.
Before we get into what tox is, I also want to talk about the second task in this image, which is only triggered when the previous task succeeds. And if it does succeed, you can see that what we do is effectively an echo. In this particular case, what we're really doing is setting a variable called test passed and giving it the value of true, which we will be using in further steps of the CI pipeline. So what is tox? Tox is a virtual environment management and test CLI that relies on pytest, which is a Python package, to effectively run Python code that is comprised of methods that are test cases. And each test case has assertions. Let's show an example of what one of those looks like. Here is one very rudimentary example of what a file will look like. And you can see, like I described, each method starts with the def syntax, and then we pass in variables like host and file content, and inside the method we perform the actual assertions. So the first method tests to see that certain files exist in the running container instance. The second test, test container running, checks to see if the user that has an active session in that container is root. Note, we just talked about wanting to run containers as non-root. So I would argue that this assertion should be changed to say process user is not equal to root, so we give more freedom in our test case assertions that other users are allowed but root is not. Other assertions include things like testing certain properties of the host system, checking environment variables that are set, ports that are exposed, or sockets that the container may be listening on. Again, this is just a place to get started. I definitely encourage folks to look into additional assertions that make sense to ensure the container is properly configured and defined. The next step is around scanning for vulnerabilities at the container level.
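A test file in that style might look like the following sketch. It assumes the pytest-testinfra plugin (which provides the `host` fixture), and the file path, user check, and port are my own illustrative choices:

```python
# Sketch of a pytest-testinfra test file; requires the pytest-testinfra
# plugin, and the specific paths/ports here are illustrative assumptions.

def test_config_file_exists(host):
    """The files the container needs are present."""
    assert host.file("/app/config.yaml").exists

def test_container_not_running_as_root(host):
    """Per the non-root best practice: any user is fine except root."""
    process = host.process.get(pid=1)
    assert process.user != "root"

def test_expected_port_listening(host):
    """The service is listening on its expected port."""
    assert host.socket("tcp://0.0.0.0:8080").is_listening
```

Note the second test asserts `!= "root"` rather than a specific username, matching the point above about allowing any non-root user.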
In this example I use a software solution called Trivy, and you can see the first step is for me to download and install Trivy. All I'm doing here is making an HTTP request to download the .deb package and ensure that Trivy got successfully installed. The second task is where things get interesting. I'm running two scans using Trivy. The first scan, in the first portion of the line there in the script, is effectively telling Trivy to pass the pipeline even if it finds vulnerabilities with severity low and medium when targeting a particular image repository and a particular image tag version. The second scan, however, will fail the pipeline if it detects any severities at the high or critical level. Again, this is subject to risk tolerance based on the industry, the team, and the particular solution under development. But I would definitely encourage individuals to err on the side of caution and ensure that there are no high nor critical vulnerabilities in the container dependencies. Here's an example of what a vulnerability report from Trivy looks like. I've highlighted the key things to keep an eye on. You'll see the total number of vulnerabilities and their classifications, and then at the bottom you see a table with really rich information around the dependency or library, the specific vulnerability id, its severity, the installed version it was found in, and then Trivy actually goes and searches its database to see what the fixed version is, where that vulnerability is no longer in place, empowering you and the team to decide how to fix the vulnerabilities. Here are more example scanning tools that I encourage you to look into, including Aqua, SonarQube, and WhiteSource. The next step is around versioning and publishing the image. Now, I'm going to break this down by first saying there are two parts to this. In my experience, it's worthwhile to publish even the images that failed previous tests.
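The two-pass scan described above can be sketched with the Trivy CLI like this (the image name and tag are placeholders; `--exit-code` controls whether findings fail the pipeline):

```shell
# Sketch of the two-pass Trivy scan; image name and tag are illustrative.

# Pass 1: report low/medium findings but do not fail the pipeline
trivy image --severity LOW,MEDIUM --exit-code 0 myregistry.example.com/myrepo/myimage:1.0.0

# Pass 2: return a non-zero exit (failing the pipeline) on high/critical findings
trivy image --severity HIGH,CRITICAL --exit-code 1 myregistry.example.com/myrepo/myimage:1.0.0
```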
And the reason for that is it makes it easier for a subset of consumers to download those failed images, troubleshoot them, patch them, fix them, and then publish the code changes that fix them to the main repository. Now, the caveat here, just like how we talked about many different keys, is that there needs to be a different key that grants access to a unique repository in that registry, named with some form of a failed suffix or indicator. That's what we're doing in the first highlighted section of this image. On the test passed variable value being set to false, which would happen if the vulnerability scan or the tox tests failed, we run a script to create a tag for that image and append the failed value at the very end of it, as seen near the top of the image. After we've tagged the failed image appropriately, we then publish that image. The condition here in the Azure DevOps syntax is effectively the same as before, and all we're doing is pushing the image to the proper repository with the proper suffix. Now, what's key here, and I just wanted to call it out for you, is the CI credentials being used to authenticate the CI pipeline with the registry; in Azure DevOps, that's found using this parameter called service connection. That's specific to Azure DevOps, but I just want to make sure that we are still doing this securely for all these examples. The continuation is to also publish the happy path. If an image passed all the way through, we want to make sure that we tag that image appropriately, and you can see that taking place in the first task at the top, where we're using a value called latest to give that version the latest label. The middle task is effectively pushing the Docker image, but using a parameter image tag instead of the value latest.
Here you can decide to use conventions such as major.minor.patch, or use a convention that maps the CI build guid or job id to that image as well. I've seen both options work pretty well in my experience. The bottom task publishes the same image, but now publishes it using the latest tag, or the latest version label. Now, I caution about using this, because while it offers more convenience, it also gives you less control over how large the impact is if that Docker image did in fact contain a vulnerability that sneaked through and was missed. If latest is available, consumers will typically opt for convenience and use latest, and you may find that a far wider number of users may be impacted if you were to push a vulnerability to latest versus pushing it to a very specific version, as is done in the middle task. The next phase, and last phase, in our cycle is around best practices when securing the production environment that uses containers. The first best practice I want to share is the concept of network segmentation. Specifically, in your own time, I encourage you to read up on a concept called nano-segmentation when working with containers. Just like any other infrastructure, a container can be pretty segmented and locked down, with security policies that limit who can connect to it as well as what other infrastructure that component can connect to. So with containers we're going to do the same thing. We want to wrap containers within a subnet, or even be more nano about it and wrap individual containers within multiple subnets, so that they're further segmented, with pretty strict policies in place to limit what can connect to them and what the container subnets can connect to. Again, this is great for minimizing the potential impact radius if there was in fact a vulnerability that was exploited within that infrastructure or that container.
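The tag-and-push steps described above can be sketched in Azure DevOps YAML like this; the task inputs are real `Docker@2` inputs, but the variable names are illustrative assumptions:

```yaml
# Sketch of the versioned push plus optional 'latest' push; variable
# names ($(serviceConnection), $(imageName), $(imageTag)) are illustrative.
- task: Docker@2
  displayName: Push image with an explicit version tag
  inputs:
    command: push
    containerRegistry: $(serviceConnection)   # CI credentials to the registry
    repository: $(imageName)
    tags: $(imageTag)                         # e.g. 1.4.2 or the CI build id

- task: Docker@2
  displayName: Push the same image as 'latest' (use with caution)
  condition: succeeded()
  inputs:
    command: push
    containerRegistry: $(serviceConnection)
    repository: $(imageName)
    tags: latest
```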
Next up is a great preventative measure against denial of service or a depletion of container resources, and that is setting resource quotas. By default, containers have no resource constraints. So if a running container was hijacked and for some reason started really consuming all the CPU, memory, or other infrastructure resources, it could deplete all of them. The example I show here is how we can use Kubernetes. The same can be done with Docker as well, but in my example I show how I do it with Kubernetes to limit, at either the namespace or the container level, the default amount of CPU and memory allocated to each container, and the maximum that can be granted to that container. Another best practice is around continuous container monitoring. There are three pieces to that: environment hardening, vulnerability assessment, and runtime threat protection for nodes and clusters. Any solution that performs container monitoring, such as Microsoft Defender for Containers, should provide these three capabilities. Environment hardening checks to see if there are any misconfigurations, or configurations that are not secure. For example, if there are no resource quotas, Microsoft Defender would flag that as a vulnerability in its continuous monitoring. Vulnerability assessment performs the same thing we did earlier in our CI pipeline and scans for vulnerabilities in container image dependencies. But why would we need to do that again, and continuously? Well, the reason is that vulnerabilities can come up at any point in time in the future. Not all vulnerabilities are known from the get-go. So you have images that passed vulnerability scans, and now they're in the registry, and they have running instances or solutions built from those images.
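The per-container defaults and caps described above can be expressed with a Kubernetes LimitRange, roughly like this (namespace name and resource values are my own illustrative choices):

```yaml
# Sketch of namespace-level container resource defaults and caps;
# the namespace and the specific values are illustrative.
apiVersion: v1
kind: LimitRange
metadata:
  name: container-resource-limits
  namespace: my-app
spec:
  limits:
  - type: Container
    default:            # limit applied when a container specifies none
      cpu: "500m"
      memory: "256Mi"
    defaultRequest:     # request applied when a container specifies none
      cpu: "250m"
      memory: "128Mi"
    max:                # the most any single container may be granted
      cpu: "1"
      memory: "512Mi"
```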
You want the ability for a platform to continuously run vulnerability scans and map any vulnerabilities to actively running containers, so that you as a team can determine how to best mitigate and minimize the chances of security issues. And then the last piece is runtime threat protection, which will scan the behavior of each running container and raise any anomalies, whether it's the container doing a highly privileged operation like user management at the cloud level or at the Active Directory level, or the container performing a highly privileged operation against some other core piece of infrastructure that it typically has not done before. Any deviations in behavior would also be flagged. In this slide, I encourage you to look up what Azure's container protection tooling offers and what it checks against, and the particular link that you'll be able to look at in your own time is the Center for Threat-Informed Defense teaming up with Microsoft to build out the ATT&CK container matrix, which outlines all of the different checks that are performed by these tools as part of runtime threat detection. Here I wanted to provide a sample vulnerability assessment provided by the Azure Defender for Containers solution, and you'll see, as I highlighted here, it surfaces certain infrastructure misconfigurations. It surfaces things like active container images with running instances that have vulnerabilities installed, and it also checks Kubernetes to see that it has certain Azure policies enabled for further protection. A couple of resources I also want to share here: the top right, or top left, QR code is Microsoft's commercial software engineering playbook. As it states in the slide, this is a collection of fundamentals, frameworks, general best practices, and examples that both myself and many other individuals have contributed to over the years.
It's open sourced, so we continue updating it as new or better practices are surfaced. And the bottom left, or bottom right, QR code is our open source for dev containers. I really like to share this out because it offers a great starting place for what good-practice, well-defined Docker images look like. Dev containers are a little more specific in nature, as they allow VS Code to run within a containerized environment, but that's another story for another day. It's a great resource just to look at best practices for Docker container, or container image, definitions. And that wraps up our entire lifecycle. If anything, I really want to share five key takeaways. One: make sure that the entire team has awareness of container DevSecOps practices. It's going to make them feel more bought in, more informed, and educated, versus making it seem like it's just a lot more requirements and work being brought down to the team. Second: enforce RBAC policies to prevent individuals from disabling control gates at the CI pipeline level. This tends to be something that's overlooked, in my experience, and it is a vulnerability, because if a developer or a team is really in a rush, they might want to disable control gates that are there for a good reason. So really limit who can manage those control gates, and limit the individuals that can perform those operations. Third: hold all members of the team accountable for adhering to secure container management, and make sure that they know they can hold each other accountable as well. After all, security is a team effort, and everyone is responsible for raising issues and concerns. Fourth: depending on the level of DevSecOps maturity you are experiencing or working in, there may be a need to influence change.
And like all things that require influencing, it's most effective when done as a community, and when individuals connect the business mission and the business success criteria to these principles of security as well. And last but not least is probably one of my favorites. Decisions are all about ratios between convenience and security, and there is no silver bullet. Everything needs to be custom tailored based on industry, based on solution, based on who the containers are for. But one of the key things I've learned in my experience is that, especially when starting off at the beginning, weigh security heavier, and over time you'll find that it's less costly and easier to shift the balance to find the right ratio between convenience and security. And the reason it's less costly is because if we were to weigh convenience heavier over security at the beginning, that sets up the potential for a vulnerability to be exploited, and for there to be a data breach or some other type of attack. And with that, I'd like to conclude by thanking you all for attending, and wish everyone a continued safe rest of the calendar year. And in case we don't get to touch base later, I wish everyone a happy new year in 2023. Thank you.

Adrian Gonzalez

Principal Software Engineering Lead @ Microsoft

