Conf42 Cloud Native 2023 - Online

How to implement Security as Code?

Abstract

The talk will discuss the implementation of Security as Code in a large-scale DevOps environment, covering benefits of automating security testing and its challenges for larger teams.

Summary

  • Security as code is the practice of integrating security controls and tools into the software development process. In the last year alone, over 25,000 vulnerabilities were published. Combined with the rise of ransomware, that should concern us. Policy as code can help us implement security as code going forward.
  • The development team mostly spends its time on local development, source code, and Git, while the security team focuses on the production environment. The testing tools that work for cloud don't help developers, which is why the feedback process is so slow. To implement security as code properly, you need something that smooths the process from left to right.
  • Security starts with local development and is applied in source code and CI/CD. The next part is package vulnerabilities, from the container perspective but also from the runtime perspective. We need to focus on securing every layer and every phase.
  • A GraphQL-based language can query the whole infrastructure, ranging from Windows, Linux, AWS, and Kubernetes to Terraform. cnspec secures everything from development to production. Open source policies and query packs are available in the Mondoo registry and can be applied right away.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Mondoo, and I'm super excited to talk about security as code at Conf42 Cloud Native 2023. I started my career at Deutsche Telekom, the parent company of T-Mobile, and one of the biggest challenges we had was securing a large amount of critical infrastructure across the whole of Europe. One way to do this was going with full automation. So we had an internal project where we tried to figure out how to automate all those different security requirements in a way that could be applied really easily. A lot of people said, no, this is not going to work, don't try it, it's a waste of time. And we said, challenge accepted, let's just try. How can we implement security in a way that it can be used for production environments on a daily basis? The trick was to make it what we call practical security. We made everything tweakable. It had a good set of defaults, so you could work with it out of the box. But it was focused on production, so it had most of the security features enabled, and that was really successful.
What was missing was the insight: how well am I doing across my fleet? What's the next thing I should start on? Where are the servers that are not fully automated? Getting that visibility into the overall infrastructure is actually quite complicated. So what we started then was the first policy as code engine, called InSpec. We started this project in 2014-15, and we helped companies turn all those long, boring PDF requirements into automated checks so that they could quickly assess a huge amount of infrastructure in a very structured way. That company was quickly acquired by Chef Software, where I led the engineering team for compliance and we made it really big, so that Chef sold InSpec to Fortune 500 companies, the Department of Defense, and so on. So I have a strong background in policy as code, and we'll definitely look into how policy as code can help us with security as code going forward.
So when we think about security as code, what is it? Security as code is the practice of integrating security controls and tools into the software development process. And we will see how we do it. The first question, though, is: why is that important? Why should I care about security as code? It's more effort. I need to do something in my pipeline. I already have my production environment secured. It doesn't make sense, right? Also, hackers are cool, they're going to help us. That's how it used to look. But nowadays it's more like this: we have ransomware everywhere, and that's not individual hackers. Those are ransomware gangs. They try it everywhere, and they behave like a company. They have sales funnels, they have sales playbooks, they have customer support, and they have affiliate programs. So essentially it's an illegal business making a lot of money. And that is easy for them because we have so much vulnerable infrastructure out there.
Okay, so now we have a huge number of hackers, a huge number of criminal gangs trying to attack everything that is connected to the Internet. Then we combine this with the number of CVEs published each year. A CVE is a vulnerability that has been published publicly, and we see a 20% increase year over year. Just in the last year, we had over 25,000 known vulnerabilities. We assume this is just the tip of the iceberg and it could really be a lot more. That, in combination with all the ransomware activity, is something that should concern us. Once a vulnerability is detected, hopefully by a person who reports it properly to the vendor, a CVE number gets assigned.
The vendor hopefully creates a patch very quickly, and once the patch is out, the CVE is published with more details about what went on, so that we can learn from it. Before the patch is out, we call it a zero-day exploit, and after the patch, we call it just an exploit. The interesting fact here is that 25% of all CVEs have known exploits. That's a huge amount. Of those 25%, 90% are available within a month after the vulnerability has been published. So essentially a lot of vulnerabilities are exploitable within 30 days.
This is in contrast to how we roll out fixes. We roll out fixes very, very slowly across the industry. And it's not because we don't want to, it's because it's very complicated. Take the first step, identification. Just imagine: how quickly can you identify where a specific package or misconfiguration is present in your infrastructure? Really complicated. Once we have done this, we generate a report with a lot of red findings, and out of those we generate tickets. Those tickets will then be worked on, hopefully fixed soon, and then after a while the fix trickles down into production. It needs to be tested to confirm everything works in combination. So the rollout is essentially slow. According to studies, it takes 246 days on average just to get a fix for a vulnerability rolled out in our infrastructure. So we have 30 days versus 246 days. It's a huge gap.
If we see this in combination, the issues outpace the fixes, right? We have a yearly increase in vulnerabilities. Hackers go all in on automation, they can scan the Internet in three minutes, and the rollouts are slow. That combination gives the ransomware gangs a really easy pattern for attacking a lot of companies. And we see this in the numbers.
If you ask a little over 1,000 IT and security professionals, 80% of them have been victims of ransomware and more than 60% paid the ransom. That's a huge number, and it's caused by the number of vulnerabilities and our slow response in fixing them. The main problems we identified are, first, that software with an available update has not been patched. The fix isn't unknown, it just hasn't been applied. And second, known misconfigurations have not been avoided in production. It's really those two that make more than 80 or 90% of the attacks possible, and that's totally avoidable. If we just reduce it from 100% to 20%, the attack surface for those ransomware gangs is much, much smaller. They need to do a lot more work, and that may make the business no longer viable. We just need to make it way more complicated so that the number of attacks is reduced as well.
The challenging fact here is that it's not that we don't want to ship fixes quickly, it's how the tooling we use on a day-to-day basis helps individual teams. Look at how software delivery works in general: the platform engineers work in local development, go to source code, put it into Git, then it runs through GitHub Actions, then it goes into pre-production and then prod, hopefully with Terraform. So we see a pipeline trickling through. The development team mostly spends time with the source code, local development, and Git, while the security team focuses completely on the other side, on the production environment. And rightly so: the attack vector is on the production side, so they need to secure the production environment. But you see that the focus is on a completely different end of the whole spectrum, and that leads to issues. We can illustrate that with just one simple example.
Let's look at cloud storage buckets. In most cases we don't want them to be public, so the security team naturally goes to AWS, Google Cloud, or Azure and asks: is that thing configured properly? And if it's not, they say, hey team, I need fixes here, please roll this out. The engineering team, meanwhile, thinks in Terraform; they automate things. So the language in which we communicate is different. The testing tools that work for cloud don't help developers, and that's why the feedback process is really slow. We essentially need to deploy to pre-production or production before the tooling that security uses kicks in, and only then do findings get back to local development. That means we have to deploy vulnerable software in order to detect the vulnerable software so that we can fix the vulnerable software, which doesn't make any sense. We expose ourselves to the outside world without any need, just because the security tooling is not up to the task and prevents teams from getting feedback early.
Wouldn't it be nice if the team already saw in their local Terraform configuration that, hey, this is not right, you should really do this differently? And in the pipeline: if your Terraform HCL is wrong, it blocks the pipeline. If Kubernetes manifests are not configured properly, it blocks the pipeline. We do have those individual tools, but they don't really help with the communication, because as a company you agree on one rule set. Even for the bucket, as you've seen, the language is different. We can check that individually, but what if we need to make an exception? That is where things need to align, right? Otherwise you can have tooling here and tooling there. But in order to implement security as code properly, you need something that smooths the process going from left to right and aligns you on one common knowledge. And that is what Terraform has done for infrastructure.
You develop locally, you have a state file and a plan file, and then you push it to production. You can use multiple environments, and it makes it really easy for platform teams to go from local to prod. We don't see this in the industry for security right now, so it's really difficult for them and it leads to a massive amount of frustration. As I said, the platform team says, hey, you should tell me how to do it in Terraform. It doesn't help me if you say I need to configure this in the dashboard this way, because I automate my software. The security team goes the other way. They say, hey, what's wrong? I tell you all this stuff again and again, and they don't see any progress. It drives people crazy. And I just want to say: it's not that we as humans don't want to work together. It's the tooling that drives the complexity, so that we cannot work together very effectively. In the end we go to management and say, hey, security is blocking my pipeline. Or security says, oh, engineering has done it wrong again, so we can't deploy it. There's always this fight between the different teams just because the tooling doesn't fit together. And it's something that we need to figure out.
So let's think about the solution. If we think holistically, we first of all need to think about the whole tech stack. When we as security people look at our tech, we have to secure the whole thing: we need to secure the cloud environment, our Kubernetes cluster, the cluster configuration, and everything that runs inside the cluster, from workloads to application containers. Having that unified view helps you prioritize the risk and focus on the right thing. But that's not enough. We also need to look into the pipeline, because from a security perspective we have seen many, many supply chain attacks.
And you need to move security to the left side. Security starts with local development and is applied in source code and CI/CD, and that's where we see all the anchors for security as code. It starts in local development, it goes into Git, it gets checked in GitHub Actions, everything is secure there, and then it deploys into production. It's very important that you do this at every individual step. Even if everything is checked in local development, let's assume someone carries out a supply chain attack. They can manipulate all the things that have been checked before they run in production. That means even if it looks good in Git, it can be manipulated during deployment and then applied completely differently in production. I think it's great in Git, and it is correct there, but it's still wrong in production. So no matter where you are, someone can always start manipulating it. We really need to focus on securing every layer and every phase.
We already touched on a few things we need in order to apply security as code in a structured way. You start with static and dynamic testing. You want to check Terraform and Kubernetes manifests in the local development phase. You also want to check them in CI/CD to always make sure the way we define our infrastructure is up to the task and meets the best practices. The next part is package vulnerabilities, from the container perspective but also from the runtime perspective. Every VM that we run, every laptop that we run, they all need to be updated and up to the task. And as we discussed, all the runtime infrastructure needs to be checked continuously. Even if you check the container in your pipeline, once it's deployed in production and you don't update it regularly, new vulnerabilities come up, and then boom, one week after you deployed it a new vulnerability has already popped up.
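The point about deployed software going stale can be illustrated with a minimal Python sketch. It assumes a hypothetical package inventory and a hypothetical advisory feed; the names `Advisory`, `findings_for`, and `examplelib` are invented for this illustration and do not come from any real tool. The same unchanged inventory is clean at build time and has a finding one week later, which is why the check has to be repeated at runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    """Hypothetical advisory record: which package is affected and where it was fixed."""
    advisory_id: str
    package: str
    fixed_in: tuple  # first fixed version as a tuple of ints, e.g. (1, 4, 3)


def parse_version(text: str) -> tuple:
    """Turn a dotted numeric version like '1.4.2' into (1, 4, 2) for comparison."""
    return tuple(int(part) for part in text.split("."))


def findings_for(inventory: dict, advisories: list) -> list:
    """Return the ids of advisories that affect the installed package versions."""
    findings = []
    for advisory in advisories:
        installed = inventory.get(advisory.package)
        if installed is not None and parse_version(installed) < advisory.fixed_in:
            findings.append(advisory.advisory_id)
    return findings


# Made-up inventory of a container that passed its pipeline scan.
inventory = {"examplelib": "1.4.2", "otherlib": "2.0.0"}

# At build time no advisory is known, so the scan is clean.
at_build_time = findings_for(inventory, [])

# One week later an advisory is published; the same inventory now has a finding.
one_week_later = findings_for(
    inventory, [Advisory("EXAMPLE-0001", "examplelib", (1, 4, 3))]
)
```

Nothing about the container changed between the two scans; only the set of known vulnerabilities did.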
So you always need this view across the individual phases of the CI/CD pipeline. The CI/CD pipeline is essentially the foundation for implementing security as code effectively. If you don't have full automation, it's really difficult to implement security as code on top. The other part of the practice is how we as engineers can establish secure coding practices in our review processes: make sure we have input validation and proper review of source code. That helps us get better beyond just static testing. Only those things in combination really drive security upwards.
Back to our problem, where security wants to check the cloud and platform engineers want to check the Terraform part. We really want to focus on the problem itself. The problem as a company is that we don't want any bucket exposed publicly, no matter whether it's defined in AWS, Azure, GCP, or Terraform. The foundation is focusing on what our goal is and how to achieve it. In practice, that means we need to see how security can be part of every individual process. I have argued many times now that we need security in all aspects, and not just individually, but also consistently. So just plugging in individual parts is not the solution. You need a unified view that helps teams collaborate across the tooling. Otherwise you end up in the situation that local development has a checking tool, but the rule for production is different, and then you still have the clash: data is not aligned, agreements are not aligned, and that doesn't make the world better. You have more information and more distraction. So the challenging part is combining the individual controls and the team collaboration with effective tooling that helps you build this very fast.
So if we look at what infrastructure as code has done, we really want something that allows us to do the same thing for security as code. One way to do that is to have the same flexibility for security that we have in Terraform or in Kubernetes manifests. And the way to get there is policy as code. You define the security practices in code that you can reuse and, this is the important part, it needs to be as flexible as infrastructure as code. You want to tie Terraform HCL to policy as code. You want to tie Ansible to policy as code. So you need something that aligns completely on that level, so that we can see things in our local development, also in our pipeline, and then in production as a result. That sounds too good to be true, but let's see how we can implement it in a second.
We have seen a few things that are key if you want to apply security as code successfully in your organization. The first one is that all the vulnerability and misconfiguration information needs to be available to all the parties involved. Platform engineers and security engineers all need a consistent view and access to the tooling, so that you don't have long cycles where you need to deploy to production just to get the report. The second one is coverage. The security tooling needs to support build time and runtime. If it doesn't include both, it leads to two silos. The rules are still different, and that still drives all the craziness that we had before. The next part is automation. The primary goal here is to build security into the automation process. So the security tooling needs to adapt to the process you already have. You need to integrate it into the pipeline so that you can easily do this everywhere. And then of course, as I said, extensibility.
And this can be achieved through policy as code, where you define individual rules on your own. Hopefully the tooling also provides out-of-the-box policies, which gives you a kickstart when you're trying to implement that. I'm going to showcase how we do this with our open source projects to help companies be more secure. One part of CVE discovery, as I brought up early on, is identifying where things are. For that we have a graph-based asset inventory where you can use a GraphQL-based language to query the whole infrastructure, ranging from Windows, Linux, AWS, and Kubernetes to Terraform and Kubernetes manifests. That gives you an easy way to quickly answer the question of where things are.
We will focus right now on cnspec, which is essentially about securing everything from development to production. Let's see it in action, based on the use case we had earlier: how do we make sure a bucket is not exposed? For the runtime check, we want to make sure that block public ACLs is enabled, as well as block public policy, and we want to enforce this across all buckets. This is good for runtime, but we also said we want to have this in Terraform. So in the same GraphQL-based language I can query all the Terraform resources, check for public access blocks, and check whether those arguments are set properly, that is, set to true. If not, we already see in development, in the local IDE, that things are not configured well. If we deployed that automation with Terraform as it is, it would lead to a public bucket, and that is something we want to avoid. Doing this early also means security teams don't have to worry. To make this work in combination, we want to define a policy that is totally consistent across all those different teams and that focuses on the intent of the check: the bucket should not be public. Based on the technology, the same check has multiple variants.
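The idea of one check with several variants can be sketched in plain Python. This is a hedged illustration, not the query language or policy format shown in the talk: the function names, the `VARIANTS` table, and the dictionary shapes are assumptions made for the example. The only real identifiers are the argument names `block_public_acls` and `block_public_policy` from the Terraform `aws_s3_bucket_public_access_block` resource, and the keys `BlockPublicAcls` and `BlockPublicPolicy` from the AWS S3 public access block configuration.

```python
def check_terraform(resource_args: dict) -> bool:
    """Terraform variant: both arguments must be explicitly set to true."""
    return (
        resource_args.get("block_public_acls") is True
        and resource_args.get("block_public_policy") is True
    )


def check_aws(public_access_block: dict) -> bool:
    """AWS runtime variant: same intent, read from the live bucket configuration."""
    return (
        public_access_block.get("BlockPublicAcls") is True
        and public_access_block.get("BlockPublicPolicy") is True
    )


# One logical check ("the bucket must not be public") with a variant per platform.
VARIANTS = {
    "terraform": check_terraform,
    "aws": check_aws,
}


def bucket_not_public(platform: str, asset: dict) -> bool:
    """Pick the variant that matches the platform and evaluate it against the asset."""
    if platform not in VARIANTS:
        raise ValueError(f"no variant of this check for platform: {platform}")
    return VARIANTS[platform](asset)


# Hypothetical assets: a Terraform resource missing one argument, and a compliant bucket.
terraform_ok = bucket_not_public("terraform", {"block_public_acls": True})
aws_ok = bucket_not_public("aws", {"BlockPublicAcls": True, "BlockPublicPolicy": True})
```

In this sketch the Terraform resource fails because `block_public_policy` is never set, which is the kind of finding a developer would want to see locally before anything is deployed.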
The policy essentially says: if I'm checking Terraform, apply the Terraform check. If I'm checking AWS, apply the AWS rule. That makes it super easy as a team to define the rule set you want in order to make your infrastructure secure. Now you can combine checks going from left to right. And to kickstart it, you don't want to start by building everything yourself. Instead, we publish a huge number of policies. We have hundreds of policies available with thousands of checks that help you start being secure. Check out the Mondoo registry. We have open source policies and query packs available that you can apply today. It's open source, so don't wait. Secure your infrastructure.
I just illustrated very quickly that you can now apply a check in local development. You can check your Terraform in your IDE, you can check it into Git, you can check it in a GitHub Action, and you can run it against an AWS account in production. So now it's not just the platform engineers using Terraform to do all the things. The security team can work with the platform engineers to assign policies for the whole pipeline and really make it work, so that you as a whole team are more secure.
We are a company that builds security posture management to help you be more secure. We built a platform that we use ourselves. We use full automation, Kubernetes, and Terraform, and we use the product on a day-to-day basis to secure ourselves. But we also work with a large number of enterprises in healthcare, finance, and manufacturing to secure their infrastructure, and we learn from that. So thank you very much for listening. Hopefully this was helpful. In case you have any questions about how to apply security as code in your environment, feel free to reach out and let me know. Happy to help. Thank you very much.
Christoph Hartmann

Co-Founder & CTO @ Mondoo
