Conf42 DevSecOps 2022 - Online

Taking Your DevOps Tooling To The Dark Side

We support battle-hardened production applications every day. Learn how to protect the tools that get our jobs done, while accelerating our ability to connect systems together using programmable Zero Trust Networking technology powered by OpenZiti. No open firewall holes and no exposed webhooks.


  • Currently, I'm leading the DevOps team at NetFoundry. We call it our RAV team, which stands for reliability, automation and visibility. I've used a lot of DevOps tools over the years, and I'm generally pretty opinionated about which tools I like and which ones I don't.
  • NetFoundry is in the business of zero trust networking. We assume that the network itself is already compromised. That forces you to think about security differently, in terms of what really needs access to what.
  • The problem in the DevOps space is that pretty much every tool we use is an absolute gold mine for an attacker. Most people I talk to are not comfortable with how they've set up their developer access. So what do you do? You throw the gates wide open.
  • When it comes to securing our DevOps tools in particular, we've got to step up our game. OpenZiti is a zero trust networking platform: a way to connect systems together while stepping up security.
  • Ziti allows controlled access into and between data centers. It's a massive step up from traditional networking or a traditional VPN. With the SDK, you can embed Ziti in the application itself for full end-to-end encryption, so the traffic is encrypted from the source all the way to the destination.
  • At NetFoundry, we were challenged to dogfood OpenZiti and figure out how to use it internally. What I found was that it allowed me to step up my security game without introducing friction, and it really doesn't make the job harder.


This transcript was autogenerated. To make changes, submit a PR.
All right, well, welcome, everybody. Thanks for joining. As I mentioned, we're diving into taking your DevOps tools to the dark side and stepping up your security game in the DevOps space. Quick background on who I am: I've done a mix of software development and DevOps for the past 13 years. I started in software engineering doing support and web development. I moved on to infrastructure monitoring. I managed CI/CD for a shop that had over 100 developers, and it was just two or three of us DevOps guys trying to support all of them. So I'm used to things being busy and moving fast. After that, I moved on to a billion-dollar company where we built a site reliability team from the ground up, including a NOC, all the visualizations, the response procedures, and so forth. Currently, I'm leading the DevOps team at NetFoundry. We call it our RAV team, which stands for reliability, automation, and visibility. I've used a lot of DevOps tools over the years, and I'm generally pretty opinionated about which tools I like and which ones I don't. That comes from experiencing both the gains and the pain of working with all sorts of technology: config management, monitoring, automation, you name it. I love learning new tech, constantly growing and evolving my skill set, and continuing to learn about the new tools that are out there. So let's talk about zero trust networking. This is a term that is emerging in the industry, and a lot of people are throwing it around. What's tricky is that it has become a marketing term, so there are a lot of different definitions out there. It's like DevOps in that it has almost grown into a term with no real meaning. But I'd like to cover how we define it at NetFoundry, since we're in the business of zero trust networking.
Traditionally, network security is built around a perimeter-based model. The idea is that I've got a firewall, and I put everything that's important and needs to be protected inside that firewall. It's like a castle: I've got a castle wall, and everything inside it I consider safe. I trust it, and I don't need to worry too much as long as the firewall is up and my doors are closed to the threats on the outside. In a zero trust networking world, we assume that the network itself is already compromised. There are already bad actors inside the castle walls, and we should not assume that anything inside those walls is safe. Assuming those resources are already compromised forces you to think about security differently, in terms of what really needs access to what. All right? So in the DevOps world, in the DevSecOps world, why do we care about zero trust networking? Why does this matter? Why are you even here at this talk? We have a problem in DevOps, and a lot of us are not conscious of it, or we close our eyes to it and don't really think about it. The problem in the DevOps space is that pretty much every single tool we use is an absolute gold mine for an attacker. Take our CI/CD system. That is a fantastic target for an intruder: once it's compromised, they can inject code pretty much anywhere our deployment systems reach, and those are the places that matter in our infrastructure. Imagine if that was compromised. Now you've got somebody who can deploy and execute their own code anywhere throughout your infrastructure. That's a critical problem. Take monitoring. Monitoring is essentially a data mining platform for your infrastructure.
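To make the castle-versus-zero-trust contrast concrete, here is a minimal Python sketch of the two trust decisions. The perimeter check trusts any source address inside an internal range; the zero trust check ignores addresses entirely and honors only explicit identity-to-service grants. The network range, identity names, and service names are all hypothetical.

```python
import ipaddress

# Hypothetical "inside the castle walls" network.
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")


def perimeter_allows(source_ip: str) -> bool:
    """Castle model: anything inside the firewall is trusted."""
    return ipaddress.ip_address(source_ip) in INTERNAL_NET


# Zero trust model: no addresses at all, only explicit (identity, service) grants.
# Hypothetical identities and services.
GRANTS = {
    ("ci-runner-01", "artifact-store"),
    ("alice-laptop", "grafana"),
}


def zero_trust_allows(identity: str, service: str) -> bool:
    """Deny by default; allow only what a policy explicitly grants."""
    return (identity, service) in GRANTS


# A compromised host that already sits inside the perimeter.
compromised_ip, compromised_identity = "10.1.2.3", "compromised-vm"

print(perimeter_allows(compromised_ip))                    # True: inside the walls, so trusted
print(zero_trust_allows(compromised_identity, "grafana"))  # False: no grant for this identity
print(zero_trust_allows("alice-laptop", "grafana"))        # True: explicitly granted
```

The zero trust check has no notion of "inside" at all: where a request comes from never enters the decision, only who it is and what it is trying to reach.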
Anything that's important is going to be monitored. So you've got one central stop where all your data lives, giving an attacker a fantastic inventory of your systems, with all sorts of forensic detail and generally IP address information. It's a fantastic way to gather everything important about your infrastructure, stored in one place. Let's talk about ETLs. For those of you who work in data warehousing, an ETL system is often a collection of loosely hacked-together scripts: jobs that mine data from all your important data sources and load it into a data warehouse, which is a one-stop shop for all your data. Again, a fantastic target, because it needs access to everything. All of your important data sources are exposed to your ETL system and your data warehouse and brought together in one place. If either of those gets compromised, somebody's pretty much got your whole infrastructure. Config management. This one's my favorite, because it's typically your one-stop shop for root access to everything. If you want to take down your infrastructure or exploit it by planting something, whether it's a crypto miner, a rootkit, you name it, config management is a fantastic target. Once attackers get into the config management system, you're done. They've got everything. And the last one is developer access management. Most people I talk to are not comfortable with how they've set up their developer access. Typically, people have to grant all or nothing: they have to give developers prod access, and there's no great way to specify any kind of granularity. So if you ask them whether they're comfortable with the way their developers have access to their systems, most people will give you a look and shake their head, or tell you they're not comfortable with it. You wouldn't turn an auditor loose on your developer access, or even your support access, half the time.
Typically, you grant a whole lot of access to the people you know need it, because when things are broken, you don't want to introduce friction. So what do you do? You throw the gates wide open, to the point where everybody's a little uncomfortable, but at the end of the day, you've got to get your job done. So how do we deal with security in a world where things move fast and we've got to get stuff done? Well, we've got audits to try to protect our systems, and we've got to check the boxes for the security team. So what do we do to survive the audits? In most places I've seen, the audits are applied to the production application. In DevOps, we're often in charge of deployments and monitoring, but usually what we're monitoring is some production system, and that production system is typically where the audit scope is applied. We don't usually include the DevOps tools, the monitoring tools, and things like that. Those are peripheral systems. To some extent that's fair, because you have to apply a logical scope. You can't include your entire ecosystem in a security audit; it's too hard. You've got to separate non-prod from prod, separate CI/CD, and so forth. You have to create some sort of scope as a safety net. But I've seen some places take more liberties than others, making that scope really narrow just to pass the audit. Most places are not doing a broad scope that includes all of their DevOps tools, because those are support systems, not the production applications exposed to the public. And so we do what we need to do to pass the audit. But as a gut check, I want you to think about this: would you turn a pen tester loose on your monitoring system? Would you turn a pen tester loose on your CI/CD system or your data warehouse?
Would you be comfortable inviting somebody to look for exploits throughout those systems? Most people would probably say no. The same goes for developer access. Are we comfortable with how we've locked down their permissions? Most people are not, because these are things we use for support. They're just the things we use to get the job done, not the things we want to show to the world or put out in public. At the end of the day, we need to access systems, fix things when they're broken, monitor them, and wire systems together. That's a lot of what we do in DevOps. So what I'm proposing is that we start thinking about this problem differently, particularly in how we secure our DevOps tools. Because every one of them is a gold mine, we've got to figure out a way to step up our game. The truth is, in the industry, this is how people are getting into systems. I'm forgetting the name of it now, but we've seen a breach where attackers got in through a monitoring system and injected exploits through its automatic updates. I've seen a major breach come in through a CI/CD system. Why? Because it allowed attackers to inject code into all sorts of people's infrastructure. It's just a fantastic way to get in. So how do we lock these things down and make them more secure? The context in which I'm talking about zero trust and making these things dark is a tool I've learned to use called OpenZiti. It is a zero trust networking platform: a way to connect systems together while stepping up security. The idea behind it, first and foremost, is to stop leaving ports open and stop leaving IPs open. Making a system dark means cutting off all ingress. You start with a firewall rule or security group policy that says no ingress; nothing gets in.
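One way to make "no ingress" something you check rather than assume is a small audit over your firewall or security group rules. This is a minimal sketch, assuming a hypothetical rule shape (a direction, a port, and a CIDR); it does not use any cloud provider's real API, and the ports and ranges are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule shape; real security groups and firewalls differ.
    direction: str  # "ingress" or "egress"
    port: int
    cidr: str


def ingress_violations(rules: list[Rule]) -> list[Rule]:
    """Return every rule that lets traffic in. A dark host should return []."""
    return [r for r in rules if r.direction == "ingress"]


# A dark host: outbound only, so it can dial out to the overlay (port is illustrative).
dark_host = [Rule("egress", 443, "0.0.0.0/0")]

# A typical "open" host with SSH and an internal webhook exposed.
open_host = [
    Rule("egress", 443, "0.0.0.0/0"),
    Rule("ingress", 22, "0.0.0.0/0"),
    Rule("ingress", 8080, "10.0.0.0/8"),
]

for name, rules in [("dark_host", dark_host), ("open_host", open_host)]:
    bad = ingress_violations(rules)
    status = "OK: no ingress" if not bad else f"{len(bad)} ingress rule(s) on ports {[r.port for r in bad]}"
    print(name, status)
```

Note that the `10.0.0.0/8` rule is flagged too: under the zero trust assumption, "only reachable from inside the network" is still ingress.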
Some people ask: is VPC peering zero trust? No. If you've got VPCs peered together, or peering between your private data center and a cloud data center, then if one of them gets compromised, the other one does too. We've got to step up and lock things down tighter than this; we can't leave it like this anymore. The key concept with zero trust is that you get away from IPs and ports entirely. The only things that matter in the zero trust world are services and identities. Services are basically the destinations people need to access. Then you've got some form of identity, whether it's a cell phone, a laptop, or a server; everything's an identity. The only thing that matters is that certain identities need to talk to certain services, and we manage that access through service policies, or what we call AppWANs at NetFoundry. So in a zero trust world like this, with no ingress, how do things actually talk? What we've got here is a diagram. Imagine that it simply connects one data center to another without peering them, so that a compromised entity in one doesn't compromise both. Think of the Ziti fabric as a mesh running in a public cloud, and everything dials into this mesh. Initially, each endpoint dials in and holds a persistent connection, but nothing can talk yet. You still have to create policies that define what talks to what. Kubernetes has something like this, but only within the cluster. This is something you can put anywhere: across any cloud, any region, any cluster, wherever you need it. You can place this type of mesh anywhere and define policies, because again, everything is just a service and an identity. You define what needs to talk to what, and that's where zero trust comes in.
For example, in this image, the compromised entity is not explicitly defined in the policy, so it can't access the private resource on the right. In the data center on the right, imagine there are no inbound ports open at all; there's no ingress into the entire data center. The only things that can get in are identities explicitly listed in the policy. As for how a service policy works, Ziti is built around using tags to tie things together. If I've got a set of identities that I want to talk to my data warehouse, I might tag them with a data-warehouse tag. Then I've got various services related to the data warehouse, and I tag those with data-warehouse too. All I'm doing in creating an AppWAN or service policy is saying: this is my policy, data-warehouse talks to data-warehouse, and I'm done. If you've ever worked with Active Directory groups or user permissions, it's the same kind of thing: you put users in groups, and that determines their permissions. Except here, you're doing it at the network access level. So just how dark can we get with this? Typically, there are three models: network access, host access, and application access. Each one is a little more secure and locks things down a little tighter than the last. Let's dig into them. The first one I'd call good. It's the idea of creating controlled access into a private data center, or between data centers: you can connect from anywhere, but you want to grant access to resources inside a private data center without opening up firewalls. The way this works is that you put a tunneler or an edge router in the data center. There's no ingress and there's no peering between the two; Ziti is what grants access through that router.
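The tag-based service policy described above can be modeled in a few lines. This is a minimal sketch of the matching logic only, not OpenZiti's actual data model or API: identities and services each carry a set of tags, and a policy grants access when the identity has one of the policy's identity tags and the service has one of its service tags. All identity, service, and tag names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Identity:
    name: str
    tags: set[str] = field(default_factory=set)


@dataclass
class Service:
    name: str
    tags: set[str] = field(default_factory=set)


@dataclass
class ServicePolicy:
    name: str
    identity_tags: set[str]
    service_tags: set[str]

    def grants(self, identity: Identity, service: Service) -> bool:
        # Both sides must share at least one tag with the policy.
        return bool(identity.tags & self.identity_tags) and bool(service.tags & self.service_tags)


def can_dial(identity: Identity, service: Service, policies: list[ServicePolicy]) -> bool:
    """Deny by default: access exists only if some policy grants it."""
    return any(p.grants(identity, service) for p in policies)


# "Data warehouse talks to data warehouse, and I'm done." (hypothetical names)
policies = [ServicePolicy("dw-access", {"data-warehouse"}, {"data-warehouse"})]

analyst = Identity("analyst-laptop", {"data-warehouse"})
etl_job = Identity("etl-runner", {"data-warehouse", "etl"})
intruder = Identity("compromised-vm")  # untagged, so no policy mentions it
warehouse_db = Service("warehouse-postgres", {"data-warehouse"})
grafana = Service("grafana", {"monitoring"})

for who in (analyst, etl_job, intruder):
    print(who.name, can_dial(who, warehouse_db, policies), can_dial(who, grafana, policies))
```

As with Active Directory groups, granting a new contractor access then means adding a tag to their identity rather than editing IP ranges or firewall rules.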
So the device or host on the left can talk to the specified services on the right. It's a decent level of access, and it lets you grant access per identity. At a previous place I was at, everything important was inside the private data center. It got tricky when, say, we acquired a new company or brought in outside contractors: how do we get them in? Typically we'd use a VPN, probably the most common method, but those got difficult and clunky to manage very quickly. They weren't very reliable, and they were very broad in terms of the access they granted. The major improvement Ziti offers here is that all access is explicit: if an identity just needs to talk to one specific service inside that data center, you set up your policy to do exactly that. They only have access to the things specifically granted to them. You're not exposing your entire data center in this model. So it's a massive step up from traditional networking or a traditional VPN. It's like a VPN on steroids, but a lot easier to manage, because, again, you're not managing IPs and ports; you're just creating simple policies that say these identities can talk to these services. A better model is what we'll call the host access model. This is where you've got tunnelers installed on the hosts themselves. You might have a Ziti tunneler running as a systemd service on a Linux box, or as an agent on a Windows machine. You set up your services to terminate on localhost, so the traffic never travels anywhere unencrypted: it gets tunneled through the mesh, which is completely encrypted, and exits on localhost. You can use this for SSH access, web access, you name it. If you're running a container, you'd set this up as a sidecar container.
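In the host access model, the service listens only on the loopback interface, and the tunneler on the same host delivers traffic to it. Here is a minimal Python sketch of the service side, assuming a hypothetical TCP echo service: binding to `127.0.0.1` instead of `0.0.0.0` means nothing off the host can connect to it directly. The tunneler itself is not shown; a local client stands in for it.

```python
from __future__ import annotations

import socket
import threading


def start_loopback_service() -> tuple[socket.socket, int]:
    """Start a tiny (hypothetical) echo service reachable only from this host."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # 127.0.0.1, not 0.0.0.0: the port is never exposed on an external interface.
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen()

    def serve_one() -> None:
        conn, _ = server.accept()
        with conn:
            conn.sendall(conn.recv(1024))  # echo back whatever arrives

    threading.Thread(target=serve_one, daemon=True).start()
    return server, server.getsockname()[1]


server, port = start_loopback_service()
bound_host = server.getsockname()[0]
print("listening on", bound_host, port)

# Stand-in for the local tunneler: it reaches the service over loopback only.
with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
    client.sendall(b"hello via the tunnel")
    reply = client.recv(1024)
print(reply)
server.close()
```

For a container, the same idea applies with the tunneler as a sidecar: containers in a Kubernetes pod share a network namespace, so the sidecar and the service share the same loopback interface.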
Again, your service terminates on a localhost address, so the traffic never leaves the host itself unencrypted. This is a better model because there's never a window where your traffic can be intercepted, unless something is on the host and the host itself is already compromised. So how do we handle that scenario if we need to go even further? That's where the best model comes in: a fully application-embedded implementation. One thing that's pretty unique to Ziti is that there's an SDK, so if you want full end-to-end encryption from one application to another, you can actually do that. Each application becomes an identity, and it can access a service that essentially lives inside the other application. You never have a point of unencrypted traffic. You never have to open any ports. There's no ingress anywhere in this model, and essentially everything is dark the whole time. This is what we at NetFoundry, as the developers of OpenZiti, consider to be true zero trust, where nothing has access unless it's explicitly granted. There's never a point where you trust traffic floating around in the open. The traffic is always encrypted from the source all the way to the destination, and that's the ideal scenario in a zero trust world. All right, so on to our internal use cases at NetFoundry. We were challenged to dogfood OpenZiti, figure out how to use it internally, and learn from it: what are the gains, what are the strengths, what are the weaknesses, and how do we leverage it? I was admittedly a little skeptical when tasked with this. As a full disclaimer, I'm not a sales guy, and I'm not an OpenZiti developer; I'm a DevOps guy. I'm constantly trying to keep up with the rat race of everything that needs to be done, and I'm constantly under deadlines.
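One way to compare the three models is to list where traffic is still plaintext along its path. The sketch below encodes one reading of the talk as data, assuming a tunneler on the client device in the first two models; the hop descriptions are illustrative, and the Ziti overlay is treated as encrypted in all three.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    segment: str
    encrypted: bool
    stays_on_host: bool  # True if this hop never leaves a single machine


# Illustrative paths from client to service; hop names are hypothetical.
MODELS: dict[str, list[Hop]] = {
    "network access": [
        Hop("client app -> local tunneler", False, True),
        Hop("tunneler -> overlay -> edge router in data center", True, False),
        Hop("edge router -> service over the data center network", False, False),
    ],
    "host access": [
        Hop("client app -> local tunneler", False, True),
        Hop("tunneler -> overlay -> tunneler on service host", True, False),
        Hop("tunneler -> service on localhost", False, True),
    ],
    "application embedded": [
        Hop("app (SDK) -> overlay -> app (SDK)", True, False),
    ],
}


def plaintext_on_network(model: str) -> list[str]:
    """Unencrypted hops that cross a network, where they could be intercepted."""
    return [h.segment for h in MODELS[model] if not h.encrypted and not h.stays_on_host]


def plaintext_anywhere(model: str) -> list[str]:
    """Every unencrypted hop, including ones confined to a single host."""
    return [h.segment for h in MODELS[model] if not h.encrypted]


for model in MODELS:
    print(f"{model}: on network={plaintext_on_network(model)}, anywhere={plaintext_anywhere(model)}")
```

Read this way, each step up removes a class of exposure: the host model removes plaintext on the network, and the embedded model removes plaintext everywhere.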
I need to get things done, wire systems together, automate stuff, and move fast. That's the nature of my job. So when I'm tasked with locking things down and tightening up security internally, I'm groaning, because historically this has always been a really painful process for me. So I started taking an honest look at our systems: what do we have, which systems need to be more secure, and how can we make them more secure? I went through the list, and these are the systems we chose internally to dogfood. We did our data warehouse, our CI/CD system, and our SSH access. We actually moved away from Slack to an open source replacement called Mattermost, so our company's internal chat is accessible only through Ziti. Grafana was something else we made dark: it's tied to lots of important data sources, and because it's a one-stop shop for data, we decided to lock it down. And we've begun using Ziti internally for support access to applications. We run Ziti in a sidecar so we can get to things without opening additional ports or security groups. We don't have to make any security group changes at all; if we need to access something, we just grant access through Ziti instead. As for my reaction to doing this: like I said, I was skeptical. I'm not a huge Kool-Aid drinker, even at the places I work, most of the time. Like I said, I'm super opinionated about which tools I like and which I don't. I like tools that give me a lot of mileage: they let me get a lot done quickly, and I can reuse them across lots of different projects.
What I found when I started working with OpenZiti was that it allowed me to massively step up my security game without introducing all kinds of friction. At previous places, in more traditional networking shops, locking things down and introducing more access controls inevitably meant breaking things and making our jobs a lot harder. What I found as we started shifting toward this was that we could do it very easily, and we could pre-validate everything ahead of time, because with Ziti we would set up all the policies and all the networking and confirm that everything worked before locking anything down. So even though we were moving toward zero trust in a brownfield environment with existing deployments, it really wasn't difficult to migrate, because we were able to enroll everybody first. We got all the identities set up, we got all the policies set up, and we could verify that the zero trust networking was working by using traffic as an indicator: we could see whether people had migrated and were using Ziti to access things instead. And based on the traffic, we could see that they were. So on switchover day, which I was used to being a really fireworks-intensive day with lots of broken stuff, switching our first tool, the data warehouse, over to zero trust was a complete non-event. We had validated everything ahead of time, we could see from the traffic patterns that everybody was already using the zero trust path to access it, and we already knew it was working because the traffic was flowing.
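The pre-validation step, using traffic to confirm that everyone has moved to the zero trust path before cutover, could look something like this. It's a minimal sketch, assuming hypothetical access-log records that note whether each connection arrived through Ziti or directly; where those records come from (database logs, load balancer logs, and so on) is left open, and the users and dates are made up.

```python
from __future__ import annotations

from datetime import datetime

# Hypothetical access records for the data warehouse:
# (timestamp, user, path the connection arrived on: "ziti" or "direct").
ACCESS_LOG: list[tuple[datetime, str, str]] = [
    (datetime(2022, 5, 2, 9, 0), "alice", "direct"),
    (datetime(2022, 5, 3, 9, 5), "alice", "ziti"),
    (datetime(2022, 5, 3, 10, 0), "bob", "ziti"),
    (datetime(2022, 5, 3, 11, 30), "etl-job", "direct"),
]


def cutover_blockers(log: list[tuple[datetime, str, str]], since: datetime) -> list[str]:
    """Users who still reached the service directly (bypassing Ziti) since `since`."""
    return sorted({user for ts, user, path in log if ts >= since and path == "direct"})


blockers = cutover_blockers(ACCESS_LOG, since=datetime(2022, 5, 3))
print("still on the old path:", blockers)      # ['etl-job']
print("safe to remove ingress:", not blockers)  # False
```

Once that list stays empty for a while, removing the old ingress rules should be the non-event described above, because nobody is using that path anymore.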
Once we realized this was really not a big deal, we moved on to our CI/CD system, Grafana, and other tools. Now that's how we set up access from day one: every new tool and every support system we stand up is zero trust from day one. And it really doesn't make the job any harder, because as long as users have their Ziti agent running on their machine, the way they access systems is really no different than before. But for anybody who doesn't have Ziti, the systems are completely dark. You can't even resolve DNS for the addresses, because with Ziti you can use fake DNS names for your intercepts. You can have completely phony addresses, so people never even know the actual IPs or the actual addresses of the services they're accessing. The other thing I've appreciated is that when I need to set up access to things, I've minimized the red tape, because I'm no longer punching holes in firewalls or opening up security groups. With Ziti, if anything, you lock things down further. So it's really easy to pass an audit using OpenZiti, because you can say: here's my security audit, and my security group rules and firewall rules are no ingress; nothing can get in. I would feel perfectly comfortable turning a pen tester loose on my CI/CD system now, because I can look at it and say, yeah, go ahead and try to attack it. There are no open ports, and you can't get in unless you've been issued an identity and granted trust within the Ziti network. If you're interested in learning more about what OpenZiti can do, it's an open source project at openziti.github.io. There's also a blog at openziti.io, plus a cloud-hosted option if you're interested in that.
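The fake-DNS behavior can be illustrated with a toy resolver. This is a minimal sketch, not how any Ziti tunneler is actually implemented: intercept names (hypothetical `*.ziti` names here) resolve only while the local agent is running, and each one maps to a placeholder address from a private pool rather than the service's real IP. Using the 100.64.0.0/10 shared address space as the pool is an assumption for illustration.

```python
from __future__ import annotations

import ipaddress
from typing import Optional


class InterceptResolver:
    """Toy model of a local DNS answering only for intercepted service names."""

    def __init__(self, intercepts: dict[str, str], pool: str = "100.64.0.0/10"):
        self.intercepts = intercepts  # hypothetical fake hostname -> service name
        # Assumed placeholder pool; addresses are handed out in order.
        self._free = iter(ipaddress.ip_network(pool).hosts())
        self._assigned: dict[str, ipaddress.IPv4Address] = {}
        self.agent_running = True

    def resolve(self, hostname: str) -> Optional[ipaddress.IPv4Address]:
        """Return a placeholder address, or None (no answer) if not intercepted."""
        if not self.agent_running or hostname not in self.intercepts:
            return None
        if hostname not in self._assigned:
            self._assigned[hostname] = next(self._free)
        return self._assigned[hostname]


resolver = InterceptResolver({"grafana.ziti": "grafana", "ci.ziti": "ci-cd"})
print(resolver.resolve("grafana.ziti"))  # 100.64.0.1, a placeholder, not Grafana's real IP
print(resolver.resolve("ci.ziti"))       # 100.64.0.2
print(resolver.resolve("example.com"))   # None: not an intercepted name
resolver.agent_running = False
print(resolver.resolve("grafana.ziti"))  # None: without the agent, the name doesn't resolve
```

The real service addresses never appear on the client side at all; the placeholder only has meaning to the local agent, which forwards the traffic into the overlay.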
We've got cloud-hosted Ziti at nfconsole.io, with a free sign-up option for up to ten endpoints so you can try it out, as well as a getting-started wizard and so forth. The idea is: get started with it and try it. You'll realize you can tie systems together very quickly without friction, which lets you keep moving fast and wiring systems together without creating this classic security problem and without opening your tools up to the world. So I challenge you: as you work in the DevOps and DevSecOps space, lock down your tools and treat them as first-class citizens. Stop leaving them open to the world. The most common way people still get into systems is scan and exploit. When you leave your IPs and ports open to the world, assume somebody's going after them. Even if they're inside the firewall, assume something is already inside, scanning and exploiting. So step up your security game. That concludes my talk for today. Thanks, everybody, for joining.

Mike Guthrie

Senior Devops Engineer @ NetFoundry
