Conf42 Incident Management 2022 - Online

Policy as [versioned] Code - you're doing it wrong

Abstract

In this talk Chris will trace back the origins of how policies are often incepted, how they can get out of hand, become slow if not impossible to update and to measure compliance against, and often lead us to question whether the policy is helping or hindering. From this talk you'll learn how to apply a software development pattern and product ways of thinking to how your organization manages policy: achieving continual updates to policy so that the risk mitigations move as fast as the risk does, without getting in the way, while keeping compliance easy to measure.

Key takeaways:
  • Policy often causes more harm than good, is slow to update, exemptions are harder still to manage, and measuring compliance at scale is near on impossible.
  • Throwing some curly braces at a problem is not the solution. Policy, if it is articulated as code, needs to embrace all the best practices of code.
  • Purposeless policy is potentially practically pointless. (Now say it 5 times quickly.)

Summary

  • Please do leave a comment and let me know who you are and where you're joining from. It'd be really great to get a feel for who you are and who's interested in this sort of thing.
  • Setting and changing policy is often slow and hard to communicate, and people ultimately go off and do their own thing. So what are the chances of finding you in my imaginary lift today?
  • Staying on top of patching things so we can react to the next fire. Writing consistent, good quality code and avoiding technical debt. Seamlessly communicating policy to the people that need to consume it without derailing them. If any of that sounds a bit familiar, I may have some answers for you.
  • Chris Nesbitt-Smith asks viewers to leave a comment of "policy breaker" in the chat, then looks at where policy as code generally goes wrong: a resource rejected midway through a deployment can leave that deployment in an inconsistent state. Chris will be around to respond to questions and heckles.
  • In software we're used to handling dependencies. What if your policy was just another dependency? Consumers of this policy need to be able to test themselves against the policy locally and also in CI/CD. Think automatic pull requests, tests, even auto merging if you like.
  • The situational awareness piece around software supply chain is something your organization is hopefully already thinking about. Here's how you might actually be able to do this, using an example GitHub organization: allow multiple versions of the policy to coexist and be evaluated within a single runtime.
  • The way the policy is designed, written and distributed lends itself well to coexisting with itself in a Kubernetes cluster. When the risk landscape changes, your policy can move with it. That is where the real culture change is needed, and the explanation of that is likely a long series of talks.
  • Chris Nesbitt-Smith wants to start building test cases with other willing organizations. Purposeless policy is potentially practically pointless policy. Questions are very welcome on this or anything else. He'll be around in the comments for a while.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello. Thanks for joining me here today. It's a pleasure to have you. Please do leave a comment and let me know who you are and where you're joining from. It'd be really great to get a feel for who you are and who's interested in this sort of thing. Okay, so the elephant in the room: policy is kind of a hard thing to make sound sexy, but I'm going to try and at least get your attention, so just bear with me for a minute.
So to set the scene, I'm in a lift. Yes, American friends, we really do call them lifts. And four people walk in. I think to myself, Chris, this is your moment. Now or never. As the doors close, I position myself in front of them, a captive audience. They're mine. I've got them. I take a breath. I look to the first person on my left. She's in a suit. She looks really important, too. I gesture to her. C? She looks back at me as if to say, yes, go on. I? She nods. O. Perfect. The CIO, the policymaker, the one whose neck is on the block. What are the chances of finding you in my imaginary lift today? Well, I ask her, what keeps you up at night? And she tells me: I don't know what teams are really doing, what the volume of risk is, and what I should be showing more interest in. Setting and changing policy is often slow and hard to communicate, and people ultimately just go off and do their own thing. They think they know better, and to be honest, often they do. But ultimately I'm then left playing catch up with the risk that they've signed me up to. Okay, I say, trying not to sound like a patronizing snake oil salesman. I can help.
I turn my attention to the second person in a suit. They look maybe slightly less important. I make a guess. Let's face it, it's my imagination here. It'd be weird if I was wrong. Product manager, I say. They nod. The whip cracker, I say. What's important to you? So, managing risk, mostly opportunity risk.
So, the fear of missing out, getting features out the door, and avoiding getting bogged down with (they glance to the CIO) bureaucracy that feels as though it's almost designed to slow me down. Awesome, I say. This is your lucky day. I turn to the next person, dressed in overalls. I'm in a trendy part of town. They could be the CTO. Before I ask, they sense me staring at them. Cleaner, they say. How did you get in my imaginary lift today? Okay, I'll come back to you.
My attention goes to the last person. Hoodie, headphones around their neck. My stereotypical developer. Yes, I know you well. What code do you write? I ask. It doesn't really matter. Python. Cool. Have you got everything updated to work with... I pause. Python 3? they offer. Yes, Python 3. That must be hard, I add. They don't know it yet, but I've kind of just won a bit of their trust, which is important. Nearly, they say. Cool, what's important to you? So, staying on top of patching things so we can react to the next fire; knowing what rules exist, what ones I can bend or break, and ultimately what might cause me to lose my job; writing consistent, good quality code and avoiding technical debt; the rest of my team being able to cohesively work as one. Do you use any tools to help you with that? I ask. Yeah, linters, code quality, test coverage, the kind of usual remit. Great, I say. I write code, too. Let's be friends. And I hand them a printed QR code. Here's my public GPG key, so you know that you can trust what I say.
I return my focus to the cleaner. I've got it. How do you get told what to do, and then when it changes? Well, we get a memo or something stuck to the notice board. Last week we got a memo saying that all the meeting room whiteboards needed to be cleaned every night. Interesting, I say. How does that then work out for you? Well, it's up to us to then maintain that to-do list so we can onboard new people and just kind of keep a list of what we're up to. Does it go wrong at all?
Yeah, sometimes. If, when we compile that operational manual, we miss a memo or don't apply everything in the right sequence, we get things wrong. They glance apologetically to the product manager. Like when we hadn't updated the guide to say that the meeting room on the third floor was being used as a dedicated war room, and we wiped all their boards down. I look to the dev. Sound familiar? They nod. Turns out we're not all special snowflakes. Hey, all is not lost. I knew there was a reason that I imagined you here today.
The lift is slowing. I feel it coming to its destination. Great. I've got the silver bullet for you, too. The CIO is looking to me, ready to buy literally whatever it is I'm selling. They ask me, as the doors open, who are you and what team are you in? As I move out of the way so as to stop obstructing the door, I say: oh, I don't work here. I'm just here to fix the lift. People have been complaining it only goes to the top floor no matter what button they push, and it's actually pretty slow. My audience storms out furious, heading towards the stairs, and the doors shut. I get back to my job.
Okay, so if any of that sounds a bit familiar and you can relate to my imaginary friends, I may have some answers for you today. So what if I said you could update policies easily, even releasing several versioned updates, not just in a year or a month? What about ten updates in a single day? And then seamlessly communicate that to the people that need to consume it without derailing them. You could have visibility on compliance using tools that you maybe already use, and that policy could be readily consumable, easy to parse, easy to demonstrate compliance with, make sense, not be bureaucratic to change when it needs to, and ultimately not get in the way. That same policy could be treated as a dependency and operate like a linter.
So you can run compliance checks locally, in CI, and then ultimately guard production. Multiple versions of the policy, like a dependency, are supported, so emergency "you must update now because there's a new vulnerability" type updates are in effect a business-as-usual activity, communicated to those that need to know. Does it sound interesting? Great, let's crack on.
So hopefully I've got at least a little bit of your attention. It's time to introduce myself and start explaining things. So, my name is Chris Nesbitt-Smith. I'm currently an instructor at Learnk8s and also ControlPlane, a consultant to the Crown Prosecution Service, which is part of UK government, and a tinkerer of open source things. I've spent a fair chunk of my professional career now working across UK government and large organizations where problems like this are rife. I've been promised we'll have time to handle questions and heckles or whatever, so please do leave those in the chat as we go, and I'll be around to try and do my best to respond to them. If I don't get back to you, then please find me on LinkedIn or message me somehow.
So while this is not a live or in-person recording where I could ask you to, say, raise your hands, we can still try some audience participation. So if you could leave me a comment of "policymaker" if you're with my CIO and have ever set, written or applied policy before. So if you've ever written, set or applied some policy, say some coding styles, or cloud policy, things like that, can you write "policymaker" in the chat? Policymaker. Cool. Hopefully I've given you enough time.
So next round: if you've ever maybe sought exemptions or consciously bent, broken, circumvented, ignored, bypassed, whatever, a policy, with, let's just say, at least good intentions, can you leave a comment of "policy breaker" in the chat? So if you've ever worked around a policy, circumvented it, found a loophole, anything like that, kind of hacked around with it: policy breaker. Well, great, you fell for it, hopefully, if I've got anyone to do this. So we've got all of your names and employers' details down thanks to the organizers. So you can put your phones down, lend me your ears, give me your full attention. Ultimately, the stakes just got raised.
So where do I see policy as code generally going wrong? Well, before we dig into that, what do I mean by policy? Well, it usually comes in one of two forms: security enforcing, like data at rest being encrypted, for example, or consistency enforcing, such as code style, with tabs being two or four spaces perhaps. Maybe you can think of some others, but in any case, it's hopefully intended to mitigate a risk of some sort. Even with the best of intentions, though, these are often emotionally led rather than being grounded in a proportionate control, which then lends itself well to being the open door to case-by-case exemptions being required when you come up against a situation that you weren't expecting. So this actually is not unlike how the laws of the land are typically created, with case law making for a complex, if not impossible to navigate, rulebook that is then harder still to measure for compliance. It can often look like the thin end of a wedge, where the precedent, which may have been an uncomfortable pill to swallow the first time round, becomes dangerous, with others looking to expand upon its scope, which can lead us to sometimes wonder if the cure was worse than the disease. But that's not how we, at least typically, develop software.
So why does this sort of code have to be so hard? There must be a better answer. Well, we've codified everything else, so isn't that the answer? Well, yes, in part, but my point of this talk is that we do it wrong. So maybe some of you are out there screaming, or maybe even typing your favorite product name at me as the solution, and you're not wholly wrong, necessarily, but the devil's in the detail. Throwing some curly braces or YAML at something doesn't inherently fix things.
So if it's a security control, it's often tempting to keep that policy a secret. Exposing it could be used against you, perhaps by an adversary. However, that does not support us in the shift-left culture at all. It results in devs effectively reverse engineering what that policy might be, finding out when we smash our heads up against it and the thing says no. It also doesn't take much imagination to see that an application deployment which, midway through, finds one of its resources is noncompliant and gets rejected would leave the overall deployment in an inconsistent halfway state. This could result in some downtime, which begs the question: was that policy better than the downtime to the business? Especially if that consequently leads our engineers, who are all hopefully plenty smart, to find inventive, shall we say, ways around the "computer says no" response that they'd got.
This is then further exacerbated when updates to the policy are made. So maybe you get a pen test or something goes wrong, so you form that case law and need to apply a new policy. So maybe all S3 buckets now need to be encrypted, a change that could be considered a breaking one. Sure, you might say that you provide warnings on at least the less important issues or new emerging policy, which is great so long as someone sees them. But if you've adopted GitOps or at least CI/CD, is anyone seeing those warnings?
So who studies the results of a successful build log every time? Anyone? Every time? Well, if you are, I'd politely suggest that you're kind of missing the point of CI/CD; you should really be able to trust your job status.
Okay, well, I'm not just here to throw stones, so if you remember my implied promises to my four imaginary friends of what the utopian promised land might look like, well, there's nothing new under the sun. We've actually already unwittingly solved most, if not all, of these problems elsewhere. We just need to be reminded, join the dots up, and remember that it's code and it's software. So, the first is something you're probably already doing if you're doing policy as code: put it in version control. The thing you might not be doing, though, is making that visible, at least inner sourced, by which I mean allowing anyone within your walled garden of employees, suppliers, subcontractors, whatever, to see the policy. I'm not saying give it all away, give all your threat monitoring and intel away. You can probably keep that to yourselves. But I'd argue that visible policy, and the gaps therein, is often better than the downtime, reverse engineered workarounds and opaque legacy exemption spaghetti soup. If you're brave, you might even open source it. You will find that it unlocks the ability to work well with prospective suppliers without NDAs and whatnot. And widely distributed secrets are often expensive to maintain, difficult to handle, and often only stay secret for so long, after all.
Okay, well, we're off to a good start. Our policy is now visible to those that need to see it. Many of you, no doubt, are used to semantic versioning, but a quick recap. The first segment is used to indicate a breaking, perhaps conflicting, change. In the context of policy, let's say it's requiring resources to have a department label. Maybe that will help with some internal cross charging. Who knows? I'm not judging.
An increment of that might look like requiring that to be from a predetermined list rather than just free text. The second segment is to indicate minor changes that shouldn't really break anyone. An increment to this might look like correcting a spelling mistake in one of the department names. The third segment is to indicate patch changes, so these should be a no-brainer for everyone to keep up to date with, and an increment to that might look like adding a department to the available options.
Okay, so our policy is visible. It's in a repository, now it's versioned, so we can easily communicate that policy. We can tack on the release notes, and expectations are all managed by semantic versioning. In software we're used to handling dependencies. So what if your policy was just another dependency? You might unwittingly already be doing this if, for example, you have, say, ESLint as a dependency in your JavaScript package. Okay, so our policy is visible in a repo, it's versioned, we can easily communicate it, we can tack on the release notes, and expectations are managed by semantic versioning. It's beginning to look a lot more like software.
Okay, I know testing is a dirty word, but in order to make this an asset that everyone can depend upon, it's also important to have some good examples, and tests are essential to give everyone confidence in the stability and surface any potential side effects before they hurt everyone involved. Consumers of this policy need to be able to test themselves against the policy locally and also in CI/CD, thus shortening the feedback loop and better informing things. And as a bonus, we should therefore find ourselves and our consumers able to rely on that artifact that we're sharing with them.
Well, we're well and truly on the home stretch. It's a dependency, so updating it should be no different to any other. We can even use some magic like GitHub's Dependabot or Mend's Renovate to do that for us.
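To make the semantic versioning recap concrete, here is a minimal Python sketch of how a consumer might classify a policy update and decide whether it is a candidate for auto merging. The function names and the rule "auto merge anything below a major bump" are assumptions for illustration, not something shown in the talk; tools such as Dependabot or Renovate have their own configuration for this.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def classify_bump(current: str, candidate: str) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for a proposed policy update."""
    old, new = parse_version(current), parse_version(candidate)
    if new <= old:
        return "none"  # not an upgrade at all
    if new[0] != old[0]:
        return "major"  # breaking, e.g. department must now come from a list
    if new[1] != old[1]:
        return "minor"  # e.g. a spelling correction in a department name
    return "patch"  # e.g. a department added to the available options


def safe_to_auto_merge(current: str, candidate: str) -> bool:
    """Hypothetical rule: only non-breaking policy updates merge unattended."""
    return classify_bump(current, candidate) in {"minor", "patch"}
```

Under this assumed rule, a move from 2.0.0 to 2.1.1 could merge automatically once tests pass, while a move from 1.0.0 to 2.0.0 would wait for a human to look at the pull request.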
So think automatic pull requests, tests, even auto merging if you like.
Okay, so to check you're all still awake, can anyone leave me a comment and let me know of a recent event that caused people to want to know what version of a certain logging Java doohickey you were potentially running literally everywhere in the estate? Leave me a comment, let me know what that is. Yes, as you know, all presentations this year are contractually required to reference Log4j, even when it's almost entirely out of context, and include some memes about it. In just a few short months I can remove these and hopefully just point broadly at a list of scary looking CVEs in order to command your behavior through fear. What I'm getting at here, though, is that the situational awareness piece around software supply chain is something your organization is hopefully already thinking about, if not already addressing. So if our policy is a dependency, this is at least not a new problem: software bill of materials for the win, right? Which ultimately gives us the ability to measure compliance across the estate.
Okay, so I've just covered a lot of ground and hopefully sounded vaguely convincing. And this isn't just a fictional utopia that's been painted in PowerPoint. Well, it's time to look at how you might actually be able to do this. And I know you really came here wanting to see a million words on a slide and not just the odd emoji or two that you've seen up till now. So we've reached the point where I get to show you some code. Hooray. To maintain the scope, though, I'm going to be limiting this to talking about two things to prove that it's not just one tech or tool. I've arbitrarily picked Terraform and Kubernetes, but I could have picked anything. And naturally I'll need some tools to go along with this. I'm generally too lazy to invent too much here, so likewise I'm going to pick two tools.
But again, I could have used any, some, or even all, probably. So Checkov is going to be doing my Terraform and Kyverno will be doing my Kubernetes. If you want to browse along with me, I've created an example GitHub organization here. I'm not expecting you to read or grok the code on screen too much, so don't worry about it really; it's just to prove that I've made a real thing.
So the policy is stored here. So here's where my policy starts. At version 1.0.0 I've got a policy that requires a department label on all the resources, so as long as it's set, it doesn't actually matter what it is. I've written tests for this, so note how the test cases are really useful as a great example of what good and bad looks like. We've pushed a tag in Git. We've added release notes. I can sign it to provide further assurance if my heart so desires. Obviously it does, but moving on. Version 2.0.0 looks similar, only now the department field has to be one of a predetermined list. Like before, tests exist, release notes are written, tags are signed. 2.1.0 is where we notice and correct that spelling mistake in one of the options in the list of departments. In 2.1.1 we've now added a new department to the list.
App one and infra one are some other repositories in that organization. Well, those depend on version 1.0.0 of the policy, and their code is not compliant with version 2.0.0 or beyond. But how do I know that it's not compliant with version 2? Well, I configured Renovate to automatically make me a pull request. So when there's a new version of the policy it's super obvious whether I can update my dependency, and I can also see clear feedback about where and why I'm not compliant. I can also see all the pull requests across my organization, so I can now measure compliance with my policy. So moving on from that, app two and infra two are other repositories, and they depend on version 2.0.0 of the policy.
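The four policy releases described here can be sketched as plain Python predicates over a resource's labels. This is only an illustration of the versioning idea, not the Checkov or Kyverno policy from the example organization: the department names (including the misspelt "enginering" that 2.1.0 corrects) are invented, and `highest_compliant_version` is a hypothetical helper standing in for the feedback a Renovate pull request would give.

```python
from typing import Callable, Dict, FrozenSet, Optional

Labels = Dict[str, str]


def require_department(allowed: Optional[FrozenSet[str]]) -> Callable[[Labels], bool]:
    """Build a check: the label must be set and, if a list is given, be in it."""
    def check(labels: Labels) -> bool:
        department = labels.get("department", "")
        if not department:
            return False
        return allowed is None or department in allowed
    return check


# Ordered oldest to newest. Department names are made up for illustration.
POLICY_VERSIONS: Dict[str, Callable[[Labels], bool]] = {
    "1.0.0": require_department(None),  # any non-empty value
    "2.0.0": require_department(frozenset({"finance", "enginering"})),  # typo shipped
    "2.1.0": require_department(frozenset({"finance", "engineering"})),  # typo fixed
    "2.1.1": require_department(frozenset({"finance", "engineering", "marketing"})),
}


def is_compliant(labels: Labels, version: str) -> bool:
    """Evaluate one resource's labels against one released policy version."""
    return POLICY_VERSIONS[version](labels)


def highest_compliant_version(labels: Labels) -> Optional[str]:
    """Newest policy version these labels satisfy, or None if they satisfy none."""
    for version in reversed(list(POLICY_VERSIONS)):
        if is_compliant(labels, version):
            return version
    return None
```

A resource labelled with free text passes 1.0.0 but fails from 2.0.0 onwards, which mirrors the situation of app one and infra one in the talk.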
However, we could merge the open pull request there all the way up to 2.1.1. Finally, app three and infra three, well, they're dependent on 2.1.1 and they get a gold star from the CIO.
So there is a small touch of magic here and it's not pretty. I've written some Bash, so don't judge me, even though I've probably, definitely, absolutely written a lot worse. But what this does is allow me, from my dev laptop or in CI, to evaluate my code against the version of the policy. Ideally this might be a bit less cumbersome, but it is what it is for now; pull requests and collaboration are very welcome. And the last piece of the puzzle is managing the lifecycle of the policies and allowing multiple versions of the policy to coexist, be accepted and evaluated within a single runtime. I cheated a bit here. Kubernetes gives you admission controllers. It's not so easy to get that same policy evaluation for a cloud done locally. They've got their own policy engines, and I've just not figured out a way to execute and evaluate that locally. Again, pull requests, collaboration and contributions are all very welcome in this space, and I'd love to hear from you.
So you may have noticed that the way the policy is designed, written and distributed lends itself well to coexisting with itself in a Kubernetes cluster. Which brings us to cluster one, which describes a cluster that accepts all the versions of the policy that we've described so far. And the reason that's happened is I've applied all of those policies. Likewise, cluster two exists. However, this only accepts 2.0.0 and greater. And we can automate some of this, and prove the point, by using kind for some CI to apply the policies and deploy our applications. And there we have it: a full organization, all done, wrapped up, compliant with policy, all versioned, the CIO all aware of what's going on, everyone well communicated with.
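One way multiple policy versions could coexist in a single runtime is for each resource to declare which policy version it targets, and for the cluster to evaluate it only against that version if it is installed. The Python sketch below models that idea; the `policy-version` label, the `Cluster` class and the department list are assumptions for illustration, and the real admission control in the example organization is done by Kyverno rather than by code like this.

```python
from typing import Callable, Dict, Iterable

Labels = Dict[str, str]
Check = Callable[[Labels], bool]

POLICY_LABEL = "policy-version"  # hypothetical label naming the targeted version
ALLOWED_V2 = {"finance", "engineering"}  # invented department list

# Simplified checks for two of the releases described in the talk.
CHECKS: Dict[str, Check] = {
    "1.0.0": lambda labels: bool(labels.get("department")),
    "2.0.0": lambda labels: labels.get("department") in ALLOWED_V2,
}


class Cluster:
    """Models a cluster with a chosen set of policy versions installed."""

    def __init__(self, installed: Iterable[str]) -> None:
        self.installed: Dict[str, Check] = {v: CHECKS[v] for v in installed}

    def admit(self, labels: Labels) -> bool:
        """Admit only resources that target an installed version and pass it."""
        declared = labels.get(POLICY_LABEL, "")
        check = self.installed.get(declared)
        return check is not None and check(labels)


cluster_one = Cluster(["1.0.0", "2.0.0"])  # accepts every version shown here
cluster_two = Cluster(["2.0.0"])  # accepts 2.0.0 and greater only
```

With this shape, retiring an old policy version is a matter of no longer installing it on a cluster, and a resource that declares no version at all is rejected everywhere.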
So this sounds great, right? But just one more thing. Wouldn't it be awesome if the policy carried a story about why it exists? After all, if your agile delivery teams are even half effective, they should be rejecting anything that they perceive as friction if they don't see the value in it. It could allow our developers to know why they are compliant, and if they want to do something outside of what the policy permits, they don't need any sort of exemption granted per se. They can actually go and have a well reasoned and informed debate, with a rationale, behind a pull request to the policy that is attached to an understood risk of why the policy exists. So imagine, if you will, it going through a set of versions, with risks that inform the mitigations manifested as the policy, all maintained as one living thing. So when the risk landscape changes, your policy can move with it. So, for example, when some new privacy regulation comes out, or your latest marketing strategy pays off and you acquire more data: even if your policy was perfect at one point in time, the risks and the appetite will stand still for no one.
So we can liken this in some ways to the over-provisioning that we might be familiar with from elsewhere, where lead times are long, change is hard, and there is significant pressure to nail it the first time, which can lead us to hedging bets against what that future state might be, rather than proportionate mitigation of the risks that are actually more tangibly real right now. So that's where the real culture change is needed, and the explanation of that is likely a long series of talks in and of itself.
So now this is really all over to you. Honestly, the best thing that you could do right now is tell me this is all madness, already done, irrelevant, or just otherwise unachievable, something my esteemed echo chamber of peers have yet to do.
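As a rough sketch of a policy that carries its own story, a rule could be stored together with the risk it mitigates and the rationale for it, so that a rejection explains itself and points at where to debate a change. Everything here is hypothetical: the `PolicyRule` fields, the wording of the risk (loosely based on the cross-charging motivation mentioned earlier in the talk) and the placeholder URL are assumptions, not part of the example organization.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PolicyRule:
    """A rule versioned together with the risk and reasoning behind it."""
    version: str
    risk: str
    rationale: str
    change_url: str  # where to open a pull request against the policy

    def evaluate(self, labels: Dict[str, str]) -> Optional[str]:
        """Return None when compliant, otherwise a message explaining why not."""
        if labels.get("department"):
            return None
        return (
            f"Rejected by policy {self.version}: missing 'department' label. "
            f"Risk: {self.risk} Rationale: {self.rationale} "
            f"Disagree? Propose a change at {self.change_url}"
        )


department_rule = PolicyRule(
    version="1.0.0",
    risk="Spend cannot be attributed to a department for cross charging.",
    rationale="Every resource must name the department that owns it.",
    change_url="https://example.com/policy/pulls",  # placeholder, not a real repo
)
```

The design intent is that a developer who hits the rule sees the risk it is tied to, and can argue about that risk in a pull request instead of asking for a one-off exemption.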
But beyond making pull requests and developing the theory more, I'd really like to start building some test cases with other willing organizations, and that will allow me to swap out more of my imaginary friends for some real ones. But the most important thing that I want you to remember from our time together today is, and please do feel free to say this out loud with me: purposeless policy is potentially practically pointless policy. I've been practicing saying that far too many times.
I've been Chris Nesbitt-Smith. Thank you so much for your time. You're now free to disconnect and leave. I will work with the organizers to make sure the evidence of your guilt admissions earlier is destroyed. Like, subscribe, whatever the kids do, on LinkedIn, GitHub and whatever; you can be rest assured that there'll be no spam, or really much content at all, since I'm pretty awful at self promotion, especially on social media. cns.me just points to my LinkedIn, and talks.cns.me contains this and some other talks. They're all open source on my GitHub. Questions are very welcome on this or anything else. I'll be around in the comments for a while, so please do reach out or ping me a message on LinkedIn. It'd be great to hear from you and hear what your thoughts are. Thank you very much.

Chris Nesbitt-Smith

Consultant @ UK Government



