Conf42 DevSecOps 2021 - Online

Using Infra-as-code, not Jira tickets to pass security and compliance audits

Abstract

Jira tickets are often seen as a necessary evil for satisfying compliance audits. However, infrastructure as code can replace tickets while providing real security benefits.

Learn how Teleport utilizes Terraform to make developers, auditors and the security team happy!

Summary

  • Travis Gary: How you can get rid of Jira tickets in your organization by embracing infrastructure as code. Pass your compliance audits the developer's way and improve your security too. This advice primarily applies to a SOC 2 Type II certification.
  • No amount of Jira tickets is going to actually improve your security. GitHub is a fully featured tool for both planning and change management. We should stay native and use the tools where developers already are, and that's GitHub.
  • Change management often happens in Jira because that's maybe where your business people are, but not your developers. Change management works far better in GitHub. It's all about automation: every team you include in the process slows it down dramatically.
  • It's a different world now. We can roll back entire data centers' worth of infrastructure just by doing a revert. A lot of people don't think about the SaaS apps that really control everything. You can apply these principles to all the things in your tech stack.
  • Get rid of all your GitHub admins. You can do the same to Terraform itself and to all sorts of SaaS apps. As you do, your ticket count will go down. No changes outside of code, no tickets. I hope this talk helps empower you to apply these lessons at your organization.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi, I'm Travis Gary, IT director here at Teleport. Today I'm going to tell you how you can get rid of Jira tickets in your organization by embracing infrastructure as code. I'll give you some practical examples and explain why you should do this and the general philosophy behind it. So let's get started.

Terraform, not Jira tickets: pass your compliance audits the developer's way and actually improve your security too.

A quick disclaimer. This advice primarily applies to a SOC 2 Type II certification, and that's the only one where I can guarantee 100% that it works, because that's the one certification Teleport has. But I can tell you I believe it most likely works for ISO 27001, because that's about 90% similar. I should note that NIST for government certification and SOX for public companies are much stricter, but having gone through some of those processes, I do believe these same lessons can be applied. I should also note there's a little bit of trash-talking of ticketing systems here. They do serve an important purpose, primarily for support teams, and we'll go into a little more detail about that later. And finally, if you're really embracing infrastructure as code, you have to remember that IaC pipelines have really powerful API keys, and it takes a lot of work to secure them. There are lots of ways to exfiltrate API keys, and if you're replacing dangerous console access with infrastructure as code, you have to make sure you have a way to protect those keys. We're not going to cover that in this talk, but there are lots of resources you can look up.

All right, let's jump into this. First, why are we doing this? Mainly because hackers don't care about your change management process. No amount of Jira tickets is going to actually improve your security. So we have to ask: why did we even start doing tickets in the first place?
If they're not helping security, why are we doing it? Isn't the point of compliance audits to actually improve your organization's security? So how did we end up doing this via tickets? There are some good reasons, but they're a bit dated.

Let's look at the common reasons people give for using ticketing systems. Often people will say: we use Jira for planning, so we should also use it for change management. Or you'll hear that it's because IT or systems work is considered a service organization. Or finally, you might hear from some older IT directors that "we follow ITIL philosophies", and we'll dive a little into what that means if you haven't heard of it before.

So, jumping into the first one: teams will say we already use Jira, so that's just the way it is. But GitHub is a fully featured tool for both planning and change management, and it has improved a lot on the planning front, especially recently. You can assign your issues to project boards and have kanban for planning sprints and doing agile, just like you would in Jira. But it's even better, because issues are innately linked with pull requests. You can use tags for automation. They're also great for auditing, to query different types of tickets for different views, and you can set up milestones for release management. All of this is built in. If you've worked for a large enterprise, they might have built a lot of integrations linking Jira functionality to GitHub. But why do we need two systems? That's just more work to integrate. We should stay native and use the tools where developers already are, and that's GitHub.

The next part is change management. Change management often happens in Jira because that's maybe where your business people are, but not your developers. But change management works way better in GitHub.
A pull request is a change request, but it's better than a Jira change request because you can't just change what it is once you ship it. When you merge, you approve the exact code or the exact infrastructure change described there. That's very different from an older form of change management, where you describe a change in a Jira ticket and then someone has to perform that change manually. If they're performing it manually, they could make a mistake, or if their credentials were stolen by a hacker, any change could be made with those credentials. Doing it the GitOps way, only what's approved via Git is what ultimately happens. In that scenario, if you compromise a developer account, you'd have to submit a PR and then convince someone else to code review it, and it would also have to pass your automated tests.

This is a huge improvement that we should all be familiar with from the DevOps world. Automated testing has already started to replace QA departments, and other DevOps lessons can replace the rest of these manual change management needs. For example, if you work in an ITIL organization, they might say you need a rollback plan in order to ship your change. With GitHub, of course, rollback is as simple as a revert.

And finally, to make your auditors happy, GitHub is the best audit trail. We can see exactly what happened, who approved it and who reverted it. Everything is right there. It doesn't have the kind of reporting that a lot of people like Jira for, but it has a very full-featured API, and you can write some easy scripts to pull out the kind of CSV files you need to make your auditors happy.

Let's take a look at the next reason people say we can't use GitHub and need Jira. Often it's because they say IT is a service organization. Remember, I mentioned that ticketing queues are for service organizations. But that's the IT world of old.
The new IT, systems or platform DevOps way of thinking is that we're a platform team: we want to make tools that enable developers to do their jobs better. We don't want to do the work for developers and make those changes for them. We want to give them the tools so that it happens automatically. It's all about automation here. Every team you include in the process slows the process down dramatically. I think we've all probably worked in an organization where you have to go through a request process to make changes. You put in a request to IT on Monday, they finally get to it on Tuesday, then they have to send it out for approval, then it needs to be reviewed by the change management board or change advisory board. Before you know it, you've wasted an entire week just waiting for the changes to be approved. Those things are wholly unnecessary.

The most we should do is have two people, preferably on the same team, two devs. This is just like the code review process you're used to, and we can apply the same concept to all sorts of places where we used to do change request tickets. Now, I should note that some very strict compliance requirements, like the ones you might find in NIST or SOX, do require approval from an independent second party. Often this is an application owner: if you need access to an application, you need to ask the application owner, not someone on your team, whether that's okay. That still fulfills the two-person rule. Using clever code reviews on GitHub and CODEOWNERS files, you can make that process happen pretty automatically. There are also other access management tools on the market that handle access approvals and the like outside of Jira, quickly and easily, rather than in a ticket-based workflow.

And finally, the reason IT should not be a service organization is that requests just don't scale.
You can't have an effective dev team that follows the right developer philosophies, that we want to ship code fast and automate things, if there's a manual process in the loop. If you have to rely on an IT team to, say, complete a DNS request for you and get it approved, it's just not going to scale. If you're shipping fast, or you have developers all over the world, suddenly you need a really responsive 24/7/365 support desk, and that's way too much to ask of a lot of small companies. Building a global team is a lot of work. I know personally that it's hard, and it's also stressful for those teams. It's a bit of a fool's errand to try to build this, especially at a small company, instead of investing your time in creating tools and behaving more like a platform team. That allows developers to self-serve and solve issues by themselves within their own time zone, hopefully with a coworker in that same time zone. That's what's going to scale, and that's what's going to give your organization a competitive advantage. It only happens when your IT team starts behaving like a development team rather than a service organization. That's what really allows you to ditch the ticket queues we're all so familiar with, the ones service organizations rely on.

So, the final one: ITIL. When you hear this word, I want you to think of a dead dinosaur, because that's what ITIL is. It's a philosophy from the past, from when IT people were racking servers and running bare-metal compute. That's not the case anymore, and we need to let it go. To give a history lesson for folks who have hopefully had the luck of not working in an ITIL-based organization: ITIL was created to manage processes that are really manual and error-prone.
If you're having to rack servers, you have to talk to a lot of teams. You have to talk to finance and procurement, and you have to plan where the server is going to go in the rack. Doing rollbacks is not as easy as saying, "Oh, we're going to revert in Git and it's fine." No, a rollback is a lot of work: it could take hours to move servers around, reimage them and change the configuration back. That's what ITIL was developed for. It was developed for another era, so trying to hold on to it is not going to help your development teams. It's just going to slow them down. It's time to stop following IT philosophies from the pre-cloud era.

It's a different world now. We can deploy servers with the click of a button. We can roll back entire data centers' worth of infrastructure just by doing a revert and watching Terraform taint and rebuild all the infrastructure you need to run a modern app. So we need to make sure the entire rest of your tech stack is as sophisticated as deploying your infrastructure is with Terraform or other infrastructure-as-code tools.

So let's talk about actually applying some of these lessons. A lot of people are familiar with using IaC systems like Terraform to deploy AWS infrastructure or set up GCP, but it's also really helpful for other parts of your tech stack. A lot of people don't think about the SaaS apps that really control all of this, things like GitHub and Okta. A lot of the time those systems are still controlled by sysadmins who are manually pointing and clicking in the console to make changes. But think about how powerful those systems are. If you follow the GitOps philosophy, GitHub controls everything. And if you're doing proper access controls with RBAC or the newer ABAC, attribute-based access control, then your directory system, like Okta, controls access to everything.
So if we don't let people manually deploy servers via the AWS console, why do we let people manually make changes in the GitHub console or the Okta console, which are arguably more powerful and dangerous because they control all the other systems? Let's think about this. If you use GitHub to manage your infrastructure, then a compromised GitHub admin owns your infrastructure. So it's critically important that we get rid of GitHub admins. But if we're getting rid of GitHub admins, how do we do the admin work? You've probably figured this out: we're going to use Terraform. You can Terraform your GitHub instance from GitHub itself. You want to apply these principles to all the things in your tech stack, and that includes Terraform Cloud itself. You can apply these lessons to the same systems that are doing the managing, and you should.

We're going to look at a really short practical example of something we did here at Teleport: Terraforming Okta. We're going to apply attribute-based access directory rules via Terraform to eliminate Jira tickets for a really common task in an IT department, handling access requests. It's just three easy steps, and you can apply the same concept to a lot of different systems. So let's take a look. We're actually going to have some code in this discussion.

First, you want to understand the schema of the relationship between users and groups. You need to create a directory group for every single app. I prefix these with "app" followed by the system name, so we might have one that is app GitHub or app Salesforce. That group is assigned to an Okta application, and it lets users in through the front door: it authorizes them to authenticate via the Okta directory, hopefully with SAML and not a password, and log in to that app. Ideally that login should carry no entitlements.
It should be a basic read-only role, the least-privileged user that people will want. Then, for all the other kinds of users, we should create a role group for each.

In our code example here, we have our basic group for Salesforce, and we're writing some attribute-based rules to decide who should go in the group. We look at the user profile and at the department field to decide who should get access to Salesforce. In this simple example, it's the sales team or the marketing team. Then, for the bigger role entitlements, like who's a Salesforce admin, we can again use attributes. You could say: you're in the IT department and you're a manager. I wanted to call out this example because there are often weird exceptions. We can't always use attributes, so sometimes we can just name names here and keep it easy. If we wanted to add a new Salesforce admin, we could create a pull request, add a new person right here and have someone approve it.

I should note that you should make these groups and roles even for systems that don't support automatic provisioning of roles. Salesforce does support it: I can assign the admin role to the two of us because we're in that group. Not all systems do, but you still need to create these groups, because they're an important placeholder for change management. Otherwise you would need to create a Jira ticket to keep track of this, so we keep track of it here. It's also an important form of future-proofing: eventually the system might support automatic role provisioning, or you might decide a critical system is important enough to write your own integration to make that happen. A lot of good systems, like AWS, Salesforce and Teleport, support this kind of setup, where you map groups within Okta to certain roles and the members of each group are assigned those roles.
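The rule logic described above can be sketched in plain Python. This is only an illustrative model of how the attribute rules decide membership, not the Terraform or Okta configuration from the talk. The group names (`app-salesforce`, `app-salesforce-admin`), the `User` fields and the named exception are all assumptions made up for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class User:
    """Minimal stand-in for a directory profile (hypothetical fields)."""
    email: str
    department: str
    is_manager: bool = False


# People granted the admin role by name rather than by attribute.
# The address is a placeholder; in practice this list changes via pull request.
SALESFORCE_ADMIN_EXCEPTIONS = {"alex@example.com"}


def in_app_salesforce(user: User) -> bool:
    """Base access group: anyone in the sales or marketing department."""
    return user.department in {"Sales", "Marketing"}


def in_salesforce_admin(user: User) -> bool:
    """Admin role group: IT managers, plus anyone named explicitly."""
    by_attribute = user.department == "IT" and user.is_manager
    by_name = user.email in SALESFORCE_ADMIN_EXCEPTIONS
    return by_attribute or by_name


def groups_for(user: User) -> List[str]:
    """Evaluate every rule and return the groups the user lands in."""
    groups = []
    if in_app_salesforce(user):
        groups.append("app-salesforce")
    if in_salesforce_admin(user):
        groups.append("app-salesforce-admin")
    return groups
```

Adding a new admin in this model means adding one address to `SALESFORCE_ADMIN_EXCEPTIONS`, which is the kind of one-line change a reviewer can approve in a pull request.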
And you can see the Terraform code here is quite simple. It's just a quick loop over all the different apps that creates each group and then the associated group rule, which uses the attribute-based access controls we described to put people in the group. As we mentioned, you want to do this anyway, even when you don't have automation for it. That way we've created a request, approval and audit system that lives entirely in Git, and we've eliminated the need for any access requests in Jira.

The next step, once you've done that, is to remove the ability for admins to manage those groups within the console. This is a DevOps lesson that a lot of people apply in AWS. When you reach this happy DevOps nirvana, you actually take console access away from developers, because they need to make their changes via Terraform. In this case, we'll remove just the group admin permission for all the groups that are now managed by Terraform. If we can, we should manage 100% of our groups in Terraform. If that's not realistic for you, you can at least do the ones that control some sort of access-based permission, because you don't want, say, an IT help desk associate to be able to decide who gets AWS admin.

Step three: you want to alert on any changes made outside Terraform. This makes sure that nobody was able to circumvent your IaC process. It's important for proving to your auditors that this was the only way changes were made, and it's also a great aid to security investigations if a hacker was able to find their way around your process. So you want to connect Okta to your SIEM, your security information and event management platform. If you don't have one, and they're quite expensive, you can hack it together using Okta webhooks and Zapier, a really cheap low-code solution.
What you want to do is write an alert that fires any time a group change is made by anyone other than the Terraform service user, in case someone was able to log in by other means because something was misconfigured. You can also check the metadata: did the request come from the IP we expected for Terraform Cloud? Or maybe someone stole our Terraform service user's credentials and was able to use them elsewhere. The SIEM really helps make sure that no one got around the process. You should still do an occasional audit, going through your logs on a quarterly or annual basis, to make sure nothing slipped through the cracks, such as a missed alert on an unauthorized change that was not made through Terraform.

Finally, any good loop has a step N: you want to repeat this process until you reach 100% Terraform coverage. Keep doing this for the other resources you have in Okta, such as your authentication policies and your application setup, everything you can, until you've reached 100% code coverage. At that point you get to the really cool part, removing console access entirely, and you can create what's called a break-glass account. Of course, if Terraform or the IaC process breaks down, say there's an incident and Terraform is down, you need a way to get in. What you can do is create a service user that is your super admin. We use 1Password as our password store, and I highly recommend it, especially now that in a recent release you can also connect it to your SIEM. We set up an alert so that if the break-glass service user's credentials are accessed, it creates an incident, because the only reason we should ever be using those credentials is during an incident. If someone's using them outside of an incident, they're either breaking the rules or they're a hacker trying to compromise your system, and you want to know that fast.
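The alerting rules just described can be sketched as a small Python filter over log events. This is a minimal sketch under stated assumptions, not the SIEM configuration used in the talk: the flat event dictionary, the service user name `terraform-svc`, the network range, the `vault.item_accessed` event type and the item name `okta-break-glass` are all hypothetical placeholders, and the group-change event type strings are illustrative.

```python
import ipaddress
from typing import Optional

# All of these values are placeholders for illustration.
TERRAFORM_ACTOR = "terraform-svc"
TERRAFORM_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BREAK_GLASS_ITEM = "okta-break-glass"
GROUP_CHANGE_EVENTS = {"group.user_membership.add", "group.user_membership.remove"}


def from_expected_network(ip: str) -> bool:
    """True if the address falls inside a range we expect Terraform to use."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in TERRAFORM_NETWORKS)


def alert_reason(event: dict) -> Optional[str]:
    """Return why this event should raise an alert, or None if it is expected."""
    kind = event.get("type")

    if kind in GROUP_CHANGE_EVENTS:
        # Rule 1: only the Terraform service user may change groups.
        if event.get("actor") != TERRAFORM_ACTOR:
            return "group change made outside Terraform"
        # Rule 2: the service user's credentials may have been stolen,
        # so also check where the request came from.
        if not from_expected_network(event.get("ip", "0.0.0.0")):
            return "Terraform service user seen from an unexpected IP"

    # Rule 3: any access to the break-glass credentials opens an incident.
    if kind == "vault.item_accessed" and event.get("item") == BREAK_GLASS_ITEM:
        return "break-glass credentials accessed"

    return None
```

For example, a membership change by a help desk account returns "group change made outside Terraform", while the same change by `terraform-svc` from inside the expected range returns `None`.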
So that's the process. If you reach that 100% coverage, you don't need change management tickets at all for any admin functions within that platform. You can apply these same lessons to other important systems in your tech stack, like GitHub. Get rid of all your GitHub admins. They are so powerful and dangerous. You can do it to Terraform itself, you can do it to all sorts of SaaS apps, and you can keep applying these lessons. As you do, your ticket count will go down. You can't just throw out Jira immediately. You have to slowly carve away at it and reduce that ticket number as you increase your code coverage.

If you're going to remember one lesson from this whole talk, it's that tickets are only for changes made outside of code. No changes outside code, no tickets. Remember that we developed tickets for service organizations and for these older philosophies where we had lots of manual processes, because manual processes are very error-prone. You have to come up with systems to track manual processes and plans to make sure you don't make mistakes. But when you do things in code, we no longer need that. GitOps has paved the way to remove all those manual processes. So you want to do this completely up and down your stack, including managing the SaaS apps in the realm of IT, which is traditionally still done with Jira tickets.

I hope this talk helps empower you to realize that you can apply these lessons at your organization. It's going to make life easier for your developers. You're going to be more agile, you'll be able to work remotely across many time zones, and you'll be able to get changes done quicker because you don't have to interact with as many teams. You're going to be more secure, because only the changes that actually happen in GitHub are what's happening in your systems, and it becomes very hard to circumvent that process.
And finally, your IT teams are going to be a lot happier actually working as engineers, writing code and building systems and platforms, rather than responding to ticket queues and behaving like a service organization. So this is really a win-win for everybody. It does require some upfront investment, but I promise you it's worth it. It has drastically improved our process, and we can't wait to expand our code coverage to more and more systems, because we're already seeing the benefits in the time we no longer have to spend handling access requests. We're now able to spend that time automating more systems, writing more IaC, writing more tests, improving our SIEM alerts and doing all the other things we enjoy as engineers, rather than responding to ticket queues. So thanks for tuning in today, and I hope you can apply these lessons at your organization.
Travis Gary

IT Director @ Teleport
