Conf42 DevSecOps 2022 - Online

Say Goodbye to Manual Kubernetes User Access Onboarding


Abstract

This talk will focus on the challenges of configuring access control for Kubernetes clusters and why it's so important to make Kubernetes access both simple and secure. Any engineer who has worked with Kubernetes before, whether as an administrator, user or developer, knows that cluster configuration is a massive iceberg. At the tip of the iceberg, you have "just make it work." At this level, one engineer can access one cluster. In some cases this can be tricky enough by itself.

Below the surface, however, other problems appear quickly: infrastructure security, credential management, identity-native access, RBAC role management, audit logging and compliance standards. And once you figure those out, you have to ask: what about at scale? What if you have tens, hundreds, maybe thousands of clusters? What if you have a team of 40, or 200 engineers? Configuring Kubernetes access in a secure, manageable way can be an extremely daunting task. This is where open-source Teleport comes in. Teleport makes it easy to securely onboard and off-board Kubernetes access for engineers at scale, without hours of manual configuration and without using long-lived credentials. This talk will include an overview of the problem space for Kubernetes access today, an in-depth look at the technology behind Teleport and a live demo of accessing and managing a cluster with Teleport. Learn how open-source Teleport can ease the stress of your DevOps team and allow your security engineers to sleep peacefully at night without worrying about Kubernetes attacks.

Summary

  • One in four employees surveyed still have access to old passwords. 41.7% of employees admitted to having shared workplace passwords. Over 6 million secrets were leaked in GitHub, a two times increase since 2020. Every security breach has two things in common: a human error for the initial infiltration and an attempt to pivot to maximize the blast radius.
  • Open-source Teleport is a secure access control platform. It helps you manage all of your infrastructure access in a single place. In order to have secure access, you need authentication, authorization, connectivity and audit.
  • In Teleport 11, which we just released, we added support for GitHub Actions. This means that you can interact with Teleport-protected resources directly from GitHub Actions workflows. We also added K8s support for automatic service discovery. It's easy to scale K8s resources up and down using Teleport.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi, and thank you so much for joining me today. We'll be talking a little bit about problems with Kubernetes access today and how you can say goodbye to manual Kubernetes user access onboarding using solutions like open-source Teleport. So this is what a terminal looked like at a job I had many, many years ago. We had this file sitting there called secrets.txt. What do you think will happen if we run this command, cat secrets.txt? We get some secrets. These were access credentials for various cloud infrastructure at my, again, old job. What happened was that a lot of people on the team ended up saving credentials like this to their local machines, because the way we accessed our infrastructure was just so terrible, took forever, and was filled with pain points.
The way it was supposed to work was that we had a credentials vault that you would go to every time you needed to access a resource. You'd go into the vault, search for the resource name, open up that file, and then copy and paste those certificates or access keys or whatever back into the terminal and access the resource. The problem was that there were so many resources with ambiguous names. There were no naming conventions. Old credentials weren't always deleted, so sometimes they were deprecated, and it just took forever and was such a hassle. So what a lot of engineers did, for the resources we had to access most frequently, was copy and paste those keys locally and save them in files like this. Now, obviously, this is terrible security posture, just absolutely abysmal. But the reason for it, and the lesson we can learn, is that the most secure thing also has to be the simplest thing. Otherwise engineers are going to find workarounds. If you're sacrificing engineer productivity for security, you're going to have problems like this in very, very many cases.
And this is a huge problem, because when employees leave, they may still have access to these locally stored passwords. In a survey done by Beyond Identity in 2021, they found that one in four employees surveyed still have access to old passwords, just like we saw in the last example, and 41.7% of employees admitted to having shared workplace passwords. So this is, again, just like we saw: there were shared resources that people had all the credentials for. And when they leave, they might still hang on to those credentials, which still allows them to access resources that they should not have access to anymore, and that will eventually lead to a security breach. We found that every security breach has two things in common: a human error for the initial infiltration, which could be something like keeping a sticky note on your computer, and an attempt to pivot to maximize the blast radius.
So, human error. This is from a GitGuardian survey in 2021. They found over 6 million leaked secrets in GitHub, a two times increase since 2020. And the types of secrets leaked here are not surprising. It's a lot of cloud infrastructure access keys, like AWS IAM tokens, Azure API keys, Google Cloud keys, and Scaleway tokens. And these numbers are only going to increase as companies scale up their infrastructure. And you might say, okay, well, but we're not open source. Well, too bad, because 85% of those corporate leaks came from developers' personal repos. When they were doing development, they would fork the company's private repo into their own public personal one. They would make the code changes and merge them in, but they would not delete the repo. The other 15% came from public corporate repos. These are GitGuardian's findings, and they're finding that these types of mistakes are only increasing. The next step, of course, is to maximize the blast radius.
For example, say you get into a Slack workspace, like in a recent high-profile hack that happened a couple of months ago. From there you can get access to a server, and once you're in a server, you can elevate your privileges using some kind of privilege escalation attack. From there, do what you do: maybe get some customer data, get some internal company files, and then profit from those. Your mileage may vary. You might end up on a beach somewhere, or somewhere a little bit less pleasant.
So what does that mean for a cloud world today? Well, according to the container research report published by Datadog in 2022, their scope was around 1.5 billion containers and tens of thousands of companies. So this is a pretty large survey of those companies with all those containers. They found that Kubernetes usage is actually rising in this container ecosystem, so more and more people are using Kubernetes. They found that over 50% of these companies are now using Kubernetes to manage their container access. All those Kubernetes clusters are configured by humans, who make mistakes. So Kubernetes configuration is a huge problem today and a huge hurdle for many development teams. This survey also found that 40% of these clusters are still using lax privileges. What this means is that these user accounts are configured with privileges such as being able to list all the secrets, create workloads or certificates themselves, or even do privilege escalation and token requests, so they can actually elevate their own privileges beyond what their RBAC rules are supposed to allow and have access to a myriad of services that they otherwise should not be allowed to access.
And this is the Kubernetes iceberg, as I like to call it. On the very tip of the iceberg, you have making it work, right? Making Kubernetes work: configuring one cluster, all of your networking, all of your microservices together, allowing users to access it, making sure it doesn't go down.
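The lax privileges described above (listing secrets, creating workloads or certificates, token requests, privilege escalation) can be looked for mechanically. The following is a minimal sketch, assuming each rule is a plain dictionary shaped like a `rules` entry of a Kubernetes Role or ClusterRole (`resources` and `verbs`). The `RISKY_PATTERNS` table and the `risky_findings` helper are hypothetical names for illustration; the checks are not exhaustive and are not taken from the Datadog report.

```python
from typing import Dict, List, Set, Tuple

# Illustrative, non-exhaustive patterns: (finding label, resources, verbs).
RISKY_PATTERNS: List[Tuple[str, Set[str], Set[str]]] = [
    ("reads secrets", {"secrets"}, {"get", "list", "watch"}),
    ("creates workloads", {"pods", "deployments", "daemonsets", "jobs"}, {"create"}),
    ("creates certificate requests", {"certificatesigningrequests"}, {"create"}),
    ("requests service account tokens", {"serviceaccounts/token"}, {"create"}),
    ("escalates privileges", {"roles", "clusterroles"}, {"escalate", "bind"}),
]


def risky_findings(rules: List[Dict[str, List[str]]]) -> List[str]:
    """Return the labels of risky patterns matched by any RBAC rule."""
    findings: List[str] = []
    for rule in rules:
        resources = set(rule.get("resources", []))
        verbs = set(rule.get("verbs", []))
        for label, risky_resources, risky_verbs in RISKY_PATTERNS:
            # A "*" wildcard in a rule matches every resource or verb.
            resource_hit = "*" in resources or bool(resources & risky_resources)
            verb_hit = "*" in verbs or bool(verbs & risky_verbs)
            if resource_hit and verb_hit and label not in findings:
                findings.append(label)
    return findings
```

A rule that only allows `get` and `list` on `pods` produces no findings, while a wildcard rule (`resources: ["*"]`, `verbs: ["*"]`) matches every pattern in the table.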
Making it work can be extremely hard on its own. Kubernetes is a complex beast. Configuring it to work with even a couple of engineers can be very difficult. Then, under the surface: okay, now that it's working, let's make it work securely. And now things start getting even more difficult. When you're dealing with RBAC rules, you want to use identity-based access, say passwordless even. You want to make sure all of your networking rules not only work, but are secure from outside attack. At the very bottom of the iceberg, way under the water, in the very depths, you have: let's make it work at scale. We made it work for a team of two or three engineers. What about 40? What about 100? You don't have one cluster, you have thousands at scale in production, and you need it to be secure and consistently available. This is when things become extremely tricky.
And the fact of the matter is that Kubernetes is not safe by default. You have to have airtight configurations, otherwise bad actors are going to take advantage of your configuration. And this is absolutely key. A good configuration is the difference between engineers being able to use your product securely and outside actors being able to find those vulnerabilities, exploit them, and get into your systems. So, the types of configuration. This is an example K8s configuration file for a deployment. You have to consider etcd security. You have to consider secret management, safe networking policies, pod-to-pod communication, application-level security. What does an individual pod look like? RBAC policies for users. Audit logging: make sure all that activity is being logged and monitored.
Cluster onboarding, adding new clusters, scaling up, scaling down, ephemeral cluster control: in all of this, there are so many places where a manual configuration can go wrong, and you just get more and more of these files until they all add up and there's going to be a problem somewhere. So what can we do about that? Well, luckily there are solutions available, like open-source Teleport Kubernetes Access. Teleport is a secure access control platform. It helps you manage all of your infrastructure access in a single place, including your Kubernetes access. And it operates on these four pillars of access: in order to have secure access, you need authentication, authorization, connectivity, and audit. This is what Teleport provides.
The first step is authentication. Rather than using any long-lived credentials (no SSH keys, no passwords, no long-lived certificates), Teleport acts as its own certificate authority. It generates an identity in the form of a short-lived X.509 certificate for the user and ties that identity to a role managed by Teleport, mapped to the identity from their SSO. So say you log in with Okta or GitHub: Teleport will use that identity and issue a short-lived certificate per user, per session, for Kubernetes access.
The next step is authorization. Teleport will automatically approve or deny access requests to a range of resources: servers, databases, Kubernetes clusters, microservices, and various CI/CD systems. It'll always make sure that your users are accessing only what they are allowed to, based on their RBAC permissions.
You also have connectivity. Teleport acts as its own proxy, which means that it establishes a connection between the user and the requested resource using a reverse proxy tunnel from the Teleport server to the resource.
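The short-lived identity described in the authentication step can be modeled in a few lines. This is a sketch under stated assumptions, not Teleport's implementation: it models only the validity window that an X.509 certificate carries (its not-before and not-after times) and skips all cryptography. The `ShortLivedIdentity` class, the `issue_identity` function, and the TTL values used with it are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Tuple


@dataclass(frozen=True)
class ShortLivedIdentity:
    """Stand-in for a short-lived certificate: who it is for and when it is valid."""

    user: str                 # identity asserted by the SSO provider
    roles: Tuple[str, ...]    # roles tied to that identity
    not_before: datetime
    not_after: datetime

    def is_valid(self, now: datetime) -> bool:
        # Outside the validity window the credential is useless,
        # so nothing long-lived is left to leak or forget to revoke.
        return self.not_before <= now < self.not_after


def issue_identity(
    sso_user: str, roles: Tuple[str, ...], now: datetime, ttl: timedelta
) -> ShortLivedIdentity:
    """Issue an identity that expires `ttl` after `now`."""
    if ttl <= timedelta(0):
        raise ValueError("ttl must be positive")
    return ShortLivedIdentity(sso_user, roles, now, now + ttl)
```

The design point is that expiry is built into the credential itself: an engineer who leaves the company holds nothing that still works once the window closes.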
All of the traffic between the user and the resource, every command being run, every user session, passes through the Teleport proxy, making it secure and fully encrypted using TLS.
The next step is audit. This is a huge thing from a diagnostic and compliance standpoint. You need to make sure that your clusters are being accessed only by those who are allowed to access them, and you need to monitor their activity in case something goes wrong. With Teleport, all of your audit logs are sent to a central location and can be managed from there. No matter what region your cluster is running in, or how many clusters you have, all of those audit logs get streamed into the same location, making them easy to manage and monitor.
So here's the high-level architecture. This is an example where you're hosting your Teleport instance in the AWS cloud, along with a Kubernetes cluster there as well. You have this Teleport kube agent running on your Kubernetes cluster, and it communicates with Teleport. Your Teleport cluster then communicates with the user. So if a user wants to access the Kubernetes cluster, they'll log in with their SSO, in this case GitHub, which will confirm their identity. Then Teleport will say, okay, what is their authorization level? What are they allowed to access? And once they're there, what permissions do they have? Teleport will then communicate with the Kubernetes cluster, authenticate with it, and grab that kubeconfig. It'll pass that kubeconfig to the user, and the user will be able to run kubectl commands from their local machine just as if they were in the cloud itself. And again, all of this traffic is being passed through the Teleport proxy service.
Here's an example using Machine ID. Machine ID is Teleport's automated way to do access control. So rather than a user accessing Teleport, in this case we have this worker node, and this can be in a CI/CD workflow.
And rather than having a shared credential for this worker node, Machine ID will run in the background and fetch a credential from Teleport every 20 minutes (this is completely configurable), and the node will have its own identity. So if you have a bunch of different worker nodes and have trouble keeping track of them all, Teleport is a great solution. Every worker node, every microservice, every process will have its own identity, just the same way a human would. That allows you to keep control of your complete CI/CD automation infrastructure and to easily manage and scale this logging and this access control.
So let's take a look at Teleport in action. Now I'm going to give you a little demo of how Teleport works. Over here on the right, we have our web console, and this is just the public address of our Teleport cluster. When we log in, we're going to authenticate using GitHub as an SSO. We log in through GitHub, and now we see all of the servers that we have access to. We can see all of our SSH nodes here, we have our applications and our Kubernetes clusters, and we also have our databases and our Windows desktops. So we're going to see how to log into a Kubernetes cluster using Teleport.
Over here on the left we have my local machine. The first thing we're going to do is tsh login to our cluster, using GitHub for authentication, with the user Dumez K, that's me, and the address of the Teleport host. This will log us into the Teleport cluster. Now we're logged in, we have that short-lived certificate, and Teleport knows who I am. All of this activity is being mapped to my identity. Next we're going to see what Kubernetes resources we have access to. Perfect. So we have our K8s host here, and this is the one that we're going to log into. And tsh is just a little command-line tool that allows the user to access Teleport resources and interact with the cluster.
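The login and listing steps from the demo so far can be written out as the command lines they correspond to. This is a small sketch in Python that only builds the argument lists; it does not run anything. The `tsh login` flags `--proxy`, `--auth`, and `--user` and the `tsh kube ls` subcommand are part of the tsh CLI, while the helper function names, the host `teleport.example.com`, and the user `alice` are placeholders, not values from the talk.

```python
import shlex
from typing import List


def tsh_login_command(proxy: str, user: str, auth: str = "github") -> List[str]:
    """Arguments for logging in to a Teleport cluster through an SSO connector."""
    return ["tsh", "login", f"--proxy={proxy}", f"--auth={auth}", f"--user={user}"]


def tsh_kube_ls_command() -> List[str]:
    """Arguments for listing the Kubernetes clusters the logged-in user can reach."""
    return ["tsh", "kube", "ls"]


def render(argv: List[str]) -> str:
    """Render an argument list as a shell-safe command line."""
    return shlex.join(argv)


if __name__ == "__main__":
    # Placeholder host and user, for illustration only.
    print(render(tsh_login_command("teleport.example.com", "alice")))
    print(render(tsh_kube_ls_command()))
```

On a machine with tsh installed, each list could be passed to `subprocess.run(argv, check=True)`; the login step opens a browser for the SSO flow, so it is not suited to unattended use.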
So next we're going to do a tsh kube login to our host. And because I'm already logged in with Teleport, this is all you need to do. I have the access role, which allows me to access this Kubernetes cluster. What Teleport actually did is give us that kubeconfig, so from here we can run all of our kubectl commands. We actually are in this cluster right now and can create pods, can delete, can list, can do all of our different deployments, whatever we want to do.
And if we go in here now, we can see these commands being run in the audit log. We see that I logged in, we see the certificate was issued, we see the SSO login, and we see my request to the Kubernetes cluster, K8s host. If we go into details, we can see those commands and all of the session data that was recorded. We can see the verb, get, the resource, pods, and all of the other session data.
And you can see that I had access to this because we can look in our users. Here we have Dumez K, that's me. The type is GitHub, because it's mapped to my GitHub account. And we can see the roles that I currently have. This access role is the one that actually allowed me to access this Kubernetes cluster. So if we go into roles, we can view this. Here's what a Teleport role looks like. This is an RBAC role that maps to various other RBAC roles in database access, SSH, Kubernetes clusters, or even Windows RDP boxes. You can see my databases and my database users, which I have access to, and that I'm allowed to join sessions. We have K8s here, that's Kubernetes, and the different Kubernetes labels. We can see that I'm in the internal Kubernetes user group and I can log in using this user group. And this is how we map a Teleport role to a Kubernetes RBAC role. We can see that I have access to system:masters, because this is an admin role that I am in. And the neat thing about Teleport is that you can actually revoke these user sessions.
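The mapping from a Teleport role to Kubernetes groups shown in the demo can be sketched as a lookup. The keys `kubernetes_labels` and `kubernetes_groups` follow the field names used in Teleport role specs, and `system:masters` is the built-in Kubernetes admin group; everything else here is a simplifying assumption. Real roles also support lists of label values and deny rules, which this sketch ignores, and the function names and example role contents are hypothetical.

```python
from typing import Dict, List

Role = Dict[str, object]


def labels_match(selector: Dict[str, str], cluster_labels: Dict[str, str]) -> bool:
    """Return True if every selector entry is satisfied by the cluster's labels."""
    if not selector:
        return False  # a role with no label selector grants no cluster access
    for key, value in selector.items():
        if key == "*" and value == "*":
            continue  # wildcard selector matches any cluster
        if key not in cluster_labels:
            return False
        if value != "*" and cluster_labels[key] != value:
            return False
    return True


def kubernetes_groups_for(
    user_roles: List[str],
    roles: Dict[str, Role],
    cluster_labels: Dict[str, str],
) -> List[str]:
    """Collect the Kubernetes groups a user may act as on one cluster."""
    groups: List[str] = []
    for role_name in user_roles:
        role = roles.get(role_name)
        if role is None:
            continue  # unknown role names grant nothing
        if labels_match(dict(role.get("kubernetes_labels", {})), cluster_labels):
            for group in list(role.get("kubernetes_groups", [])):
                if group not in groups:
                    groups.append(group)
    return groups
```

Because the groups are resolved per user and per session, several people can share the same Kubernetes group while each request is still attributed to one individual identity in the audit log.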
You can see the active sessions here, and you can see the various audit logs, all the commands that I'm running. And that's Teleport in a nutshell: how it works and how you can use it to securely access your Kubernetes resources. You can share different Kubernetes user groups between Teleport roles securely because of this audit logging feature and because all of these sessions are mapped to an individual's identity. Even if people are sharing Kubernetes RBAC roles, for instance, they all map back to a Teleport RBAC role, which maps back to your identity. So every activity that you do, every command that you run, and your whole session is mapped to the individual rather than the shared user group.
Great. So what's next for Teleport? Well, in Teleport 11, which we just released, we added support for GitHub Actions. This means that you can interact with Teleport-protected resources such as SSH, Kubernetes, and your databases directly from GitHub Actions workflows without using any long-lived credentials. So you don't need to store any sensitive values in your GitHub repo. We also added K8s support for automatic service discovery. This means that, based on Kubernetes labels, Teleport can pick up and onboard clusters and pods to the Teleport management system. This way there's no need to manually add these clusters and pods to Teleport yourself. It's easy to scale K8s resources up and down using Teleport.
So thank you so much for watching, and I hope you learned a little bit about some of the pitfalls of Kubernetes access management and ways around them. Check us out on our website at goteleport.com, or come say hi in our community Slack at teleport.slack.com. We'll also be at an in-person conference somewhere near you, I'm sure, very soon. So if you're around, come say hi. We'll be at re:Invent pretty soon in Vegas. So thank you so much again, and have a great day.

Kenneth DuMez

Developer Relations Engineer @ Teleport



