Conf42 Cloud Native 2023 - Online

Kubernetes audit log best practices

Abstract

Keeping an eye on all of the activity in your Kubernetes cluster is crucial to modern cloud computing. With just one cluster and a handful of engineers, this can be done ad hoc, and often the Kubernetes native logging is adequate. At scale, however, with multiple clusters, regions, clouds, and tens or hundreds of engineers, Kubernetes audit logging can become a nightmare. Learn how to do it the easy and secure way with open-source tools. This talk will run you through the best practices of making sure your cluster is secure, and how to keep a bird's eye view on everything going on in your cluster, making achieving compliance standards like SOC 2 and FedRAMP a breeze. This talk will focus on various open source tooling such as Teleport, OpenRaven, and Elastic.

Summary

  • Presentation on the best audit logging practices when using Kubernetes. We'll talk about the native built-in logging functionality in K8s and its limitations. We'll also look at some third-party open source tools that can help make following all of these practices a little easier.
  • Kubernetes has a built-in logging system that is used to record information about events that occur in the cluster. The important thing to note here is that these logs are super granular and highly configurable. Here are some good best practices to follow when setting up your logging strategy.
  • Open source Teleport is a secure access control platform for managing access across your infrastructure. Centralizing your audit logging at scale for organizations requires you to go beyond just your various Kubernetes clusters. Teleport acts as a gateway for all of your resources, ensuring security and compliance across your entire infrastructure.
  • How to log into a Kubernetes cluster without using a Teleport-managed SSH node. Even from my personal workstation, we're securely logged in through Teleport and can actually run kubectl commands on our cluster. It's the easiest and most secure way to access all of your infrastructure.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi and welcome to this presentation on the best audit logging practices when using Kubernetes. My name is Kenneth DuMez, a developer relations engineer here at Teleport. So for a little background on my history, I came to Teleport around a year ago after working at Pivotal Cloud Foundry, and then was at VMware Tanzu for a few years working on their Kubernetes build service solution. Thank you so much for coming, and I hope you can learn a little bit about Kubernetes best practices, as it's such a rabbit hole and gets confusing very quickly. There's a bunch of other awesome talks today as well. Conf42 is a great place for developers and leaders in various fields to come together and share some of their knowledge. So just want to shout out Miko Pawlikowski for putting this together. It's always a pleasure to be here. So today in this presentation we'll be discussing the importance of good audit logging practices in Kubernetes and the best practices to follow to ensure a secure and compliant environment at scale. We'll talk about the native built-in logging functionality in K8s and its limitations. We'll also look at some third-party open source tools that can help make following all of these practices a little easier, while making life a little bit easier for your administrators and security engineers. So the first thing we're going to talk about is the audit logging capabilities you get out of the box when you deploy your Kubernetes cluster. Kubernetes has a built-in logging system that is used to record information about events that occur in the cluster. This information can include things like API requests, resource changes, system events: basically everything and anything that happens inside of your cluster. Kubernetes stores this information as log files on the cluster nodes, which can be accessed using various tools such as kubectl, or if you're using a hosted Kubernetes environment, there's usually a dashboard or UI or something that you can use to access these logs. The important thing to note here is that these logs are super granular and highly configurable. Kubernetes clusters, especially larger ones, can generate a lot of events and thus a lot of log data. This can make it difficult to separate the wheat from the chaff, so to speak, and maintaining a good signal-to-noise ratio can be really tough. One of the most important things when setting up your logging, and to manage the spamminess of your cluster's log data, is an object called the Kubernetes audit policy. The Kubernetes audit policy configuration object is a native Kubernetes resource which you provide to your API server that defines the rules and settings for auditing events that occur within a Kubernetes cluster. This audit policy configuration object is defined, like all the other K8s resources, in a YAML file that defines the audit rules and settings. This is the first object you want to configure when determining your logging strategy. The file contains several fields that can be configured to customize the audit policy. We're going to look at some of those fields in depth in a second. For one example, you can say only log anything that's done to secrets, or just events concerning pods, or say everything that's done to any of the core APIs, but none of the custom resource definitions or extensions. As a good starting point, you can check out the audit profile for the Google Container-Optimized OS. This is publicly available and you can then configure it from there to whatever best suits your logging needs.
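To make that a little more concrete, here's a minimal sketch of what an audit policy file might look like. This is purely illustrative: the resources, verbs, and levels here are placeholders, not a policy you'd want to ship as-is.

```yaml
# audit-policy.yaml -- illustrative sketch only, tailor to your own needs
apiVersion: audit.k8s.io/v1
kind: Policy
# Skip the RequestReceived stage everywhere to cut down on noise
omitStages:
  - "RequestReceived"
rules:
  # Log Secret changes at the Metadata level only, so secret values never
  # end up in the audit log itself
  - level: Metadata
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: ""
        resources: ["secrets"]
  # Record full request and response bodies for changes to Deployments
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: "apps"
        resources: ["deployments"]
  # Record metadata for anything done by unauthenticated requests
  - level: Metadata
    userGroups: ["system:unauthenticated"]
  # Drop everything else
  - level: None
```

Rules are evaluated in order and the first matching rule sets the audit level for an event, which is why the catch-all level: None rule goes last.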
Within the Kubernetes audit policy object, the rules field is the most important. This field defines the audit rules that dictate which events should be audited and how they should be handled. Just as an example, and so you can kind of see what an audit policy configuration would look like, here's a little walkthrough of the various audit rule fields. I would highly recommend not just copying this one and plugging it into your own clusters, because like I said, this is just an example and you probably want to tailor it a little bit better to your specific needs. So first we have the omitStages field. This defines the audit stages to be skipped for your various events, such as RequestReceived or ResponseStarted. This is crucial for cutting down on the parts of the events that you don't care about. You don't need every stage, and you shouldn't track them all in your audit log. Then you have level, which defines the level at which the event is to be audited, such as Request, RequestResponse, or Metadata. Next is your resources field, which defines which Kubernetes API resources are to be audited, such as pods, deployments, or services. Then you have verbs, which defines the Kubernetes API verbs to be audited, such as create, update, or delete. Then of course you have users, which you use to tell the audit service which Kubernetes users or groups are to be audited. And finally namespaces, which, as it implies, just defines the Kubernetes namespaces you want to include in the audit collection. As I said before, the audit policy object is very flexible and configurable depending on your various needs. When creating your Kubernetes audit policy configuration, there's a lot to consider and it can be pretty intimidating at first. In general though, here are some good best practices to follow. First, clearly define the audit policy scope. It's important to define the scope of the audit policy configuration object and identify the Kubernetes resources, verbs, users, and namespaces that need to be audited. This will help ensure that the audit policy is focused on the areas that require auditing and is not overly broad, which can result in a bunch of spam that isn't really useful to anyone. It's hard to parse, expensive to store, and obfuscates the actually useful, important log events. If you have millions and millions and millions of log lines, it's going to be really hard to actually access the good data, the useful data that you want to keep track of. Another good practice is to use meaningful audit rule names. It's important to use meaningful names for audit rules to ensure that they are easily understood and maintainable. Names should clearly describe the event being audited, the resource, verb, and other relevant attributes. Just as we all know maintaining legacy code can be challenging, the same thing applies to audit configurations. You want to do yourself a favor for the future and make sure that you'll be able to parse what you wrote. Another important step is regularly reviewing audit logs. Regularly reviewing audit logs is an important step in maintaining the security and compliance of the Kubernetes cluster. It's important to establish a process for reviewing audit logs and to regularly review them to identify any anomalies or security risks. SIEM (security information and event management) tools can help with this task. Another important step is to use a dedicated storage solution.
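Before any of that, though, the policy file on its own doesn't tell the API server where to send the events. As a rough sketch, on a kubeadm-style cluster you'd wire it up with flags on the kube-apiserver static pod manifest; the paths and rotation values here are illustrative, and a webhook backend is also available if you'd rather push events to an external service.

```yaml
# Excerpt from a kube-apiserver static pod manifest -- illustrative only
spec:
  containers:
    - command:
        - kube-apiserver
        # Point the API server at the audit policy described above
        - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
        # Write audit events to a local file backend
        - --audit-log-path=/var/log/kubernetes/audit/audit.log
        # Basic rotation so audit data doesn't fill up the node's disk
        - --audit-log-maxage=30
        - --audit-log-maxbackup=10
        - --audit-log-maxsize=100
```

In practice you also need to mount the policy file and the log directory into the API server pod, and this local file is only a staging point; the storage and aggregation practices below are about getting those events off the node.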
Storing audit logs in a separate and dedicated storage solution can help ensure that they are protected and available for analysis and review. It also helps save space for the actual functioning of the cluster. It's important to use a secure and reliable storage solution that can handle the volume of audit logs generated by the Kubernetes cluster. S3, for example, is a very popular place to store audit logs, and from there you can pipe them to different solutions and have monitoring and alerting tools in place. Similar to the above, it's really important to aggregate your logs. This is especially important if you have multiple clusters or if you have many nodes in a single cluster, but aggregating all of your log data into a single location makes it much easier to filter, ingest, and manage that log data. It helps with observability and compliance as well. It's easier to show an auditor one central, secure location rather than having to prove compliance for dozens of different infrastructure resources you're leveraging to help with logging. While the native Kubernetes API logging is powerful by itself, all of the logs in the world are useless if you aren't actively monitoring them. Audit logging is more than just a post-mortem, reactive solution to help you figure out what happened after your cluster is already compromised. If properly configured and monitored, it can be used to prevent attacks as they happen, rather than just being used to look for something or someone to blame after the fact. The simple truth is that, especially at scale, it's completely impractical for a security team to constantly be looking at these logs themselves manually. Luckily, there are a few great open source tools that can help. One of these tools that I really like is Falco. Falco is an open source cloud native runtime security project that can be used to detect and alert on anomalous behavior in Kubernetes clusters. It can be used to monitor and alert on Kubernetes audit logs, and it supports a wide range of rules for detecting security threats and policy violations. Falco can also be integrated with external systems for alerting and incident response. Another great tool is OpenRaven. OpenRaven can collect audit logs from Kubernetes clusters, including API server logs and logs from other Kubernetes components. A great feature of OpenRaven is that it can centralize these logs from multiple Kubernetes clusters, making it easier to manage and analyze them. OpenRaven can also analyze these logs to identify potential security threats and compliance issues. It includes pre-built compliance rules for various regulations such as PCI, HIPAA, and GDPR, and it can be customized to meet specific compliance requirements. Another important feature is OpenRaven's real-time alerting. This tool can send alerts for potential security threats or compliance violations based on the analysis of your audit logs. It can also integrate with external incident response systems for automated incident response. One drawback, however, is that it can be pretty difficult to configure, especially if you have a multicluster setup, and managing that complexity can be costly. Another good tool out there, though, is Elastic. The Elastic Stack is a suite of open source tools that can be used for log management and analysis. It includes tools for collecting, processing, and analyzing logs, including Kubernetes audit logs.
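Just to give a taste of what that collection plumbing can look like, here's a sketch of a Filebeat input that tails the audit log file the API server writes and parses each line as JSON before shipping it off to whatever output you've configured (Elasticsearch, Logstash, and so on). The paths and field names are illustrative.

```yaml
# filebeat.yml excerpt -- illustrative sketch
filebeat.inputs:
  - type: filestream
    id: kubernetes-audit
    paths:
      - /var/log/kubernetes/audit/audit.log
    parsers:
      # Each audit event is written as a single JSON object per line
      - ndjson:
          target: "kubernetes_audit"
          add_error_key: true
```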
The Elastic Stack can be used to centralize these logs from Kubernetes clusters, and it includes features for searching and analyzing this log data. The Elastic Stack can centralize these logs and allow for easier management and analysis. It can provide real-time analysis of Kubernetes audit logs, allowing for faster detection and response to security threats and compliance issues. It also comes with Kibana, a powerful visualization tool that can help in understanding the logs and identifying trends, patterns, and anomalies. It's pretty similar to Grafana, another honorable mention in our open source tooling. While all of those other solutions are great and a huge step up from just sifting through logs manually, none of them address the big picture of Kubernetes audit logging and security. This is in large part due to them missing the key component of access. Configuring access to your Kubernetes cluster, managing who has access to what resources, when, how privilege escalation is handled, and providing chain of custody over all of your different resources can be a huge hassle. Access is not divorced from audit logging practices, however, as a key part of audit logging is knowing exactly who or what (in the case of machines and automated workers) is executing commands on your Kubernetes cluster. Open source Teleport, which is a secure access control platform for managing access across your infrastructure, solves all of these problems while also centralizing your audit logging, not just for your Kubernetes resources, but for your SSH, database, Windows RDP, and application access. Centralizing your audit logging at scale for organizations requires you to go beyond just your various Kubernetes clusters. For a truly secure infrastructure setup, you need to implement all of the previous principles and best practices across your organization, spanning all of your various infrastructure resources. As soon as you have siloing, whether it be at the cluster level or at the cloud resource level, this creates much more overhead, meaningless duplication, and headaches for both your security engineers and cloud administrators. Teleport, coupled with Fluentd (which handles all of the plumbing, so to speak: the formatting, exporting, and consolidation of your logs), is the ultimate solution for Kubernetes audit logging. With Teleport, you can tie every event in Kubernetes to an identity, meaning that you'll know exactly who did what on any given resource. Even at the Kubernetes pod level, each event is audited based on your configured audit strategy and tied to the entity's identity, mapped to a Teleport user with a Teleport RBAC role. This makes it easier than ever to configure secure access, thus ensuring secure use and best-practice enforcement across your entire cloud ecosystem. And this is not just for human engineers. Teleport Machine ID ensures that every microservice, process, or automated worker node also has an identity in the form of short-lived X.509 certificates, eliminating long-lived credentials and access silos, and allowing for a full, rich audit log in real time. Teleport fully eliminates secrets, replacing them with short-lived certificates tied to a user's identity. And again, this is for every piece of infrastructure, not just your Kubernetes resources, centralizing everything, allowing for easy monitoring and log management for not only your Kubernetes cluster, but for every resource in your stack.
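To give a rough idea of how that identity mapping is expressed, here's a sketch of a Teleport role that would grant a group of users read-only access to pods in a set of labeled clusters. The label values, group names, and resource selectors here are placeholders; check the Teleport role documentation for the exact fields supported by your version.

```yaml
# k8s-auditor.yaml -- illustrative sketch, values are placeholders
kind: role
version: v7
metadata:
  name: k8s-auditor
spec:
  allow:
    # Which registered Kubernetes clusters this role can reach
    kubernetes_labels:
      env: ["dev", "staging"]
    # Kubernetes RBAC groups the Teleport user is mapped to inside the cluster
    kubernetes_groups: ["view"]
    # Fine-grained, read-only resource access
    kubernetes_resources:
      - kind: pod
        namespace: "*"
        name: "*"
        verbs: ["get", "list", "watch"]
  options:
    # Short-lived certificates mean every session maps back to a fresh,
    # auditable identity
    max_session_ttl: 8h0m0s
```

Every kubectl call made through a user holding this role then shows up in the audit log tied to that user's identity, which is what makes the per-identity audit trail described above possible.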
Another powerful feature of Teleport is that it actually allows for session playback of Kubernetes sessions conducted over SSH, meaning that if someone is accessing a node in your cluster, you'll be able to prevent obfuscation of commands, allowing you to see exactly what is happening on your cluster. Teleport acts as a gateway for all of your resources, ensuring security and compliance across your entire infrastructure. So let's take a look at exactly what I mean when I say that it consolidates all of this access and audit logging into one place. So here we are in the Teleport web UI. The first use case I'm going to show you is the session recording when you're accessing your Kubernetes clusters over SSH. So here we have our Kubernetes cluster, it's called cookie, of course, and here we have all of our servers. Down here we can see this server called k8s-host, and this is actually the server that's hosting our Kubernetes cluster. So if we log in, we can actually open an SSH session directly from the web terminal, and all of this session data is tied directly to my user and identity. So we can go ahead and execute a couple of commands here. We can run kubectl get pods -A and we can see all these pods. We can go ahead and describe this one here: kubectl describe pod colormatic. We can see all of the pod's information. We can see the container ID, which container image it's using, and some health information about what the pod is doing. We can go ahead and exit this session now that we know our pod is functioning the way it should and that the container image it's using is correct. We're going to go ahead and exit this session. Then we can come back into the web UI and go into our management section here. We can see when the session started and that the session has ended. We can go into the session recordings and actually view exactly what we did. And as we can see, these are the commands that we just ran within the session. And this is actually not a video, it's a rich JSON log describing exactly all of the commands that we ran, which means that this whole session can be forwarded to other SIEM tools or other log management tools so that we can actually monitor these, and you can play back these sessions based on every command that we ran. So the next thing I wanted to show you is how we log into a Kubernetes cluster without using a Teleport-managed SSH node. So in this case I'm going to be using my personal workstation. So in here in our web UI, we can go to our Kubernetes resources and we find our cookie cluster. So this is the same cluster that we were using before, but now we're going to access it from my workstation. So first we're going to go ahead and log into our Teleport cluster. We're going to execute this tsh login command. Here's the address of our proxy, which is the publicly accessible address of our Teleport cluster. And we're going to go ahead and log in. Great, we're logged in. This used the same authentication method as before; it logged in through my GitHub, so we're using GitHub as SSO here. Next we're going to select what Kubernetes cluster we want. So right now we can do tsh kube ls and we can see all of the Kubernetes clusters that we have available to us. Right now my user role only has access to the cookie cluster. Next we're going to go ahead and log in to our Kubernetes cluster, and this will actually give us the kubeconfig from Teleport. It's going to take a second. Great. So we're logged in. Now let's try to get all of our pods. Awesome.
So now we're in and we can run our commands on the cluster. So let's go ahead and do what we did before and describe this colormatic pod here in the colormatic namespace. Great. So we can see all that same information that we saw before when we were connected directly to the host. Now even from my personal workstation we're securely logged in through Teleport and can actually run kubectl commands on our cluster. Now if we go back into our web UI we can actually see the results of our session here. So we can see that the certificate was issued for my user. We can see all of the details about that: the AWS role ARNs that I have access to, all of the various metadata for this session. All of this is tied directly to my identity, and we can see all of the various kubectl commands that we ran. We can see the requests to the cluster and we can see all of the various metadata for those. We can see the Kubernetes users, the Teleport login, the namespace, all of the different protocol information. And like I said before, all of this information is very easy to export and ingest into a SIEM tool for easy monitoring and easy alerting on anomalies and various other things, because all of this is just raw JSON that we can choose to use however we want. And this is how we use Teleport to access the cluster securely from my workstation, with all of the traffic passing through the Teleport proxy and all of it being centrally logged in one location. So that was Teleport in action. Thank you so much for watching, and check out some of the other talks. We've got some great ones here at Conf42 Cloud Native 2023. You can also check us out on Slack at teleport.slack.com. I'm always hanging out there and am happy to answer any questions you may have, provide any clarifications, or help you get started with Teleport. You can also check us out at teleport.com, where you can sign up for a cloud trial of our enterprise solution or download our open source version and try it out for yourself. However you start your journey with Teleport, it's the easiest and most secure way to access all of your infrastructure. Thank you so much. Have a great day.
...

Kenneth DuMez

Developer Relations Engineer @ Teleport



