Kubernetes Monitoring and Observability

Abstract

Learn how to use monitoring and observability for any Kubernetes environment in production. Explore the tools that you’ll need, including Datadog, Prometheus, Grafana, and OpenTelemetry. Understand the “why” behind monitoring and observability, along with the key differences between them.

Summary

Monitoring and observability are crucial whether you're on-prem, in the cloud, on Kubernetes, or on standard containers. There are homegrown or open source style solutions and enterprise solutions, and we're going to walk through installing both.
Kube-prometheus is a combination of Prometheus and Grafana. It gives you a bunch of Kubernetes-related dashboards out of the box, and you can also write your own. It's open source, but there are still costs.
The first thing you're going to want to do is sign up for Datadog. The Helm installation sets us up for high availability, and we have everything under one roof. If you want that enterprise-grade feel, you can use a SaaS solution.

Transcript

This transcript was autogenerated. To make changes, submit a PR.

Monitoring and observability. It's obviously incredibly crucial, whether you're on-prem, cloud, Kubernetes, standard containers, wherever you're running: you need to understand what's happening in your environment, whether it's monitoring, so graphs, alerting, seeing everything on a screen and understanding what's happening from that perspective, or observability, which is more around the idea of taking action. My name is Michael Levan. Welcome to my session at Conf42. We're going to dive into a bunch of hands-on stuff, but primarily I'm going to show two different realms of focus: the homegrown or open source style solutions and the enterprise solutions. We're going to walk through installing both. We're going to do Datadog and kube-prometheus, and we're going to talk about the differences, which should hopefully help you decide which one you're going to go with in your organization. Let's go ahead and jump right in.
Let's start by diving into kube-prometheus. Alright, so I'm going to open up my terminal and do a quick kubectl get nodes. Here we can see I am on an EKS cluster. It's not going to matter, though: if you're on AKS, GKE, or on-prem, all these steps should be relatively similar. Okay, so the first thing that you're going to want to do is add the Helm chart for kube-prometheus. I essentially do everything via Helm chart. Why? It's a great package manager. It's much better than just going and calling out to a bunch of Kubernetes manifests. And instead of, again, using five, six, seven different Kubernetes manifests, everything is under one roof. So I typically go with Helm charts. Next, I'm going to go ahead and update the repo. Once that's done, we will install kube-prometheus. Okay, now, as the name suggests, kube-prometheus is going to be a combination of Prometheus and Grafana. Can you install these separately? Absolutely. But the reason why I actually like to do it together is because kube-prometheus gives you a bunch of dashboards out of the box that are all Kubernetes related.
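The exact commands are not captured in the transcript, so the following is only a sketch of the steps just described (check the nodes, add the Helm repo, update it, install the chart). It assumes the community `kube-prometheus-stack` chart from the `prometheus-community` repository; the release name `kube-prometheus` is a hypothetical choice, and the `monitoring` namespace is taken from what is mentioned later in the talk. The script only prints the commands (a dry run) and does not touch a cluster.

```python
import shlex

# Assumptions, not shown in the talk: repo alias, chart name, release name.
REPO_NAME = "prometheus-community"
REPO_URL = "https://prometheus-community.github.io/helm-charts"
CHART = "prometheus-community/kube-prometheus-stack"
RELEASE = "kube-prometheus"  # hypothetical release name
NAMESPACE = "monitoring"     # namespace mentioned later in the talk


def kube_prometheus_commands(release=RELEASE, namespace=NAMESPACE):
    """Return the kubectl/helm commands as argument lists, in the order described."""
    return [
        ["kubectl", "get", "nodes"],
        ["helm", "repo", "add", REPO_NAME, REPO_URL],
        ["helm", "repo", "update"],
        ["helm", "install", release, CHART,
         "--namespace", namespace, "--create-namespace"],
    ]


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for cmd in kube_prometheus_commands():
        print(shlex.join(cmd))
```

Keeping each command as an argument list means it could later be handed to `subprocess.run` unchanged, once the names have been checked against your own environment.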
So let's say I just install Grafana and Prometheus separately. I'm not going to have any dashboards. But if I install kube-prometheus, it comes, again, preinstalled with all these different Kubernetes dashboards, which we'll go ahead and take a look at in a second. And once this is installed, it takes, of course, a little bit, because there's a bunch of different pods that need to come up. You can look at Prometheus via port forwarding, or you can just go ahead and hit Grafana, right? So let's go ahead and do this. That way we can get a nice visual, right? And then let me go ahead and open up a web browser. Web browser is up. We can see that here. I'm going to go ahead and just take a look here. To log in, the default username is admin and the password is prom-operator. So admin, prom hyphen operator. And now we're logged in.
So what I was referring to before: if I go to dashboards, notice here how I have all these different Kubernetes dashboards. You will not have this by default. And of course, if you want to, you can import a new one. So for example, if we just take a look here, we have the Argo CD dashboard. So what I can do is I can actually copy the ID, go back, New, Import, paste that ID in, Load, you can see it is in fact Argo CD, Import, and then boom, we have the dashboard. So it's pretty straightforward. You can also write your own dashboards. I believe they're still written in Python, at least they used to be, but nonetheless you can create your own. But there are a lot out there already, so don't reinvent the wheel if you don't have to. But if I go back to dashboards here and let's say I click on Kubernetes API server. Now, I haven't made any requests or anything to this, so it's probably not the best, but we can see here, again, another dashboard: compute resources, some CPU information, some memory information, etcetera.
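For the port-forwarding and login step described earlier in this part of the talk, a rough sketch might look like the following. It assumes a release named `kube-prometheus` in the `monitoring` namespace (both hypothetical), and that the chart exposes Grafana as a service named `<release>-grafana` on port 80, which is typical for `kube-prometheus-stack` but worth confirming with `kubectl get svc`. The login values are the defaults quoted in the talk. Nothing is executed; the command is only built and printed.

```python
import shlex

# Assumptions, not shown in the talk: release name, namespace, local port.
RELEASE = "kube-prometheus"
NAMESPACE = "monitoring"
LOCAL_PORT = 3000

# Default Grafana login quoted in the talk.
GRAFANA_USER = "admin"
GRAFANA_PASSWORD = "prom-operator"


def grafana_port_forward(release=RELEASE, namespace=NAMESPACE, local_port=LOCAL_PORT):
    """Build a kubectl port-forward command for the Grafana service.

    Assumes the service is named "<release>-grafana" and listens on port 80.
    """
    return ["kubectl", "port-forward", "--namespace", namespace,
            f"svc/{release}-grafana", f"{local_port}:80"]


def grafana_url(local_port=LOCAL_PORT):
    """URL to open in the browser while the port-forward is running."""
    return f"http://localhost:{local_port}"


if __name__ == "__main__":
    # Dry run: show what to run and where to log in.
    print(shlex.join(grafana_port_forward()))
    print(f"Open {grafana_url()} and log in as {GRAFANA_USER} / {GRAFANA_PASSWORD}")
```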
But the point being, we can see the dashboards work, and then if we want to, we can get alerting on various dashboards and all that fun stuff. So this is the monitoring piece, and if you want the full observability stack for logs, traces, and metrics, you're going to have to do Prometheus, which is already here, and then Tempo and Loki for traces and logging, and then you'll have the full monitoring and observability stack. But there are a couple of things here, and it's not necessarily a bad thing, it's just you've got to kind of figure out what option you want. So this is the homegrown solution. This is open source. I'm not paying for anything, okay? But I actually am, right? I'm paying for engineers to manage it, I'm paying for infrastructure, because this has to run somewhere, so there are still costs. And again, this isn't a bad thing. It's just all going to be dependent on your organization. If you're a startup, for example, and everybody's already working 13 hours a day, adding another tool may not be the best method. Or maybe it is, again, depending on how the organization is structured.
So let's say you want all these tools, monitoring and observability and even APM and alerting and a bunch of other stuff, under one roof. Maybe it's a SaaS, so you don't have to manage the infrastructure or anything like that. You probably want to look at an enterprise paid solution. Okay. And that's kind of what we can get with Datadog. Now, with Datadog, again, we get everything under one roof: metrics, logs, full monitors, service management, infrastructure management, APM, all of it. All we have to do for this is, if I go under my [...] and I click on API keys, right? I'm going to have an API key here. I'll go ahead and I'll just create a new one. We'll just call it conf42, Create Key, right? And then now I have this API key. So if I copy it, I'm going to head back over to VS Code. Okay. And I'm just going to open up a new terminal here and I'm going to paste in that API key and my cluster name, EKS.
Quick start. Okay, the first thing you're going to want to do is sign up for Datadog. It's free to sign up. You're not going to be paying for anything. I've been doing demos on Datadog for a long time now and haven't gotten a bill, because I just delete my stuff right away. Okay. But I'm going to set these environment variables. I'm going to use Helm. Okay. So if you don't have the Datadog Helm chart, you're going to want to add it and update it. And then I'm going to use this fairly large Helm installation. And the reason why is because this sets us up for high availability. So we're going to see, you know, multiple replicas, kube-state-metrics is enabled, we're enabling logging, we're enabling all the logs for the containers. So let's go ahead and run this, and it may take maybe two to three minutes to actually see all the information within your environment. Right?
So if I head back over here, I click Finish, I'm going to go to dashboard. Oops, sorry, Infrastructure and Kubernetes Explorer. Okay. And we can actually see all this stuff in here right away, but I want to click on one other. Let's see, Kubernetes Overview. Okay, here it is. So if I check here, I can see my cluster, I can see all my namespaces. See the monitoring namespace, right? Because we deployed kube-prometheus. And then if I click on Explore, I can see everything running here. So if I look into one of these pods, maybe, you know, one of the kube-prometheus pods, we can see the cluster it's on, the service that it's in, well, in fact the monitoring namespace, the host, the deployment, replica sets, IPs, everything. We can see everything here, even the metadata. Okay. We can see any related resources, which is actually really cool. It's a little graph here that we can see. Right. Troubleshooter: I don't think we have anything on. Status is ready. Alright, so we're good to go here. So we have the pod phase, which is actually nice.
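The "fairly large Helm installation" for Datadog is not captured in the transcript. As a hedged sketch of what was described (multiple replicas, kube-state-metrics, logging, and log collection for all containers), the script below builds a `helm install` command for the `datadog/datadog` chart. The environment variable names `DD_API_KEY` and `DD_CLUSTER_NAME`, the release name, and the replica count are hypothetical, and the `--set` keys are my reading of the Datadog chart's values, which can change between chart versions, so check them against the chart before use. The script prints the commands without running them.

```python
import os
import shlex


def datadog_commands(api_key, cluster_name, release="datadog-agent"):
    """Return helm commands (as argument lists) for a Datadog agent install.

    The value keys below are assumptions based on the Datadog Helm chart and
    should be verified against the chart version you install.
    """
    values = {
        "datadog.apiKey": api_key,
        "datadog.clusterName": cluster_name,
        "clusterAgent.replicas": "2",                      # more than one replica
        "clusterAgent.createPodDisruptionBudget": "true",  # keep a replica during disruptions
        "datadog.kubeStateMetricsEnabled": "true",         # kube-state-metrics
        "datadog.logs.enabled": "true",                    # enable logging
        "datadog.logs.containerCollectAll": "true",        # logs for all containers
    }
    install = ["helm", "install", release, "datadog/datadog"]
    for key, value in values.items():
        install += ["--set", f"{key}={value}"]
    return [
        ["helm", "repo", "add", "datadog", "https://helm.datadoghq.com"],
        ["helm", "repo", "update"],
        install,
    ]


if __name__ == "__main__":
    # Hypothetical environment variable names; placeholders are used if unset.
    api_key = os.environ.get("DD_API_KEY", "<your-api-key>")
    cluster = os.environ.get("DD_CLUSTER_NAME", "<your-cluster-name>")
    # Dry run: print each command instead of executing it.
    for cmd in datadog_commands(api_key, cluster):
        print(shlex.join(cmd))
```

Reading the API key from the environment keeps it out of the script itself, which matches the talk's approach of setting environment variables before running Helm.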
We get a little bit of different information here: logs, if we turn them on, so any logs that are coming in through the pod, okay, metrics, etcetera. So the point being is this: we have everything under one roof. Of course, if we install it, we have to install different things for tracing and stuff. But everything is under one roof. Okay. So we can dive down. We also have a visual of this, right? So we dive down, we see our clusters running, we see our namespaces, see all of our workloads. Okay. We see our networking. And this is really solid.
Now, Datadog is expensive, don't get me wrong. But again, this is a good implementation if you want that enterprise, I don't even want to say enterprise-grade feel, because you can get the same feel from Grafana and the Prometheus stack. But if you want that SaaS-based solution that's set up for you, where you just have to run a couple of installations or even just one, and you've got support behind you and all that, Datadog is a great implementation. Again, just keep in mind, you know, never think that you're not paying, because I know a lot of people go open source because they don't want to pay. Either way, you're paying. You're either paying engineers to manage it and the infrastructure to run it on, or you're paying for a SaaS solution. It's really going to be up to you at the end of the day. Thank you so much for joining me for the session. Really do appreciate it. Hope that you enjoyed it.

Conf42 Observability 2024 - Online

June 13 2024

Michael Levan

Trainer, Consultant & Content Creator
