Let's talk about Kubernetes Cluster Monitoring

Abstract

In this presentation, I will go into detail about why monitoring is important and how it enhances the infrastructure workflow. The talk also explores why monitoring can become challenging in the Kubernetes world, and how we can implement Prometheus and Alertmanager (an open source monitoring system) and Grafana (an open source analytics platform) in our clusters. You'll leave this talk with a better understanding of how a Kubernetes cluster can be made more powerful by adding monitoring tools to it.

Summary

Twinkll Sisodia, software engineer at Red Hat, talks about why monitoring is so important. The demo walks through how an application is deployed on OpenShift and how it can be monitored efficiently, and also looks at how Grafana can be used to visualize metric data.
Prometheus and Alertmanager are used to send alerts to Slack. Grafana then turns that data into insightful graphs and visualizations.
We deployed an app, then the observability operator, which installed Prometheus and Alertmanager. Then we deployed the Grafana operator and its components, and lastly we imported custom dashboards to see insightful graphs.
Thank you, everyone; I hope you enjoyed it. If you want to get connected, I'm on LinkedIn, and if anyone wants to try this hands-on, you can visit my GitHub repository.

Transcript

This transcript was autogenerated. To make changes, submit a PR.

Hi everyone, I hope you're having a great time at the Kube Native conference. I am Twinkll Sisodia, software engineer at Red Hat, and I work with Red Hat partners to build their robust cloud native architectures. Today we'll be looking into why monitoring is so important. We'll look closely at what Prometheus is, its usage, and its components. Then we will look into Grafana and how it can be used to visualize metric data. Lastly, in the demo, I'll walk you through how an application is deployed on OpenShift and how it can be monitored efficiently. We'll also be using an observability operator, which will deploy Prometheus and Alertmanager instances for us.
We all know how CCTV cameras are used for safety and security purposes. Similarly, we have a few tools like Prometheus and Grafana, which are like CCTV for our systems. Say your CPU or memory utilization has reached critical limits, or your Kubernetes resources like pods or deployments have failed. In these cases monitoring can help minimize the risk of servers going down or resources being unavailable, and it also helps with proactive management of clusters.
For monitoring, we have a few open source tools, and one of them is Prometheus. Prometheus collects and stores its metrics as time series data, and it was designed to monitor highly dynamic container environments like Kubernetes and Docker Swarm. Say there are many servers on which containers are running, and they are all interconnected. Maintaining such complex systems and making sure that everything runs smoothly without downtime becomes really challenging. Now imagine having multiple such infrastructures with no idea what's going on inside them, either at the hardware level or at the application level: errors, response latencies, overloaded or failed hardware, running out of resources, et cetera.
This complexity is reduced if you have a tool that constantly monitors the resources and activity inside the cluster and alerts you whenever something critical happens. This automated monitoring and alerting is what Prometheus offers as part of a modern DevOps workflow.
To enable monitoring, we need a few Prometheus components, and I'll start with service monitors. Service monitors specify which services should be monitored. In place of a service monitor you can also use pod monitors; the difference is that they specify which pods Prometheus should monitor. Next we need Prometheus rules. A Prometheus rule defines recording and alerting rules. Recording rules allow you to precompute frequently used data, and alerting rules specify when we should get alerts, for example by setting thresholds. Next we need the Alertmanager config, which specifies the configuration for the alerts and custom receivers like Slack, PagerDuty, etc.
This is a quick look at a service monitor. The namespace selector lists all the namespaces it will monitor. The selector has the label for the blue app, which it will match. And lastly, the endpoint is the HTTP port. Next is the Prometheus rule. It contains alerting and recording rules. In this example, when the app's requests per minute are greater than 20, it sends a low load alert, and so on for medium and high. Next is the Alertmanager config secret. It has the API URL for the Slack workspace and the channel name to which all the notifications will be sent.
So far we have seen how Prometheus works and how it collects and stores its metrics as time series data. Now let's see how we can visualize that data effectively in Grafana. And what's Grafana? Grafana is open source software that enables us to query, visualize, alert on, and explore metrics, logs, and traces, wherever they are stored.
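Stepping back to the Prometheus pieces described in the talk, the service monitor and the Prometheus rule could be sketched roughly as follows. This is only an illustration, not the manifests from the demo: the metric name `blue_requests_per_minute`, the resource names, the alert names, and the medium and high thresholds are assumptions (only the low load threshold of 20 comes from the talk). The `apiVersion` shown is the upstream Prometheus Operator group, and the observability operator may expect a different one.

```python
import json

# Hypothetical values for illustration; the talk only states the
# "app: blue" label, the HTTP port, and the low-load threshold of 20.
METRIC = "blue_requests_per_minute"
THRESHOLDS = {"LowLoad": 20, "MediumLoad": 50, "HighLoad": 100}


def service_monitor() -> dict:
    """Build a ServiceMonitor-style manifest that selects the blue app."""
    return {
        # Upstream Prometheus Operator API group (assumed for this sketch).
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": "blue-monitor", "namespace": "monitor"},
        "spec": {
            # Namespaces whose services are candidates for scraping.
            "namespaceSelector": {"matchNames": ["blue"]},
            # Only services carrying this label are matched.
            "selector": {"matchLabels": {"app": "blue"}},
            # Scrape the service port named "http".
            "endpoints": [{"port": "http"}],
        },
    }


def prometheus_rule() -> dict:
    """Build a PrometheusRule-style manifest with one alert per threshold."""
    rules = [
        {"alert": name, "expr": f"{METRIC} > {limit}"}
        for name, limit in THRESHOLDS.items()
    ]
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": "blue-rules", "namespace": "monitor"},
        "spec": {"groups": [{"name": "blue-load", "rules": rules}]},
    }


def fired_alerts(requests_per_minute: float) -> list:
    """Return the alert names whose threshold the given rate exceeds."""
    return [n for n, limit in THRESHOLDS.items() if requests_per_minute > limit]


if __name__ == "__main__":
    # kubectl and oc accept JSON as well as YAML manifests.
    print(json.dumps(service_monitor(), indent=2))
    print(json.dumps(prometheus_rule(), indent=2))
    print(fired_alerts(25))
```

With these assumed thresholds, a rate of 25 requests per minute exceeds only the low load limit, which mirrors the "curl it at least 25 times" step in the demo.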
Grafana provides us with tools to turn time series data into insightful graphs and visualizations. These are the Grafana operator components we need: on the Grafana side, a Grafana data source and a Grafana dashboard. This is a short glimpse of what the data source manifest looks like: it takes the Prometheus service URL, and it takes the database type as Prometheus.
This is the architecture diagram I'll be implementing in the demo: how I monitored an application, got metrics out of it, and visualized them in Grafana. Here you can see we will deploy on an OpenShift Dedicated cluster. We'll have a blue application in the blue namespace and an observability operator in the monitor namespace, which is responsible for creating instances of Prometheus and Alertmanager. Prometheus will scrape the metrics from the blue app and send alerts to Alertmanager, which will then send the alerts to Slack as notifications. And lastly, the Grafana dashboard will visualize metric information in the form of graphs.
Now let's move on to our demo. On the right-hand side you can see the Red Hat OpenShift Dedicated cluster, and in the bottom left corner you can see the Slack workspace where all the notifications and alerts will arrive. On the OpenShift Dedicated cluster we have two namespaces: one for the blue application, which is already deployed, and the other for the observability operator and its instance, which is already up and running. Next I'm going to create the Prometheus components: service monitors, Prometheus rules, and the Alertmanager config. I'll create the service monitors. The service monitor is up. I'll create the Prometheus rules. After that I'll create the Alertmanager secret. Okay, so the Prometheus components are in place. Next I'll create the cluster role and cluster role binding so that the monitor namespace has permission to scrape metrics from the blue namespace.
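The cluster role and binding from this step might look roughly like the sketch below. The role name follows the "blue view" name mentioned in the talk; the service account name, the binding name, and the exact list of resources and verbs are assumptions chosen as a typical read-only set for scraping, not the demo's actual manifests.

```python
import json

ROLE_NAME = "blue-view"            # the talk refers to a cluster role "blue view"
MONITOR_NAMESPACE = "monitor"      # namespace where Prometheus runs
SERVICE_ACCOUNT = "prometheus-sa"  # hypothetical service account name


def cluster_role() -> dict:
    """Read-only access to the objects Prometheus needs for discovery."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": ROLE_NAME},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group
                "resources": ["services", "endpoints", "pods"],
                "verbs": ["get", "list", "watch"],
            }
        ],
    }


def cluster_role_binding() -> dict:
    """Grant the role to the service account in the monitor namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": f"{ROLE_NAME}-binding"},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": ROLE_NAME,
        },
        "subjects": [
            {
                "kind": "ServiceAccount",
                "name": SERVICE_ACCOUNT,
                "namespace": MONITOR_NAMESPACE,
            }
        ],
    }


if __name__ == "__main__":
    # JSON manifests can be applied the same way as YAML ones.
    print(json.dumps(cluster_role(), indent=2))
    print(json.dumps(cluster_role_binding(), indent=2))
```

A cluster-wide binding is the simplest way to show the idea; a namespaced RoleBinding in the blue namespace would be a narrower alternative if only that namespace needs to be scraped.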
The cluster role blue view is created. Next I'll create the cluster role binding. The role binding is now created. I'll port forward the Prometheus pod, and let's see what the Prometheus dashboard looks like. So this is the Prometheus dashboard. If I navigate to alerts, I can see all the alerts: high, medium, low. If I navigate to rules, I can see the recording rules and the alerting rules. And lastly, if I go to targets, I can see that the blue application we deployed recently is up.
Now let's trigger this blue application and see how we get the alerts on Slack. I'll curl it at least 25 times. Once the threshold is met, we can see the alerts popping up in the Slack channel. This shouldn't take long, about 25 to 30 seconds. You can clearly see that the alerts are getting triggered: low load, medium load. On expanding one of these alerts, we can see the metadata, like where the alert is coming from: the alert name, the container name, the endpoint, the IP address, the namespace, the path, et cetera. So this is a small use case of how an organization can use monitoring tools like Prometheus, Alertmanager, and Slack to enhance its workflow, and this is how one can minimize the risk of downtime.
So far we have seen how we used Prometheus and Alertmanager to send alerts to Slack. Now let's see how we turn that data into insightful graphs and visualizations using Grafana. Let's move to OperatorHub and install the Grafana operator. I'll install it into the monitor namespace, and once the Grafana operator is installed I'll go ahead and create its instance and data source. Then we'll port forward the Grafana pod to see what the dashboard looks like. So the Grafana operator is installed; I'll go ahead and create its instance. The instance is created. Next I'll create the Grafana data source, and the data source is created. Now I'll check whether the pods are up and running. Not yet. Okay, now they're up and running.
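As described earlier, the data source takes the Prometheus service URL and the type Prometheus. A minimal sketch of such an entry is shown below, using the field names from Grafana's data source provisioning format. The Grafana operator's custom resource that wraps these fields is not reproduced here, and the URL and data source name are hypothetical.

```python
import json

# Hypothetical in-cluster address of the Prometheus service.
PROMETHEUS_URL = "http://prometheus.monitor.svc:9090"


def prometheus_datasource(url: str, name: str = "prometheus") -> dict:
    """Build one data source entry pointing Grafana at Prometheus."""
    return {
        "name": name,
        "type": "prometheus",  # tells Grafana which plugin to use
        "access": "proxy",     # Grafana's backend makes the requests
        "url": url,
        "isDefault": True,
    }


if __name__ == "__main__":
    print(json.dumps({"datasources": [prometheus_datasource(PROMETHEUS_URL)]}, indent=2))
```

Using proxy access keeps the Prometheus service reachable only from inside the cluster, which fits a setup where the dashboard is viewed through a port forward.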
So I'll port forward the Grafana pod at port 3000. It's port forwarded. Let's go to port 3000 and sign in with the same username and password I provided in the Grafana instance. Before proceeding, I'll quickly confirm that my data source is working. When I test it, my data source is working fine. I'll navigate to import and quickly import a sample dashboard that I created. You can create your own or just import one from the Grafana website. I'll rename it to blue dashboard and import it.
Here you can see we are getting different metrics, starting with alerts. We can see which alerts were triggered recently: high load, low load, and medium load. We can also see what the alert state was, which container it was, what the endpoint was, et cetera. Next we see the blue requests per minute metric, which shows how many requests per minute there were for the blue application. Apart from that we can see response status, process CPU, and lastly the up metric. The up metric shows how many containers are currently up. There is one for Alertmanager, one for the blue application, and one for Prometheus, all up and running. So this is how you can use a data source like Prometheus and convert the data into insightful graphs and visualizations. This will help an SRE be mindful of all the resources and costs involved, and it will also help organizations minimize their downtime.
And that concludes my presentation. Just to summarize what we have discussed so far: we talked about the importance of monitoring, and we discussed Prometheus, Grafana, and the components involved. In the demo, we deployed an app, and we deployed the observability operator, which installed Prometheus and Alertmanager. And finally we sent alerts to Slack. Then we deployed the Grafana operator and its components. And lastly we imported custom dashboards to see insightful graphs. So here I would like to thank everyone.
I hope you all enjoyed it. If you want to get connected, I'm on LinkedIn, and if anyone wants to try this hands-on, you can visit my GitHub repository. It has all the in-depth details in the README. So yeah, thanks everyone.

Conf42 Kube Native 2022 - Online

October 20 2022

Twinkll Sisodia

Software Engineer @ Red Hat
