Conf42 Site Reliability Engineering 2021 - Online

System State Clustering using eBPF data


Abstract

The field of system observability has been greatly enhanced by the application of eBPF. eBPF generates data at critical points in the execution of a system and that data is used for observation via software like Sysdig and Cilium. I propose to utilize the data generated for system state clustering. This is an application of machine learning to the above data to understand if the system is behaving properly or not.

The amalgamation of machine learning and real-time system data generation would open the door to a plethora of applications, such as system state prediction and ML-aided preventive replacement of system components. This talk will take attendees through an idea of how this could be done.

Summary

  • Site Reliability Engineering conference presentation on system state clustering using eBPF data. You can enable your DevOps for reliability with Chaos Native. Create your free account at Chaos Native Litmus Cloud.
  • eBPF programs are event driven and run when the kernel or an application passes certain hook points. Hook points include system calls, function entries and exits, kernel trace points, network events, and several others. A program can pass data back to user space via data structures called maps.
  • We are going to talk about a simple use case of machine learning called clustering. Clustering is an algorithm that groups different data points into classes. eBPF can help us understand the basic characteristics of how a system is behaving.
  • Next we'll talk about some of the existing open source software that actually uses eBPF: very good software for providing a full system view relevant to site reliability engineering.
  • Pixie adds eBPF probes to provide access to metrics, events, traces and logs. Like Cilium, Pixie also has a CLI and a live UI. We use both of these tools to cover the observability aspect of site reliability engineering.
  • XDP is the earliest point in the system at which a network packet can be intercepted. If you can filter your traffic at that point, you save a lot of load on your system later in allocating socket buffers. This is one of the use cases we are exploring in almost all our systems.
  • That brings us to the end of this session. I hope it has triggered some thinking about how eBPF, machine learning and SRE can work together. Now I'm open for questions.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Are you an SRE, a developer, or a quality engineer who wants to tackle the challenge of improving reliability in your DevOps? You can enable your DevOps for reliability with Chaos Native. Create your free account at Chaos Native Litmus Cloud.

Hello and welcome to the Conf42 Site Reliability Engineering conference presentation on system state clustering using eBPF data. What is our agenda for today? We will be looking at what eBPF is, what clustering is, how eBPF and clustering can work together, how eBPF and SRE work together, and what the potential use cases are, and we'll wrap up with some questions and answers.

What is eBPF? eBPF programs are event driven and are run when the kernel or an application passes certain hook points. What are hook points? Hook points include system calls, function entries and exits, kernel trace points, network events, and several others. Once execution passes a hook point, you can run your own programs from there, which are nothing but the eBPF programs, and those eBPF programs can pass data back to user space via data structures called maps. To give you an example, let's say you want to know how many times a file is being opened. That is nothing but a file open system call, right? So you can have an eBPF program defined that hooks onto this particular system call and is triggered whenever the file is opened. Once that program is triggered, you get control, and you can pass data in and out of the map, in which you can maintain a count of how many times a particular file has been opened. That is the basic fundamental of how eBPF works.

This slide illustrates the same thing I was talking about earlier. We have two diagrams: one is a storage hook eBPF, and the other is a network hook eBPF. Whenever I issue a syscall that does a write or a read on a particular file, that syscall goes through the file descriptor and then, via the block device, reads from and writes to the storage. That is where you can hook the eBPF program and trigger it to understand which file is opened, or how many times a particular file is opened. On the right hand side, you see the same kind of behavior for network activity: whenever you want information about a request received over the network, you can hook an eBPF program into that path and pass the data about that request back to user space via a map. So this is how it essentially works.

Delving into it a little more: you write the eBPF code and compile it into bytecode, and then an eBPF verifier checks that your code does not go outside the bounds set for an eBPF program. If it does, the verifier will not let the program execute, because your program is going to run in the kernel, and we cannot afford unsafe programs being executed there. Once your program satisfies the verifier and JIT constraints, it can be attached to many hook points. Some of those points could be TC (traffic class) ingress or sockets, so you can have socket filters, or TC ingress and egress hooks.
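Before going further, here is a minimal sketch of the file-open counter described above, written with the BCC toolkit. This is my own illustration rather than code from the talk: it assumes BCC and kernel headers are installed, and the map and function names are just ones chosen for the example.

```python
from bcc import BPF
import time

# eBPF program (C) that counts openat() syscalls per process id.
prog = r"""
#include <uapi/linux/ptrace.h>

BPF_HASH(open_count, u32, u64);   // map: pid -> number of file opens

int trace_openat(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    u64 zero = 0, *count;
    count = open_count.lookup_or_try_init(&pid, &zero);
    if (count) {
        (*count)++;               // bump the counter stored in the map
    }
    return 0;
}
"""

b = BPF(text=prog)
# Hook the program onto the kernel entry point of the openat syscall.
b.attach_kprobe(event=b.get_syscall_fnname("openat"), fn_name="trace_openat")

print("Counting file opens for 10 seconds...")
time.sleep(10)

# User space reads the shared map and prints the counts.
for pid, count in b["open_count"].items():
    print(f"pid {pid.value}: {count.value} opens")
```

Run as root, this attaches the kprobe, lets the kernel-side program update the map, and then reads the counts back in user space, which is exactly the program-plus-map pattern described above.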
You can attach your program at these kinds of hook points, and then whenever you see traffic coming in or a particular network event happening, your program gets triggered. That is essentially how one of the pieces of software I'm going to discuss later, Cilium, works.

Here we look at an example of an eBPF program. This one counts how many times a kernel function is entered, and the function we are trying to trace is execve. At the top there is a map, a kprobe map, in which you maintain the number of times this particular function has been entered. In the program section, in the kprobe execve function, there is a map lookup: it looks up the element in the map to see how many times the function has already been entered, fetches the value, adds one, and updates the map. So this program can be attached, and once it is triggered it automatically increments the number of entries, which is nothing but the number of times this particular function has been called.

Good. So we talked about eBPF, and eBPF helped us understand how, at certain execution points, we can observe the way a program is behaving. Now, how can that be used with machine learning? We are going to talk about a simple use case of machine learning called clustering. What is clustering? Clustering is nothing but this: you have a lot of data and you want to make sense of which group each data point belongs to. For example, in this particular figure there are a lot of data points, and the color of each data point tells us its type. If you want to draw a clean boundary between these three different types, that grouping of data points into clusters is called clustering. It is a methodology, and there are many algorithms that can do it; data points that are similar should end up close to each other when you represent them dimensionally.

Here we look at a very famous example. Whenever a person starts machine learning, the first thing, and I believe the first thing I did, is the iris example, to understand how clustering works. Iris is nothing but a data set of flowers, and the data provided for each flower is the sepal length, sepal width, petal length and petal width. When you group the flowers on these dimensions, you can see that similar flowers tend to lie close together: the green, yellow and purple points are similar data points and they cluster together.
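As an illustration, the iris grouping the talk describes can be reproduced in a few lines with scikit-learn's KMeans. This is a minimal sketch of the classic example, not code shown on the slides.

```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

iris = load_iris()
X = iris.data  # sepal length/width and petal length/width for 150 flowers

# Group the flowers into three clusters based only on their measurements.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(kmeans.labels_[:10])                      # cluster assigned to the first few flowers
print(kmeans.cluster_centers_)                  # the "centre" of each discovered group
print(kmeans.predict([[5.1, 3.5, 1.4, 0.2]]))   # cluster for a new measurement
```

The same pattern, fit a model on historical points and then ask which group a new point falls into, is what carries over to system behavior data.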
So we looked at what eBPF is and what clustering is; how can the two work together, and why would we want them to? eBPF can help us understand the basic characteristics of how a system is behaving right now: how many network calls are coming in, how many times a particular file is getting opened. These are data points, and those data points are generated by eBPF. Once you have the data points, we can use clustering to group them and essentially create domains in which we say: this system behavior is good, this system behavior is not good. That differentiation into domains is done by our machine learning logic, which is the clustering logic. So eBPF generates the data points and clustering groups them. To get started, since every machine learning problem is essentially a labeling problem, we can initially label some data as positive or negative, and then in a production scenario we can take whatever data eBPF generates, subject it to the model that was created, and the model will tell us in real time whether that is a good system behavior data point or a bad one. That is how we can use the two together.

To take an example, in eBPF we have something called the express data path, XDP. Whenever network traffic comes in, sockets are allocated for it, socket buffers are allocated in the kernel, and then the traffic moves up the stack. But even before the socket buffers are allocated, the express data path lets us understand some characteristics of the incoming network request, and we can attach our filter programs to XDP. That eBPF program can give us data about the received network packets: as packets keep coming in, your program writes data about them into a map, a user space program reads the map and pushes it to some data store, and from that data store your clustering algorithm can decide whether the incoming traffic is normal or not. Now, if a lot of requests are coming in, it doesn't make sense to run a prediction on each and every one, because there is effectively an infinite stream of requests. So you can apply some sort of time series algorithm on top of it all to understand whether the traffic is gradually building up into a series of abnormal requests. If so, something wrong is happening: you could be facing a DDoS attack or some other denial of service attack, and these kinds of things can be estimated or predicted before the event has fully happened. So here we see how eBPF can help in understanding whether the system's behavior is in the right domain or the wrong domain.
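To sketch what the user space side of that pipeline might look like, here is a small hypothetical example. It assumes the eBPF side has already been reduced to per-interval feature vectors (the feature names and the random baseline data are stand-ins, purely for illustration), and it uses the distance to the nearest KMeans centre as a crude "normal versus not normal" signal.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical feature vectors built from eBPF map counters, e.g.
# [packets_per_second, bytes_per_second, distinct_source_ips].
# Random data stands in for a real baseline collected from the maps.
baseline = np.random.rand(500, 3)

# Fit clusters on "known good" behavior.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(baseline)

# Distance of every baseline point to its nearest cluster centre;
# anything far beyond what the baseline ever showed looks suspicious.
baseline_dist = model.transform(baseline).min(axis=1)
threshold = np.percentile(baseline_dist, 99)

def looks_abnormal(sample):
    # A new sample is flagged if it sits far from every known cluster.
    dist = model.transform([sample]).min(axis=1)[0]
    return dist > threshold

print(looks_abnormal([0.5, 0.5, 0.5]))     # typical point: likely normal
print(looks_abnormal([50.0, 50.0, 50.0]))  # extreme point: flagged
```

A time series model, as mentioned above, could then watch the rate of flagged samples rather than reacting to any single one.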
Next, we'll talk about some of the existing open source software that actually uses eBPF, and that is also very good at providing a full system view relevant to site reliability engineering. You can install this software on your cluster and get a single view of how your entire cluster is behaving. We are going to discuss Cilium and Pixie. Both of these use eBPF as their fundamental base to hook onto many system or kernel points, for example network events or file events. They hook onto these points and generate data, and then an API captures all of that data and shows it as a UI to the user, saying: this is what is happening in your cluster right now. And that is essentially what site reliability depends on: observability. Observability is enabled by software like Cilium and Pixie. It is not only observability; a lot of other things are enabled too. For example, network traffic control: Cilium does a lot of load balancing, network policy management, bandwidth management and so on. Finally, operations and security metrics are also available via the Hubble framework. Hubble is the component that reads all the data generated by Cilium and presents it as a UI.

So what does Cilium do? Cilium provides the right level of information for troubleshooting application and connectivity issues. If you look at the diagram on the right hand side, there is a NIC, a network card, and in the kernel there is an eBPF program hooked onto that network card. In user space you have installed Cilium, which is connected to the CNI, and there are pods running. Any time a pod issues or receives traffic, Cilium can track it: it tracks it using eBPF hooks and sends the data to those databases, which are nothing but maps, in this case ring buffers. Those ring buffers keep getting populated, and as they do, the data keeps refreshing in your Hubble framework, which is the UI. So essentially, eBPF does the high volume work of understanding the requests, the data about the requests is passed into the maps, and Cilium reads those maps and presents them as a UI. That is how Cilium hooks into the observability factor of SRE.

The next one we are going to discuss is Pixie. Pixie is very similar to Cilium. What it basically does is add eBPF probes to provide access to metrics, events, traces and logs. There are Pixie scripts: you can execute px script commands and get data out, including in JSON format, and it is very user friendly. Just like Cilium, Pixie also has a CLI and a live UI. If you look at this diagram, Pixie works on something called PEMs, Pixie Edge Modules. They are daemons that run on all the nodes, and those daemon processes use eBPF modules that are hooked onto certain system events.
Any time such a system event gets triggered, your PEM, your edge module, transfers data to a process running on the cluster, which collates everything and provides it as data on the CLI or UI; that is also where the Pixie API runs, and Pixie scripts can read off that API and give you useful results. If you look at both of these pieces of software, what they essentially do is run eBPF programs at the lowest level of the system, at the network card or in the kernel. Those eBPF programs collect the data, and the software collates it and uses it for the observability part, gathering data from the very base, at the earliest point where the system can see a request. These two projects have a lot in common; in fact, we use many components of this software, and the fundamentals behind it, very rigorously. We have many verticals in our SRE system, one of them being observability, and we use both of these tools to cover the observability aspect of site reliability engineering.

So we looked at these two pieces of software. Now, what are the potential use cases? Like I said, system behavior can be understood from the data that the eBPF hooks are generating. First, system performance degradation checks: for example, system performance can be measured by the number of database reads. If you expect some minimum level of database reads and that is not happening, your system performance might be going down. Second, network traffic checks: how can you tell whether the optimum amount of network traffic is being handled, or whether a lot of malicious traffic is coming into your system? That can also be tracked by eBPF hooks. Third, preventive maintenance: let's say you have an API that is not behaving as it should. The API-aware modules of Cilium can understand whether the API is behaving properly or not, for example how many GET calls and how many PUT calls are being issued. If the API behavior is not consistent with what is expected, we can run some sort of preventive maintenance, such as upgrading the versions or the backend, to help cover multiple aspects of SRE. So these are some of the use cases.

Now we look at one use case in particular, the network traffic use case. Like I said earlier, we have XDP, the express data path. The express data path is the earliest point in the system where an interception of a network packet can happen: even before the socket buffer, even before the SKB, you have XDP. So basically, if you are able to filter something out at the XDP stage itself, which is the very entry point, your system will not waste resources allocating socket buffers and doing all the processing up the kernel.
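As a rough sketch of such an XDP filter (my own illustration, not code from the slides): it assumes the BCC toolkit is available, that eth0 is the interface to protect, and it uses the documentation-range address 203.0.113.7 as a stand-in for a "malicious" IP kept in the BPF map. Exact header includes may need adjusting for your kernel.

```python
from bcc import BPF
import socket
import struct
import time

device = "eth0"  # assumed interface name; adjust for your host

# eBPF/XDP program (C): drop packets whose source IP is in the blocked_ips map.
prog = r"""
#define KBUILD_MODNAME "xdp_ip_filter"
#include <uapi/linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>

BPF_HASH(blocked_ips, u32, u8);   // key: IPv4 source address in network byte order

int xdp_ip_filter(struct xdp_md *ctx) {
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;                      // malformed frame, let the stack decide
    if (eth->h_proto != htons(ETH_P_IP))
        return XDP_PASS;                      // only inspect IPv4 here

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;

    u32 src = ip->saddr;
    if (blocked_ips.lookup(&src))
        return XDP_DROP;                      // source is on the block list: drop early
    return XDP_PASS;                          // otherwise continue up the network stack
}
"""

b = BPF(text=prog)
fn = b.load_func("xdp_ip_filter", BPF.XDP)
b.attach_xdp(device, fn, 0)

# Populate the block list from user space; the key is the raw
# network-byte-order address, matching how the kernel sees ip->saddr.
blocked = b["blocked_ips"]
key = struct.unpack("I", socket.inet_aton("203.0.113.7"))[0]
blocked[blocked.Key(key)] = blocked.Leaf(1)

print("Dropping packets from blocked IPs on %s; Ctrl-C to detach" % device)
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    b.remove_xdp(device, 0)
```

Updating the map from user space is all it takes to change which sources are rejected, which is the point made next: the drop decision happens before any socket buffer is allocated.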
If you are able to filter your traffic at that point itself, you are going to save a lot of load on your system later on in allocating socket buffers. So how does it work? There is a receive queue that receives incoming requests, and right at ingress we have hooked a BPF program. Let's say our BPF program says that if a packet comes from a particular IP, it needs to be dropped, because I believe that IP is malicious, and that IP is maintained in a map: the BPF map holds that IP. So a packet comes in from the receive queue, your BPF program inspects the receive frame, the XDP receive frame, looks at the IP address, and if the IP address matches the one in the BPF map, it returns XDP_DROP, which is nothing but dropping the packet. On the other hand, if I have to redirect it or send it somewhere else, so that the packet does not need to go up the kernel and come back, I can return XDP_TX, and at that point the packet can go out to another interface; that is how XDP redirect or XDP transmit works. And if everything checks out, if everything is green, you return XDP_PASS, which says the packet can now go up the network stack, the buffers can be allocated, and so on.

If you look at this, a single eBPF program has saved you a lot of effort in this particular use case. Why? Because let's say you are maintaining a list of IPs; every organization maintains a list of IPs that are considered malicious. You maintain that list and pass it into a BPF map. All you have to do is keep that map updated with the malicious IPs, and your eBPF program dynamically keeps rejecting traffic from those IPs. This is an excellent example of a network traffic use case in which you use eBPF to control your resources and your resource allocation, and to stop a malicious actor from obtaining access to your system. And yes, this is one of those use cases that we are trying to explore a lot more in almost all our systems.

So that brings us to the end of this particular session. I hope it has been informative and that it has triggered some thinking about how eBPF, machine learning and SRE can work together. Now I'm open for questions. Thank you once again for listening, and all the best using eBPF. Thank you.
...

Sujith Samuel

Principal Software Engineer @ Ericsson



