Conf42 Observability 2023 - Online

Use Falco and eBPF to protect your applications


Abstract

You scan your images? Good. You have set policies to block bad practices? Good too. But are you sure your applications are really secure and your containers only do what they are supposed to do? No? There's a solution: Falco and eBPF. Let's see how it works!

Summary

  • Use Falco and eBPF to protect your applications. Thomas Labarussias is currently OSS and ecosystem advocate at Sysdig, the original creator of Falco. You can reach him on these social networks if you want.
  • Falco is a CNCF incubation-level project for securing running applications. It's the most advanced threat detection engine you can run inside Kubernetes. All the code you write for your eBPF probe is verified by the Linux kernel, which enforces stability and security.
  • Falcosidekick basically forwards the alerts from your Falco instances to your ecosystem. You can also trigger your on-call system with Falco. Falco is also able to collect any kind of events you may have. With the plugins you can, for example, protect your Kubernetes clusters.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi and welcome to my talk entitled Use Falco and eBPF to protect your applications. First, who am I? I'm Thomas Labarussias. I'm currently OSS and ecosystem advocate at Sysdig, the original creator of Falco. I was an SRE for over eight years, so I know what it is to run stuff in production. I'm also a contributor to Falco and the creator of Falcosidekick and Falcosidekick UI, two major components of the Falco ecosystem, and you can reach me on these social networks if you want.

First we need to define what runtime security means. Runtime security is all the tools and procedures you can put in place to secure an application, in a container or not, during its lifetime in production. It's different from what we currently do in our CI pipelines with image scanning. It's also different from what we can do with admission controllers like Gatekeeper to create policies that enforce good practices in our clusters. It's totally focused on what happens when your application is serving real customers, with real traffic.

For that, Falco relies on syscalls. Syscalls, or system calls, are basically the way your programs ask the kernel for access to resources. For example, if your application needs to create a process, access the network, or read or write a file, it needs to ask the kernel for that access, and to ask for these accesses you use system calls. Basically you can see the system calls as the kernel API. If you are familiar enough with the Linux ecosystem, you already know about glibc, or musl for Alpine. Basically glibc is the library used by your applications to call the system calls. You can see the syscalls as an API and glibc as an SDK.

So, Falco. Falco is a CNCF incubation-level project, a cloud native project in the CNCF landscape, for securing running applications. Right now it's the most advanced threat detection engine you can run inside Kubernetes.

eBPF, for extended Berkeley Packet Filter, is the Linux kernel feature which allows you to run a program in the kernel without any change to the kernel code and without loading a kernel module like we did before. It enforces stability and security. It's really useful for security, for monitoring, for troubleshooting. You also have to know that right now the core maintainers of Falco are developing a new Falco eBPF probe. Basically the features will be exactly the same as the current ones, but it will also use CO-RE: compile once, run everywhere. Right now you need to build the eBPF probe for the exact version of your kernel. In the future, since kernel version 5.8, you will use the same probe for any kernel; you just have to download it or build it only once and it will run everywhere.

The eBPF probe does the collection of events. Basically, in the eBPF world you have hooks. Hooks are endpoints: you can attach your probe to them and collect events. These events can be syscalls, they can be related to the file system, they can be related to the network, almost anything. If a hook is not already there by default, you can create your own. It's really convenient. And to ensure stability and security, all the code you write for your eBPF probe will be verified by the Linux kernel. So you code your probe with everything: the hook you want to use, the data enrichment, everything. It will be checked by the kernel. If the code is approved, it will be compiled into bytecode, injected into the kernel, and run inside a sandbox.
The verification is there to ensure you don't have any security flaws, you don't create infinite loops, you don't create overhead and bad performance in your system. Everything is there by default, by design, to ensure stability and security.

For Falco itself, the architecture is this: you have the kernel, and the eBPF probe is there to collect the syscalls from the kernel. Then Falco, thanks to its rule set, will trigger alerts. If one event coming from the kernel, from the syscalls, matches a rule, Falco will output an alert. This alert can go to standard output, a file, a program, syslog, an HTTP endpoint, or gRPC.

If we take a deeper look at the Falco architecture, Falco is composed of three key elements: two libraries, libscap and libsinsp, plus the rule engine. Libscap is in charge of the event collection, libsinsp of the data enrichment and the extraction of fields. You can see we have the eBPF probe in kernel space and Falco itself in user space. It's really important for us, as Falco is a security component, to be as secure as possible. This is why Falco itself runs in user space, so with fewer privileges. The eBPF probe runs in kernel space, but thanks to eBPF it is secured and stable by default.

So we have the first library, libscap, aka library for system capture. Libscap is a user space library. It communicates with the drivers: basically it reads the syscall events from a ring buffer exposed by the driver, and then these events are forwarded to libsinsp. Libsinsp, aka library for system inspection, is in charge of receiving the events from libscap and enriching them with machine state. Basically, if your application is running inside a container, and this container is part of a pod in a Kubernetes cluster, you will have, for your rules and for the alerts, the container id, the container name, the pod name, the pod namespace, the pod labels. All these elements will be there to create nice rules and to know the context of the alert. Libsinsp will also perform some event filtering and extract fields from these events. These fields are then used by the rules.

So if we take a look at our first rule, for example this one, Terminal shell in container: we have the name of the rule, the description for us human beings (it will not be used by any system and it will not be in the final output), the condition (we'll see it later), and an output. The output is the exact message we will get at the end. You can see some fields starting with a percent sign; these fields will be automatically replaced by Falco in the output. It means at the end, in the alert, you will get the real username and not this token. Each rule comes with a priority, in this case Notice. These priorities are useful for you to filter which rules you want to receive. And we also have tags. The tags are useful to understand the context of the rule, what it is supposed to detect, and you can also set Falco to enable just a subset of rules. For example, you can enable only the rules which concern containers, or the network, or something else.

For the rules you can use lists and macros. Lists are pretty obvious: it's just an array of items, in this case a list of the shell binaries you may find on a system. Remember, Falco rules are YAML files, so basically you can override anything, and you can also append items to lists, or append to rules or macros. It's really convenient and it allows you to reuse macros across your rules and not copy-paste or duplicate code.
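To make this concrete, here is a lightly simplified sketch of what such a rule, with its list and macros, looks like in YAML. It is condensed from the default ruleset, so the exact conditions and output fields in the rules shipped with Falco may differ:

    # A list of shell binaries, reusable across rules
    - list: shell_binaries
      items: [bash, sh, zsh, ksh, csh, tcsh, dash, ash]

    # Macros are reusable condition snippets
    - macro: shell_procs
      condition: proc.name in (shell_binaries)

    - macro: container
      condition: container.id != host

    - macro: spawned_process
      condition: evt.type in (execve, execveat) and evt.dir=<

    - rule: Terminal shell in container
      desc: A shell was spawned with an attached terminal inside a container.
      condition: spawned_process and container and shell_procs and proc.tty != 0
      output: >
        A shell was spawned in a container with an attached terminal
        (user=%user.name container_id=%container.id container_name=%container.name
        shell=%proc.name parent=%proc.pname cmdline=%proc.cmdline)
      priority: NOTICE
      tags: [container, shell, mitre_execution]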
We also have this macro, shell_procs, and you see proc.name. proc.name is a built-in field from Falco you can use in your rules. Even if you are not really familiar with Falco, or not familiar with Linux, syscalls and that sort of thing, it's quite easy to understand that proc.name means the name of the process. You also have proc.pid for the id of the process, or proc.ppid for the id of the parent of the process. It's really convenient and easy to read even if you are not a specialist. We also have this macro, container: if container.id, which is also a built-in field, is different from "host", meaning we have a hash instead, it means the application, or the event, happened inside a container. And we have the spawned_process macro with evt.type: execve and execveat are real system calls, and you can see these exact names inside the kernel code base if you want. And we have evt.dir; it's just to specify if we want the call into the kernel or the return from the kernel.

Even if the rules are convenient and easy to read, we know it would be complicated to create new rules from scratch. This is why Falco comes with a default rule set. Right now it has almost 70 default rules, and they cover most of the techniques and practices used by attackers: privilege escalation, reading or writing sensitive files or directories, spawning a shell, exfiltrating data, starting ransomware, that kind of pattern. For example, right now we have all these rules, and we can see some of them are disabled by default. It's just because they can be noisy if you don't append the exception lists with your own context, so we prefer to disable them, but they are there and you can use them. We also have tags. If we take a look at the full rule, the condition is a little bit different because my slide is quite old now, but basically the idea is the same: we have macros, the spawned_process macro is there, container, shell_procs, et cetera, and the output with the tokens to replace, everything is there. Falco rules have tags, and if you are familiar with the MITRE ATT&CK framework, we are trying to cover as many techniques as possible, and you can find which rule is related to which technique with the mitre_ tags and the T number.

Having alerts is nice, but we need to use them, we need to exploit these alerts. Here comes Falcosidekick. Falcosidekick basically forwards the alerts from your Falco instances to your ecosystem. So you can forward the alerts to a chat system, a log system like Elasticsearch or Loki, a queue or streaming system like Kafka, NATS, Pub/Sub. You can also forward alerts to a function as a service, serverless. Falcosidekick also exposes a Prometheus endpoint; it's useful if you want to create some statistics about the number of alerts and so on, for the SREs or DevSecOps, or for the health of the setup. You can also trigger your on-call system: right now with Falcosidekick we have PagerDuty, Opsgenie and Grafana OnCall, and you can also do cold storage in S3 or similar.

Basically we have one Falco instance per node, because it relies on the kernel and the kernels are not distributed. So we have one Falco instance per node, and they can all forward their events to a single deployment of Falcosidekick. You can scrape Falcosidekick to get metrics, and you can send all the events to Elasticsearch for data analysis and long-term storage, but only alerts with a priority above critical to your on-call system. You can add other outputs as well; it's really convenient.
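As a minimal sketch of how that wiring works, assuming Falcosidekick is reachable in-cluster as a service named falcosidekick on its default port 2801 (the name and port depend on your deployment), Falco's own configuration just needs to emit JSON and point its HTTP output at it:

    # falco.yaml (sketch): forward every alert to Falcosidekick as JSON over HTTP
    json_output: true
    json_include_output_property: true
    http_output:
      enabled: true
      url: "http://falcosidekick:2801/"   # assumed in-cluster service name and default port

Falcosidekick then does the fan-out to the chat, storage, streaming, or on-call outputs you enable on its side.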
So with Falco we have the detection, and with Falcosidekick we have the notification. If you forward these events to a serverless or function-as-a-service system, you can also react, as long as you are able to write your own reaction. You can do whatever you need with Lambda, OpenFaaS, Knative, Argo Workflows, Google Cloud Functions, everything. For example, you can terminate a pod, you can create a network policy to isolate a pod, you can scale in or scale out an autoscaling group, whatever you need, as long as you are able to write your own function.

Falcosidekick comes with a specific output called Falcosidekick UI. Basically it's a simple interface with statistics, with pie charts and so on, to get in a few minutes an overview of what has been detected by Falco in your environment. It's not meant for long-term storage or anything like that, but at least you have a quick overview. It's pretty convenient to use.

At the beginning, Falco was only for system calls. Then we introduced a web server to collect the Kubernetes audit logs, but it came with a lot of drawbacks. So in the last year we also introduced a plugin framework. Right now we are able to collect syscalls thanks to eBPF, but Falco is also able to collect any other kind of events you may have. By events we often think about logs, for example. So plugins are shared libraries used by Falco to collect and extract fields from more sources of events. Right now we have plugins to collect Amazon EKS audit logs, AWS CloudTrail, GitHub webhooks, Docker events, and even Nomad events; we developed these last ones with HashiCorp. So with eBPF you collect the syscalls, and with eBPF and Falco you protect your applications. With the plugins, for example the Kubernetes audit plugin, Falco is able to protect your Kubernetes clusters. With the AWS CloudTrail plugin you are able to detect suspicious behaviors at your account level. And with the GitHub plugin you are able to detect strange situations in your CI, in your pipelines, or in your repositories. It means that right now, with Falco, you can protect all stages, from development to production.

So the situation now with Falco is: we have the eBPF probe for the syscall collection, we have the plugins for the other event collections, Falco and its rule engine, and, to manage the plugins and the lifecycle of the plugins and of the rules, we introduced a few months ago a tool called falcoctl. Basically it will install plugins and rules, and it will also track new versions of the rules to automatically download them and reload Falco, so your cluster, your Falco fleet, will always be up to date. Here is another view of the architecture, basically the same idea behind it. And once again the plugins are running in user space, so without any elevated privileges, once again for security purposes.
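As a rough sketch of how such a plugin is wired up (the library paths, port, and the exact set of plugins you load are assumptions here and depend on your installation and plugin versions), the Kubernetes audit plugin is declared in Falco's configuration along these lines:

    # falco.yaml (sketch): load the k8saudit plugin so Falco also ingests
    # Kubernetes audit logs, next to the syscalls coming from the eBPF probe
    plugins:
      - name: k8saudit
        library_path: libk8saudit.so            # shared library shipped by the plugin
        open_params: "http://:9765/k8s-audit"   # webhook endpoint receiving the audit events
      - name: json
        library_path: libjson.so                # helper plugin used for field extraction
    load_plugins: [k8saudit, json]

The matching rule files can then be installed and kept up to date with falcoctl, as mentioned above.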
Time for a demo. In this demo cluster I have two nodes, and like I said, Falco relies on the kernel, so two nodes means two Falco pods. Basically they are deployed as a DaemonSet to have one Falco per node; it's quite obvious. I also installed Falcosidekick, Falcosidekick UI as the frontend with a Redis as the storage backend, and another deployment of Falco with the EKS plugin. So imagine this pod is your critical application. It can be WordPress, Drupal, anything you run and expose to the Internet. An attacker gains access to this container. As you can see, when I created my shell, it was detected immediately.

So we have the priority, we have the exact output message with the user root, the namespace default, the pod name, even the container id, and which shell has been used and which command line has been used to start the shell. All these elements are also there as output fields; they are used by Falcosidekick for routing. Now I will install curl. You can see it's automatically detected in real time, once again thanks to eBPF. Right now it's an alert about a package management process launched in a container, and once again the user, the exact command that has been run, and the container name are there, the image, everything. Now we'll try to reach the Kubernetes API. Thankfully, in this situation the API is protected, but at least we have detected it: unexpected connection to the Kubernetes API server from a container. We have the exact command, once again the namespace and the pod name. Imagine overwriting a critical file: a file below /etc has been opened for writing, and we have once again all the elements, the container name, the image, the pod, et cetera.

And if we take a look at Falcosidekick UI, we have everything that happened in the last five minutes, fifteen minutes. We have the pie charts, statistics by priority, the tags, the sources. We can filter on the source, we can see what I did, and if we want, more details are there. Right now we also have Terminal shell in a container; it's exactly what we saw in the logs, but in a more formatted and nicer way. With the tags we can filter, for example on the namespace. Then we have the installation of curl, the attempt to reach the Kubernetes API, and the overwrite of the file. Everything is there. And thanks to the Kubernetes audit logs we also have the details about someone attaching to or executing something in a pod. In the real world it would be a web shell or something else, but in my example I did an exec, and we can detect it. Once again we have the target name, the pod name, the date, the namespace, and so on.

If you want to start with Falco, the easiest way to install and start with Falco is to use the official Helm chart. By setting a few values (a sketch of such a values file follows this transcript) you will install Falco, Falcosidekick and Falcosidekick UI, and use the eBPF probe, in a namespace called falco. In less than two minutes everything will be up and running and you will be able to access the web UI with a port-forward. If you want to contribute or know more about Falco, you can join us in our Falco Slack channel, you can take a look at our new website (a total revamp has been made in the last months, so we hope it's better for everybody), and we are also on GitHub. Thank you and have a good day.
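For reference, here is a minimal sketch of the values mentioned above for the official falcosecurity/falco Helm chart; the exact keys can change between chart versions, so check the chart's README:

    # values.yaml (sketch), typically installed with something like:
    #   helm install falco falcosecurity/falco -n falco --create-namespace -f values.yaml
    driver:
      kind: ebpf            # use the eBPF probe instead of the kernel module
    falcosidekick:
      enabled: true         # deploy Falcosidekick next to Falco
      webui:
        enabled: true       # deploy Falcosidekick UI, backed by a Redis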

Thomas Labarussias

Ecosystem Advocate @ Sysdig



