Conf42 Chaos Engineering 2021 - Online

Leveraging the crowd power to regain faith in Internet’s zero trust architecture

Abstract

“Did you know that, every day across the Internet, each IP address is scanned hundreds of times? Or that more than 2,000 attacks are perpetrated, stealing 1.4 million personal records? That’s right, every single day! Today, there is a way to rebalance the odds and protect our resources through crowdsourced security and reputation.

In 2020, our ways of living and working turned completely upside down in a matter of days. We all brought our companies home and our homes in our companies’ systems. Staying connected to our colleagues, friends and family became a critical necessity, which opened the door for hackers to cause disruption and we saw a huge increase of attacks all around the world.

Even though worldwide spending on cybersecurity is predicted to reach $1 trillion in 2021 according to Forbes, the game will still be asymmetrical and all companies will keep being hacked regardless of their security budgets. Expensive security doesn’t mean better security. A new approach is needed and this is why we created CrowdSec.

Join us for this talk so we can explore why a collaborative approach to security could contribute to solving the problem and how we could make the Internet safer together.”

Summary

  • CrowdSec is an open source security engine that can help detect and respond to attacks in real time. The project then aims at building a global crowdsourced reputation system around it. How you react to a given attack will vary a lot depending on your environment.
  • We're trying to create a CsCTI, a crowdsourced cyber threat intelligence. Having thousands of real users exposing real services and facing real attacks every day is going to significantly increase the efficiency. And so how do we mix all this information together?
  • CrowdSec never ever sends your logs to the platform. Your logs are never exported to us. And most of all, and this is the cornerstone here, it's API access: people that want to be able to access the reputation database without contributing to it.
  • Thank you very much for your time. Hope you enjoyed the presentation and hope to see you soon. Find us on GitHub.com. See you, bye.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everybody, glad to be here with you today. I will be speaking about CrowdSec. CrowdSec is an open source security engine that can help detect and respond to attacks in real time. The project then aims at building a global crowdsourced reputation system around it.
Why did we create CrowdSec? Because we do believe that a few elements are in favor of the attacker. First, time is in favor of the attacker. The delay between the publication of a vulnerability and a weaponized exploit is often way shorter than the delay between the publication of said vulnerability and the application of the patch on all the systems. Then, unfiltered access. As we have seen in high-profile hacks recently, a lot of compromises are done through accesses and applications that are not filtered, which makes the firewall useless in a lot of situations, or at least the firewall as we know it. Then the perimeter: with public cloud and various architectures such as [unclear], et cetera, it's a lot harder to have one central point of control for your architecture and to filter out the malevolent traffic. And last but not least, money. While hackers are using their own time, stolen resources and free or stolen software in order to do this, defenders need to have teams, licenses and maintained systems. And finally, when you are attacking you need to be right only once, while when you are defending you need to be right every time.
So we do believe that the castle strategy is, like a Walkman, a thing of the past, and that every asset on the information system needs to be able to defend itself on its own. And we do take the bet that HTTP is the most common language spoken by both the most venerable Unix and the latest smartwatch. This is why we created CrowdSec, and we aim at creating the Waze of firewalls.
So we combine local behavior detection, which is available in the open source software, with a global reputation system that we can redistribute and share with the community by aggregating signals that are sent by all the users all around the world. So it's a software built by SecOps for DevOps.
How do we aim at achieving this? CrowdSec itself, the open source software, can be seen a bit like a Fail2ban. It's something that is going to read logs in order to detect attack patterns and then react to those. When we speak about reading logs here, it can be things as simple as a log file on a web server, but it can be more or less any stream of information that we consider being logs. It can be your AWS CloudTrail or anything. Then those logs need to be normalized and enriched before being matched to scenarios. And this is where the community aspect starts to kick in: besides being under a permissive open source license, CrowdSec aims at building a community altogether. So we have a hub where people can find scenarios and parsers that fit their needs, either because they need to cover a given technical stack or because they need to address a specific business need.
Once those logs have been normalized and a pattern has been detected by a scenario, the user wants to react to the attack that was just detected. We do believe that, first of all, you don't react at the same place where you detect most of the time. And second of all, how you react to a given attack will vary a lot depending on your environment, either technical or business. For example, someone doing e-commerce is not going to react to an attack the same way as someone that is managing mail servers. So with the approach of the bouncers, which are software components that can react to a given attack, the user can choose. Sometimes you want to ban an IP, sometimes you will want to simply present a captcha to a user to ensure that he's not a robot.
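As an editorial illustration of the read, normalize and match flow described above, here is a minimal Python sketch. It is not CrowdSec's actual parser or scenario format: the log layout, the `parse` function, the `NotFoundBurst` class and its threshold are all hypothetical names and values chosen for the example.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Hypothetical simplified log layout: "<ip> <ISO timestamp> <status> <path>"
LINE_RE = re.compile(r"^(?P<ip>\S+) (?P<ts>\S+) (?P<status>\d{3}) (?P<path>\S+)$")


def parse(line):
    """Normalize one raw log line into an event dict, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "ip": m["ip"],
        "time": datetime.fromisoformat(m["ts"]),
        "status": int(m["status"]),
        "path": m["path"],
    }


class NotFoundBurst:
    """Toy scenario (assumption): too many 404 responses from one IP in a time window."""

    def __init__(self, threshold=3, window=timedelta(seconds=10)):
        self.threshold = threshold
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent 404s

    def feed(self, event):
        """Return an alert dict when the event brings its IP to the threshold, else None."""
        if event["status"] != 404:
            return None
        recent = self.hits[event["ip"]]
        recent.append(event["time"])
        # Forget hits that fell out of the sliding window.
        while recent and event["time"] - recent[0] > self.window:
            recent.popleft()
        if len(recent) >= self.threshold:
            recent.clear()
            return {"ip": event["ip"], "scenario": "not-found-burst", "time": event["time"]}
        return None


def detect(lines, scenario):
    """Run every parsable line through the scenario and collect the alerts."""
    alerts = []
    for line in lines:
        event = parse(line)
        if event is None:
            continue  # unparsed lines are simply skipped in this sketch
        alert = scenario.feed(event)
        if alert is not None:
            alerts.append(alert)
    return alerts
```

Calling `detect(lines, NotFoundBurst())` on a list of log lines returns one alert per burst. What to do with an alert (ban, captcha, something else) is deliberately left out, since the talk assigns that choice to the bouncers.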
And in other, more corporate environments, you might want to reinforce the security of the target rather than trying to ban the offending IP, which might be part of a botnet. So your action might be to enforce multifactor authentication on the user that is being targeted.
And last but not least, and this is the main point of the project, you share your own sightings. Don't be afraid, logs are not shared. The only information that is shared is that whenever you are blocking an IP, you are going to tell us: I blocked this IP at this time because it triggered this specific scenario. This is the data that is then going to be crunched and redistributed to all the users once it has been curated. So if you are using, for example, WordPress-specific scenarios, you are going to be fed in real time with the IPs that have been reliably reported attacking other WordPress sites.
How does the software architecture itself work? Like Fail2ban, we aim at doing something with a very low technical barrier in terms of installation. However, all the components talk together through an API, which enables you to have more distributed architectures, as we are already seeing users do. The CrowdSec agent itself is in charge of parsing logs, enriching them and matching them against scenarios. The local API is in charge of taking decisions based on the alerts it receives, giving those decisions back to the bouncers and sharing them with the central API. Bouncers can be at various levels of your application stack, because we aim at being able to speak to a lot of various personas. So for example you can have a CrowdSec bouncer in your application stack that presents a captcha to the users that have been caught doing bad things. You might also want to filter directly at the firewall level if you are protecting larger infrastructures. And you are also pushing the signals.
So this metadata I was just speaking about goes to the central API, which in exchange shares back with you the signals and the reputation feeds.
The behavior engine itself aims at being able to detect various scenarios. You can detect things as obvious as brute force, et cetera, but the engine is powerful enough to allow you to detect more advanced attacks such as distributed denial of service, web vulnerability scans, specific targeted exploitation or even more business-focused aspects such as credit card or credential stuffing. The software itself is true open source software under the MIT license, as free as it can be. We are aiming at building a true community here and not simply pushing open source software to users. So the technical barrier is as low as possible, so that we might create contributions around it. We already succeeded in getting external contributions for things such as scenarios, parsers or even bouncers.
A short demo often being worth 1000 slides, let's directly jump into a practical example of using and deploying CrowdSec. So here I'm going to deploy CrowdSec on a very typical setup. At the top, it's a Linux machine with Nginx, MySQL, SSH and so on. I'm simply installing CrowdSec from the repositories. As you are going to see, the setup is fairly automated so that the technical barrier for the user is as low as possible. The setup process, through the wizard, is going to identify the services that are deployed on my machine: Nginx, SSH, MySQL and the Debian distribution. For each of those it's going to spot the logs and install what we call a collection, which is a coherent set of configuration to help you protect these services. I can immediately, out of the box, take a look at the logs, and we are going to simulate an attack on the web server using a good old Nikto, which is a web application scanner that might not be very modern but has the very typical behavior of a web application vulnerability scanner.
Here I can see that the tool is being detected for coming with a bad user agent, trying to access a lot of files that don't exist, trying to crawl non-static resources, or even accessing sensitive files. So here, through cscli, which is the main tool for the system administrator to interact with CrowdSec on the command line, we can see in the decisions that my IP should be banned for a few hours, first of all because of this bad user agent. We can also look at the other alerts that were triggered. Here we can see the various alerts, and we can even deep dive into a specific alert. For example, let's say I want to see in more detail what happened and why the sensitive file scenario was triggered, which is here alert number five. Here we can see, for example, the various sensitive files that someone tried to access on my web server.
However, if I try to access my web server as an attacker, I still can, right, because CrowdSec is only in charge of the detection. So I'm going to jump into the hub and find a bouncer that is suitable for my technical environment. As I'm using Nginx, I'm going to use the Nginx bouncer, which leverages the Lua integration within Nginx to provide the blocking capabilities. I'm simply going to download the provided tarball and deploy it on my machine. I can now simply restart my Nginx service which, thanks to the bouncer addition, is now going to allow Nginx, whenever it sees an IP that it doesn't know, to query the CrowdSec local API and ask whether or not it should let this IP go through. So of course now, if I again try from my attacker's point of view to access my website, I'm blocked and I get a 403 because my IP is still banned. So I'm going to remove the existing decision on my IP. And now we can see that I am able to access my website again, as I removed the decision.
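The ban, block and unban cycle shown in this part of the demo can be pictured with a small Python sketch. This is an editorial illustration under stated assumptions, not the real CrowdSec local API or the Nginx bouncer: `DecisionStore` and `bouncer_status` are hypothetical names, and the in-memory dictionary stands in for whatever the local API actually stores.

```python
from datetime import datetime, timedelta


class DecisionStore:
    """Hypothetical stand-in for the local API's list of active decisions."""

    def __init__(self):
        self._bans = {}  # ip -> (expiry time, scenario that triggered the ban)

    def add(self, ip, scenario, duration, now):
        """Record a ban on ip that stays active for the given duration."""
        self._bans[ip] = (now + duration, scenario)

    def delete(self, ip):
        """Remove any decision on ip, as the administrator does in the demo."""
        self._bans.pop(ip, None)

    def lookup(self, ip, now):
        """Return the scenario behind an active ban on ip, or None."""
        entry = self._bans.get(ip)
        if entry is None:
            return None
        expiry, scenario = entry
        if now >= expiry:
            # The ban has run out: forget it and let the IP through.
            del self._bans[ip]
            return None
        return scenario


def bouncer_status(store, ip, now):
    """HTTP status a web-server bouncer would answer with for this client IP."""
    return 403 if store.lookup(ip, now) is not None else 200


# Usage: ban an IP for four hours, see it blocked, then lift the decision.
now = datetime(2021, 1, 1, 12, 0)
store = DecisionStore()
store.add("192.0.2.10", "bad-user-agent", timedelta(hours=4), now)
blocked = bouncer_status(store, "192.0.2.10", now)
store.delete("192.0.2.10")
allowed = bouncer_status(store, "192.0.2.10", now)
```

Keeping the decision lookup separate from the component that enforces it mirrors the point made in the talk that detection and remediation usually happen in different places.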
One thing is that I configured my iptables firewall to log established or attempted connections. This is a great insight from a security point of view. So again we are going to jump into the hub and find collections that will allow us to take advantage of this. Of course there's a collection for iptables. It includes a parser for iptables logs as well as a scenario that is going to allow me to detect port scans. As you can see here, installing a new collection is as simple as using cscli to install the collection. This aims at reflecting the fact that your technical stacks are changing faster and faster, so you need to be able to easily adapt security software to new changes in your infrastructure. Now that we have restarted the service, we can simply launch a port scan with a good old Nmap and see that CrowdSec is now able to detect this kind of behavior. Installing the collection taught CrowdSec how to understand these logs and how to detect this kind of pattern. So it's simply an example to show how the software can evolve. And now my IP is banned again, but this time for a different reason. And again, trying to access Nginx is going to be blocked.
One more thing that I want to showcase is visualization. We know that the dashboard is something that is sometimes or often missing in open source software, and it's very important for some users to be able to have a visualization of the data. Here we are using Metabase. Metabase is a great open source software, a bit like, let's say, Kibana or something like this, for those of you that are not familiar with it, that allows you to create fancy dashboards. We are using Metabase in combination with Docker to simply be able to deploy, out of the box, some fancy dashboards for the user to see the activity timeline and see what is going on. Metabase is now being deployed and we can simply access it through the web interface.
Credentials are provided and, as you will see, it gives us a very good synthetic view of the activity of the machine. Of course there is not much activity to be seen right now because we just deployed the machine. We can see our attack from my IP and, funny enough, another IP that attacked during the demo. We can see the timeline, the kind of attacks, the sources of the attacks and so on.
One more thing that I want to showcase is that we do know that false positives are something that is very often very frightening for users. CrowdSec has the ability to work on cold logs as it does on live logs. So it gives us the ability, whenever you are trying out CrowdSec or writing your own scenarios, to replay those scenarios on past logs and see whether they would have led to false positives or false negatives, or simply to have an overview of the activity. So here I'm going to ingest into CrowdSec my web server logs from 2019, within the existing instance. As we see now, the attacks that are being detected are the attacks that were happening in January 2019. And so, of course, if we jump back to our dashboard and change the time period to the last three years, we are going to see the activity that was ingested. Here we see all my cold events being ingested and the timeline being reconstituted, and here we have the visibility, et cetera. So I guess it's a great way for users to familiarize themselves with the software.
One last thing I want to show is the metrics. Okay, here it's through the command line, but actually CrowdSec is instrumented with Prometheus, which is a tool that ops people love. And it gives us some good metrics on what is going on here.
For example, we can see the various sources of logs that we are reading, how many lines are read, how many lines are parsed, and even how many lines are being poured into buckets, which are the existing scenarios, giving you a good idea of whether your configuration is appropriate or not. And the same goes for all the components.
Now that we saw the open source software part, what are we trying to achieve here? We're trying to create a CsCTI, a crowdsourced cyber threat intelligence. We do believe that not only running honeypots, but having thousands of real users exposing real services and facing real attacks every day is going to significantly increase the efficiency and the accuracy of our cyber threat intelligence approach. This is what we want to do: with thousands of users, we are going to create a very accurate CTI mechanism.
And so how do we then mix all this information together? Why do we believe that the crowd is so important? Because context is key, and context can be gained through the crowd. An attack is something that is very time dependent. An IP that is malevolent right now was legitimate a few hours, a few days or a few weeks ago. It was a legitimate asset that most likely got hacked and is now taking part in attacks. And once the legitimate owner is made aware of the behavior of his asset, he's going to clean up his mess, and the IP is going to become good again. That's why the crowd is so important in order to evaluate who's good and who's bad at a given time. Currently, we consider that an IP that we haven't seen attacking for 72 hours becomes good again.
Being able to curate all the reports from the users in real time is a real challenge here. How do we deal with false positives? How do we deal with poisoning? The users that are participating in the network are given a trust rank. It's based on things that are hard to fake. It might be things such as persistency.
For how long have you been sending information? Or consistency: are the IPs that you report also being reported by other users? Are they being reported by users with a higher trust rank? And so on. So this is a mechanism for us to fight against poisoning.
And then we also have our own honeypot network. At the beginning it was a way for us to bootstrap the consensus. And now it is used for two purposes: to be able to artificially increase our presence in some parts of the Internet or some parts of the network, or to be able to fill in the technological gaps that we might have. If tomorrow there is, for example, a dramatic vulnerability that comes out on Drupal, our honeypot network will allow us to easily deploy hundreds of vulnerable machines very quickly across various public clouds in order to be able to capture emerging signals or emerging threats.
Then we have the canaries. It's an analogy with the canaries in the mine, and it's something that is part of both the consensus and the open source software. It's first of all a promise for the user of dealing with false positives easily. You can tell the user, out of the box: you are never ever going to ban Google's SEO bot or your PSP because of CrowdSec, because they are part of the whitelist. For us it's a good way to fight against false positives and, more specifically, to be able to easily identify which scenarios created by the community are subject to false positives. And it's a good way as well to fight attempts at poisoning.
The predictive algorithm is a way for us to address the threats that are under the radar. A lot of groups are using various assets to be able to perform an attack. One IP is going to come here and do the scouting or fingerprinting of your web application. Another one is going to exploit the vulnerability, one shot, and a third one is going to access the backdoor.
And the predictive algorithm, based on the huge amount of data that we're dealing with, aims at being able to identify the weak signals and aggregate them together to identify more advanced actors. These are the various mechanisms that lead us to this reputation database, and how we can have a consensus, curate the data and then redistribute it to the community.
In case my accent didn't give me away yet, we are French, and so GDPR and data privacy are a big topic here. CrowdSec never ever sends your logs to the platform. Your logs are never exported to us. The only information that you give us is the IPs that you decide to block, the scenario and the timestamp. And this is everything we need to perform this consensus and then redistribute the information to the network.
You might legitimately ask what our monetization and our business model are. It will be, on one side, fleet features for people managing huge fleets of CrowdSec installations, as well as things such as self-monitoring. And most of all, and this is the cornerstone here, it's API access: people that want to be able to access the reputation database without contributing to it.
Thank you very much for your time. Hope you enjoyed the presentation and hope to see you soon. Either hop into [unclear] or find us on GitHub.com. See you,
...

Thibault Koechlin

CTO @ CrowdSec
