Conf42 Python 2024 - Online

How to prove the safety of your software


Abstract

When shopping for food, consumers can learn what a package contains before buying it by reading the food label. Why not do the same and examine the ingredients of software before deploying or purchasing it? This allows you to assess the risks.

Summary

  • Marco Verleun wants to discuss how to reveal the safety of software without revealing its application logic. An industry that faces similar challenges is the food industry. Verleun hopes to raise awareness of the tools available.
  • We use SBOMs (software bills of materials) to stay in control of our software or to convince others that it is safe to use. A bill of materials tells you what is inside, just like a food label does. Once you know which packages you have, you can easily match them against a database of known vulnerabilities.
  • There are many tools for generating SBOMs. Generating them is easy and integrates well into pipelines. But keep in mind that, although there are many scanners, not all of them are good.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Welcome to my talk about software security. My name is Marco Verleun, and I'm going to talk to you from the perspective of a GitOps, DevOps, cloud, container engineer; give it a name. Just for the record, I'm not a developer, but I am very passionate about software, and that's why I'd like to talk about it. What I want to do today is discuss the ability to reveal the safety of software without revealing its application logic. Even if you're developing open source software, for many people analyzing the code and judging it is far too complex. They need other means to determine whether or not software is safe. In this respect, safe does not mean that it performs well; the quality of your product is something totally different. I hope to raise a bit of awareness about the tools and techniques available, and I'd like to encourage you to adopt them in your workflow as well. To do so, I want to take a detour into another industry that has similar challenges. But before we do, here is a QR code that you can scan to view the presentation later on; it will be online behind the link in the QR code. In the context of a talk like this for Python developers, we have to start, of course, with the code. It all starts with the code, and eventually the code ends up in production somewhere, whether it's in an appliance, on a host, inside a container or on a mobile phone; it doesn't really matter. If the code is unsafe, this will propagate all the way down the line. To demonstrate what happens down the line, I'm going to use an application written by Jérôme Petazzoni. It's called worker.py, and it's used to demonstrate some techniques inside Kubernetes clusters. Its function is not relevant for today, but the nice thing is, it has a flaw.
Only one, but it's just enough for demonstration purposes. We do not care about the application logic; we only want to make sure that the application does not contain too many CVEs. The big question is: how can we do this? We want to assure our colleagues that they can safely use this application. An industry that is facing similar challenges is the food industry, which has to prove to you that food is safe to consume. So ask yourself: would you consume this? It's an unlabeled jar. It could be sweet or sour on the inside, it could be dog food, it could be delicious. Nobody knows up front. Or would you consume this? Maybe you would, maybe you wouldn't. It depends on allergies. Or maybe you're a vegetarian and you don't want to consume anything that contains fish, for instance, or milk. The food industry has to inform you up front about nutrition facts and about potential risks if you have an allergy. Even so, some adventurous people would probably open up the unlabeled container and eat what's inside. There is a risk involved, and when it comes to software, I'd like to eliminate that risk as much as possible. So with food, it is nice to know what is inside, and even if you know what is inside, that doesn't mean you have to consume it. In the food industry, food labels tell you about the contents, and one could look like this; we'll see this picture again a bit later. It tells you everything that's inside, but it doesn't tell you how it tastes, what the recipe is, or how it was produced. It just tells you where it's from: it's from France and it was sold in Singapore. So we could ask ourselves: why don't we do something similar with whatever we've got? Hardware, software, SaaS solutions, you name it. Often it is already done: many companies have a CMDB in which they try to manage their assets.
So from a hardware perspective, they often know what they've got, what the serial numbers are, which components are inside, et cetera. If we want to do the same for other things, we might look at other BOMs. BOMs are bills of materials, and they describe what is inside. If you want to build a device, you get a BOM, which is a shopping list, and once you've got all the components, you can start to assemble it. If you follow the URL to GitHub.com, you'll find examples from an organization called CycloneDX. They show different file formats for exchanging bill-of-materials information about hardware, software, SaaS solutions, you name it. During this talk we'll focus on the software bill of materials, hence the SBOM. Now, why would we use an SBOM? Basically, we use SBOMs to be in control of our software or to convince others that software is safe to use. Let's look at a few examples and see if we can find other purposes for SBOMs. You might have seen an announcement telling you that there is a flaw in the curl library. If your application uses curl, you probably want to be able to identify easily whether you have to fix it or not. Did you identify whether or not your app was affected, and if so, how did you figure it out? How long did it take? Was it easy, or were you just lucky? Could you simply enter the CVE number into a database containing the bills of materials of all your software? In our case, such a query revealed that only one application in the database was listed as vulnerable to this particular CVE. So, once again: a bill of materials tells you what is inside, just like a food label does; it only looks different. The most common formats are JSON and XML, and if you look at the SBOM snippet in the presentation, you see that it has a lot of identifiers.
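The curl question above boils down to a lookup across stored SBOMs. A minimal Python sketch, assuming each application's SBOM is a CycloneDX JSON document whose `components` entries carry `name`, `version` and `purl` fields; the application names, package versions and the set of affected versions are all hypothetical:

```python
import json

# Hypothetical SBOM inventory: one CycloneDX JSON document per application,
# trimmed to the "components" fields used below.
RAW_SBOMS = {
    "billing-api": """
    {"bomFormat": "CycloneDX", "specVersion": "1.5",
     "components": [
       {"type": "library", "name": "curl", "version": "8.3.0",
        "purl": "pkg:generic/curl@8.3.0"},
       {"type": "library", "name": "requests", "version": "2.31.0",
        "purl": "pkg:pypi/requests@2.31.0"}]}
    """,
    "worker": """
    {"bomFormat": "CycloneDX", "specVersion": "1.5",
     "components": [
       {"type": "library", "name": "redis", "version": "5.0.1",
        "purl": "pkg:pypi/redis@5.0.1"},
       {"type": "library", "name": "curl", "version": "8.4.0",
        "purl": "pkg:generic/curl@8.4.0"}]}
    """,
}


def components(sbom: dict) -> list[tuple[str, str]]:
    """Return the (name, version) pairs listed in a CycloneDX document."""
    return [(c["name"], c.get("version", "")) for c in sbom.get("components", [])]


def apps_containing(raw_sboms: dict[str, str], name: str, affected: set[str]) -> list[str]:
    """List applications whose SBOM contains `name` at one of the `affected` versions."""
    hits = []
    for app, raw in raw_sboms.items():
        sbom = json.loads(raw)
        if any(n == name and v in affected for n, v in components(sbom)):
            hits.append(app)
    return sorted(hits)


print(apps_containing(RAW_SBOMS, "curl", {"8.3.0"}))  # ['billing-api']
```

With real SBOMs you would load the documents from disk, and advisories usually express affected versions as ranges rather than as an explicit set.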
It tells you a bit about the package name, the package type and the location where it was found; there could be a checksum involved, et cetera. But just like a food label, it tells you what is inside and not whether it's harmful. A food label doesn't know about the allergies you might have, and an SBOM doesn't know much about CVEs. The trick is to know what you've got and to compare it with the lists of known vulnerabilities published by various organizations around the world. Once you know which packages you've got, you can easily match them against a database of known vulnerabilities, see if there is a match, and if so, take appropriate action. That is basically how every security scanner works internally: it generates an SBOM and matches it against a database. But there are advantages to keeping the SBOM information stored. One reason to store it separately is that more and more often people publish this information on GitHub, or it's a requirement in a purchasing process. For instance, the US government nowadays requires SBOM files prior to purchasing software. That allows them to evaluate the safety of the software without knowing the application logic; it tells them which risks are involved in installing the software and helps them decide whether to purchase it. In the example on the screen, you can see a GitHub repo where somebody is distributing an SBOM file. If you want to know the risks involved in installing this software, you can first download the SBOM and analyze it, instead of the other way around: downloading the software, installing it and then scanning it is probably not the best way of doing things. Once you've downloaded the SBOM, you could upload it to a tool like this, which has a GUI that does a bit of analysis for you. You could also use command-line tools.
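The matching step that every scanner performs can be sketched as follows. This is an illustration under simplifying assumptions, not how any particular scanner is implemented: versions are purely numeric and dotted, each advisory names one package with an inclusive `introduced` version and an exclusive `fixed` version, and the package names and EXAMPLE-000N identifiers are made up:

```python
from dataclasses import dataclass
from typing import Optional


def parse_version(v: str) -> tuple[int, ...]:
    """Naive version parser: assumes purely numeric dotted versions like "1.2.3"."""
    return tuple(int(part) for part in v.split("."))


@dataclass(frozen=True)
class Advisory:
    vuln_id: str           # placeholder identifier, not a real CVE
    package: str
    introduced: str        # first affected version (inclusive)
    fixed: Optional[str]   # first fixed version, or None if no fix exists yet


def match(
    components: list[tuple[str, str]], advisories: list[Advisory]
) -> list[tuple[str, str, str, Optional[str]]]:
    """Return (package, version, vuln_id, fixed_in) for every affected component."""
    findings = []
    for name, version in components:
        v = parse_version(version)
        for adv in advisories:
            if adv.package != name or v < parse_version(adv.introduced):
                continue  # different package, or older than the flaw
            if adv.fixed is not None and v >= parse_version(adv.fixed):
                continue  # already on a fixed release
            findings.append((name, version, adv.vuln_id, adv.fixed))
    return findings


# Hypothetical inventory and advisories, for illustration only.
inventory = [("libfoo", "1.2.3"), ("libbar", "2.0.0"), ("libbaz", "0.9.0")]
advisories = [
    Advisory("EXAMPLE-0001", "libfoo", "1.0.0", "1.2.5"),
    Advisory("EXAMPLE-0002", "libbar", "1.0.0", "1.9.0"),
    Advisory("EXAMPLE-0003", "libbaz", "0.1.0", None),
]

for name, version, vuln_id, fixed in match(inventory, advisories):
    status = f"fix available in {fixed}" if fixed else "no fix yet"
    print(f"{name} {version}: {vuln_id} ({status})")
```

A finding that has a fixed version corresponds to an outdated component that can be repaired by upgrading; one without a fixed version needs some other mitigation.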
There are plenty of choices, and I'm not here to endorse one or the other, so feel free to do whatever you think is best to make this world a bit safer. As you can see in the example, we've got a container image with 1,200 components and about 100 vulnerabilities. Many of them could be fixed: the yellow triangle shows that a fix is already available, so these components are simply outdated. But let's go back to the app I mentioned earlier, see what the app itself does in terms of security, and then see what happens when we containerize it. Let's bundle it into an image; we might see some shocking results. The app by itself is not very sophisticated: it imports a couple of modules and has a couple of loops, nothing fancy or complicated. We've decided to distribute this app in a container because that's what our customer wants, and we have to pick a base image. The python:latest image is quite popular in this respect; almost everything works inside it. But to be on the safe side, we've also tested two other images: the Python Alpine image, based on Alpine Linux, and the python:3.9.18-slim image, based on Debian. Let's see what the differences are. The build process is always the same: we take a Dockerfile, and the only thing that changes is the FROM line; everything else is identical. At the end of the build process we validated that the application runs fine in all three images. So from a consumer or user perspective, there is no difference between them. The first interesting thing to note, without even going into the CVEs, is the difference in size. A build with the Alpine-based image results in a container image of roughly 110 megabytes, which is quite nice. If you look at the image built on python:latest, you'll see that it's more than ten times as big.
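Since only the FROM line differs between the three builds, the variants can be generated from one template. A sketch in Python; the three base images are the ones named in the talk, while the Dockerfile body (WORKDIR, a requirements.txt, the CMD) is an assumption for illustration, not the author's actual file:

```python
# The three base images compared in the talk; everything below FROM is an
# assumed, illustrative Dockerfile body for worker.py, not the author's file.
BASE_IMAGES = ["python:latest", "python:3.9.18-slim", "python:alpine"]

DOCKERFILE_BODY = """\
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY worker.py .
CMD ["python", "worker.py"]
"""


def render_dockerfile(base_image: str) -> str:
    """Return a Dockerfile whose only varying line is the FROM line."""
    return f"FROM {base_image}\n{DOCKERFILE_BODY}"


for image in BASE_IMAGES:
    print(render_dockerfile(image).splitlines()[0])
```

Writing each variant to its own file (for example `Dockerfile.alpine`) and building it with `docker build -f Dockerfile.alpine -t worker:alpine .` keeps the comparison fair: any difference in size or CVE count then comes from the base image alone.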
It's almost 1.5 gigabytes. If you take the difference between the two, you end up with more than 1.3 gigabytes of stuff that is present in the image but not required to run the application, and it might cause all kinds of hassle, as we'll see later on. To do a bit of analysis, we've used a tool called Syft to create the SBOM file with all the information about the packages present, and we analyzed it with Grype. We don't use these because they're the best tools, but because they have nice output for this presentation. All numbers shown were valid at the time this presentation was written. Quite likely, new CVEs have been discovered since then, and the results will get worse over time. That's one of the reasons why you might want to keep an SBOM file at hand: so you can re-evaluate it over time and see if you need to fix something. Starting with the Python application itself, we're in pretty good shape: there is one medium CVE, nothing to worry about. As an Ops engineer, I'd be more than happy to deploy this. That does not apply to the image built using python:latest. As you can see, we get a bonus of 1,699 vulnerabilities simply by packaging the app in this particular image. Even worse, a lot of them are critical and high CVEs, which could have a big impact. Yes, I do realize that it's a containerized application running in isolation, but if you're able to compromise the application inside the container, you have a lot of tools at your disposal, and the likelihood of getting into the container is quite high as well, because there are enough vulnerabilities to abuse. We can make this application a lot safer simply by changing the base image, without rewriting any code. By using the python:3.9.18-slim image, we suddenly have only 101 vulnerability matches, which, compared to almost 1,700, is quite a reduction.
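For the Syft and Grype workflow, the commands are roughly `syft <image> -o cyclonedx-json > sbom.json` followed by `grype sbom:./sbom.json -o json > report.json`; check the flags against the versions you have installed. The Python sketch below then tallies such a report per severity and counts findings that already have a fix. It assumes Grype's JSON layout, with findings under `matches` and each `vulnerability` carrying `severity` and `fix.state`; the sample report is hand-written with placeholder IDs and package names:

```python
from collections import Counter


def summarize(report: dict) -> tuple[Counter, int]:
    """Count matches per severity and how many already have a fixed version.

    Assumes Grype's JSON layout: report["matches"][i]["vulnerability"] holds
    "severity" and a "fix" object whose "state" is "fixed" when a fix exists.
    """
    severities: Counter = Counter()
    fixable = 0
    for match in report.get("matches", []):
        vuln = match.get("vulnerability", {})
        severities[vuln.get("severity", "Unknown")] += 1
        if vuln.get("fix", {}).get("state") == "fixed":
            fixable += 1
    return severities, fixable


# A tiny hand-written report in that shape; IDs and packages are placeholders.
sample_report = {"matches": [
    {"vulnerability": {"id": "EXAMPLE-0001", "severity": "High",
                       "fix": {"versions": ["1.2.5"], "state": "fixed"}},
     "artifact": {"name": "libfoo", "version": "1.2.3"}},
    {"vulnerability": {"id": "EXAMPLE-0002", "severity": "Medium",
                       "fix": {"versions": [], "state": "not-fixed"}},
     "artifact": {"name": "libbaz", "version": "0.9.0"}},
    {"vulnerability": {"id": "EXAMPLE-0003", "severity": "Medium",
                       "fix": {"versions": ["2.1.0"], "state": "fixed"}},
     "artifact": {"name": "libqux", "version": "2.0.0"}},
]}

counts, fixable = summarize(sample_report)
print(dict(counts), "fixable:", fixable)
```

Running the same summary over the reports for each base image gives exactly the kind of per-image comparison discussed in the talk.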
The difference is huge, but we can do better. If you look at the Alpine-based image, you see that we have only one high and 18 medium vulnerabilities and no criticals whatsoever, which means that we are in pretty good shape. We could be in even better shape, because the image itself is already a bit outdated: fixes are already available for nine of its known vulnerabilities. If we put this in a table, I think it speaks for itself; you can probably guess which image I prefer to deploy as an engineer. Like I said, I always like to keep the SBOM files at hand somewhere. It allows me to re-evaluate them over time, but being a bit lazy, I leave that up to tools. One very neat tool is Dependency-Track. Basically, it is a database into which you can upload SBOM files, and Dependency-Track runs a periodic analysis every 24 hours or so. It downloads the newest vulnerability databases from different sources and matches them against the SBOM files stored inside to see if there are any matches. If new CVEs are discovered, it will create tickets for you, send you notifications or the like, helping you to stay on top of the safety of your software. You could even consider using tools like Renovate to fix these vulnerabilities automatically by updating dependencies, so that you're always in good shape. As you can imagine, over time you might want to upgrade to different base images. Alpine gets a successor every now and then, and only by keeping your images up to date do you keep the application itself safe as well. Another advantage of having the SBOM files at hand is that you can convince customers that your software is safe from a CVE perspective. A clean CVE report does not prove that there is no malicious code somewhere; it only tells you that none of the components you ship has a known vulnerability.
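The periodic re-analysis described above is the reason to keep SBOMs around: the SBOM stays the same while the vulnerability data changes. The idea, not Dependency-Track's actual implementation or API, can be sketched as diffing the matches of stored SBOMs against yesterday's and today's database snapshots; exact version matching and all data below are simplifying, hypothetical assumptions:

```python
# Stored SBOMs reduced to (package, version) pairs per image; all hypothetical.
STORED_SBOMS = {
    "worker:alpine": [("libfoo", "1.2.3"), ("libbar", "2.0.0")],
    "worker:slim": [("libbar", "2.0.0"), ("libqux", "3.1.0")],
}

VulnDB = dict[tuple[str, str], set[str]]


def findings(components: list[tuple[str, str]], vuln_db: VulnDB) -> set[tuple[str, str, str]]:
    """Exact-match components against a {(package, version): {ids}} snapshot."""
    return {(n, v, vid) for n, v in components for vid in vuln_db.get((n, v), ())}


def new_findings(
    stored: dict[str, list[tuple[str, str]]], old_db: VulnDB, new_db: VulnDB
) -> dict[str, list[tuple[str, str, str]]]:
    """Report matches that appear with the refreshed database but not the old one."""
    alerts = {}
    for image, comps in stored.items():
        fresh = findings(comps, new_db) - findings(comps, old_db)
        if fresh:
            alerts[image] = sorted(fresh)
    return alerts


yesterday: VulnDB = {("libfoo", "1.2.3"): {"EXAMPLE-0001"}}
today: VulnDB = {**yesterday, ("libbar", "2.0.0"): {"EXAMPLE-0009"}}

for image, items in new_findings(STORED_SBOMS, yesterday, today).items():
    print(image, items)  # a real tool would open a ticket or send a notification here
```

Nothing was rebuilt or rescanned here: only the vulnerability data changed, and that was enough to surface new findings in images that were clean the day before.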
More and more, you'll see that tools like docker build also include the ability to create SBOMs and store them in container registries. This allows users to pull the SBOM before pulling the container image, again in an attempt to be preventive: you want to analyze the SBOM first, before you allow your system to deploy the image. Nowadays it is quite common in Kubernetes clusters to analyze a container image before deploying it, because once it's deployed, you're often too late. I hope I've given you some good reasons to start working with SBOM files. It's pretty easy to do, and it's easy to integrate into pipelines. It is a great way to show your customers that you're on top of things and that you update your software on a regular basis. But if you enter this world, keep in mind that, although there are many scanners, not all of them are good. Some are great at scanning only Java packages. Others really focus on analyzing Python applications; they'll evaluate requirements.txt files as well and take them into consideration. Other tools are great at analyzing base OS packages, for instance Debian or RPM packages, and some try to do it all. So a bit of benchmarking helps to determine what is good for your environment. To make it easy to start, I've got a couple of links that I find interesting and would like to share with you. If you view this presentation online, you can click on them as well; if you're only watching the video, I'm sorry, you'll have to type them. As you can see, there are many tools listed here to generate SBOMs. Microsoft has some tools. There are tools that run inside Kubernetes, tools that you can use from the command line, and tools that you can use to store SBOMs.
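Python-focused scanners read requirements.txt to learn which packages and versions an application depends on. A minimal sketch of that first step, assuming only exact `name==version` pins matter; extras, environment markers, URLs and `-r` includes are deliberately not handled, and the sample file is hypothetical. Names are normalized as described in PEP 503 so that they compare cleanly with vulnerability database entries:

```python
import re

# name==version, tolerating spaces around "=="; extras, markers and URLs are not handled.
PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def normalize(name: str) -> str:
    """PEP 503 name normalization: lowercase, runs of '-', '_' and '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pinned_requirements(text: str) -> list[tuple[str, str]]:
    """Extract (normalized_name, version) pairs from exact pins in requirements.txt."""
    pins = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        m = PIN.match(line)
        if m:
            pins.append((normalize(m.group(1)), m.group(2)))
    return pins


REQUIREMENTS_TXT = """\
# sample requirements.txt
redis==5.0.1
requests == 2.31.0
Flask_Cors==4.0.0
gunicorn>=21  # unpinned, skipped: no single version to match
"""

print(pinned_requirements(REQUIREMENTS_TXT))
```

Unpinned entries are skipped on purpose: without an exact version there is nothing reliable to match, which is one reason why lock files give scanners better input than loose requirements.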
SBOM OCI means storing an SBOM file in an OCI container registry, which allows you to evaluate images before you download them. Dependency-Track is a nice tool that fits in the middle of the supply chain and analyzes SBOMs on a regular basis. Trivy is well known nowadays. It is often used in pipelines because it can do it all in one: it can generate the SBOM file for you and analyze it for vulnerabilities, which allows you to break a pipeline if a critical CVE is discovered and so prevents you from releasing bad software. KubeClarity is a nice illustration of software that you can run inside a Kubernetes cluster. It does basically the same and tells you which container images have to be replaced due to CVE issues. That's about it. I hope this is enough for you to get started. Like I said earlier, it's not complex at all, but it adds great value for all of us in the industry. So thanks for your attention and good luck.
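Trivy can break a pipeline by itself, for example with `trivy image --exit-code 1 --severity CRITICAL <image>`. For scanners without such an option, the gate is easy to sketch in Python: turn the severity counts of a scan into a process exit code. The function name and the default policy (block only on critical findings) are assumptions for illustration:

```python
import sys


def gate(counts: dict[str, int], blocking: frozenset[str] = frozenset({"CRITICAL"})) -> int:
    """Return a CI exit code: 1 if any finding has a blocking severity, else 0.

    Severity names are compared case-insensitively, because scanners may
    spell them differently (e.g. "CRITICAL" versus "Critical").
    """
    wanted = {s.upper() for s in blocking}
    blocked = {sev: n for sev, n in counts.items() if sev.upper() in wanted and n > 0}
    for sev, n in sorted(blocked.items()):
        print(f"blocking release: {n} {sev} finding(s)", file=sys.stderr)
    return 1 if blocked else 0
```

In a pipeline step, `sys.exit(gate(counts))` makes the job fail whenever a blocking finding is present; widening `blocking` to include high-severity findings tightens the policy without touching the scanner.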
...

Marco Verleun

Linux / DevOps / Kubernetes Engineer @ i-share



