Conf42 DevSecOps 2023 - Online

Securing the endpoint with open software

Abstract

Endpoints are crucial for today’s organizations. Join us to discover how osquery and Fleet revolutionize endpoint data collection and management. Enhance trust, integration, and reliability in your DevSecOps practices!

Summary

  • This talk is a reinterpretation of DevSecOps, which is commonly focused on bringing security practices to developers. What I'd like to do in this talk is bring developer practices and DevOps practices to security.
  • An endpoint is basically a computing device. Any sort of end user computing device could be considered an endpoint. And the strategies and techniques that we talk about using here can potentially be applied to any of these environments.
  • osquery enables non-developers to access and aggregate data from disparate sources across these systems. It supports macOS, Linux, and Windows, which covers a lot of the computing environments. And we'll talk a bit about how it uses SQL to do that.
  • Fleet is a system that allows us to package, deploy, and manage osquery at scale. Fleet can run live queries, detect vulnerable software, detect compliance with organizational policies, and trigger automations. Everything that I'll talk about in this talk is available in the fully open source, MIT-licensed portion of Fleet.
  • Fleet is typically deployed in AWS via Terraform that Fleet, the organization, provides. It can also be deployed to any suitable infrastructure by any suitable means. For osquery, the deployment essentially looks like generating the installation packages via the fleetctl command line tool.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello and welcome to Securing the Endpoint with Open Source Software. This talk is going to be a little bit of a reinterpretation of DevSecOps, which is commonly focused on bringing security practices to developers. What I'd like to do in this talk is bring developer practices and DevOps practices to security.

Before we get started, I'll tell you a little bit about myself. I'm the CTO and co-founder at Fleet, and I'm a co-creator of osquery and on the steering committee for that project. Those are both projects that we'll be talking about more later in this talk.

But before we get going, I'd like to make sure that we're clear on some definitions. In particular, what do we mean by endpoint? There are a lot of definitions that folks use for this. An endpoint is basically a computing device. It could be a macOS laptop, it could be a desktop computer running macOS, Windows, or Linux, it could even be a Chromebook. Any sort of end user computing device could be considered an endpoint. For our purposes, we also think of servers as endpoints, and containers can be endpoints too. The strategies and techniques that we talk about here can potentially be applied to any of these environments. That also includes environments like operational technology, the control planes running the robots inside factories, and IoT sorts of devices like the Raspberry Pi. This software can potentially run on and manage all of these things.

So first, I'll talk about osquery, which is the agent that we use to collect data from all of these different endpoints. osquery lets us write queries to collect logs on the state of endpoints and how that state is changing, and also on the events taking place on those endpoints. It supports macOS, Linux, and Windows, which covers a lot of the computing environments that I just mentioned.
And we'll talk a little bit more about Chromebooks later as well. osquery enables non-developers to access and aggregate data from disparate sources across these systems, and we'll talk a bit about how it uses SQL to do that. osquery was designed explicitly with the goals of performance and reliability, so that it can be deployed across both the corporate and production parts of an organization's infrastructure. osquery is fully open source, licensed with an MIT license that allows users to do essentially whatever they'd like with it and the source code.

And I mentioned SQL, so here's a very basic example of an osquery query. SELECT * FROM users will give us, across all three of the supported systems, the information about the user accounts that exist on those systems.

There are a huge number of data sources available in osquery. For example, flat files like the /etc/hosts file, the crontab file, or the known_hosts file, which can be parsed and typically have specific configuration formats. SQLite files, which are becoming increasingly common on systems to store configuration and state for applications. The data available from system APIs, for example the Apple System Log and the keychain on macOS. These are common sources, and there are certainly many other system APIs on Windows and Linux that are useful and are also abstracted into this SQL concept with osquery. Application APIs such as Docker's API and the Carbon Black APIs; many applications are exposing these APIs on local systems. And then we have access to event-based APIs that expose lots of information, such as FSEvents, which can be used for file integrity monitoring, the BPF subsystem on Linux and the older audit subsystem, and Windows events and other systems like that on Windows.
And then there's a lot of data that we might get just as metadata from the file system, such as information about shared folders, hashes of files, and the permissions set on files. These can all be interesting and relevant pieces of information to have from a security perspective. And then there are more specific file formats, such as plists on macOS, which come in a mix of XML and binary formats. Again, all of these are abstracted under that same SQL interface.

One of the pieces of value that we get from that is that we can start to combine the tables that are available from each of these sources. For example, here we can take a query that joins the processes and process_open_sockets tables, and it does that by matching rows where the processes share the same PID. Then we can do filtering in the SQL query as well. In this case, we're looking for sshd processes that are listening on a port that's not port 22. Essentially, we could interpret this as sshd running on a nonstandard port.

Next I'll talk about Fleet, which is a system that allows us to package, deploy, and manage osquery at scale. So remember, osquery is our agent, and Fleet is essentially our coordinator for the agent. It helps us manage these agents across thousands, tens of thousands, and hundreds of thousands of machines, and it helps us drive insights out of the data available with osquery. Fleet can run live queries, detect vulnerable software, detect compliance with organizational policies, and trigger automations. Fleet also allows us to configure scheduled queries via configuration as code, so the queries that we were just looking at can be run on intervals and those logs shipped into our logging pipelines. And this is all also available via API.
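
The sshd join described earlier can be tried without osquery installed. osquery's SQL engine is built on SQLite, so Python's sqlite3 module is a reasonable stand-in for a sketch. The two tables below are simplified mock-ups holding only the columns the query needs, the sample rows are made up, and the query text is a reconstruction from the spoken description rather than the exact query shown on the slide.

```python
import sqlite3

# In-memory stand-ins for two osquery tables (simplified, assumed columns only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE processes (pid INTEGER, name TEXT);
    CREATE TABLE process_open_sockets (pid INTEGER, local_port INTEGER);
    """
)

# Made-up sample data: one sshd on port 22, one sshd on 2222, one web server.
conn.executemany(
    "INSERT INTO processes VALUES (?, ?)",
    [(101, "sshd"), (202, "sshd"), (303, "nginx")],
)
conn.executemany(
    "INSERT INTO process_open_sockets VALUES (?, ?)",
    [(101, 22), (202, 2222), (303, 443)],
)

# Join the two tables on PID, then keep only sshd sockets not on port 22.
NONSTANDARD_SSHD = """
SELECT p.pid, p.name, s.local_port
FROM processes AS p
JOIN process_open_sockets AS s ON p.pid = s.pid
WHERE p.name = 'sshd' AND s.local_port != 22
"""

rows = conn.execute(NONSTANDARD_SSHD).fetchall()
print(rows)  # [(202, 'sshd', 2222)]
```

On a real host the same SELECT would run against osquery's live tables instead of mock data, and a production detection would likely add further filters.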
So I think that this is an important part of bringing the developer concepts into the security realm: configuration as code, and everything available via API. These allow us to build the kind of automations that are richer, more robust, and more forward-looking.

I mentioned that we can get logs to our logging destinations. Commonly these are things like S3, Elastic, Splunk, and Snowflake, and potentially any logging destination is viable as long as there's some way to get text, and in fact these are JSON-based logs, into the system.

As a bonus, Fleet also includes, as I mentioned, support for Chromebooks. Fleet has an open source Chrome extension that essentially mimics osquery's functionality and provides that same SQL interface on the information provided by the ChromeOS APIs.

Fleet is open core, so part of it is licensed with an MIT license and part is available only on an enterprise license. Everything that I'll talk about today is available in the fully open source, MIT-licensed portion of Fleet, so this can all be taken and applied immediately.

Just for an example, here is some of the Fleet user interface, in which we can take an osquery query, save that query, check compatibility, and generally get some friendlier UI on top of what we're learning from osquery. And this is more of the Fleet user interface. This is what you see when you get a host enrolled into Fleet. There's a whole bunch of information that's collected by default, and this can become a great baseline for understanding the data that's available from osquery and to start to understand some of the concepts that are exposed by Fleet. For example, we get the software inventory collected from the host, and we get the policy compliance; here in this example, this host is failing two policies. We also get an inventory of software across the entire organization, or all of the hosts that are enrolled.
We can filter that software across multiple axes, but in this case we've got it filtered by software that's vulnerable. We can see that we have some Google Chrome instances that probably need updates, because they've got some CVEs associated with them. We talked about policies a little bit, but again, Fleet provides a way to define the organizational policies that we have and allows us to keep track of compliance across our hosts. This is also a good example of where automations can be enabled, so that we can start triggering other systems to respond to policies that have failed.

What I'd like to do now is show a bit of a demo of what a modern configuration as code practice could look like with Fleet. In this case, we've put up a pull request that adds a detection using the osquery query that we talked about, looking for sshd processes on ports other than the standard port 22. When we come over and look at this, we can see that there's a YAML file that defines the query, with the name, the description, and the query SQL that we looked at just a few slides ago. We have configured this to run on a ten second interval and turned automations on so that we can get those logs into our logging pipeline.

In this example, I've also used another tool to build a detection on top of that. That tool is Matano, and it allows us to trigger alerts any time that logs are generated from this query. So in this case we've also configured the further details about the alert that we want to fire off. Essentially, we want to investigate whether this nonstandard SSH is an intended practice or possibly some malicious activity happening on the system. And because of all the CLI and API support built into Fleet, we're able to configure all of these things through a GitOps workflow.
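
One benefit of keeping a query definition in a repository is that it can be checked automatically before a reviewer ever looks at it. The sketch below is a hypothetical pre-merge check, not part of Fleet or Matano. The ScheduledQuery class and its field names are assumptions that mirror the fields mentioned in the demo (name, description, query SQL, interval, automations), and the validation rules are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScheduledQuery:
    """Hypothetical in-memory form of the query definition from the demo."""
    name: str
    description: str
    query: str
    interval_seconds: int
    automations_enabled: bool


def validate(q: ScheduledQuery) -> List[str]:
    """Return the problems a CI job might flag; an empty list means none."""
    problems = []
    if not q.name.strip():
        problems.append("name is empty")
    if not q.query.strip().lower().startswith("select"):
        problems.append("query is not a SELECT statement")
    if q.interval_seconds <= 0:
        problems.append("interval must be positive")
    if q.automations_enabled and not q.description.strip():
        problems.append("automated queries should carry a description")
    return problems


# The detection from the demo, with illustrative (assumed) wording.
nonstandard_ssh = ScheduledQuery(
    name="sshd on nonstandard port",
    description="Finds sshd processes with sockets on ports other than 22.",
    query=(
        "SELECT p.pid, p.name, s.local_port FROM processes p "
        "JOIN process_open_sockets s ON p.pid = s.pid "
        "WHERE p.name = 'sshd' AND s.local_port != 22"
    ),
    interval_seconds=10,
    automations_enabled=True,
)

print(validate(nonstandard_ssh))  # []
```

A check like this would run on the pull request itself, so that obviously malformed definitions never reach a human reviewer.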
So in this case, I've now requested review from someone else on the team, and this is going to generate an audit trail that allows us to keep track of why changes were made, who made them, and who approved them. I've switched over to a different browser where I'm logged in as the reviewer, and I can now take a look and provide my review. In this case I'd probably be looking to ensure that this will generate a low number of false positives and a low number of false negatives, so that we're getting a very high fidelity detection in place. When I submit this review, that will allow the pull request to be merged, because we've configured our repository to only allow approved pull requests to be merged.

What I really want to highlight here, now that we've got this pull request approved and we're able to merge it, is that the CI actions that enable this GitOps workflow are the really powerful thing. So we'll come down here and merge our pull request, and that's going to kick off the CI. We click through GitHub's interface to pull up the actions that are now running. We've configured our repository to apply these new configurations as soon as they are merged to our main branch. When we click into this, we'll be able to see the status of the job. Essentially, our CI action just installs the tools and then applies the configurations into both Fleet and Matano. So effectively, we've used configuration as code practices to build our security detections, and this enables all of the advantages that hopefully many of you are sold on already from your familiarity with DevSecOps practices.

Now we'll talk a bit about what deployment of these tools looks like. Fleet is typically deployed in AWS via Terraform that Fleet, the organization, provides, but it can also be deployed to any suitable infrastructure by any suitable means.
All it needs is a place where we can run MySQL and Redis and where we can run a Linux server binary, so it's a pretty minimal set of requirements. Fleet also provides a SaaS offering, but mostly we're focused on the open source version in this talk. You can expose it to the public Internet or not. The consideration here is essentially whether you have workstations that will be off of a VPN and that need to be able to reach the server so that they can run queries and send logs up. So depending on the kinds of devices that you want to enroll, you may or may not decide to expose it to the public Internet. Then you'd want to install the fleetctl command line tool, which is used for managing the server and packaging up installation binaries. There's more about this in the docs at fleetdm.com, so feel free to check that out.

This is an architecture diagram of what that deployment looks like. On the top left we can see the osquery agent, which checks in via HTTPS with the Fleet server to find out if there's any work to be done and to send any logs that it's generated. On the bottom left we see the API clients, which could be the user interface that I showed earlier, a web browser user interface that uses the same APIs that the fleetctl command line tool uses. Those are the same APIs that are accessible to any user of Fleet who wants to write code or integrations. The Fleet server has its MySQL and Redis dependencies and then optionally is able to send logs out to any of the logging destinations that we discussed. Those and more are available.

For osquery, the deployment essentially looks like generating the installation packages via the fleetctl command line tool. That would be MSI on Windows, PKG on macOS, DEB for our Debian-flavored Linuxes, and RPM for our Red Hat-flavored Linuxes.
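
As a rough illustration of that packaging step, the sketch below builds one fleetctl package command per platform and prints it without running anything. The flags follow the usage of fleetctl package as I understand it from the Fleet docs, so check fleetctl package --help before relying on them. The server URL and enroll secret are placeholders, and the package_command helper is hypothetical.

```python
import shlex

# Package type per platform, as listed in the talk.
PACKAGE_TYPES = {
    "Windows": "msi",
    "macOS": "pkg",
    "Debian-family Linux": "deb",
    "Red Hat-family Linux": "rpm",
}


def package_command(pkg_type, fleet_url, enroll_secret):
    """Build (but do not execute) the argv for one installer package."""
    return [
        "fleetctl",
        "package",
        f"--type={pkg_type}",
        f"--fleet-url={fleet_url}",
        f"--enroll-secret={enroll_secret}",
    ]


for platform, pkg_type in PACKAGE_TYPES.items():
    cmd = package_command(
        pkg_type,
        "https://fleet.example.com",   # placeholder server URL
        "REPLACE_WITH_ENROLL_SECRET",  # placeholder; never commit a real secret
    )
    print(f"{platform}: {shlex.join(cmd)}")
```

Each printed line is a command that could be pasted into a shell or handed to a CI job that publishes the resulting installers.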
Then typically you'd install those packages via the standard management workflows. That often looks like Chef for servers, and it often looks like MDM for workstations. It doesn't really matter how we do this as long as we get those packages out there. The agent could also be baked into the master virtual machine or container images, so that whenever those VMs or containers start up, they automatically connect to the Fleet server and start sending their data as well. There's more about this enrollment process and deployment of osquery, again, in the Fleet docs, so feel free to check that out.

Hopefully you found all of this a useful introduction to the possibilities of using Fleet and osquery, these open source tools, for building a more DevSecOps-oriented security program and bringing some of these interesting DevOps practices to securing endpoints. Feel free to reach out to me on any of these platforms, and thank you very much for attending this talk.
Zach Wasserman

CTO @ Fleet Device Management

