This talk is accessible to practitioners at all levels and includes specific examples from Fleet’s codebase that will engage even advanced practitioners. We will also cover the osquery capabilities that help improve reliability.
Questions we will pose and answer, drawing on Fleet’s experience, include:
- Where should teams focus their efforts when considering performance and chaos engineering?
- How does Fleet engineer for resiliency when users self-host the software and have hundreds of thousands of (potentially misbehaving) agents checking in for coordination?
- What Go tooling can we use to improve monitoring of the Fleet server both during internal performance testing and while running in production?
- How do we collect information about the infrastructure dependencies (MySQL & Redis) to understand the entire system beyond the code we control?
- How can we ensure osquery agents don’t bring down the endpoints they are deployed to protect?