Conf42: Site Reliability Engineering 2020

- premiere 5PM GMT

...

Driving Service Ownership with Distributed Tracing

Daniel "Spoons" Spoonhower
CTO & Co-Founder @ LightStep


While many organizations are rolling out Kubernetes, breaking up their monoliths, and adopting DevOps practices with the hope of increasing developer velocity and improving reliability, it’s not enough just to put these tools in the hands of developers: you’ve got to incentivize developers to use them. Service ownership provides these incentives, by holding teams accountable for metrics like the performance and reliability of their services as well as by giving them the agency to improve those metrics.

In this talk, I’ll cover how distributed tracing can serve as the backbone of service ownership. For SRE teams that are setting standards for their organizations, it can help drive things like documentation, communication, on-call processes, and SLOs by providing a single source of truth for what’s happening across the entire application. For embedded SRE teams, it can also accelerate root cause analysis and make alerts more actionable by showing developers what’s changed – even if that change was a dozen services away. Throughout the talk, I’ll use examples drawn from more than a decade of experience with SRE teams in organizations big and small.
