Conf42: Site Reliability Engineering 2021

...

Sustainable Incident Management for happy SRE teams

Ajuna Kyaruzi
Developer Relations @ Datadog



How you respond to production outages can affect both team morale and development velocity. Having the proper Incident Response processes in place can reduce the stress of outages, make it easier to ramp up new teammates, and keep the focus on reducing toil. This talk will look at Incident Management at its core, covering Incident Command and how to scale it sustainably with a growing organization. We’ll go over common areas of pain for Incident Responders and how to ease them to reduce friction between Product and SRE teams, including best practices for playbooks, on-call rotations, error budgets, postmortems, and incident communication to streamline incident resolution.
