Conf42: Chaos Engineering 2022

...

Maintaining Reliable systems: How to minimize Incident's impact?

Ayelet Sachto
Strategic Cloud Engineer @ Google


Incidents are expensive for the business, especially when customers leave because they perceive us as unreliable. But failures will happen; it's not a question of if, but of when. So how can we reduce the impact on our users? In this talk, I will review the production incident cycle (the time when we are not reliable and our users are not happy), which includes the time to detect, the time to repair, and the time between failures. I'll share a few methods for tackling each of those parts in order to minimize incident impact, from both the technical and the people aspects, expanding on incident response and postmortems. We need to know what matters most to us, and we want to be data-driven in those decisions.
