What if, instead of designing cloud architectures where failover is an exceptional case, we embraced failover as a normal part of running a system and failed over all the time? Let’s dive deep into an architecture currently in production that does just that and share lessons learned along the way. This talk will use production examples and real-world experiences to showcase an architecture where failover is the norm instead of something that happens only in exceptional situations.
Early in my career, I envied those who answered calls at 2am to jump in and heroically save mission-critical systems. I saw the late nights as a badge of honor. After participating in my fair share of on-call events, I started to think about whether we could optimize for events happening at 2pm instead of 2am. This evolved into asking: what if failover were handled as part of the normal running of the system and not only in exceptional situations?
I’ll dig into an architecture where we are able to artificially inject chaos as part of the normal running of the system, and discuss the tradeoffs and where an approach like this makes sense.