We can improve the reliability of services by decoupling dependencies, using health checks, and understanding when to use fail-open and fail-closed behaviours. In this session we’ll discuss and demonstrate how to implement graceful degradation, monitor every layer of your workload to detect failures, route traffic only to healthy nodes, choose fail-open or fail-closed behaviour as appropriate in response to faults, and reduce mean time to recovery.
We’ll draw on lessons learnt from the AWS Well-Architected Framework and the Amazon Builders’ Library, showing some of the ways Amazon builds and operates its software.