Some perturbation models for chaos engineering (CE), such as Chaos Monkey, are based on a random strategy. However, realistic perturbations can also come from errors that have naturally happened in production. I would like to share how to make CE experiments more realistic.
In the talk, I will share our research on chaos engineering for system call invocations. I will present Phoebe, a novel fault injection framework for system call invocation errors. Phoebe is unique in three ways. First, it gives developers full observability of system call invocations. Second, it generates error models that are realistic in the sense that they mimic errors that naturally happen in production. Third, it automatically conducts experiments to systematically assess the reliability of applications with respect to system call invocation errors in production. We evaluated the effectiveness and runtime overhead of Phoebe on two real-world applications in a production environment. The results show that Phoebe successfully generates realistic error models and detects important reliability weaknesses with respect to system call invocation errors.
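To illustrate what an error model that "mimics errors that naturally happen in production" can look like, here is a minimal Python sketch. It is not Phoebe's implementation. It assumes a trace of observed system call outcomes (the hypothetical `SyscallEvent` records), computes the observed error rate for each (system call, errno) pair, and then scales those rates by a hypothetical amplification factor to obtain injection rates for an experiment.

```python
import errno
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SyscallEvent:
    """Hypothetical record of one observed system call invocation."""
    name: str   # system call name, e.g. "read"
    errno: int  # 0 on success, otherwise the error code that was returned


def build_error_model(events):
    """Map each (syscall, errno) pair to its observed error rate."""
    totals = Counter(e.name for e in events)  # all invocations per syscall
    failures = Counter((e.name, e.errno) for e in events if e.errno != 0)
    return {key: count / totals[key[0]] for key, count in failures.items()}


def amplify(model, factor, cap=1.0):
    """Scale observed rates by a hypothetical factor, never exceeding `cap`."""
    return {key: min(rate * factor, cap) for key, rate in model.items()}


# Illustrative trace: "read" fails with EAGAIN in 2 of its 10 calls,
# while "write" never fails, so only "read" appears in the model.
trace = (
    [SyscallEvent("read", 0)] * 8
    + [SyscallEvent("read", errno.EAGAIN)] * 2
    + [SyscallEvent("write", 0)] * 5
)
model = build_error_model(trace)
injection_rates = amplify(model, factor=2)
print(model, injection_rates)
```

Deriving injection rates from observed rates keeps the injected errors tied to what production actually experiences. The amplification factor and the cap are illustrative assumptions that let an experiment exercise rare errors more often.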
The corresponding research paper can be found here: https://arxiv.org/abs/2006.04444