Conf42: Chaos Engineering 2021


Maximizing Error Injection Realism for Chaos Engineering with System Calls

Long Zhang
PhD Student in Computer Science @ KTH Royal Institute of Technology


Some perturbation models for chaos engineering, such as the one used by ChaosMonkey, rely on a random strategy. However, realistic perturbations can also be derived from errors that have naturally occurred in production. I would like to share more about how to improve the realism of chaos engineering (CE) experiments.

During the talk, I would like to share our research work on chaos engineering for system call invocations. I will present Phoebe, a novel fault injection framework for system call invocation errors. Phoebe is unique in three ways. First, it gives developers full observability of system call invocations. Second, it generates error models that are realistic in the sense that they mimic errors that naturally happen in production. Third, it automatically conducts experiments to systematically assess the reliability of applications with respect to system call invocation errors in production. We evaluate the effectiveness and runtime overhead of Phoebe on two real-world applications in a production environment. The results show that Phoebe successfully generates realistic error models and is able to detect important reliability weaknesses with respect to system call invocation errors.
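To make the idea of a "realistic error model" more concrete, here is a minimal Python sketch of one way such models could be derived from monitored system call invocations. It is an illustration under stated assumptions, not Phoebe's actual implementation: the `ErrorModel` class, the `build_error_models` function, the `amplification` parameter, and the sample observations are all hypothetical names and values introduced for this example. The assumed scheme is to measure how often each (system call, error code) pair fails naturally, then scale that rate up so that the error occurs often enough during an experiment to be studied.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorModel:
    """Hypothetical error model: which error to inject, and how often."""
    syscall: str        # e.g. "read"
    error_code: str     # e.g. "EAGAIN"
    probability: float  # chance of injecting the error on each invocation


def build_error_models(observations, amplification=10.0, max_probability=1.0):
    """Derive error models from observed system call invocations.

    `observations` is an iterable of (syscall, error_code) pairs, where
    error_code is None for a successful invocation. The amplification
    factor is an assumption of this sketch, not a value from the talk.
    """
    totals = Counter()    # invocations per system call
    failures = Counter()  # failed invocations per (syscall, error code)
    for syscall, error_code in observations:
        totals[syscall] += 1
        if error_code is not None:
            failures[(syscall, error_code)] += 1

    models = []
    for (syscall, error_code), count in sorted(failures.items()):
        natural_rate = count / totals[syscall]
        # Only errors that were actually observed get a model, which is
        # what keeps the injected errors realistic in this sketch.
        probability = min(max_probability, natural_rate * amplification)
        models.append(ErrorModel(syscall, error_code, probability))
    return models
```

For example, given 100 observed `read` invocations of which 2 failed with `EAGAIN`, and 10 `open` invocations that all succeeded, the function returns a single model for `read`/`EAGAIN` with an injection probability of about 0.2, and no model for `open`, since no `open` error was ever observed.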

The corresponding research paper can be found here:
