Conf42: Site Reliability Engineering 2022


The Freedom of Kubernetes requires Chaos Engineering to shine in production

Henrik Rexed
Cloud Native Advocate @ Dynatrace

Like any other technology transformation, k8s adoption typically starts with small “pet projects”: one k8s cluster here, another one over there. If you don’t pay attention, adoption spreads like wildfire and you end up, like many organizations these days, with hundreds or thousands of k8s clusters, owned by different teams, spread across on-premises and cloud environments, some shared, some very isolated.

When we start building applications for k8s, we often lose sight of the larger picture: where they will be deployed and, moreover, what the technical constraints of the target environment are.

Sometimes we even think of k8s as a magician that will make all our hardware constraints disappear.

In reality, Kubernetes requires you to define quotas on your nodes and namespaces, and resource limits on your pods, to make sure that your workloads are reliable. Under heavy pressure, k8s evicts pods to relieve the pressure on your nodes, but eviction can have a significant impact on your end users.
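
To make the link between resource settings and eviction more concrete, here is a minimal sketch (not taken from the talk) of how the requests and limits on a container map to a Kubernetes QoS class, which is one of the signals that influences which pods are evicted first under node pressure. The function name `qos_class` and the dictionary layout are illustrative assumptions. The sketch handles a single container and compares quantity strings literally, so "500m" and "0.5" would be treated as different values.

```python
from typing import Dict

# Hypothetical shape mirroring the "resources" field of a container spec:
# {"requests": {"cpu": "250m", "memory": "256Mi"}, "limits": {...}}
Resources = Dict[str, Dict[str, str]]


def qos_class(resources: Resources) -> str:
    """Return the QoS class a single-container pod would get.

    Simplified: one container only, quantities compared as raw strings.
    """
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})

    # No requests and no limits at all: BestEffort.
    if not requests and not limits:
        return "BestEffort"

    # When a limit is set without a request, the request defaults to the limit.
    effective_requests = {**limits, **requests}

    # Guaranteed needs CPU and memory limits, each equal to its request.
    guaranteed = all(
        name in limits and effective_requests.get(name) == limits[name]
        for name in ("cpu", "memory")
    )
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    burstable = {
        "requests": {"cpu": "250m", "memory": "256Mi"},
        "limits": {"cpu": "500m", "memory": "256Mi"},
    }
    print(qos_class(burstable))
```

A pod without any requests or limits lands in the BestEffort class, so a chaos experiment that puts memory pressure on a node is a reasonable way to check whether the workloads you care about are the ones that survive.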

How can we proactively test our settings and measure the impact of k8s events on our users? The simple answer to this question is chaos engineering.

During this presentation we will use real production stories to explain:
- The various Kubernetes settings that we can implement to avoid major production outages
- How to define the chaos experiments that will help us validate our settings
- The importance of combining load testing and chaos engineering
- The observability pillars that will help us validate our experiments
