Conf42: Cloud Native 2021

Building a K8s Operator for a Distributed Database


How did we build a K8s operator that allows 100% uptime for a high-availability, high-workload database?

Operating a distributed, high-load, high-throughput database in the cloud comes with several interesting challenges. To serve mission-critical workloads in real time at 100% availability, we developed a Kubernetes operator that handles the operational complexities.

We needed to handle the following requirements:
- Apply live patches
- Replace a live cluster with tens of nodes
- Handle degraded/crashed nodes

Under these conditions:
- High availability: remain 100% online with no downtime
- Operate under very high workloads and traffic
- Manage replicated records across different hardware failure groups (rack awareness)

Because of the database's stateful nature and the type of workloads it usually handles, cluster management and recovery are non-trivial. We use the Operators API to handle that complexity and control the clusters from within Kubernetes.
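To make the idea of an operator handling this complexity more concrete, here is a minimal sketch of a single reconcile decision. It is not the operator described in the talk. The `Node` type, the `next_action` function and the action strings are hypothetical names chosen for illustration, and the policy (repair a degraded node before anything else, then patch one outdated node at a time, rack by rack) is an assumption about how such a loop could keep a cluster online.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical observed state of one database node."""
    name: str
    rack: str          # hardware failure group (rack awareness)
    version: str       # currently running build
    healthy: bool = True


def next_action(nodes: List[Node], desired_version: str) -> Optional[str]:
    """Return the single next step toward the desired state, or None if converged.

    Only one node is ever touched per step, so the remaining nodes keep serving.
    """
    # Repair first: never take a healthy node down while another is degraded.
    for node in nodes:
        if not node.healthy:
            return f"replace {node.name}"

    # Then patch outdated nodes one at a time, ordered by rack so that
    # a single failure group is worked through before the next one.
    outdated = sorted(
        (n for n in nodes if n.version != desired_version),
        key=lambda n: (n.rack, n.name),
    )
    if outdated:
        return f"upgrade {outdated[0].name}"

    return None  # observed state matches desired state


if __name__ == "__main__":
    cluster = [
        Node("db-0", "rack-a", "1.0"),
        Node("db-1", "rack-b", "1.0", healthy=False),
    ]
    print(next_action(cluster, "1.1"))
```

A real operator would call a function like this repeatedly from its control loop, re-reading the cluster state after every step and waiting for data to finish rebalancing before moving on.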

In this talk we'll cover the steps we took to plan and execute the project, the challenges we faced, and the best practices we learned.


Natalie Pistunovich

Lead Developer Advocate @ Aerospike

