Conf42 Cloud Native 2023 - Online

Building a self service DBaaS for your Internal Developer Platform

Abstract

Learn all there is to know about the difficulties of running a DBaaS at scale, and how cloud-native tools can help build a reliable and scalable DBaaS offering within your platform by leveraging MongoDB, Kubernetes Operators, and ArgoCD.

Summary

  • Presentation and demo about building a self-service database as a service for your internal developer platform, using our own Atlas developer data platform and our Kubernetes operator to demonstrate how this can work.
  • According to internaldeveloperplatforms.org, 95% of internal developer platforms (IDPs) are built on top of Kubernetes. It offers highly flexible networking, including options like directly exposing pods and load-balancing connections. It provides a high degree of customization and extensibility, particularly in the form of operators and custom resources.
  • Internal developer platforms are built to enable developer self-service of platform infrastructure. A database as a service (DBaaS) is often one of the most critical components of an internal developer platform, but building one is not without risk or complexity.
  • Both of the above items touch on the topic of balancing developer empowerment with central oversight. Self-service is nearly always faster, as we don't have to wait for someone else to become free to do what we need them to do. It frees up any central teams to deal with support and with improving the services of the IDP.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi, and welcome to our presentation and demo about building a self-service database as a service for your internal developer platform. My name is Dan McKean and I'm a product manager at MongoDB, and I'm joined by George Hantzaras, who's an engineering director, also at MongoDB. We're responsible for enabling our customers and users to run MongoDB in two ways: self-hosted in Kubernetes, using our Enterprise or Community Kubernetes operators, or on our Atlas developer data platform, using our Atlas Kubernetes Operator, which is designed to manage and configure Atlas.

In this talk we're going to cover a range of non-MongoDB-specific considerations: Kubernetes as the platform of platforms; building and managing a DBaaS in your internal developer platform, also known as an IDP; why to build a DBaaS, and the risks; and the criticality of enabling self-service in a DBaaS. Then we're going to use our own Atlas developer data platform and our Kubernetes operator to demonstrate how this can work, covering what Atlas is, how our operator works, and how to put it all together, in theory and in a demo.

We're going to start with Kubernetes as a platform of platforms. According to internaldeveloperplatforms.org, 95% of internal developer platforms (IDPs) are built on top of Kubernetes. Many of you already know Kubernetes as an open-source container orchestration system for automating application deployment, scaling, and management. In recent years, Kubernetes has become nearly synonymous with container orchestration, and with so many services being built as microservices and designed to scale automatically, containers and Kubernetes reign supreme.

So what does Kubernetes offer? Highly flexible networking, including options like directly exposing pods, load-balancing connections, and ingress services. Storage orchestration, providing either ephemeral or persistent storage. High availability and high levels of resiliency, by making it easy to deploy many copies of a service across many physical or virtual machines. Self-healing, by monitoring the state of objects in Kubernetes and keeping them aligned with the declarative configuration. A low degree of vendor lock-in, thanks to the many standardized flavors of Kubernetes available for self-hosting or as a cloud-based platform as a service. And finally, and arguably most critically when it comes to an internal developer platform, a high degree of customization and extensibility, particularly in the form of Kubernetes operators and custom resources.

An operator extends the native Kubernetes control plane with custom logic that helps manage the essential tasks that are bespoke to a specific product, like MongoDB. It's usually paired with custom resources, defined in Kubernetes using custom resource definitions (CRDs). These custom resources allow for the creation of new types of Kubernetes objects, which the operator can monitor and act on. The actions taken can vary massively, but common ones include deploying an application, taking a backup, upgrading an application, or exposing a service to applications that do not support the Kubernetes API.
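To make the custom resource idea concrete, here is a minimal, hypothetical CustomResourceDefinition; the `Database` kind, the `example.com` group, and the field names are all illustrative choices, not something from the talk:

```yaml
# Registers a new "Database" object type with the Kubernetes API.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # must be <plural>.<group>
  name: databases.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    kind: Database
    plural: databases
    singular: database
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                instanceSize:
                  type: string
                region:
                  type: string
```

Once this is applied, `kubectl get databases` behaves like any built-in resource, and an operator can watch for `Database` objects and reconcile them.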
Operators can also be used to manage resources external to Kubernetes. Most commonly, this is done by using the APIs of the external service in which the objects actually run. Custom resources within the Kubernetes cluster represent the desired configuration of those external resources, allowing the operator to monitor the custom resources and use the external service's APIs to make the required changes. The value here is that those external resources can then be managed in the same way as the services in the local Kubernetes cluster, benefiting from the same tooling, processes, permissions, and automation.

Now we're going to dig into databases as a service within an internal developer platform, but first, a brief recap of what an IDP is and offers. Internal developer platforms are built to enable developer self-service of platform infrastructure. They're typically built by an ops team and used by developers. They provide a common process and method of engaging with the platform, often via templates. This automates recurring tasks, such as spinning up environments and resources, and helps enforce standards, such as security requirements. IDPs often abstract away the complexity of the underlying platform technologies, saving everyone from needing to be an expert. Development teams gain autonomy by being empowered to spin up fully provisioned environments and manage them with a minimum of effort or complexity. IDPs can be built or bought, or some combination of both.

A DBaaS, or database as a service, is often one of the most critical components of an internal developer platform. Most applications need a database at some point, and databases can be some of the most complex services to deploy and manage. A company's choice of database can make a dramatic difference not only to the success of the application, but also to the speed, success, and happiness (or unhappiness) of a development team. All of this makes simplifying the consumption, use, and management of databases incredibly valuable. That's especially true for day-two operations such as upgrades, where developers can be spared a huge amount of ongoing work through the centralization and automation a DBaaS can offer, especially when a Kubernetes operator handles those sorts of day-two operations.

But building a database as a service is not without risk or complexity. Databases can vary a lot, even from a single manufacturer, and one of the key questions to answer is how much customization and configuration to expose to development teams. Security, sizing, performance, backup, sharding, and resilience are all major considerations, and that's without taking into account any of the specifics of the underlying platform technologies that underpin the IDP itself. We see many companies turning to fairly strictly defined templates that predetermine many of those things and give minimal customization to end users. An example of this could be t-shirt sizes for database deployments, with guidance about which sizes suit which use cases, as sketched below.
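As a rough sketch of that t-shirt-size approach (every name and number here is hypothetical, chosen only to illustrate the pattern), a central team might maintain a catalog like this, which templates then expand into full, pre-approved database configurations:

```yaml
# Hypothetical "t-shirt size" catalog maintained by the platform team.
# Developers request a size; tooling maps it to vetted settings.
sizes:
  small:      # prototypes and dev environments
    instanceSize: M10
    diskSizeGB: 10
    backupEnabled: false
  medium:     # most production services
    instanceSize: M30
    diskSizeGB: 40
    backupEnabled: true
  large:      # high-traffic production services
    instanceSize: M50
    diskSizeGB: 100
    backupEnabled: true
```

The central team controls every knob behind each size, while developers only make one simple, well-guided choice.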
Troubleshooting is often a challenge. Security best practices encourage a minimum number of people having a minimum level of rights and permissions, but how do you avoid a central team becoming a blocker to development teams? Many companies opt to have far fewer restrictions on pre-production environments. This enables developers to try new things and have some hope of fixing them when they go wrong. Production environments, by contrast, are often heavily restricted, as far more damage can be done by a wrong move. This divided approach works well to balance self-service with protecting production services.

Both of the above items touch on the topic of balancing developer empowerment with central oversight. Self-service is nearly always faster, as we don't have to wait for someone else to become free to do what we need them to do. It frees up any central teams to deal with support and with improving the services of the IDP or DBaaS. And self-service empowers users, particularly by allowing us to try new things without worrying about wasting someone else's time. There are a few common methods for achieving this: publishing assets, such as Helm charts, that users can customize and deploy themselves; a GitOps workflow, where the configuration of all resources, whether local or remote, is stored in a Git repository and tools such as ArgoCD or Flux deploy those resources in Kubernetes; or a portal or marketplace, further abstracting the complexity and allowing users to see what's possible and select what they need. All of these have trade-offs, particularly in the level of investment and maintenance required from a central team versus the level of knowledge needed by the end user.

So now we've seen the value of building a database-as-a-service offering for your internal developer platform, the difficulties involved, the importance of making these platform features available through a self-service approach, and some possibilities for how to do that. Let's now explore the tools and the architecture for actually implementing it.

At a really high level there are three steps: first, we want each developer to be able to define what database requirements they have; second, we want this definition to be translated into resources that our platform can understand; and finally, we want to give our platform the ability to deploy and manage databases.

Let's start with the Kubernetes operator we're going to be using. At a very high level, the user defines what type of database deployment they need and applies it to a cluster with a kubectl command, and the operator in that cluster then makes the necessary calls to the Atlas API to deploy those managed databases. What happens under the hood is that you define a new custom resource, which the operator manages: it watches the resource, compares the desired state against the current state in Atlas, and makes whatever API calls are needed to reconcile the two. The custom resource here is the AtlasDeployment resource, and defining it is pretty straightforward: you just add the name you want for the deployment, the instance size, the provider, and the region you care about, and that's pretty much it, as the sketch below shows.
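Here is a minimal sketch of such an AtlasDeployment resource. The overall shape follows the Atlas Kubernetes Operator's v1 API as we understand it, but the exact schema varies between operator versions, and the project reference and names are assumptions for illustration:

```yaml
apiVersion: atlas.mongodb.com/v1
kind: AtlasDeployment
metadata:
  name: my-database
spec:
  # points at an AtlasProject resource managed in the same cluster
  projectRef:
    name: my-project
  deploymentSpec:
    name: my-database          # the name you want for the deployment
    providerSettings:
      instanceSizeName: M10    # the instance size
      providerName: AWS        # the cloud provider
      regionName: US_EAST_1    # the region
```

A plain `kubectl apply -f` of this file is all it takes for the operator to create the deployment in Atlas and keep reconciling it afterwards.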
This is a good, easy way to deploy and manage databases. But in this scenario, using just the operator, anyone who needs to spin up a database would have to keep this definition file, this YAML file, locally, and run kubectl manually to deploy the database. So we want to automate the process: instead of keeping YAML files locally and running commands locally, we'd like to do this differently.

So the developer writes the YAML, and the YAML is pushed to a repo specifically designed to hold our infrastructure-as-code files. From that point, ArgoCD pulls the files from the repository and is responsible for applying the changes in our Kubernetes cluster. The way it does that is through a simple Application resource, in which we define exactly the repo URL we want ArgoCD to watch and which revision, along with the sync policy: whether we want automated sync, and some other conditions. A minimal example is sketched below.
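A minimal ArgoCD Application along those lines might look like the following; the repo URL, path, and namespaces are placeholders for illustration:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: atlas-databases
  namespace: argocd
spec:
  project: default
  source:
    # the infrastructure-as-code repo and revision ArgoCD should watch
    repoURL: https://github.com/example-org/iac-repo.git
    targetRevision: main
    path: databases
  destination:
    server: https://kubernetes.default.svc
    namespace: mongodb-atlas-system
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual drift in the cluster
```

With automated sync enabled, a merge to the watched branch is all it takes for the cluster to converge on the new configuration.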
Let's put all of this together and see how it works. To get started, you need to set up some prerequisites: an Atlas account and an API key, a running Kubernetes cluster, and the Atlas Kubernetes Operator installed in that cluster. You would then install ArgoCD, create a dedicated infrastructure-as-code repository, and create an ArgoCD Application pointing at that repository.

And finally, this is what our self-service database as a service looks like. The developer writes the file and pushes it to the Git repo, where usually a PR would be opened. When that's merged to the specific branch Argo is watching, Argo is triggered: it pulls the changes and applies them in Kubernetes. As the new custom resource is deployed, the Kubernetes operator takes over and calls the Atlas API to create the resources we need: the users, the databases, and so on. And that is pretty much what our database-as-a-service offering for our internal developer platform looks like. Thank you for watching.

George Hantzaras

Director of Engineering - Kubernetes @ MongoDB


Dan McKean

Senior Product Manager @ MongoDB



