Simplifying Cluster Provisioning at Scale: A Cloud-Native Tool for the mass, k0rdent

Abstract

This talk advances the Kubernetes ecosystem by addressing key challenges in scalability, automation, and operational efficiency that many platform engineers, DevOps teams, and infrastructure operators face today. It covers three areas:

Simplifying Cluster API (CAPI) adoption: Cluster API is powerful but notoriously complex, requiring hundreds of lines of YAML and deep expertise. k0rdent abstracts this complexity so teams can adopt CAPI without a steep learning curve. This increases CAPI adoption in the Kubernetes community and accelerates infrastructure standardization.

Enhancing multi-cloud and hybrid Kubernetes operations: The ecosystem thrives on interoperability across clouds and on-premises environments. k0rdent provides a unified approach to managing Kubernetes clusters across AWS, Azure, GCP, vSphere, OpenStack, and beyond.

Strengthening GitOps and declarative infrastructure: GitOps is key to reliable, automated Kubernetes operations, but many teams struggle to apply it to cluster management. k0rdent integrates GitOps practices so that cluster deployments are declarative and version-controlled. This reinforces best practices in Kubernetes automation and improves security and consistency.

Join the talk to learn how k0rdent enables scalable, GitOps-driven Kubernetes automation and helps platform engineers build secure Internal Developer Platforms (IDPs).

Transcript

This transcript was autogenerated. To make changes, submit a PR.

Welcome to the session. Today we're going to talk about k0rdent, an open source project aimed at simplifying Kubernetes cluster provisioning at scale. All of us who've been in the DevOps world know what a pain it is to provision a cluster. You have to come up with an optimal way to do it at scale, and you have to manage complex configurations as well. k0rdent solves this problem. In this session, we'll look at the challenges and difficulties in platform engineering, then at the k0rdent approach, and towards the end we'll see k0rdent in action.

I'm Bharath. I'm in the OSPO team at Mirantis. I'm a senior software developer and an open source contributor, and I've been working in and around Kubernetes for a while now. Let's get started.

We all know how the cloud was supposed to be, but very quickly it turned into a maze where everything became far more complex. Even Kubernetes, which started as a fairly basic project, grew quickly. Its ecosystem is what makes it what it is today, but that ecosystem has become so big that it's complex even for experienced practitioners. That's why we need a simplified, streamlined process, which today we call platform engineering.

For enterprises specifically, if I have to sum up the key things you look for in platform engineering: first, developer self-service is extremely important. Having a dedicated DevOps team that's as big as, or even bigger than, your development team is not scalable anymore. So you need a self-service portal where your dev teams, QA teams, security teams, and all the other teams can manage the infrastructure themselves, with platform engineering as the backend. Second, you need operational simplicity; without it, providing this for multiple teams across your company becomes extremely difficult. Third, customization is important because, like I said, different teams have different requirements. Being able to customize without changing the core model is what makes this whole process work for platform engineers. Fourth, visibility and control at an administrative level are essential for managing your clusters. And last but not least, security and compliance: every company has to follow it, and it's a primary concern these days.

Given all these challenges, there's a growing need for multi-cluster configurations. As a platform engineer, you need them for many things, such as AI/ML workloads, multi-tenancy, edge devices, IoT, and so on.

Now, when you look at the market for options to accomplish this, doing nothing is always an option. But if you don't do anything, your infrastructure costs keep growing, and before you realize it you're paying huge cloud bills. One step ahead of that are DIY open source tools. There are many open source tools for cluster provisioning and maintenance, but each comes with a learning curve. You have to understand each tool, how it integrates with the others, how to configure them, and so on. And then you have the problem of extending all of it to your on-premises environment as well.
Taking it another step further, there are proprietary solutions, but they come with expensive licenses and are limited to the options they provide. More often than not, two or three years down the line, unless you look at it really hard, you find yourself in vendor lock-in, and expanding to multiple clouds becomes really difficult at that point.

To solve all these problems, we're presenting an enterprise-grade open source solution that's cost-efficient, customizable, and completely under your control, and that works on multiple public clouds as well as on-prem. This is where k0rdent comes in. At a very high level, it does three things: cluster management, state management, and observability.

By cluster management, I mean how you manage multiple clusters across clouds at scale; k0rdent takes care of that. State management covers Day-2 operations, such as deploying multiple applications across your clouds in a single click. Then comes observability. When you have clusters, workloads, and apps running, the one thing that's 100 percent needed is observability tooling. We're baking this into the k0rdent ecosystem so that you don't have to rely on external tools or do the configuration yourself; it comes out of the box and you can use it right from the package.

Before jumping into the details, one last thing about the golden paths that are often talked about in DevOps. Everybody wants to perfect this: every company and every team is trying to move as close as possible to a golden path, although you can always argue there isn't one.
Still, having something close to one is a big win. It saves you a lot of effort and money, and it gives you a repeatable process for maintenance as well as scalability. This becomes especially important when you have a choice of multiple clouds.

A quick look at the k0rdent architecture. At a high level, like I said, there are three components: cluster management, state management, and the observability stack. For cluster management, we use the upstream CAPI controllers. Since this is completely open source, the underlying technologies that make it happen are open source too. For each cloud there are provider controllers as well (CAPx): CAPA for AWS, CAPZ for Azure, CAPV for VMware vSphere, and so on. We deploy the upstream CAPI controllers and the cloud-specific controllers onto your management cluster, along with the k0smotron controller. k0s and k0smotron form a core part of this architecture, and we use them for control plane management. So the management cluster runs the KCM controllers, a bunch of other k0rdent components, and the upstream components.

Also in the management cluster, for state management, we have the Sveltos controllers. Sveltos, like I said, manages state and deploys applications across multiple clusters. It's also an open source project, and it's integrated into k0rdent so that, for example, if you want to deploy cert-manager across all your clusters, all you need to do is include cert-manager in the configuration file.
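The cluster-management layer described above, upstream CAPI core controllers plus one infrastructure provider per target cloud, can be sketched in Python. This is a hypothetical illustration of the idea only, not how KCM actually selects or installs providers; the mapping uses the standard upstream provider abbreviations (CAPA, CAPZ, CAPV, plus CAPO for OpenStack and CAPG for GCP).

```python
# Hypothetical helper: maps a target cloud to the upstream Cluster API
# infrastructure provider whose controllers would run on the management cluster.
CAPI_INFRA_PROVIDERS = {
    "aws": "CAPA",        # cluster-api-provider-aws
    "azure": "CAPZ",      # cluster-api-provider-azure
    "vsphere": "CAPV",    # cluster-api-provider-vsphere
    "openstack": "CAPO",  # cluster-api-provider-openstack
    "gcp": "CAPG",        # cluster-api-provider-gcp
}


def controllers_for(clouds):
    """Return the controller set a management cluster would need for `clouds`.

    The core CAPI controllers are always present; one infrastructure provider
    is added per distinct cloud (case-insensitive). Unknown clouds raise KeyError.
    """
    providers = {CAPI_INFRA_PROVIDERS[c.lower()] for c in clouds}
    return ["CAPI core"] + sorted(providers)


if __name__ == "__main__":
    # Duplicate clouds collapse to a single provider entry.
    print(controllers_for(["AWS", "Azure", "OpenStack", "aws"]))
```

A single management cluster can host several providers at once, which is what lets one control point manage clusters across clouds.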
Sveltos then deploys it automatically to all the child clusters. That's what's shown here on the right side, in the child clusters: along with all the Kubernetes add-on services you'd expect, you'll also see the Sveltos agent running there, which handles communication between your child cluster and the management cluster.

You accomplish all of this by defining something called a ClusterDeployment. We'll look at it in detail in the demo, but briefly: your entire cluster configuration is controlled and monitored through this CR, the ClusterDeployment. To configure clusters for a particular cloud, we have multiple cluster templates, and to configure applications, we have service templates. The cert-manager example I gave earlier is a service template, whereas the definition for deploying a cluster on, say, AWS is a cluster template. That's the overall architecture of k0rdent.

Now let's go into the demo. Just before that, I want to open the documentation and briefly go through the installation part. We have a Helm chart. Like I said, you need a management cluster; it could be kind, or it could be running on any other infrastructure. For simplicity, you can run kind on your laptop or any VM, and then all you have to do is a helm install of KCM. Once you do this, a bunch of components are installed, like I said: the controllers, the k0smotron controllers, and other KCM components. Then we proceed to define our cluster templates and our cluster deployments.

So let's jump into the terminal, where I already have a kind cluster. If I run this, you'll see all these namespaces, and if I run this, you'll see all the pods that are up and running.
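To make the ClusterDeployment, ClusterTemplate, and ServiceTemplate relationship concrete, here is a minimal Python sketch that builds a ClusterDeployment manifest as a dict and prints it as JSON. The API group/version (k0rdent.mirantis.com/v1alpha1), the spec.serviceSpec.services layout, and every template and credential name are assumptions for illustration; check them against the CRDs and templates in your own KCM installation (for example with kubectl get clustertemplates -n kcm-system).

```python
import json

# ASSUMPTION: the API group/version and spec layout below are modeled on the
# k0rdent docs at the time of the talk; check the CRDs shipped with your KCM.
API_VERSION = "k0rdent.mirantis.com/v1alpha1"


def cluster_deployment(name, template, credential, config, services=()):
    """Build a ClusterDeployment manifest as a plain dict.

    template:   name of a ClusterTemplate (which cloud/topology to build)
    credential: name of a Credential object holding cloud access
    config:     template-specific settings (region, instance types, ...)
    services:   (service_name, service_template) pairs to install on the
                child cluster, e.g. cert-manager
    """
    spec = {"template": template, "credential": credential, "config": config}
    if services:
        # ASSUMED field layout for attaching ServiceTemplates.
        spec["serviceSpec"] = {
            "services": [{"name": n, "template": t} for n, t in services]
        }
    return {
        "apiVersion": API_VERSION,
        "kind": "ClusterDeployment",
        "metadata": {"name": name, "namespace": "kcm-system"},
        "spec": spec,
    }


if __name__ == "__main__":
    # All template and credential names here are hypothetical placeholders.
    cd = cluster_deployment(
        name="demo-aws",
        template="aws-standalone-cp-0-1-0",
        credential="aws-cluster-identity-cred",
        config={"region": "us-east-2"},
        services=[("cert-manager", "cert-manager-1-16-2")],
    )
    # kubectl accepts JSON manifests, so this output could be piped to
    # `kubectl apply -f -`.
    print(json.dumps(cd, indent=2))
```

Keeping the cloud choice in the template and the access details in the credential means the same small object can be reused across clouds by swapping two names and the config block.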
Back in the terminal, it's exactly as I described: a live kind cluster with all the controllers, k0smotron, and the other KCM components. By the way, ka here is my alias for kubectl -n kcm-system. If I get the cluster templates, here you go, you see all the different cluster templates. These are prepackaged templates available for you to try, based on the cloud: AWS EKS, AWS, Azure AKS, Azure, OpenStack, and vSphere. Similarly, a bunch of service templates are baked into the management cluster by default when you install the Helm chart. Some examples are these. All you have to do is reference them in the cluster deployment.

To show you how that looks, let's take a look at Azure. These are the files you typically need; not all of them, but I'll show you one by one. You'll need a secret. All these instructions are provided in the docs, by the way. They're organized by cloud provider and are slightly different for each one, based on how it expects the credentials to be set up. For example, for AWS you have these keys; for Azure you have something different: you have to create a service principal based on the subscription ID. For OpenStack it's all these variables, and similarly for vSphere. Essentially, there's a secret.yaml file and then there's a credential file. The credential references the cluster identity, and the cluster identity, which is an AzureClusterIdentity CR, references the secret. Finally, let's take a look at the cluster deployments. I've defined two cluster deployments here.
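The Azure credential chain described above (Credential references AzureClusterIdentity, which references the Secret) can be sketched as three manifests that point at each other by name. The AzureClusterIdentity fields follow the upstream CAPZ API; the Credential kind's API group and its spec.identityRef field are assumptions based on the k0rdent docs, and all object names and values are hypothetical placeholders.

```python
# Hypothetical names throughout; field layouts follow CAPZ's AzureClusterIdentity
# and (ASSUMED) the k0rdent Credential CRD. Check both against your versions.
NS = "kcm-system"


def azure_credential_chain(tenant_id, client_id, client_secret, prefix="azure-cluster-identity"):
    """Return [Secret, AzureClusterIdentity, Credential] manifests as dicts.

    The Secret holds the service principal password, the AzureClusterIdentity
    points at the Secret, and the Credential points at the AzureClusterIdentity.
    """
    secret = {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": f"{prefix}-secret", "namespace": NS},
        "type": "Opaque",
        "stringData": {"clientSecret": client_secret},  # key CAPZ reads
    }
    identity = {
        "apiVersion": "infrastructure.cluster.x-k8s.io/v1beta1",
        "kind": "AzureClusterIdentity",
        "metadata": {"name": prefix, "namespace": NS},
        "spec": {
            "type": "ServicePrincipal",
            "tenantID": tenant_id,
            "clientID": client_id,
            "clientSecret": {"name": secret["metadata"]["name"], "namespace": NS},
            "allowedNamespaces": {},  # empty object: usable from any namespace
        },
    }
    credential = {
        "apiVersion": "k0rdent.mirantis.com/v1alpha1",  # ASSUMPTION
        "kind": "Credential",
        "metadata": {"name": f"{prefix}-cred", "namespace": NS},
        "spec": {
            "identityRef": {  # ASSUMED field name
                "apiVersion": identity["apiVersion"],
                "kind": identity["kind"],
                "name": identity["metadata"]["name"],
                "namespace": NS,
            }
        },
    }
    return [secret, identity, credential]


def check_chain(secret, identity, credential):
    """True if each object references the one below it by name (and kind)."""
    return (
        identity["spec"]["clientSecret"]["name"] == secret["metadata"]["name"]
        and credential["spec"]["identityRef"]["name"] == identity["metadata"]["name"]
        and credential["spec"]["identityRef"]["kind"] == identity["kind"]
    )


if __name__ == "__main__":
    objs = azure_credential_chain("<tenant-id>", "<client-id>", "<client-secret>")
    print(check_chain(*objs))
```

A ClusterDeployment then only needs the Credential's name; the layering keeps the raw secret out of the deployment object itself.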
Of the two cluster deployments, one is a basic cluster with one control plane node and one worker node, and it uses minimal CPU. The other has pretty much the same configuration, but it's a GPU-enabled cluster. I've applied both of them.

Just to show you the other bits as well: for AWS you have all these fields too. For AWS, you choose the instance type based on what you want. Here I've chosen an instance type with a GPU-enabled worker node. If you don't want a GPU-enabled worker node, that's absolutely fine; just select something like t3.small or t3.medium, whatever your requirement is. It's similar for OpenStack.

Now I'm going to show you the kinds of objects we work with. Since I've already applied these, here you go: when you list the cluster deployments, you'll see something like this. Let's take one of them and print it with -o yaml. There you go, it's as simple as this. This is where you specify the cluster template, this is where you specify the credential, and this is where you specify the configuration that's specific to you. It's completely customizable; you can look up all the available fields in the documentation.

To show you another example, let's look at the OpenStack one. There's an identity ref specific to the OpenStack cloud, then the control plane configuration, the worker configuration, the template, and so on. This is our private cloud, so some configuration related to it has to be provided.

Now that we've seen the cluster deployments: like I mentioned, it's all CAPI objects underneath, so you can see the CAPI objects being created.
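The two demo deployments differ mainly in the worker instance type. The sketch below builds the config block for each and diffs them. The key names (region, controlPlaneNumber, workersNumber, controlPlane.instanceType, worker.instanceType) are assumptions modeled on an AWS standalone template and vary by template; g4dn.xlarge is an example GPU instance type chosen here (it carries an NVIDIA T4), not necessarily the one used in the talk.

```python
# ASSUMPTION: key names mirror an AWS standalone template's `config` section as
# documented for k0rdent at the time of writing; other templates differ.
def aws_config(region, worker_type, control_plane_type="t3.small", workers=1, control_planes=1):
    """Return the `config` block of a hypothetical AWS ClusterDeployment."""
    return {
        "region": region,
        "controlPlaneNumber": control_planes,
        "workersNumber": workers,
        "controlPlane": {"instanceType": control_plane_type},
        "worker": {"instanceType": worker_type},
    }


def diff(a, b, path=""):
    """List dotted paths whose values differ between two nested dicts."""
    out = []
    for key in sorted(set(a) | set(b)):
        p = f"{path}.{key}" if path else key
        va, vb = a.get(key), b.get(key)
        if isinstance(va, dict) and isinstance(vb, dict):
            out.extend(diff(va, vb, p))  # recurse into nested sections
        elif va != vb:
            out.append(p)
    return out


# One control plane node and one worker node in both cases.
basic = aws_config("us-east-2", worker_type="t3.small")
# g4dn instances carry an NVIDIA T4 GPU; any GPU-capable type would do here.
gpu = aws_config("us-east-2", worker_type="g4dn.xlarge")

print(diff(basic, gpu))  # -> ['worker.instanceType']
```

Because only one field changes, switching between a CPU-only and a GPU-enabled cluster is a small, reviewable edit, which suits keeping these definitions in Git.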
Underneath, you'll have the AWSCluster, AzureCluster, and OpenStackCluster objects, and you'll also have the AWS machines, Azure machines, and OpenStack machines. And finally, if I get machines, you can see it lists all the machines available on the cluster.

To access a cluster, it's as simple as getting its secret. ka, like I said, points at the kcm-system namespace, so let's list all the secrets in kcm-system. There are a bunch of secrets, but the ones we're interested in have kubeconfig appended to their names. There's a kubeconfig for every cluster, and I think I've exported them all. Let's also export this one, and this one. Now, if I go here and run this, I'm in the Azure context. Run this, and there you go: I have a single worker node and a single control plane node, and it's up and running. Similarly for AWS. There you go. And last but not least, OpenStack as well. So you have all these nodes; it's as simple as that. On the cloud side, you can see that these machines are provisioned here, and similarly for Azure. As you can see, this one is using the standard instance type, and this one is using an instance type with GPU support, running an NVIDIA Tesla T4. Then AWS has these.

Essentially, at the end of the day, all you need to do is define your cluster deployment and specify the configuration, and k0rdent takes care of it end to end. The KCM repository is completely open source, so go take a look; contributions are absolutely welcome. We also have a Slack channel, linked here. That's pretty much it. Thank you for listening. Bye bye.
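Fetching a child cluster's kubeconfig, as done at the end of the demo, relies on the Cluster API convention of storing it base64-encoded under the value key of a Secret named after the cluster with a -kubeconfig suffix. The sketch below assumes you feed it the JSON printed by kubectl get secret(s) with -o json, and that the CAPI cluster shares its ClusterDeployment's name; the function names are hypothetical.

```python
import base64
import json


def kubeconfig_secret_names(secret_list_json):
    """Pick out kubeconfig secrets from `kubectl get secrets -o json` output."""
    items = json.loads(secret_list_json)["items"]
    return [
        s["metadata"]["name"]
        for s in items
        if s["metadata"]["name"].endswith("-kubeconfig")
    ]


def kubeconfig_from_secret(secret_json):
    """Extract the kubeconfig from a CAPI '<cluster>-kubeconfig' Secret.

    `secret_json` is the text printed by `kubectl get secret <name> -o json`.
    Cluster API stores the kubeconfig base64-encoded under the `value` key.
    """
    secret = json.loads(secret_json)
    name = secret["metadata"]["name"]
    if not name.endswith("-kubeconfig"):
        raise ValueError(f"{name!r} does not look like a CAPI kubeconfig secret")
    return base64.b64decode(secret["data"]["value"]).decode("utf-8")


if __name__ == "__main__":
    # Fabricated secret for illustration only.
    fake = json.dumps({
        "metadata": {"name": "demo-azure-kubeconfig"},
        "data": {"value": base64.b64encode(b"apiVersion: v1\nkind: Config\n").decode()},
    })
    print(kubeconfig_from_secret(fake), end="")
```

Writing the decoded text to a file and pointing KUBECONFIG at it (or passing --kubeconfig to kubectl) switches you to the child cluster, which is what exporting the kubeconfigs in the demo achieves.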

Conf42 Cloud Native 2025 - Online

March 06 2025 - premiere 5PM GMT

Bharath Nallapeta

Senior Software Engineer @ Mirantis
