Conf42 Cloud Native 2023 - Online

Managing k8s: moving from the Openstack Magnum to the Cluster API

Abstract

Considering both technologies, I will explain why Magnum was chosen in the first place, what advantages and disadvantages it has, and which of those we could no longer tolerate. After considering the alternatives, we decided to move forward with the Cluster API, as it offers a more comprehensive and efficient solution for our managed Kubernetes clusters. I will give a detailed overview of our implementation of the Cluster API, including the features and benefits that we and our users have gained from this new solution. With the Cluster API we have achieved a more secure, flexible, and cost-effective solution for our managed Kubernetes clusters, and we are confident that our users will be pleased with the results.

Summary

  • Andrei Novoselov: I work at the Gcore company and I'm a system engineer team lead. Today we'll talk about managed Kubernetes as a service, and about OpenStack Magnum and the Cluster API.
  • OpenStack Magnum orchestrates container clusters. Magnum itself cannot configure the virtual machines; it uses another OpenStack component, which is called OpenStack Heat. There is no control plane isolation from the user. Magnum has pros and cons.
  • The Cluster API manages the lifecycle of a Kubernetes cluster using a declarative API. It works in different environments, both on premises and in the cloud. It can be extended to support any infrastructure.
  • We moved from Magnum to the Cluster API, and what we got from it is a great speed-up. We have the 1.24, 1.25, and 1.26 versions of Kubernetes out of the box, and we can update all the infrastructure inside the client's cluster with our Argo CD applications.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello, my name is Andrei, and today I'm going to tell you about two different approaches to managing Kubernetes clusters: OpenStack Magnum and the Cluster API. But before that, let me introduce myself and my company. My name is Andrei Novoselov. I work at the Gcore company and I'm a system engineer team lead. Gcore has a lot of products, such as CDN, DNS, cloud, and many more, but I work at the Gcore cloud, so I'll tell you about the Gcore cloud. We have more than 20 locations around the globe where we provide the public cloud service. Users can use basic cloud services such as virtual machines, load balancers, bare metal servers, and firewalls, or more complicated platform services such as managed Kubernetes as a service, function as a service, logging as a service; basically, anything as a service is in my team's responsibility zone. But today we'll talk about managed Kubernetes as a service, and about OpenStack Magnum and the Cluster API. And we'll start with OpenStack Magnum. So what does Magnum do? It orchestrates container clusters, and it supports two technologies, Docker Swarm and Kubernetes; we'll talk about Kubernetes. Magnum itself cannot configure the virtual machines; it uses another OpenStack component, which is called OpenStack Heat. Heat creates the cloud-init config for the Kubernetes nodes, and it also updates application versions and configuration on the Kubernetes nodes. So let's take a look at the Magnum architecture. First of all, Magnum has an API, and it's an HTTP API. It gets requests, and basically all those requests are tasks: for example, create a cluster, update the cluster, delete the cluster. The Magnum API puts those tasks into the RabbitMQ queue, and the Magnum conductor gets those tasks from the RabbitMQ queue and gets them done. It's a pretty common architecture for an OpenStack service: an API, RabbitMQ, and some engine. And Heat is pretty much the same.
It also has a Heat API and RabbitMQ to pass the tasks from the API to the Heat engine. It also has a Heat agent. The Heat agent runs on every virtual machine configured with Heat, and it updates the application version and application configuration if needed. So what are the limitations of this approach? Well, first of all, there is no control plane isolation from the user. The user gets full admin access to the Kubernetes cluster, and he can do whatever he wants with the control plane components. And that's not how you do a managed Kubernetes service. And there's one more minor thing: it's an OpenStack API, and we do not provide users access to the OpenStack API, so we have to hide this API behind the Gcore cloud API. But it's not a big deal. Let's talk about the control plane isolation. Here's a scheme of the Gcore Magnum-based managed Kubernetes service architecture on the cluster level. The managed Kubernetes cluster is always inside the client's private network, and the client can select which networks he prefers. And let's talk about the control plane. The control plane is three virtual machines, and on every virtual machine we have all the Kubernetes control plane components, such as etcd, the kube-apiserver, the kube-controller-manager, the kube-scheduler, kube-dns, and so on and so forth. All of them are Podman containers controlled by systemd. etcd, the kube-controller-manager, and the kube-apiserver have their ports exposed outside of these virtual machines. And we have load balancers: there are three virtual machines, three etcd replicas, three kube-apiservers, and we have cloud load balancers to hide all these replicas behind them. We also have a firewall which only allows access to those three exposed services. And none of that is visible to the client. The client cannot see the firewall, the control plane nodes, or the load balancers. The client can have as many worker nodes as he wants.
And on the worker node, the kubelet is also inside a Podman container controlled by systemd, and for the Kubernetes workload the container engine is Docker itself. If the kubelet, or any pod inside Kubernetes, needs to access the kube-apiserver, etcd, or kube-dns, it has to go through the cloud load balancers and through the firewall into the control plane nodes. And that's it. So what are the pros and cons of OpenStack Magnum? Well, the big plus is that it is OpenStack, and it has a great community which develops Magnum, supports it, adds new features, and fixes bugs for us. And I guess that's it; that's the biggest plus. Let's take a look at the downsides. First of all, it is extra RPS for the cloud API. Like I said, we had to hide the Magnum and Heat APIs behind the cloud API, so now all the requests to those APIs go through the cloud API, and that's a lot of requests if you have a lot of clusters. But that's not a big deal. The second thing is that this construction is really fragile. What I mean is, if something went wrong while the cluster was creating or updating, the Magnum cluster goes to the error state, and there is no way, using the OpenStack CLI, to make it alive again. So basically the debug looks like this: you have to find the reason. For example, OpenStack could not create a load balancer for etcd or for the kube-apiserver while the cluster was creating, so the cluster went to the failed state. Let's say you fixed the original reason: I don't know, you restarted Octavia, or maybe the problem was in the RabbitMQ queue, but now it's fixed and the load balancer can be created. And now you have to log in to the production MariaDB and update some rows for this cluster, setting it so that this cluster is now active, not in an error state. And the same for Heat.
And if it's just one row for the cluster inside the Magnum DB, Heat has a more complicated structure of Heat templates and Heat stacks, and you have to find the given stack and update it as well. That's a lot of update operations on the production database, which is not what you want to do on a daily basis. And like I said, if Octavia or RabbitMQ was in some error state (it happens from time to time), you have not only to fix that OpenStack component, but also to fix the Magnum clusters if they were creating while Octavia or RabbitMQ were not available. The other thing is observability. Let's say you want to know the state of all the kube-apiserver containers in all your clusters. Well, you have to log in via SSH to the control plane nodes to find out whether everything is okay or not. Or you may create a Prometheus exporter, if you're using Prometheus to monitor these clusters, or do something like that, but out of the box there's no such thing as a health check for all the systemd units on the master nodes. If you want to be sure that everything is okay, you have to log in via SSH to those nodes. And there are no bare metal nodes, and that's a huge minus. And one more thing: here's the compatibility matrix. In the left column you can see the OpenStack version, and in the second column you can see the supported Kubernetes version. So if you are using Yoga, which is just one year old, the highest Kubernetes version for you is 1.23. And here are the supported Kubernetes versions for now: they start with 1.24 and 1.25, and 1.26 is the highest version. But in April we'll have a release, I hope, and this will change, and the supported versions will be 1.25, 1.26, and 1.27. So OpenStack is about two Kubernetes versions behind Kubernetes, and I guess that's a big deal. We did not wish to add support for those Kubernetes versions to Magnum, and we decided to take a look at other tools for managing the lifecycle of a Kubernetes cluster.
And there's a project which is called Cluster API, and it's a Kubernetes project. What's the goal of this project? First of all, it manages the lifecycle (and by that I mean create, scale, upgrade, and destroy) of Kubernetes clusters using a declarative API. It works in different environments, both on premises and in the cloud. It defines common operations, provides default implementations, and provides the ability to swap out implementations for alternative ones. So if you don't like something in the Cluster API, you can do it your way, and the Cluster API can be extended to support any infrastructure. Sounds great. So what is it made of? What are the main components of the Cluster API? First of all, it's a controller manager, a bootstrap controller manager, a control plane controller manager, and an infrastructure provider. Let's talk about them a little. The Cluster API is basically four Kubernetes controllers, so it runs inside Kubernetes: you need a Kubernetes cluster to create another Kubernetes cluster. And it operates with Kubernetes objects, with custom resources. These controllers manage those custom resources and reconcile them. And what do we have out of the box? The bootstrap controller out of the box supports kubeadm, MicroK8s, Talos, and EKS. The control plane controller out of the box supports kubeadm, MicroK8s, Talos, and a project called Nested. And there's a bunch of infrastructure controllers, for AWS, Azure, vSphere, Metal3, and lots and lots of other providers, but there is no Gcore provider yet. So let's talk a little bit more about how it works. We have those binaries, the controller manager, the bootstrap controller manager, and the control plane controller manager. They handle the whole lifecycle of the Kubernetes cluster, and they know nothing about the infrastructure behind them. That's why you need an infrastructure provider.
An infrastructure provider is a thing that allows the Cluster API to create some basic infrastructure objects such as load balancers, virtual machines, firewalls, server groups, et cetera. And the Cluster API says: okay, if your infrastructure provider allows me to create virtual machines, load balancers, and firewalls, I can do anything on that cloud. So let's take a look at some YAML, and let's take a look at the Cluster object. There's obviously some cluster configuration inside this spec, but I want to point your attention at the control plane ref and the infrastructure ref. We have a Cluster, and it has an infrastructure reference to the GcoreCluster object, which is the implementation of this specific cluster on the Gcore provider. And if we take a look at the control plane, we'll see pretty much the same thing. It says: okay, we need three replicas of control plane nodes with some Kubernetes version, and it has a reference to the infrastructure provider, to the kind GcoreMachineTemplate. So we have a template, for the Gcore-specific infrastructure provider, for the machine to create a control plane node, and we need three of them. So the control plane has a reference to the GcoreMachineTemplate in the infrastructure, and the same goes for the MachineDeployment. A MachineDeployment is an object which describes a worker group of some Kubernetes cluster. This one says: okay, we need six workers, and to create them, please use the infrastructure reference to the GcoreMachineTemplate with this name. And that's it. What I'm trying to say is that this is how the Cluster API works: it has some basic objects which do not change from one cloud provider to another, and the cloud-specific objects are always reached via an infrastructure reference. So the Cluster always refers to the GcoreCluster, and the control plane and the MachineDeployment are both created from GcoreMachineTemplates.
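The reference structure described above can be sketched roughly like this. The `Cluster` and `MachineDeployment` kinds and API groups are the real upstream Cluster API ones; the `GcoreCluster`, `GcoreControlPlane`, and `GcoreMachineTemplate` kinds, their API group, and all names are illustrative assumptions, not the actual Gcore manifests:

```yaml
# Sketch only: the Gcore-specific kinds, API group, and names are assumptions.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: client-cluster
  namespace: client-namespace        # one namespace per client cluster
spec:
  controlPlaneRef:                   # who manages the control plane
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: GcoreControlPlane          # hypothetical custom control plane kind
    name: client-cluster-control-plane
  infrastructureRef:                 # who creates the cloud resources
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: GcoreCluster               # provider-specific, hypothetical
    name: client-cluster
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment              # describes a worker group
metadata:
  name: client-cluster-workers
  namespace: client-namespace
spec:
  clusterName: client-cluster
  replicas: 6                        # "we need six workers"
  template:
    spec:
      clusterName: client-cluster
      version: v1.24.10
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: GcoreMachineTemplate   # provider-specific, hypothetical
        name: client-cluster-worker-template
```

The generic objects (`Cluster`, `MachineDeployment`) stay the same on any cloud; only the objects behind the `infrastructureRef` change per provider.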
A Machine is an object which describes a control plane virtual machine, a worker virtual machine, or a bare metal machine. So what are the limitations of the Cluster API? Out of the box we have the kubeadm bootstrap provider, which is more or less suitable for us, and the kubeadm control plane provider. That basically means that kubeadm will be used to bootstrap the control plane and to join workers to that control plane. And what other limitations can we see? There's also no control plane isolation from the user, you need a Kubernetes cluster to create another Kubernetes cluster, and there is no Gcore provider. And we still have three virtual machines for the control plane if we use kubeadm. So we decided to do it our way, and this is our implementation of the Cluster API. We decided not to use three virtual machines for the control plane. We already have a Kubernetes cluster to run the Cluster API, and we decided to put the control plane containers into that cluster as well. So we have a namespace for each client cluster where we have the custom resources describing the cluster (the Cluster, the GcoreCluster, the control plane, the machine deployments), and we also have the control plane pods inside it. The control plane components are pods inside the service Kubernetes cluster, and all the Cluster API objects are in the same namespace as the control plane pods. And we have no virtual machines for the control plane. To do that, we had to create a Gcore bootstrap controller, a Gcore control plane controller, and an infrastructure provider, which is called CAPGC. And thanks to our colleagues for their help with that. Thank you, guys. And we had to do two more things: we had to create an OpenVPN controller, and we use Argo CD; we'll talk about that a little bit later.
So let's take a look at the Gcore Cluster API based managed K8s service architecture. Now you can see that there are some differences. In the client's private network there are no master nodes anymore, only worker nodes. And in the Gcore private network we have a Gcore service Kubernetes cluster where, in some namespace, we keep all the custom resources of the Cluster API, such as, like I said, the Cluster, the MachineDeployment, the control plane, and so on. And we also have the control plane binaries, such as etcd, the kube-apiserver, the controller manager, the scheduler, and some more. Since the service cluster is in the Gcore private network and the worker nodes are in the client's private network, there's no direct network connectivity. So what did we do? We used a cloud load balancer to expose the kube-apiserver on a public IP address, so all the components on the worker nodes can access the kube-apiserver via the Internet. And that's it, that's simple. But what about the reverse connectivity? What if someone would like to type the command kubectl logs? What happens when you do that? Your Kubernetes API server works as a proxy to the kubelet and shows the logs from the kubelet, so the kubelet can get the logs from the node. What if someone would like to use a port forward? It's pretty much the same: the kube-apiserver works as a proxy to some service in Kubernetes. And the kube-apiserver has no way to access the kubelet, because the kubelet is inside the client's private network and is not accessible. And the admission webhooks: what if the kube-apiserver has to validate some custom resource before putting it into etcd? It would need to access some pod inside the client's cluster, and there's no way to do it. So, VPN. We decided to do it this way: on the client side, we put a pod with an OpenVPN server, and we expose it with a cloud load balancer to the Internet. In the control plane, in the kube-apiserver pod, there are two containers: one of them is the kube-apiserver, and the other one is the OpenVPN client.
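That two-container pod could look roughly like this. This is a minimal sketch under my own assumptions: the image names, namespace, flags, and the secret holding the OpenVPN config are all hypothetical, not Gcore's actual manifests:

```yaml
# Sketch only: images, names, flags, and the config secret are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: client-namespace           # one namespace per client cluster
spec:
  containers:
    - name: kube-apiserver
      image: registry.k8s.io/kube-apiserver:v1.24.10
      command:                          # abridged; a real apiserver needs many more flags
        - kube-apiserver
        - --etcd-servers=https://etcd:2379
    - name: openvpn-client              # sidecar providing reverse connectivity
      image: example.org/openvpn-client:latest   # hypothetical image
      securityContext:
        capabilities:
          add: ["NET_ADMIN"]            # needed to create the tun device and install routes
      volumeMounts:
        - name: openvpn-config
          mountPath: /etc/openvpn
  volumes:
    - name: openvpn-config
      secret:
        secretName: openvpn-client-config   # hypothetical secret with client.conf
```

Because both containers share the pod's network namespace, the routes the OpenVPN client installs are immediately visible to the kube-apiserver process.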
And the OpenVPN client connects to the OpenVPN server pod through the cloud load balancer, and it gets the routes to the node network, the pod network, and the service network. And now, right inside the kube-apiserver pod, we have routes to all those networks, and the kube-apiserver can access the kubelets on the node network, and the services and pods inside the Kubernetes cluster. But what if the client deletes something? What if the client deletes the OpenVPN server, Calico, or kube-proxy? We have Argo CD for that. Inside our service cluster we have an Argo CD which has apps for all the infrastructure which should be controlled by Gcore inside the client's worker nodes, such as kube-proxy, kube-dns, Calico, and the OpenVPN server for the reverse connectivity. Argo CD renders the manifests for all that infrastructure and puts them directly into the kube-apiserver which is located inside the Gcore service cluster. Then the kubelet accesses the kube-apiserver and finds out what pods should run on the node. So if the client deletes anything using kubectl, Argo CD will recreate it. And one more thing, about observability. Compared to OpenStack Magnum, where we could not find any suitable tool to find out whether everything is okay with a cluster, here we have it just out of the box. We can use a command like kubectl get clusters -A, and we'll get all the clusters which were provisioned using the Cluster API in our service cluster. So we can see that we have four clusters, all of them are ready, and the version is 1.24.10. We also can get information about all the control planes; the output is pretty much the same, but it tells us everything about the control planes, not about whole clusters. So we can see that all control planes are ready: etcd, the kube-apiserver, the kube-scheduler, the kube-controller-manager, all of them are up and running. And here's the version of the control plane.
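The Argo CD setup described above could be expressed as one Application per infrastructure component in each client namespace. This is a sketch under stated assumptions: the repo URL, paths, and the in-cluster address of the client's kube-apiserver are hypothetical:

```yaml
# Sketch only: repoURL, path, and destination server address are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: client-cluster-calico
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.org/infra-manifests.git   # hypothetical manifests repo
    path: calico
    targetRevision: main
  destination:
    # points at the client's kube-apiserver, which runs as a pod
    # inside the Gcore service cluster (address is assumed)
    server: https://kube-apiserver.client-namespace.svc:6443
  syncPolicy:
    automated:
      selfHeal: true   # recreate anything the client deletes with kubectl
      prune: true      # remove resources dropped from the repo
```

The `selfHeal: true` setting is what gives the behavior from the talk: if the client deletes Calico, kube-proxy, or the OpenVPN server, Argo CD puts it back.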
We also can take a look at the worker pools, which are called machine deployments in the Cluster API, and we can see that in this namespace, the first one, there should be three replicas, so three worker nodes, and all of them are ready, all of them are updated, they are running, they're two days old, and the version is 1.24.10. We can get information about any cluster we're interested in with a single kubectl command, or about all of them. And we can get the machines: kubectl get machines -A will bring us all the virtual machines or bare metal machines which are controlled by the Cluster API in this cluster, and we can even see the OpenStack ID of those virtual machines, how long they have existed, and the Kubernetes version on them. So it's really great. And that's it. We moved from Magnum to the Cluster API, and what we got from it: we got a great speed-up. The whole control plane is just a bunch of pods, and it's really much easier to spin up some pods than to create virtual machines. We got the easy updates, and that's much faster, obviously. We have the 1.24, 1.25, and 1.26 versions of Kubernetes out of the box, and we can update all the infrastructure inside the client's cluster with our Argo CD applications, and it's really easy as well. We got the reconciliation loop. If Heat tried to create, for example, a load balancer and failed, then you are the one who has to fix it. If the Cluster API tries to create a virtual machine, a load balancer, or whatever, and fails, it just waits a little and tries again, and tries again, and tries again, until everything is done well. And it's really great: after you have fixed your Octavia or Nova or RabbitMQ inside OpenStack, you do not need to reconfigure all the clusters in the region. The Cluster API will just do another try and succeed, and we'll be happy with that.
We got the bare metal nodes, powered by Intel, and it's a great feature: a lot of our clients wanted bare metal worker nodes, and now we have them, because we created an infrastructure provider which can use bare metal nodes as worker nodes for Kubernetes. We got transparency: if you want to look at a specific machine, or at a machine deployment, or at a cluster in the Kubernetes-native way with kubectl, you've got it. And we have no control plane nodes anymore, so there's no extra cost for us for managing control plane nodes. And I guess that's it. Thank you for your attention, and feel free to contact me if you have any questions.

Andrei Novoselov

Cloud Ops PaaS team lead @ Gcore



