Conf42 Cloud Native 2023 - Online

Scaling K8s Clusters Without Losing Your Mind or Your Money

Abstract

Cloud computing offers different options of powerful, sustainable, cost-effective infrastructure. How do you tailor it to meet your K8s workload needs? Karpenter! In this talk we will review how Karpenter (open-source) simplifies and accelerates provisioning of cost-effective infrastructure.

Summary

  • Yael is a Solutions Architect at AWS focusing on compute services. In this talk we dive into Karpenter, an autoscaling solution that helps scale Kubernetes infrastructure efficiently. We will touch on the technical aspects of the implementation.
  • Auto scaling aims to achieve efficiency, flexibility and density: being able to scale automatically and, when scaling, choose the right instances that maximize efficiency. We also want to minimize the operational overhead invested to reach this goal.
  • The Kubernetes Cluster Autoscaler is a popular open-source solution in charge of ensuring that your cluster has enough nodes to schedule your pods without wasting resources. Its goal is to bridge between the Kubernetes abstraction and the cloud abstraction, using the most cost-optimal resources possible based on your container requirements.
  • Karpenter is an open-source scaling solution that automatically provisions new nodes in response to unschedulable pod events. It scales resources directly based on the requirements, without the intermediary of node groups. This simplifies configuration and improves efficiency.
  • Kubernetes and Karpenter can bin-pack containers onto shared resources. Containers are a very good fit for Spot Instances: you can tap into Spot capacity and gain up to a 90% discount.
  • The next way to save on your compute is by using Graviton, the chips Amazon develops in house. Graviton can provide up to 40% better price performance. Karpenter is an open-source project and new features are going out all the time. It offers flexibility and cost optimization using Spot and Graviton instances.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, welcome to Scaling Kubernetes Clusters Without Losing Your Mind or Your Money. In the next 30 minutes we are going to talk about the challenge of making efficient use of the ever-changing compute options offered by cloud providers. You can find today on AWS different options of powerful, sustainable, cost-effective infrastructure, and there is a question: how do you tailor that to meet your Kubernetes workload needs while implementing automation to scale based on your business demand? The answer is Karpenter. In this talk, we are going to review how Karpenter simplifies and accelerates provisioning of cost-effective infrastructure. My name is Yael and I'm a Solutions Architect at AWS focusing on compute services. In the past few years I've been working with customers to help them make an optimized and efficient selection of compute infrastructure for different kinds of workloads, with offerings like EC2 and Graviton as well as specialized hardware based on their requirements, while keeping their operational efforts to a minimum. In this talk we will dive into Karpenter, an autoscaling solution that helps scale Kubernetes infrastructure efficiently. We will touch on the technical aspects of the implementation and the integration with other cost optimization techniques like EC2 Spot and Graviton. So let's start with what we want to achieve with auto scaling; in other words, efficiency. We start from talking about scale, which is the obvious part: pay only for what you use, and provision just the right amount of resources that you actually need based on your business requirements. The next part is density, and by density I mean being able to select the right compute option and bin-pack the containers intelligently onto shared resources to maximize efficiency. This is the main advantage you can get by using Kubernetes, but it's still not an easy task to do. The next part is flexibility. Flexibility is about being able to take advantage of cost-effective compute options. You can find today different types of instances on AWS, and usually more than one instance type can power your application, and some may be more cost-effective than others. The most effective way to get a large amount of resources very economically is EC2 Spot, the spare compute capacity of AWS. Another way to get higher performance with lower cost is the latest Graviton processors, which are based on the Arm architecture. Being flexible between different instance types and options will allow taking advantage of Spot and Graviton in a way that will let you get more and pay less. Now you can notice that these three requirements are obviously related: being able to scale automatically and, when scaling, choose the right instances that maximize efficiency, and that's what we want to accomplish. In addition to all of these, we also want to minimize the operational overhead you invest in order to get to this goal. If you're in this session, you're probably a DevOps or a platform engineer. You're in charge of operating production, test and dev environments, and you have a ton of tasks to do. When choosing a solution for auto scaling, one of the main requirements would be that it is easy to implement and requires minimal effort from your side. 
So I talked about how efficiency translates into scale, density and flexibility, which are, in other words, scaling the right resources and using the most cost-optimal resources possible based on your container requirements. So let's talk about container requirements. They start from defining CPU and memory, which should be defined as resource requests in your pod or deployment manifest. They also might need storage, network, and sometimes GPUs. On the other hand, we have EC2 instances that provide a set of resources that will support the needs of your applications. The EC2 naming convention represents the amount of resources that you get from each instance. The instance size, what we see here is extra large, represents the amount of CPU that the instance is providing, and the instance family represents the CPU-to-memory ratio, therefore defining how much memory we get. You can also find attributes that indicate how many additional resources you're getting, for example disks or increased networking throughput. In this case, the g represents the Graviton processor, so you can also identify which processor you're working with in each instance type. Now, in a Kubernetes cluster, the Kubernetes scheduler is in charge of matching pods to nodes and initiating preemption when required. But it's not in charge of node creation, pod creation, or rescheduling and rebalancing pods between nodes. And here comes the need for an external solution that will perform the task of node management, that complies with the same pattern the Kubernetes scheduler has, and is also aware of the cloud elasticity, the pricing model, and the infrastructure options in order to maximize the value you get from them. The question is, in fact, how do we scale EC2 instances to support our application needs? Now, when working with Kubernetes, a common practice to scale nodes is using the Cluster Autoscaler. The Cluster Autoscaler is a very popular open-source solution that is in charge of ensuring that your cluster has enough nodes to schedule your pods without wasting resources. It runs as a deployment inside your EKS cluster and it's aware of pod scheduling decisions. So essentially one of its goals is to bridge between the Kubernetes abstraction and the cloud abstraction, the auto scaling groups, which are the entity that supports provisioning nodes for the application needs. Now let's take a look at the scale-up process presented in this slide. It starts from a pod that is in a pending state due to insufficient resources. This is a good thing, because we always want to run just the amount of resources that we need, and when we have more applications that need to run, more containers are created and they go into a pending state. Then the scaling solution, the Kubernetes Cluster Autoscaler, identifies that event, and it's in charge of scaling up the underlying resources to support the requirements of the pod. How it works is that it's in charge of selecting the right auto scaling group that can support the needs of the pending pod. It increases the amount of requested instances in this auto scaling group and waits to get them back. When those instances are provisioned, they run the bootstrap scripts, join the EKS cluster, and then the pod can be scheduled. Now let's dive deeper into phases number two and three. When the Kubernetes Cluster Autoscaler reaches out to the auto scaling groups API, it needs to know how many resources it will get back. 
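To make the resource requests mentioned above concrete, here is a minimal sketch of a deployment manifest; the name, image and values are illustrative and not taken from the talk:

```yaml
# Illustrative deployment: the resource requests below are what the scheduler
# (and a node autoscaler such as Karpenter) uses when sizing nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app                  # hypothetical application name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      containers:
        - name: app
          image: public.ecr.aws/docker/library/nginx:latest   # placeholder image
          resources:
            requests:
              cpu: "1"              # one vCPU
              memory: 2Gi           # matches the roughly 1 vCPU : 2 GiB ratio of C-family instances
            limits:
              memory: 2Gi
```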
It works by simulating the amount of resources it expects to get from each of the auto scaling groups it works with. That enforces a requirement that each auto scaling group should run instances that have similar resources across the different instance types, so you need to run each auto scaling group with homogeneous instance types. That means that in your auto scaling group you can combine instances like the c5 2xlarge and the c6 2xlarge, and you can put there the older generation c4 as well. If you don't mind how much memory you're getting and only care about the amount of CPU, you can combine C instances together with M instances or R instances. But you can't, for example, combine 2xlarge instances with 4xlarge, because the Cluster Autoscaler would not be able to know in advance how many CPU resources it's getting from each instance. The solution of the Cluster Autoscaler to this problem is to replicate and run multiple auto scaling groups in your environment. So if you know that you have applications that require 2xlarge instances and others that require 12xlarge instances, you simply run a lot of auto scaling groups, and every time the Cluster Autoscaler has a pending pod that needs a big instance, it will provision resources from the big auto scaling group; when it needs a small instance, it will provision resources from the small auto scaling group. But this does bring a lot of challenges. For one, managing a lot of auto scaling groups is tough, because you need to update the AMIs and roll the instances and make sure you are maintaining every configuration there. There are also other challenges: running applications in a multi-AZ fashion for high availability, applications that do have flexibility between different instance types and just want the most optimal one chosen for them, and being able to use Spot capacity. Spot is the spare capacity of AWS, and one of the main best practices for customers who want to take advantage of Spot is to diversify their instance selection as much as possible. One of the best practices when working with Spot capacity is to diversify between different sizes of instances, for example to be able to use 4xlarge and 8xlarge: if you can pack your application onto one 8xlarge instance, you can also pack it onto two 4xlarge instances, and so on. So this is something that we would like our autoscaling solution to be aware of and implement in an easy way. This brings me to Karpenter, because Karpenter was designed to overcome these challenges. Similar to the Cluster Autoscaler, Karpenter is an open-source scaling solution that automatically provisions new nodes in response to unschedulable pod events. It provisions EC2 capacity directly based on the application requirements you put in your pod manifest file, so you can take advantage of all the EC2 instance options available and reduce much of the overhead that the Cluster Autoscaler had. Karpenter has lots of cool features, but I'm going to dive specifically into the features related to managing the underlying compute. So Karpenter is implemented as groupless auto scaling, meaning it directly scales resources based on the requirements, without the intermediary of node groups. This simplifies the configuration and it improves efficiency, because different kinds of applications can run on shared infrastructure. 
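As an illustration of the node-group-per-size pattern described above (this configuration is not from the talk), a hedged sketch using eksctl managed node groups might look like the following; the cluster name, sizes and instance types are assumptions:

```yaml
# Illustrative eksctl config: one node group per instance size, because the
# Cluster Autoscaler assumes every instance in a group has the same resources.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-cluster              # hypothetical cluster name
  region: us-east-1
managedNodeGroups:
  - name: spot-2xlarge            # only 2xlarge-sized types are mixed together
    instanceTypes: ["c5.2xlarge", "c6i.2xlarge", "c5a.2xlarge"]
    spot: true
    minSize: 1
    maxSize: 20
  - name: spot-12xlarge           # a separate group for pods that need big nodes
    instanceTypes: ["c5.12xlarge", "c6i.12xlarge", "m5.12xlarge"]
    spot: true
    minSize: 1
    maxSize: 5
```

Every additional pod shape tends to mean another group like these to create, patch and roll, which is exactly the overhead Karpenter removes.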
It also improves performance, because scaling decisions are made in seconds when demand changes, even in the largest Kubernetes clusters: Karpenter will perform an EC2 Fleet instant request based on the resource requirements. So if we recap for a second how we saw the Cluster Autoscaler works: we first have some entity that creates more pods; they enter a pending state due to insufficient capacity; the Cluster Autoscaler identifies this event and performs an API call to the auto scaling group that was already created by the administrator, and the administrator already had to define which instance types to include inside that auto scaling group and manage multiple groups in order to support multiple pod requirements. With Karpenter this changes. You have Karpenter right here consolidating the two phases that we had with the Cluster Autoscaler: Karpenter simply identifies pending pods and creates an API call to EC2 Fleet. This API call is custom-made based on the requirements we have right now from our pending pods, so there is no need to prepare in advance a list of instance types that supports these pod requirements, and it simplifies the process a lot. So Karpenter is implemented in Kubernetes as a custom resource definition, which is really cool if you think about it, because it's Kubernetes native and you don't need to manage any resources that are external to your Kubernetes microcosm. The provisioner CRD holds all the configuration related to the compute that you want to work with in the cluster. By default you can just leave it as is and allow the provisioner to take advantage of all the instance types available on EC2, which are more than 600 today. But if you want to customize that and include or exclude something from your instance specification, you can also do that. The provisioner also allows defining other configurations, like limiting the amount of resources provisioned by a workload in case you want to control a budget for a team, for example, or defining when all nodes will be replaced by putting a time-to-live setting inside the provisioner. Now let's see how it actually works for different common use cases. Inside your Kubernetes microcosm you might have containers coming with different requirements. These requirements will usually be expressed as resource requests, node selectors, affinity and topology spread. Karpenter will eventually select the instances to provision for the pods based on the combination of all these requirements; it reads directly all these constraints that you can put inside your pod YAMLs. You have different types of topologies that you can build with Karpenter. So let's start with a single provisioner. A single provisioner can run multiple types of workloads, where each workload or container can ask for what it needs, but it has the option to share resources with other applications as much as possible to maximize efficiency. On the other hand, if I want to separate workloads and enforce them to run on separate nodes, I can do that with multiple provisioners, and each provisioner can define different compute requirements. For example, I can have my default provisioner use Spot and On-Demand and all the instance types available, and I can have another provisioner supporting only GPU instances for containers that require GPUs, because those instances are expensive and I don't want to share them. 
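A minimal sketch of the provisioner CRD described above, assuming the v1alpha5 Provisioner API that Karpenter exposed around the time of this talk (newer releases rename this resource to NodePool); the values and the referenced node template name are illustrative:

```yaml
# Illustrative Provisioner: with no instance-type requirement listed, Karpenter
# may choose from the whole EC2 catalog; the limits and TTLs shown are optional.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: karpenter.sh/capacity-type      # allow both Spot and On-Demand capacity
      operator: In
      values: ["spot", "on-demand"]
  limits:
    resources:
      cpu: "1000"                          # cap total provisioned vCPUs, e.g. a team budget
  ttlSecondsUntilExpired: 2592000          # replace nodes after roughly 30 days
  ttlSecondsAfterEmpty: 30                 # remove empty nodes quickly
  providerRef:
    name: default                          # assumes an AWSNodeTemplate with subnets and security groups
```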
Another option is building prioritized or weighted provisioners, when I want to use different configurations without strictly separating workloads between them, and allow, for example, running 30% of my deployment on Graviton instances while all the rest runs on x86 instances. I can do that with prioritized provisioners and implement a kind of weighting. Inside a single provisioner, the point is to be as flexible as possible with the resources that can be consumed by the containers, so that Karpenter will be the one that makes the intelligent choice of the right instance type to support the application needs. So what we see here is that inside a single provisioner we can use multiple instance types and multiple AZs, and I can have my deployment use a topology spread between availability zones so that each replica is required to run in a different availability zone. Karpenter will be aware of this requirement and will be able to provision a node, an instance, for each replica in a different availability zone. Or Karpenter will be able to launch instances for containers that require different instance types; for example, one container can request a memory-optimized instance while all the rest can just run on whatever is available. One of the major ways you can save on compute infrastructure is by using Spot Instances, and I already touched on that a little bit. AWS offers different pricing models to allow you to choose the best option for your specific needs. Spot Instances are the spare, unused capacity of AWS, offered on the same infrastructure as the other models, at a lower price and without any commitment. The only caveat is that whenever EC2 needs that capacity back, it can interrupt the instance with a two-minute notification warning. Now, Spot is a very effective way to get a large amount of capacity very economically, as long as your application is aware of these interruption events and is capable of moving from one instance to a different one. So let's talk about containers. Containers are usually very flexible: if you modernized them, you went through the process of building them in a fault-tolerant way, and they are usually stateless. We have Kubernetes and Karpenter that can bin-pack our containers onto shared resources, so we can use different sizes of instance types, different families of instance types and different availability zones. And so containers fit really well with Spot Instances. What's unique about Karpenter is that it implements all the Spot best practices, which are listed in this slide. It simplifies flexibility, because by default it allows us to use all the EC2 instance types that are available on the EC2 platform. It uses the price-capacity-optimized allocation strategy, which helps improve workload stability and reduce interruption rates by always choosing the EC2 instance from the deepest capacity pools. And Karpenter also manages the Spot lifecycle, which includes identifying the interruption events, moving your containers from the interrupted instance to a different one, and making sure that we always choose the cheapest instance to work with. So this helps us get to the understanding that Spot can be a very good fit for containers when working with Karpenter, and you can tap into Spot capacity and gain up to a 90% discount. 
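For the availability-zone spread mentioned above, here is a hedged sketch of a deployment using a standard Kubernetes topology spread constraint (names and values are illustrative); Karpenter reads this constraint and launches nodes in different zones so every replica can be placed:

```yaml
# Illustrative deployment: the topology spread constraint forces replicas into
# different availability zones, and Karpenter provisions nodes accordingly.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spread-app                 # hypothetical application name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: spread-app
  template:
    metadata:
      labels:
        app: spread-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone   # spread across AZs
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: spread-app
      containers:
        - name: app
          image: public.ecr.aws/docker/library/nginx:latest   # placeholder image
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
```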
The next way to save on your compute is by using the Graviton chips that Amazon develops. I won't dive too much into Graviton, but in two sentences: Graviton can provide you up to 40% better price performance. The list price is usually around 15 to 20% less than the equivalent x86 Intel instances, and you can gain a lot more from the performance benefit depending on the use case. Graviton processors also provide improved sustainability, with up to 60% better energy efficiency than comparable x86 processors. So why is Karpenter a great orchestration system to work with Graviton processors? If you went through the process of building multi-architecture container images, which means that you want to allow your applications to use both Graviton and x86 processors, Karpenter is able to combine Graviton, Intel and AMD together in a single cluster just by adding support for the different processors in your provisioner. Then, when Karpenter scales up an instance, it will be able to choose whatever is available at the lowest price. Let's say you got a Graviton instance: your multi-architecture container manifest will pull the Graviton container image and run on Graviton. On the other hand, if you got an Intel-based instance, the multi-architecture container manifest will pull the container image that is suitable for x86 processors. So Karpenter really simplifies combining and using different processors in worker nodes in Kubernetes. I'm going to summarize now what I've been talking about. Remember we defined in the beginning what efficiency is and what we want to achieve with auto scaling. We want to be able to provision just the amount of resources that our applications need. We want to densify them and be able to choose the right instance sizes that allow for the highest bin packing, and we want to be flexible with different purchase options, instance types and instance families so that we can use the best price-performance instances for our applications. Karpenter essentially provides the ability to accomplish all of that. It's compatible with native Kubernetes scheduling. It offers flexibility and cost optimization using Spot and Graviton instances. And because all the configurations are built into Karpenter, you know you are scaling with the best practices, so you can gain the most out of your Karpenter deployment. Last but not least, Karpenter is a project under heavy development right now, and new features are going out all the time. Karpenter is open source: you can follow the code and roadmap on GitHub, and you can open issues directly with the development team to get quick feedback. So I really recommend taking a look at the Karpenter project on GitHub. Thanks so much for listening to me, and enjoy the rest of the conference.
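For reference, the multi-architecture setup described in the Graviton part of the talk could be expressed with a provisioner along these lines, again assuming the v1alpha5 API; the name is hypothetical and the node template reference is an assumption:

```yaml
# Illustrative multi-architecture Provisioner: with multi-arch container images,
# Karpenter is free to pick Graviton (arm64) or x86 (amd64) capacity, whichever
# is cheapest and available when it scales up.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: multi-arch                 # hypothetical provisioner name
spec:
  requirements:
    - key: kubernetes.io/arch
      operator: In
      values: ["arm64", "amd64"]
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]
  providerRef:
    name: default                  # assumes an existing AWSNodeTemplate named "default"
```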
...

Yael Grossman

Specialist Solutions Architect, Flexible Compute @ AWS

Yael Grossman's LinkedIn account


