Conf42 Kube Native 2023 - Online

Cost and Resource Optimization In Kubernetes

Abstract

Have you ever worked in an environment where servers were just sitting around underutilized? What about in an environment where you requested resources and they weren’t available?

This talk is all about how you can optimize resources for Kubernetes clusters utilizing various tools and platforms.

Summary

  • Why should engineers care about saving money? Don't we have unlimited resources? Well, yes and no. A lot of IT already works with finance teams. But even though we want to save money, we should not do that if performance is degrading.
  • On prem, we know that we don't have unlimited resources, and in the cloud we don't either: there are limits in regions, there are caps. From a resource optimization perspective, you want to ensure that what you're using actually makes sense in your environment.
  • Scalability can come into play for both overutilizing and underutilizing. Don't spend if you don't have to. Ensure that your resources are optimized.
  • There are various tools in this space for cost and resource optimization. The talk picks out two: Sosivio, which you install and manage yourself, and StormForge, which is more of a managed service, like a SaaS in a sense. Let's see how both of those work as we dive into our Kubernetes cluster.
  • StormForge is the more SaaS-based tool. Its portal shows everything from what's currently being used to what we can optimize, along with our cluster information. Together, these are two tools that we can use in cost and resource optimization to ensure that our environments are running as expected.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Let's go ahead and dive into what cost and resource optimization actually is. We're going to be diving into the theoretical piece of it first, and then we're going to get a little bit hands-on and see a couple of tools that can help us with this.
All right, so, first things first. Why should engineers care about saving money? Don't we have unlimited resources? Well, no. Well, yes and no. Let's just dive into that. First things first. Why should engineers care about money? Now, it's actually funny and ironic to me when FinOps and the whole cost optimization thing started coming out, because I can remember going back ten-plus years as a sysadmin, even in the help desk space when I was setting up desktops and laptops and stuff like that, we always had to care about the finances. In fact, we typically worked pretty closely with the finance teams to say, okay, we have this budget. We can't buy this. No, we have to change this spec, et cetera. So it's actually ironic that a lot of IT already works with finance teams. And funny enough, there are a few, I wouldn't say a lot, but there are some organizations where the director of IT, for example, will report to the CFO. I've seen that a couple of times. So they're definitely in conjunction pretty well. And I would say it has always been this thing where IT didn't make money, it always spent money. I think that mindset is definitely shifting in today's world, especially since it's obviously technology driven. Of course it is. I guess I'm dating myself here. I have some gray hair on the sides, if everybody has already noticed that.
So, yes, we should care about money. We should care about saving costs. But I want to be very clear here. Even though we want to care about saving money, we should not do that if performance is degrading. Now, what do I mean by that? For example, let's say you have three worker nodes running and you want to save money. So you're like, oh, I'm going to scale down to one.
No, don't do that, because it's going to mess up performance on the cluster. So you want to make sure that you're saving money, but at the same time, you don't want to be saving money if it's messing up your environment.
Now, the next thing is, don't we have unlimited cloud resources? Well, on prem, we know that we don't have unlimited resources. If you want an extra server, you got to go get one. You got to talk to your reseller. It's got to get shipped, it's got to be configured, it's got to be put in the data center, operating system, yada yada, blah blah blah. But in the cloud, no, we still don't have unlimited resources. There are limits in regions, there are caps. I forget exactly what region it was, but I would say about a month ago or so, I believe it was the Azure storage account service, it ran out of storage. So in one of the regions you couldn't create a new storage account or add stuff to a storage account. So yeah, no, we don't have unlimited resources.
So when it comes to resource optimization, really what we care about here is to ensure that what's running is needed, is necessary. For example, if you have 20 worker nodes and you've never had to use more than six, well, you probably don't need an extra ten. You should keep an extra couple around just for scalability purposes, just in case you have a spike and it goes up to seven or eight. But from a resource optimization perspective, you want to ensure that what you're using actually makes sense in your environment. Because if it doesn't, whether it's from an application perspective, whether it's from a cluster perspective, whether it's from a network, from a storage perspective, if you're just spending money and overallocating resources or underallocating resources for no reason, you're going to have a problem there. Now, speaking of underallocating resources, that's where scalability can come into play for both overutilizing and underutilizing.
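The "20 worker nodes, never more than six in use" example can be put into a few lines of Python. This is only a sketch of the reasoning, not a sizing tool: the function names and the default headroom of two nodes (the "extra couple" kept around for spikes) are assumptions made for illustration.

```python
def recommended_node_count(peak_nodes_used: int, headroom_nodes: int = 2) -> int:
    """Peak observed usage plus a small buffer for spikes (assumed default: 2)."""
    if peak_nodes_used < 0 or headroom_nodes < 0:
        raise ValueError("node counts must be non-negative")
    return peak_nodes_used + headroom_nodes


def surplus_nodes(provisioned_nodes: int, peak_nodes_used: int,
                  headroom_nodes: int = 2) -> int:
    """Nodes that could be removed while still covering peak usage plus headroom."""
    target = recommended_node_count(peak_nodes_used, headroom_nodes)
    # Never report a negative surplus: an under-provisioned cluster has none.
    return max(provisioned_nodes - target, 0)


if __name__ == "__main__":
    # Numbers from the talk: 20 nodes provisioned, peak usage of six.
    print("keep:", recommended_node_count(6))
    print("candidates to remove:", surplus_nodes(20, 6))
```

With these assumptions, a cluster that peaks at six nodes would keep eight, which lines up with the "spike up to seven or eight" case mentioned above.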
So from a resource optimization perspective, with scalability, I feel like we kind of always go to the, like, we need to scale up. We need to scale up. Yeah, we need more nodes. Yeah, we need the ability to scale up, et cetera. But there's also the thing of scaling down, and you want to be able to scale down as well. That's arguably just as important as scaling up because guess what? Maybe you're in peak season, maybe you're an ecommerce site, Cyber Monday, got to scale up. Maybe you need an extra two, three worker nodes. But guess what, six months out of the year, eight months out of the year, you don't need those two extra worker nodes. So because of that, you want to scale those things back down. Otherwise you're spending money for no reason. So cost and resource optimization both kind of come into play with each other.
Now, speaking of cost optimization, don't spend if you don't have to. That's arguably the biggest thing that I'll say, don't spend unless you absolutely have to. There's no reason for it, you're going to lose budget, people are going to be angry, all that fun stuff, nobody wants to deal with it. So when it comes to cost optimization, ensure that what you're spending makes sense. Ensure that your resources are optimized. Because guess what? If your resources are optimized, cost optimization is pretty much just doing its thing in the background anyways. So you're good there.
All right, so now there are various tools in this space. There's Cast AI for cost optimization, resource optimization, there's StormForge, there's Sosivio, there are even cloud-specific tools in AWS and GCP and Azure for all of this cost and resource optimization. Now we can't go into every single tool here, but I want to pick out two for you, Sosivio and StormForge. And we're going to see what both look like because one is more of a managed service, like a SaaS in a sense, and then the other one is you're actually managing it yourself.
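To put rough numbers on the scale-down point from the peak-season example, here is a small Python sketch. The price of 100 per node per month and the base size of three nodes are made-up assumptions, as is the function name; only the shape of the comparison comes from the talk (two extra nodes that are needed for about four months and idle for about eight).

```python
def annual_node_cost(base_nodes: int, extra_nodes: int, peak_months: int,
                     monthly_cost_per_node: int, scale_down: bool = True) -> int:
    """Yearly node spend, with or without scaling the extra nodes back down."""
    if not 0 <= peak_months <= 12:
        raise ValueError("peak_months must be between 0 and 12")
    # Base nodes run all year; extra nodes run only in peak months if we scale down.
    extra_months = peak_months if scale_down else 12
    return (base_nodes * 12 + extra_nodes * extra_months) * monthly_cost_per_node


if __name__ == "__main__":
    # Hypothetical figures: 3 base nodes, 2 extra nodes, 4 peak months, 100/node/month.
    always_on = annual_node_cost(3, 2, 4, 100, scale_down=False)
    scaled = annual_node_cost(3, 2, 4, 100, scale_down=True)
    print("always on:", always_on)
    print("scaled down off-peak:", scaled)
    print("saved per year:", always_on - scaled)
```

The difference between the two totals is the "spending money for no reason" amount: the extra nodes multiplied by the months they sit idle.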
So let's kind of see how both of those work here and we'll dive into our Kubernetes cluster.
So the first that we'll take a look at is Sosivio. So what you want to ensure is you want to have at least two nodes running, right? So the first thing that you're going to want to do is you're going to want to go to the download page and then what's going to happen is you're going to get an installation based on your operating system. So there are installations for Mac, Linux boxes, Windows, et cetera. Right. So I'm on a Mac. So I've actually already brought the installation down. But what I could do is I can untar it and then I can actually run the installer. So if I cd into cost and resource optimization, I see that I actually have that installer right here. So I'm literally just going to go ahead and run it. As we can see, we get some terminal output, we have the ability to choose where we're running. So in this case I'm on an AKS cluster, but if you're not, totally fine, of course. Next I'm going to choose my cluster name. All right, we'll use the default. Now in production you're going to want to set your domain suffix, but in this case this is a demo environment so I don't care, I'm just going to do example. We're not going to hit it from that domain anyways, of course. And what we're going to see here is it's going to do the full installation, it's going to connect to the environment and then we're going to have the ability to see it via the UI. So let's go ahead and just give this a few minutes here and we can see that that was installed here.
So what we're going to do is we're going to use the kubectl port-forward command. All right? And then we're going to go and we're going to hit this URL. All right? And then if I just go back to VS Code here really quick, this is the password that we're going to use to log in for the first time. So admin and password. All right. And then if I zoom in a little bit, we can now see that Sosivio is installed.
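The transcript doesn't capture the exact port-forward command that was run, so the following Python sketch only shows the general shape of a `kubectl port-forward` call against a service. The namespace, service name, and ports are placeholders, not Sosivio's real values; use whatever the installer prints for your cluster.

```python
import shlex


def port_forward_command(namespace: str, service: str,
                         local_port: int, remote_port: int) -> list[str]:
    """Build the argv for forwarding a local port to a service port."""
    for port in (local_port, remote_port):
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid port: {port}")
    return [
        "kubectl", "port-forward",
        "--namespace", namespace,
        f"svc/{service}",                 # forward to a Service rather than a single pod
        f"{local_port}:{remote_port}",    # LOCAL:REMOTE
    ]


if __name__ == "__main__":
    # Placeholder names and ports, assumed only for this example.
    cmd = port_forward_command("example-namespace", "example-dashboard", 8088, 8088)
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

While the command is running, the UI would be reachable on `localhost` at the chosen local port, which is the URL step described above.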
Now, again, I want to just point this out here. This is a tool that you're managing yourself. It's not SaaS, it's not managed for you. It's really awesome and I love it and it's great and it has a lot of capabilities as we can see here. But you do have to manage it yourself. So definitely do just keep that in mind when you're getting this thing up and running.
All right. Now the next tool is StormForge and this is going to be a tool that's more SaaS based. So you're just going to log into a portal. So I'm going to go ahead and type in my environment name. All right? And as we can see here, we're going to go ahead and we're going to copy those Helm values. What I'm going to do is I'm going to go to VS Code here, I'm going to create a values.yaml file. All right, I'm going to paste it in, I'm going to click continue and then I'm going to go ahead and I'm going to install via Helm. Now I am going to make this change here because I just called the values file values.yaml. All right, let's go ahead and run that and then we'll wait for our Helm chart to install all the way. It was deployed, but I'm sure there are some resources still coming up. Let's go ahead and check that. Oh, sorry, stormforge-system. All right. And as we can see, pods are still initializing and all that fun stuff. So it'll probably take a few minutes and then we'll be able to see everything in the portal, but we can just do a verify install here really quick. All right, we can see that that was installed successfully. Maybe it took like 15 seconds or so, 10, 15 seconds. So we'll click finish here, and then as we can see, here is our portal. So we have everything from what's currently being used, what we can optimize, our cluster information, the efficiency around our clusters and around our namespaces. Again, total current request, total optimized request. There's nothing going on here because this is just a demo cluster, right?
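The "pods are still initializing" check after the Helm install can also be scripted. The Python sketch below assumes you have saved the output of `kubectl get pods -n <namespace> -o json` and want the names of pods that are not yet up. The pod names in the sample are invented, and treating anything other than a `Running` pod with all containers ready as "not ready" is a simplification (a completed Job pod in phase `Succeeded` would be listed too).

```python
import json


def pods_not_ready(pod_list_json: str) -> list[str]:
    """Names of pods that are not Running with every container reporting ready."""
    pods = json.loads(pod_list_json).get("items", [])
    waiting = []
    for pod in pods:
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        # A pod with no container statuses yet has not started its containers.
        all_ready = bool(containers) and all(c.get("ready", False) for c in containers)
        if status.get("phase") != "Running" or not all_ready:
            waiting.append(pod["metadata"]["name"])
    return waiting


# Invented sample shaped like `kubectl get pods -o json` output.
SAMPLE = json.dumps({"items": [
    {"metadata": {"name": "example-pod-a"},
     "status": {"phase": "Running", "containerStatuses": [{"ready": True}]}},
    {"metadata": {"name": "example-pod-b"},
     "status": {"phase": "Pending"}},
    {"metadata": {"name": "example-pod-c"},
     "status": {"phase": "Running", "containerStatuses": [{"ready": False}]}},
]})

if __name__ == "__main__":
    print(pods_not_ready(SAMPLE))
```

An empty list would mean everything in the namespace has come up, which is roughly what the verify-install step in the portal confirms.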
But we can see all of our information here based on cluster, based on namespace, which is really cool, and then based on workloads that are running. We also have this Optimize Pro capability, which is more of a paid piece here. StormForge is a paid tool in general. And then we can click on that performance button and we can create some new performance testing, which is pretty cool. It's like benchmarks, if we want to.
All right? And those are two tools that we can use in cost and resource optimization to ensure that our environments are running as expected. Again, we have one tool, StormForge: SaaS, managed for you, costs money. And then we have Sosivio: we can use it out of the box, but we do have to manage it ourselves. And with that, thank you so much for joining me today, really do appreciate it and I hope that you enjoyed the session.

Michael Levan

Consultant, Trainer, and Content Creator

