Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi everyone.
My name is Zaki.
I work at vCluster Labs.
We are the creators of vCluster.
We like Kubernetes so much that we put Kubernetes inside of Kubernetes.
Today I want to talk about KAI Scheduler, the Kubernetes AI scheduler from NVIDIA, and how we can use virtual cluster technology to make scheduling easier and more multi-tenant friendly.
The setup I'm currently running will be available in the description or in the video notes.
You can run it yourself if you have an NVIDIA GPU.
I already created a cluster.
This setup is a little bit involved: it configures Docker for GPU passthrough and installs all the components on a local cluster.
This can of course also be run on any managed Kubernetes cluster, wherever GPU workloads run.
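To give a rough idea of what that setup does, here is a minimal sketch of the host-side steps, assuming the NVIDIA Container Toolkit is already installed; the cluster name is illustrative, and the full, tested scripts (including installing the NVIDIA device plugin and KAI Scheduler on the host) are in the linked repository.

```bash
# Rough sketch of the local setup, assuming the NVIDIA Container Toolkit is installed.
# The complete scripts live in the repository linked in the video notes.

# Let Docker use the NVIDIA runtime so containers (and kind nodes) can see the GPU.
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
sudo systemctl restart docker

# Create the local host cluster used throughout the demo (name is illustrative).
kind create cluster --name kai-demo

# The setup scripts then install the NVIDIA device plugin / GPU operator and
# KAI Scheduler on this host cluster; those steps are omitted here.
```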
So let's talk a little bit about the scheduler itself.
In Kubernetes, we schedule workloads on different nodes
depending on various conditions.
Kubernetes comes with a pre-configured scheduler that works out of the box for CPU nodes.
This space has been heavily developed; you have tools like Karpenter and others that make the scheduling process easier.
With GPUs, however, we want to schedule workloads that run on graphics cards and require a GPU on the node.
The idea is the same, but the technology is slightly different.
NVIDIA open sourced the Kubernetes AI scheduler, or KAI Scheduler, in 2025, and now everybody can use it to run GPU workloads on a Kubernetes cluster.
So first of all, this scheduler enables fractional GPU allocation, meaning that a single GPU on a node can be split across workloads, so that each workload uses just a fraction of the GPU.
It enables queue-based scheduling, so we can schedule workloads based on a queue mechanism that KAI Scheduler comes pre-configured with.
And it has various advanced features like node topology awareness and other elements.
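As a rough illustration of that queue mechanism, a queue in KAI Scheduler is just a custom resource. The sketch below is modeled on the upstream quickstart; the API group, fields, and queue names may differ between KAI Scheduler versions, so verify them against the release you install.

```bash
# Hedged sketch of a KAI Scheduler queue hierarchy (parent queue plus one team queue).
kubectl apply -f - <<'EOF'
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: default
spec:
  resources:
    cpu:    {quota: -1, limit: -1, overQuotaWeight: 1}
    gpu:    {quota: -1, limit: -1, overQuotaWeight: 1}
    memory: {quota: -1, limit: -1, overQuotaWeight: 1}
---
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: test
spec:
  parentQueue: default
  resources:
    cpu:    {quota: -1, limit: -1, overQuotaWeight: 1}
    gpu:    {quota: -1, limit: -1, overQuotaWeight: 1}
    memory: {quota: -1, limit: -1, overQuotaWeight: 1}
EOF
```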
So first of all, this is a live demo and hopefully everything will work.
So let's verify that we actually have access to our GPU.
So indeed, I have a pretty old NVIDIA graphics card in my computer, and we just run this simple kubectl command to verify that my computer actually exposes the GPU into my kind cluster.
It worked, and we can see it from inside of the kind KAI demo cluster.
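If you want to check the same thing yourself, something like the following works on any cluster where the NVIDIA device plugin is running; it just looks for the GPU resource the node advertises.

```bash
# Check that the node advertises an allocatable nvidia.com/gpu resource.
kubectl describe nodes | grep -i "nvidia.com/gpu"
```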
So with this out of the way, let's talk a little bit about vCluster.
vCluster enables us to create essentially a fully fledged Kubernetes cluster inside of a pod.
An easy way to think about it is that vCluster gives you containerized Kubernetes inside a pod.
So we have team one here that runs their own workloads inside of a vCluster pod, but they share services from the host cluster.
Maybe we have cert-manager, an Istio service mesh, an ingress controller, or others, and the team can use them.
However, we can also have a separate vCluster that installs everything inside of the virtual cluster namespace.
So there's a quite interesting use case here where we can have different teams using virtual clusters in different ways.
So vCluster provides a full Kubernetes API server.
It's a fully certified Kubernetes distribution like any other; we actually run Kubernetes inside of a pod, using regular Kubernetes binaries.
This enables very flexible isolation in various multi-tenant scenarios.
We're gonna talk about KAI Scheduler in this talk, but obviously all kinds of other interesting multi-tenant scenarios are possible as well.
It makes resource utilization much more efficient, which helps limit cost and better manage existing resources.
Instead of creating a full Kubernetes cluster like EKS or AKS, you can use a virtual cluster for various purposes, because it provisions much faster than a regular host cluster or managed Kubernetes cluster.
You can use it in test, dev, and CI, but also in production, wherever performance is important.
So a little bit more about the technical architecture.
vCluster creates a pod that encapsulates the Kubernetes binaries.
So we have the kube-apiserver and a data store inside of a pod.
This is where we start with open source vCluster.
It's of course very easy to connect vCluster to an existing etcd cluster or other data stores outside of the pod for redundancy and high availability.
We have components such as the controller manager and the API server, and optionally we can install a scheduler; we are obviously going to do this during this talk.
And the key component of vCluster is the syncer.
The syncer enables essentially dynamic and highly configurable synchronization of resources between the virtual cluster pod and the host cluster.
This opens up all kinds of interesting scenarios, but for now it's important to know how it works.
So the host cluster will provide, at the beginning, all services; think of them as shared, low-level services.
Again, we can have vClusters running in different modes with their own services, but for the sake of this talk, we can think of the host cluster as providing a base of shared services.
So we have the API server, which is fully Kubernetes compliant and weighs about a hundred megabytes, so the whole virtual cluster pod would be about 400 megabytes or even less, depending on the installation.
So the key insight here is that we use a containerized control plane with intelligent resource syncing back and forth between the virtual clusters and the host cluster.
A little bit more about the syncer.
The syncer runs in a reconciliation loop and bidirectionally synchronizes resources: it talks to the kube API running inside the virtual cluster pod and to the kube API running on the host.
The syncer does things like making sure there are no name collisions and synchronizing various resources.
All of this is highly configurable; we'll look at it a little later.
Coming back to GPUs: obviously AI is the current hype, and most of the workloads that run on GPUs are AI, but not only.
We have model training, things like fine-tuning LLMs, deep learning training, base model training.
This can be extremely costly, it will run for weeks or months on end depending on the model, and it utilizes the GPU at a hundred percent.
So in this case we would have specialized equipment that needs to run on GPUs.
So that's one use case, along with stable diffusion and image generation, but there's also inference.
All of us now interact with AI, whether through an API or a web interface, and inference would also be a mixed workload based on GPU and CPU resources.
But it's not only AI.
We also have video processing and all kinds of GPU-heavy processing that requires GPUs as well; depending on the workload, it can also push them to a hundred percent.
We have CUDA development, Jupyter notebooks, and all kinds of other workloads that Kubernetes can help us run.
And finally, scientific computing with batch processing, where we can also utilize GPUs very effectively.
So GPU workloads are not only about AI.
Obviously we are focusing on AI in this talk, but there are a lot of legitimate use cases where we would like to run GPU workloads and use Kubernetes to help us schedule and manage them.
So just for fun, we are going to deploy a very simple pod, and we'll see how vCluster can help us with synchronizing this pod.
This pod is just a really silly demo to show what we can do with a GPU.
You can see the pod has been deployed, and since this is a recorded talk you unfortunately won't have access to this link, but if you run this presentation yourself, you will also be able to see it.
We have used a service called ngrok to essentially expose this pod.
And here we can scan the QR code; I will do this on my phone and we are gonna have some fun here with the demo in a second.
So now for the demo, we're just going to see how a GPU-scheduled web application can run AI inference.
So here we have different haikus; for now we haven't generated any.
This is my favorite topic: site reliability engineering.
I'm generating a haiku from my phone now, and we can see AI is helping us create these various horror-story, haiku-like poems.
Obviously this mixes inference from AI with work running on my kind cluster utilizing the GPU as well.
So it's not that hard to mix and match various resources as well as various types of workloads.
"Wrong context. Pods vanish into the void. 3:00 AM despair."
That's very relatable.
All perfect.
So this was the GPU demo; let me just swap to this real quick.
So now let's talk a little bit about how we would upgrade schedulers in production.
If you have a host cluster set up and you use any scheduler, and we are obviously talking about the Kubernetes AI scheduler here, you will need to upgrade it at some point, and that can be a significant risk when a lot of teams depend on the same scheduler version.
So testing must be significantly more disciplined, we need very good rollback procedures, and teams are often blocked on a single scheduler version.
Maybe there is a new version of, in our case, KAI Scheduler that the teams are waiting for.
And we are going to go through the upgrade process.
So let's talk a little bit about risk.
If we have a single scheduler on a big host cluster, shared across multiple teams, and there's a bug in a new scheduler version, obviously all the pods that depend on it will crash or get stuck in a Pending state.
And the recovery time depends on what we decide to do and what the problem is.
It might be a few hours, it might be more; at this point we might be able to revert the change and be okay with it.
However, because we are using a host cluster deployment, we typically use CRDs for managing the installation of the scheduler and related operators.
If this goes wrong and there's a namespace corruption or CRD corruption, it can be very costly.
I've seen scenarios where the CRDs of operators that manage external resources like databases or infrastructure through Kubernetes were upgraded incorrectly, and this resulted in data loss or some other disasters.
So this is a critical risk scenario that we have to take into consideration.
And of course, the bigger the cluster and the larger the number of teams we are serving, the more critical this becomes.
Version mismatch is similar to scheduler bugs: maybe our pods require a feature that is only in a specific version, so we have to test all of them before doing an upgrade.
And finally, maybe a new version has a resource leak, something that is difficult to detect instantly; after it is deployed, we might only detect it a few days or hours later, depending on our observability setup.
So according to the New Relic 2024 report, enterprise downtime can cost up to 1 million dollars per hour, obviously depending on the size of the company.
So we definitely want to avoid risky upgrades of critical infrastructure like Kubernetes schedulers, and hopefully be able to compartmentalize and isolate them.
So that's where vCluster can help.
You can think of a virtual cluster pod as something that encapsulates your scheduler inside of it.
We set up a special configuration, which you will see in a moment, and what this results in is that we can start from a single virtual cluster, schedule some pods using this very isolated and small blast radius, and perform A/B testing, making sure that the new version of KAI Scheduler, in our case, is correct and everything works well.
This also limits the cost of the testing infrastructure significantly, because we can have a very small cluster and just run several virtual clusters testing the various scenarios that are important for our business.
So the idea here is that the virtual cluster creates an isolated Kubernetes kube API and the other machinery, including KAI Scheduler, inside of it.
So we don't need an additional host cluster for testing at all; we can do this right in production, because it's all isolated.
So one more time, reiterating on the risks: there are obviously other risks too, like slowing down innovation because the teams must wait on the upgrades, and all upgrades in Kubernetes are pretty costly and stressful.
So how can vCluster help?
I'm going to run this in the background and talk a little bit about how vCluster uses the configuration file that we see here at the top.
vCluster essentially provides a highly customizable way of synchronizing various resources back and forth, but also of creating virtual clusters in a way that serves our needs.
So this first group of settings makes sure that, if we want to use the host cluster's scheduler like in one of the demos, we set the setOwner flag, which essentially tells vCluster not to take ownership of the pods deployed in the vCluster; this relates to how Kubernetes tracks ownership through owner references.
In the second demo, we want to make sure that our control plane, the Kubernetes API and machinery that we deploy inside the virtual cluster pod, also has the virtual scheduler enabled.
And thanks to this virtual scheduler, we can use KAI Scheduler.
As you'll see later, we need some other settings as well: we need to synchronize nodes from the host, and also runtime classes, so that the nvidia runtime class and the node labels are available inside the virtual cluster.
We will see how we can use this to target only the special nodes that are labeled as GPU nodes, so that our virtual cluster synchronizes those things correctly.
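Put together, the vcluster.yaml being discussed looks roughly like the sketch below. The key names and nesting follow how they appear in the vCluster docs around the 0.2x releases, and the GPU node label is illustrative, so treat this as an outline to check against your vCluster version rather than a copy-paste config.

```bash
# Hedged sketch of the vcluster.yaml described above; verify keys against your vCluster version.
cat > vcluster.yaml <<'EOF'
controlPlane:
  advanced:
    virtualScheduler:
      enabled: true          # run a scheduler inside the virtual cluster so KAI can take over
sync:
  fromHost:
    nodes:
      enabled: true          # make (selected) host nodes visible inside the vCluster
      selector:
        labels:
          nvidia.com/gpu.present: "true"   # illustrative label for GPU nodes
    runtimeClasses:
      enabled: true          # sync the nvidia RuntimeClass from the host
EOF

# Create the isolated virtual cluster used in this demo.
vcluster create kai-isolated -f vcluster.yaml
```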
So the benefit is obviously that we will now have independent KAI Scheduler versions; you can think of it as each team coming with their own scheduler version, say a 0.7 release here and a 0.9 release there, or others.
So that's already a huge benefit.
We also completely isolate scheduling, so we can target different node pools and different nodes, and we achieve scheduling autonomy per team.
And if you add the more advanced use cases that vCluster supports, you can schedule on all kinds of infrastructure and support very flexible multi-tenancy scenarios.
Let's see if this works correctly.
We still have some problems; it's still pulling the image, so this will probably take a moment because of my connection.
Oh, now it's running.
You can see we are inside of the virtual cluster here, and we have actually created this new virtual cluster with all the settings.
We don't have any demos here, we don't have any pods running, but I just wanted to show you that this works.
So this command, create kai-isolated, created a new virtual cluster using the settings that you can see here.
And I opened k9s, which is just a terminal Kubernetes viewer, to show you that we are currently connected to the virtual cluster.
Okay, so now if we refresh this, you can see that we are connected to our virtual cluster.
It has a pretty long name that vCluster builds from various variables, but we are essentially running inside of my kind cluster, and there is an actual virtual cluster running there.
So just a quick recap of the vCluster components.
We have an API server, which is a normal Kubernetes kube API.
We have the syncer, which enables bidirectional syncing and is at the heart of how virtual clusters work.
We have a completely isolated backend; in our case it's SQLite, and as I mentioned earlier, you can also use etcd or an external backend for high availability and redundancy.
And in our case, we are soon going to install KAI Scheduler for independent scheduling.
So currently our virtual cluster, which is this pod, uses 370 megabytes of memory and a few millicores of CPU, and we have the database right here.
Okay, perfect.
So now we will make sure we are connected to our vCluster, and inside of it we are going to install KAI Scheduler.
So let's do this right now.
You can see we have connected to the virtual cluster; it gives us a friendly warning that there's a newer version, but we are already operating with the correct version.
So what I am doing now is installing the same KAI Scheduler that we have on our host cluster, but I'm installing it inside of the virtual cluster.
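For reference, the install itself is the standard KAI Scheduler Helm install, just run while connected to the virtual cluster. The repo URL, chart name, and the GPU-sharing flag below follow the KAI Scheduler README, so double-check them against the release you actually use.

```bash
# Install KAI Scheduler inside the virtual cluster (a sketch following the upstream README).
vcluster connect kai-isolated

helm repo add nvidia-k8s https://helm.ngc.nvidia.com/nvidia/k8s
helm repo update
helm upgrade -i kai-scheduler nvidia-k8s/kai-scheduler \
  -n kai-scheduler --create-namespace \
  --set "global.gpuSharing=true"   # enable fractional GPU sharing
```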
So we are moving one level of abstraction up, and I'll show you in a second how it looks in a pod.
But from our perspective, you can install anything inside of the virtual cluster; it's a normal Kubernetes cluster.
You can install a web app like we've seen earlier, you can install a scheduler like in our case, or other components.
This is done, and now we can see how it works.
So we've connected again to our virtual cluster here, as you can see by the context name, and you can see that we now have several components from KAI Scheduler, such as the pod grouper, the queue controller, and others, running inside of our virtual cluster.
Okay, this is great.
So let's do a quick recap.
We now have KAI Scheduler inside of a virtual cluster, with GPU sharing.
So we should be able to install some kind of GPU demo and make sure that we are actually able to schedule its pods on our virtual cluster.
So how are we doing it?
There are several things that we have to configure inside of our pods, as you can see here.
First of all, we label the pod with the appropriate queue from KAI Scheduler, which I created earlier; that is KAI Scheduler internals.
But this one is interesting: inside of KAI Scheduler, we use fractional GPU allocation, so in this case 20% of a GPU can be taken by this pod, GPU demo two, and 10% of a GPU can be taken by GPU demo one.
Another important setting here is the scheduler name, where we select the KAI scheduler, and the runtime class, nvidia.
These pods don't do anything interesting, they just sleep, but we are able to schedule them inside of the cluster.
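The pod manifests on screen look roughly like this sketch. The queue name, image, and pod name are illustrative, and the label and annotation keys follow the KAI Scheduler docs, so verify them against your version.

```bash
# Hedged sketch of one of the demo pods requesting a fraction of a GPU via KAI Scheduler.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo-1
  labels:
    kai.scheduler/queue: test     # the KAI queue this workload is submitted to
  annotations:
    gpu-fraction: "0.1"           # ask for 10% of a GPU
spec:
  schedulerName: kai-scheduler    # hand scheduling over to KAI instead of the default scheduler
  runtimeClassName: nvidia        # use the NVIDIA container runtime
  containers:
  - name: sleeper
    image: ubuntu
    command: ["sleep", "infinity"]
EOF
```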
So let's give it a try.
I'm going to apply these manifests; we are currently connected to the virtual cluster, I'm making sure I apply the queues first, and then the two pods that we've seen earlier.
And as you can see, they correctly take the fraction of the GPU that we configured.
So this was pretty easy.
So now let's imagine that we want to change our KAI Scheduler version, or maybe we just did a simple experiment: we installed those pods, we tested it, we installed KAI Scheduler, and we are done.
vClusters are very ephemeral if you want them to be.
It's very easy to create a vCluster pod, which we've seen is around 300 megabytes and a very small fraction of a CPU, and you can quickly delete them when you're done with them.
That's why they're so helpful when you do A/B testing or when you upgrade various components: you can use virtual clusters to test behavior.
And here we can see that in 39 seconds we were able to completely remove the virtual cluster once we were done with our testing.
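Cleanup is a single CLI call, using the name the cluster was created with:

```bash
# Remove the throwaway virtual cluster and everything installed inside it.
vcluster delete kai-isolated
```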
So let's demo one more thing.
Imagine you're supporting a machine learning team that needs a specific new KAI version, but the research team is still stuck on the old one, so you definitely don't want to upgrade the host cluster scheduler version.
And the dev team uses the default host cluster scheduler; they don't really care about any version here.
We are able to support all three use cases simultaneously.
So first of all, let's create two new virtual clusters: one for team stable and one for team beta.
Both of those virtual clusters are going to be separate pods, as you know by now, and we just use the same configuration, only with different names.
Okay.
So now that this is done, we can actually connect to each of those clusters and install a different KAI Scheduler version in each one.
In our case, team stable gets version 0.7 and team beta gets 0.9.3, and we are now doing A/B testing: we are creating two separate scheduler versions, and once this is done, we are going to disconnect from the virtual clusters and explore how it looks on the host cluster itself.
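The per-team installs are just the same Helm install pinned to different chart versions. Shown here with placeholder version variables rather than exact release numbers, and assuming the repo and chart names from the earlier install:

```bash
# Install a different KAI Scheduler version per team; pick the two releases you want to compare.
STABLE_VERSION="<older-kai-release>"
BETA_VERSION="<newer-kai-release>"

vcluster connect team-stable -- helm upgrade -i kai-scheduler nvidia-k8s/kai-scheduler \
  -n kai-scheduler --create-namespace --version "$STABLE_VERSION"

vcluster connect team-beta -- helm upgrade -i kai-scheduler nvidia-k8s/kai-scheduler \
  -n kai-scheduler --create-namespace --version "$BETA_VERSION"
```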
So while this is still running, and it'll take a moment to install, we will soon be able to test the virtual clusters, test the host cluster, and see what happened here.
Once this is done, we can connect to each virtual cluster that we deployed and create the same demos that we've seen earlier, pod one and pod two.
We can of course imagine that those are different pods: maybe they have a different allocation here, maybe we are using a completely different allocation strategy in the other virtual clusters, and so on and so forth.
This shows the flexibility that virtual clusters provide alongside KAI Scheduler, and it gives us a really nice middle ground for supporting various teams without the headache of upgrading KAI versions or other components.
So this is done, and the result is that we currently have two vClusters running: one for team beta, one for team stable.
They both run on vCluster 0.27, and we've achieved the isolation that we talked about.
So let's look at how it actually appears on the host cluster.
Remember, our host cluster is just a kind cluster running locally on my machine, but this can be any cluster, managed or bare metal; the virtual cluster does not require any special setup and would work anywhere, because it is just a Kubernetes distribution.
So here we can see the virtual cluster pods: vcluster team-beta and team-stable are running, and those pods are serving our kube API.
If somebody wants to connect, whether using kubectl or the vCluster CLI, or just consume a web app or any other workload, the virtual clusters running here provide the isolation.
And here we can of course also see other components from the previous demos.
So what have we achieved? Let's bring it home.
From my perspective as a platform engineer or SRE, which is where I've worked for most of my career, I know that upgrades and updates are always painful, especially in distributed systems like Kubernetes.
Those things are not glamorous, they happen behind the scenes, but they have to happen, and they are very stressful and often costly.
They require a lot of planning, and the errors or risks they carry can be very significant to the business.
So, testing scheduler upgrades: instead of creating a new host cluster, maybe a new EKS cluster, or a test environment that we have to pay for, which would typically take several hours, we limited it to a few minutes by creating separate virtual clusters and installing the scheduler there, while configuring vCluster in a specific way.
We've reduced the risk essentially to zero, because if something doesn't work in our virtual cluster, we don't care; it just gets deleted, and there's no impact on other production workloads.
Rolling back changes went from potentially several hours to about 39 seconds: we just deleted the virtual cluster and nothing else was affected.
And my favorite: we've enabled very strong A/B testing.
We can now see which versions work, we can gradually introduce them, and we can spread this A/B testing over time as well.
So from a pretty high risk, if we were upgrading the scheduler in place in the host cluster, we went to a very low or almost zero risk with per-team schedulers.
This is obviously a multi-tenancy benefit.
But beyond that, we no longer have to negotiate with the teams to agree on some middle-ground version; we can essentially satisfy each team's need for a specific virtual cluster and KAI Scheduler version, which is super powerful.
And as an SRE, I'm always amazed that this works and keeps the blast radius so small, while at the same time enabling me to serve the teams I'm taking care of with exactly the versions and software that they need, without impacting others.
And we've also seen that even our simple demo can use proper GPU sharing, and you could run it at home if you have a powerful enough computer and environment, or of course in business applications at scale as well.
So let me just change the settings real quick.
Now, a few resources for you.
You can go to vcluster.com/docs, where there's a dedicated article about using KAI Scheduler with vCluster, or you can explore other configuration options.
There's also the KAI Scheduler page on GitHub, as well as the specific integration with KAI Scheduler that I mentioned earlier.
Please join our Slack; we would love to have you and talk about KAI, vCluster, or any other topics related to multi-tenancy and Kubernetes.
You can connect with me on LinkedIn; I would love to hear about your experiences with similar setups and whatever tools you're using to help yourself.
We also have Office Hours, an event hosted by our marketing team, which can give you first-hand experience with vCluster and its technology, and it's listed on our events page.
Perfect.
So that's it.
I hope you enjoyed it.
Again, please feel free to contact me on LinkedIn.
I would love to chat with you about virtual cluster and related technologies,
multitenancy and other things.
Please feel free to reach out.
Thank you everyone and have a great rest of the conference.