Conf42 Cloud Native 2025 - Online

- premiere 5PM GMT

Machine Learning Meets Kubernetes: Orchestrating AI at Scale


Abstract

Unlock the future of AI with Kubernetes. In this talk, we’ll reveal how to supercharge your machine learning workflows, scale models effortlessly, and deploy cutting-edge AI. Whether you’re driving innovation or conquering data challenges, discover how Kubernetes takes ML to the next level.

Summary

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello, my name is Prashanth. I work as a Principal Member of Technical Staff at Salesforce, and I'm going to talk about how to use Kubernetes for machine learning. So let's get started.

This will be the agenda of our session. We'll start with the introduction, then talk about what Kubernetes is and why we need it. We'll also touch upon the machine learning lifecycle and the benefits of using Kubernetes for machine learning. Then we'll discuss some of the core concepts of Kubernetes, especially from a machine learning point of view, and look at some of the platform tools developed on Kubernetes that are specific to machine learning and deep learning; to be very specific, we'll be talking about Kubeflow. Kubeflow is not just a framework; it's a platform combining multiple different tools together. We'll look at a simple training example that we can run on Kubernetes with Kubeflow, then go over some best practices and real-world use cases, and we'll end the session with future trends. It's going to be a somewhat bigger session, and I'm going to concentrate on a lot of theoretical concepts; hopefully one day I'll get a chance to present my workshop with Conf42. Having said that, let's move on.

As most of you know by now, with ChatGPT, Claude, and the other LLMs out there, there is an AI revolution going on, and with this great AI revolution comes the need for great infrastructure too: optimal usage of infrastructure, saving costs and energy at the same time, and doing things more ethically. We'll talk about how we can do all of this with Kubernetes, and why Kubernetes is a pivotal platform for doing machine learning and deep learning. To recap the bottlenecks that come with machine learning and deep learning: the traditional infrastructure that many organizations have faces a lot of issues when it needs to scale up or scale down, especially in scenarios where huge traffic is coming in. How to handle all of this is what we're going to talk about in this session. Let's move on to the next slide.

What is Kubernetes? Most of you listening to this session might have already worked on Kubernetes, or you might be new to it. For someone who is really new to Kubernetes: Kubernetes is an open source container orchestration platform. That's a big phrase, but if any of you have worked with Docker, you already know about containers, and Kubernetes takes the concept of containers to the next level by automating a lot of things. It introduces the concepts of deployments, rollbacks, services, StatefulSets, DaemonSets, and a lot of other things.

So what does Kubernetes bring to the table by orchestrating these containers? The first thing is that it simplifies the process of running your applications: you can run your application simply by packaging it as a Docker image and writing a couple of YAML files. It is also very reliable, in the sense that when you deploy your application to Kubernetes, it takes care of it: it does autoscaling, both horizontal and vertical.
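To make that concrete, here is a minimal sketch of such a YAML file: a Deployment wrapping a containerized app. The image name and labels are hypothetical placeholders, not something from the talk.

```yaml
# Minimal sketch of a Deployment for a containerized app.
# The image "registry.example.com/my-ml-app:v1" is hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-ml-app
spec:
  replicas: 3                  # Kubernetes keeps three pods running
  selector:
    matchLabels:
      app: my-ml-app
  template:
    metadata:
      labels:
        app: my-ml-app
    spec:
      containers:
      - name: my-ml-app
        image: registry.example.com/my-ml-app:v1
        ports:
        - containerPort: 8080  # port the app listens on
```

Applying it with `kubectl apply -f deployment.yaml` is all it takes to hand the app over to Kubernetes.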
Kubernetes makes sure it keeps your application running as much as possible and scales it appropriately. It does container rollbacks and container restarts, and it deals with CrashLoopBackOff situations: when your container is failing for some reason, Kubernetes keeps trying to bring the container back.

So the key features of Kubernetes, although I'm not trying to list all of them here given the space constraints: it supports automated deployments; there are several tools out there like Helm and Kustomize that allow you to package your deployments and do different kinds of automated deployments. It allows service discovery through its Services, and it has a built-in load balancer that comes in the form of a Service, as I spoke about (a small sketch follows at the end of this passage). It has self-healing and fault tolerance, so it makes sure a minimum number of replicas is maintained and brings up the required number of new containers and pods; the Pod is the basic unit of operation in Kubernetes. And finally, it takes care of resource management and scaling.

Before we talk about why machine learning has to be done on Kubernetes, or what the benefits are of doing machine learning or deep learning on Kubernetes (and when I say deep learning, I mean everything from neural networks to self-attention to transformers), let's quickly understand at a very high level what a machine learning lifecycle looks like. The machine learning lifecycle starts with data: you get the data from your customers or from whatever data sources you have, then you clean that data and prepare it for training. Preparing data is a multi-step process, but at a very high level it involves scaling your features to bring them all to the same scale, converting alphanumerics or alphabets into numerics, and so on. That's what the data phase is all about. Then you do the training on the data: you choose the appropriate model to train on your data, and there are several hundred algorithms available out there; platforms like Hugging Face host a ton of models that you can use. Then you perform model validation: you check the performance, the accuracy, the confusion matrix, everything else; you check how well your model is performing. And finally you deploy the model into production; for model deployment, too, there are several tools available. Once you deploy your model into production, you spend time monitoring it, and then you repeat the process all over again: you keep fine-tuning your model with newer data, or you have a cron schedule and keep refining your model according to it.

Now let's talk about the problem: what is the problem if you don't use Kubernetes, or if you're trying to do massive machine learning on your own first-party clusters? The first thing is data volume. As you start saving more and more data, obviously there's a lot of strain on your infrastructure: you have to store this data somewhere, and you have to transform this data somewhere.
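Going back to service discovery for a moment, the sketch I mentioned: a minimal Service fronting the hypothetical Deployment from earlier. The names are again placeholders.

```yaml
# Minimal sketch of a Service exposing the hypothetical
# my-ml-app Deployment; Kubernetes load-balances across its pods.
apiVersion: v1
kind: Service
metadata:
  name: my-ml-app
spec:
  selector:
    app: my-ml-app    # matches the Deployment's pod labels
  ports:
  - port: 80          # port other services use inside the cluster
    targetPort: 8080  # containerPort on the pods
```

Other pods can now reach the app at `http://my-ml-app` within the namespace; that is the service discovery and built-in load balancing in action.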
Now, you could always provision the entire infrastructure at once and keep it sitting empty, rather than sizing it to the volume of data you receive; or you can end up having a very small infrastructure while receiving large volumes of data. Either way it's a problem. So how does Kubernetes solve it? Let's talk about it in the upcoming slides.

Then there's computational intensity. A lot of deep learning models and neural networks today require a GPU. Having said that, would you really buy all the GPUs ahead of time and keep them in your first-party data centers, or anywhere on the cloud in the form of EC2 machines? That's a lot of cost to your company. So how do you handle this computational intensity? There are lows and highs: during peak periods you might require a lot of GPU and a lot of CPU, but during non-peak periods you don't, and you want to save your cost.

There's also deployment complexity. When you deploy a machine learning model, you need an inference endpoint; you need a proper inferencing backend where you deploy the model. Again, how do you handle peak periods? Everything has to be scaled very efficiently. And finally, there's the cost involved in all these kinds of resource management that we have been talking about: we have to efficiently allocate resources to all these workloads, which is very important so that you can maximize performance and reduce cost at the same time. This is a general problem with scaling any machine learning workload. I'm not saying that Kubernetes is going to solve all these problems for you, but Kubernetes allows you to make the right decisions, and it provides several avenues by which you can easily solve them.

Now, to the question of why you should use Kubernetes for machine learning. The first thing is scalability: as we've been saying, you can dynamically scale up and scale down. We'll have a couple of other slides that talk about benefits, but roughly, at a very high level, scalability, resource management, portability, and automation are the four biggest things, in my view, that Kubernetes provides you.

Now let's dive deep into each one of those. First is scalability. As I said, during peak periods you might want to autoscale vertically, autoscale horizontally, or introduce GPUs. With Kubernetes, if you're using an EKS cluster, or something on Google Cloud or Azure, you already have the cluster and the nodes at your disposal, but depending on your cost model, you are charged only when you use them. So during your peak periods you can use the GPUs. Kubernetes makes this very easy: through a YAML file and a couple of annotations, using taints and tolerations, Kubernetes allows you to target exactly the nodes that have the GPUs. It allows you to perform autoscaling and horizontal scaling, and you can set several parameters and attributes based on which you want to perform the horizontal scaling.
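As a sketch of the GPU targeting just described: a pod that selects GPU nodes and tolerates a taint on them. The node label, taint key, and image are hypothetical; real values depend on how your cluster is labeled and tainted, and the `nvidia.com/gpu` resource assumes the NVIDIA device plugin is installed.

```yaml
# Sketch of a pod targeting GPU nodes. The label "accelerator"
# and the taint key "gpu" are hypothetical examples.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training-pod
spec:
  nodeSelector:
    accelerator: nvidia-gpu      # hypothetical node label
  tolerations:
  - key: "gpu"                   # hypothetical taint key
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  containers:
  - name: trainer
    image: registry.example.com/trainer:v1   # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1        # ask the device plugin for one GPU
```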
In fact, there are third-party CNCF platforms and tools like KEDA. KEDA is Kubernetes Event-Driven Autoscaling, and it allows you to autoscale based on different kinds of factors.

Okay. Now, from a resource management perspective, how does Kubernetes help you? If your organization or company is already using Kubernetes, you might have a bunch of microservices running on it already, and then you're an apt company for running machine learning workloads on Kubernetes. Kubernetes has the concept of resource quotas that allow you to restrict the amount of resources, CPU and so on, that can be used per namespace (see the sketch below). If you're a very large organization, you can segregate the groups or sub-organizations inside the organization into namespaces. For example, your finance and sales teams could use different namespaces; they can keep submitting machine learning workloads to their own namespace without encroaching on the other's resource quota or resources. Finance will not go and use the sales resources, and sales will not go and use the finance resources. As I said earlier, you also have node selectors and taints and tolerations that allow you to target the specific nodes where you want to perform machine learning or deep learning, and where you require GPUs. That's how Kubernetes helps you from the resource management perspective.

From a portability perspective: whatever Kubernetes functionality you're going to write, you're going to write in terms of YAML files, and if you're going to define your own custom resource definitions, you'll use YAML along with Go files for the controllers. These are pretty much independent of the cloud or wherever you're going to deploy them, let it be on-premise, cloud, or hybrid. You'll still be working with the same set of YAML files and the same set of custom resource definitions, which are themselves YAML files. So there's huge portability when you work with Kubernetes, even if you shift your cloud provider or want to move to an on-premise Kubernetes cluster.
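The resource quota sketch I referenced: a minimal ResourceQuota for a hypothetical "finance" namespace. The limits are made-up illustrative numbers.

```yaml
# Sketch of a ResourceQuota capping what the hypothetical
# "finance" namespace may consume, so its ML jobs don't
# encroach on other teams' resources.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: finance-quota
  namespace: finance
spec:
  hard:
    requests.cpu: "40"             # total CPU the namespace may request
    requests.memory: 160Gi         # total memory it may request
    limits.cpu: "80"
    limits.memory: 320Gi
    requests.nvidia.com/gpu: "4"   # cap GPU requests as well
```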
Okay, having discussed some of the basics of Kubernetes, let's try to understand some of its core concepts. These are very core, and some of you might already know them. From a machine learning perspective, there are four things we need to know about in Kubernetes. Having said that, this is not the entire list; I'm just putting down the four important ones. There are several other built-in resources that Kubernetes provides, and there are several hundred projects in the Cloud Native Computing Foundation, CNCF, that work around the Kubernetes ecosystem. For example, Argo Rollouts makes rollout deployments and rollbacks very easy, and there are hundreds of CNCF providers out there who have built a lot of resources around machine learning and Kubernetes. But for our session today, we'll discuss the four most important things: pods, deployments, services, and namespaces. The Pod is the smallest deployable unit in Kubernetes. Deployments are the ones that bundle all your pods together: they decide the number of replicas a deployment can have, and they allow horizontal and vertical autoscaling. Services are the ones that expose your deployments to the outside world. And we spoke about namespaces earlier: they allow you to logically group your resources, which improves isolation as well as security, and it also allows proper resource management of your Kubernetes cluster. So these four things, pods, deployments, services, and namespaces, are the basics, at least, that you want to know when you're working with Kubernetes and machine learning. Pods, deployments, and services comprise the three fundamental things that anyone should know when working with Kubernetes, and of course with the machine learning workloads we're going to discuss.

Now there's one more thing that's really important. Kubernetes has the concept of persistent volumes, persistent volume claims, and storage classes. These allow you to store persistent data when you're interacting with Kubernetes clusters. For example, if you have submitted a machine learning job to a Kubernetes cluster and you want to do some kind of checkpointing at some point, you want to store your intermediate model or intermediate parameters; that's where persistent volumes, persistent volume claims, and storage classes shine. And they're so easy to use: like any other resource in Kubernetes, it's nothing but a YAML file. Of course there's a vendor tie-in, but that is handled by annotations, and the same goes for storage classes. These are the concepts that allow you to store your data while a machine learning workload is running: when you want to checkpoint, when you want to store your parameters, when you're actually running your machine learning model. All three help you maintain some kind of persistent state while your machine learning workload is running. In fact, these three are not specific to machine learning workloads; they're relevant to any kind of workload you're trying to run on Kubernetes.
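A minimal sketch of such a claim: a PersistentVolumeClaim a training job could mount for checkpoints. The storage class name is hypothetical; use whatever your cluster or cloud provider offers.

```yaml
# Sketch of a PVC for persisting model checkpoints across
# pod restarts. The storageClassName is a hypothetical example.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-checkpoints
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: standard   # hypothetical storage class
  resources:
    requests:
      storage: 50Gi
```

A training pod would then mount this claim as a volume and write its intermediate parameters to the mounted path.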
Okay, we spoke a lot about Kubernetes: we tried to understand the basics of Kubernetes, the basics of machine learning, and why we should be using Kubernetes for machine learning. Now, you can go and build your own framework on Kubernetes and run your machine learning workloads there. But there are several proven tools out there that allow you to run your machine learning workloads on Kubernetes, and they don't just allow you to run them; they also allow you to visualize, do AutoML, and several other things very easily. One of those very famous tools that is gaining traction these days is Kubeflow. It has been around for a while, but it's gaining a lot of traction with the new LLMs and foundation models. Kubeflow is an open source machine learning toolkit that simplifies the deployment and management of machine learning workloads, and it is specifically targeted at Kubernetes. Kubeflow has several key features, and we'll be talking about each of them in the upcoming slides: it has built-in training operators, it has its own built-in serving infrastructure called KServe, it has a Python library that allows you to write pipelines, and it also allows you to store models, along with several other benefits. As I said, instead of building your own framework, there's this open source initiative that came out of a lot of successful projects. Kubeflow makes it very easy to build, deploy, and manage your machine learning models, and it accelerates the AI development lifecycle. Kubeflow is a game-changing platform for machine learning on Kubernetes.

Okay, now we'll talk about each of the components of Kubeflow. Unfortunately I couldn't put a lot of examples in the slides, but as I said, we'll definitely have workshops in the future.

The first component of Kubeflow is the training operator. There are several built-in training operators: you can run a TensorFlow job, a PyTorch job, an MXNet job. These training operators allow you to specify your training in the form of a YAML file, and it's a YAML file that can run on any cloud or on-premise, as long as there's a Kubernetes cluster to run it on. Since it's a YAML file that Kubernetes understands, you can interact with it using kubectl. The training operators are one of those things that make running your training easy: you just need to follow a specific contract, create an image out of your training code, and create the YAML file referencing the image.

The second important component is KServe, Kubeflow's serving layer. Once you have built your model, trained it, and validated it, you want to deploy it somewhere, and that's where KServe comes into the picture. It has a standard Python API, and it also has YAML files you can use to expose your serving endpoint. There's also something called ModelMesh that provides high-performance, scalable model serving with intelligent routing; it sits beneath KServe, and on top you have KServe, which allows you to serve your models very efficiently.

Kubeflow Pipelines is the third most important component. A Kubeflow pipeline is nothing but a Python framework that allows you to define your machine learning workflow and orchestrate it; it allows you to define your entire machine learning lifecycle in the form of a Python script. Having said that, it also has a YAML representation, so you can do it with YAML too, and it allows you to track performance as well. And when we say track, you can actually also do AutoML using something called Katib in Kubeflow; we'll talk about pipelines in more detail later. This Kubeflow Pipelines component plays a pivotal role: you write your entire end-to-end pipeline in Python, deploy the Python pipeline, and run it on Kubeflow.

Okay, so what does the lifecycle of a machine learning workload look like on Kubeflow? We define a Kubeflow job; in this case I took the example of a TensorFlow job, and you can define it by using a Python script or a YAML file. We submit the job to the Kubernetes cluster, we monitor the progress by using the Kubeflow UI, and we finally deploy the model by using KServe.
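For illustration, a minimal sketch of what such a TensorFlow job (a TFJob handled by the Kubeflow training operator) might look like; the image name and worker count are hypothetical.

```yaml
# Sketch of a Kubeflow TFJob: the training operator creates the
# worker pods from this spec. The image is hypothetical.
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: mnist-training
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 2
      restartPolicy: OnFailure
      template:
        spec:
          containers:
          - name: tensorflow     # the operator expects this container name
            image: registry.example.com/mnist-train:v1
            resources:
              limits:
                nvidia.com/gpu: 1
```

Submitting it is just `kubectl apply -f tfjob.yaml`, and you can watch its progress in the Kubeflow UI or with `kubectl get tfjobs`.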
And of course, once you've deployed the model, you can use it for inferencing through KServe.

These are the data steps we've already discussed; I'm just going through a step-by-step guide of how you do machine learning on Kubernetes. You start with data sources and fetch data from the different sources. You do different kinds of data transformation on it: you clean the data, handle the alphanumeric and alphabetic characters, scale everything to one level, and finally come out with a set of features. That's what feature engineering is: producing the set of features for your machine learning model.

Then you do the actual training. You define the training job, and you specify the resources you want to use for that specific training job; again, Kubeflow has its own training operator that you can use for performing the training. You specify the resources, and then you validate the model by using the Kubeflow Pipelines components. At the end, once you have the model ready, you serve the model by using KServe. To go into some of the internal details: you create an InferenceService resource to define your model serving deployment, and you specify the model, especially where the model is located. Finally, you deploy the model by using KServe, which provides a REST endpoint; you can reach this REST endpoint through your Kubernetes service and a gateway associated with the cloud, so an external source can reach your serving endpoint on the Kubernetes cluster.
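A minimal sketch of such an InferenceService; the model format and the storage bucket path are hypothetical placeholders.

```yaml
# Sketch of a KServe InferenceService. KServe pulls the model
# from storageUri and exposes a REST inference endpoint.
# The bucket path is hypothetical.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: fraud-detector
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn                              # assumed model format
      storageUri: s3://my-models/fraud-detector/   # hypothetical location
```

Once it's ready, clients send prediction requests to the REST endpoint that KServe exposes through the cluster's gateway.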
So what are the best practices for optimizing machine learning on Kubernetes? One thing: you have to choose the right resources, and this is not something you can come up with in one shot; you have to experiment and play around with it to choose the right kind of resources. Then, optimize the usage of GPUs. Having said that, don't make all your workloads run on GPUs: you can optimize GPU usage with proper taints, tolerations, and node selectors, and run only those workloads that really require the GPU, for example deep learning with billions of parameters. And finally, you need to keep monitoring the performance of your model. There are several ways we can monitor performance: you can keep computing the accuracy, the confusion matrix, precision, recall, et cetera, to keep monitoring the performance of your application, and you keep retraining your model on a regular basis.

Okay, so what are the different resource allocation strategies? I wanted to have a separate section for this, especially because we have to understand the right resource allocation strategies. One of the basic things you can do is static allocation: you always hard-code your resources. Then there is a concept called dynamic allocation that allows you to allocate your resources based on the workload. And finally there is autoscaling: you can do horizontal or vertical autoscaling depending on a number of factors (a minimal sketch of a horizontal autoscaler follows at the end of this passage). You can also use some of the resources from CNCF that allow you to autoscale based on external factors, like your queue sizes, your database sizes, or some event that happens outside of your Kubernetes cluster. So basically there are three different strategies: static allocation, dynamic allocation, and autoscaling.

Okay, and we already discussed this: keep emitting your metrics, keep looking at your metrics, and have proper alerts around them. Collect and analyze your logs; make sure you do, so that you can catch any potential problems. Have metrics or key performance indicators that keep monitoring the health of your ML applications, and if something degrades, fire an alert immediately. Even before things start degrading, you can have warning alerts and critical alerts so that your teams can actually have a look and try to resolve the issue before it becomes a real production issue.

What are the different security considerations? I know that until now we have covered every other aspect except security. Again, since you're using Kubernetes, Kubernetes has a very good security mechanism: it has built-in authorization in the form of RBAC and ABAC, and you can use authentication; you can secure access to your Kubernetes cluster by using certificates. And of course you can encrypt the data. All of this is built into Kubernetes. I'm not going into the fundamentals of Kubernetes, but there are several tools like cert-manager, and you can use service meshes like Istio or Linkerd, or different other kinds of service meshes, that allow you to achieve authentication, authorization, and encryption. Because you're working on Kubernetes, all of that is taken care of for you.

So, two real-world use cases that I really want to talk about. I'm not going to name any company here, but financial services and healthcare AI are really gaining a lot from using Kubernetes for machine learning. Financial institutions use Kubernetes to deploy and scale fraud detection models; similarly, healthcare providers use Kubernetes to deploy and scale medical image analysis models. Kubernetes is gaining a lot of traction across these industries as the foundation for scalable and reliable machine learning deployments. These are a couple of real-world examples; in fact, we build a platform and do a lot of these things for the other platforms within our own organization. As for financial services: these days, I think any financial or health organization either has a Kubernetes cluster or is moving toward one to deploy their own microservices, and if they already have a Kubernetes cluster, they can use the same cluster for their machine learning workloads. And it really helps with a lot of things: fraud detection, risk management, customer analytics.
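The autoscaling sketch I referenced above: a HorizontalPodAutoscaler scaling a hypothetical inference Deployment on CPU utilization. The target name and thresholds are made-up examples.

```yaml
# Sketch of an HPA scaling a hypothetical Deployment between
# 2 and 10 replicas based on average CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-ml-app            # hypothetical serving deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add replicas above 70% average CPU
```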
So all of this is where Kubernetes and machine learning workloads together can help in financial services. As we spoke about in healthcare: medical imaging, drug discovery, personalized medicine. All of these are not inherently tied to Kubernetes, but Kubernetes makes all of this very easy and very streamlined, focusing on performance and minimizing costs.

Now, before I wrap up, there are some future trends we're going to talk about. There is serverless ML on Kubernetes: there's Knative, which offers a serverless platform on Kubernetes. There is Functions as a Service, which is gaining a lot of traction; you can deploy individual ML functions directly. And there's event-driven architecture, which is what I was talking about with Kubernetes event-driven autoscaling, KEDA; you can go and look it up on CNCF. All of these are really evolving (KEDA is pretty much stabilized these days) and providing a lot of benefits for anyone who is trying to use Kubernetes for machine learning.

And thank you so much. You can always write to me with any of your questions; I'm available at prashant.one six@gmail.com. Please do mail me if you have any questions, and I'll try to reply back to you as soon as possible. So that's it: we have discussed Kubernetes and how it is becoming the foundation of scalable AI, and everything it does is what we've discussed in this session. I really want to thank everyone for listening to this session, and thank you Conf42 for providing me an opportunity to speak. Hopefully I'll do a great workshop in one of the future Conf42 sessions if possible. But again, thank you so much for listening to me.
...

Prashanth Lakshmi Narayana Chaitanya Josyula

Principal Member of Technical Staff @ Salesforce



