Conf42 Cloud Native 2025 - Online

- premiere 5PM GMT

Bringing Container-Native Simplicity to AI/ML


Abstract

The deployment of AI projects often faces significant hurdles due to the fragmented nature of their components: datasets, models, and model weights are frequently stored in separate repositories. We will explore the critical challenges this fragmentation poses and how to overcome them with the help of ModelKits.


Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hey there, welcome to my talk at the Conf42 Cloud Native event. Today I'm going to be talking about exploring container-native simplicity for AI/ML workloads. I'm Shivay Lamba, and you can follow me on X (Twitter) at @HowDevelop. I'm also a community member of the open source project KitOps, which is currently being donated to the Cloud Native Computing Foundation and is applying to become a CNCF sandbox project. So without further ado, let's get started.

We truly live in an era of widespread AI adoption, where companies are bringing generative AI into their workloads or shipping products built on it. One of the key challenges, ever since the era of classical machine learning models and just as much with today's generative AI models, is that transforming models from experimental Jupyter notebooks written by AI researchers and data scientists into robust, production-ready, deployable systems can be extremely difficult. There are a number of reasons for this: experimental notebooks offer no direct path to deployment through a CI/CD solution or on top of Kubernetes, and many transformation steps are required to move away from them. In fact, there is a striking statistic that almost 80 percent of the machine learning models written by researchers and data scientists never make it to production because of these complex deployment procedures.

So what is the biggest challenge today with machine learning packaging? The reason we talk about packaging is that we want to move these models out of Jupyter notebooks and into production so they can actually be used. The biggest challenge is that there is no standard for packaging and versioning the various artifacts needed to reproduce an AI/ML project. As we know, an AI/ML project is not just source code with versioned dependencies, as in typical software; there are several different pieces to the puzzle. It starts with your models, which typically live in Jupyter notebooks or in some MLOps tool. Your datasets might be stored in a data lake or a database. Your project code might live in a Git repository. And all the metadata associated with your models, such as hyperparameters, features, and weights, might be scattered across different storage systems. So all of these moving fragments of a machine learning project are stored in different locations, and getting them into one place so they can be productionized is a big challenge. This lack of standardization for packaging these artifacts causes a lot of different issues.

Now, why can't we use the same pipeline for MLOps that we use for conventional DevOps-based applications? There are several reasons, starting with the fact that the nature of a machine learning project is quite different from that of a software project.
That's why we cannot simply take all of the tools we typically use for a DevOps project and apply them to machine learning. Machine learning also demands more specialized expertise, and the complexity of running these projects is significantly different, because you generally need hardware such as GPUs, or even a cluster of GPUs if you're running very large models. These challenges have led organizations to adopt a completely separate MLOps pipeline. However, implementing one comes with its own set of challenges: data management and standardization, the running cost of expensive hardware, the security and compliance requirements that come with machine learning, and managing the complex model lifecycle, with its distinct steps of pre-training, training, and inference. And a lot of the time, standard SRE or DevOps teams are not equipped to handle all of these challenges, because machine learning may be new to them.

If we explore this further, we end up with distinct workflows: one for the data scientists, who work with Python notebooks and model training, and one for the DevOps team, who handle deployment and set up the infrastructure it requires. All of this leads to increased cost, because you are duplicating effort by running a separate DevOps pipeline and a separate MLOps pipeline, with manual handoffs between the teams. This increases technical debt and can lead to inconsistent deployment processes.

If we look at how approaches have evolved over time, we of course started machine-centric, just running simple servers. Then we moved to VMs, because of their inherent benefit: you can virtualize your software and not worry about whether it is supported on a given physical server. Then we moved to a container-centric approach, where we could ship applications with a much smaller footprint without worrying about system dependencies, because these small containers run on a host VM and we no longer have to ship an entire operating system the way we did with virtual machines. And now we are looking at what we call a model-centric approach, where we containerize not only the model but all of its dependencies. So what would a model-centric approach look like? That is what we are covering in today's session, and it is exactly the problem KitOps helps us solve. KitOps is a completely open source, standards-based packaging and versioning system designed specifically for AI/ML projects. KitOps takes advantage of the existing software standards and tools that DevOps and SRE teams already use to build containerized applications.
The idea is that it uses the same OCI-compliant format that lets you package standard software applications as container images and deploy them to any registry, private or public. We can use that same set of tools and that OCI-compliant format to take all the different components of a machine learning system, the models, datasets, code, and metadata, and package them together into a format called a ModelKit. This ModelKit is completely OCI compliant. So KitOps means AI teams no longer have to deal with all the different locations where the parts of a model are stored; you can store all of them together in a single ModelKit.

Let's dive deeper into what a ModelKit actually is. At the heart of KitOps is the ModelKit, which, as I mentioned, is an OCI-compliant packaging format that seamlessly takes all of the artifacts in your machine learning lifecycle, including the dataset, the code, the configuration, and the model itself, and packages them into a single unit that can be pushed to any OCI-compliant registry. Whether you use Docker Hub or a private cloud-based registry from Azure, AWS, or Google, you can push a ModelKit to those same registries, and you can version all the different parts of your project.

At the core, you define what we call a Kitfile, which is similar to a Dockerfile, and then your model, code, dataset, and documentation are all packaged together. The Kitfile complements the ModelKit the way a Dockerfile complements a Docker image: it is a YAML-based configuration file that simply describes which components your project contains, and it has been designed with ease of use and security in mind so you can efficiently package your software. A typical Kitfile is a YAML file designed to encapsulate all of the necessary information, and its manifest is broken down into sections: the manifest version, the package details, and then the code, the dataset, and the model, with the paths where each is stored, because all of them are packaged together into the ModelKit. A minimal sketch of such a Kitfile is shown below.
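For reference, here is a minimal Kitfile sketch with the sections just described. The field layout follows the publicly documented Kitfile schema, but the file paths, names, and descriptions are illustrative assumptions, not taken from the talk's demo.

```yaml
# Minimal Kitfile sketch; paths and names are hypothetical.
manifestVersion: "1.0"

package:
  name: yolo-demo
  version: 1.0.0
  description: Object detection model packaged as a ModelKit

model:
  name: yolo
  path: ./model.safetensors
  description: Trained model weights

code:
  - path: ./src
    description: Training and inference scripts

datasets:
  - name: training-data
    path: ./data/train.csv
    description: Dataset used to train the model

docs:
  - path: ./README.md
    description: Project documentation
```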
Finally, there is the kit CLI, which is similar to the Docker CLI: a set of command-line tools that let users create, manage, run, and publish ModelKits. You take your Kitfile, create the ModelKit from it, and then push it to any container registry you want. The great thing here is that because these ModelKits are compliant with any OCI-compliant registry, you can use the same set of DevOps tools, whether CI/CD tools or deployment tools, that you would typically use with a container image, because a ModelKit behaves like one. That means you don't have to reinvent the wheel and introduce new tools for machine-learning-based systems, which is the entire idea of reusing your software. So that's the kit CLI.

Now let's look at what a typical KitOps pipeline looks like. kit unpack lets you pull a ModelKit stored in a private or public registry, much like a docker pull, and extract all of the individual components stored in the ModelKit as separate local files; we'll see that in action shortly. kit pull retrieves a ModelKit from a remote registry into your local environment without extracting it. If you want to create one from scratch, you write a new Kitfile for your AI project and use the kit pack command to build the ModelKit from it. Then kit push, similar to a docker push, uploads your newly created ModelKit from your local repository to your private or cloud-based registry. You also get a lot out of the box: you can sign your images, which increases security, so if compliance requires your AI models to be more secure, you can combine kit push with image signing. With signing, you know that your ModelKits are verifiable.

So let's look at the demo of the Kitfile and the kit CLI. I have opened my terminal and already installed the kit CLI. First, to verify the installation, run kit version, which shows the current version of your kit CLI. Next, I use kit login to authenticate, providing the details of the registry I'm using. In this case that is Jozu Hub, a registry specialized for storing ModelKits, and I supply my username and password, very much like the Docker Hub sign-in you do when using the Docker CLI. Here you can see an example ModelKit hosted on Jozu Hub, YOLOv10, which contains a model, docs, and configuration files. I can now easily pull it, or run kit unpack with the name of this particular registry. Here, kit unpack jozu.ml is where the ModelKit is hosted, followed by the owner and repository name: the owner is jozu, the repository is yolov10, and v10x is the tag. The commands are sketched below.
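Roughly, the demo steps so far look like this. The registry, owner, repository, and tag come from the talk itself; the -d flag and the local directory layout are assumptions about the exact invocation.

```sh
# Check the installed CLI, then authenticate to Jozu Hub.
kit version
kit login jozu.ml          # prompts for your registry username and password

# Pull the YOLOv10 ModelKit from Jozu Hub and unpack its components
# (model, docs, Kitfile) into a local directory; -d sets the target.
kit unpack jozu.ml/jozu/yolov10:v10x -d ./yolov10

ls ./yolov10               # Kitfile, README.md, model weights (safetensors)
```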
So I'm just creating this new directory locally, and this is where I'll unpack it. As soon as I unpack, you'll see it extract all of the different components stored in the ModelKit, very much like the layers of a Docker image; each component can be thought of as a separate layer. Once it finishes, if I look at the list of files, you can see the Kitfile, the README, and the model itself in safetensors format. Opening the Kitfile shows the manifest version, the package details, and the model details; in this case it's the safetensors YOLOv10 model. And I have my docs, which is the README file. So this shows all of the details in my Kitfile.

Now I can use this same Kitfile to package a new ModelKit that I store locally. kit list, like listing Docker images, shows all the ModelKits you currently have stored locally, including the one I just downloaded. To create a new ModelKit, I use the kit pack command, telling it where my Kitfile is stored (a dot means the current directory), followed by the -t flag and a name. Because I'll be publishing this on jozu.ml, I give it the registry name, then the user I'll upload it under, which is shivaylamba in my case because that's my username on Jozu Hub, and then a new repository name and tag for the ModelKit. You'll see it pack the different layers described in the Kitfile, such as the model and the dataset, and the ModelKit is created. To actually upload it, I first create a repository with the same name as my ModelKit so that they match when I push to the remote registry. Once I've created that repository on Jozu Hub, I run kit push, again providing the ModelKit stored locally under shivaylamba, and it uploads to the YOLOv3 repository we created online. Just make sure the name of the local ModelKit matches the repository in your Jozu Hub registry, or any other registry for that matter. Again, it pushes layer by layer.

One thing to note is that with the kit unpack command you can selectively download individual components, because the dataset, the model, and the rest are stored separately. Say a data scientist doesn't want to download the original model and only wants the dataset: they can pass a filter flag and fetch, for instance, just the dataset and the model instead of downloading everything, as sketched below.
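The pack, push, and selective-unpack steps from the demo look roughly like this. The user and repository names mirror what was said in the talk, and the exact --filter syntax is a best-effort reconstruction that may differ between kit versions.

```sh
# List ModelKits stored locally (similar to `docker images`).
kit list

# Pack the current directory (which contains the Kitfile) into a new
# ModelKit; -t tags it as registry/user/repository:tag.
kit pack . -t jozu.ml/shivaylamba/yolov3:latest

# Push the ModelKit to the matching repository on Jozu Hub.
kit push jozu.ml/shivaylamba/yolov3:latest

# Selectively unpack only some components so teammates avoid downloading
# everything; the filter flag syntax here is an assumption.
kit unpack jozu.ml/shivaylamba/yolov3:latest --filter=model,datasets
```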
This way, when multiple teams use the same ModelKit, nobody has to download everything every time; if they only need specific parts of the ModelKit, they can fetch just those. That's the great part about sharing a ModelKit across the members of your team: the filter lets you very easily manage specific parts. And now, when I go back to Jozu Hub, you'll see it has been uploaded successfully. You can also see some of the most popular repositories, including the ones that are signed, which tells you they are properly security-tested and authenticated; those are the kinds of security measures you can apply when uploading your ModelKits to the registry. So that's a quick overview of the Kitfile and what it looks like.

Now, one important aspect is how we can automate the machine learning lifecycle with the help of CI/CD. As we know, CI/CD saves us from manually testing and rerunning our pipelines every time we push some code, and the same applies to machine learning. Imagine pushing a new version of a machine learning model to production: you push the code, you generate the new dependencies, for example model files in H5 or pickle format, and then you manually deploy to wherever it runs in production. All of these steps can be automated with CI/CD. You can also adopt GitOps as part of your CI/CD pipeline to automate the deployment of these AI models, triggered either manually or automatically based on changes to your model or its artifacts. This means KitOps not only simplifies packaging the models; by combining it with CI/CD, you also automate the deployment of the AI so it is no longer something you have to do manually each time. You can trigger these deployments and truly streamline model lifecycle management by defining exactly how you want new versions of the model, or your entire model pipeline, to flow. You can very easily adopt this with GitHub Actions, Jenkins, and more; in fact, you'll find a lot of resources for using KitOps in conjunction with your CI/CD tool.

Here's an example of using it with Dagger CI/CD. You install KitOps and Dagger. You create your Kitfile and run your Dagger pipeline: once you have created the ModelKit from the Kitfile, you initialize your Dagger module, Daggerize your Kitfile, define your pipeline functions, and then integrate your Dagger module with a CI/CD system, for example a GitHub Action. Whenever you push new changes to your Kitfile, it instantiates your Dagger module, which kicks off your CI/CD workflow pipeline running in GitHub Actions. A sketch of such a workflow is shown below.
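Here is a minimal GitHub Actions sketch of the pack-and-push automation just described, assuming a repository that contains a Kitfile. The workflow name, trigger paths, secret names, and the jozu.ml repository are hypothetical, the install step is left as a placeholder, and the kit login flags mirror docker login and should be checked against your kit version.

```yaml
# Hypothetical workflow: secrets, repository names, and trigger paths
# are placeholders, not taken from the talk.
name: publish-modelkit
on:
  push:
    paths:
      - "Kitfile"
      - "models/**"
      - "data/**"
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install the kit CLI
        # Exact install steps are omitted; in practice you would download
        # a kit release binary and put it on the PATH.
        run: echo "install kit and add it to PATH"
      - name: Log in to Jozu Hub
        run: kit login jozu.ml -u "$JOZU_USER" -p "$JOZU_TOKEN"
        env:
          JOZU_USER: ${{ secrets.JOZU_USER }}
          JOZU_TOKEN: ${{ secrets.JOZU_TOKEN }}
      - name: Pack and push the ModelKit
        run: |
          kit pack . -t jozu.ml/my-org/my-model:${{ github.sha }}
          kit push jozu.ml/my-org/my-model:${{ github.sha }}
```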
You can then automate the deployment of the latest changes to your Kitfile or your ModelKit to the registry, which in this case could be Jozu Hub. That is a typical example of how it would look. If you had Jenkins, it would look similar: you define your Jenkins pipeline and the ModelKit behavior that should kick it off, all in your configuration.

MLOps is, of course, also a really important aspect, and you can enable it with the help of ModelKits. Whether you're using your machine learning models for training and experimentation, where you need the ability to package and validate your model deployments, or for inference, you can leverage a ModelKit you have deployed on, say, Jozu Hub: run it as a container image, or deploy it on Kubernetes with an init container or the kit CLI. For experimentation, suppose you have defined a ModelKit; you can store different versions as different tags and upload them, and then very easily see the differences between the versions based on your experiments and the changes you made to the model. Having these different versions makes training and experimentation very easy. For inference, you can deploy a ModelKit as an inference service on top of modern cloud platforms like Kubernetes or Docker. As I mentioned, you can deploy these ModelKits either by creating a container or a Kubernetes deployment: it can be a specialized containerized kit CLI tailored to running the ModelKit, where you run the kit CLI commands yourself, or, if you don't want to do that, an init container, which is the default way to run it inside a Kubernetes-based environment. A minimal sketch of the init-container pattern is shown just before we wrap up. Those options are readily available to all of you.

Here are some resources you can check out: KitOps at kitops.ml, the documentation, and the dev.to blogs that show how to integrate with CI/CD tools and different kinds of MLOps tools. And as I mentioned earlier, we have moved in waves from machines, to VMs, to containers, and now we are looking at models. We are proactively having discussions in the model spec discussion on the CNCF Slack, and there is a Google Doc that tracks that work. The primary target is to make AI models first-class citizens: we want to proactively propose changes to the OCI spec so it is better suited to machine-learning-based models.

With that, thank you so much for attending. You can scan the QR code for these slides, connect with me, and join the KitOps Discord, where, as an open source project, we have discussions around KitOps; and of course you can connect with me at @HowDevelop.
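As mentioned above, the common Kubernetes pattern is an init container that unpacks the ModelKit into a shared volume before the serving container starts. Here is a minimal sketch of that pattern; the container image names, the ModelKit reference reuse, and the volume layout are assumptions for illustration, not the talk's exact manifests.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: yolo-inference
spec:
  replicas: 1
  selector:
    matchLabels:
      app: yolo-inference
  template:
    metadata:
      labels:
        app: yolo-inference
    spec:
      initContainers:
        # Unpack the ModelKit into a shared volume before the serving
        # container starts; the image reference is hypothetical and
        # stands in for any image that provides the kit CLI.
        - name: fetch-modelkit
          image: example.com/kit-cli:latest
          command: ["kit", "unpack", "jozu.ml/jozu/yolov10:v10x", "-d", "/models"]
          volumeMounts:
            - name: model-store
              mountPath: /models
      containers:
        # The inference server reads the unpacked model from the shared
        # volume; this image is a placeholder for your own server.
        - name: inference
          image: example.com/yolo-server:latest
          volumeMounts:
            - name: model-store
              mountPath: /models
              readOnly: true
      volumes:
        - name: model-store
          emptyDir: {}
```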
So thank you so much for watching this video and I hope you have a wonderful time watching the rest of the amazing talks at the conference.
...

Shivay Lamba

Senior Developer Experience Engineer @ Couchbase



