Transcript
Hey there, welcome to my talk at the Conf42 Cloud event.
Today I'm going to be talking about exploring container-native simplicity for AI/ML workloads.
I'm Shivay Lamba, and you can follow me on X (Twitter) at @HowDevelops. I'm also a community member of the open source project KitOps, which is being donated to the Cloud Native Computing Foundation and is currently applying to become a sandbox project within the CNCF.
So without further ado, let's get started.
Now, we truly live in an era of widespread AI adoption, where companies are bringing generative AI into their workloads or shipping products built on it.
One of the key challenges, ever since the era of classical machine learning models and just as much today with generative AI models, is that transforming models from experimental Jupyter notebooks written by AI researchers and data scientists into robust, production-ready, deployable machine learning systems can be extremely challenging.
There are a number of reasons for this: these experimental notebooks do not offer a path to direct deployment through a CI/CD solution or on top of Kubernetes. In fact, a lot of transformation steps are usually required to move away from these experimental notebooks.
And there's a striking statistic here: almost 80 percent of the machine learning models written by researchers and data scientists never make it to production, because of the complex model deployment procedures we've just spoken about.
So what is the biggest challenge we find with machine learning packaging today? The reason we're talking about packaging is that we want to move these models from Jupyter notebooks into production so they can actually be used. The biggest challenge is that there is no standard for packaging and versioning the various artifacts needed to reproduce an AI/ML project.
As we know, an AI/ML project is not just source code with versioned dependencies, as in typical software. There are several different pieces to the puzzle of a machine learning project. It starts with your models, which are typically stored in Jupyter notebooks or might live in some MLOps tool.
Then your datasets might be stored in a data lake or in a database. Your project code might live in a Git repository. And all the metadata associated with your models, such as hyperparameter tuning results, features, or weights, might be scattered across different storage systems.
So what you're seeing is that all of these moving fragments of a machine learning project are stored in different locations. Getting them all into one place and then productionizing them can be a big challenge. This lack of standardization for packaging these artifacts causes a lot of different issues. Now, why can't we use the same pipeline for MLOps that we would use for conventional DevOps-based applications? There are several reasons for that.
As we covered, the nature of a machine learning project is quite different from a software project. That's why we can't just directly reuse all the tools we typically use for a DevOps project and apply them to machine learning. Machine learning also demands more specialized expertise, and the complexity of running these projects is significantly different, because you generally need things such as GPUs, or even a cluster of GPUs if you're running very large models.
These challenges have essentially led organizations to adopt a completely separate MLOps pipeline. However, implementing that comes with its own set of challenges: data management and standardization, the running cost of expensive hardware, and the security and compliance requirements specific to machine learning. And then, of course, there's managing the complex model lifecycle, with distinct steps for pre-training, training, and inference, each bringing its own difficulties. A lot of times the standard SRE or DevOps teams might not be equipped to handle all of these challenges, because machine learning might be new to them.
If we explore this further, we end up with distinct workflows: one for the data scientists, who work in Python notebooks and handle model training, and one for the DevOps team, who handle deployment and set up the infrastructure required for it. All of this leads to increased cost, because more often than not you're duplicating effort: you have a separate DevOps pipeline and a separate MLOps pipeline, with manual handoffs between the different teams. This increases technical debt and can lead to inconsistent deployment processes.
If we look at how our infrastructure-centric approaches have evolved over time: we started off machine-centric, just running simple servers. Then we moved to VMs, because there are inherent benefits to virtualizing your software without having to worry about whether it's supported on a particular local server. Then we moved to a container-centric approach, where we could ship applications with a much smaller footprint without worrying about system dependencies, because these small containers run on a host VM, and we no longer had to ship an entire operating system as we did with virtual machines.
So now, of course, we're looking at what we're calling a model-centric approach, where we can containerize not only the model but all of its different dependencies. How would it look if you were to build a model-centric approach? That's what we're going to be covering in today's session.
And that is where KitOps helps us solve this particular problem. KitOps is a completely open-source, standards-based packaging and versioning system designed specifically for AI/ML projects. KitOps takes advantage of the existing software standards and tools that DevOps and SRE teams already use to build containerized applications.
The idea is that it uses the same OCI-compliant format that lets you package standard software applications as container images and deploy them to any registry, private or public. We can use that same set of tools and that OCI-compliant format to take all the different components of a machine learning system, the models, datasets, and metadata, and package them into a format called a ModelKit. And this ModelKit is completely OCI-compliant. So KitOps means AI teams no longer have to deal with all the different locations where parts of a model are stored: you can keep all of them together in a ModelKit.
Now let's dive deeper into what this ModelKit actually is. At the heart of KitOps is the ModelKit, which, as I mentioned, is an OCI-compliant packaging format that seamlessly takes all the different artifacts in your machine learning lifecycle, the dataset, the code, the configuration, and the model itself, and packages them into one single unit, the ModelKit, which can then be pushed to any OCI-compliant registry. So whether you use Docker Hub or a private cloud-based registry on Azure, AWS, or Google Cloud, you can push this ModelKit to that same set of registries. It uses the OCI-compliant standard, and of course you can version all the different parts of your project.
At the core, you define what we call a Kitfile, which is similar to a Dockerfile, and then your model, code, dataset, and documentation all get packaged together. The Kitfile complements the ModelKit the way a Dockerfile complements a Docker image: it's a YAML-based configuration file that simplifies describing which components your project contains. And, of course, the Kitfile has been designed with ease of use and security in mind, so you can package your software efficiently.
This is how a typical Kitfile looks: a YAML file designed to encapsulate all the necessary information about what you're packaging together. The manifest is broken down into different sections: the manifest version, the package details, then the code and the dataset and where exactly they're stored, because all of them get packaged together into the ModelKit.
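For reference, here's a minimal sketch of what such a Kitfile might look like, written out as a shell heredoc. The field layout follows the KitOps documentation as I understand it, but the project name, paths, and descriptions are hypothetical placeholders, so check the docs for the exact schema.

```bash
# A minimal, illustrative Kitfile; names and paths are placeholders.
cat > Kitfile <<'EOF'
manifestVersion: "1.0"
package:
  name: my-ai-project
  version: 1.0.0
  description: Example packaging of a model, its code, and its dataset
model:
  name: my-model
  path: ./model.safetensors
  description: Trained model weights
code:
  - path: ./src
    description: Training and inference code
datasets:
  - name: training-data
    path: ./data/train.csv
    description: Dataset used for training
docs:
  - path: ./README.md
    description: Project documentation
EOF
```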
And then, finally, we have the kit CLI, which is similar to the Docker CLI: a set of command-line tools that let users not only manage but also create, run, and publish these ModelKits. So you can take your Kitfile, create the ModelKit from it, and then deploy it to any container registry that you want.
Now, the great thing here is that since these ModelKits work with any OCI-compliant registry, you can use the same set of DevOps tools, whether CI/CD tools or deployment tools, that you would typically use with a container image, because a ModelKit behaves just like one. That means you don't have to reinvent the wheel and introduce new tools for machine learning systems. The entire idea is reusing your existing software tooling. So that's the kit CLI.
Now let's look at what a typical KitOps pipeline looks like. kit unpack lets you pull ModelKits stored in a private or public registry, much like a docker pull. When you run kit unpack, it pulls the ModelKit and extracts all the individual components you stored in it as separate local files; we'll see that in action. And kit pull lets you retrieve a ModelKit from a remote registry into your local environment.
And if you want to create one from scratch, you write a new Kitfile for your AI project and use the kit pack command to create the ModelKit from that Kitfile. Then you can use kit push, which is similar to a docker push, to upload your newly created ModelKit from your local repository to your private or cloud-based registry. You also get a lot out of the box, because you can sign your ModelKits, which increases security. So if compliance reasons require your AI models to be more secure, you can combine kit push with signing of the images as well.
So it comes with built-in security: with signing, you know that the ModelKits are verifiable. Now let's look at a demo of the kit CLI and the Kitfile. Here I've opened my terminal, and I have already installed the kit CLI, so let's take a look at how it works. First of all, to verify the installation, you can just run kit version, which shows you the current version of your kit CLI.
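For reference, the command being run here is simply:

```bash
# Print the installed kit CLI version to confirm the installation.
kit version
```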
Now, let's proceed further. I'll use kit login to log in to my registry, and here you provide the details of the registry you're using. In this case I'm going to be using Jozu Hub, which is a registry specialized for storing ModelKits, and I just enter my username and password. Once I've logged in, I can start managing ModelKits; this is very similar to the Docker Hub sign-in you do when using the Docker CLI.
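As a reference sketch, the login step looks like this:

```bash
# Log in to an OCI registry that stores ModelKits; Jozu Hub in this demo.
# kit prompts for the username and password interactively.
kit login jozu.ml
```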
Here you can see an example of a ModelKit hosted on Jozu Hub, which is this YOLOv10. I can very easily go ahead and use the pull tag, or I can run kit unpack with the name of this particular reference. In this case, you can see that this ModelKit contains the model, the docs, and the configuration files.
So I'll use the kit unpack command. What you're seeing is kit unpack jozu.ml/..., where jozu.ml is of course where this ModelKit is hosted, and then I give the name of the owner and the name of the repository. The owner is jozu, the repository name is YOLOv10, and v10x is the tag. I'm just creating a new directory locally, and that's where I'll unpack it.
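Putting that together, the unpack command from the demo looks roughly like this; the reference follows the registry/owner/repository:tag format, and the directory flag places the extracted files in a local folder.

```bash
# Pull the ModelKit and extract its components into ./yolov10.
# Format: kit unpack <registry>/<owner>/<repository>:<tag> -d <dir>
kit unpack jozu.ml/jozu/yolov10:v10x -d ./yolov10
```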
Now, as soon as I unpack, you'll see that it extracts all the different components that were stored in the ModelKit, which is very similar to the layers in a Docker image; these can be thought of as different layers. So it's unpacking all of these files right now.
Once it's done, if I take a look at the list of files, you can see it has the Kitfile, the README, and the model itself in the safetensors format. And if I look at the Kitfile itself, it shows me the manifest version, the package details, and the model details; in this case it's the safetensors file for the YOLOv10 model. And I have my docs, which contain the README file. So this shows all the details in my Kitfile.
Now, what I can do is use this same Kitfile to package a new ModelKit that I can store locally. You can see that kit list, like a Docker image list, shows you all the ModelKits you currently have stored locally, so you'll also see the one I just downloaded.
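For reference:

```bash
# Show the ModelKits stored locally, similar to `docker image ls`.
kit list
```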
Now, if I want to create a new ModelKit, I use the kit pack command. I run kit pack and tell it where exactly I've stored my Kitfile; the dot means it's in the current directory. Then I use the -t flag and provide the name. In this case, because I'll be publishing this on jozu.ml, I give it the registry name, then the user I'll be uploading it under, which is shivaylamba in my case because that's my username on Jozu Hub, and then a new name. So this is the name of the ModelKit along with the tag I give it.
And you'll see that it packs the different layers from the Kitfile: the different layers are the model, the dataset, and so on, and it has just created that ModelKit.
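As a sketch, the pack command from this step looks like the following; the user and repository names are illustrative, so substitute your own.

```bash
# Package the current directory (containing the Kitfile) into a ModelKit,
# tagged as <registry>/<user>/<repository>:<tag>.
kit pack . -t jozu.ml/shivaylamba/yolov10:latest
```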
Now, in order to actually upload it, I also have to create a repository, and we'll keep its name the same as our ModelKit so that it matches when we push to the remote registry. Once I've created that repository on Jozu Hub, I can use the kit push command and again provide the name of the ModelKit stored locally in my system, under shivaylamba, and this will upload it to the YOLOv3 repository we created online.
As you can see, it's starting to push to this repository. Just make sure the name of the local ModelKit matches the repository in your Jozu Hub registry, or for that matter any other registry. You can see again that it pushes layer by layer.
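And the push step, again with illustrative names that should match the repository created on the registry:

```bash
# Upload the locally packed ModelKit to the remote repository.
kit push jozu.ml/shivaylamba/yolov10:latest
```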
One thing to note is that when you use the kit unpack command, you can do a selective download of the different components. As I said, the dataset, the model, and the other parts are all separate. So let's say a data scientist doesn't want to download the original model and just wants the dataset: by passing a filter flag, I can fetch, say, just the dataset and the model instead of downloading everything. This way, if multiple teams are using the same ModelKit, they don't need to download everything every time; if they only need specific parts of the ModelKit, they can fetch just those, as in the sketch below.
That's the great part about sharing a ModelKit across the different members of your team: that filter lets you very easily grab just the specific parts you need. And now, when I go back into Jozu Hub, you'll see I've uploaded it successfully. You can also see some of the most popular repositories, including the ones that are signed. Signing tells you these have been properly security-tested and authenticated, and we can apply those kinds of security measures when uploading ModelKits to the registry. In this case, you can see it has been uploaded successfully.
So that's a quick overview of the Kitfile and what it looks like in practice.
Now, of course, one important aspect is how we can automate the machine learning lifecycle with the help of CI/CD. As we know, CI/CD matters because we don't want to manually test and rerun our pipelines every time we push some code. That's where CI/CD really shines, and the same applies to machine learning.
Imagine that whenever you push a new version of a machine learning model to production, you push the code, generate the new dependencies, for example the model files in an H5 or pickle format, and then manually deploy them to wherever you're running in production. All of these steps can be automated with CI/CD.
You can also add GitOps to your CI/CD pipeline to automate the deployment of these AI models, triggered either manually or automatically based on, say, the changes taking place in your model or its artifacts. This means KitOps not only simplifies packaging the models; by combining it with CI/CD you also automate the deployment of the AI, so it's no longer something you have to do manually each and every time. You can trigger these deployments and truly streamline model lifecycle management by defining exactly how you want new versions of the model, or your entire model pipeline, to roll out.
You can very easily adopt GitOps with GitHub Actions or Jenkins; in fact, you'll find a lot of resources on using KitOps in conjunction with your CI/CD tool.
And here's an example of how you can use it with a Dagger CI/CD pipeline. You install KitOps and Dagger, then create your Kitfile and run your Dagger pipeline. Once you've created the ModelKit from the Kitfile, you initialize your Dagger module, Daggerize your Kitfile, define your pipeline functions, and then integrate your Dagger module with a CI/CD system, for example a GitHub Action. Whenever you push new changes to your Kitfile, it instantiates your Dagger module, which kicks off your CI/CD workflow pipeline running in GitHub Actions, and then you can automate the deployment of the latest changes to your Kitfile or your ModelKit to the registry, which in this case could be Jozu Hub.
So this is an example, like a typical example of how it would look like.
In case you had Jenkins, you, it would look like something like
similar where you define your Jenkins pipeline, you define your model kit
behavior when you want it actually to in, kick off the Jenkins pipeline.
So you can define all of that in your configuration.
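To make that concrete, here's a rough sketch of the shell steps such a CI job (GitHub Actions, Jenkins, or a Dagger function) might run on each push. The registry, organization, and secret names are placeholders, and the login flags are assumed to mirror docker login, so verify them with `kit login --help`.

```bash
# Authenticate non-interactively using credentials injected by the CI system.
echo "$REGISTRY_TOKEN" | kit login jozu.ml --username "$REGISTRY_USER" --password-stdin

# Re-pack the ModelKit from the updated Kitfile, tagged with the commit SHA,
# and push the new version to the registry.
kit pack . -t jozu.ml/my-org/my-model:"$CI_COMMIT_SHA"
kit push jozu.ml/my-org/my-model:"$CI_COMMIT_SHA"
```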
Of course, MLOps is also a really important aspect, and you can really enable it with the help of ModelKits. Whether you're using your machine learning models for training and experimentation, where you need the ability to package and validate your model deployments, or for inference, you can leverage a ModelKit you've deployed on, say, Jozu Hub. You can run it as a container image, or you can deploy it on Kubernetes as an init container or via the kit CLI.
You can do experimentation easily, because once you've defined a ModelKit, you can have different versions of it, store them as different tags, and upload them. You can then very easily see the differences between the versions, or tags, based on your experiments and the changes you made to the model. So ModelKits, with the different versions you maintain, make training and experimentation very easy to track.
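A small sketch of that tag-per-experiment workflow, with illustrative names:

```bash
# Pack the baseline experiment as its own tag.
kit pack . -t jozu.ml/my-org/my-model:exp-baseline

# ...adjust hyperparameters, retrain, update the Kitfile artifacts...

# Pack the follow-up experiment under a different tag.
kit pack . -t jozu.ml/my-org/my-model:exp-lr-0001

# Compare what exists locally, then push the version you want to keep.
kit list
kit push jozu.ml/my-org/my-model:exp-lr-0001
```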
Then for inference, you can deploy a ModelKit as an inference service on top of modern cloud platforms like Kubernetes or Docker. As I mentioned, you can deploy these ModelKits by creating a container or a Kubernetes deployment: either an init container, or the containerized kit CLI, a specialized container that tailors how the ModelKit runs. You can run the kit CLI commands directly, or, if you don't want to do that, you can just use an init container, which is the default way to run it inside a Kubernetes-based environment. Those options are readily available to all of you.
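Here's a hedged sketch of that init-container pattern on Kubernetes: an init container unpacks the ModelKit into a shared volume before the serving container starts. The kit container image and the inference image below are placeholders, not official names, so consult the KitOps deployment docs for the real ones.

```bash
# Deploy a pod whose init container unpacks a ModelKit into a shared volume.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: model-inference
spec:
  initContainers:
    - name: fetch-modelkit
      image: ghcr.io/example/kit:latest        # placeholder kit CLI image
      command: ["kit", "unpack", "jozu.ml/my-org/my-model:latest", "-d", "/modelkit"]
      volumeMounts:
        - name: modelkit
          mountPath: /modelkit
  containers:
    - name: inference-server
      image: example/inference-server:latest   # placeholder serving image
      volumeMounts:
        - name: modelkit
          mountPath: /modelkit
          readOnly: true
  volumes:
    - name: modelkit
      emptyDir: {}
EOF
```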
Here are some resources you can check out: KitOps at kitops.ml, the documentation, and the dev.to blogs that show how you can integrate KitOps with CI/CD tools and different kinds of MLOps tools.
And another thing, as I mentioned earlier, is that we're looking at a wave where we went from machines, to VMs, to containers, and now we're looking at models. So we're proactively having discussions on the model spec in the CNCF Slack, and there's a Google Doc where you can actively take part. The primary goal is to make AI models first-class citizens: we want to proactively evolve the OCI spec so that it is better suited for machine learning models.
With that, thank you so much for attending. You can scan the QR code for these slides, connect with me, and join the KitOps Discord, where you can have discussions around KitOps, which, again, is an open source project. And of course you can connect with me at @HowDevelops. Thank you so much for watching this video, and I hope you have a wonderful time watching the rest of the amazing talks at the conference.