Conf42 Python 2022 - Online

Using Reproducible Experiments To Create Better Machine Learning Models

Abstract

It’s easy to lose track of which changes gave you the best result when you start exploring multiple model architectures. Tracking the changes in your hyperparameter values, along with code and data changes, will help you build a more efficient model by giving you an exact reproduction of the conditions that made the model better.

In this talk, you will learn how you can use the open-source tool, DVC, to increase reproducibility for two methods of tuning hyperparameters: grid search and random search. We’ll go through a live demo of setting up and running grid search and random search experiments. By the end of the talk, you’ll know how to add reproducibility to your existing projects.

Summary

  • Using reproducible experiments to create better machine learning models. To follow along with this talk, you'll need Python 3. If you have any questions, or if you want to get in touch with the whole DVC team, feel free to reach out on Twitter.
  • There are a few common issues when it comes to machine learning projects: finding the best combination of hyperparameter values, algorithms, datasets, and environment configurations. How do you manually keep track of that many different experiments? Tracking them properly makes deploying to production a lot more consistent.
  • DVC is a tool that helps us manage all of this experiment tracking. It works on top of Git, so you're able to bundle your code, your data, and any hyperparameters together for each experiment you run. With DVC, there are no API calls to slow down your training.
  • DVC shows how your experiments should guide your model training. With something like DVC, you're able to do this kind of whimsical experimentation without worrying about taking notes every two seconds, and you can exactly reproduce the model you need to get out to production.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi everyone. My name is Milecia, and I'm a developer advocate here at Iterative AI. Today I'm going to talk to you about using reproducible experiments to create better machine learning models. Feel free to reach out to me personally on Twitter at @FlippedCoding, and if you have any questions or want to get in touch with the whole DVC team, you can reach out to us on Twitter or at dvc.org. To get started, if you want to follow along or at some point go back and reference the project I'm going to be using as an example throughout this talk, you'll need a few things installed. You'll need Python 3. You don't need VS Code, but VS Code does make it a lot easier. And you'll need to fork this repo here, which will give you the exact project you see me show in this presentation.

So let's jump into it. There are a few common issues when it comes to machine learning projects. First, we're tuning to find the best combination of hyperparameter values, algorithms, datasets, and environment configurations. There's a lot that goes into each of the models we produce. And every time we get new requirements, new data, or access to new resources, we come back to this same fundamental task of finding the best combination of all of the different things that go into our model. So this is fine, right? We'll just keep trying out different things. We'll do our hyperparameter tuning, we'll read through academic papers and find the most cutting-edge algorithms to try out. But we have to keep track of all of these changes, because eventually you will find a model that's really, really good. It'll give you some kind of incredible accuracy, or you'll notice that its performance is a lot faster. And you need to keep track of all of the changes you made throughout all of these different experiments, so that when you get this incredible model, you can reproduce it.

So as we go through all of these experiments, trying to find the best combination of hyperparameters and datasets, we have to keep track of each experiment we run. The problem is that over time, it gets really hard to follow those changes. We might have hundreds of hyperparameter values to test out. How do you manually keep track of that many different experiments? And in between changing hyperparameters, you might look at your code and think, maybe if I just change this one line here, that might do something different. Then you get more data from production or something. So you layer onto these experiments really fast, sometimes without even noticing that it's happening. That's why, over time, it's hard to follow the changes that led you to your current best model, the one that actually has the best combination of all of the factors that go into it. So we want some way to keep track of all of these experiments and make sure we know what code, data, and configs were associated with each model. Then, when we get ready to deploy to production, we don't have that weird drop in accuracy or some strange environment difference we just can't account for. When we're able to follow these changes over time, deploying to production becomes a lot more consistent and a lot more reliable. So let's look at how we actually fix these issues.
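If you want to follow along on your own machine, the setup mentioned at the start of the talk boils down to roughly the following. This is only a sketch: the repo URL is a placeholder for your own fork, and the requirements file name is an assumption about how the example project is laid out.

    # Rough setup sketch -- replace the placeholder with your own fork of the example repo
    git clone https://github.com/<your-username>/<example-repo>.git
    cd <example-repo>
    python3 -m venv .venv && source .venv/bin/activate   # optional virtual environment
    pip install -r requirements.txt                      # assumes the project ships a requirements.txt that includes DVC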
The first way is just by thinking of each experiment as its own little bundle. An experiment consists of your dataset, any hyperparameters you have, and maybe a model you're starting from or just an algorithm you want to test out. Every experiment you run has these same things in common. So as you're adjusting your parameters and updating your dataset, you want to be able to track each of those experiments, kind of like you see on the screen here.

That's why we're going to go over a little background on hyperparameter tuning before we jump too deep into how we fix the problem. With hyperparameter tuning, we know that hyperparameters are the values that define the model. If you're working with a neural net, that means values like the number of layers in your neural net; if you have a random forest classifier, that would be something like the max depth for that classifier. These aren't the things your model predicts for you. These are the values that actually build the model that does the prediction. And there are a couple of common ways to approach hyperparameter tuning: grid search and random search. With grid search, you have sets of values for each of your hyperparameters, and you go through all of them. So if there is a best combination of hyperparameter values, grid search is definitely going to find it for you, because it's testing everything. Random search is another method we use for hyperparameter tuning. It's similar to grid search in that you give it sets of values for each hyperparameter, but the difference is that it jumps around random combinations of those values instead of going through each of them systematically. A lot of times, if you run a random search for about the same amount of time as a grid search, you'll end up getting better combinations of hyperparameter values, and that's just because random search samples a wider variety of those values.

So we know our problem is keeping track of all of these experiments. We know we want to solve it by making these little bundles for each experiment. And we know that with hyperparameter tuning, we have a lot of values we're going to experiment with. So let's take a look at DVC, which is a tool that helps us manage all of this experiment tracking. DVC is an open source tool; you can go check out the GitHub repo. It works on top of Git, so think of DVC as Git for your data. You're able to check in your code with Git and check in your data with DVC, and because it works on top of Git, you're able to bundle your code, your data, and any hyperparameters or other environment configs together for each experiment you run. And the best part is, it's not opinionated at all. To use DVC, you don't actually need to install any particular libraries in your training code. You just need to initialize DVC in your Git repo and use the commands. That's really it. There are no API calls to slow down your training. I know this comes up a little bit with MLflow, because it makes API calls to its service, but with DVC there are no API calls; everything is right there on your local machine. If you decide to set up some kind of remote environment, you can use DVC there too. It works with AWS, which is the one I think most people use for their file storage, but you can use GCP and Azure as well. There's a lot to DVC, but the thing I like most about it is experiments.
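As a quick reference, "initializing DVC in your Git repo," mentioned above, looks roughly like this on the command line. The data path is hypothetical; point dvc add at whatever files your project actually uses.

    pip install dvc                # DVC is a regular CLI tool; no changes to your training code are needed
    dvc init                       # run inside an existing Git repo; creates .dvc/ metadata that Git tracks
    dvc add data/train.csv         # hypothetical data file; DVC stores the data, Git tracks a small .dvc pointer file
    git add data/train.csv.dvc data/.gitignore .dvc .dvcignore
    git commit -m "Initialize DVC and start tracking the dataset"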
So every time you run this dvc exp run command, it takes a snapshot of your code, your data, and your configurations, and it stores that as metadata in DVC. All of that is attached to whatever model you produce from that experiment run. Let's say we've decided to update a hyperparameter and we run an experiment with this command; that change will be bundled together with our data and everything else we already have in place, and tied to the model we're going to get from that experiment. So basically, what we're looking at is something like this. A single experiment has the current dataset we just used to train our model. It has the hyperparameters we used to train the model. And it has a model, in case you're working with one from production and need to do some kind of comparison. It's all lumped together in this one experiment, so you don't have to keep some ridiculous spreadsheet stashed off to the side with a link to your GitHub commit for this one hyperparameter value you changed, another link to a zip file on Google Drive that you can't change because it was just for this particular experiment, and then another link to some other Git repo that has all of your configurations, because of course that's in a separate repo. You don't have to do that anymore. It's all right there in DVC, and you can look at your experiments as you run them.

So when you run dvc exp run, it's going to go through your training script, look at whatever dependencies you gave it, and run that experiment. Once it's finished, you can take a look at the results and decide which way you want to go from there. In this example, we have some experiment we've run. It has an average precision, this ROC AUC value, and a couple of hyperparameters. The average precision looks really good, but we want to do some more hyperparameter tuning because we think we can get something better. So what we'll do, which is pretty common, is set up a queue of experiments. We have a bunch of different hyperparameter values that you can see over here under our train.n_est and train.min_split columns. These are all the different hyperparameter values we want to test out for this particular project. We've queued up these experiments in DVC, and you'll see they have their own IDs associated with them and they're in the queued state. We don't have any results yet because they haven't run. One really big advantage of queuing experiments like this is that not only can you see the values before you run the experiments, you're also able to push these experiments off to some cloud environment if you want to run them on a different server or use a GPU or some other resources. So we have those experiments queued, and now we're going to run them all. We'll use this exp run command, and then we'll use the dvc exp show command to take a look at the results from all of those queued experiments. Now you can see all of the different experiments that were run with all of the different hyperparameter values, and you can see all of the results. Take a second and imagine if you had to document all of this manually. You'd have to go back to that spreadsheet or some other document and manually write down that with this n_est value and that min_split value, these were my outputs. Then you have to attach the data somewhere. You have to attach the code somewhere.
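To give a feel for the commands being described here, a sketch of that queue-then-run workflow might look like the following (DVC 2.x syntax; the parameter names train.n_est and train.min_split and the values are just illustrative, taken from the columns shown in the demo).

    dvc exp run                                                     # run one experiment with the current params.yaml values
    dvc exp run --queue -S train.n_est=100 -S train.min_split=8     # queue an experiment with new hyperparameter values
    dvc exp run --queue -S train.n_est=250 -S train.min_split=32    # queue another combination
    dvc exp run --run-all                                           # run everything sitting in the queue
    dvc exp show                                                    # table of params and metrics for every experiment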
Now you don't have to worry about how everything is being tracked. All you have to do is take a look at this table and decide which experiments you want to keep working with and which experiments you want to share with other people. You already have the results right here to share and look at any time. No more managing all of those things separately. So let's say you want to make a plot to compare a couple of experiments, because you see some with interesting results, or maybe you want a second opinion from somebody else on the team. We have this dvc plots command. You give it the experiment IDs you want to compare, and it generates this plot for you based on the parameters you define. All of this data comes from some kind of JSON or TSV file that's typically generated within your training script, so DVC isn't adding anything new; it's just using the information you provide it with. But you're able to quickly generate these plots. And again, think about how much effort it would take to create a plot like this across that many experiments by hand. I'm pretty sure it would take a little more than one command. It's just easier to use the tools that are already built to handle this stuff for us.

But we're not done. We have more hyperparameter values to try out, of course. So we're going to queue up a few more experiments. This time, let's say we're doing a random search, so the values are jumping around; these aren't combinations we've tried before. Let's just see what we get. We'll go ahead and run all of the experiments we had queued up, and we'll take a look at our table again. Now you see these new values. It still looks like one of our earlier experiments gave us better results, but we might not need to use numbers quite as big as we did in the first experiment. So just being able to quickly look at these metrics shows you which direction to take your hyperparameter tuning. Or maybe it tells you it's time to try a different algorithm, or a different dataset, or that you need to slice up your dataset differently, or that you need new data points. Whatever your next step is, this is a very quick and easy way to see how your experiments should guide your model training.

And of course, we have to do hyperparameter tuning one more time, because we have all of these different experiments to run and all of these different values to try, and it's not uncommon for a machine learning engineer or a data scientist to run hundreds of experiments in a day. So we're going to queue up a few more experiments to see maybe how low we can get those values, or maybe we just have another theory we want to test out after showing our results to somebody else on the team. Again, we'll run our experiments and look at the table, and we see some promise. This one looks a little better than the previous one, and these values are definitely a lot smaller. So maybe we're getting a better feel for the range of the hyperparameter values, or for which hyperparameter values matter most. With something like DVC, you're able to do this kind of whimsical experimentation without worrying about taking notes every two seconds. You're able to focus on finding that good model instead of having a eureka moment and no idea how to get back to it.
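For reference, the plot comparison described above is roughly the following. The experiment names and the JSON target here are hypothetical; use the IDs from dvc exp show and whichever plot files your own training script writes out.

    dvc exp show                                    # grab the experiment names/IDs from the table
    dvc plots diff exp-1a2b3 exp-4c5d6              # compare the plot files produced by two experiments
    dvc plots diff exp-1a2b3 exp-4c5d6 --targets prc.json -x recall -y precision   # pick a specific file and axes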
And just to make sure we're not crazy and we're reading our values correctly, we might take another look at some plots to see if these experiments are going in the direction we think they should. You might share these with somebody else on the team, or they might just be for you, to get a sense of what you should expect or what to do next. Either way, DVC makes it easy to do that and to play around with your metrics in whatever way you need to. So these are all of the experiments we've run over the course of this talk. Actually, these aren't even all of them; these are just a few from this table. As you can see, there are a lot of experiments that we ran really fast, and we didn't have to keep track of any of it ourselves. You can see right here in the table that DVC has tracked every hyperparameter combination we've run, and we don't have to worry about it. With each experiment, it's taken a snapshot of the code and the dataset associated with it, and it's created those little bundles with our hyperparameters, our data, and our code to associate with each model produced by each experiment. What that means is that if you wanted to go back and redo any of these experiments, all you need is the experiment ID over here and a few DVC commands (sketched below), and you have an exact reproduction of the conditions that led up to the model, that awesome thing you need to get out to production right now.

I hope you see how we can solve some of these problems in the machine learning community, and how we can use tools that already exist to do the heavy lifting for us. Please don't use spreadsheets to keep up with your machine learning experiments when we have tools that will do it for you now. There are a few key takeaways I hope you get from this. First, adding reproducibility to your experiments is important. When it's time to deploy your model to production, you want to make sure you get the exact same accuracy and the exact same metrics in production that you had while you were testing, so there isn't any weirdness happening that forces you to roll everything back. Second, DVC is just one of the tools that helps you track every part of your experiments. Of course, there's still MLflow and some others in the MLOps area, but you always want to make sure you have some kind of tool tracking every part of your experiments. From what I've seen, DVC is probably the best one, just because it tracks your data changes too. So when you're dealing with data drift in production, you still have exact copies of those datasets from before the drift happened. If there's any research you need to do, you can check them out. If there's anything you want to go back to and reference for your model, you can check it out. DVC just does all of this for you. And finally, don't be afraid to try new tools. I know a lot of times those of us who write code feel a need to build our own tools for every issue that pops up. You don't have to do that. It's not cheating to use tools that are already there for you. It makes you faster, it makes it easier for you to have an impact, and it takes a lot of stress off when the tools already exist. Even if you spend an hour or two and a tool isn't quite what you're looking for, it's at least good to know it exists, in case something pops up and you need it later.
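Those reproduction commands look roughly like this; the experiment name below is a placeholder for an ID from your own dvc exp show table.

    dvc exp apply exp-1a2b3                 # restore the code, params, and data from that experiment into your workspace
    dvc exp branch exp-1a2b3 best-model     # or turn the experiment into a Git branch to share or deploy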
And I want to leave you with a few resources. If you're interested in DVC, make sure to check out our docs. We have a very active Discord channel, so if you want to drop in, ask some questions, or say what's up to the MLOps community, feel free to do that. If you want to see a more GUI-style version of DVC, head to DVC Studio at studio.iterative.ai and check it out; hook up your GitHub repo and start running experiments. And of course, if you want to see these slides, you can go to my Speaker Deck link here and download them to get whatever you need. So thank you, and I hope this talk was useful for you.
...

Milecia McGregor

Developer Advocate @ Iterative



