This transcript was autogenerated. To make changes, submit a PR.
Hello, I'm Karl Weinmeister with Google's Developer Advocacy team.
Today we're going to talk about ten things that can go wrong with ML projects.
Let's get started. So machine learning practitioners are
solving important problems every day, running into a
set of unique challenges. Today we're going to talk about the best practices
and tools that can help address them. The issues we're
going to discuss fall into four categories, building a model,
model accuracy, transparency and fairness,
and MLOps. So let's start with our first problem.
It's all about the business problem you're trying to solve.
So many organizations are transforming and
really changing how they do things with machine learning, but we see
other companies that are really struggling to get value out of those machine learning
projects. So it's key that your machine learning model is
aligned with your goals. It's also important that
when you're figuring out if you're doing well, you have a baseline
where you can evaluate how your model is doing. You need to know
how existing approaches are working, whether they're manual,
whether they're implemented by traditional software development
systems, or in a previous form of machine learning.
You need to know what your starting point is, to know how much you're improving
based on that. So it's key to know what that baseline is and the goals.
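As a sketch of what "knowing your baseline" can mean in practice, here's a plain-Python trivial baseline: before training anything, measure how well a majority-class predictor does on your labels. The label values here are hypothetical.

```python
# Accuracy of a trivial baseline that always predicts the most common label.
# Any real model should have to beat this number to be adding value.
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)

labels = ["on_time"] * 80 + ["delayed"] * 20   # hypothetical label distribution
print(majority_baseline_accuracy(labels))      # 0.8
```

If your model can't beat this kind of number, the extra complexity probably isn't paying off yet.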
So let's talk a little more about that. So I recommend watching
this video coming from DeepMind, Google's research
organization, where there's a great video on product management
for AI and the speaker talks about a lot of topics.
It's about a 30-minute video. Some of the key takeaways that I took from
it were staying focused on your goals
as a project, but being flexible on the tactics.
It's inevitable that you're going to run into things that don't
work along the way. So you just need to keep adapting but not losing sight
of what your ML solution is aiming to achieve
for your users. Secondly, from a scheduling
perspective, it can be hard with machine learning, because there's
often a large research and discovery part of the mission,
where you start things not sure exactly how they're
going to work. You might not have an answer.
You're learning a lot as you go, and that's hard to plan around. So setting
milestones that you can adapt and change
as you go is fine, but it's important to plan things
out with milestones as you go to sort of have some structure
on your project. Finally, having users involved
at each stage of your project is important. So you
may still be doing a lot of research work, but having
some early insights and lessons to share with your users to
ensure you're on track is often very valuable. So some
of these insights will help ensure that you're getting value out of your AI
projects. The next thing I want to focus on here is
that your problem needs to be a good fit for machine learning.
This isn't an exhaustive list, but here are a few things that
you might want to keep in mind when you're wondering, is this an
area where machine learning can help? The first group is
predictive analytics, and these are problems where you have historical
data and you want to look at trends that are going to happen in the
future. Whether it's looking at past transactions, say,
and trying to figure out if a new one is fraud,
or looking at equipment to see if it's going to fail
over time by looking at information about heat or vibration,
et cetera, being able to extract those patterns for
when the systems are going to fail and fix them beforehand.
There's all kinds of situations like this, where we're using historical data to predict
some future outcome. There's another class of problems around unstructured data.
This is where you have data that's not in the tabular
format that fits into, say, databases. This is where you
have images, text, et cetera. There's a
variety of different use cases here, but just a few examples
of things like maybe triaging emails if
there's a large load for the customer service organization, being able to
cluster and move items to the right
place. Another area could be automation, where you have
a manual process and you're trying to automatically fulfill
some step of that process, and you see a few examples of that.
Finally, personalization, where you want to understand how your
users tick and you want to be able to provide
them useful information. Next steps that help them
in your application to achieve what they're trying to do faster,
easier, et cetera. All right, so let's move on to the next
part of building a model. A huge problem can be jumping straight into
model development without a prototype. A machine learning project
is an iterative process, so you start with something simple and
you refine it as you go, and many times something
simple if it achieves your goals, is fine, but it's
always good to start small and expand from there. A quick prototype
can tell you a lot about challenges. Those could be access
to data as you start to build out that initial model. Maybe there are teams
you need to work with to request data or integration points
that you weren't aware of. Maybe you really struggle with getting
a decent model accuracy. There's all kinds
of other questions you can find out and that will help with scoping out
the length of the project and be able to understand what you're
getting into. Starting with that prototype, there are
a couple of tools that can help. So let's first start with BigQuery ML.
So BigQuery ML allows you to create machine
learning models directly from the data warehouse in BigQuery.
So you'll see a little bit in this animation where you can
write some SQL statements for deep neural network
models, logistic regression, all these different
types of models, even time series forecasting. It allows you
to get started quickly. By the way, this is a full-fledged production
system as well. If you want to run your models out
of BigQuery completely, it's a great option for that,
but it also allows you to move quickly, try some things out, and see what your
baseline accuracy is. Similarly,
AutoML is another great option if you don't want to write
a custom model: you take your own data, use AutoML to handle
the training, deployment, and serving steps, and then generate a REST API.
It wraps all that into one user
interface or SDK to enable you to do all these steps quickly,
so that can serve as a performance baseline.
It also can help with explainability, where you can get feature importances,
so you can know where AutoML is focusing and where
there's signal. Maybe that's an area where, with your data engineering, you can dig
in a little bit to extract some more features.
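The BigQuery ML workflow described above boils down to a single SQL statement run in your data warehouse. As a sketch, here's code that builds such a statement; the dataset, table, and column names are hypothetical, and actually executing it would require the BigQuery client library (google-cloud-bigquery), so this just constructs the SQL.

```python
# Build a BigQuery ML CREATE MODEL statement. Names are hypothetical;
# in practice you'd pass this string to a BigQuery client's query method.
def create_model_sql(model_name, label_col, table):
    """Return a CREATE MODEL statement for a logistic regression model."""
    return f"""
CREATE OR REPLACE MODEL `{model_name}`
OPTIONS(model_type='logistic_reg', input_label_cols=['{label_col}']) AS
SELECT * FROM `{table}`
""".strip()

sql = create_model_sql("mydataset.fraud_model", "is_fraud", "mydataset.transactions")
print(sql)
```

The point is how small the step from "data in the warehouse" to "trained baseline model" can be.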
Problem number three, model training taking a long time.
Not sure if you've ever run into this, if you're working with a larger model
where it can take days or even weeks in some
cases to run, and that just really slows down your team, where
you might reach a point where you just have to wait until tomorrow
to find out what you can do next. So that slows down that process of
innovation and can really harm
the success of your project. Serverless training with Vertex
AI can be very helpful for this because it allows you to submit
training jobs across a distributed infrastructure using
GPUs and other custom
chips that you'd like to use, and building models
in all different kinds of frameworks. Using a container image as your
base allows you to do even things like hyperparameter tuning
where you can create multiple models and find out
the one that's working best. And this really allows you
to speed up that training time. And as
well as you see here in this screenshot, things like getting access
to logs, downloading your model,
storing it in the cloud, and managing it on the cloud, it will take care of
for you as well. Another option as far as
training quickly is Cloud TPUs, or
Tensor Processing Units. And those are custom
chips that are built for machine learning workloads.
They allow you to create models
very quickly at high scale, and can speed up that training
even more if that's something you want to use. All right, let's move on to
the next group of issues around model accuracy. So,
first type of issue could happen when you have an imbalanced data set.
So many machine learning tasks have many
more examples that fall into one category
than the other, right? So let's take fraud detection as an example
where, fortunately, most of the data is benign,
the transactions are not fraudulent, and you have a few that
are. It's like finding a needle in a haystack. So we
could have a trivial model that simply predicts that everything is benign,
and it would have good accuracy, but that's not going to add
any value, right? So what you
want to do is apply some techniques to make sure
that your accuracy is good across both of the
classes, even if there's not a lot of data for them. So let's
look at this next resource here,
which is a tutorial for dealing with imbalanced data.
So this is on the TensorFlow website, and it provides some of these
different techniques. So things like weighting different classes differently,
basically applying a greater penalty for
mistakes on, say, the class with
fewer examples. Other things
you could do are oversampling and undersampling. With
the existing data you have, you basically duplicate some
of those records so that there's a more balanced
number between the classes. Or, conversely, undersampling,
where you remove examples from the class with more.
Finally, you could consider generating synthetic data. So there
are packages in Python, like one called SMOTE, for example,
that can look at the distributions of your data and generate
data that's similar to what's in your training
sets. All things to try. Personally, I've had
the best success with weighting classes
to help with that issue.
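A minimal sketch of the class-weighting idea: give the rarer class a larger weight so mistakes on it are penalized more during training. The class counts here are made up, and the formula below is one common convention (inverse-frequency weighting); frameworks vary in exactly how they apply it.

```python
# Compute inverse-frequency class weights: weight_c = total / (n_classes * count_c),
# so classes with fewer examples get proportionally larger weights.
def class_weights(counts):
    total = sum(counts.values())
    n = len(counts)
    return {c: total / (n * k) for c, k in counts.items()}

weights = class_weights({"benign": 9900, "fraud": 100})
print(weights)  # the fraud class ends up with 99x the weight of benign
```

In Keras, for example, you could pass a dict like this (keyed by integer class index) as the `class_weight` argument to `model.fit`.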
And AutoML has some ability to help with this
as well, without even changing the class
weighting that we discussed previously. So when you're training
a model, there's something called an optimization objective, and this is
what you're optimizing for. And so
you can see here there are several different options.
And if you switch to something called the
area under the precision-recall curve, that is
generally better for helping with the
class with fewer examples in it. But you see, there's
a spectrum of possibilities depending on if you're trying to maximize
accuracy for all of the data
or create a more balanced result,
et cetera. You can just customize using this.
It's under advanced options for model evaluation.
Another thing you can do is create a model,
and then you can review the accuracy at different thresholds.
You can then review the confusion matrix.
And if you see that below, this is an example of flight delays.
And again, very good to see that most of
the time, flights are on time, although it might not always feel that way.
We can see here that there's definitely a difference in the
accuracy with flights that are delayed; it's
a little bit harder to pick those out with this particular model.
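As a sketch of that per-class review, here's a plain-Python confusion matrix at one threshold, with recall computed separately for each class. The labels and scores are made up for illustration (1 = delayed, 0 = on time).

```python
# Build a binary confusion matrix at a chosen score threshold, then
# compute per-class recall so imbalance problems stand out.
def confusion(labels, scores, threshold):
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    return tp, fn, fp, tn

labels = [1, 1, 1, 0, 0, 0, 0, 0]               # hypothetical ground truth
scores = [0.9, 0.6, 0.4, 0.3, 0.2, 0.7, 0.1, 0.2]  # hypothetical model scores
tp, fn, fp, tn = confusion(labels, scores, 0.5)
print("delayed recall:", tp / (tp + fn))   # fraction of delays we caught
print("on-time recall:", tn / (tn + fp))   # fraction of on-time flights we got right
```

Sweeping the threshold and re-reading these numbers is essentially what the evaluation UI is doing for you.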
So you could go through this process,
look at the results for each of the different classes,
and then perhaps come back and adjust that optimization objective
if you'd like. So, model accuracy, this is a huge
one. And this really can be where projects might
not even succeed. They get completely stuck, right? So you are
creating a model and you just get to a point where you can't improve the
model accuracy anymore, and it's not good enough. It's not going to really add any
value for the business. And sometimes
it just is what it is, that it's a hard problem to solve,
and there's not that signal available in the data,
but often, with some creative thinking,
you can move past those obstacles. So let's go back to this example
of flight delays. So, on the left, I have a
research paper that talks about historically, what are
the reasons for flight delays. So I was actually looking at modeling
this problem, and I started with data around
the start and end times of the different flights,
the carrier, et cetera. And that gave
me some information. But what I did was I augmented
my data with weather information. So I took a BigQuery
public data set of weather.
It had lat/long coordinates, which I joined
against the airport, so I'd know, okay, well, the arriving flight
is going to have hail, things like that. And that definitely
improved my model. Now, it actually wasn't a huge
increase, and this data kind of shows why: only about 6% of the root
causes of flight delays and cancellations are due to extreme weather,
but the biggest one is aircraft arriving late. And this would be an
example of if you have better data,
you might be able to work on some data engineering to
look at the whole flight graph and
what flight is coming into the flight that you're trying to predict,
even multiple flights back, and using that as information. So the
point here is that really understanding the problem is
key. It's not just about the algorithm and the math;
you can often make a much bigger difference by
understanding the domain that you're in. So that's my number one
tip is look at improving domain expertise for the
problem you're trying to solve. Really dig in, ensure you have the right experts
on the team and as a data scientist learn
the domain the best you can and you'll probably think
of some things that are going to help you. Secondly, and sort of
related: including more data always
helps improve your accuracy, of course, and varied training data of
different types is going
to add some diversity there for your model.
Feature engineering of course is always useful to
unlock information from some features. Maybe I'm
making this up. You have a date field and you want to extract
whether it's a weekend or a weekday. There might be
different patterns for that, all kinds of different things
that you can do with your data to ensure that
your model can take advantage of it.
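A tiny feature-engineering sketch like the one just described: deriving an is-weekend feature from a raw date field, using only the standard library. The dates are arbitrary examples.

```python
# Derive a boolean weekend feature from a date column. In Python's
# datetime, Monday is 0 and Sunday is 6, so >= 5 means Saturday/Sunday.
from datetime import date

def is_weekend(d: date) -> bool:
    return d.weekday() >= 5

print(is_weekend(date(2021, 7, 10)))  # a Saturday -> True
print(is_weekend(date(2021, 7, 12)))  # a Monday -> False
```

A model can often pick up a weekday/weekend pattern from this derived column much more easily than from the raw date.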
So looking at some of the other things, consider removing some of the features that
are causing overfitting where you're sort of locking into
noise. I might suggest, as we talked
about earlier, starting with a smaller model and then incrementally
adding features back. Also try different model architectures.
This is traditionally what we do in data science. Try a
different model architecture, different number of layers,
hyperparameter tuning, ensembling your model after
you've done some of these more fundamental things. Finally,
just for a gut check, it never hurts to try AutoML
to see what kind of performance is possible
with the input data that you have. Let's move on.
Transparency and fairness, our sixth issue:
your model doesn't serve all of your users well.
This is a very important and complex
issue, and when you look at how you're approaching AI
development, you want to use a responsible AI framework to help with
it. So here are a few questions to ask as you're building your
model. The
first is around some of the business questions we asked before.
What is the problem you're trying to solve? Who's your user?
Moving on, things like the risks and success factors. Now we're
starting to move on to data: how was the training data
collected, sampled, and labeled? There's all kinds
of issues that can pop up in this phase of the project.
It's key to ensure that your data collection,
sampling, and labeling processes give you
a representative sample. I'm not going to
go through all the points here, but it's worth, as you go through the training
and evaluation process,
considering all of these important questions. The
model might have some limitations
that you want to document, and you want to document how
you collected your data from end to end, so that you
can continue to assess where you're
at from a responsibility perspective and keep improving on
it. There are a couple of tools that I'd like to mention here
that can help. The What-If Tool allows you to
slice your data by various factors
to see why predictions happen the way that
they did. This can give you a much more
detailed understanding of your model accuracy versus a simple statistic
like 98% precision
or something like that. And you can also, and this is why
it's called the What-If Tool, actually change some values and
see what happens: if I change a value slightly,
does that change my prediction? So it's almost like a debugging
tool for your model. TensorFlow Model Analysis is
also helpful. So what this can help do is it
can produce estimates of
performance by slice of the data. So let's take a look at
what that really means. Here is an example of Chicago
taxi trip data, where we're estimating what
the tip is going to be for a taxi trip. And you
can see some statistics here. The bar graph shows
you the number of samples at different hours of the day.
So we're seeing that at 5:00 in the morning and
6:00 in the morning, there's a much lower number of trips.
Well, that might impact the accuracy. It's an example of not
having a balanced data set, so it's actually going to slice per hour
and give you statistics on that accuracy.
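A plain-Python sketch of what that per-slice view gives you: accuracy computed per hour of day, so thin or poorly performing slices stand out. The records (hour, was-the-prediction-correct) are hypothetical; TensorFlow Model Analysis produces this kind of breakdown for you at scale.

```python
# Group prediction outcomes by a slice key (here, hour of day) and
# compute accuracy per slice.
from collections import defaultdict

def accuracy_by_slice(records):
    hits = defaultdict(int)
    totals = defaultdict(int)
    for hour, correct in records:
        totals[hour] += 1
        hits[hour] += int(correct)
    return {h: hits[h] / totals[h] for h in totals}

records = [(5, True), (5, False), (17, True), (17, True), (17, True), (17, False)]
print(accuracy_by_slice(records))  # {5: 0.5, 17: 0.75}
```

Note how the 5:00 slice has both fewer samples and lower accuracy, which is exactly the pattern you'd want surfaced.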
So this will allow you to see by different
dimensions of your data how balanced the errors
are. So it can be a very useful tool. So we talked
about assessing the accuracy and equitability
of the model. Now let's look at how to document
it. So ML models often get distributed
as is: here's the model, it just works, okay?
But think about it: with software development, there's always
documentation explaining each
part of the user interface, a glossary, et cetera.
Let's think about the same concept applied to ML models.
Right? So first is being able to explain
what's happening under the hood. So for a variety of different
data types, it's important to look at why
the predictions happened the way they did. And explainable
AI can tell you which features are
driving the decisions. So you see some examples
here: maybe in an image, what is it about the image that was critical to
making that determination? Or, say, in tabular data,
we see that distance was the most important factor in our
model's predictions. Today on Google Cloud, we support explainability
across multiple layers of the
platform, from AutoML to prediction to using
our SDKs to perform explainability
from your notebook. And model cards allow you
to document your model. You can specify information
such as how you collected the data. You can
put graphs around your performance curves,
the model architecture. This is what we've done for the
Object Detection API.
And you can see there's a model card toolkit that allows you to generate
these model cards based on information about your model, or even
attach it to your TensorFlow Extended pipeline to generate one of
these automatically. All right, so our final class of problems,
MLOps, or machine learning operations.
So what if you built a model that just
was a bad model? It was built on some training data
that had a bunch of
data quality issues, and it somehow got into production, and all kinds
of users were impacted by that. Definitely not something
that you want to have happen. Vertex Pipelines can help with that by
allowing you to codify the set of steps to build a machine
learning model and providing guardrails
around deployment, so you can implement steps,
or rather processes, like continuous integration
and continuous deployment, and include tests along
the way for each of these different steps.
Pipelines allow you to build a custom machine learning pipeline.
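A schematic sketch of the guardrail idea: codify the steps of building a model as a pipeline, with an evaluation gate before deployment. A real Vertex pipeline would express these steps as pipeline components; here they're plain functions, and the data sizes and metrics are made up.

```python
# Pipeline steps as plain functions: extract -> train -> evaluate -> gate.
def extract_data():
    return {"rows": 1000}                      # pretend data extraction

def train(data):
    return {"model": "v2", "trained_on": data["rows"]}

def evaluate(model):
    return 0.92                                # pretend held-out accuracy

def run_pipeline(min_accuracy=0.9):
    data = extract_data()
    model = train(data)
    accuracy = evaluate(model)
    if accuracy < min_accuracy:                # the guardrail: block bad models
        return {"deployed": False, "accuracy": accuracy}
    return {"deployed": True, "accuracy": accuracy, "model": model["model"]}

print(run_pipeline())
```

The key design point is that deployment is a step inside the codified pipeline, behind a test, rather than something a person does by hand.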
Next issue: your model accuracy is drifting downward. So most
software projects can have, or should have, unit
tests on them, where you can evaluate whether your code is working or
not, and the unit test will tell you if it's working or not.
It's binary: it works, or it's broken. It's a little
more subtle with machine learning, where things might drift
slightly, the data distributions,
for whatever reasons, for whatever you're modeling over time,
may change due to outside conditions changing.
So how do you detect and manage that?
There are a couple of processes to consider. One is continuous evaluation,
where you're regularly sampling your model's predictions, comparing those
to ground truth, and then assessing
the accuracy. Another thought is continuous
training, where you deploy an
ML pipeline that extracts the data, trains a new model,
tests the model, of course, and then deploys it to production.
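A minimal sketch of continuous evaluation: regularly compare sampled predictions to ground truth, and flag when accuracy drifts more than some tolerance below the baseline, which could then trigger the retraining pipeline. The numbers and threshold are made up.

```python
# Decide whether to retrain based on recent (prediction, ground_truth) pairs.
def should_retrain(recent_pairs, baseline_accuracy, max_drop=0.05):
    correct = sum(1 for pred, truth in recent_pairs if pred == truth)
    recent_accuracy = correct / len(recent_pairs)
    # Retrain when accuracy has drifted more than max_drop below baseline.
    return recent_accuracy < baseline_accuracy - max_drop, recent_accuracy

pairs = [(1, 1), (0, 0), (1, 0), (0, 0), (1, 1),
         (0, 1), (0, 0), (1, 1), (0, 0), (1, 0)]   # hypothetical sample
retrain, acc = should_retrain(pairs, baseline_accuracy=0.9)
print(retrain, acc)  # True 0.7
```

Tuning `max_drop` is exactly the "sweet spot" trade-off discussed next: too tight and you retrain constantly, too loose and stale models linger.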
And each of these different steps can
work together to look for issues
when you've drifted away from a certain threshold, and
then prevent the issue by training models
on a recurring basis, or when a
certain amount of data changes. So you want to find the right sweet spot:
not creating too many models with every small change that's happening,
but not letting your models get too stale, where their
performance starts getting impacted. Vertex Model Monitoring
can help with detecting these issues, as
far as drift or training-serving skew.
So maybe you're starting to see that your users
are making a lot of predictions with data that's much
different from what you originally trained on. That might be a warning
signal that your model isn't quite as applicable
as it could be. So this gives you an additional layer of confidence
in your model reliability. Now,
our final issue is around model inference.
So this is when you've built a model, it's deployed, and
people are using it, making inferences or predictions on your
model. So it's a success: you solved the problem at
the accuracy level you were looking for, and it's integrated
into a widely used application. Now, how do you handle the
spiky workloads that may result? And how do you avoid
over-provisioning infrastructure, while preventing the errors
or high latency you'd see if you don't have enough infrastructure
set up? Vertex Prediction can help with that, because it can allow you
to set up an online endpoint where you can
serve your model, and it will scale automatically based on
your traffic. So you can set up, say, a minimum number of nodes,
a maximum number of nodes. It will scale those up and down based on various
utilization thresholds. It will help you with logging,
it'll provide you the option of using
some powerful GPU chips,
and so you'll be able to ensure that you're serving the
right amount of requests with Vertex
Prediction at an optimized cost. So that
wraps up the ten different issues. I hope that was helpful. Let's look
at a few resources. Vertex AI is
the AI platform that we discussed today that can help with several of
these problems. Codelabs are a way to dive in,
use notebooks, build
models, and basically get some training. They're free resources
at codelabs.developers.google.com. And if
you're into learning via video, like this one,
AI Adventures is one of our video series that has a lot
of different resources around using Google Cloud for
AI. And that concludes our
presentation today. So I thank you for watching.
I hope you have a great day.