Conf42 Python 2024 - Online

Balancing speed and accuracy in model development

Abstract

When building a commercial ML model, it’s crucial to contemplate not only its predictive accuracy but also its speed, as users won’t patiently wait for results for hours. Let’s delve into how to strike a balance between these parameters to achieve optimal business value and ensure high customer satisfaction.

Summary

  • Ivan Popov will talk about balancing speed and accuracy in machine learning. The main factors that impact the model accuracy and speed are the complexity of the model architecture, the amount and quality of input data, and the hardware. In today's talk, he will give real world examples of this balance.
  • Get yourself a good data set with quality data and good labels. Data preprocessing is the part of model development where a lot of code is written. To identify your model's needs, you first need to understand your business objectives.
  • Use NumPy instead of pandas for data preprocessing wherever possible. Feature selection is essentially the final step of data preprocessing: by selecting the most important features and removing irrelevant ones, you can simplify your model and reduce the risk of overfitting.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi and welcome to my talk. My name is Ivan Popov, and today I will be talking about balancing speed and accuracy in model development. To begin, a couple of words about myself. I'm a data scientist at Abound, a fintech company based in London. I have three years of machine learning experience in the fintech and computer vision sectors, extensive experience as a data engineer and data analyst, and I have also completed projects as a DevOps engineer. Earlier in my career I created an online service and grew it to 80,000 users.
In today's talk, I will cover the two main factors to balance when you develop a machine learning model, speed and accuracy. I will talk about how you can identify which of them to focus on in your model, and how to optimize your model once you have identified that. When we talk about model performance, we usually think of model accuracy: how well it can make predictions. However, there is another angle: the speed with which the model produces its predictions. The main factors that impact model accuracy and speed are the complexity of the model architecture, the amount and quality of input data, and the hardware. Please note that when I say accuracy, I don't just mean the percentage of correct predictions; I use it as an umbrella term for all metrics such as F1 score, ROC AUC, IoU, et cetera. In an ideal world we would want a model that has 100% accuracy and can return the result in a nanosecond, but in practice we have to balance accuracy and speed to achieve the best value for the business. In today's talk, I'm going to give real-world examples of this balance and provide step-by-step instructions on how to identify your model's needs, as well as ways to optimize it.
In some situations, speed is not the most important factor, for example in academic research. In that case, the priority is finding a state-of-the-art model that can push the boundaries of science and advance the field of machine learning. However, when creating a model for commercial purposes, it is important to consider the experience of the end user and their satisfaction. In today's fast-paced world, people have shorter attention spans and are not willing to wait more than a few seconds for a page to load. So your model must be able to return results quickly to keep the customer engaged on your web page or in your app. Accuracy should not be compromised entirely for speed, though, as reliable and trustworthy predictions are essential to gain customers' trust in your product or service.
Let me provide some real-world examples to give you context. In the loan industry, when a person looks for a loan on an aggregator website, the loan providers must return a quote within a few seconds, otherwise their offer won't be shown. In this case speed is prioritized, because this is usually not the final offer, and the underwriters can later review the case in more detail to make a final decision. But when it comes to ecommerce, instant recommendations require a stricter balance between speed and accuracy. A system that recommends products too slowly may cause customers to lose interest or seek recommendations elsewhere, while a system that recommends irrelevant products may result in poor customer experiences and lost sales. Imagine a heavy metal fan getting Taylor Swift tickets as a recommendation. That would be hilarious, but not for the ticket website.
Medical diagnosis models are an excellent example where accuracy is more crucial than speed. Doctors usually spend considerable time examining and analyzing the outcomes before making a diagnosis. Therefore, the model can take more time to provide results, as long as the accuracy is not compromised.
As with many other things in life, the problem of balancing accuracy and speed can be solved with money. Investing in better hardware, such as CPUs and GPUs, can improve the inference speed without sacrificing accuracy. However, it is important to carefully weigh the cost and benefit of each component before making a decision. Sometimes investing a large amount of money in hardware may only yield a small speed improvement. Additionally, as budgets are typically limited, there are only a few options for hardware upgrades. And again, like with many other things in life, not every problem can be solved with money. Upgrading hardware can certainly improve the performance of a model, but it won't fix issues that stem from poor data quality or feature selection. The accuracy of a model is heavily reliant on the quality of the data it's trained on.
Furthermore, a model's architecture can also impact its accuracy and efficiency. If the architecture is too complex or too simple, the model can suffer from overfitting or underfitting, respectively. This can result in slow inference times and poor accuracy, even with high-end hardware. The choice of algorithm or learning method used can also impact a model's efficiency. Some algorithms may be inherently slow or may perform well only on certain types of data. For example, using a fully connected network for image segmentation may not be the best choice; it can be impractical due to the large number of parameters involved. In an image, every pixel is a feature, and in a fully connected network each neuron in one layer is connected to every neuron in the next layer, leading to a very high number of connections and parameters. This results in a computationally expensive and memory-intensive model, making it difficult to train and prone to overfitting.
So mastering the balance between model speed and accuracy can serve as a significant competitive advantage for your company. By determining which aspect is more crucial in your case and investing wisely in optimization technologies and techniques, you can fine-tune your model to deliver the best output for the end user. This gives the business the flexibility to succeed in a fiercely competitive market.
How do you identify your model's needs? You need to understand your business objectives. The first step is aligning model performance with business goals and objectives, so you need to answer these questions: What is the purpose of the model? Is it for internal users, or is it customer facing? What are the desired outcomes of the model? Is it to increase revenue, reduce costs, improve customer satisfaction, or something else? What are the key performance indicators, or KPIs, that the business is tracking, and how does the model fit into those KPIs? Who are the end users of the model, and what are their expectations and needs?
Let me give you the two main scenarios for using ML models: customer facing and internal. In customer-facing applications, speed is often more critical than accuracy. For example, in an ecommerce application, a recommendation engine that takes too long to recommend products can lead to customers losing interest and seeking recommendations elsewhere.
Similarly for online chatbots, which are becoming more and more popular thanks to ChatGPT and the like: speed is critical, as customers expect quick responses and don't want to wait too long for a chatbot to answer. For internal analytics, on the other hand, accuracy is often more critical than speed. In financial forecasting, accuracy is crucial for making informed business decisions, and in supply chain management, accurate predictions are necessary to optimize inventory management. In those scenarios you can afford to wait longer for the result, because the model can run overnight and you have a lot more time to get the correct answer.
So let's go from general things to the actual things you can do. First and foremost, get yourself a good data set with quality data and good labels. Of course, this mainly applies to supervised learning, but commercial models are usually supervised. The more data you get, the better, as long as you can ensure its quality. Let's say you're working on a model that classifies handwritten digits. A good data set would be one that includes samples from multiple writers and different writing styles. It should also have a balanced distribution of digits, meaning that each digit occurs roughly the same number of times, and all images should have a clear label associated with them. A bad data set in this case would be one that only includes handwritten digits from a single writer, because then the model would be biased towards the writing style of that particular writer and would not be able to generalize well to other handwriting styles, or one that is missing certain digits or labels for the images. It is always better to have a smaller good-quality data set than a larger bad-quality one, because you can always use data augmentation to generate more data from the data you already have.
Good raw data alone is not enough to ensure a good model. The data needs to be processed to fit your model. This step includes data cleaning, such as removing redundant data and null values; data normalization, such as tokenization, stopword removal, and embedding in NLP; and feature generation, such as aggregations, one-hot encoding, and finding trends like recurring transactions in financial data. Data preprocessing is the part of model development where a lot of code is written, and it is also one of the biggest sources of inefficiencies in the model. Of course, when you preprocess data for training it won't impact the model speed, but remember that the data used for inference must also undergo the same preprocessing steps.
So how do we find inefficiencies in data preprocessing? Well, the simplest way is to use the time function in Python: you just surround parts of the code with it and see how quickly they run. But what if you have a large code base and your data preprocessing is spread across multiple classes and files? You can't surround every function with time; it would be very tedious and messy. Luckily, there are out-of-the-box solutions, such as Python's built-in cProfile and yappi. Yappi is a profiler written in C, it's super fast, and most importantly, it lets you profile asynchronous code. It is my profiler of choice. Here is an example of the basic usage of yappi, where foo is the function you want to profile; it can be a plain function, a method, or anything more sophisticated. The best thing is that you will see all of the functions and all of the files that are called when this function is executed. Let's go over some basics when it comes to yappi.
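For reference, here is a minimal sketch of what that basic usage might look like; foo is just a placeholder function invented for the example, while the yappi calls themselves are the standard API.

```python
import yappi


def foo():
    # Placeholder for the function (or method, or coroutine) you want to profile.
    total = 0
    for i in range(1_000_000):
        total += i * i
    return total


# Wall time also counts waiting (I/O, sleeps), which is what you want for async code;
# use "cpu" if you only care about time spent executing instructions.
yappi.set_clock_type("wall")

yappi.start()
foo()
yappi.stop()

# Prints one row per called function: name, ncall, tsub, ttot, tavg.
yappi.get_func_stats().print_all()

# Reset the collected stats if you want to profile another section from scratch.
yappi.clear_stats()
```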
First of all, let's understand the difference between the clock types you can use: CPU time or wall time. CPU time, or process time, is the amount of time for which the CPU has been used for processing the instructions of a computer program or operating system, or in our case a function, as opposed to elapsed time, which includes, for example, waiting for input/output operations or entering a low-power state. Wall time is the actual time taken from the start of a computer program to the end; in other words, it is the difference between the time at which a task finishes and the time at which it started. When you're profiling asynchronous code, you should use wall time.
Then at the bottom, in green, you can see the simple output of yappi. It includes the function name together with the file the function lives in; for more sophisticated programs there will be a lot more functions and files in there. Then you will see ncall, the number of function calls, i.e. how many times this function has been called. It's a great way to see whether some function has been called far more times than you would expect; if so, there may be a way to optimize it. tsub is the time spent in the function excluding subcalls: if the function calls other functions inside it, tsub will not account for their time. So if tsub is big, either you have a problem or the function simply doesn't call other functions. Finally, ttot is the total time spent in the function, including subcalls. Obviously, a function like main will have a very large total time, but then you need to go through all of the functions inside main to see which one takes the most time. By using a profiler you get a complete overview of how your code is running and which parts of it are the slowest. This is the quickest and simplest way of finding bottlenecks in your code.
Now let's look at some examples of inefficiencies that often happen in data preprocessing. It's no secret we all use pandas. It's great for data analysis and data preprocessing, and one of the most useful functions of pandas is apply. However, it's not the most efficient. While pandas itself is a great package, apply does not take advantage of vectorization. Also, apply returns a new Series or DataFrame object, so with a very large data frame you have considerable input/output overhead. A couple of ways to solve this: instead of using apply, try using NumPy operations, especially if you are just performing an operation on a single column of a pandas data frame. Alternatively, you can often find a simpler route; something like multiplying a column by two can be done with pandas' built-in vectorized operations. Also, if you want to apply a function to multiple columns of a pandas data frame, try to avoid the axis=1 form of apply; instead, write a standalone function that takes multiple NumPy arrays as inputs and use it directly on the values attribute of the pandas Series. Sometimes you may also be performing calculations more times than needed. You may have metadata in your data set, like gender, city, or car type, and you may be performing a calculation for every single data point while you only need to perform it once per group. So consider using grouping and filtering in pandas and only performing the calculation once per group; this can significantly improve the speed of your data preprocessing. The sketch below illustrates these ideas.
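Here is a small sketch of those three points; the DataFrame and its column names are invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Toy data -- the column names here are made up for the example.
n = 1_000_000
df = pd.DataFrame({
    "amount": np.random.rand(n) * 100,
    "fee": np.random.rand(n),
    "city": np.random.choice(["London", "Leeds", "York"], size=n),
})

# Slow: apply calls a Python function once per element.
df["amount_x2_slow"] = df["amount"].apply(lambda x: x * 2)

# Fast: the same thing as a vectorized pandas/NumPy operation.
df["amount_x2"] = df["amount"] * 2

# Multiple columns: avoid df.apply(..., axis=1); write a standalone function
# that works on NumPy arrays and feed it the .values of each column.
def total_cost(amount, fee):
    return amount * (1 + fee)

df["total"] = total_cost(df["amount"].values, df["fee"].values)

# Per-group metadata: compute the value once per group instead of once per row.
city_avg = df.groupby("city")["amount"].transform("mean")
df["amount_vs_city_avg"] = df["amount"] - city_avg
```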
Finally, wherever possible, it's best to use NumPy instead of pandas. While pandas is very user-friendly and intuitive, NumPy is written in C and is the champion when it comes to efficiency.
Feature selection is essentially the final step of data preprocessing, and it has a very large impact on the accuracy and speed of a model. As the name suggests, it's the process of determining which features in a data set are most relevant to the output. Your first instinct may be to take all of the data and throw it into the model, because just a minute ago I told you the more data the better, but I was talking about the number of data points, not the data from each particular data point. By selecting the most important features and removing irrelevant ones, you can simplify your model and reduce the risk of overfitting. This not only improves the accuracy of the model, but also makes it more efficient and less complex, which can be critical for real-world applications where time and resources are limited.
One of the methods I use for feature selection is SHapley Additive exPlanations, or SHAP values. They are a way to explain the output of any machine learning model. They use a game-theoretic approach that measures each player's, or feature's, contribution to the outcome. In machine learning, each feature is assigned an importance value representing its contribution to the model's output. Features with positive SHAP values positively impact the prediction, while those with negative values have a negative impact; the magnitude is a measure of how strong the effect is. When I say positive or negative, I don't mean good or bad, I just mean plus or minus. SHAP values are model agnostic, which means they can be used to interpret any machine learning model, including linear regression, decision trees, random forests, gradient boosting models, and neural networks, so they are universal. Obviously, for more complex architectures they are harder to calculate and the number of calculations increases, so even though they can be used for neural networks, for example, they work best for simpler models like gradient boosted trees.
SHAP values are particularly useful for feature selection when dealing with high-dimensional, complex data sets. By prioritizing features with high SHAP values, both positive and negative (we are looking at magnitude here), you can streamline the model by removing less impactful features and highlighting the most influential ones. You can make the model simpler and faster without sacrificing accuracy. This method not only enhances model performance, but also helps improve the explainability of the model. It also helps you understand the driving forces behind predictions, making the model more transparent and trustworthy. You could say that using SHAP values for feature selection is a form of regularization, and you would not be wrong. What's best, SHAP values do not change when the model changes unless the contribution of the feature changes, which means they provide a consistent interpretation of the model's behavior even when the model architecture or parameters change. You do not need to study game theory to calculate SHAP values: all the necessary tools can be found in the shap package in Python, and using it you can calculate SHAP values and visualize feature importance, feature dependence, force plots, and decision plots. The visualization you see on the slide right now, for example, comes directly from it. A short sketch of that workflow follows below.
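As a rough illustration of that workflow, here is a sketch using the shap package with a small gradient boosted tree model; the model and data are invented for the example, only the shap calls themselves are the library's standard API.

```python
import numpy as np
import shap
import xgboost as xgb

# Assume X (features) and y (target) exist; here we fabricate a tiny example.
X = np.random.rand(500, 10)
y = (X[:, 0] + X[:, 3] > 1).astype(int)
model = xgb.XGBClassifier(n_estimators=50, max_depth=3).fit(X, y)

# TreeExplainer is the fast explainer for tree-based models.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Mean absolute SHAP value per feature = global feature importance;
# features with the smallest values are candidates for removal.
importance = np.abs(shap_values).mean(axis=0)
ranked = np.argsort(importance)[::-1]
print("features ranked by SHAP importance:", ranked)

# The usual visualizations (summary / beeswarm plots) come with the package.
shap.summary_plot(shap_values, X)
```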
When we talk about machine learning models today, we usually talk about LLMs and transformers, and while they are great at many tasks, most businesses don't need such sophisticated architectures for their purposes, especially because LLMs are very expensive to train and maintain. Most tasks, even today, can be easily handled by much simpler models, such as gradient boosted trees like XGBoost and LightGBM.
XGBoost and LightGBM are two of the most popular gradient boosting frameworks used in machine learning. Both are designed to improve the speed and accuracy of machine learning models. XGBoost is known for its scalability and flexibility, while LightGBM is known for its high-speed performance. XGBoost is a well-established framework with a large user base, and LightGBM is relatively new but has gained popularity due to its impressive performance.
XGBoost has been widely used since its release in 2014. It is flexible because it can handle multiple input data types and works well with sparse data. XGBoost has an internal data structure called DMatrix that is optimized for both memory efficiency and training speed, and you can construct a DMatrix from multiple different sources of data. XGBoost also has regularization features that help prevent overfitting, a common problem in machine learning. However, XGBoost can be slower than other models when dealing with larger data sets. This is because when training a gradient boosted tree model, XGBoost uses level-wise tree growth, growing the tree one level at a time across all branches, essentially in breadth-first order. This usually results in more leaves and therefore more splits, and hence more computational overhead: as you can see on the diagram, every leaf on a level gets grown even if it's not needed there.
On the other hand, LightGBM is known for its lightning-fast performance. This is because when training a gradient boosted tree model, LightGBM uses leaf-wise tree growth: it grows the tree in best-first order, constructing splits for the branch that currently gives the best improvement. LightGBM is designed to handle large data sets efficiently and in certain cases can be much faster than XGBoost. LightGBM also has a feature that allows it to handle categorical data efficiently, which is a significant advantage over XGBoost, and it has built-in regularization support to prevent overfitting. However, LightGBM can be more memory intensive, which can be a problem when dealing with larger data sets on limited memory resources. There is no obvious way to choose one model over the other, so you'll have to experiment and decide based on the results. Fortunately, both models can be set up and trained very quickly, so you can get the testing swiftly out of the way and move on to optimizing the chosen model; a small comparison sketch follows below.
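To show how quickly both frameworks can be set up for such an experiment, here is a hedged sketch on fabricated data; the data set, parameter values, and accuracy check are illustrative placeholders, while the xgboost and lightgbm calls are the libraries' standard training APIs.

```python
import numpy as np
import xgboost as xgb
import lightgbm as lgb
from sklearn.model_selection import train_test_split

# Toy data standing in for your real data set.
X = np.random.rand(10_000, 20)
y = (X[:, 0] + X[:, 5] * 2 > 1.5).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# XGBoost: DMatrix is its internal, memory-efficient data structure.
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
xgb_model = xgb.train(
    {"objective": "binary:logistic", "max_depth": 6, "eta": 0.1},
    dtrain,
    num_boost_round=100,
)
xgb_pred = xgb_model.predict(dtest)

# LightGBM: leaf-wise growth, often faster on large data sets.
train_set = lgb.Dataset(X_train, label=y_train)
lgb_model = lgb.train(
    {"objective": "binary", "num_leaves": 31, "learning_rate": 0.1},
    train_set,
    num_boost_round=100,
)
lgb_pred = lgb_model.predict(X_test)

# Compare accuracy (and wrap the train/predict calls with a timer or profiler
# to compare speed) before committing to one framework.
print("xgb acc:", ((xgb_pred > 0.5) == y_test).mean())
print("lgb acc:", ((lgb_pred > 0.5) == y_test).mean())
```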
So let's make a quick recap of today's talk. Balancing speed and accuracy often depends on the context or the field of application: we may need quick results in user-focused applications, and on the other hand require the highest degree of accuracy in fields like medicine. To identify your model's needs, align its purpose with business goals, define desired outcomes, and consider KPIs. Tailor the model to meet user expectations, prioritizing speed for customer-facing applications and accuracy for internal analytics. But look at your particular case: sometimes in internal applications you also need speed, and in customer-facing applications you need accuracy.
Make sure you acquire a robust data set with quality data and accurate labels. The quantity of data is important, but the emphasis should always be on maintaining its quality. Consider experimenting with simpler models like XGBoost or LightGBM instead of complex architectures like LLMs and transformers; these frameworks, known for enhancing both speed and accuracy, are suitable for quite a variety of use cases. And when you have a simple model that makes accurate predictions but works too slowly, look for the bottlenecks in your code using profilers such as cProfile or yappi; the most frequent place for bottlenecks is the data preprocessing step.
Thank you for joining me today, and I hope you found this talk useful. Hopefully see you in the future.

Ivan Popov

Data Scientist @ Abound

Ivan Popov's LinkedIn account


