Conf42 Machine Learning 2021 - Online

Deploying ML models and all the things that go wrong

It’s no secret that deploying machine learning models is conceptually far from training them and requires a different mindset. Some companies deal with it by dedicating people to deployment, since there is actually a lot to do even while the model is still in development, and some just expect data scientists to do everything from modeling and analysis to deployment and monitoring.

In this talk I’d like to share my experience with deployment, starting from 2017, as well as the lessons I’ve learned. Expect a couple of wild and sometimes embarrassing stories, but at least (oops) I didn’t do it again.


  • Today we are going to talk about deploying machine learning models. During model training you are focused on building the model and making it better; during deployment you are focused on preserving it. It's highly important to keep the quality of your code high and to write tests.
  • The first story is about scaling, and it starts with the first model I ever developed. I ended up deploying more than 20 models through that one prediction service while struggling to scale it. The lesson I learned there: it's super important to plan your deployment thoroughly.
  • When just one person is responsible for something, it never ends well. Delegation is super important: delegate if you can. Another important issue is synchronization of tasks. And it's important to remember that you are not alone in the company.
  • The next story is about data scientists and the supposed lack of need for tests. We deployed a prediction service without tests, zero tests. Software engineering doesn't require as many tests as data science does, because we test data as well as code. Also, it's easier to write tests than to fix bugs in production.
  • Even some quite big companies have a habit of not monitoring what their models are doing. Monitoring is crucial for so many things; it helps you understand when you need to improve your solution. A data scientist's work is not just about building models. It's more about finding answers to questions.


This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. Today we are going to talk about deploying machine learning models, and I will tell a few stories from my experience, mostly from the beginning of my career as a data scientist. You will probably recognize yourself in these stories, and you can learn some of these lessons even if you were lucky and didn't have any failures like that in your career. First of all, we are all used to training models, because this is our main task as data scientists. We all know the stages: data gathering, exploratory data analysis, then feature engineering and model training; we can fine-tune the model by optimizing its hyperparameters, and we evaluate it. This is a process we are very used to, and in this process we have a specific mindset, I would say a researcher mindset. We focus not just on the end result, an accurate model, but also on keeping the experiment right and checking everything at every stage. Deployment, though, requires another set of skills, another set of tools, and another mindset as well. For example, in most cases you will probably be building some kind of prediction service that communicates with an end-user application or with the rest of the system, depending on what problem you solve and how your solution is integrated into the system in general. The main idea is that during model training you are focused on building the model and improving it all the time, while during deployment you are focused on preserving your model, because you don't want different prediction scores from the same model for the same input data. This is one of the main concerns to think about during deployment. There is also a completely different set of issues you have to focus on while deploying, for example infrastructure.
You don't really think about infrastructure when you are training models. Yes, you think about the capabilities of your environment and what kinds of models you can train, but at that moment you don't really think about the production service: what the load on it will be, and how you can plan and organize the infrastructure for your solution. You also don't have to think about integrating the models with the whole system, because you work separately on your research, alone in your own environment, running whatever experiments you want without thinking about the influence of what you do on other components of the system. The same goes for code quality, which is, again, not that important during research. I wouldn't say it is unimportant, but it's not the focus, because we are focused on the experiments, and sometimes it gets messy, especially if you are using Jupyter notebooks, as you may know already. But for deployment, it's highly important to keep the quality of your code high and to write tests. In my opinion, tests are crucial for data scientists, especially for deployment, because during research you have the luxury of being able to inspect your dataset, see how the data changes, and notice when something goes wrong and the calculations don't work as you expected. You don't have that luxury during deployment. You have to be able to verify that your calculations ran in the expected way, to save the state of your data somehow, and to check it. There is also logging, which again is not really required during research, probably because quite a lot of data scientists come from a research background, from applied mathematics and so on.
Usually we don't have a software engineering background, and that's why we use tools which don't really require logging. If all of us were using PyCharm, for example, we would probably use some kind of logging to track our experiments, but in Jupyter notebooks we don't need that, because we can look at the data at any moment and see what's going on. During deployment that is unfortunately impossible, and we have to think about other ways to handle it. Now I'll share some of my mistakes from deploying machine learning models, and unfortunately quite a lot of them I made in production. That's how you learn a lot of things, sometimes the hard way, but let's hope we're never going to repeat them, and that you can learn by just listening to what I'm going to talk about instead of going through it yourself. The first story is about scaling, and it's about the moment I developed my first ever model. It was my first job as a data scientist, and I needed to deploy that model. I was working in a startup, and you know how it sometimes is: really tight deadlines, a fast-paced startup culture, investors coming to the office and trying to get everything they need in a few days. There is a lot of pressure, and especially if you are doing your first project as a data scientist in a startup, you have to do something valuable and do it fast, because it's really hard to prove that this kind of investment is worth it. That's why, I would say, a lot of mistakes happen to many data scientists in their careers. For me, the story was about deploying my first model together with my teammate. We basically created a prediction service that was built entirely around that one model, and it wasn't able to scale to any other model.
Scaling it was super inconvenient; it just wasn't made for that, it was made for one model. But in reality, I think I deployed more than 20 models using that prediction service while trying to scale it, and it was a big pain, because again, it wasn't really made for it. I constantly had different issues related to the fact that I hadn't tried to plan it thoroughly, because I was out of time, I had deadlines, and I felt like, okay, I'll just deploy this one model and think later about how I will deploy the others. But in the future it gets harder and harder to prove that you need more time to change something, especially to redo it from scratch. So the lesson I learned there is that it's super important to plan your deployment thoroughly. Try to communicate with teammates from other teams as well, especially if you don't have experience in architecture or in software development. Talk to people who are more experienced in that area; they can help you create a great architecture and a great plan, or even take on some of the tasks you would otherwise be doing for the first time ever and help you out with those too. Also, there is always a possibility that you will have to scale your solution. Even if you think you're going to build just one model and never add another to this prediction service, you will probably have to maintain it, which means releasing different versions of the same model and retraining it. So you will have to add other versions to the same service, and if it's not made for that, it's going to be a huge pain for a long, long time. It may also seem that planning a good prediction service for deployment takes a lot of time, but if you do it later, it takes even more time and gets even more expensive.
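The lesson about planning for scale can be sketched in code. Below is a minimal, illustrative sketch, not the actual service from the story: a prediction service built around a registry of models rather than hard-wired to a single one, so adding the twentieth model is no harder than adding the first. All names (`ModelRegistry`, `ChurnModel`) are made up for the example.

```python
# Illustrative sketch: a prediction service that can hold many models.
# ModelRegistry, ChurnModel and all names here are invented for the example.

class ModelRegistry:
    """Maps (name, version) pairs to loaded model objects."""

    def __init__(self):
        self._models = {}

    def register(self, name, version, model):
        self._models[(name, version)] = model

    def predict(self, name, version, features):
        try:
            model = self._models[(name, version)]
        except KeyError:
            raise KeyError(f"no model registered as {name!r} {version!r}")
        return model.predict(features)


class ChurnModel:
    """Stand-in for a real trained model; anything with .predict() works."""

    def predict(self, features):
        return 0.42  # stub prediction score


registry = ModelRegistry()
registry.register("churn", "v1", ChurnModel())
print(registry.predict("churn", "v1", {"logins": 3}))  # prints 0.42
```

Releasing a retrained model then becomes registering `("churn", "v2")` next to the old version, instead of rebuilding the service around it.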
Like I said, I deployed over 20 models using this service, and already at that time we all understood that it was getting in our way, that it was inconvenient, that it was holding us back, and that it wasn't great in terms of resource usage and so on. When we started to build another solution, we basically had to throw away everything that was done before and create something completely from scratch, which took a really long time. It was especially hard because we already had a lot of models running in production and we had to keep them consistent, which was again a big task in terms of testing and implementation, and it was hard to move all the models over to the new service. So in the beginning it may seem that you're spending too much time on planning, but it's never too much time if it gets rid of the huge headache you might have in the future. Another story I have is from the time I became the team lead of a team of data scientists. I'm a perfectionist, and for me it's really hard to hand over some of my tasks. At that moment I was participating in model development as much as all of my teammates; I wasn't just managing the team. That's why there were quite a lot of things I was doing entirely by myself, and because there was a lot of pressure, no time, and I was really worried about the quality of what I was doing, I kept some of the tasks to myself and didn't try to delegate them at all. Then there was a time when I went on vacation, and I remember thinking how great it would be to hang out with my friends in Barcelona, in the park, have fun and enjoy the weather.
But on the contrary, I had to go through the work chats and try to help my team debug bugs that occurred in production. They couldn't handle them without me, because they were seeing them for the first time and had no experience with that service: I was the one maintaining the deployment service, and I didn't let anybody else touch it, because I was sure I would be the one to do it best. That taught me that you have to split your work somehow, mostly because you are a human being: sometimes you go on vacation, sometimes you get sick, and so on. When there is just one person responsible for something, it never ends well. So, talking about the lessons I learned: delegation is super important; delegate if you can. It applies more to managers, I would say, than to people who just work on their own tasks, but for data scientists it's also quite important, because we often do work that is usually done by software developers, QA engineers, or DevOps, and we can learn to cooperate with these different people instead of trying to wear all the hats ourselves. Another important issue here is synchronization of tasks. Honestly, it feels great when you can actually delegate something and your team works as one process: even when you are not there, things keep working without your participation, and that's great not just for you but for the product. It also helps you feel like part of a team instead of juggling everything by yourself. And as I said, you can't be everywhere at once and you can't always respond to everything that is happening. So even though it's not super popular among data scientists, the models and the deployment services should all be shared within the team.
Another story is about not being alone, and by that I mean not being focused only on your team of data scientists, because there are so many people with different roles around you as well. At some point we had a roadmap, three months of tough work, and we had to focus on quite a lot of tasks. Other teams, like backend development and frontend development, all had their own tasks as well, and we were all on a tight schedule. Since we only cooperated with the backend team occasionally, it didn't even occur to us at the time that we would have to include them, tell them they would need to do some small things for us, and that our tasks depended on theirs. So it happened that, since it was too late to raise it earlier, on the last night before the deadline we were all sitting together with the backend engineers trying to finish everything, just because I had totally forgotten that my team and I were not working on our solution alone. We were building a part of the system, and we had to cooperate with a lot of people: with QA engineers, with DevOps, and especially with backend engineers during deployment, because our component was just one part of the system and it communicated with another part. They needed to send us requests, we had to respond to those requests, and they had to process our responses somehow. After that, I think I always created a specific task in Jira that included communication with the backend engineers and joint planning of everything where help was required. So the lesson I learned is that you shouldn't rely only on your own team. There are a lot of things you do together across teams, and again, taking all the roles on yourself is not efficient; you're probably not going to be as good at them as someone more experienced who could help you.
It's also important to remember that you are not alone in the company: you're not just doing your own part, which is somehow the most important; you are all doing something together. Also, helping other people understand what kind of work you are doing helps you as well. Educating not just your team of data scientists but the whole company helps them come to you when something happens and ask what they can do for you. Instead of being the one who forces this kind of cooperation and pushes everyone to do something for you, you build that relationship up front and explain from the start what the essence of a data scientist's work is, how they can help you, and how you can help them. So the next story is about data scientists and the supposed lack of need for tests. The prediction service I was talking about in the previous stories, we deployed it without tests, zero tests. When we do research, as I said, we don't really write tests, because we can check everything at any point. But during deployment there are so many things we need to be attentive to: not just the code itself, but also the data, how it is processed and dispatched. There are so many things we should test about the state of the data. So what happened? It wasn't really just one story; I had quite a lot of cases where we had to debug something, trying to understand what went wrong. For example, a new category was added to one of the categorical variables we used in the models, we didn't expect that, and everything just broke down. Or you suddenly get missing values in a feature that didn't have any missing values when you trained the model. Again, something breaks and you don't get any response from the model, instead of handling all these cases.
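Both failure modes from this story, an unseen category and unexpected missing values, can be caught by simple input-validation tests before they break the service. Here is a minimal sketch; the feature names (`plan`, `age`) and the `validate_features` helper are made up for illustration:

```python
# Illustrative sketch: guard a deployed model against inputs that violate
# training-time assumptions. Feature names ("plan", "age") are invented.

KNOWN_PLANS = {"free", "pro", "enterprise"}  # categories seen at training time

def validate_features(row):
    """Raise ValueError instead of letting the model silently break."""
    if row.get("plan") not in KNOWN_PLANS:
        raise ValueError(f"unseen category for 'plan': {row.get('plan')!r}")
    if row.get("age") is None:
        raise ValueError("'age' had no missing values at training time")
    return True

# The two failure modes from the story:
assert validate_features({"plan": "pro", "age": 30})  # a valid row passes
for bad_row in ({"plan": "trial", "age": 30},   # new, unseen category
                {"plan": "pro", "age": None}):  # unexpected missing value
    try:
        validate_features(bad_row)
        raise AssertionError("validation should have failed")
    except ValueError as err:
        print("caught:", err)
```

Checks like these run in the test suite against saved sample inputs, and the same function can guard the live service so a bad request fails loudly instead of producing a wrong score.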
So the lesson I learned is that, in my opinion, software engineering doesn't require as many tests as data science does, because we don't just work with code, we constantly work with data, and it's super important that everything we do with the data, every feature, is calculated the way it should be. We have to check different cases related to the different ways we calculate features, and there are many more things on top of that, like testing the distribution of the data later on and how it changes. Also, it's easier to write tests than to fix bugs in production, which makes sense, because sometimes you're out of time, you have a tight schedule, you need to fix something really fast, and you don't even have logging, you don't have tests, you can't understand what is happening, you just have to debug through it. That takes a lot of time and a lot of nerves, and it's better to be prepared. Another story I wanted to talk about is monitoring the models. Basically, it's about the past staying in the past and us forgetting that we have to maintain our solutions. It's interesting that even some quite big companies have this habit of not monitoring what their models are doing. I had an experience like that: you come to a big company and they haven't monitored anything for years, and they only fix something when a client tells them something is going wrong, which I think is a really bad approach. One of the things I did, again the first time I deployed models, was that I didn't monitor how they performed. I checked my metrics only when I needed to, more on request than constantly.
At some point my models were performing worse and I didn't understand why, because they were my first models, and I was thinking maybe I should change something about how they were built, or change something about the features. There were so many options for what could have gone wrong. And then I found out about a thing called data shift, which happens when the data you use for your models changes over time. For example, patterns of user behavior change: instead of ordering some products in sequence A, users change the sequence in which they order those products, and it breaks the model, but not in a way that you can see. That's why monitoring is crucial for so many things. It helps you understand when you need to improve your solution, or maybe change it completely, and it's another way to test your model and to catch errors which are more logical errors than bugs in the code. So the lesson I learned is that monitoring is a lot about the result of your solution. As data scientists, we don't build models just for the sake of building models, even though it's a lot of fun, but because we are trying to achieve something. If we don't track how the models work in production, we don't know whether we took the right approach, whether we have to improve it somehow, or what to do about it in general. We don't really understand whether our solution is valuable enough, whether it's worth the effort we made. Another thing we talked about in the beginning is the process of model development. There is a lifecycle of a model, and it doesn't include just model training. The next stage is usually model deployment, and the stage after that is monitoring. So deployment is just a part of the lifecycle. The model doesn't die when you deploy it; it doesn't cease to exist.
It's still there, it's working, and all the work you did before was just part of the whole. It's not like you finished the project and that's it. You have to maintain your model over time and check, for example, how the data changes, how your model changes, what the distribution of prediction scores looks like, and so on. Another thing is that you may have tests, which is great, but they don't catch mistakes like "the model started to work less accurately"; you don't really have tests like that. Monitoring helps you check the logic of your solution and catch, I would say, data science errors rather than just bugs in the code. And another point: I genuinely think that a data scientist's work is not just about building models and predicting something, or making something more efficient. It's more about finding answers to questions like: why is this happening, why can we use this, how can we improve it? And monitoring helps us answer those questions. So, summing up all the stories I told you and what I learned from them: you should always expect and plan for your solution having to scale later on. There is no situation where planning for scale makes your solution worse. Even now, when I build something and think about scaling, I still sometimes make mistakes and find that I failed to predict some things and should have done something better, even though I had already tried to plan for scaling, imagining that I would add more pipelines here, or more models there, so that I would be deploying more models with the same service.
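As a sketch of the kind of monitoring check that catches what unit tests can't, such as the data shift from the earlier story, here is a minimal, illustrative drift check using the Population Stability Index (PSI). The bin fractions are made up, and the 0.2 threshold is a common rule of thumb, not something from the talk:

```python
# Illustrative sketch: detect data shift by comparing the live feature
# distribution against the training-time one with the Population
# Stability Index (PSI). All numbers below are invented for the example.
import math

def psi(expected, actual, eps=1e-6):
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) on empty bins
        total += (a - e) * math.log(a / e)
    return total

# Per-bin fractions of one feature: at training time vs. in production.
train_dist = [0.25, 0.50, 0.25]
live_dist = [0.10, 0.40, 0.50]  # users changed their ordering pattern

score = psi(train_dist, live_dist)
print(f"PSI = {score:.3f}")  # prints PSI = 0.333
# Common rule of thumb: PSI above 0.2 signals a significant shift.
if score > 0.2:
    print("ALERT: feature has drifted; time to investigate or retrain")
```

Run on a schedule over each important feature and over the prediction scores themselves, a check like this turns a silent accuracy drop into an alert instead of a surprise from a client.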
Also, it's super important to work as a team, and unfortunately I don't see that it's really a thing in the data science field, because we are used to doing our research on our own, and we don't really share tasks that much. But in my opinion it actually helps when someone can at least review what you are doing, because sometimes we tend to focus too much on some things and miss something, or we just get stuck at some point. Also, for managers, and probably for any data scientist, it's important to remember that you don't have to do everything. You should delegate, and there are a lot of people who can be better at it than you think. Another important point is being able to cooperate and work with other teams and other departments. There is no such thing as a data science team working completely separately from everyone else, doing something unrelated to the work of other teams; that is usually not true. We especially get to cooperate a lot with software engineers, QA engineers, and DevOps, and it's super important to educate your teammates in the company, to build communications, and to be ready to cooperate. Also, as I said, data scientists probably need tests even more than software engineers or anyone else, because we don't just deal with code, where our software engineering background is usually not that strong; we also deal with data, and it changes state all the time, so we have to be able to track those states. And another point: model monitoring is the key to understanding whether we are doing something valuable, whether our model is performing well, whether it's performing in an expected way, and how we can improve it.
It basically answers all the questions for future planning, and we don't have to feel like we don't know what's happening when the business side asks us what's going on, what we can fix, and why it's not working. Monitoring is something that helps us stay in control, maintain the models, and remember that their lifecycle goes on and that they don't just disappear after we do our research and get the most accurate model. So thank you for your attention, and I hope this presentation was useful for you. I hope you're not going to make all of these mistakes, but even if you do, remember that all of them are just lessons that we learn. Thank you.

Marianna Diachuk

Data Scientist @ Restream
