Conf42 MLOps 2025 - Online


From Lab to Production: Building Scalable MLOps Pipelines That Actually Work - A Data-Driven Blueprint for ML Success


Abstract

87% of ML projects fail to reach production. Discover proven frameworks that help Fortune 500 companies deploy models 5x faster with 60% higher performance. Get actionable MLOps pipelines and governance templates you can implement today. Transform your ML experiments into production winners.


Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. I work as a software engineer at Meta. In this talk I will be covering scalable MLOps pipelines that actually work: essentially, the best practices that help move a model from development on a laptop to production.

First, let's talk about the existing challenges. Looking at the global landscape, ML deployments are growing at a rapid pace, roughly 40% annually or probably more by now, yet success remains extremely low. The statistics suggest that almost 87% of ML projects never reach production; many of them die during or even before the experimentation phase. Even among the 13% that do reach production, there are a lot of operational challenges, and they may not deliver the success metrics we want to see in production environments. Those can be challenges around model drift, infrastructure bottlenecks, or simply the operational complexity of how the pipelines were built.

So what will we be covering in this talk? Here is a brief agenda. First, we will look at the current state of enterprise MLOps and what the landscape looks like. Then we will talk about MLOps architectures aimed at efficient model development, deployment, management, and monitoring. The next important aspect is implementation strategy: the technical side of putting the right methods in place for model validation and testing, ML workflows, and seamless ML CI/CD deployment pipelines. Another important aspect is organizational transformation. It is not only about the tools; it is also about how we function organizationally so that the data science, engineering, and operations teams work together to make the whole journey a success. Finally, we will touch on emerging trends, so that we are building these pipelines and MLOps practices for the future.

Starting with the current state, there are two extremes we can look at: purely manual, low-maturity processes on one side, and fully automated, high-maturity MLOps systems on the other. On the manual side, things usually start with ad hoc experimentation: mostly manual work, no automation, primarily engineers or scientists building pipelines and testing them ad hoc. A step up from that is minimal automation, with limited pipeline automation for scheduled jobs, maybe a few small cron jobs, but that is about it; it is still a manual, low-maturity way of operating.
At the high-maturity end of the spectrum, we first see team-specific MLOps systems. Every team or company has its own internal tools for how models are packaged, deployed, and tested, and those constructs differ quite a bit across companies. Having consistent, repeatable workflows through the whole lifecycle of model development, deployment, and testing is extremely important, and that consistency is itself a marker of maturity for any MLOps system. Beyond that sits enterprise MLOps: full automation, data driven, where metrics are captured at every stage of the model lifecycle. During development, testing, and even deployment, a lot of testing is baked into every phase, and based on those metrics we proceed with the deployment, roll back, or move to a different version; we will touch on all of this shortly.

To elaborate on the same idea as maturity levels: Level 0 is mostly ad hoc experimentation. People just start with something new, purely manually, with no standardized workflows at all; collaboration is fragmented, there are no defined success metrics, and model performance lags. The next level of maturity is pipeline automation. Where before there was no automation, we now introduce some basic automation, but there are still gaps: no really good governance frameworks, teams that remain siloed, and monitoring that is not robust enough to identify model drift or the performance issues we see in production. That is the state of Level 1. Level 2 is a little more advanced and mature, centered on continuous integration. We have standardized, repeatable pipelines, so you can define how a model should move through the system and make that path seamless and repeatable. We also have robust version control, where every model gets a version. Automated testing is extremely important here; when we deploy a model, having automated tests at every phase saves a lot of manual effort. Proactive, sound monitoring is also critical when we talk about enhancing these MLOps pipelines. The final level is best in class: end-to-end automation with very little or no manual effort, comprehensive governance and compliance policies, centralized MLOps pipelines, and advanced observability with continuous optimization. These are extremely sophisticated MLOps pipelines built with monitoring in mind, where metrics are baked into the whole pipeline during model deployment and testing.
All the checks around governance and compliance are also baked into that end-to-end pipeline. Now let's talk in a bit more detail about the primary obstacles to ML success. The first is organizational silos. For any initiative to succeed, it is extremely important for teams to collaborate and work closely together; in some studies, almost 73% of failed ML initiatives are directly linked to inadequate collaboration and governance. There are other limitations as well: infrastructure that is not strong enough for what we want to serve to customers, and insufficient monitoring, because if we don't have monitoring, we don't know what's going on. Then there are data quality issues: garbage in, garbage out, so having truly high-quality data to train on and build models from is essential. There is also lack of standardization; without standard tools, technologies, and pipelines for deploying models at scale, things become extremely difficult. And finally there are governance gaps. Looking at some of the metrics, roughly 56% of production ML failures are directly linked to data quality issues, around 38% to insufficient monitoring, and around 45% to lack of standardization. These are ballpark numbers, but they show how important it is to have these systems in place so that we are building against these obstacles and setting ourselves up for success.

The next important topic is the architectural patterns we can put in place to build a really robust system. The first is a modular component architecture: decomposable components. An MLOps pipeline has multiple steps or components, and the core idea is that every component should be able to be iterated on, deployed, and improved independently.

The next pattern is a centralized feature store. For any model development team, features are the most important inputs; they are what gets baked into the model and used for predictions. Having a consistent, standardized feature store guarantees consistency between the training and inference environments, which is extremely important. A minimal sketch of what that looks like in practice follows below.
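As an illustration only (not from the talk), here is a minimal sketch using the open-source Feast feature-store API, assuming a Feast feature repository with a hypothetical `user_stats` feature view keyed by `user_id`. The point is that training and serving resolve the exact same named features, which is what gives the training/inference consistency described above.

```python
# Minimal sketch: one feature definition serves both training and online inference.
# Assumes an existing Feast repo with a hypothetical "user_stats" feature view;
# the feature and entity names are illustrative, not from the talk.
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")
features = ["user_stats:avg_purchase_7d", "user_stats:session_count_30d"]

# Training: point-in-time-correct historical features joined to labelled events.
entity_df = pd.DataFrame({
    "user_id": [1001, 1002],
    "event_timestamp": pd.to_datetime(["2025-01-01", "2025-01-02"]),
})
training_df = store.get_historical_features(
    entity_df=entity_df, features=features
).to_df()

# Serving: the same feature definitions, read from the low-latency online store.
online_features = store.get_online_features(
    features=features, entity_rows=[{"user_id": 1001}]
).to_dict()
```

Because both paths go through the same feature definitions, training/serving skew from hand-rolled feature code becomes much less likely.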
Another important pattern is reproducible training pipelines. We want to deploy models at scale and release new models as frequently as possible so that we are giving our customers the best experience. With that as the goal, reproducible training pipelines avoid a lot of manual effort, and that consistency and reproducibility eventually accelerates development iteration. The last pattern is automated validation baked in: having automated validation at every stage of the process is extremely important as we scale our systems.

So how do we build a robust MLOps pipeline? Solid model lifecycle management matters a lot here; it helps deliver roughly three times better quality monitoring and reduces production incidents by almost half. What are the stages we usually look at in the model lifecycle? First is data engineering, which provides all the essential data, whether that is data required for feature engineering or for other purposes. Once we have the data, the next part is model development, where ML engineers or applied scientists take that data and build robust training pipelines to produce strong models for the use case. Once a model is developed, the next step is validation: models are usually trained on a training data set, so we have to check how they behave on real, production-like data. Are we seeing the right kind of predictions? If the model is not performing well, it can be various things: model accuracy issues, performance problems in production, and so on. Having validation at that step is extremely important. Once the model clears validation, it moves to deployment in production and scaling. And once it is deployed, the most important thing is monitoring: how is the model performing, what do its predictions look like, and is the sentiment of the users interacting with our system positive or negative? Feeding those signals back into our systems helps us improve our models tremendously over time. So end-to-end monitoring and feedback loops that flow back into model development and deployment are essential to building any MLOps pipeline. A simple sketch of this lifecycle, with a validation gate in front of deployment, follows below.
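As an illustration only, here is a minimal Python sketch of that lifecycle with a validation gate before deployment. The dataset, model choice, and the 0.80 accuracy threshold are hypothetical placeholders, and `deploy` stands in for whatever registry or serving platform a team actually uses.

```python
# Minimal, illustrative lifecycle: data -> train -> validate -> deploy.
# Thresholds and the deploy step are placeholders, not a real platform API.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

ACCURACY_GATE = 0.80  # hypothetical acceptance threshold

def build_dataset():
    X, y = make_classification(n_samples=5000, n_features=20,
                               n_informative=10, random_state=42)
    return train_test_split(X, y, test_size=0.2, random_state=42)

def train(X_train, y_train):
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return model

def validate(model, X_holdout, y_holdout) -> bool:
    acc = accuracy_score(y_holdout, model.predict(X_holdout))
    print(f"holdout accuracy: {acc:.3f}")
    return acc >= ACCURACY_GATE

def deploy(model):
    # In a real pipeline this would push a versioned artifact to a model
    # registry and roll it out through the serving layer.
    print("model promoted to production")

if __name__ == "__main__":
    X_train, X_holdout, y_train, y_holdout = build_dataset()
    model = train(X_train, y_train)
    if validate(model, X_holdout, y_holdout):
        deploy(model)
    else:
        print("validation gate failed - keeping the current production model")
```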
The next aspect I want to touch on is the technical implementation of a model validation framework: what exactly do we want to validate? We now have a developed model, so as we build these MLOps pipelines we need to think about which checks to build in. The first thing is the data: everything in ML starts with the data, and how good our data is determines how good our models are. Garbage in, garbage out. So it is extremely important to start with data validation. Whatever the use case, we need a data set, and while creating it we want schema enforcement, distribution checks, drift detection, and so on, so that we are working with very clean data. That is the first step of any validation.

The next thing is model accuracy. Once the model is deployed in production, how do we measure how good it is: its performance in production, and whether it aligns with the key performance metrics? That is another aspect of validation. Then there are the operational concerns. The first is latency: even if accuracy is very good, if the model's predictions take too long, customers may not like it. Users expect fast responses, so validating that the model provides timely predictions is important as well. Beyond latency there are other operational metrics such as resource utilization and throughput analysis. The last one is ethical validation: is there any bias coming out of this model? Are its responses consistent and fair? That is also an extremely important property to measure and test against.

What are the best practices here? Define clear acceptance criteria, with a pass or fail threshold for each validation step, as we build these production systems. Integrate those checks into the CI/CD pipelines rather than making them manual, so they are enforced automatically every time. And keep a history, so we can track how model quality and the deployment and validation process are evolving; that is a very good indicator of our maturity and of how we are progressing as an MLOps team. A minimal sketch of such automated pass/fail gates follows below.
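As an illustration only, here is a minimal Python sketch of pass/fail gates covering schema enforcement, distribution drift, and a latency budget. The column names, thresholds, and synthetic data are all hypothetical; a real CI job would run checks like these against the team's actual datasets and serving logs.

```python
# Illustrative pass/fail validation gates that a CI job could enforce before
# promoting a model. Schema, thresholds, and data below are hypothetical.
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

EXPECTED_SCHEMA = {"age": "int64", "income": "float64", "clicks_7d": "int64"}
DRIFT_P_VALUE = 0.01        # fail if distributions differ at this significance
P99_LATENCY_BUDGET_MS = 50  # hypothetical latency budget

def check_schema(df: pd.DataFrame) -> bool:
    actual = {col: str(dtype) for col, dtype in df.dtypes.items()}
    return actual == EXPECTED_SCHEMA

def check_drift(train_col: pd.Series, live_col: pd.Series) -> bool:
    # Two-sample Kolmogorov-Smirnov test: a tiny p-value suggests drift.
    return ks_2samp(train_col, live_col).pvalue >= DRIFT_P_VALUE

def check_latency(latencies_ms: np.ndarray) -> bool:
    return np.percentile(latencies_ms, 99) <= P99_LATENCY_BUDGET_MS

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_df = pd.DataFrame({
        "age": rng.integers(18, 70, 1000).astype("int64"),
        "income": rng.normal(60_000, 15_000, 1000),
        "clicks_7d": rng.poisson(5, 1000).astype("int64"),
    })
    live_df = train_df.sample(500, random_state=0)        # stand-in for live data
    latencies = rng.gamma(shape=2.0, scale=5.0, size=1000)  # stand-in for serving logs

    results = {
        "schema": check_schema(live_df),
        "drift(income)": check_drift(train_df["income"], live_df["income"]),
        "latency": check_latency(latencies),
    }
    print(results)
    if not all(results.values()):
        raise SystemExit("validation gate failed - blocking promotion")
```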
The next thing I want to touch on is the technical implementation of CI/CD. For any CI/CD system, continuous integration and continuous delivery are both essential. For continuous integration, a few things matter: model quality validations during the integration step, versioning of model artifacts as they are built, with the right metadata attached, and dependency and environment management baked into the same process. The other part is continuous delivery. When we deploy models at scale into a production environment, it is usually containerized, and there are many tools commonly used here, such as Triton Inference Server and Docker containers. The thing to keep in mind is how to build consistent containerization of the model-serving framework, along with environment-specific configurations and infrastructure-as-code deployments.

Another important piece is canary or blue-green deployment. When we want to push a model into production, there are various deployment strategies to choose from. With a canary deployment, we expose some percentage of production traffic to the new version of the model and see how it performs; based on the metrics emitted by the new model, we either promote it to a higher percentage of production traffic, eventually reaching one hundred percent, or roll it back. A blue-green deployment means standing up a parallel green environment alongside the blue production environment, passing some traffic through the new model version, and, if the metrics look good, promoting the green deployment to become production. These are the methodologies we primarily use with microservice deployment architectures.

Once we build and deploy, the other important thing, as discussed earlier, is monitoring: real-time performance metrics, drift detection and alerts, and feature distribution monitoring. Correlating all of these with business metrics is also extremely important. Consistent monitoring gives a much better view of how the system is performing and what we can improve.

The last thing I want to touch on here is A/B testing of models. This is a very well-known framework used by ML teams; enterprises with automated A/B testing achieve roughly 40% faster model iteration cycles and 25% higher performance improvements compared to manual testing approaches. So what is A/B testing? When we have two different versions of a model, we expose customers to both versions, observe which performs better, whether in a recommendation system or any other use case we are building, and use those signals to promote one of them. There are a few aspects to it. Traffic allocation: dynamically routing a clear share of traffic to each model variant, ideally configurable. Performance measurement: accurate tracking across both business metrics, such as whether user sentiment is positive or negative, and technical metrics, such as the latency numbers mentioned earlier. And statistical analysis: rigorous analysis so that deployment decisions stay data-driven. A minimal sketch of the routing and significance check follows below.
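As an illustration only, here is a minimal Python sketch of deterministic traffic allocation plus a chi-square significance check on a success metric such as click-through. The 10% canary fraction, the user IDs, and the counts are hypothetical.

```python
# Minimal sketch of A/B (canary) routing and a significance check for a
# candidate model. Fractions, counts, and names below are hypothetical.
import hashlib
from scipy.stats import chi2_contingency

CANARY_FRACTION = 0.10  # share of traffic sent to the candidate model

def route(user_id: str) -> str:
    # Deterministic hashing keeps each user on the same variant across requests.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "candidate" if bucket < CANARY_FRACTION * 100 else "production"

def significant_improvement(prod_successes: int, prod_total: int,
                            cand_successes: int, cand_total: int,
                            alpha: float = 0.05) -> bool:
    # 2x2 contingency table of successes vs. failures per variant.
    table = [
        [prod_successes, prod_total - prod_successes],
        [cand_successes, cand_total - cand_successes],
    ]
    _, p_value, _, _ = chi2_contingency(table)
    prod_rate = prod_successes / prod_total
    cand_rate = cand_successes / cand_total
    return p_value < alpha and cand_rate > prod_rate

if __name__ == "__main__":
    print(route("user-42"))  # -> "candidate" or "production"
    # Example click-through counts gathered from monitoring (hypothetical).
    promote = significant_improvement(4_100, 90_000, 520, 10_000)
    print("promote candidate" if promote else "keep production model")
```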
So far we have talked about the challenges, the practices we can put in place, and testing frameworks such as A/B testing. Now let's look at it functionally: how do we organize so the whole effort is built for success? In any ML organization there are primarily two groups, the data science team and the ML engineering team, and the collaboration between the two is what leads to MLOps excellence. These are people with mixed knowledge of both the data science side and the engineering side, and that core collaboration is what builds the typical pieces of an MLOps pipeline. A few things I want to mention here. Cross-functional MLOps teams are extremely important for working across teams to build consistent end-to-end ML products. A shared accountability model means joint ownership of ML performance, operational health, and business outcomes; it is not one team's or one person's responsibility, it is a joint responsibility. And an MLOps center of excellence, a central team that maintains best practices and the governance framework and enables self-service MLOps, is extremely valuable as well.

I also want to touch on a few aspects of cost optimization. GPU hardware is expensive, so for any ML pipeline there are a few things to keep in mind so that we build in a much more efficient way. The key cost drivers are: compute cost, since GPUs and other hardware are extremely expensive; data storage and processing, including the data pipelines and feature stores that generate this data; tools and platforms, meaning the MLOps pipelines, monitoring systems, and specialized tooling that help us run this monitoring at scale; and operational overhead, such as system maintenance and support incidents.

A few strategies help here. Resource autoscaling, which many cloud providers offer out of the box, lets the infrastructure dynamically match workload demand; if requests grow, automatic knobs scale capacity up. Model efficiency uses ML techniques such as pruning, quantization, and distillation to reduce the footprint and size of models while keeping close to the same performance; a minimal sketch of quantization follows below. And process automation: complete end-to-end automation is an important part of the cost strategy as well. Enterprises that implement these strategies achieve significant cost reductions and, just as importantly, visibility. Knowing where the money goes is essential while building these systems, so we know where to increase spending and where to optimize.
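As an illustration only, here is a minimal PyTorch sketch of post-training dynamic quantization, one of the model-efficiency techniques mentioned above. The toy architecture is purely hypothetical, and a real deployment would also re-validate accuracy after quantization.

```python
# Minimal sketch: shrink a model's serving footprint with dynamic quantization.
# The toy architecture is illustrative; re-check accuracy before shipping.
import os
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(512, 1024),
    nn.ReLU(),
    nn.Linear(1024, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Quantize the Linear layers' weights to int8; activations stay in float.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

def size_on_disk_mb(m: nn.Module, path: str) -> float:
    torch.save(m.state_dict(), path)
    size_mb = os.path.getsize(path) / 1e6
    os.remove(path)
    return size_mb

print(f"fp32 model: {size_on_disk_mb(model, 'fp32.pt'):.2f} MB")
print(f"int8 model: {size_on_disk_mb(quantized, 'int8.pt'):.2f} MB")
```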
Since we have covered many other aspects, I now want to touch on the future. This is an evolving space that is growing rapidly, so how do we make sure we are building systems for the future? A few things to keep in mind. MLOps observability: gaining insight into model behavior through complete monitoring and end-to-end metrics, as I mentioned earlier. AI governance: implementing frameworks for responsible AI deployment is extremely important nowadays, along with auditing, bias detection, and compliance controls. Federated learning: training models across decentralized data sources, which is an increasingly important approach and obviously improves privacy and security. And AutoML: autonomous, continuous model improvement covering feature selection, feedback loops, and hyperparameter tuning, a complete end-to-end cycle we can use to improve our models at scale.

Briefly, the key takeaways. Start with clear governance: establish robust governance and model lifecycle management practices, and make sure there is a solid foundation for the ML infrastructure. Build modular architectures: decomposable units that can be built, deployed, and scaled independently. Automate ruthlessly: automation is critical as we scale our infrastructure and systems, so look for opportunities to automate any manual process. Integrate teams: if teams are siloed, it is extremely important to break that down and establish consistent expectations and collaboration. Looking at the statistics again, enterprises with mature MLOps deploy almost five times faster and achieve 60% higher model performance in production, which are important numbers. Having these tools and processes, and building these consistent pipelines, can definitely improve ML systems at scale. That is everything I wanted to cover for this talk. Thank you very much for tuning in. Have a nice day.

Srikanth Vissarapu

@ Meta US


