Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, everyone.
My name is Barta.
I have around nine years of experience in Salesforce, cloud technologies, and AI-driven solutions.
I have worked in the finance, healthcare, and real estate industries.
Today we are going to talk about a problem that resonates across industries.
Why so many promising machine learning models never make it into production?
And how MLOps practices can help bridge that gap.
Think of this talk as a roadmap from research to production-ready systems, covering the challenges, orchestration, infrastructure, monitoring, and CI/CD.
Today's agenda: our journey will move across five pillars.
The MLOps challenge: why do so many models get stuck in the lab?
ML pipeline orchestration: tools and techniques to automate workflows.
Production infrastructure: how to make ML systems scalable and reliable.
Monitoring and observability: ensuring models continue to deliver value after deployment.
CI/CD for ML: adapting DevOps best practices to machine learning.
By the end of this session, you will have a complete picture of what MLOps maturity looks like and how organizations can get there.
The MLOps challenge.
Let's start with the problem. Studies show that only 20% of ML models ever make it to production.
That means four out of five projects never deliver value.
That's according to IBM research.
Why does this happen?
Some key barriers include reproducibility gaps between research and engineering, lack of a standardized deployment process, insufficient monitoring and maintenance frameworks, inadequate testing methodologies, particularly for ML systems, and poor integration with enterprise systems.
I am sure you have seen brilliant proofs of concept die in the lab because of these exact issues.
The MLOps evolution.
Over the years, MLOps has matured in stages. Manual ML processes: isolated notebooks, one-off deployments, no repeatability.
Then ML pipeline automation: basic training automation, but limited production integration.
CI/CD for ML: bringing DevOps rigor, automated testing, structured releases, and consistent deployment.
Full MLOps maturity: end-to-end automation, monitoring, drift detection, retraining, and rollback strategies.
Organizations that reach full maturity reduce deployment times by 80% and sustain 95% accuracy in production environments.
This is not just a technical transformation.
It's a cultural one, where data scientists, engineers, and operations teams collaborate seamlessly.
ML pipeline orchestration.
Now let's dive into orchestration.
Imagine the ML lifecycle as a factory assembly line.
If any step is manual or fragile, the whole process slows down.
A fully automated ML pipeline should cover data prep: cleaning, feature engineering, and validation; model training: hyperparameter tuning and experiment tracking; model evaluation: testing against holdout datasets and validating business KPIs; deployment: packaging and serving models consistently; and monitoring: a continuous watch for drift, errors, and performance issues.
Without this orchestration, every new model is a reinvention of the wheel.
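As a minimal sketch, assuming scikit-learn and an illustrative dataset with `amount` and `label` columns, those stages might be wired together like this:

```python
# A minimal sketch of the pipeline stages above, wired together as plain Python
# functions. The CSV path, column names, and model choice are illustrative
# assumptions, not a specific orchestration framework.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def prepare_data(path: str) -> pd.DataFrame:
    # Data prep: cleaning and simple feature engineering.
    df = pd.read_csv(path).dropna()
    df["amount_log"] = np.log1p(df["amount"])
    return df


def train_model(df: pd.DataFrame):
    # Model training with a holdout set kept aside for evaluation.
    X, y = df[["amount_log"]], df["label"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    model = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
    return model, X_test, y_test


def evaluate_model(model, X_test, y_test, kpi_floor: float = 0.8) -> bool:
    # Model evaluation: test against the holdout set and gate on a business KPI.
    return f1_score(y_test, model.predict(X_test)) >= kpi_floor


def run_pipeline(path: str) -> None:
    # Orchestration: each stage runs only if the previous one succeeds.
    df = prepare_data(path)
    model, X_test, y_test = train_model(df)
    if evaluate_model(model, X_test, y_test):
        print("Passed the gate; hand off to deployment and monitoring.")
    else:
        print("Failed evaluation; do not deploy.")
```

In a real orchestration tool, each of these functions would become a pipeline step or DAG task rather than a plain function call.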
ML pipeline orchestration components.
Different teams choose different orchestration tools based on their ecosystem. For example, Kubeflow.
Kubeflow is a Kubernetes-native platform. Great if your org is already invested in Kubernetes; it has strong model serving and pipeline libraries.
MLflow: it's lightweight, with experiment tracking, a model registry, and it's language-agnostic. Perfect for experimentation-heavy teams.
And Apache Airflow: a general workflow engine. It is not ML-specific, but it's great for complex, data-heavy processes.
In practice, many organizations mix and match these tools. For example, you might use MLflow for experiment tracking while deploying models with Kubeflow.
When picking tools, always ask: does this fit our workflow and our skill set, and are we forcing unnecessary complexity on the team?
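To make the experiment-tracking side concrete, here is a minimal MLflow sketch; the experiment name, model, and parameter values are illustrative assumptions rather than a recommendation of one stack:

```python
# Minimal MLflow experiment-tracking sketch. The experiment name, model, and
# parameter values are illustrative; assumes `pip install mlflow scikit-learn`.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

mlflow.set_experiment("demo-classifier")

X, y = make_classification(n_samples=1_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    params = {"n_estimators": 200, "max_depth": 8}
    model = RandomForestClassifier(**params, random_state=42).fit(X_train, y_train)

    mlflow.log_params(params)  # hyperparameters for this run
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")  # model artifact, ready for the registry
```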
Production infrastructure.
Even the best models fail without the right infrastructure. Here is what modern ML infrastructure looks like.
Infrastructure as code with Terraform or Pulumi, ensuring environments are reproducible.
Containerization with Docker to package dependencies and eliminate "it works on my machine."
Deployment automation, like CI/CD pipelines, pushing code and models across dev, test, and prod.
Scalability: Kubernetes auto-scaling for spikes in production.
Cost optimization: dynamic scaling and right-sizing.
The main goal is to build infrastructure that is as reliable as the models themselves.
Real-time inference architecture.
Many applications, fraud detection, chatbots, recommendation engines, require real-time predictions.
That means 50 to 200 millisecond latency.
Key design considerations: horizontal scaling, Kubernetes clusters that scale out with demand; load balancing, ensuring traffic is spread evenly, with health checks; caching, using Redis for frequent queries to reduce compute; and hardware acceleration, GPUs or TPUs wherever needed.
One example: for a healthcare client, real-time insurance eligibility checks reduced wait times from minutes to milliseconds because the model ran on an optimized, auto-scaling architecture.
Real-time inference architectures come down to auto-scaling, load balancing, caching layers, and hardware acceleration. These are the key things.
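As a rough sketch of the caching layer in front of a model endpoint, assuming FastAPI, a Redis server on localhost, and a stand-in scikit-learn model; the route shape and five-minute TTL are assumptions, not details from the case study:

```python
# Sketch of a low-latency inference endpoint with a Redis cache in front of the
# model. Assumes a Redis server on localhost and uses a stand-in scikit-learn
# model; the route shape and TTL are illustrative.
import json

import redis
from fastapi import FastAPI
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

app = FastAPI()
cache = redis.Redis(host="localhost", port=6379)

# Stand-in for the real model you would load from a registry or artifact store.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)


@app.post("/predict")
def predict(features: list[float]):
    key = json.dumps(features)  # identical frequent queries hit the cache
    cached = cache.get(key)
    if cached is not None:
        return {"prediction": int(cached), "cached": True}

    prediction = int(model.predict([features])[0])
    cache.setex(key, 300, prediction)  # keep hot results for five minutes
    return {"prediction": prediction, "cached": False}
```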
Monitoring and observability.
Deploying a model is not the end; it's the beginning.
Models drift as data changes.
Monitoring covers three layers.
Data drift: statistical checks, KL divergence, and the population stability index.
Model performance: accuracy, F1, AUC; we have to continuously track against those benchmarks.
Operational metrics: latency, throughput, and resource utilization are the main ones.
If you are not monitoring, you are flying blind, and that's dangerous in production.
We have to maintain this properly.
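As one concrete data-drift check, here is a small population stability index (PSI) sketch in numpy; the ten bins and the 0.2 alert threshold are common conventions assumed here, not hard rules:

```python
# Population stability index (PSI) sketch for data-drift checks, using numpy.
# The bin count and the 0.2 alert threshold are common conventions, not fixed rules.
import numpy as np


def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    # Bin both distributions on the training (expected) data's quantiles.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e_pct = np.histogram(expected, edges)[0] / len(expected)
    a_pct = np.histogram(actual, edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)  # avoid log(0) and division by zero
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))


rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 10_000)  # what the model was trained on
live_feature = rng.normal(0.4, 1.2, 10_000)   # what production is seeing now
score = psi(train_feature, live_feature)
print(f"PSI = {score:.3f}", "-> drift alert" if score > 0.2 else "-> stable")
```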
Model monitoring systems and automated retraining deployments.
How do we respond when drift is detected?
Automated retraining. Common triggers: performance drops below a threshold; time-based schedules, weekly or monthly retraining; data-based triggers, a significant distribution shift.
And we have to have deployment strategies in place.
Shadow deployments: new models run silently in parallel.
Canary releases: roll out to five percent of the traffic before scaling up.
A/B testing: measuring the business impact. We have to test it thoroughly.
And automated rollback: if the new model underperforms, if it is not performing perfectly, we have to roll it back. We need these deployment strategies to be in place.
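As a sketch, the promote-or-rollback gate for a canary might look like this; the guardrail metrics and tolerances are assumptions chosen to illustrate the decision, not prescribed values:

```python
# Sketch of the promote-or-rollback decision during a canary release.
# The metrics, the 5% traffic split, and the tolerances are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    p95_latency_ms: float
    error_rate: float


def canary_decision(baseline: Metrics, canary: Metrics) -> str:
    # Roll back if the canary underperforms the current model on any guardrail.
    if canary.accuracy < baseline.accuracy - 0.02:
        return "rollback: accuracy regression"
    if canary.p95_latency_ms > baseline.p95_latency_ms * 1.2:
        return "rollback: latency regression"
    if canary.error_rate > baseline.error_rate * 1.5:
        return "rollback: error-rate regression"
    return "promote: scale canary from 5% to 100% of traffic"


print(canary_decision(Metrics(0.95, 80, 0.01), Metrics(0.96, 85, 0.01)))
```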
These combinations ensure both agility and safety, particularly for automated retraining.
We spoke about the triggers: performance-based, time-based, and data-based.
Those are the main key things.
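Putting those triggers together, a retraining decision can be sketched as simply as this; the accuracy floor, thirty-day schedule, and PSI limit are assumed values:

```python
# Sketch of the three retraining triggers from the talk: a performance threshold,
# a time-based schedule, and a data-based drift signal. Thresholds are assumptions.
from datetime import datetime, timedelta


def should_retrain(
    live_accuracy: float,
    last_trained: datetime,
    drift_psi: float,
    accuracy_floor: float = 0.90,
    max_age: timedelta = timedelta(days=30),
    psi_limit: float = 0.2,
) -> bool:
    performance_drop = live_accuracy < accuracy_floor               # performance-based trigger
    schedule_elapsed = datetime.utcnow() - last_trained > max_age   # time-based trigger
    data_drift = drift_psi > psi_limit                              # data-based trigger
    return performance_drop or schedule_elapsed or data_drift


# True here: the drift trigger fires even though accuracy and schedule are fine.
print(should_retrain(0.93, datetime.utcnow() - timedelta(days=3), drift_psi=0.35))
```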
And CI/CD for machine learning.
Finally, we have to speak about CI/CD for ML.
DevOps transformed software engineering; MLOps does the same for machine learning.
Key testing in an ML CI/CD pipeline: data validation, schema checks, and missing-value detection; model performance tests, regression checks that prevent backsliding; integration tests, where we have to validate the entire pipeline; and load and performance tests, ensuring production readiness based on the application.
Security and compliance tests: we have to follow GDPR and HIPAA in production.
Security is the main concern.
If these tests are not there, models will fail.
With them, production becomes predictable and repeatable.
As a specific strategy for testing, we have data validation tests, model performance tests, integration tests, and load and performance tests.
For data validation, we have to verify the data and check the delivery scenario.
For model performance tests, we have to check whether the model meets its benchmarks; if not, we have to investigate.
Integration tests validate the entire pipeline, and load and performance tests ensure production readiness, security, compliance, and everything else.
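As a sketch of how such checks can live in CI, here are pytest-style data validation and performance-regression tests; the column names, dtypes, and thresholds are illustrative assumptions:

```python
# Sketch of ML CI/CD checks as pytest tests: schema/missing-value validation and a
# performance regression gate. Column names, thresholds, and loaders are assumptions.
import pandas as pd
import pytest


@pytest.fixture
def batch() -> pd.DataFrame:
    # In CI this would load the latest training or scoring batch.
    return pd.DataFrame({"age": [34, 52], "amount": [120.0, 80.5], "label": [0, 1]})


def test_schema(batch):
    # Data validation: required columns are present with the expected dtypes.
    assert {"age", "amount", "label"} <= set(batch.columns)
    assert batch["amount"].dtype == "float64"


def test_no_missing_values(batch):
    assert not batch.isnull().any().any()


def test_model_performance_gate():
    # Regression check: the candidate must not backslide below the agreed floor.
    candidate_accuracy = 0.94  # in CI, computed on a fixed holdout set
    baseline_accuracy = 0.92
    assert candidate_accuracy >= baseline_accuracy
```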
Coming back to the real challenge: most ML models never reach production.
MLOps offers a way forward, from orchestration to infrastructure, monitoring, and CI/CD.
MLOps is not just about the models, but about the systems, culture, and collaboration.
If there is one takeaway: treat machine learning as a product, not a project.
A product requires lifecycle management, testing, monitoring, and continuous improvement.
These are the key things for us.
Thank you for your time.
If you have any questions, please reach out.
Thanks.