Transcript
Hi everyone.
My name is Ola Modi and I'm happy to be presenting here at
the Kube Native Conference 2025.
I wanted to start with a quick introduction about myself.
I currently work for Wells Fargo Bank and I have around 14 years of hands-on
experience in advanced data analytics, credit risk strategy, and predictive
modeling across financial services, marketing, and customer intelligence.
I also have extensive experience in developing machine learning driven fraud
detection strategies, segmentation models, and campaign optimization frameworks that
help improve ROI, reduce risk exposure, and enhance operational efficiency.
The topic I'm planning to present today is container-native ML: scaling predictive customer segmentation on Kubernetes.
Today I'll be sharing how we re-engineered a traditional monolithic analytics
system into a cloud native machine learning ecosystem built on Kubernetes.
Our focus was on predictive customer segmentation: identifying behavioral patterns in real time at enterprise scale.
This session is about more than just technology.
It's about how containerization and orchestration can transform the
way organizations operationalize ML, making it faster, more
efficient, and more cost effective.
Moving on to slide two.
Let's start with the challenges most enterprises face. Traditional analytics platforms were not designed for real-time, large-scale prediction.
They rely on monolithic pipelines that struggle under heavy
workloads as data volumes grow.
Performance degrades, leading to longer processing times and slower insights.
Scalability becomes a bottleneck.
You can't just add more servers.
You need horizontal scaling, elasticity, and automation.
Operationally, managing diverse ML workloads on rigid infrastructure is complex: data scientists, ML engineers, and DevOps teams often work in silos. We needed an architecture that could adapt dynamically to business load, self-heal, and scale predictively, not reactively.
Moving on to the next slide.
This slide talks about the container orchestration solution.
So this is where Kubernetes changed the game for us.
Kubernetes provides the flexibility and reliability that
monolithic architectures lack.
It allows you to deploy ML workloads as microservices, which are loosely
coupled and highly scalable.
We implemented horizontal pod autoscaling, so during peak customer activity, pods automatically spin up to handle the load. Intelligent resource orchestration distributes workloads efficiently across clusters, and with built-in fault tolerance, even if one node fails, another seamlessly takes over.
The result is a system that's always on, always fast, and always efficient: the backbone of real-time predictive analytics.
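To make the autoscaling piece concrete, here is a minimal sketch of what such a horizontal pod autoscaler could look like when created through the official Kubernetes Python client (autoscaling/v2). The deployment name, namespace, and thresholds are illustrative assumptions, not values from the talk.

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="segmentation-inference-hpa"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="segmentation-inference"
        ),
        min_replicas=2,
        max_replicas=20,
        metrics=[
            client.V2MetricSpec(
                type="Resource",
                resource=client.V2ResourceMetricSource(
                    name="cpu",
                    target=client.V2MetricTarget(type="Utilization", average_utilization=70),
                ),
            )
        ],
    ),
)

# Create the HPA so inference pods spin up automatically during peak customer activity.
client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="ml", body=hpa
)
```

In practice the same object would usually live as YAML in a Helm chart; the client form is used here only to keep all examples in one language.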
Moving on to the next slide.
This slide talks about the container-native ML architecture overview.
This slide basically gives you a bird's eye view of the full architecture.
We structured it into four layers: data ingestion services, feature engineering, model training, and inference services.
So the first one is data ingestion services: containerized pipelines pull customer data from multiple sources like CRM, digital interactions, and transactions using event-driven ingestion.
Next one is feature engineering: independent microservices clean, validate, and convert raw data into ML-ready features.
This ensures every model uses standardized inputs.
Model training is the next one.
Distributed training runs across Kubernetes clusters, enabling parallelization for algorithms like gradient boosting, neural networks, and clustering.
And the last one is inference services.
This layer basically provides real time predictions.
With Redis caching, we achieve millisecond response times for customer segmentation.
In short, this architecture makes the ML process continuous,
modular, and highly responsive.
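As a rough illustration of the inference layer's caching path, here is a small Python sketch using redis-py. The key scheme, the TTL, and the placeholder `predict_segment` function are assumptions for illustration only, not the production code.

```python
import json
import redis

cache = redis.Redis(host="redis", port=6379, decode_responses=True)

def predict_segment(features: dict) -> dict:
    # Placeholder standing in for the real model-serving call.
    return {"segment": "high_value" if features.get("avg_spend", 0) > 1000 else "standard"}

def get_segment(customer_id: str, features: dict) -> dict:
    key = f"segment:{customer_id}"
    cached = cache.get(key)
    if cached:
        # Cache hit: the millisecond-level path for repeat lookups.
        return json.loads(cached)
    prediction = predict_segment(features)
    cache.setex(key, 300, json.dumps(prediction))  # keep the result for 5 minutes
    return prediction
```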
Moving on to the next slide.
Breaking the ML pipeline into microservices was a turning point. Each stage (ingestion, feature engineering, training, and inference) runs independently. This means if your training jobs spike, you scale only that part. If your inference layer needs more compute, you add pods just for that. It reduces resource waste, accelerates deployments, and simplifies debugging, and most importantly, it allows different teams to iterate without breaking the entire pipeline. So data engineers, ML developers, and DevOps can all work concurrently, which is a major boost to agility.
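As a small sketch of what scaling a single stage independently could look like, the snippet below patches the replica count of just one deployment via the Kubernetes Python client; the deployment name and namespace are hypothetical.

```python
from kubernetes import client, config

config.load_kube_config()

# Scale only the feature-engineering stage; the ingestion, training, and
# inference deployments are left untouched.
client.AppsV1Api().patch_namespaced_deployment_scale(
    name="feature-engineering",
    namespace="ml",
    body={"spec": {"replicas": 6}},
)
```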
Moving on to the next slide.
This slide talks about deployment management with Helm.
When you have dozens of microservices, deployment consistency becomes crucial, and that's where Helm comes in.
We use Helm charts to define and version our deployments.
This ensures every ML service, data processing, training, or inference follows
the same standard configuration pattern.
Helm also supports environment-specific overrides, so the same chart can be deployed in development, staging, or production seamlessly. If something breaks, rollback is instant.
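To sketch that Helm workflow, here is a minimal Python wrapper around the standard `helm upgrade --install` and `helm rollback` commands. The release name, chart path, and values-file naming convention are assumptions for illustration.

```python
import subprocess

def deploy(release: str, chart: str, env: str, namespace: str = "ml") -> None:
    # Same chart everywhere; environment-specific overrides come from
    # values-dev.yaml, values-staging.yaml, values-prod.yaml, and so on.
    subprocess.run(
        ["helm", "upgrade", "--install", release, chart,
         "-f", f"values-{env}.yaml", "--namespace", namespace],
        check=True,
    )

def rollback(release: str, revision: int, namespace: str = "ml") -> None:
    # If a release misbehaves, roll back to a known-good revision.
    subprocess.run(
        ["helm", "rollback", release, str(revision), "--namespace", namespace],
        check=True,
    )

# deploy("feature-engineering", "./charts/ml-service", env="staging")
# rollback("feature-engineering", 4)
```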
We extended Kubernetes with custom resource definitions (CRDs) for ML-specific needs, like scheduling model training jobs, managing GPU resources, and automating retraining lifecycles.
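As an illustration of the CRD idea, here is a sketch of submitting an instance of a hypothetical `TrainingJob` custom resource with the Kubernetes Python client. The group, version, and spec fields are invented for the example; the real definitions were not shown in the talk.

```python
from kubernetes import client, config

config.load_kube_config()

# Hypothetical custom resource describing a scheduled, GPU-backed training run.
training_job = {
    "apiVersion": "ml.example.com/v1alpha1",
    "kind": "TrainingJob",
    "metadata": {"name": "segmentation-retrain"},
    "spec": {
        "model": "customer-segmentation",
        "gpus": 2,
        "schedule": "0 2 * * *",  # nightly retraining window
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="ml.example.com",
    version="v1alpha1",
    namespace="ml",
    plural="trainingjobs",
    body=training_job,
)
```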
Moving on to the next slide.
This slide talks about the event-driven processing architecture.
Customer behavior changes constantly: promotions, seasonal campaigns, or new product launches can all shift patterns.
We implemented event-driven retraining to keep our segmentation models up to date. When our monitoring system detects data drift, meaning customer behavior is no longer aligned with the model, Kubernetes automatically triggers a model retraining pipeline. The new model is validated and deployed automatically without manual intervention or downtime. This approach ensures that our segmentation remains accurate and responsive even as customer behavior evolves in real time.
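Here is a minimal sketch of that drift-triggered retraining loop, assuming a simple two-sample Kolmogorov-Smirnov check (the talk does not specify which drift test is used) and a plain Kubernetes Job as the entry point to the retraining pipeline. Image names and thresholds are illustrative.

```python
from kubernetes import client, config
from scipy.stats import ks_2samp

def drift_detected(baseline, recent, p_threshold: float = 0.01) -> bool:
    # A very low p-value suggests recent customer behavior no longer matches
    # the distribution the model was trained on.
    _, p_value = ks_2samp(baseline, recent)
    return p_value < p_threshold

def trigger_retraining(namespace: str = "ml") -> None:
    # Launch the retraining pipeline as a Kubernetes Job; validation and
    # promotion of the new model happen inside the container.
    config.load_kube_config()
    job = client.V1Job(
        metadata=client.V1ObjectMeta(generate_name="segmentation-retrain-"),
        spec=client.V1JobSpec(
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[client.V1Container(
                        name="train",
                        image="registry.example.com/segmentation-train:latest",
                    )],
                )
            )
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace=namespace, body=job)

# if drift_detected(baseline_scores, recent_scores):
#     trigger_retraining()
```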
Moving on to the next slide.
This slide talks about container based feature store implementation.
A consistent feature layer is critical for reproducibility.
We created a container based feature store that centralizes all engineered
features used by our ML models.
Every service, whether it's training or inference, accesses features through this store, guaranteeing consistency.
We use Redis caching for sub-millisecond feature retrieval, and with feature versioning, we can run A/B tests on new features or roll back to previous versions effortlessly.
This ensures full traceability and confidence in our model outputs.
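Here is a small sketch of that versioned, Redis-backed feature access pattern; the key layout and feature names are assumptions rather than the production schema.

```python
import redis

r = redis.Redis(host="feature-store-redis", port=6379, decode_responses=True)

def write_features(customer_id: str, features: dict, version: str = "v2") -> None:
    # Training and inference both read this hash, so every model sees
    # identical inputs for a given feature version.
    r.hset(f"features:{version}:{customer_id}", mapping=features)

def read_features(customer_id: str, version: str = "v2") -> dict:
    # Switching `version` is how an A/B test or a rollback is expressed.
    return r.hgetall(f"features:{version}:{customer_id}")

write_features("C1042", {"recency_days": 12, "avg_spend": 310.5, "txn_count_30d": 9})
print(read_features("C1042"))
```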
Moving on to the next slide.
This slide talks about advanced Kubernetes patterns.
So we leveraged several Kubernetes-native design patterns to make the system more robust. Some of them are init containers, sidecar containers, and multi-container pods. For the init containers: before a main container starts, these run checks verifying data integrity, dependencies, and environment readiness. The next one is sidecar containers. These monitor performance, collect metrics like latency or drift, and push them to Prometheus or Grafana. And then multi-container pods, for components that need to share resources closely, like pre-processing and model serving. These pods offer efficiency and isolation.
These patterns improved observability, ensured reliability, and made troubleshooting much easier.
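To illustrate the pattern, here is a sketch of a pod that combines an init container with a metrics sidecar, built with the Kubernetes Python client; the image names and the check command are placeholders.

```python
from kubernetes import client

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="model-serving", labels={"app": "segmentation"}),
    spec=client.V1PodSpec(
        # Init container: verify data integrity, dependencies, and environment
        # readiness before the main container starts.
        init_containers=[client.V1Container(
            name="readiness-checks",
            image="registry.example.com/ml-checks:latest",
            command=["python", "verify_inputs.py"],
        )],
        containers=[
            # Main container: the model-serving microservice.
            client.V1Container(name="serve",
                               image="registry.example.com/segmentation-serve:latest"),
            # Sidecar: collects latency/drift metrics for Prometheus to scrape.
            client.V1Container(name="metrics-sidecar",
                               image="prom/statsd-exporter:latest"),
        ],
    ),
)
# client.CoreV1Api().create_namespaced_pod(namespace="ml", body=pod)
```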
Moving on to the next slide.
This slide talks about resource management strategies. Efficiency is at the heart of container-native ML. We introduced GPU scheduling policies to ensure deep learning workloads get prioritized GPU access without idle cycles. We also optimized memory allocation for large-scale clustering, preventing out-of-memory crashes and improving throughput.
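As a rough example of the resource side of this, a training container can declare GPU and memory requests and limits so the scheduler gives deep learning workloads dedicated GPU access and keeps clustering jobs inside a memory budget; the quantities below are illustrative.

```python
from kubernetes import client

training_container = client.V1Container(
    name="deep-learning-train",
    image="registry.example.com/segmentation-train:gpu",
    resources=client.V1ResourceRequirements(
        requests={"cpu": "4", "memory": "24Gi"},
        limits={"memory": "32Gi", "nvidia.com/gpu": "1"},  # GPU via the NVIDIA device plugin
    ),
)
```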
The results that we saw were remarkable: 85% GPU utilization and three times better memory efficiency. These aren't just performance stats. They represent cost savings and sustainable scalability.
Moving on to the next slide.
This slide talks about production performance improvements.
After a full deployment, we saw measurable impact.
There was a 60% cost reduction: containerization and auto-scaling drastically reduced idle infrastructure. The next one is 99.9% uptime: automated failover and redundancy kept services continuously available. It also increased processing speed by 40%: distributed computing cut training and inference times significantly.
These improvements directly translated to faster campaign turnarounds,
better personalization, and ultimately higher ROI for our marketing teams.
Okay.
Moving on to the next slide.
This slide talks about the implementation strategy roadmap.
Here's the roadmap.
We followed a structured four phase approach.
Phase one is containerization: we converted monolithic ML workflows into Dockerized microservices.
Phase two was orchestration: Kubernetes was implemented for automated scaling, resource optimization, and workload management.
Phase three was optimization: we layered on advanced techniques like feature stores, event-driven retraining, and CRDs.
Phase four was production: a full deployment with monitoring, alerting, and CI/CD integration for continuous improvement.
Each phase was iterative, allowing us to learn and refine
before scaling enterprise wide.
Moving on to the next slide.
This slide talks about the key takeaways for platform and ML engineers.
Here are three lessons that really stand out.
Adopt microservice architecture, automate with event driven processing,
and focus on resource optimization.
Let's talk about adopting a microservice architecture first. Breaking your ML pipeline into modular, independently scalable units accelerates development and improves reliability. Next, automate with event-driven processing: let Kubernetes manage lifecycle events like retraining or redeployment.
And focus on resource optimization: use Kubernetes-native tools for GPU scheduling, memory optimization, and auto-scaling.
These principles turn ML systems from brittle and reactive into adaptive,
intelligent and cost efficient ecosystems.
Okay.
Moving on to the next slide.
This slide talks about how cloud native ML is the future, now. The future is now: container-native ML is not just a trend, it's the foundation for scalable enterprise analytics.
We have moved from static, centralized systems to dynamic, modular architectures that evolve with business needs.
Kubernetes and microservices are enabling ML teams to deploy models
faster, adapt to data changes instantly, and deliver real business impact.
The future of ML infrastructure is cloud native, event-driven, and continuously optimized, and that future is already here.
I think that concludes our presentation.
Thank you so much for your attention.
I hope this session gave you practical insights into how Kubernetes can
be leveraged for scaling machine learning in real-world enterprise settings. I'd be happy to take any questions, whether it's about architecture, automation, or implementation specifics.
You can also connect with me afterwards to discuss use cases or share any experiences
with your own ML infrastructure.
Thank you everyone for giving me this opportunity and I
hope you have a good day.