Conf42 Kube Native 2025 - Online

- premiere 5PM GMT

Container-Native ML: Scaling Predictive Customer Segmentation on Kubernetes


Abstract

Transform your ML analytics with Kubernetes! Learn to deploy scalable, containerized customer segmentation pipelines that automatically scale, reduce costs, and handle enterprise workloads. Real-world microservices patterns for production ML systems.


Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi everyone. My name is Ujjwala Modepalli and I'm happy to be presenting here at the Kube Native Conference 2025. I wanted to start with a quick introduction about myself. I currently work for Wells Fargo Bank and I have around 14 years of hands-on experience in advanced data analytics, credit risk strategy, and predictive modeling across financial services, marketing, and customer intelligence. I also have extensive experience in developing machine learning driven fraud detection strategies, segmentation models, and campaign optimization frameworks that help improve ROI, reduce risk exposure, and enhance operational efficiency.

The topic I'm presenting today is container-native ML: scaling predictive customer segmentation on Kubernetes. I'll be sharing how we re-engineered a traditional monolithic analytics system into a cloud native machine learning ecosystem built on Kubernetes. Our focus was on predictive customer segmentation, identifying behavioral patterns in real time at enterprise scale. This session is about more than just technology. It's about how containerization and orchestration can transform the way organizations operationalize ML, making it faster, more efficient, and more cost effective.

Moving on to slide two, let's start with the challenges most enterprises face. Traditional analytics platforms were not designed for real-time, large scale prediction. They rely on monolithic pipelines that struggle under heavy workloads. As data volumes grow, performance degrades, leading to longer processing times and slower insights. Scalability becomes a bottleneck: you can't just add more servers; you need horizontal scaling, elasticity, and automation. Operationally, managing diverse ML workloads on rigid infrastructure is complex, and data scientists, ML engineers, and DevOps teams often work in silos. We needed an architecture that could adapt dynamically to business load, self-heal, and scale predictively, not reactively.

Moving on to the next slide. This slide talks about the container orchestration solution. This is where Kubernetes changed the game for us. Kubernetes provides the flexibility and reliability that monolithic architectures lack. It allows you to deploy ML workloads as microservices, which are loosely coupled and highly scalable. We implemented horizontal pod autoscaling, so during peak customer activity, pods automatically spin up to handle the load. Intelligent resource orchestration distributes workloads efficiently across clusters, and with built-in fault tolerance, even if one node fails, another seamlessly takes over. The result is a system that's always on, always fast, and always efficient, the backbone of real time predictive analytics.

Moving on to the next slide. This slide talks about the container-native ML architecture overview and gives you a bird's eye view of the full architecture. We structured it into four layers: data ingestion services, feature engineering, model training, and inference services. The first one is data ingestion services: containerized pipelines pull customer data from multiple sources like CRM, digital interactions, and transactions using event driven ingestion. The next one is feature engineering: independent microservices clean, validate, and convert raw data into ML ready features, which ensures every model uses standardized inputs. Model training is the next one: distributed training runs across Kubernetes clusters, enabling parallelization for algorithms like gradient boosting, neural networks, and clustering. And the last one is inference services: this layer provides real time predictions, and with Redis caching we achieve millisecond response times for customer segmentation. In short, this architecture makes the ML process continuous, modular, and highly responsive.
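As a minimal illustration of that inference path (not the speaker's actual code), the sketch below assumes a Redis service reachable as `feature-cache`, a pre-trained scikit-learn clustering model saved as `segmentation_model.joblib`, and a one-hour cache TTL; all of these names are hypothetical.

```python
# Illustrative sketch of a Redis-cached inference path for customer segmentation.
# Assumptions (not from the talk): a "feature-cache" Redis service, a clustering
# model stored as segmentation_model.joblib, and a one-hour cache TTL.
import joblib
import numpy as np
import redis

cache = redis.Redis(host="feature-cache", port=6379, decode_responses=True)
model = joblib.load("segmentation_model.joblib")  # pre-trained segmentation model


def predict_segment(customer_id: str, features: list[float]) -> int:
    """Return a segment id, serving from Redis when a recent prediction exists."""
    key = f"segment:{customer_id}"
    cached = cache.get(key)
    if cached is not None:
        return int(cached)  # cache hit: millisecond-scale lookup

    segment = int(model.predict(np.array([features]))[0])
    cache.set(key, segment, ex=3600)  # keep the prediction warm for one hour
    return segment
```

In this pattern, repeat lookups for the same customer are served straight from the cache, which is what makes the millisecond response times plausible at scale.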
Moving on to the next slide. Breaking the ML pipeline into microservices was a turning point. Each stage, ingestion, feature engineering, training, and inference, runs independently. This means if your training jobs spike, you scale only that part; if your inference layer needs more compute, you add pods just for that. It reduces resource waste, accelerates deployments, and simplifies debugging, and most importantly, it allows different teams to iterate without breaking the entire pipeline. Data engineers, ML developers, and DevOps can all work concurrently, which is a major boost to agility.

Moving on to the next slide. This slide talks about deployment management with Helm. When you have dozens of microservices, deployment consistency becomes crucial, and that's where Helm comes in. We use Helm charts to define and version our deployments. This ensures every ML service, whether data processing, training, or inference, follows the same standard configuration pattern. Helm also supports environment specific overrides, so the same chart can be deployed in development, staging, or production seamlessly, and if something breaks, rollback is instant. We extended Kubernetes with custom resource definitions (CRDs) for ML specific needs, like scheduling model training jobs, managing GPU resources, and automating retraining lifecycles.

Moving on to the next slide. This slide talks about the event driven processing architecture. Customer behavior changes constantly; promotions, seasonal campaigns, or new product launches can all shift patterns. We implemented event driven retraining to keep our segmentation models up to date. When our monitoring system detects data drift, meaning customer behavior is no longer aligned with the model, Kubernetes automatically triggers a model retraining pipeline. The new model is validated and deployed automatically without manual intervention or downtime. This approach ensures that our segmentation remains accurate and responsive even as customer behavior evolves in real time.

Moving on to the next slide. This slide talks about the container based feature store implementation. A consistent feature layer is critical for reproducibility. We created a container based feature store that centralizes all engineered features used by our ML models. Every service, whether it's training or inference, accesses features through this store, guaranteeing consistency. We use Redis caching for sub millisecond feature retrieval, and with feature versioning, we can run A/B tests on new features or roll back to previous versions effortlessly. This ensures full traceability and confidence in our model outputs.
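The feature store access pattern described above can be pictured roughly as follows; the versioned key layout (`features:{version}:{customer_id}`) and the helper names are assumptions for this sketch, not the actual implementation.

```python
# Minimal sketch of a versioned, Redis-backed feature store accessor.
# The key layout features:{version}:{customer_id} is an assumption for illustration.
import json

import redis

store = redis.Redis(host="feature-store", port=6379, decode_responses=True)


def write_features(customer_id: str, features: dict, version: str = "v2") -> None:
    """Persist an engineered feature vector under an explicit feature-set version."""
    store.set(f"features:{version}:{customer_id}", json.dumps(features))


def read_features(customer_id: str, version: str = "v2") -> dict | None:
    """Fetch features for training or inference; both paths read the same versioned keys."""
    raw = store.get(f"features:{version}:{customer_id}")
    return json.loads(raw) if raw else None
```

Because the version is part of the key, an A/B test can point one model variant at "v2" and another at "v3", and a rollback is just a change of the version string.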
Moving on to the next slide. This slide talks about advanced Kubernetes patterns. We leveraged several Kubernetes native design patterns to make the system more robust: init containers, sidecar containers, and multi container pods. Init containers run checks before a main container starts, verifying data integrity, dependencies, and environment readiness. Sidecar containers monitor performance, collecting metrics like latency or drift and pushing them to Prometheus or Grafana. Multi container pods are used for components that need to share resources closely, like pre-processing and model serving; these pods offer efficiency and isolation. These patterns improved observability, ensured reliability, and made troubleshooting much easier.

Moving on to the next slide. This slide talks about resource management strategies. Efficiency is at the heart of container native ML. We introduced GPU scheduling policies to ensure deep learning workloads get prioritized GPU access without idle cycles, and we optimized memory allocation for large scale clustering, preventing out of memory crashes and improving throughput. The results we saw were remarkable: 85% GPU utilization and three times better memory efficiency. These aren't just performance stats; they represent cost savings and sustainable scalability.

Moving on to the next slide. This slide talks about production performance improvements. After full deployment, we saw measurable impact. There was a 60% cost reduction: containerization and auto-scaling drastically reduced idle infrastructure. We achieved 99.9% uptime: automated failover and redundancy kept services continuously available. Processing also became 40% faster, as distributed computing cut training and inference time significantly. These improvements directly translated to faster campaign turnarounds, better personalization, and ultimately higher ROI for our marketing teams.

Okay, moving on to the next slide. This slide talks about the implementation strategy roadmap. We followed a structured four phase approach. Phase one was containerization: we converted monolithic ML workflows into dockerized microservices. Phase two was orchestration: Kubernetes was implemented for automated scaling, resource optimization, and workload management. Phase three was optimization: we layered on advanced techniques, feature stores, event driven retraining, and CRDs. Phase four was production: a full deployment with monitoring, alerting, and CI/CD integration for continuous improvement. Each phase was iterative, allowing us to learn and refine before scaling enterprise wide.

Moving on to the next slide. This slide talks about the key takeaways for platform and ML engineers. Here are three lessons that really stand out: adopt a microservice architecture, automate with event driven processing, and focus on resource optimization. Adopting a microservice architecture means breaking your ML pipeline into modular, independently scalable units, which accelerates development and improves reliability. Automating with event driven processing means letting Kubernetes manage lifecycle events like retraining or redeployment, as sketched below. Focusing on resource optimization means using Kubernetes native tools for GPU scheduling, memory optimization, and auto-scaling. These principles turn ML systems from brittle and reactive into adaptive, intelligent, and cost efficient ecosystems.
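As a rough sketch of that event driven retraining takeaway (under assumed names, not the speaker's production setup), a monitoring process could launch a Kubernetes training Job through the official Python client once a drift score crosses a threshold. The `ml-platform` namespace, trainer image, and drift threshold below are hypothetical.

```python
# Illustrative sketch: trigger a retraining Job when data drift is detected.
# Assumptions for the example: an in-cluster service account allowed to create Jobs,
# a hypothetical "ml-platform" namespace, and a placeholder trainer image.
from kubernetes import client, config

DRIFT_THRESHOLD = 0.15  # assumed drift threshold for the example


def launch_retraining_job(drift_score: float) -> None:
    if drift_score < DRIFT_THRESHOLD:
        return  # model still aligned with current behavior; nothing to do

    config.load_incluster_config()  # use the pod's service account credentials
    job = {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"generateName": "segmentation-retrain-"},
        "spec": {
            "template": {
                "spec": {
                    "containers": [{
                        "name": "trainer",
                        "image": "registry.example.com/segmentation-trainer:latest",
                    }],
                    "restartPolicy": "Never",
                }
            }
        },
    }
    client.BatchV1Api().create_namespaced_job(namespace="ml-platform", body=job)
```

In practice the same trigger could come from a drift-monitoring sidecar or a CRD controller; the point is that the retraining lifecycle is driven by events rather than a fixed schedule.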
Okay, moving on to the next slide. This slide talks about how cloud native ML is the future, and the future is now. Container native ML is not just a trend, it's the foundation for scalable enterprise analytics. We have moved from static, centralized systems to dynamic, modular architectures that evolve with business needs. Kubernetes and microservices are enabling ML teams to deploy models faster, adapt to data changes instantly, and deliver real business impact. The future of ML infrastructure is cloud native, event driven, and continuously optimized, and that future is already here.

I think that concludes our presentation. Thank you so much for your attention. I hope this session gave you practical insights into how Kubernetes can be leveraged for scaling machine learning in real world enterprise settings. I'd be happy to take any questions, whether it's about architecture, automation, or implementation specifics. You can also connect with me afterwards to discuss use cases or share experiences with your own ML infrastructure. Thank you everyone for giving me this opportunity, and I hope you have a good day.

Ujjwala Modepalli

Senior Credit Risk Specialist @ Wells Fargo



