Conf42 MLOps 2025 - Online

- premiere 5PM GMT

Scaling Conversational AI in Production: MLOps Strategies for Contact Center Transformation and Operational Excellence

Abstract

Learn battle-tested MLOps strategies for scaling conversational AI in production! Discover CI/CD pipelines, monitoring frameworks, and deployment patterns that power millions of customer interactions daily. Real-world case studies + actionable frameworks for your AI systems.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi everyone, thanks for tuning in. I am Manoj Kumar Vunnava, and I work for GoDaddy as a Senior Telecom Engineer. Today we are going to explore how to scale conversational AI in production, especially for contact centers. Now, when I say contact center, you might think of long hold times, choppy audio, background music, and agents juggling multiple screens. But behind the scenes, AI is quietly transforming these environments. Sometimes you talk directly to it; other times it helps an agent respond faster. So this talk is all about how we can make those AI systems production ready, reliable, and scalable.

Here's what we will cover. First, the challenges unique to contact center AI. Then an MLOps framework tailored to conversational AI, some technical details like architecture, CI/CD pipelines, and monitoring, and finally how to measure success. And when I say measuring success, it's not just with technical KPIs but with business outcomes. As we go through, I will also share some stories and concrete examples, so this doesn't feel like theory but something you can imagine in action in a contact center. The good thing is we all experience this on a day-to-day basis, because every company has a contact center and, at some point, you have had to call one.

So let me start with a story. A few years ago, a telecom company tried launching a chatbot to handle customer complaints. On day one, customers asked about a network outage. The bot had no idea what was going on and kept replying with generic messages, something like, "Have you tried restarting your phone, router, or modem?" Customers, obviously, were furious at that kind of response. Why? Because contact center AI is uniquely challenging. First, scale: we are talking millions of conversations per day. Unlike an agent who handles maybe 30 calls, the AI handles everything, and it has to be instant. Second, human in the loop: it has to work side by side with agents. Think of it like a co-pilot, not an actual replacement. Third, the cost of errors: a wrong answer is not just one unhappy customer. It can trigger complaints on Twitter or any other social media platform within hours; things spread like wildfire on social media nowadays. And finally, diverse language: every industry has its own jargon, whether it's healthcare, insurance, retail, or banking. You can't rely on one generic model when it comes to contact center language. So the real challenge isn't building AI, it's building AI that scales, adapts, and earns customer trust.

This is where MLOps enters the picture. If you think of a theater production, the actors are on the stage, but the backstage crew makes sure the lights work, the sound is clear, and the show runs on time. That is what MLOps is: the backstage crew for AI. It makes sure features are consistently engineered, training pipelines run smoothly every time, deployments happen safely, and monitoring catches issues before customers do. Without MLOps, AI projects end up as cool demos that don't survive the chaos of production.

Now, the features. Think of features as what the model pays attention to. For conversational AI, it's not just the text, it's the context: what was said earlier in the conversation and how it was said. It's the timing: how quickly the customer replies, how often they contact support. And it's the language: sentiment, intent, and domain-specific keywords. We manage all of this through a feature store. If you think of cooking, it's like a well-organized pantry; you don't want to discover midway that you're running out of salt. Similarly, your model shouldn't discover that a feature is missing or outdated. I have seen teams skip this step, and they end up with models that work fine in the lab but fall apart in production.
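To make the idea of conversational features concrete, here is a minimal sketch in Python. The feature names, keyword lists, and the `conversation_features` function are illustrative assumptions, not part of any specific feature store or production system; a real pipeline would use trained intent and sentiment models rather than keyword lookups.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

@dataclass
class Turn:
    speaker: str          # "customer" or "agent"
    text: str
    timestamp: datetime

# Illustrative keyword lists; a real system would use trained sentiment/intent models.
NEGATIVE_WORDS = {"outage", "angry", "cancel", "refund", "terrible"}
BILLING_WORDS = {"invoice", "charge", "bill", "payment"}

def conversation_features(turns: List[Turn], contacts_last_7_days: int) -> Dict[str, float]:
    """Compute simple context, timing, and language features for one conversation."""
    customer_turns = [t for t in turns if t.speaker == "customer"]
    text = " ".join(t.text.lower() for t in customer_turns)

    # Timing: how quickly the customer replies after the agent, on average.
    reply_gaps = [
        (turns[i].timestamp - turns[i - 1].timestamp).total_seconds()
        for i in range(1, len(turns))
        if turns[i].speaker == "customer" and turns[i - 1].speaker == "agent"
    ]
    avg_reply_seconds = sum(reply_gaps) / len(reply_gaps) if reply_gaps else 0.0

    return {
        "num_customer_turns": float(len(customer_turns)),
        "avg_reply_seconds": avg_reply_seconds,
        "contacts_last_7_days": float(contacts_last_7_days),
        "negative_sentiment_hits": float(sum(w in text for w in NEGATIVE_WORDS)),
        "billing_intent_hits": float(sum(w in text for w in BILLING_WORDS)),
    }

if __name__ == "__main__":
    turns = [
        Turn("customer", "My internet is down, is there an outage?", datetime(2025, 1, 15, 10, 0, 0)),
        Turn("agent", "Let me check that for you.", datetime(2025, 1, 15, 10, 0, 30)),
        Turn("customer", "Please hurry, I need to cancel a meeting.", datetime(2025, 1, 15, 10, 1, 0)),
    ]
    print(conversation_features(turns, contacts_last_7_days=3))
```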
Next comes training. Training can't be a one-time experiment on someone's laptop. It has to be robust, repeatable, and automated. That means checking data quality (garbage in, garbage out), generating synthetic data for rare but important cases like fraud, and detecting bias. Imagine a model that works well for one language but fails for another, or one that works well for retail but struggles in healthcare, where it has to recognize complicated medicine and formula names. It also means tracking experiments, so you know why one version worked better than another. And finally, validation. We don't just measure accuracy; we test how the model would affect actual KPIs, like customer satisfaction scores or resolution time. Think of it like testing an airplane: you don't just check whether it can fly, you simulate turbulence, emergencies, and worst-case scenarios as well.

Now, A/B testing and rollout. Your model is ready, so do you unleash it on all customers at once? No way. We start small. Shadow testing: the model runs silently in the background, making predictions that don't affect real customers. Canary releases: a tiny fraction of traffic goes to the new model. Staged rollout: exposure increases gradually as confidence builds, always with a rollback switch ready. I like to compare this to teaching someone to drive. First they watch from the passenger seat, then they drive in a parking lot, then slowly onto the main road and freeways. You don't throw them straight onto a freeway, right?

And obviously, monitoring. Deployment isn't the finish line; it's the starting point. We monitor at four levels. First, technical metrics: speed and error rates. Second, data quality: are our inputs drifting away from what the model was trained on? Third, model performance: accuracy, agent overrides, and so on. And finally, business outcomes: are customers happier, are issues resolved faster? It's like having multiple dashboards in a plane, one for altitude, one for fuel, another for weather. You don't rely on just one metric.

Here's what the overall architecture looks like. We ingest data from calls and chats, run it through the feature store to enrich it, and the model serves predictions in real time. We log everything, feed it into the monitoring tools, and surface suggestions directly in the agents' workspace while they are talking with the customer. The key here is that the AI is invisible to the customers; they just experience faster and smarter service.
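To illustrate the shadow and canary pattern described above, here is a minimal routing sketch in Python. The `ModelRouter` class, the traffic fraction, and the logging are assumptions for illustration only; they do not describe any specific production system.

```python
import logging
import random
from typing import Callable, Dict

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("model_router")

# A "model" here is simply any callable that maps a request payload to a reply string.
Model = Callable[[Dict], str]

class ModelRouter:
    """Routes traffic between a stable model and a candidate model.

    mode="shadow": candidate runs on every request but its output is only logged.
    mode="canary": candidate serves a small, configurable fraction of live traffic.
    """

    def __init__(self, stable: Model, candidate: Model,
                 mode: str = "shadow", canary_fraction: float = 0.05):
        self.stable = stable
        self.candidate = candidate
        self.mode = mode
        self.canary_fraction = canary_fraction

    def handle(self, request: Dict) -> str:
        if self.mode == "shadow":
            # Customers always get the stable answer; the candidate's output is logged for comparison.
            reply = self.stable(request)
            try:
                log.info("shadow candidate reply: %s", self.candidate(request))
            except Exception:
                log.exception("candidate failed in shadow mode (customer unaffected)")
            return reply

        if self.mode == "canary" and random.random() < self.canary_fraction:
            log.info("request served by canary model")
            return self.candidate(request)

        return self.stable(request)

    def rollback(self) -> None:
        """The 'rollback switch': send all traffic back to the stable model."""
        self.mode = "stable"
        log.info("rolled back to stable model")

# Example usage with two toy models.
if __name__ == "__main__":
    stable = lambda req: "Stable reply to: " + req["text"]
    candidate = lambda req: "Candidate reply to: " + req["text"]
    router = ModelRouter(stable, candidate, mode="canary", canary_fraction=0.1)
    print(router.handle({"text": "Where is my package?"}))
```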
Next, the scaling requirements for CI/CD. We treat models like software: code, configurations, and features are versioned in Git. We automate tests on each change, models are registered with metadata, deployments are containerized, and they are rolled out gradually. This cuts deployment time from weeks to hours; it's the difference between mailing a letter and sending a WhatsApp message. That agility lets the business adapt in real time.

But even the best pipelines can't stop one thing: data drift. Picture this: it's the holiday season and suddenly everyone is asking, "Where's my package?" Or imagine a viral outage with a million people calling about the same issue. Over time, even language evolves. A year ago no one was asking about ChatGPT; now it's a common customer support question. If the model isn't updated, it slowly becomes irrelevant. That is why drift detection is essential: statistical checks, embedding monitoring, and anomaly alerts. It's like having a doctor check your vitals; small shifts can tell you something is wrong long before you actually collapse.

And when drift is detected, retraining shouldn't require a massive manual effort. It should be automated: collect new interactions verified by agents, trigger retraining jobs automatically, validate against both old and new scenarios, and roll out carefully with monitoring. This closes the loop. The system is never static; it is always learning, just like your agents are. I think of it like fitness: if you don't exercise, your body drifts. Retraining keeps your AI in shape.

So let's wrap this up. Features are the foundation, training pipelines keep things repeatable, progressive rollout minimizes risk, monitoring keeps us honest, and retraining ensures the system keeps evolving. And the golden rule here is that AI doesn't replace people; it frees them. It takes care of the routine questions so agents can focus on empathy, problem solving, and complex cases. When humans and AI work together, the contact center shifts from a cost center to value creation.

I really thank you for watching this session. Scaling conversational AI is not just about building models; it's about putting the right MLOps practices in place so those models stay reliable, scalable, and valuable over time. When done right, AI doesn't replace human agents; it empowers them and takes care of the repetitive tasks so people can focus on empathy, judgment, and complex problem solving. I hope this gave you practical ideas you can take back to your own work. Thanks again for joining, and I will leave you with this: the future of customer experience isn't just human or AI, it's human with AI. Thank you everyone.
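A minimal sketch of the kind of statistical drift check mentioned in the talk, using the population stability index (PSI) on a single numeric feature. The threshold, binning, and example data are illustrative assumptions; a production setup would typically track many features, embeddings, and model outputs together.

```python
import math
from typing import List

def population_stability_index(baseline: List[float], recent: List[float], bins: int = 10) -> float:
    """Compare the recent distribution of a feature against its training baseline.

    Rough rule of thumb: PSI < 0.1 means no meaningful shift, 0.1-0.25 moderate drift,
    above 0.25 significant drift worth acting on.
    """
    lo, hi = min(baseline), max(baseline)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[-1] = float("inf")  # catch values above the training range

    def bucket_fractions(values: List[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            for i in range(bins):
                if edges[i] <= v < edges[i + 1]:
                    counts[i] += 1
                    break
            else:
                counts[0] += 1  # values below the training range fall into the first bucket
        total = max(len(values), 1)
        # Small floor avoids log(0) for empty buckets.
        return [max(c / total, 1e-6) for c in counts]

    expected = bucket_fractions(baseline)
    actual = bucket_fractions(recent)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))

# Example: a reply-time feature drifting upward during a holiday surge.
if __name__ == "__main__":
    baseline = [float(x) for x in [5, 6, 7, 8, 9, 10, 11, 12, 13, 14] * 10]
    recent = [float(x) for x in [20, 22, 25, 27, 30, 33, 35, 38, 40, 45] * 10]
    psi = population_stability_index(baseline, recent)
    print(f"PSI = {psi:.3f}")
    if psi > 0.25:
        print("Significant drift detected: trigger the retraining pipeline.")
```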
...

Manoj Kumar Vunnava

Senior Telecom Engineer @ Godaddy

Manoj Kumar Vunnava's LinkedIn account


