Conf42 MLOps 2025 - Online

- premiere 5PM GMT

Production MLOps for Spam Detection: Real-Time Multi-Modal AI at Billion-User Scale

Abstract

Billion-user platforms crush spam attacks in milliseconds. Discover production MLOps secrets: smart sampling that slashes labeling costs, real-time threat detection, privacy-preserving ML, and bulletproof deployments. Internet-scale AI, exposed.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. I am Prabhakar Singh. Today I'm going to give you insight into how spam detection is handled at truly massive scale. We are talking about billions of users, millions of decisions every second, and everything has to happen in real time. Now, when we talk about spam, it's not just annoying messages. We are talking about fraud, phishing, scams, harmful content, et cetera: things that directly affect users' trust and platform integrity. And here is the hard part. At this scale, even a tiny error, say a tenth of a percent, means millions of people are impacted. That's why we need MLOps that is not just clever, but resilient, privacy-preserving, and battle-tested. So let me walk you through how we are going to explore this challenge today. Here is the roadmap. We'll start by looking at the unique challenges of working at billion-user scale. Then I will show you an end-to-end MLOps architecture and how large-scale systems design pipelines to keep up with constant change. We'll also dive into data strategies, because which data you choose, either to label or to train on, is absolutely critical. Then we'll look at the production side: scaling, monitoring, and incident response. Finally, we'll end by looking ahead at what spam looks like in the future and the technologies we will need to stay ahead of it. Think of this talk as a blueprint: not just theory, but patterns you can adapt to your own systems. So let's start at the beginning, with the sheer challenge of operating at scale. Every day, billions of pieces of content move across a platform. At peak load, large-scale systems are making millions of evaluations per second, and for user-facing decisions like blocking a comment or flagging a post, the response must come in under a hundred milliseconds. Now, think about the consequences of error at this scale. A 0.1% false positive rate does not sound terrible in theory.
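The error-rate arithmetic behind that claim is worth making concrete. A quick back-of-the-envelope sketch (the daily decision volume below is an illustrative assumption, not a figure from the talk):

```python
# Back-of-the-envelope: impact of a small error rate at platform scale.
# The decision volume is an illustrative assumption, not a real figure.
def affected_users(daily_decisions: int, error_rate: float) -> int:
    """Users touched per day by a given false-positive rate."""
    return int(daily_decisions * error_rate)

# One billion decisions per day at a 0.1% false-positive rate
# works out to a million legitimate users affected every day.
```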
But in practice, that could mean millions of legitimate users waking up to find their posts blocked or their comments deleted. On the other side, a false negative means spam or scams getting through to millions of people and harming them. The constraints are equally tough. Large-scale systems detect across text, images, videos, even behavioral patterns. They have to support hundreds of languages. They have to constantly adapt, because attackers change tactics every day, and all of this must comply with strict privacy laws like GDPR and CCPA. It's like playing chess with millions of opponents who all change their strategies overnight. Meeting challenges like this requires more than just good models. It requires an end-to-end MLOps architecture, and that is what we are going to talk about next. Here's what that looks like. First, ingest data from multiple sources, which includes real-time streams and batch histories. Then comes feature engineering, where they extract both instant signals like device fingerprinting and long-term signals like account history. Models are retrained continuously, often daily, but training is not enough. You have to validate rigorously against golden data sets, fairness checks, and adversarial examples. And when a deployment happens, they don't just flip a switch. We have to use shadow deployments, where new models run in parallel, silently, for days or weeks or even months. Then come canary rollouts, which gradually shift traffic, with instant rollback if needed. This is not just a pipeline; it's a living system. It adapts, learns, and recovers without stopping. But what makes these models powerful is not just this pipeline, but the features we build from multi-modal data. So let's look at multi-modal feature engineering. Spam today does not come in just one form. Attackers use text, images, videos, and even behavioral manipulation.
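The canary-rollout pattern just described can be sketched in a few lines. This is a minimal illustration, assuming a deterministic hash-based traffic split and a single error-rate health signal, both simplifications rather than the speaker's actual system:

```python
import hashlib

def route_request(request_id: str, canary_fraction: float) -> str:
    """Deterministically bucket a request so the same request ID
    always sees the same model version during a rollout."""
    bucket = int(hashlib.md5(request_id.encode()).hexdigest(), 16) % 1000
    return "canary" if bucket / 1000.0 < canary_fraction else "stable"

def next_canary_fraction(current: float, canary_error_rate: float,
                         threshold: float = 0.01, step: float = 0.1) -> float:
    """Advance the rollout while the canary looks healthy; send
    all traffic back to stable the moment it breaches the threshold."""
    if canary_error_rate > threshold:
        return 0.0                      # instant rollback
    return min(1.0, current + step)     # gradual traffic shift
```

In practice the health signal would combine many automated metrics, but the shape of the control loop, shift gradually and roll back instantly, is the same.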
For text, large-scale systems use transformer embeddings like BERT, plus n-grams, entity recognition, sentiment and intent classification, and so on. For images and video, you can rely on CNNs like EfficientNet and ResNet; OCR is very common for memes, along with frame-level signal analysis. But content is only half the story. Behavior is often the giveaway. Things like account age, posting frequency, engagement metrics, or unusual device fingerprints can be just as revealing. Finally, context matters most. For example, a post at 2:00 AM from a new device, linked to trending spam topics, tells a different story than the same post from a long-standing account. When you put all these signals together, you catch patterns no single dimension can reveal. Now let's look at how we bring those signals together in the model itself, something we call an ensemble model architecture. Large-scale systems don't rely on one big model. Instead, they use a hierarchical ensemble. First come fast pre-filters: lightweight models that can process huge volume quickly. They're designed for high recall, catching anything even slightly suspicious. Next come the specialist models: deeper, domain-specific networks for text, images, videos, and behavior. Then comes multi-modal fusion, where the signals are combined using cross-attention. This is where we find hidden correlations, like when the text looks fine, but paired with the image features it becomes clearly spam or harm. Finally, a meta-learner integrates everything, weighing outputs based on historical reliability. The result is speed when we need it and depth when it matters. It's the difference between quickly scanning luggage at an airport versus pulling aside a suspicious bag for deeper inspection. To serve these models in production, we need specialized infrastructure. So let's talk about distributed serving architecture.
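The core of the hierarchical ensemble, a cheap high-recall filter in front of an expensive specialist, can be sketched like this. The keyword score and the 0.3 threshold are toy assumptions standing in for a real lightweight model:

```python
def fast_prefilter(text: str) -> float:
    """Lightweight, high-recall first stage.
    Here: a toy keyword score standing in for a small model."""
    suspicious = {"free", "winner", "click", "prize"}
    hits = sum(word in suspicious for word in text.lower().split())
    return min(1.0, hits / 2)

def classify(text: str, specialist, threshold: float = 0.3) -> str:
    """Cascade: run the cheap filter on everything; invoke the
    slower specialist model only on suspicious content."""
    if fast_prefilter(text) < threshold:
        return "ham"                      # fast path: clearly benign
    score = specialist(text)              # deeper, domain-specific model
    return "spam" if score > 0.5 else "ham"
```

The economics follow from the cascade: most traffic exits at the cheap first stage, so the expensive models only see the small suspicious fraction.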
Serving at scale is a systems challenge as much as an ML one. Large-scale systems use context-aware load balancing to send requests to the right resources. Horizontal autoscaling helps handle sudden traffic spikes. A low-latency feature store ensures models get what they need without recomputing, and specialized hardware like GPUs and TPUs accelerates inference. Performance optimization is constant. Model quantization makes networks smaller and faster. Batching lets us process multiple requests together. Prioritization ensures critical cases are not stuck in the queue, and circuit breakers prevent cascading failures if something goes wrong. All of this allows the system to consistently hit 80 milliseconds end-to-end latency even under extreme load. And 80 milliseconds is just a figure; it could be anything, whatever matters to your organization. But even the best infrastructure needs smart data strategies to keep the models learning. So let's talk about smart sampling and learning strategies. Labeling billions of samples is simply not possible. The question is which samples matter most. Large-scale systems use uncertainty sampling, sending low-confidence cases to human review. Diversity sampling makes sure we don't miss underrepresented languages or formats. Adversarial sampling deliberately seeks out the cases where models fail, making the system even stronger. And time-sensitive sampling ensures we catch new attack vectors as they emerge. Together, these strategies drastically reduce labeling cost while keeping the models robust. One example I can think of: when attackers started hiding spam in memes, adversarial sampling surfaced these cases early, letting us retrain before it spread widely. But not all spam comes from individuals. Often it's coordinated campaigns. So let's discuss coordinated attack detection. This is where things get interesting.
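Uncertainty sampling, the first of those strategies, is simple to sketch: given (item, spam-score) pairs, route the scores nearest 0.5 to human review. The fixed review budget is an assumption for illustration:

```python
def uncertainty_sample(scored_items, budget: int):
    """Return the IDs of the `budget` items the model is least
    sure about, i.e. whose spam score is closest to 0.5."""
    ranked = sorted(scored_items, key=lambda pair: abs(pair[1] - 0.5))
    return [item_id for item_id, _score in ranked[:budget]]
```

Items scored near 0.0 or 1.0 are where the model is already confident; labeling them teaches it little, so the budget goes to the ambiguous middle.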
A single post may look normal, but when thousands of accounts act in unison, that's a coordinated attack. Large-scale systems use real-time streaming analysis to spot anomalies in activity patterns. They apply graph neural networks to model user-content interactions as dynamic graphs, exposing suspicious clusters, and they deploy countermeasures in real time, like throttling activity, adding friction with CAPTCHAs, or flagging suspicious clusters for further review. One case we can discuss here: 10,000 accounts liking a post in under one minute. Individually, nothing stands out, but viewed as a graph, the coordination becomes obvious. Now, while large-scale systems fight these attacks, they also have to protect user privacy. Everything I've described so far has to be done under strict privacy laws like GDPR and CCPA. Large-scale systems use federated learning to train across decentralized data without centralizing raw content. Differential privacy ensures individuals can't be re-identified. The systems also employ advanced cryptographic techniques: homomorphic encryption for secure inference, multi-party computation for trust distribution, and zero-knowledge proofs for compliance without exposing raw data. The key point here: privacy and performance are not opposites. With the right techniques, you can have both. Now let's look at how to keep these models fresh and reliable in production. The training cycle is continuous. Large-scale systems retrain models daily on fresh data, using Bayesian optimization for hyperparameters. They then evaluate against golden data sets, fairness slices, and adversarial examples. New models run in shadow deployment first, silently scoring real traffic without affecting users. Once stable, they move into progressive rollout, which means canary tests, gradual traffic shifts, automated metric analysis, and instant rollback if something slips. This rhythm ensures the models
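Of the privacy techniques mentioned, differential privacy is the easiest to show concretely. A minimal sketch of the Laplace mechanism for releasing a private count follows; the epsilon and sensitivity values are illustrative assumptions:

```python
import math
import random

def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise scaled to sensitivity/epsilon,
    so adding or removing any one user barely shifts the output
    distribution (illustrative sketch, not a vetted DP library)."""
    u = random.random() - 0.5                  # uniform on [-0.5, 0.5)
    scale = sensitivity / epsilon
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise
```

Smaller epsilon means more noise and stronger privacy; the noisy count stays useful in aggregate because the zero-mean noise averages out across many queries.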
keep pace with evolving spam tactics without ever destabilizing the system. But deployment is only half the story. Monitoring keeps everything healthy. Large-scale systems usually monitor at four levels. The first is model performance: precision, recall, and drift detection. The second is system performance: request latency, queue depth, and resource utilization. The third is data drift, which includes metrics like feature distributions, schema validations, and statistical tests. And the fourth is business impact: user engagement metrics, false positive appeals, and overall platform health. The idea is to catch problems before users do. Whether it's a model drifting or a sudden traffic spike, we want to detect early and respond fast. And when issues do happen, we rely on a structured incident response. So let's talk about anomaly detection and incident response. We know incidents are inevitable; resilience comes from how you respond. Large-scale systems use automated anomaly detection, which involves outlier detection, multivariate analysis, and baseline comparisons. If something is off, the systems follow structured playbooks: severity-based escalation, automated rollbacks, shadow mode for investigations, and configurable safety thresholds. And after every major incident, they run post-mortems, not just to fix, but to learn and strengthen the system. This structured discipline turns outages into opportunities for resilience. Now let's look forward at what's next in this arms race. The arms race is not slowing down. First, synthetic content. Attackers are already experimenting with AI-generated spam, deepfakes, and synthetic text that slips past filters. Second, cross-platform coordination: spam campaigns don't stay confined; they spread across apps and platforms. Collaboration while preserving privacy will be key here. Third, let's talk about adversarial robustness.
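The baseline-comparison style of anomaly detection mentioned above can be sketched with a simple z-score check against recent history. The 3-sigma threshold is a common but assumed default:

```python
import statistics

def is_anomalous(history, latest, z_threshold: float = 3.0) -> bool:
    """Flag `latest` when it sits more than z_threshold standard
    deviations away from the historical baseline of the metric."""
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    if std == 0:
        return latest != mean      # flat baseline: any change is anomalous
    return abs(latest - mean) / std > z_threshold
```

Production systems layer multivariate and seasonal-aware methods on top, but this single-metric check captures the basic idea of comparing live values to a learned baseline.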
We must strengthen defenses against intentional manipulation of models. Finally, explainable AI: enforcement actions must be transparent, both for users and regulators. We need models that not only predict, but also explain why they took a certain action. The future will demand even faster adaptation. This is not a one-time problem, but a continuous race. So let's wrap this up with the key takeaways. Here are four key principles I want you to remember. First, spam detection must be multi-modal, meaning you have to handle text, images, videos, and behavior together. Second, MLOps at billion-user scale requires specialized architecture; that is something we have to be aware of. Third, privacy and performance can coexist with the right techniques. That's very important. Fourth, continuous adaptation is non-negotiable; you must know this is an evolving arms race. If you keep these principles in mind, you'll be better prepared to design systems that don't just work in research, but that scale in the real world. And with that, let me thank you. Thank you for listening. I hope this gave you a clear look into what it takes to build robust, scalable, privacy-preserving MLOps systems for spam detection at global scale. The fight against spam never ends, but with the right architecture, we can stay ahead. Thank you.

Prabhakar Singh

Software Engineer @ Meta


