Harnessing Machine Learning to Combat E-commerce Fraud: Real-time Detection, Advanced Feature Engineering, and Scalable Systems

Abstract

With global e-commerce transactions projected to exceed $8.9 trillion by 2026, fraud rates have surged by 21.3% year-over-year, presenting a critical challenge for digital commerce ecosystems. This session explores the transformative role of machine learning (ML) in detecting and mitigating e-commerce fraud, offering a data-driven roadmap for future resilience. Traditional rule-based systems now average a 71.4% detection rate with a 28.7% false positive rate, leading to substantial revenue loss and customer friction. In contrast, ML-driven systems process 947,000 transactions per second and analyze 2,347 features per transaction, achieving pattern detection accuracy of 96.7% while reducing false positives by 81.4%. We delve into cutting-edge supervised models like Random Forests and Gradient Boosting Machines, which deliver up to 96.2% precision while maintaining sub-70 millisecond inference times across 127,000 concurrent sessions. Unsupervised approaches like Isolation Forests and Autoencoders complement these models by detecting 26.8% of previously unknown fraud patterns with real-time adaptability. Advanced feature engineering across temporal (achieving 94.8% timing anomaly detection), network (with 95.2% accuracy in fraud ring detection), and behavioral domains (distinguishing bots with 93.8% accuracy) has further revolutionized fraud prevention. Scalability remains paramount: optimized distributed systems now handle 2.7 million transactions per minute with 99.992% uptime, achieving consistent sub-150 millisecond decision times. Moreover, hybrid systems integrating ML with rule-based logic have elevated detection accuracies to 97.4%, while smart manual review systems have improved reviewer efficiency by 57.2%.

Transcript

This transcript was autogenerated. To make changes, submit a PR.

Hello everyone. I'm Surendra Lakkaraju, a senior software development engineer at Amazon. Today I will be talking about one of the most critical and rapidly evolving challenges in e-commerce: fraud, and more importantly, how machine learning is transforming how we combat it. As the digital economy grows, so does the sophistication of cybercrime. We'll explore how traditional systems are failing and why machine learning is emerging as a more resilient and scalable solution. So the topic is harnessing machine learning to combat e-commerce fraud.

By 2026, global e-commerce is projected to hit nearly $9 trillion. That tremendous growth comes with a risk: fraud is growing at more than 20% per year. For businesses, this means not only direct revenue loss, but also damage to customer trust and brand reputation. I want you to think about this: what happens when a genuine customer's card gets wrongly flagged and declined? They don't just lose patience, they might never return. How many of you have experienced a false decline while shopping online? Probably many.

Traditional rule-based systems worked well in a simpler time. You would set a rule like "block all transactions over a thousand dollars from a new device." But today, fraudsters are faster and more coordinated, and these rules can't adapt. The consequence is a false positive rate of over 28%, meaning real customers are turned away. And these false declines cost over $3.6 billion every year just from lost transactions.

So if rule-based systems are outdated, what's the alternative? Machine learning. Machine learning has changed the game: instead of static rules, we now use systems that can learn and evolve. Let's take a closer look. ML models can process nearly a million transactions per second and analyze thousands of features, far beyond what an analyst can do manually, with over 96% detection accuracy and an 81% drop in false positives. The results speak for themselves.
Customers enjoy smoother transactions, and businesses stop more fraud with less friction.

In supervised learning, we train models on historical data where we already know which transactions were fraudulent. Random forests are great for interpretability and work well when we need to explain decisions to regulators, whereas gradient boosting shines with imbalanced data like fraud cases, which are rare but critical. Neural networks can spot subtle, complex patterns that even humans can't articulate. And all of them work in real time, under 70 milliseconds per transaction, even when handling over a hundred thousand concurrent sessions. Think of this as teaching a spam filter, only the stakes are higher and more real in terms of the customers and the dollar amounts. Model selection depends on the specific fraud patterns, the data volume, and the explainability requirements. Depending on those, we can use one of these models or a combination of them.

But what about new frauds that we haven't seen before? That's where unsupervised learning comes in. These models don't need labeled examples; they just look for anomalies. Isolation forests work by separating out individual behaviors. Autoencoders try to reconstruct normal behavior and flag anything that doesn't fit. Clustering algorithms find groups of suspicious patterns, like fraud rings. These tools detect over 26% of novel fraud patterns, threats that would have bypassed traditional filters. So unsupervised learning is a critical component: it identifies some of the unknown fraud patterns and allows us to detect fraud before it happens.

Features are the signals our models learn from. Raw data, like transaction time or IP address, is useful, but when we combine them creatively, we unlock powerful indicators. Temporal features help us detect timing anomalies, like buying 500 gift cards in five seconds. Network features reveal links between users, devices, and IPs. Behavioral features like cursor movement or typing speed can distinguish bots from humans with over 90% accuracy. Great models need great features, and these must evolve constantly as fraudsters adapt. Feature engineering transforms raw transactions into rich, informative signals that can dramatically improve model performance. The most effective features often combine data across these multiple domains, creating multidimensional risk indicators that fraudsters cannot easily manipulate. Continuous evolution is essential, as static feature sets quickly become targets for sophisticated fraud techniques.

Detecting fraud is one thing. Doing it at scale is another. Imagine Black Friday: millions of transactions in minutes. We need systems that make fraud decisions in under 150 milliseconds and support millions of transactions per minute, and for sure they need to be up and running; they should never go down. That's why we use distributed, fault-tolerant, microservice-based architectures with terabyte-scale storage. This infrastructure enables us to stop fraud in real time without slowing down customers. So we need high-performance processing, a distributed architecture, optimized storage, and a resilient cloud-based infrastructure, so that if there are any failures, we have another set of infrastructure up and running for us.

Some businesses aren't ready to go all in on ML, and that's okay. Hybrid systems are a great bridge. They combine event-driven rules with ML intelligence. Think of a pilot using autopilot but still able to intervene. Rules provide guardrails, and ML finds the nuanced patterns. These hybrid systems can push detection accuracy above 97%, and because they are explainable, they satisfy regulatory needs. So we'll have adaptive learning.
We'll have the business rules, the guardrails that encode business expertise. We'll have the integration layer, where these decisions are fused. And we'll have a feedback mechanism through which we can continuously optimize the system.

Now, coming to manual reviews: not all cases can be automated. That's where smart manual review comes into the picture. ML prioritizes which cases to review and routes them to the right analyst. It highlights suspicious signals and learns from analyst decisions, feeding those insights back into the model. This human-in-the-loop system cuts review time by over 30% and improves decision accuracy by more than 40%. It's a win-win for both security teams and customers. Smart manual review systems have improved reviewer efficiency as well, reducing decision time from minutes to seconds. This human-machine collaboration creates a powerful feedback loop that continuously improves both the automated systems and the human review process for the more complex cases that pure automation cannot handle reliably.

Let's talk about some of the business value that we see with ML fraud prevention. Companies that use ML fraud prevention report a 23% drop in fraud ops costs, a nearly 18% increase in approvals of good transactions, and over 30% fewer customer complaints due to false declines. So it's not just about stopping fraud; it's about driving revenue, customer satisfaction, and loyalty.

What's next? Explainable AI will make black-box models easier to audit. The next generation of fraud models will provide clear explanations of decisions that meet regulatory requirements while maintaining detection performance. Federated learning will help companies collaborate on fighting fraud without sharing sensitive data. These collaborative models across organizations, sharing only non-sensitive data, will enable industry-wide fraud protection while preserving privacy. And real-time adaptation will let models evolve instantly when fraud patterns change. Continuous learning systems that update within seconds of a new fraud pattern emerging will close the adaptation gap. Leading fraud platforms now demonstrate model updates in under 30 seconds after detecting a novel attack.

The future of fraud detection is smarter, faster, and more collaborative. So we need fraud prevention built on more collaborative, adaptive, and explainable systems. Organizations that invest in advanced ML capabilities now will be better positioned to combat more sophisticated fraud while maintaining frictionless customer experiences. Thank you for listening to the topic. I hope you enjoyed it.
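
Editorial sketch: the temporal feature mentioned in the talk (buying 500 gift cards in five seconds) can be illustrated with a sliding-window event counter. This is a minimal sketch in Python, not the speaker's implementation. The class name `VelocityCounter`, the account keys, and the five-second window are hypothetical choices for illustration, and the code assumes timestamps for each key arrive in increasing order.

```python
from collections import deque


class VelocityCounter:
    """Counts events per key inside a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._events = {}  # key -> deque of timestamps, oldest first

    def observe(self, key: str, timestamp: float) -> int:
        """Record one event and return how many events the key has in the window.

        Assumes timestamps for a given key arrive in increasing order.
        """
        window = self._events.setdefault(key, deque())
        window.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window)


counter = VelocityCounter(window_seconds=5.0)

# Account "a1" buys a gift card every 10 ms: all 500 purchases fall inside five seconds.
burst = [counter.observe("a1", i * 0.01) for i in range(500)]

# Account "b2" buys once a minute: the window never holds more than one purchase.
slow = [counter.observe("b2", i * 60.0) for i in range(3)]
```

The count returned by `observe` would be one input feature among many; a model or rule downstream decides what count is suspicious.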
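
Editorial sketch: the hybrid approach described in the talk, where rules provide guardrails and an ML score captures nuanced patterns, might be fused in an integration layer roughly as follows. Everything here is an assumption for illustration: the `Transaction` fields, the `decide` function, the thresholds, and the choice to send rule hits to manual review rather than decline them outright. The only element taken from the talk is the example rule about transactions over a thousand dollars from a new device.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float      # transaction amount in dollars
    new_device: bool   # True if the device has not been seen for this account


def rule_flags(tx: Transaction) -> list:
    """Static guardrails; this one is the example rule from the talk."""
    flags = []
    if tx.amount > 1000 and tx.new_device:
        flags.append("high_amount_new_device")
    return flags


def decide(tx: Transaction, ml_score: float,
           block_at: float = 0.9, review_at: float = 0.5):
    """Fuse rule flags with a model score into approve / review / block.

    ml_score is assumed to be a fraud probability in [0, 1] from any model.
    The thresholds are illustrative, not tuned values.
    """
    flags = rule_flags(tx)
    if ml_score >= block_at:
        return "block", flags
    # Assumption: a rule hit alone routes the case to manual review
    # instead of declining the customer outright.
    if flags or ml_score >= review_at:
        return "review", flags
    return "approve", flags
```

For example, `decide(Transaction(1500, True), 0.1)` returns `("review", ["high_amount_new_device"])`: the rule fires, but the low model score keeps the transaction from being declined automatically, and the returned flags give the analyst an explanation.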

Conf42 Machine Learning 2025 - Online

May 08 2025 - premiere 5PM GMT

Surendra Lakkaraju

Senior Software Development Engineer @ Amazon
