Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everybody.
A warm welcome to you all to this Conf42 JavaScript 2025 conference.
It's my real pleasure to be here with you today.
I'm here to talk about using AI to build a smart rate limiting system, one that is scalable and cost efficient as well.
Let me ask you a question.
How many of you, when building an API, reach for rate limiting as your first line of defense?
All of us, right?
Almost all of us.
It's a fundamental practice.
We use it to protect our systems from abuse, ensure stability, manage cost, and prevent denial of service attacks, because anything that is important attracts thieves.
We also know that as APIs become more important, they attract users with malicious intent who want to either steal data or cut down the incoming business.
But here's the critical challenge we face now: attackers have become smarter and the threats have evolved and become more sophisticated.
Our primary defense mechanism has not.
We are relying on traditional static rate limiting systems in a dynamic, intelligent world of attacks.
This isn't a failure of concept, but rather a missing evolution in step with the threats.
Today we are going to explore why these traditional methods are no longer sufficient, and how they are actively costing companies millions while damaging user experience, all while creating a false sense of security.
Most importantly, we are going to discuss a new path forward: one that fuses the essential tool of rate limiting with artificial intelligence to build APIs that are not just secure, but also remarkably scalable and, crucially, cost efficient.
Okay.
With that said, let me move on to the next slide.
Let's begin with the fundamental truth of the current situation, right?
APIs form the heart of the modern web world.
They power and touch multiple aspects of our lives: IoT devices, our mobile phones and applications, shopping, and much more.
But this critical infrastructure is mostly protected by static rate limiting, a defense strategy that, in terms of age, might look like an ancient fossil when compared to current sophisticated cyber attacks.
The idea of static rate limiting, setting a fixed threshold like 500 requests per minute per user, does not work or prevent attacks anymore, because, let's be honest, people have become smart and so have those cyber attackers.
On the one hand, if you set the limit too low to protect against attacks, you end up blocking your best customers.
Imagine a scenario where loyal users during a flash sale face issues accessing the portal they are trying to use, because guess what?
They got the HTTP 429 Too Many Requests error.
The data shows these instances aren't rare: on average, 41% of legitimate traffic gets blocked by overly aggressive static rate limiting rules.
But on the other hand, if you set the limit too high to avoid blocking users, you might be opening the floodgates to abuse and misuse.
The infrastructure costs might spiral out of budget and become more inefficient, and you are left with a system that is vulnerable to the very attacks you were meant to stop to begin with.
This isn't just an inconvenience.
This is a direct hit to the bottom line, leading to millions in lost revenue and wasted cloud spend, and opening a path for sophisticated hackers to get in.
The core of the problem is that static thresholds are blind.
They are not intelligent or flexible enough to adapt to changing scenarios, or rather the changing context.
Let's diagnose the illness in the current context: static rate limiting.
Why is this 30-year-old paradigm failing us in 2025?
It boils down to three critical flaws.
First, rigid thresholds.
A fixed limit cannot distinguish between a good traffic spike and an actual attack.
A successful product launch going viral on Hacker News looks identical to a DDoS attack from a static rate limiter's point of view, because both generate a massive surge in traffic.
One is the dream scenario and the other one is your nightmare.
Second, a complete lack of context awareness.
Static rules ignore everything that makes a user who they are.
They don't care if the user has authenticated successfully for the past two years, connecting from their office network.
They don't care if the traffic flows in a logical sequence of API calls, like getting the products, viewing the products, and adding to the cart.
Under a static, non-intelligent system, this legitimate user looks the same as a scripted attacker from a data center in a foreign country hammering your login endpoint.
And third, the result of these flaws, which is nothing but sky high operational cost.
Because companies can't trust their rate limiters to be smart, they are forced to over-provision infrastructure.
They pay for enough servers, database capacity, and cloud resources to handle the worst-case DDoS scenario, even though 99% of the time those resources sit idle, burning money.
Even if the scaling is dynamic on the backend, it is very likely the system scales up to cater to a coordinated attack, not only responding to the attack, but also adding to the computational cost of the system.
Those are the three ways traditional rate limiting fails.
Now, I want to zoom in on the most potent, modern-day threat that exploits these weaknesses: distributed denial of service, or rather DDoS.
When many of us think of DDoS, we imagine a massive, high-volume attack, a tidal wave of traffic that crashes and causes failures on the servers.
But this landscape has changed.
Today, the most insidious and common attacks are low-and-slow, application-layer DDoS attacks.
These attacks don't try to bring down your front door with raw power, as the name denial of service suggests.
Instead, they pick the lock of the front door.
They target your API endpoints directly, like your login, your search, or your checkout.
And that is the most expensive part of your application to run.
And here's the most genius and most terrifying part of these attacks: they are designed to be stealthy.
It is not brute force; it is the work of a lock-picking, invisible thief.
A botnet of thousands of compromised devices will each send a few requests per minute, staying carefully just below your static rate limit so as not to get caught individually.
Each IP address looks like a slightly active but legitimate user, one that is not bombarding the setup.
But collectively, they consume all of your database connections, exhaust your server CPU, and can rack up massive cloud compute bills, all while your rate limiter gives them a green light because they look legit.
This is the ultimate demonstration of why counting requests is no longer enough to prevent a DDoS attack.
We need to understand their behavior and come up with a solution that handles such sneaky attacks.
Let's talk about the answer to this insidious situation, right?
We need to replace the static, blindfolded rate limiter with a dynamic, intelligent one.
We need to move from a simply counting rate limiter to one which truly understands the situation and takes decisions dynamically about who should be allowed access to the APIs.
This is where we introduce the AI powered framework.
Instead of a single number, our system analyzes a rich mashup of 27 different behavioral features in real time.
It doesn't just look at the number of requests.
It tries to understand: what is the pattern of these requests?
Who is making them?
What is their intent?
What is the sequence of requests from each single user trying to do?
By understanding these patterns, the smart setup can dynamically adapt.
It can confidently allow a surge of legitimate users during a marketing campaign while identifying and throttling a sophisticated DDoS attack happening in parallel.
It provides robust security without sacrificing the user experience and the business's bottom line, all while keeping reputation, cost, and data security intact and keeping the attackers in check.
Now, since the smart system does so much effectively and tactfully, you might be thinking, oh, this smart system sounds complex.
But the whole process can be broken down into a clean four step cycle: collect, engineer, train, and deploy.
Now, let me talk about collecting the right data.
It all starts with the data.
The principle of garbage in, garbage out has never proven to be more true.
We are not just collecting logs, we are gathering the digital DNA markers of each API interaction.
We instrument our API gateways and load balancers to capture a rich tapestry of over 14 critical data points, those digital DNA markers, which we group into several key categories.
In the category of request metadata and patterns, which is one of the most important ones, we capture the velocity and the rhythm of requests.
For example, let's talk about frequency and burst patterns, which try to see whether the traffic is a steady stream or a burst of violent, machine-gun-like requests.
A human browsing a website has natural pauses, whereas a script does not.
Another example in the request metadata and patterns category is timing and inter-request delays.
We measure the milliseconds between calls.
Real users have variable delays between requests.
However, automated attacks often operate with metronomic time differences which are humanly impossible.
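To make this concrete, here's a minimal sketch of capturing inter-request delays at the gateway, assuming an Express-based gateway and an in-memory store; a real deployment would use the gateway's own logging and a shared store such as Redis.

```typescript
import express, { Request, Response, NextFunction } from "express";

// Rolling record of when each client last called us and the gaps between calls.
const lastSeen = new Map<string, number>();
const interRequestDelays = new Map<string, number[]>();

// Gateway middleware: record the millisecond gap between consecutive requests per client.
function recordTiming(req: Request, _res: Response, next: NextFunction) {
  const clientKey = req.ip ?? "unknown"; // in production this could be a user ID or API key
  const now = Date.now();
  const previous = lastSeen.get(clientKey);

  if (previous !== undefined) {
    const delays = interRequestDelays.get(clientKey) ?? [];
    delays.push(now - previous);
    if (delays.length > 100) delays.shift(); // keep a bounded window
    interRequestDelays.set(clientKey, delays);
  }
  lastSeen.set(clientKey, now);
  next();
}

const app = express();
app.use(recordTiming);
```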
Another category that we measure is sequential data and behavioral intent.
This is where we start to understand the intent behind the API calls being made.
One parameter we use to understand this is endpoint access sequences.
What we do here is track the journey, and not just a single request.
A normal user might follow a logical path of finding the products, selecting a product, viewing it, and adding it to the cart.
An attacker who does not actually have the intent of ordering and buying products might just hammer a single endpoint like login, or random endpoints like search or export, in quick succession.
This sequence of API calls, and even the missing API calls, tells a story about the intent.
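A minimal sketch of that journey tracking, assuming a bounded in-memory buffer of the last few endpoints per user, could look like this:

```typescript
// Keep the last N endpoints each user touched, so later features can reason about the journey.
const JOURNEY_LENGTH = 10;
const journeys = new Map<string, string[]>();

function trackEndpoint(userId: string, endpoint: string): string[] {
  const journey = journeys.get(userId) ?? [];
  journey.push(endpoint);
  if (journey.length > JOURNEY_LENGTH) journey.shift();
  journeys.set(userId, journey);
  return journey;
}

// Example: a human-like shopping path versus a bot hammering one endpoint.
["/products", "/products/42", "/cart/add"].forEach(e => trackEndpoint("user-1", e));
["/login", "/login", "/login", "/login"].forEach(e => trackEndpoint("bot-7", e));
```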
Another parameter that we use in this category is action outcomes.
Here we don't just look at the requests, but also at what happened after the request.
A series of HTTP 404 Not Found errors might indicate a scanner.
A rapid sequence of HTTP 401 Unauthorized responses followed by a single 200 OK could be a credential stuffing attack in progress.
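A rough sketch of that last signal, assuming we keep a small window of recent status codes per client, might be as simple as:

```typescript
// Flag a possible credential stuffing pattern: many 401s followed by a 200 on the same client.
function looksLikeCredentialStuffing(recentStatusCodes: number[], threshold = 10): boolean {
  const last = recentStatusCodes[recentStatusCodes.length - 1];
  const priorUnauthorized = recentStatusCodes
    .slice(0, -1)
    .filter(code => code === 401).length;
  return last === 200 && priorUnauthorized >= threshold;
}

// Example: ten failed logins and then a success is suspicious.
console.log(looksLikeCredentialStuffing([...Array(10).fill(401), 200])); // true
```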
Let's look at another category here, which is authentication and session context.
This layer adds identity to the behavior.
For example, talking about login success and failure rates: a couple of failures during user login is actually very normal, but 20 failures in a minute from the same IP or IP subnet, even if the API calls are below the static rate limit, is a different story altogether and is a huge red flag.
Another example in this category is session token usage, or a few questions to ask about the session: how is the session being used?
Is a newly created token immediately making high value transactions?
Basically, is a token that was just generated suddenly very active?
Is a token from one geographic location suddenly being used from another an hour later?
These form critical trust signals, which may point to leaked passwords or even bigger problems.
The next category we'll be talking about is system health and resource consumption.
Here we listen to what the API itself is telling us.
An example here would be the API response times and the error rates.
A DDoS attack or a resource-intensive scraping bot will often cause elevated response times and a spike in 500-series server errors on the endpoints that they're targeting.
This resource strain is a crucial symptom of a sophisticated attack that a static rate limiter completely misses.
The last but not the least category we will talk about is contextual signals.
What we are looking for here is the who and the from where.
We use the geographical location and network source and find answers to questions like: is the user who normally logs in from London suddenly making requests from a data center in a different country, or is it a bot?
We correlate with IP reputation scores and known VPN and proxy networks to verify the validity of the requests.
We also use device fingerprints and user agents to understand the context, answering questions like: is the request coming from a standard browser with a consistent set of headers, or from a headless client with a suspicious or missing user agent?
This helps us identify the trustworthiness of the request.
This rich, multidimensional data forms the raw material; it forms the foundation from which our system can begin to extract the subtle nuances that differentiate the real user from the attacker.
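To picture what one captured interaction might look like across those categories, here's a rough sketch of a record shape; the field names are illustrative, not a fixed schema:

```typescript
// One captured API interaction, grouped along the categories described above.
interface ApiInteractionRecord {
  // Request metadata and patterns
  timestamp: number;
  endpoint: string;
  method: string;
  interRequestDelayMs: number | null;

  // Sequential data and behavioral intent
  recentEndpoints: string[];
  responseStatus: number;

  // Authentication and session context
  userId: string | null;
  sessionAgeSeconds: number;
  recentFailedLogins: number;

  // System health and resource consumption
  responseTimeMs: number;

  // Contextual signals
  sourceIp: string;
  geoCountry: string;
  ipReputationScore: number; // e.g. 0 (bad) to 100 (clean), from a threat intelligence feed
  userAgent: string | null;
}
```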
Now moving on: till now we have just been talking about the data, which is just a list of facts.
Step two is engineering, which is the art and science of transforming these facts into meaningful, insightful signals.
And these signals are the features our AI model will actually understand.
Think of it this way: raw logs are useless to a machine learning model.
We must teach it the language of behavior, deriving understanding from the logs.
This is where we create a derived layer of 27 separate behavioral features, which we group into four powerful categories that help analyze and infer the complete story of the actual API requests.
Let's break down these categories with concrete examples of what we built.
The first category is temporal patterns, which gives us the rhythm of requests.
First, we analyze time.
We move beyond a simple count to understand the pattern of requests.
We create features like request-per-second volatility and collect data on it.
This is the statistical variance in the customer's request rate: humans are volatile and unpredictable, whereas bots are often metronomically consistent and persistent.
We also calculate a burst score to identify short, high intensity explosions of traffic, which form the hallmarks of automated scripts.
We even look at time-of-day anomalies, where we identify whether the requests are happening at the user's typical time in their usual time zone, or at 3:00 AM in a different time zone altogether, from a location that they have never been to.
The goal is to answer a critical question: is the traffic coming in a smooth, human-like rhythm, or in throttled, robotic, consistent bursts, even if they are smaller bursts?
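A minimal sketch of those first two features, assuming simple stand-in formulas (variance of the inter-request gaps, and the share of sub-100ms gaps), could be:

```typescript
// Variance of the gaps between requests: near zero for metronomic bots, large for humans.
function requestRateVolatility(delaysMs: number[]): number {
  if (delaysMs.length < 2) return 0;
  const mean = delaysMs.reduce((a, b) => a + b, 0) / delaysMs.length;
  const variance =
    delaysMs.reduce((sum, d) => sum + (d - mean) ** 2, 0) / delaysMs.length;
  return variance;
}

// Fraction of requests arriving in sub-100ms bursts: high values suggest scripted traffic.
function burstScore(delaysMs: number[], burstThresholdMs = 100): number {
  if (delaysMs.length === 0) return 0;
  const bursty = delaysMs.filter(d => d < burstThresholdMs).length;
  return bursty / delaysMs.length;
}
```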
Another category that we need to talk about is access behavior, which gives us the narrative of intent.
Here we analyze what is being accessed and how.
This is about understanding the user's point of view.
Here we calculate the endpoint entropy, a measure of randomness in the API endpoints being accessed.
A real user has low entropy, following a predictable path like home, search, and product page, and possibly adding to the cart.
A scanner, by contrast, has high entropy and high disorder, jumping randomly between different APIs like login, admin, and export, or hitting the search API continuously with a multitude of inputs while ignoring the other routine APIs that an end user would use.
We also create a suspicious sequence flag that triggers when the user's path matches a known malicious pattern.
These patterns are based on industry- or sector-specific data and scan patterns that are identified and updated on an ongoing basis.
An example of this is accessing a login endpoint immediately after trying to hit a sensitive data export endpoint, or immediately accessing a search endpoint right after a search and an export in the previous call.
The insight we are engineering here is whether the user is browsing a diverse set of endpoints like a human, or laser-focused on a few expensive APIs in an illogical, inhuman sequence and speed.
The speed is not calculated here, but you get the point.
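A minimal sketch of the endpoint entropy feature, assuming Shannon entropy over a user's recent endpoints, looks like this:

```typescript
// Shannon entropy of recently accessed endpoints: low for focused, predictable journeys,
// high for random scanning across many endpoints.
function endpointEntropy(recentEndpoints: string[]): number {
  if (recentEndpoints.length === 0) return 0;
  const counts = new Map<string, number>();
  for (const e of recentEndpoints) counts.set(e, (counts.get(e) ?? 0) + 1);

  let entropy = 0;
  for (const count of counts.values()) {
    const p = count / recentEndpoints.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

// A shopper revisiting a few pages scores lower than a scanner touching many distinct endpoints.
console.log(endpointEntropy(["/home", "/search", "/product/1", "/product/1", "/cart"]));
console.log(endpointEntropy(["/admin", "/export", "/login", "/search?q=a", "/search?q=b"]));
```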
The next category we will talk about is network signals, which help us understand the context of the connection.
Here we look at the origin of the request.
The network doesn't lie.
We build a geographic impossibility score, calculating the physical possibility of a user moving from New York to London between two subsequent requests.
This is a massive red flag.
We incorporate real-time IP reputation scores from threat intelligence feeds and calculate a source anomaly index, a measure of how unusual the user's network source is compared to their history and the general user base.
This category also incorporates external context, looking for geographic and infrastructural anomalies that static systems completely ignore.
The last category that we have to talk about is the user context, which highlights the power of the baseline.
Finally, and most powerfully, we analyze identity.
We don't treat every user as a stranger.
We derive a session confidence score based on the age of the session, the diversity of actions taken, and its geographic stability.
We calculate failed login velocity, but we normalize it by the user's historical baseline.
A user who never fails a login suddenly failing 20 times in a minute or two is a much bigger alert than a known clumsy typist who might make unsuccessful attempts, or someone like me who forgets passwords often, takes time to remember, and has failed logins; in that case the historical pattern shows that this user has such issues.
This is the crown jewel of all the features so far.
We are continuously asking here: how does the current session compare to the user's 90-day baseline?
Has their behavior suddenly and suspiciously changed?
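A minimal sketch of that baseline-normalized feature, assuming a simple ratio against the user's historical failure rate, might look like:

```typescript
// How unusual is the current failed-login rate for this particular user?
// A score near 1 means "normal for them"; large values mean a sharp deviation from their baseline.
function failedLoginAnomaly(
  failuresLastMinute: number,
  baselineFailuresPerMinute: number, // e.g. the user's 90-day average
): number {
  const epsilon = 0.05; // avoid dividing by zero for users who never fail
  return failuresLastMinute / (baselineFailuresPerMinute + epsilon);
}

// A user who almost never fails suddenly failing 20 times is a far louder alarm
// than a clumsy typist whose baseline already includes a failure or two per minute.
console.log(failedLoginAnomaly(20, 0));   // 400
console.log(failedLoginAnomaly(20, 1.5)); // ~12.9
```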
Now, with the 27 powerful behavioral features in our data set ready, we move to step three: train.
This is a very important step.
This is where we build the intelligent brain of the entire system.
We use a decision tree ensemble model.
We can think of this as a committee of many simple, interpretable subject matter experts, each most relevant to their own features.
Together, these experts work to make a highly accurate and robust decision.
We specifically use a combination of random forest for robust, generalized classification and gradient boosting for precision tuning on difficult edge cases.
Now, you might be wondering what these terms mean.
Don't worry.
You don't need a data science degree to get the core idea.
Let me explain them here.
First, we have what we'll call the committee of experts, or specialists.
Imagine we had ten different security experts, or rather a hundred different security experts, and each one of them is a specialist in a different area.
This is our random forest, and we don't give them all the same information.
One expert only looks at the timing and rhythm of requests, checking whether it's a smooth flow or robotic bursts.
Another expert only focuses on geographic location, looking at things like login patterns to spot a login from London just two minutes after logging in from New York, and so on.
A third specializes in the sequence of pages or API visits the user makes, to tell whether it looks like a natural browsing journey or a random, suspicious scan.
Another one could analyze only the user's past behavior: is this action normal or abnormal for this user?
We give each expert a slightly different view of the request, focused on where their specialty lies.
We ask them all the same question: does this look like an attack?
Each expert makes their own decision based on their unique lens and the data that is given to them.
In the end, we just take the majority vote.
This method is incredibly robust and reliable because it does not rely on any single piece of evidence.
It's hard to fool a whole committee of specialists who are all looking at different angles and different clues.
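Purely as a toy sketch of the committee idea, not the trained production model, the voting logic can be pictured like this, assuming each expert is a simple rule over one slice of the features:

```typescript
// Each "expert" inspects one slice of the features and gives a verdict.
type Expert = (features: Record<string, number>) => boolean; // true = looks like an attack

const timingExpert: Expert = f => f.burstScore > 0.8 && f.rateVolatility < 10;
const geoExpert: Expert = f => f.impossibleTravel === 1 || f.ipReputationScore < 20;
const journeyExpert: Expert = f => f.endpointEntropy > 2.5;
const baselineExpert: Expert = f => f.failedLoginAnomaly > 10;

// Majority vote across the committee, the intuition behind a random forest.
function committeeVerdict(features: Record<string, number>, experts: Expert[]): boolean {
  const attackVotes = experts.filter(expert => expert(features)).length;
  return attackVotes > experts.length / 2;
}

const experts = [timingExpert, geoExpert, journeyExpert, baselineExpert];
console.log(committeeVerdict(
  { burstScore: 0.9, rateVolatility: 2, impossibleTravel: 1, ipReputationScore: 10,
    endpointEntropy: 3.1, failedLoginAnomaly: 1 },
  experts,
)); // true: three of four experts vote "attack"
```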
But sometimes attacks can be so clever and subtle that they slip past this general vote, and that's why a second technique comes into the picture, which we call our master investigator.
This is our gradient boosting model.
This detective examines the entire case file, learning from mistakes and becoming exceptionally skilled at connecting subtle dots to catch the most elusive, sophisticated, and sneaky threats as well.
Now, why did we choose this specific combination?
There are three critical advantages, so let us look at them.
First, we achieve both breadth and depth in accuracy.
The committee provides reliable baseline detection while the investigator handles sophisticated edge cases.
This hybrid approach is the key to our 97.5% model accuracy.
Second, the division of labor enables real-time performance despite using two models.
Our architecture makes predictions in microseconds, which is crucial for live API traffic.
And third, we gain enhanced interpretability.
The committee, the random forest, shows us which features are most influential across all experts, while the investigator reveals the sequential logic of complex cases.
This multifaceted understanding is vital for trust and debugging.
The training process is continuous.
We feed the model historical traffic data labeled as legitimate or malicious, use cross validation to ensure generalization, and maintain automated pipelines that regularly retrain on new production data, so the model becomes more sophisticated and stays up to date.
This allows the system to adapt to novel attacks and evolving user behavior, creating a learning system rather than a static rule-based one, which functionally overcomes the limitations of the traditional rate limiting we discussed earlier.
The final step over here is deploy.
How do we put this intelligent brain into production without creating a bottleneck or a single point of failure?
The answer is a cloud native, serverless architecture.
Here is what it looks like in practice: the model is packaged and deployed as a serverless function, for example an AWS Lambda or an Azure Function.
Now, why is the serverless approach so transformative?
Let me explain its core advantages for our use case.
It is elastic and event driven: unlike traditional servers that you have to provision and pay for 24/7, a serverless function scales to zero.
It only wakes up when an API request comes in.
When you see traffic spikes, like during a product launch or a marketing event, the cloud provider automatically spins up thousands of parallel instances in milliseconds.
There is no capacity planning and no manual intervention.
The system scales precisely as the demand increases.
Now, the next advantage is the granular pay-per-use cost model.
This is a game changer for cost efficiency: you are not billed for idle time.
We saw earlier how the older infrastructure had extra cost because we were maintaining the infrastructure despite not having enough requests.
Over here, you are only charged for the milliseconds of compute time it takes to execute the model inference for each request.
During quiet periods when there is no traffic, your cost for this API component drops to absolute zero, while your system remains ready to spring into action.
Now, talking about built-in fault tolerance and high availability: cloud providers run serverless functions across multiple availability zones by default.
This means if an entire data center in one zone has an outage, the platform automatically routes traffic and executes the function in another zone.
You get a highly resilient system without having to architect the redundancy yourself.
Next is reduced operational overhead.
We completely eliminate the need for you to manage servers, operating systems, and runtime environments.
There are no patches to apply, no servers to reboot, no clusters to monitor.
This no-ops model allows your team to focus on building features and not managing the infrastructure.
This serverless function integrates seamlessly into your existing API gateway.
Every incoming request that needs inspection has its features calculated and is then sent to this function for real-time inference.
The gateway then enforces the decision of allowing, delaying, denying, or blocking based on the model's confidence score.
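As a minimal sketch of that integration, assuming an AWS Lambda behind the gateway and a placeholder in place of the real model inference, the handler could look roughly like this:

```typescript
import type { APIGatewayProxyEventV2, APIGatewayProxyResultV2 } from "aws-lambda";

// Stand-in for the trained ensemble: in production this would load the model artifact
// and run inference; here it just echoes a hint so the sketch stays self-contained.
async function scoreRequest(features: Record<string, number>): Promise<number> {
  return features.suspicionHint ?? 0; // placeholder logic only
}

// Serverless inference endpoint: receives pre-computed behavioral features from the
// API gateway and returns a threat confidence score (0-100) for the gateway to act on.
export async function handler(
  event: APIGatewayProxyEventV2,
): Promise<APIGatewayProxyResultV2> {
  const features = JSON.parse(event.body ?? "{}") as Record<string, number>;
  const score = await scoreRequest(features);

  return {
    statusCode: 200,
    body: JSON.stringify({ threatConfidence: score }),
  };
}
```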
Now, you might wonder, how does this intelligent system not become a bottleneck?
The answer lies in the powerful synergy between the model choice and the inherent strengths of serverless architecture.
First, our optimized decision tree ensemble is purpose built for this environment.
It delivers the high accuracy we need with microsecond inference speed and crystal clear interpretability, ensuring it can take lightning fast decisions without slowing down your API.
Second, because we deploy it as a serverless function, the cloud platform provides automatic, near infinite scaling and true parallel processing.
If 10,000 requests arrive at once, it spins up 10,000 parallel instances.
There's no queue because there's no single point to queue at.
We complete this with robust operational practices.
We implement zero-downtime deployments using a blue-green deployment model, allowing us to update the AI model seamlessly and roll back instantly if needed, all without users noticing.
This powerful combination gives us three massive advantages: infinite scalability, true cost efficiency, and built-in availability, making a rigorous architecture that guarantees a system which is not only intelligent, but also performant, reliable, and cost effective.
This four step cycle, collecting rich data, engineering intelligent features, training a powerful model, and deploying with cloud native agility, is how we transform the blunt instrument of traditional rate limiting into a precise, adaptive, scalable security system.
Now, before talking about advanced strategies, let's address something that sometimes makes people wonder: does this actually work?
These are not just lab results or trial reports; these are metrics that come from real world production deployments across AWS, Azure, and Google Cloud.
We see 96% detection accuracy for malicious threats, and we also see a 68% reduction in false positives.
That represents millions of legitimate users who are no longer accidentally blocked, which improves the user experience for that many users by throttling precisely and only when needed.
Companies have achieved over 27% in infrastructure cost savings.
The model itself operates with 97% accuracy.
This is an efficient and tangible bottom line improvement.
Now, looking at an advanced strategy like progressive throttling, a key innovation that drives down false positives by changing how we respond.
We reject the binary block-or-allow paradigm.
Instead, we use progressive throttling.
What happens here is the AI assigns a threat confidence score from zero to one hundred, and based on the score we apply a graduated response.
In case of a low score, which means a low threat, the request gets full speed access, with no impact to real users.
In case of a medium score, we introduce slight incremental delays: a script will be crippled by a 500 millisecond delay, but a human user might not even notice it.
In case of a high score, we enforce much stricter rate limits.
The bands are roughly: low from zero to 30, medium from 31 to 70, and high from 71 to 99.
In case of a confirmed threat, we completely block the request.
This graceful slow-down and degradation is what allows us to ensure security without being hostile and completely shutting users down.
The system doesn't stand still.
It continuously segments users, monitors model performance, and incorporates new data to retrain and improve.
It's a living, learning system that adapts to your unique traffic patterns and the evolving tactics of attackers.
If you are convinced it's time to move rate limiting out of the dark ages, here's a practical, phased roadmap to get you there.
Assessment phase: audit your current rate limiting and identify the pain points.
What are your false positive rates?
What does your attack traffic look like?
Establish a baseline.
Then do the infrastructure setup: configure the data collection pipelines, set up the cloud resources and monitoring dashboards.
This is the foundation.
Model development: engineer your features and train your initial models on historical data.
Validate their performance against your baseline.
This is crucial.
Next, deploy to a single API or a small percentage of traffic.
Monitor everything closely and tune the model based on real world feedback.
Full rollout: once you're confident, scale across all your APIs, implement the continuous learning loop, and start optimizing for cost efficiency.
Now, I would like to wrap up this discussion.
If you have any questions, you can reach me on LinkedIn by searching for my name; there's only one person you will get.
You can connect with me to discuss this or anything else, my friends.
Thank you again.
Okay.