Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone and thank you for joining today's session at
Conf42 on incident management.
I'm Viji Ma personally, and I bring with me around two decades of experience
in cloud technologies specializing in microservices architecture.
Over the years, I have had the privilege of working with various
organizations. As a hands-on architect and team lead, I have guided teams
in building and deploying mission-critical microservices applications
within agile and cloud environments.
Currently I'm working as a tech leader at Freddie Mac, and today I'll be sharing
how the same principles tie into the evolution of modern API gateways and
their role in data-driven reliability for microservices and serverless environments.
Why do API gateways matter for incident management?
Let's start with why.
API gateways have evolved from being simple traffic routers
to becoming critical control planes.
In fact, they are often the first responders during incidents.
They give us comprehensive visibility across distributed services,
enable rapid fault isolation, and allow for targeted recovery strategies.
Think of them as strategic checkpoints, the place where you can apply resilience
patterns consistently without having to change each service individually.
Let's discuss today's agenda.
Here are the things I'll cover in this session.
First, the evolution of API gateways and how service mesh
integration strengthens reliability.
Then, optimizations for serverless workloads and edge computing as a resilience strategy.
After that, AI- and ML-powered routing and caching techniques.
And finally, security and zero-trust resilience.
By the end of this session, you will see how modern gateways
are not just traffic managers.
They are incident management engines.
We'll deep dive into each of these topics one by one.
Let's jump into the evolution of API gateways
and trace it through the different generations.
Generation one was the basic proxy.
These gateways just handled simple routing and basic authentication
and offered very limited visibility.
The next generation, API management, added rate limiting,
analytics, and developer portals.
This was the start of governance. The generation after that, cloud
native, is Kubernetes-native, integrates with the service mesh, and is much
better at scaling with microservices.
The final generation, the reliability engine, is a big leap: AI-driven
resilience, predictive scaling, and even autonomous recovery.
Today, modern gateways can process 180 million API calls daily across
850-plus microservices, keeping latency under 50 milliseconds and
uptime at 99.99 percent, even during incident conditions.
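To put those figures in perspective, here is a small back-of-the-envelope sketch in Python. It only converts the numbers quoted above into a yearly downtime budget and an average request rate. The function names are illustrative and not from the talk, and the request rate is a daily average, not a peak.

```python
# Back-of-the-envelope arithmetic for the figures quoted in the talk.
MINUTES_PER_YEAR = 365 * 24 * 60   # non-leap year
SECONDS_PER_DAY = 24 * 60 * 60


def downtime_budget_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year allowed at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def average_calls_per_second(calls_per_day: float) -> float:
    """Average request rate implied by a daily call volume."""
    return calls_per_day / SECONDS_PER_DAY


print(round(downtime_budget_minutes(99.99), 1))   # about 52.6 minutes per year
print(round(average_calls_per_second(180e6)))     # about 2083 calls per second
```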
Let's move to the second topic, service mesh integration,
which I call the reliability multiplier.
When paired with an API gateway, a service mesh delivers a lot of great benefits:
62% faster incident detection,
34% lower end-to-end latency,
57% better resource efficiency, and 78% more accurate fault isolation.
Together they form a resilient control plane.
We get real-time traffic shaping, intelligent circuit
breaking, and observability, all in one ecosystem.
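As a rough illustration of the circuit breaking mentioned here, the following is a minimal Python sketch of the general pattern, not the implementation of any particular gateway or mesh. The class name, thresholds, and the injectable `clock` parameter are all assumptions made for the example.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: closed, open after N failures, half-open after a cooldown."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # hypothetical default
        self.reset_timeout = reset_timeout          # seconds to stay open
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Still cooling down: fail fast instead of hitting the backend.
                raise RuntimeError("circuit open: failing fast")
            # Cooldown elapsed: half-open, let one trial request through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()       # open, or re-open after a failed trial
            raise
        # A success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping each upstream call in `breaker.call(...)` at the gateway is what lets a failing service be isolated without touching the service itself.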
Let's look at a real-world example, which comes from the
financial services sector.
It coordinates 850-plus microservices through
a single gateway cluster.
The setup processed 180 million daily calls, kept latency at 47 milliseconds, and cut
MTTR from 23 minutes down to 4.3 minutes.
That's a game changer for incident management.
Now let's move on to serverless optimization.
Serverless brings unique challenges.
The first is cold start mitigation.
By using pre-warming strategies,
we have seen a 76% reduction in cold starts, maintaining 88-millisecond warm
starts even during incident recovery.
The second is scaling precision. Predictive scaling has reached 99.995
percent accuracy, which enables us to handle 42 million monthly
events without over-provisioning.
And finally, request optimization. With intelligent batching, we cut 43%
of function invocations during spikes.
The takeaway: API gateways help serverless workloads stay fast,
cost-efficient, and reliable.
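The batching idea can be sketched in a few lines of Python. This is a simplified, size-based version under assumed names (`invoke_in_batches`, `max_batch_size`). A real gateway would likely also flush on a time window, which is left out here.

```python
from typing import Callable, Iterable, List


def invoke_in_batches(
    events: Iterable[dict],
    handler: Callable[[List[dict]], None],
    max_batch_size: int = 10,   # hypothetical default
) -> int:
    """Group events into batches and call handler once per batch.

    Returns the number of handler invocations.
    """
    invocations = 0
    batch: List[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) >= max_batch_size:
            handler(batch)
            invocations += 1
            batch = []
    if batch:
        # Flush whatever is left over at the end.
        handler(batch)
        invocations += 1
    return invocations
```

With 25 events and a batch size of 10, the handler runs 3 times instead of 25, which is the kind of invocation saving batching is meant to provide during spikes.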
Let's move on to edge computing for resilience.
Another powerful lever is edge computing.
Deploying gateways at the edge provides a 58% reduction in global latency, support for
98,000 requests per second across 42 global locations, 99.95 percent
uptime during regional outages,
and a 73% reduction in cross-region traffic.
An example edge deployment architecture has several key components:
regional API edges, which are local gateway nodes handling user traffic;
a control plane, which holds central configuration and routing policies;
active routing, which does load-balanced, health-aware traffic steering;
and finally, failover paths, which provide automated
fallback to healthy regions.
In practice, this multi-region architecture allowed one e-commerce
customer to survive a major us-east-1 outage by rerouting traffic
automatically to healthy regions, maintaining 99.98% availability.
This is resilience in action.
Failures are contained regionally while customers stay online.
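The health-aware routing and failover behavior described above can be illustrated with a minimal Python sketch. The `Region` type, the `pick_region` function, and the example latencies are assumptions for illustration. The selection rule (preferred region if healthy, otherwise the lowest-latency healthy region) is one plausible policy, not necessarily the one used in the deployment described.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Region:
    name: str
    healthy: bool
    latency_ms: float


def pick_region(regions: List[Region], preferred: str) -> Optional[Region]:
    """Return the preferred region if healthy, else the lowest-latency healthy one."""
    healthy = [r for r in regions if r.healthy]
    for region in healthy:
        if region.name == preferred:
            return region
    # Failover path: fall back to the fastest healthy region, or None if all are down.
    return min(healthy, key=lambda r: r.latency_ms, default=None)


regions = [
    Region("us-east-1", healthy=False, latency_ms=20.0),   # simulated outage
    Region("us-west-2", healthy=True, latency_ms=80.0),
    Region("eu-west-1", healthy=True, latency_ms=120.0),
]
print(pick_region(regions, preferred="us-east-1").name)    # us-west-2
```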
Let's move on to the next topic, which is AI and ML capabilities in API gateways.
With AI-powered routing, gateways make 950,000 routing decisions per minute with
99.95 percent optimal path accuracy.
This directly speeds up incident resolution by 38%.
With ML-driven caching, by analyzing request patterns, data volatility, and
traffic peaks, we achieve up to a 47% backend load reduction during incidents.
The benefit is clear here.
Instead of engineers scrambling to scale systems under pressure, AI and ML
allow the system itself to absorb the stress while teams focus on recovery.
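The talk does not describe the caching model itself, so the sketch below swaps in a plain heuristic to show the shape of the idea: choose a cache lifetime from how volatile the data is, and stretch it during an incident to shed backend load. The function name, parameters, and the "half the expected change interval" rule are all assumptions, and a real system would replace this rule with a learned model.

```python
def choose_ttl_seconds(
    change_rate_per_hour: float,
    incident_mode: bool = False,
    min_ttl: float = 5.0,       # hypothetical floor
    max_ttl: float = 3600.0,    # hypothetical ceiling
) -> float:
    """Pick a cache TTL: volatile data gets a short TTL, stable data a long one."""
    if change_rate_per_hour <= 0:
        ttl = max_ttl
    else:
        # Expected seconds between changes; cache for half of that interval.
        ttl = (3600.0 / change_rate_per_hour) / 2
    if incident_mode:
        # Accept slightly staler data during incidents to reduce backend load.
        ttl *= 2
    # Clamp to the configured bounds.
    return max(min_ttl, min(max_ttl, ttl))


print(choose_ttl_seconds(60))                       # 30.0
print(choose_ttl_seconds(60, incident_mode=True))   # 60.0
```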
Let's jump to another topic, which is zero-trust security.
During incidents, let's not forget security resilience.
During peak loads, gateways can process 1.9 million authentication
requests per minute with an average 16-millisecond response time while
maintaining 99.99% compliance.
How?
Through distributed token validation, local policy enforcement, and graceful
degradation. Even when identity providers are disrupted, security remains intact.
This ensures that during incidents, we are not trading off availability for compliance.
We are maintaining both.
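To show what validating a token locally can mean, here is a minimal Python sketch that checks a signed token's signature and expiry at the gateway without calling the identity provider. It uses a simple HMAC-signed format invented for this example, not a real JWT implementation. The shared `SECRET`, the token layout, and the function names are all assumptions.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-shared-secret"  # hypothetical key, distributed to gateways ahead of time


def sign_token(claims: dict, secret: bytes = SECRET) -> str:
    """Issue a token of the form <base64 payload>.<hex HMAC-SHA256 signature>."""
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def validate_locally(token: str, secret: bytes = SECRET, now=None):
    """Return the claims if signature and expiry check out, else None."""
    try:
        payload, signature = token.rsplit(".", 1)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid leaking signature information.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(payload))
    now = time.time() if now is None else now
    if claims.get("exp", 0) <= now:
        return None  # expired
    return claims
```

Because the check needs only the key and the clock, the gateway can keep rejecting invalid or expired tokens while the identity provider is unreachable.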
Let's discuss implementing resilient gateway architectures.
So how do we put this all together?
There are several key factors.
One, clear ownership and boundaries: define what the
gateway performs versus the services.
Second, multi-layer observability, which means
tracing, metrics, and probes.
Third, failure isolation patterns, such as circuit breakers,
bulkheads, and rate limits.
Fourth, automated remediation: build self-healing with fallback behaviors.
And finally, incident playbooks:
clear runbooks for gateway-specific failures.
This isn't just about the tech; it's about operational discipline and readiness.
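Of the failure isolation patterns listed above, rate limiting is the easiest to show in a few lines. The following is a minimal token bucket sketch in Python. The class name, parameters, and the injectable `clock` are assumptions for the example, and production gateways typically provide this as configuration rather than hand-written code.

```python
import time


class TokenBucket:
    """Token bucket rate limiter: refills at a steady rate, allows bursts up to capacity."""

    def __init__(self, rate_per_sec: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock                  # injectable for testing
        self.tokens = float(capacity)       # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        """Return True if a request may proceed, consuming one token."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```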
So, the final key takeaways from this session. API gateways
are no longer just routers.
They are critical incident management control planes. Service
mesh integration cuts incident detection time by over 60%.
Edge deployments maintain near-100-percent uptime during outages.
AI-powered routing and ML caching deliver faster recovery and efficiency gains.
Zero-trust resilience ensures security is never
compromised during incidents.
I hope I covered all the topics which I intended to cover during this session.
Thank you all for listening.
I hope this gives you a clear picture of how modern API gateways are becoming the
backbone of reliable cloud native systems.
Thank you.
I would love to connect further.
Here is my LinkedIn link if anyone wants to connect.
Thank you once again.