Transcript
This transcript was autogenerated. To make changes, submit a PR.
Good morning.
Good afternoon, everyone.
My name is Siva Prakash.
It's a pleasure to be here.
With two decades in the IT industry, I have dedicated the last four
years specifically to navigating the intricate technological landscapes of
the financial and insurance sectors.
During this time, I have seen firsthand the evolution of systems, the explosion of data, and the mounting pressure for absolute reliability and security.
Today I want to talk about a transformative approach that addresses one
of the most pressing issues in our field.
Cloud-native observability. Financial institutions are the bedrock of our economy, processing trillions of transactions annually through increasingly complex, distributed architectures. This complexity, while enabling innovation, brings unprecedented observability challenges. The stark reality is that traditional monitoring approaches often detect a mere one to two percent of critical anomalies. Think about that: 98 percent of potential issues might be lurking, unseen. This can and does lead to significant downtime costs, customer dissatisfaction, and serious compliance violations.
But what if we could flip that statistic?
That's precisely what I'm here to discuss.
We will explore how cutting edge cloud native observability solutions,
particularly those harnessing the power of graph based topology analysis and
behavioral analytics, are fundamentally changing financial systems monitoring.
We are aiming for, and achieving, up to 99.4 percent accuracy in identifying system anomalies while simultaneously making a massive dent in operational noise by reducing false alerts by 87 percent.
Let's spend a few moments dissecting the current monitoring challenge, because understanding the problem deeply is key to appreciating the solution.
As this slide illustrates, it's a multifaceted issue.
Number one, low detection rate. I mentioned the alarming one to two percent detection rate for critical anomalies. That is not just a statistic; it is a significant business risk. It means that by the time an issue is manually discovered, it may have already impacted customers or critical financial processes.
Number two, alert fatigue. For the anomalies that are detected, traditional systems often generate an excessive number of false positives. My colleagues in operations often tell me they are drowning in alerts. This alert fatigue is dangerous: critical alerts get missed because teams are desensitized or simply overwhelmed.
Number three, system complexity. Financial systems today are no longer simple, monolithic applications. They have evolved into incredibly intricate webs of microservices, APIs, and distributed databases. We are talking thousands of interdependencies. Trying to manually map or understand this with traditional tools is like trying to navigate a labyrinth blindfolded. Number four, financial impact.
The direct consequence of these shortcomings is significant downtime costs, but it's more than just that. It's reputational damage, loss of customer trust, and the ever-looming threat of regulatory penalties for non-compliance.
So why are traditional tools failing us?
Predominantly, they rely on threshold-based monitoring. You set a static limit: CPU usage above x percent, response time over y milliseconds. But in a cloud-native world, these thresholds are often too rigid. They don't understand the context or the complex relationships between components. This leads to those critical blind spots and makes troubleshooting a reactive, resource-intensive, and often frustrating process.
This brings us to our solution, starting with a foundational element: graph-based topology analysis. Imagine creating a dynamic, living, breathing blueprint of your entire financial system. That's what this is. It's about understanding the who, what, where, and how of your system's interactions. The process unfolds in logical steps. Number one, data collection. First, we gather comprehensive telemetry, including logs, detailed metrics, and distributed traces, from every single component in your system, be it an application, a microservice, a database, or a piece of infrastructure.
Number two, graph construction. This is where the real intelligence comes into play. We don't just collect data; we use it to map out all service dependencies and their intricate relationships. From this we construct a graph model, a visual and analytical representation of your system's topology. Think of it like a social network for your services: we see who talks to whom, how often, and how critically. Number three, pattern analysis. With this dynamic graph, we can analyze how services interact under normal conditions and, more importantly, identify behavioral patterns that deviate from this norm, patterns that are anomalous. For instance, a service suddenly trying to communicate with another service it has never interacted with before, or a critical pathway showing unusual latency. Number four, intelligent alerting.
This allows for highly targeted notifications, because we understand the relationships. The alerts come with rich context, helping teams understand the blast radius and the potential upstream and downstream impact. This also feeds into risk assessment, helping prioritize issues.
The true power here, and I have seen this make a huge difference, is its ability to reveal those hidden connections and dependencies that often go unnoticed. This allows us to identify anomalies that conventional detection methods, with their siloed views, would invariably miss. Consequently, we can precisely target the root cause of issues rather than just firefighting the symptoms. This shift from reactive to proactive is a game changer, especially in preventing those fast-escalating failures that can ripple through financial operations. Hand in hand with our graph-based topology analysis goes our behavioral analytics framework. If the graph tells us how components are interconnected, behavioral analytics tells us how they should be behaving within that structure and, critically, when they start to deviate.
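The graph-construction and pattern-analysis steps can be sketched in a few lines of code. This is a deliberately minimal illustration, not the platform's actual implementation: it builds a service-dependency graph from observed calls, learns the baseline edge set, flags calls over edges that never appeared during the learning window, and uses a breadth-first walk to estimate the downstream blast radius. All service names are hypothetical.

```python
from collections import defaultdict, deque

class ServiceGraph:
    """Toy service-dependency graph with novel-edge anomaly detection."""

    def __init__(self):
        self.edges = defaultdict(set)   # caller -> set of observed callees
        self.baseline = set()           # (caller, callee) pairs seen while learning

    def record_call(self, caller, callee, learning=False):
        """Record an observed call; return True if the edge is anomalous."""
        self.edges[caller].add(callee)
        if learning:
            self.baseline.add((caller, callee))
            return False
        return (caller, callee) not in self.baseline

    def blast_radius(self, service):
        """All services reachable downstream of `service` (potential impact)."""
        seen, queue = set(), deque([service])
        while queue:
            for nxt in self.edges[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

# Learning window: record the normal call patterns
g = ServiceGraph()
g.record_call("api-gateway", "payments", learning=True)
g.record_call("payments", "ledger-db", learning=True)
g.record_call("payments", "fraud-check", learning=True)

# Live traffic: a known edge passes, a never-before-seen edge is flagged
assert g.record_call("api-gateway", "payments") is False
assert g.record_call("fraud-check", "ledger-db") is True  # novel edge -> alert
print(sorted(g.blast_radius("api-gateway")))
```

In a real deployment the edges would be derived from distributed traces rather than hand-entered calls, and the baseline would be aged and re-learned continuously, but the core idea, alerting on deviations from the learned topology, is the same.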
This framework is a continuous loop. Number one, metric collection. We gather over 300 different system metrics in real time from across your environment. This provides a rich, multidimensional view of system behavior. Number two, baseline establishment. This is where it gets really smart. The system does not rely on predefined static thresholds. Instead, it uses machine learning to establish normal operational patterns for your specific environment, creating dynamic baselines that continuously adapt to changing conditions and seasonality. What's normal on a Monday morning might be different from a Friday afternoon during options expiry, for example.
Number three, contextual analysis.
When deviation from this learned baseline occur, the framework
evaluates their significance.
It considers the context what else is happening in the system?
How does this deviation relate to others?
Is this a minor, isolated blip, or a statistically significant pattern that could be the early warning sign of a larger issue?
Number four, deviation detection. Based on the contextual analysis, the system identifies even subtle anomalies that might be indicative of potential failures, often long before they would breach a traditional static threshold.
The crucial differentiator here is the move away from those rigid static thresholds. As I mentioned, static thresholds often lead to a flood of false positives or, conversely, miss emerging issues until it's too late. Our behavioral analytics approach, by focusing on subtle deviations from learned norms, achieves early detection rates that are impressive: 5.3 times higher than typical industry standards. In my years in finance and insurance, the ability to get ahead of an issue, to detect it in its infancy, has consistently been a key factor in maintaining stability and trust.
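To make the contrast with static thresholds concrete, here is a simplified sketch of dynamic baselining: a detector that learns a per-metric rolling mean and standard deviation and flags values that deviate by more than a few standard deviations from the recent norm. This is a stand-in for the machine-learned baselines described above; the window size and z-score cutoff are illustrative assumptions.

```python
import statistics
from collections import deque

class DynamicBaseline:
    """Rolling-window baseline: flags points far from the recent norm."""

    def __init__(self, window=50, z_cutoff=3.0):
        self.history = deque(maxlen=window)
        self.z_cutoff = z_cutoff

    def observe(self, value):
        """Return True if `value` deviates anomalously from the learned baseline."""
        anomalous = False
        if len(self.history) >= 10:  # need some history before judging
            mean = statistics.fmean(self.history)
            stdev = statistics.pstdev(self.history)
            if stdev > 0 and abs(value - mean) / stdev > self.z_cutoff:
                anomalous = True
        if not anomalous:
            self.history.append(value)  # only normal points update the baseline
        return anomalous

detector = DynamicBaseline()
# Normal latency hovers around 100 ms with small jitter...
for i in range(40):
    assert detector.observe(100 + (i % 5)) is False
# ...so a 160 ms reading is flagged as anomalous, even though a static
# 200 ms threshold would have stayed silent.
assert detector.observe(160) is True
```

A production system would maintain seasonality-aware baselines (per hour of day, per day of week) across hundreds of metrics, but even this toy version shows why learned norms catch issues that fixed limits miss.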
These are not just theoretical advantages; the performance metrics speak volumes, as you can see from this comparison.
Let's first look at the anomaly detection rate.
Our solution, represented by the lighter bar, achieves 99.4% accuracy in detecting anomalies. Compare that to the very low single digits of traditional monitoring. That's a monumental leap.
Now consider the false positive rate.
We have achieved an 87% reduction in false alerts.
Imagine the impact on your operation team, less noise, less wasted effort, and
the renewed ability to focus on genuine issues that require their expertise.
This also rebuilds trust in the monitoring system itself. Early detection lead time, normally measured in hours, is significantly enhanced. This crucial lead time gives teams a much better chance to investigate, mitigate, and resolve problems before they impact users or escalate into major incidents. And critically, look at unidentified failures. There is a dramatic decrease here, translating to a 76 percent increase in detecting previously unidentified failure patterns. This means fewer unexpected outages, enhanced system reliability, and a more resilient infrastructure overall. This proactive discovery of unknown failure modes is invaluable for continuous improvement.
Financial systems are data factories. They generate enormous volumes of telemetry data, logs, metrics, and traces, often at massive scale. Handling this sheer volume efficiently is a significant engineering challenge in itself. Slow processing or inefficient storage can negate the benefits of even the smartest analytics. Our cloud-native platform is specifically architected to manage this deluge effectively.
It employs intelligent data filtering, which is crucial for prioritizing
the most relevant telemetry.
Not all data is created equal when it comes to detecting anomalies, and our system knows how to focus on the signals that matter. Adaptive compression techniques are used to optimize storage utilization significantly, keeping costs manageable without sacrificing access to critical historical data.
This enables real-time analysis, providing immediate insights into system behavior.
The result of this intelligent data management is processing speed up to 42
times faster than traditional solutions.
The speed combined with optimized storage and cost efficiency is vital.
In finance, data insights are often time sensitive.
The ability to process and analyze vast data sets quickly is fundamental
to maintaining a competitive edge and operational stability.
Ultimately, these technical advancements must translate into tangible business value, and they do. We are seeing a remarkable 94% reduction in monitoring infrastructure maintenance costs. This is not just a small saving; it frees up significant budget that can be reinvested into innovation or other strategic initiatives. An 82% improvement in scalability ensures your systems can handle peak transaction periods, think market open, Black Friday for retail banking, or month-end batch processing, reliably and without performance degradation.
A 71% improvement in incident resolution speed: faster detection, better context, and root cause analysis all contribute to minimizing downtime costs and, importantly, limiting any negative impact on your customers. And a 68% enhancement in mapping cross-service dependencies: this deep understanding is crucial for proactive risk management, preventing those cascading failures that can spread rapidly through interconnected financial systems.
These results are not confined to lab projects. They showcase real success across diverse financial institutions. Let me share a few examples. Consider a global investment bank. They were struggling with a sprawling estate of 12,000 microservices. By implementing our solution, they reduced their mean time to resolution, MTTR, by an incredible 65% and virtually eliminated 94% of their false alerts. The bottom-line impact: approximately $3.2 million saved annually in operational costs.
Then there's a regional credit union. Their challenge was improving system availability and proactively addressing issues. They saw their availability jump from 99.2 percent to an outstanding 99.97 percent. Furthermore, they were able to detect potential failures before they impacted customers in 98 percent of cases and, as a bonus, reduced their monitoring staff requirements by 40%.
An insurance provider faced the dual challenge of rapidly increasing transaction volumes, 2.5 times more, and the need to reduce critical incidents. They achieved a 73% decrease in critical incident frequency and significantly enhanced their regulatory compliance reporting through automatic anomaly documentation. Across these and other implementations, financial institutions have experienced on average a 59% decrease in mean time to resolution, MTTR, while simultaneously improving system reliability by 83%. These figures underscore the consistent and significant value, irrespective of an organization's size or the specific complexity of its IT environment. We recognize that adopting a new observability paradigm is a journey, not the flip of a switch. That's why we champion a carefully phased implementation approach, designed to minimize disruption and accelerate your time to value.
This is not about a big-bang deployment; it's about phased success. Our proven methodology typically unfolds as follows. Number one, the assessment phase, two to three weeks. This is the crucial starting point. We perform detailed topology mapping of your existing environment and carry out a comprehensive monitoring gap analysis. This ensures we understand your unique landscape.
Number two, pilot deployment, three to four weeks. Armed with insights from the assessment, we then focus the implementation on a set of your most critical services. This allows us to establish initial baselines, demonstrate value quickly, and gather learnings in a controlled manner.
Number three, expanded rollout, four to six weeks. Based on the success and learnings from the pilot, we then scale the solution to your full production environment. This phase includes careful alert tuning to align with your operational workflows.
Number four, optimization, and this phase is ongoing. Observability is not a one-time setup. This phase involves continuous refinement of detection algorithms, dashboards, and integrations to ensure the solution evolves with your systems and delivers lasting effectiveness. This structured approach, beginning with that comprehensive assessment, allows for a targeted and efficient pilot, which in turn paves the way for a smoother, more successful scaled deployment, ensuring buy-in and minimizing risk.
A natural and important question is always: how will this integrate with our existing, complex technology stack? Our solution is architected for flexibility and seamless integration. Cloud platforms: we provide native, deep integration with all major cloud providers, AWS, Azure, and GCP, including leveraging their auto-scaling capabilities for efficiency. We also support on-prem environments.
Application layer: our instrumentation is language-agnostic, meaning we can monitor diverse application portfolios, often without requiring extensive code modifications. We fully support OpenTelemetry, which is key for future-proofing and avoiding vendor lock-in, and we also offer zero-code integration options for many common technologies.
Data stores: we offer specialized connectors providing deep visibility into both SQL and NoSQL database performance, including query performance analysis and tools for capacity planning. Security and compliance: this is non-negotiable in the financial sector. Our architecture is designed to be SOC 2, PCI, and GDPR compliant. It features end-to-end encryption and robust role-based access controls to ensure your sensitive telemetry data is handled securely and meets stringent regulatory requirements. This versatile architecture ensures that we can fit into your world rather than forcing you to fit into ours, providing comprehensive visibility across your entire stack.
So how can you embark on this journey to transform your observability capabilities?
We offer several pathways to get started, tailored to your needs. Schedule a complimentary assessment: this allows us to help you understand your current state.
We can book a system topology assessment to pinpoint specific monitoring gaps
and identify clear opportunities for improvement within your environment.
Request a custom demo: seeing is believing. We can provide a tailored demonstration, perhaps using anonymized sample data that reflects your kind of environment, to help you visualize the direct benefits and how this would look and feel for your teams. Download our implementation guide: for those who like to dive into the details, we have a comprehensive implementation guide packed with best practices specifically curated for financial systems. Join our community: connect with peers. We facilitate a community where you can share insights and learn from other financial institutions that are also on the journey to implementing advanced observability solutions.
We are genuinely passionate about helping financial institutions like yours achieve this new frontier of 99% anomaly detection accuracy and dramatically improved system reliability. Our dedicated financial services team is ready and eager to collaborate with you. We will work side by side to design an implementation plan meticulously tailored to your specific environment, your unique challenges, and your strategic requirements.
That's it; that brings us to the end of the presentation.
Thank you very much for your time and attention today.
It's been a privilege to share how the potent combination of graph-based monitoring and behavioral analytics is truly revolutionizing observability within the demanding context of financial systems. I hope this has provided you with valuable insights into what's possible. I'm now very happy to open the floor and answer any questions that you may have.
Thank you so much again.