Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi everyone.
Thank you for being here.
My name is Nja.
Today I'll be sharing insights on how we can successfully scale AI systems in
the financial industry, specifically how to handle high-volume transaction
processing while staying compliant with strict regulations.
The talk will cover operational strategies, infrastructure choices, and
real-world lessons we have learned from deploying ML pipelines across the
financial services transactional ecosystem.
What are the challenges that we are looking at with financial institutions?
The challenges are unlike in any other industry, right?
Fundamentally, we are processing billions and billions of transactions
with millisecond-level response times.
On top of that, every decision must satisfy regulations like
GDPR, CCPA, or CRA, and then you have to also maintain 99.99%
uptime for critical systems.
Beyond uptime, latency is another aspect of it as well, right?
You need to provide responses within a fraction of a second while also making
sure that the right decisions are made at the right time, and the right
compliance checks and balances happen.
I'll give you an example, right?
When you're doing a credit card authorization, the authorization has to
happen in a fraction of a second because the customer is still
there in the session.
While the customer is still in the session, you are engaging with them,
and behind the scenes you need to make sure that the KYC is done, the KYB
is done, and the sanctions screening check is done.
So all these things have to happen simultaneously, in parallel.
And then the final decision has to be taken while ensuring the
business outcomes are met, right?
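The parallel checks described above can be sketched with `asyncio`. This is a minimal illustration, not the speaker's actual system: the check functions are hypothetical stand-ins that simulate service-call latency, and the point is simply that total authorization time is roughly the slowest single check rather than the sum of all three.

```python
import asyncio

# Hypothetical compliance checks -- in a real system these would call
# external services; here each one just simulates I/O latency.
async def kyc_check(customer_id: str) -> bool:
    await asyncio.sleep(0.05)  # simulated service call
    return True

async def kyb_check(merchant_id: str) -> bool:
    await asyncio.sleep(0.05)
    return True

async def sanctions_screen(customer_id: str) -> bool:
    await asyncio.sleep(0.05)
    return True

async def authorize(customer_id: str, merchant_id: str) -> bool:
    # Run all checks concurrently so latency stays within the session
    # window while the customer is still at checkout.
    results = await asyncio.gather(
        kyc_check(customer_id),
        kyb_check(merchant_id),
        sanctions_screen(customer_id),
    )
    # Authorize only if every check passes.
    return all(results)
```

With sequential calls the three 50 ms checks would take about 150 ms; gathered concurrently they complete in roughly 50 ms.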
Because every drop in checkout conversion means a lost sale,
and a lost sale means lost revenue.
So these are the factors that need to be considered when we
are looking at deploying machine learning systems in the financial space.
What are the three critical financial applications?
There are three core areas.
One is, as I alluded to in the previous example, real-time fraud detection.
Then credit assessment, and then automated financial management.
I'll quickly touch on each.
The first is real-time fraud detection: models that scan millions
of daily transactions to spot suspicious activity in milliseconds.
The second is credit assessment, where platforms need to evaluate
creditworthiness while remaining compliant and explainable, right?
Imagine a scenario where you are applying for a personal loan or
you're applying for a credit card.
You would have seen this in real life: the decision arrives in a
few seconds, but behind the scenes the whole process of assessment,
prospecting, and underwriting needs to happen in a very short time.
And lastly, there is automated financial management.
Everything from portfolio optimization to algorithmic trading to
personalized financial guidance.
These are mission-critical, so they rely on strong ML operations
foundations and the associated pipelines.
Real-time fraud detection architecture is super important.
As you can see on this slide, what I'm presenting here is the
various channels, right?
You have the transaction sources from where a transaction can originate.
You have point-of-sale channels, online channels, and mobile channels.
All these channels have their own respective streams, and those streams of
data have to be processed and published.
And that is where the next aspect, the pipelines and the feature store,
comes into the picture, where we outline the low-latency inference systems,
the decision engines, and the rules they are applying.
And finally the monitoring and audit systems, right?
They ensure telemetry, retraining, and compliance.
The key point here, though: fraud detection isn't just about a model.
It's about a pipeline that can handle massive scale in seconds.
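The pipeline shape on the slide, streams feeding a feature store, then inference, then a decision engine, then an audit log, can be sketched in a few pure functions. Everything here is illustrative: the `Txn` type, the in-memory feature store, and the toy scoring rule are assumptions standing in for real infrastructure (Kafka streams, a managed feature store, a trained model).

```python
from dataclasses import dataclass

@dataclass
class Txn:
    txn_id: str
    account: str
    amount: float
    channel: str  # "pos", "online", or "mobile"

# Toy feature store: recent spend stats per account (in production this
# would be a low-latency store fed by the streaming pipeline).
FEATURE_STORE = {"acct-1": {"avg_amount_7d": 40.0}}

def enrich(txn: Txn) -> dict:
    # Feature lookup + on-the-fly feature engineering.
    feats = FEATURE_STORE.get(txn.account, {"avg_amount_7d": 0.0})
    return {"amount": txn.amount,
            "ratio": txn.amount / (feats["avg_amount_7d"] + 1.0)}

def score(features: dict) -> float:
    # Stand-in for a low-latency model: spending far above the
    # account's recent average looks suspicious.
    return min(1.0, features["ratio"] / 10.0)

def decide(fraud_score: float, threshold: float = 0.8) -> str:
    return "decline" if fraud_score >= threshold else "approve"

def handle(txn: Txn, audit: list) -> str:
    # One end-to-end pass: enrich -> score -> decide -> audit.
    feats = enrich(txn)
    s = score(feats)
    decision = decide(s)
    audit.append({"txn": txn.txn_id, "score": round(s, 3),
                  "decision": decision})
    return decision
```

Each stage maps to a slide component; the audit list plays the role of the monitoring and audit system that retains every decision for compliance.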
What are the challenges of fraud detection in MLOps?
This is another key vector that needs to be considered.
As you see on this slide, the first is real-time performance.
Achieving sub-10-millisecond latency while processing over a hundred thousand
transactions per second during peak periods is incredibly demanding.
Then there is feature freshness, right?
Fraud schemes evolve quickly; the DDoS attacks we have seen over the last
few instances actually spike, right?
And that plays a critical role in ensuring that our features stay
current with the various fraud patterns.
The third is rapid adaptation: fraud tactics emerge constantly,
and our systems need to recognize them immediately.
False positives are the final challenge.
If we flag too many legitimate transactions, we run the
risk of losing customers.
So that's another thing that needs to be kept in mind, and that is where the
balancing act has to be super efficient.
Otherwise we run the risk of losing valuable customers, which
in turn impacts sales and revenue.
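The false-positive balancing act above is ultimately a threshold choice, and it helps to quantify it. A small sketch (illustrative data and function names, not from the talk) shows how raising the fraud-score threshold lowers the false-positive rate, at the cost of letting more fraud through:

```python
def confusion_at_threshold(scores, labels, threshold):
    """Count outcomes when flagging transactions whose fraud score
    meets the threshold (label 1 = actual fraud, 0 = legitimate)."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1        # fraud correctly caught
        elif flagged and y == 0:
            fp += 1        # legitimate customer wrongly blocked
        elif not flagged and y == 1:
            fn += 1        # fraud that slipped through
        else:
            tn += 1        # legitimate customer approved
    return tp, fp, fn, tn

def false_positive_rate(scores, labels, threshold):
    tp, fp, fn, tn = confusion_at_threshold(scores, labels, threshold)
    return fp / (fp + tn) if (fp + tn) else 0.0
```

Sweeping the threshold over held-out scores like this is one simple way to pick an operating point that balances blocked fraud against blocked customers.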
The next slide, as you can see here, speaks about the key vectors in
assessing a credit application, or assessing the creditworthiness of an
individual, using the MLOps framework.
As you can see, the framework ensures data orchestration that tracks the full
lineage of every data source and model.
Governance provides version history and approval workflows, so we know which
model made which decision, and validation systems rigorously test for fairness.
When it comes to lending, regulators and customers need to know the
system is both accurate and fair.
That's on the credit assessment side of the world.
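One common way to make the fairness part of that validation concrete is an approval-rate parity check between customer segments. The sketch below is a simplified illustration (the 0.8 threshold follows the widely used "four-fifths" heuristic, which is an assumption here, not something stated in the talk):

```python
def approval_rate(decisions):
    # decisions: list of 1 (approved) / 0 (declined) outcomes.
    return sum(decisions) / len(decisions)

def parity_ratio(group_a, group_b):
    """Ratio of the lower approval rate to the higher one across
    two customer segments; 1.0 means perfectly equal rates."""
    ra, rb = approval_rate(group_a), approval_rate(group_b)
    lo, hi = min(ra, rb), max(ra, rb)
    return lo / hi if hi > 0 else 1.0

def passes_fairness_gate(group_a, group_b, threshold=0.8):
    # A validation pipeline could block deployment when the
    # parity ratio falls below the chosen threshold.
    return parity_ratio(group_a, group_b) >= threshold
```

A real validation suite would use many more metrics (calibration per segment, feature-level explanations), but a gate like this is enough to fail a model build before it ever makes a lending decision.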
And of course, the final aspect is automated financial management systems.
This is, again, super important, because in any line of business, whether it
is payments or lending or, for that matter, any financial transaction, at
the end of the day everything has to balance.
It's a zero-sum principle, right?
The debits and credits have to match, and you cannot have money
being created on the fly; it cannot be created, and it cannot be destroyed.
Every record has to match, and that can be achieved through streamlined
model deployment pipelines, monitoring systems, integration workflows, and
automated workflows, which you can actually achieve through various
streaming technologies as well.
As and when a transaction happens, you publish it to a Kafka topic.
A consumer subscribes to the topic, and then you record the transaction
in your ledger system.
You use it for reconciliation, settlement, and clearing, making sure that
everything is reconciled behind the scenes.
So it's super important to have automated financial management systems.
Otherwise it becomes super tricky to track the transactions.
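The zero-sum invariant described above is exactly what a double-entry ledger enforces. Here is a toy sketch (in-memory only; a real system would sit behind the Kafka consumer mentioned above and persist durably):

```python
from collections import defaultdict

class Ledger:
    """Toy double-entry ledger: every posting debits one account and
    credits another for the same amount, so money is never created or
    destroyed and the books always sum to zero."""

    def __init__(self):
        self.balances = defaultdict(float)
        self.entries = []

    def post(self, debit_acct, credit_acct, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balances[debit_acct] -= amount
        self.balances[credit_acct] += amount
        # Retain every entry for reconciliation and audit.
        self.entries.append((debit_acct, credit_acct, amount))

    def is_balanced(self):
        # Zero-sum invariant: all debits and credits cancel out.
        return abs(sum(self.balances.values())) < 1e-9
```

A consumer reading transaction events off the Kafka topic would call `post()` per event, and the reconciliation job would run `is_balanced()` (plus per-account checks against upstream systems) behind the scenes.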
And the next slide, as you can see here, covers the high-level architecture
for deploying MLOps in financial services.
As you can see, the foundation is the data infrastructure, right?
Governance, processing, and storage for structured and unstructured data;
you need one set of policies for structured data and another set of
policies for unstructured data.
On top of it sits the training and validation pipeline, which automates the
compliance checks, the testing procedures, and the various validation
pipelines, right?
Then there is the feature engineering and storage layer, which delivers both
batch and real-time features, depending on what you need.
And then finally we have the model registry and deployment
systems for versioning, A/B testing, and various rollouts, right?
You want to do controlled rollouts and controlled testing, where you actually
decide: if I pull the lever of a particular parameter in one direction,
how is it going to behave?
What is the impact going to be on my core products and features?
Will it improve take rate?
Will it improve conversion?
Will it reduce fraud rates, or will it increase them?
These are the various vectors that you can control by virtue
of having versioning and A/B testing.
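A common mechanism behind this kind of controlled rollout is deterministic, hash-based traffic splitting between model versions. The sketch below is illustrative; the experiment name, version labels, and 10% split are made-up examples:

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   treatment_pct: float = 0.1) -> str:
    """Deterministically bucket a user into control or treatment so the
    same user always hits the same model version for an experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "treatment" if bucket < treatment_pct else "control"

# Hypothetical registry entries: the current champion and a challenger.
MODEL_VERSIONS = {"control": "fraud-model:v12", "treatment": "fraud-model:v13"}

def route(user_id: str) -> str:
    # The serving layer picks the model version per request.
    return MODEL_VERSIONS[assign_variant(user_id, "fraud-v13-rollout")]
```

Because the assignment is a pure function of user ID and experiment name, you can ramp `treatment_pct` up or down without users flip-flopping between versions mid-session.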
And finally, at the very top is the model serving layer: the
high-performance APIs that power customer-facing applications.
As I alluded to at the start, in either of the use cases, a customer is
waiting on the checkout screen trying to purchase a product, or a customer
is waiting on the screen where they're trying to apply for a loan.
These are very time-sensitive customer flows.
So the high-performance API layer at the top ensures that the right
level of information is abstracted, thereby servicing the clients
in a timely manner.
The next slide talks about how we do model deployment strategies
in a containerized manner.
Containerization has been a game changer in this space, right?
There are separate Kubernetes clusters for training and inference,
GPU nodes for deep learning, and autoscaling based on demand.
For compliance-critical models, you can in fact scale horizontally and
vertically by having dedicated or isolated nodes.
Financial institutions also add domain-specific optimizations:
hot-standby replicas for zero-downtime, geo-distributed deployments.
So that's super critical in terms of having well-defined
deployment strategies.
The next slide is about real-time monitoring and alerting systems.
This is where all of the key technical metrics live: what is the latency,
what is the throughput, and what is your resource utilization?
Is it at the maximum resource utilization, or at the mean
resource utilization?
That way you can dynamically, at runtime, either increase or reduce
capacity, and you can also actually track the error rates.
The other key thing is model performance, right?
What is the accuracy of the predictions?
What is the drift in terms of distribution?
How is that being used?
And of course, related to that would be the business KPIs: what is the
approval rate, what are the false positives, and how is it impacting
revenue, customer experience, and all those things.
And lastly, compliance is a key aspect of it.
So ensure that the end-to-end transaction is thought through from an
observability and reliability standpoint.
And what are the various automated validation frameworks?
You have data validation, model training, performance and fairness, and
explainability on the compliance side of the world.
Financial institutions have to ensure that these validation
processes are in place.
Otherwise, how do you know the robustness and the accuracy of a particular
model before you go ahead and deploy it in production?
You don't want a scenario where you skip validation, deploy to production,
and then all of a sudden it has a negative impact across the
length and breadth of the spectrum.
Next, I want to speak about model drift in high-stakes environments.
What exactly is model drift?
You just cannot avoid model drift, right?
It's unavoidable, especially in a high-stakes environment.
How do you detect drift? Through statistical tests and
segment-based analysis.
And whenever a drift occurs, how do you respond?
You respond through shadow deployment comparison.
You gradually ramp the traffic up or down, and when in doubt you have a
fallback approach of human-in-the-loop approval.
Then you track the audit trail to make sure that the performance
through transitions is seamless.
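One widely used statistical test for this kind of detection is the Population Stability Index (PSI), which compares a baseline score distribution against the live one. The sketch below is a from-scratch illustration; the conventional rule of thumb that PSI above roughly 0.2 signals major drift is a heuristic, not a hard standard:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline distribution
    (e.g. training-time model scores) and a live one. Near 0 means
    stable; larger values mean the live distribution has drifted."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]

    def frac(sample, a, b, last):
        # Fraction of the sample in [a, b); the last bin is closed.
        n = sum(1 for x in sample if a <= x < b or (last and x == b))
        return max(n / len(sample), 1e-6)  # floor to avoid log(0)

    total = 0.0
    for i in range(bins):
        e = frac(expected, edges[i], edges[i + 1], i == bins - 1)
        a = frac(actual, edges[i], edges[i + 1], i == bins - 1)
        total += (a - e) * math.log(a / e)
    return total
```

A monitoring job could compute this per model, per segment, on a schedule, and trigger the shadow-deployment and human-in-the-loop response described above whenever the index crosses the chosen threshold.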
I also want to speak about implementing A/B testing for financial models.
A/B testing is another area that is super critical.
Start with a clear hypothesis tied to a business metric.
Design the test to ensure statistical rigor, carefully control for
adverse effects, and analyze at the segment level.
And the rollout is progressive, with continuous monitoring.
Yes, innovation is key, but not at the cost of not protecting
the customers, right?
Safe A/B testing is super important.
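The "statistical rigor" step usually means a significance test on the metric difference before trusting a result. A minimal sketch using a two-proportion z-test on conversion counts (the 1.96 critical value corresponds to a two-sided test at roughly the 5% level):

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Z statistic for the difference in conversion rates between
    control (a) and treatment (b), using a pooled proportion."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

def is_significant(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    # Two-sided test: treat the observed lift as real only when the
    # z statistic clears the critical value.
    return abs(two_proportion_z(conv_a, n_a, conv_b, n_b)) >= z_crit
```

For example, 100 conversions out of 1,000 in control versus 150 out of 1,000 in treatment clears the bar, while 100 versus 105 does not, which is exactly the discipline that keeps a noisy lift from being shipped to every customer.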
So to wrap up, what are my key takeaways, right?
Design for scale from day one.
Integrate compliance throughout the pipeline.
Automate everything, but ensure that fallback mechanisms are there,
with human-in-the-loop intervention where needed.
And then invest in robust monitoring and response systems.
That's pretty much it, and I hope you were able to gather
some insights from my presentation.
Thank you for giving me this opportunity.