Transcript
Hello everyone.
This is Tarun, and I'll be sharing how we've modernized a legacy fixed
income index system using cloud native platform engineering.
This was a major transformation effort that addressed both technical
and organizational challenges.
I'll walk you through the challenges we faced, the modernization
strategy, the technical architecture,
the results we achieved and finally the lessons we learned.
By the end, I hope you will see how cloud native engineering can unlock
real performance and agility in a heavily regulated financial environment.
Here is the roadmap for the presentation.
We will start with the legacy system challenges that forced
us to rethink our approach.
Then we will go into the modernization strategy we adopted, followed by
the technical architecture we built.
I'll then share the implementation journey and results and wrap up with
lessons learned and future directions.
Let's start with the problems we were trying to solve.
Our legacy fixed income index system had grown
increasingly fragile.
For example, monthly index rebalancing required longer maintenance windows.
During that time, clients could not access critical data,
which directly impacted SLAs.
At the same time, data volumes grew five times in just three years.
The system simply couldn't process that influx efficiently.
Index calculations often took 30 minutes or more
during peak times, which was unacceptable when clients
needed near real-time results.
Finally, scalability was a huge limitation.
The monolithic system couldn't handle sudden spikes in demand, such as quarter-end
reporting or market volatility events.
This wasn't just an IT problem.
It directly affected client satisfaction, SLAs, and even revenue opportunities.
Clearly, something had to change.
We needed a strategy that didn't just patch problems, but reimagined the whole foundation.
To solve this, we adopted a platform engineering approach, which meant
building a foundation that supported both stability and innovation.
We could not afford outages, so we moved step by step with zero downtime goals.
Infrastructure as code ensured every environment
was consistent, automated, and auditable.
We applied domain-driven design; for example,
pricing logic became its own service, decoupled from reporting.
For CI/CD, we introduced automated pipelines, so code changes went from
weeks of manual promotion to hours.
For instance, we used Docker to containerize every component, making
it portable and easier to scale.
We redesigned workflows to be asynchronous, which prevented bottlenecks,
and security and compliance checks were integrated directly into the deployment process.
Previously, we had a giant monolith: one code base, manually
deployed, scaled by buying bigger boxes, and run as nightly batches.
Now we run domain-bounded microservices, deployed with
GitOps, with event-driven architecture, multi-region redundancy, and
auto scaling in Kubernetes.
Think of it like moving from a single massive mainframe to an ecosystem of
small, specialized teams of workers who can scale up or down instantly.
This was not just a technology upgrade.
It fundamentally changed how we could deliver features
and respond to client needs.
Let's break down what those components look like.
Kubernetes and Airflow implementation: alongside Kubernetes, we introduced Apache Airflow
as our workflow orchestrator.
While Kubernetes handled infrastructure scaling, Airflow gave us a
framework to manage the sequence of steps in index calculation.
We built containerized adapters to connect to multiple data providers like
Bloomberg; because they were stateless, we could scale them independently.
Airflow triggered tasks to pull issuer and bond metadata from multiple providers.
Subsequent tasks collected real-time and end-of-day bond prices, running data
quality checks before proceeding.
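To give a flavor of those checks, here is a minimal sketch of the kind of validation such a task could run on incoming bond prices; the field names and rules are hypothetical, not our production checks:

```python
from datetime import date

def validate_bond_prices(prices: list[dict]) -> list[dict]:
    """Basic quality gate on incoming bond prices (hypothetical fields and rules)."""
    errors = []
    for row in prices:
        if not row.get("isin"):
            errors.append(f"missing ISIN: {row}")
        elif row.get("clean_price") is None or row["clean_price"] <= 0:
            errors.append(f"non-positive price for {row['isin']}")
        elif row.get("as_of") != date.today().isoformat():
            errors.append(f"stale price for {row['isin']}: {row.get('as_of')}")
    if errors:
        # Raising fails the task, so downstream calculations pause until data is corrected
        raise ValueError(f"{len(errors)} quality issues found: {errors[:5]}")
    return prices

# Example usage with a single well-formed record
validate_bond_prices([{"isin": "US912828U816", "clean_price": 99.42,
                       "as_of": date.today().isoformat()}])
```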
Calculation engine pods spun up based on workload.
For example, at month-end, the system would scale to
30 pods for calculations, then scale down overnight.
For persistence, we chose time series databases optimized for financial
tick data, which are critical for both accuracy and performance.
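As an illustration only, here is what writing a single bond price tick could look like with a time series client such as InfluxDB's Python library; the bucket, measurement, and connection details are placeholders, not our actual setup:

```python
from datetime import datetime, timezone

from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Connection details are placeholders for illustration
client = InfluxDBClient(url="http://localhost:8086", token="TOKEN", org="indexes")
write_api = client.write_api(write_options=SYNCHRONOUS)

# One price tick for one bond, tagged by ISIN so lookups stay fast
tick = (
    Point("bond_price")
    .tag("isin", "US912828U816")
    .field("clean_price", 99.42)
    .time(datetime.now(timezone.utc))
)
write_api.write(bucket="tick-data", record=tick)
```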
The API layer included both REST and GraphQL, supporting diverse clients.
Compliance checks, like masking restricted securities, happened directly at the API layer.
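As a rough sketch of that idea, not our actual service, a compliance mask on restricted securities can sit directly in a REST endpoint; the FastAPI framework choice, the restricted list, and the data access stub are all illustrative:

```python
from fastapi import FastAPI

app = FastAPI()

# Hypothetical restricted list; in practice this would come from a compliance service
RESTRICTED_ISINS = {"XS0000000001"}

def load_constituents(index_id: str) -> list[dict]:
    # Stub standing in for the real data access layer
    return [
        {"isin": "US912828U816", "weight": 0.42},
        {"isin": "XS0000000001", "weight": 0.08},
    ]

def mask_restricted(rows: list[dict]) -> list[dict]:
    # Hide identifiers of restricted securities before the response leaves the API
    return [
        {**r, "isin": "RESTRICTED"} if r["isin"] in RESTRICTED_ISINS else r
        for r in rows
    ]

@app.get("/indexes/{index_id}/constituents")
def get_constituents(index_id: str):
    return mask_restricted(load_constituents(index_id))
```

Keeping the mask in the API layer means every client, REST or GraphQL, goes through the same compliance filter.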
A service mesh gave us fine-grained traffic control, encryption, and observability.
If one service failed, we could trace exactly where and why.
Airflow orchestrated the workflow end to end: first ingesting the different
data and prices and validating their quality, then triggering calculations on
Kubernetes pods, then, as a third step, generating reports and validations,
and lastly publishing to APIs and downstream consumers. The entire workflow
orchestration was done by Airflow.
Airflow gave us a DAG view: using these directed acyclic graphs,
every step was visible, retriable, and auditable.
Let's say pricing data failed; calculations paused
automatically until it was corrected.
Think of Kubernetes as the engine and Airflow as the conductor, making sure
the orchestra plays in the right order.
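To make that concrete, here is a minimal Airflow DAG sketch of the ingest, validate, calculate, report, publish sequence; the task names, schedule, and container image are illustrative, and the calculation step assumes the cncf-kubernetes provider package is installed:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator


def ingest(**_):
    """Pull issuer metadata and bond prices from the provider adapters (stub)."""


def validate(**_):
    """Run data quality checks; raising here pauses downstream tasks (stub)."""


def report(**_):
    """Generate reports and validations (stub)."""


def publish(**_):
    """Publish results to APIs and downstream consumers (stub)."""


with DAG(
    dag_id="index_calculation",
    start_date=datetime(2024, 1, 1),
    schedule="0 22 * * 1-5",  # hypothetical end-of-day schedule (Airflow 2.4+)
    catchup=False,
) as dag:
    t_ingest = PythonOperator(task_id="ingest", python_callable=ingest)
    t_validate = PythonOperator(task_id="validate", python_callable=validate)
    t_calculate = KubernetesPodOperator(
        task_id="calculate",
        name="index-calc",
        namespace="indexing",
        image="registry.example.com/index-calc:latest",  # illustrative image
        retries=2,
    )
    t_report = PythonOperator(task_id="report", python_callable=report)
    t_publish = PythonOperator(task_id="publish", python_callable=publish)

    t_ingest >> t_validate >> t_calculate >> t_report >> t_publish
```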
Together these components formed a resilient and flexible backbone for
our system. Critical system components:
Kafka decoupled processing. Instead of requests piling up in a queue, Kafka
smoothed out our spikes and preserved state
if one node failed. During one stress test, Kafka handled three times the expected
peak load without dropping a message.
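As a minimal sketch of that decoupling, assuming the kafka-python client and hypothetical topic and message fields, a producer publishes calculation requests and a consumer group works through them at its own pace:

```python
import json

from kafka import KafkaConsumer, KafkaProducer

# Producer side: the API or scheduler publishes a calculation request and moves on
producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("index-calc-requests", {"index_id": "GOVT-10Y", "as_of": "2024-06-28"})
producer.flush()

# Consumer side: calculation workers pull at their own pace; committed offsets
# preserve state, so another worker in the group picks up if one node fails
consumer = KafkaConsumer(
    "index-calc-requests",
    bootstrap_servers="kafka:9092",
    group_id="calc-workers",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:  # runs until the worker is stopped
    print("calculating", message.value["index_id"])
```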
With Redis, we cut database reads by 90%.
That meant sub-millisecond access to frequently used data,
with cross-region replication for resilience.
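A cache-aside pattern along these lines is roughly the idea; this sketch uses the redis-py client with hypothetical key names and an illustrative one-hour TTL, not our actual configuration:

```python
import json

import redis

r = redis.Redis(host="redis", port=6379)

def load_from_database(isin: str) -> dict:
    # Stub standing in for the real reference-data query
    return {"isin": isin, "issuer": "US Treasury", "coupon": 2.5}

def get_bond_reference(isin: str) -> dict:
    """Cache-aside read: serve from Redis, fall back to the database on a miss."""
    key = f"bond:ref:{isin}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    record = load_from_database(isin)
    r.setex(key, 3600, json.dumps(record))  # cache for an hour (illustrative TTL)
    return record
```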
We implemented Kubernetes features like StatefulSets for ordered deployment, network
policies for regulatory boundaries, and horizontal pod autoscaling for
elasticity. For example, during volatile market days,
the system automatically scaled to handle 20 times more concurrent
calculations without service degradation.
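The autoscaler itself is normally declared in YAML; purely to keep the examples in one language, here is a sketch of creating an equivalent horizontal pod autoscaler with the official Kubernetes Python client, where the deployment name, namespace, and CPU target are illustrative:

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running inside the cluster

hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="calc-engine-hpa"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="calc-engine"
        ),
        min_replicas=2,
        max_replicas=30,  # matches the month-end peak mentioned earlier
        metrics=[
            client.V2MetricSpec(
                type="Resource",
                resource=client.V2ResourceMetricSource(
                    name="cpu",
                    target=client.V2MetricTarget(type="Utilization", average_utilization=70),
                ),
            )
        ],
    ),
)
client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="indexing", body=hpa
)
```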
Let's look at the performance improvements.
We saw an 88% reduction in calculation time:
what used to take 30 minutes now takes under five.
Scheduled downtime was eliminated;
clients could access the system continuously, even during rebalancing.
And we could now support 20 times more concurrent calculations, directly
improving client SLAs. Added to that, with Airflow orchestration, not only did
calculations run faster, but the entire workflow pipeline from ingestion to publishing
became more reliable and auditable.
These weren't just technical wins:
they directly improved client SLAs, reduced operational risk, and gave
product teams the ability to launch indexes that weren't even possible before.
The migration approach was executed in four phases.
In phase one, we set up cloud infrastructure with compliance baked in:
infrastructure as code, CI/CD, and monitoring. In phase two,
we migrated historical data.
We used dual-write patterns and built a data validation framework.
Every bond's reference data and prices were checked across old and new systems
before going live.
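To make the dual-write pattern concrete, here is a minimal sketch of writing to both systems and reconciling them; the store interfaces, record shape, and tolerance are hypothetical, not our actual validation framework:

```python
from dataclasses import dataclass

@dataclass
class BondPrice:
    isin: str
    clean_price: float

def dual_write(price: BondPrice, legacy_store, new_store) -> None:
    """Write every update to both systems while they run in parallel."""
    legacy_store.save(price)
    new_store.save(price)

def reconcile(isins, legacy_store, new_store, tolerance: float = 1e-6) -> list[str]:
    """Compare the two systems record by record before cutting over."""
    mismatches = []
    for isin in isins:
        old = legacy_store.load(isin)
        new = new_store.load(isin)
        if old is None or new is None or abs(old.clean_price - new.clean_price) > tolerance:
            mismatches.append(isin)
    return mismatches
```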
As part of phase three, we decomposed the monolith, extracting services gradually
using the strangler pattern. We peeled services off one by one, running old and new
in parallel until the new ones were stable.
When we decomposed services, Airflow became the glue
between old and new systems.
For example, reference data was still in the legacy database while
calculations had moved to Kubernetes.
Airflow orchestrated across both worlds during this transition,
which let us migrate step by
step without breaking workflows.
In the last phase, we cut over with blue-green deployments,
ran systems in parallel, and only decommissioned the legacy system
after we had full confidence. At every stage, we made sure business
operations were not disrupted.
The key was that clients never saw disruption. For them
the system just got faster
and more reliable.
Technology transformation was only half the story here.
We also had to transform the organization.
We created a dedicated platform team that functioned as an
internal service provider.
We invested in developer experience, building self-service portals that
standardized container environments.
Instead of waiting weeks for servers, developers had a self-service portal:
needed a database? Just one click. Needed a dev environment? Done in minutes.
We ran internal tech talks, paired developers with platform engineers, and created runbooks
so teams were not dependent on specialists. Without these cultural and
organizational changes,
the technical success would not have been sustainable.
Key takeaways: performance transformation was the major win here.
Overall, we are running calculations eight times faster, with no downtime.
We improved clients' trust and enabled entirely new products.
By automating compliance in CI/CD, we reduced audit stress
while deploying more frequently.
We cut time to market for custom products by 70%, allowing teams to focus
on innovation instead of firefighting infrastructure. As next steps,
we are now exploring machine learning for anomaly detection in market data
to catch issues before they impact clients.
For example, if a bond price suddenly spikes out of historical
range, machine learning can flag it before impacting calculations.
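As a simple illustration of that idea, not a production model, a rolling z-score over a bond's price history can flag a value outside its historical range; the window and threshold here are hypothetical:

```python
import pandas as pd

def flag_price_spikes(prices: pd.Series, window: int = 20, threshold: float = 3.0) -> pd.Series:
    """Return a boolean series marking prices far outside their rolling history."""
    rolling = prices.rolling(window)
    zscores = (prices - rolling.mean()) / rolling.std()
    return zscores.abs() > threshold

# Example: a sudden jump on the last day gets flagged before it feeds a calculation
history = pd.Series([99.4, 99.5, 99.3, 99.6, 99.5] * 5 + [112.0])
print(flag_price_spikes(history).iloc[-1])  # True
```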
We are considering multi-cloud deployment strategies for resilience
against provider outages, which is critical for regulated markets.
We are enhancing self-service capabilities so business users can
request infrastructure directly.
The goal is that a business analyst can spin up a test index environment
without calling the application team.
We are also looking at extending Airflow DAGs to support streaming workflows, for
example, triggering intraday calculations when new market data crosses a threshold.
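As a rough sketch of how that trigger could look with today's Airflow primitives, a custom sensor can poll for a threshold breach and gate the intraday calculation tasks; the market data lookup is a hypothetical stub:

```python
from airflow.sensors.base import BaseSensorOperator

def latest_move_bps(index_id: str) -> float:
    # Hypothetical stub: fetch the latest intraday move in basis points
    return 12.5

class MarketMoveSensor(BaseSensorOperator):
    """Fires once the intraday move for an index crosses a threshold."""

    def __init__(self, index_id: str, threshold_bps: float, **kwargs):
        super().__init__(**kwargs)
        self.index_id = index_id
        self.threshold_bps = threshold_bps

    def poke(self, context) -> bool:
        move = latest_move_bps(self.index_id)
        self.log.info("Move for %s: %.1f bps", self.index_id, move)
        return abs(move) >= self.threshold_bps

# In a DAG: MarketMoveSensor(task_id="wait_for_move", index_id="GOVT-10Y",
#                            threshold_bps=10, poke_interval=60) >> intraday_calc
```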
The journey is ongoing,
but the foundation we have built sets us up for continuous innovation.
This modernization was as much about people and process
as it was about technology.
It showed that with the right foundation, even legacy financial
systems can be reimagined.
Thank you for your time.
I look forward to collaborating on building the next wave of resilient
and scalable financial platforms.