Conf42 Platform Engineering 2025 - Online

- premiere 5PM GMT

Engineering Athena: Building a Scalable, Resilient, and Compliant Financial Platform at J.P. Morgan


Abstract

Discover how J.P. Morgan’s Athena platform handles massive trade volumes with ultra-low latency, fault tolerance, and strict regulatory compliance. Learn architectural patterns, distributed processing techniques, and real-world lessons for building scalable, reliable, and secure platforms.


Transcript

This transcript was autogenerated. To make changes, submit a PR.
Good morning, everyone. Thank you for joining me today. My name is Aroma, and I'm here to talk about something that keeps many of us in fintech awake at night: building financial systems that actually scale. Today we are diving deep into engineering Athena, a journey of building a scalable, resilient, and compliant financial trading platform.

Now, I know what you're thinking: another scaling talk. But here's the thing: financial systems are a different beast entirely. They're not your typical e-commerce platform where you can eventually-consistent your way out of problems. In finance, if your system goes down during market hours, people lose money. Real money. If your calculations are off, even by a few basis points, regulators come knocking. And if you cannot explain exactly what happened to every trade three years ago, you might be looking at serious legal trouble.

So today I'm going to share the story of how we transformed Athena from a system that could barely handle thousands of trades to one processing millions daily across global markets. We'll cover the technical challenges, the architectural solutions, and, perhaps most importantly, the lessons we learned the hard way. By the end of this talk, you'll understand why financial systems require fundamentally different approaches to scaling, and you'll have concrete strategies you can apply to your own systems. Let's dive in.

First, let me set the stage. Athena is a financial trading platform that handles the entire lifecycle of trading operations. We are talking about trade execution, risk management, profit and loss calculations, real-time market data processing, and massive end-of-day batch reconciliations. To give you a sense of scale, we process millions of trades daily across global markets. Each trade might be worth hundreds of millions of dollars, and every single one of those trades needs to be tracked, valued, risk-assessed, and reported to multiple regulatory bodies, often in real time.

Now, before we jump into the technical architecture, I need to make sure we are all speaking the same language. Finance has its own vocabulary, and understanding these core entities is crucial to understanding our scaling challenges. Let me walk you through a simple example. Imagine Trader One wants to buy a hundred units of a corporate bond from Trader Two. This happens on August 5th, 2025 at exactly 3:00 PM GMT. Both traders work for firms governed by different legal entities, and the trade will be cleared through a central clearinghouse. This single transaction creates what we call a trade: the atomic unit of what actually happened in the market.

But here's where it gets interesting. This one trade creates multiple deals. Trader One's book shows that they now own a hundred units of this bond; Trader Two's book shows that they received cash for selling those hundred units. These are mirror images of the same transaction, but they live in different books and serve different purposes. And that brings us to books. Think of them as accounting ledgers. A book is a collection of credits and debits, typically grouped by trading desk and product type. So we might have a corporate bonds book, a European derivatives book, and so on.

Now, why does this matter from an engineering perspective? Because each of these entities has different scaling characteristics. Trades are write-once, read-many. Deals are frequently updated as markets move. Books need to be aggregated and reported in real time.
Understanding these patterns is crucial to building efficient systems.

So, a trade is a single executed transaction of a financial instrument. It represents the actual buying or selling event, capturing what happened in the market. The key attributes in Athena are: the instrument, for example a bond or a CDS, which is a derivative type of instrument; the quantity or notional, which corresponds to the size of the trade; the price or premium, the executed price; the counterparties, which are the buyer and the seller; the timestamp and trade date; and the status, whether it's pending, settled, corrected, defaulted, et cetera. The typical lifecycle in Athena goes like this: created when the transaction is executed; updated, rarely for trades but frequently for deals, because of how markets move, to correct errors or mark status changes; and recorded, always, in a book, forming the basis for positions, risk, and P&L.

Let me show you how these entities flow through a typical trading ecosystem. This isn't just us; this is the actual architecture that every major investment bank uses, with variations. First, we have the front office. This is where trades are born. Traders, salespeople, or electronic trading platforms execute transactions, and these get captured in systems like Athena. Then we move to the middle office. This is where the magic happens. Every trade gets validated for legal compliance, credit risk, and regulatory requirements. Trades get assigned to the appropriate books, price data gets enriched, and risk sensitivities get calculated. Only clean, risk-approved trades move forward. Next is the back office. This handles settlement: generating settlement instructions, processing payment flows, reconciling with counterparties, and handling corporate actions like bond maturities or credit events. Finally, we have books and P&L, and this is where all those individual trades get aggregated into meaningful business information: daily mark-to-market valuations, realized and unrealized gains and losses, and risk measures like value at risk.

Now, here's the critical insight. Each of these stages has different performance requirements, different consistency requirements, and different failure modes. A traditional monolithic approach tries to solve all of this with the same architecture, and that's where things start to break down. Let me give you some concrete numbers. In our front office, we might see trade bursts of around 10,000 transactions per second during volatile market conditions. Our middle office needs to validate and enrich each of these trades within 30 seconds to meet regulatory requirements. Our back office processes settlement instructions that might not execute for days or weeks. Our books and P&L systems need to provide real-time position data to traders while simultaneously running complex risk calculations. Each of these has fundamentally different patterns: bursty writes, real-time validation, long-running workflows, and mixed read-write workloads. Trying to handle all of this with a single database, a single application server, and synchronous processing is a recipe for disaster.
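Before moving on, here is a minimal sketch, in Python, of the trade and deal vocabulary described above. It is purely illustrative: the field names, enum values, and book identifiers are assumptions for this example, not Athena's actual schema.

```python
# Illustrative only: a simplified model of the trade/deal/book vocabulary
# described above. Field names and statuses are assumptions, not Athena's schema.
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class TradeStatus(Enum):
    PENDING = "pending"
    SETTLED = "settled"
    CORRECTED = "corrected"
    DEFAULTED = "defaulted"


@dataclass(frozen=True)          # trades are write-once, read-many
class Trade:
    trade_id: str
    instrument: str              # e.g. a corporate bond or CDS identifier
    quantity: float              # notional / size of the trade
    price: float                 # executed price or premium
    buyer: str                   # counterparties
    seller: str
    executed_at: datetime
    status: TradeStatus = TradeStatus.PENDING


@dataclass                       # deals mutate as markets move
class Deal:
    deal_id: str
    trade_id: str                # the trade this deal mirrors
    book: str                    # e.g. "corporate-bonds-NY"
    position: float              # signed quantity in this book
    market_value: float          # revalued as prices change


# One trade produces mirror-image deals in the two traders' books.
trade = Trade("T-1", "CORP-BOND-XYZ", 100, 99.5, "Trader One", "Trader Two",
              datetime(2025, 8, 5, 15, 0))
deals = [
    Deal("D-1", trade.trade_id, "trader-one-book", +100, 100 * 99.5),
    Deal("D-2", trade.trade_id, "trader-two-book", -100, 100 * 99.5),
]
```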
And that brings us to our scaling challenge. Let me paint you a picture of where we started: the classic monolithic nightmare, but with financial regulatory requirements layered on top. We had single everything: one ingestion instance, one application node, one primary database with no replicas and no sharding. The only way to scale was to throw bigger and bigger hardware at the problem, classic vertical scaling. Every single call was synchronous and blocking. If a trader wanted to see their P&L, that request would hit the same database that was trying to process incoming market data feeds and run compliance reports. Guess what would happen during market volatility? The system would grind to a halt.

But here's what made it worse. We were running everything as batch jobs. Instead of processing data as it arrived, we'd accumulate changes all day and then run massive recalculations. These jobs would take eight to twelve hours to complete, and if anything failed at 3:00 AM, our traders would start their day with stale data. And we had no back-pressure handling. When market data feeds would spike, which happens every single day during major economic announcements or events such as a government change or a war, our ingestion threads would stall. This would cascade to user-facing operations: traders couldn't see their positions because the system was choking on market data feeds. We were a single-region deployment, which meant high latency for our London and Hong Kong traders, and any regional outage was a complete platform outage. But perhaps worst of all, our compliance and audit data was just extra rows in the same operational database. This created noisy-neighbor problems and made it incredibly easy to accidentally break a compliance invariant.

So this is what the monolithic approach looks like. At the top are the feeds: user interfaces, legacy trading systems, which is when you use an electronic platform to execute a trade, or a CSV upload of trades that the trader has already executed on their own platform. In between sits an HTTP polling layer, which hands all of this to the ingestion service, which in the traditional monolithic approach is a single instance doing synchronous parsing. Everything then lands in a single relational database in one region, with no replicas, mixing OLTP and OLAP workloads. That eventually feeds a risk computation unit, a reporting dashboard, and a compliance logger, and all of these run nightly in batches via a cron job. Sounds simple. This is an appropriate architecture if you have around a thousand trades. But what happens when you try to scale it?

So, the core reasons why this won't scale. It's single everything: one ingestion instance, one app node, one primary DB, no replicas, no sharding, so only vertical scaling is possible. It's synchronous coupling: every call is blocking, and compute and reporting sit directly on the OLTP tables. It's mixed workloads on one database: ingestion writes, risk analytics, dashboards, and compliance all hammer the same tables and the same indexes. It's batch-only: heavy recalculation runs nightly via a cron job instead of streaming or incremental computation. There's no back pressure, so every spike in feed volume stalls ingestion threads and cascades to users. It's single region, so higher latency for global users, and any regional outage equals a platform outage. Compliance is an afterthought: audit is just a few extra rows in the same DB. And there's schema rigidity: no event versioning, and schema changes require coordinated downtime. Adventure time: we are going to find out what happens when we scale this architecture.
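Since the lack of back pressure is called out above as a key failure mode, here is a minimal, generic sketch of what bounded-queue back pressure between a feed producer and an ingestion worker can look like. This is not Athena code; the names and sleep times are illustrative.

```python
# A minimal, generic illustration of back pressure between a market-data
# producer and an ingestion worker. Not Athena code; names are illustrative.
import asyncio
import random


async def feed_producer(queue: asyncio.Queue) -> None:
    """Simulates a bursty market-data feed."""
    for i in range(1_000):
        tick = {"seq": i, "price": 100 + random.random()}
        # put() awaits once the bounded queue is full, so a spike in the feed
        # slows the producer down instead of exhausting memory or stalling
        # unrelated user-facing requests.
        await queue.put(tick)
    await queue.put(None)  # sentinel: feed finished


async def ingestion_worker(queue: asyncio.Queue) -> None:
    """Drains ticks at its own pace."""
    while (tick := await queue.get()) is not None:
        await asyncio.sleep(0.001)  # stand-in for validation/enrichment work
        queue.task_done()


async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue(maxsize=100)  # the bound is the back pressure
    await asyncio.gather(feed_producer(queue), ingestion_worker(queue))


if __name__ == "__main__":
    asyncio.run(main())
```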
This is a graphical representation of what the ingestion service feels like when there are millions of trades coming in every day, and this is probably what the database looks like: every box in this database is actually the same deal, differing from the others by a few parameters, and on certain days by just one parameter.

Here's a simple example that illustrates the core problem. Let's say we have deal one that evolves over time. On day one, we are holding instrument one, worth a hundred dollars. On day two, the market moves, so we are now holding three hundred dollars' worth; on day three, five hundred dollars; and so on. The naive approach is to store a complete record for each day, but think about what that means at scale. We have millions of deals, each potentially changing daily, over years of trading history. That's terabytes of mostly redundant data. But here's the real kicker: P&L needs to be calculated not just daily, but sometimes every minute during volatile markets. Our traders need to see real-time profit and loss as positions change. So now we need to index all of this data by date, by instrument, by book, by trader, and we need these queries to be fast even as our data grows exponentially. And that's just the beginning of our data challenges.

The redundancy of previous records can be addressed by relational databases keyed on the dates, but a normalized table of changes cannot be guaranteed to perform well, because of which factors in the deals change: the regime can change, legal entities can change, the instrument can change, and pricing can change. So how do you maintain data that changes this frequently? In financial markets, data is usually interdependent. For instance, a credit event leads to the revaluation of an instrument, which leads to a change in the deal's state, which eventually leads to a change in P&L, and the traders only care about the change in P&L. On the other hand, because of the magic of the markets, a change in the value of the instrument can also cascade directly into a change in P&L. What this means is that all of these calculations depend on each other and on each other's state.

Let me give you a concrete example. Here's deal one over four days. On day one we are holding instrument one, worth a hundred dollars. On day two the same instrument is worth 150 due to market movement. The market crashed on day three, and we are now holding fifty dollars' worth of that instrument. On the fourth day there was a corporate action, and instrument one gets converted into instrument two, a cash-flow instrument worth thirty dollars. Now imagine you're a regulator asking: show me the complete audit trail of deal one. You need to reconstruct not just the value changes, but the instrument evolution, the corporate actions, the legal entity changes, everything. Traditional relational databases start to buckle under this kind of complexity. Normalized tables can't handle the constant schema changes, full snapshots create massive storage overheads, and point-in-time queries require increasingly complex joins as the data grows.

And here's what really keeps database administrators up at night: lock contention. During peak trading hours, especially around market open and close, we see massive concurrent updates to the same records. Thousands of traders are updating their positions simultaneously, market data feeds are updating instrument prices every millisecond, and risk systems are recalculating exposures in real time.
All of this happens on the same set of database tables with the same indexes. The result: query performance that degrades over time, backup and restore operations that take longer than the business day, and a system that becomes increasingly fragile as it grows. So that was our reality. A system that worked fine for thousands of trades but completely fell apart at millions. A system where adding new features meant risking the stability of existing operations. A system where brilliant traders were spending more time waiting for screens to refresh than actually trading. Something had to change. And that brings us to our solution.

Our first breakthrough came from a simple observation: most of the time, when data changes, only a small subset of dependent calculations actually needs to be updated. So here we are marking our functions with dependency decorators. When we calculate position value, we depend on the instrument ID and market data. When we calculate base currency value, we depend on position value and FX rates. The magic happens when something changes. If the price of instrument one moves, our dependency graph automatically identifies every calculation that depends on instrument one and triggers only those updates. Before this change, a single price movement would trigger a full recalculation of potentially thousands of positions. Now we only recalculate what actually changed. This single architectural change reduced our end-of-day processing time from eight hours to five hours. But more importantly, it made real-time updates feasible: instead of minute-long delays for P&L updates, we were getting sub-second responsiveness.

Our second major breakthrough was rethinking how we store data that changes frequently. Think about what a deal looks like as it evolves over days. In the traditional approach, we store complete records every time, even when 49 out of the 50 fields haven't changed. That's massively wasteful. In our delta approach, we store the full record once, the first state, and then we only store what actually changes. (Obviously, you also maintain the head state for the day you're doing the processing.) Day two might just be a price change; day three might just be a quantity and status change. And here's the clever part: we can reconstruct any point-in-time view by applying the deltas sequentially.
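To make the delta idea concrete, here is a minimal sketch in Python: one full snapshot, then per-day deltas, and point-in-time reconstruction by replaying the deltas in order. This illustrates the pattern only; the field names and dates are assumptions, not Athena's actual storage engine.

```python
# Minimal illustration of delta storage: one full snapshot, then per-day
# deltas, with point-in-time reconstruction by replaying deltas in order.
from datetime import date

# Day 1: full snapshot of the deal.
snapshot = {
    "deal_id": "D-1",
    "instrument": "INSTR-1",
    "quantity": 100,
    "price": 1.00,
    "status": "settled",
}

# Subsequent days: only the fields that actually changed.
deltas = {
    date(2025, 8, 6): {"price": 1.50},                           # market moves
    date(2025, 8, 7): {"price": 0.50, "status": "corrected"},
    date(2025, 8, 8): {"instrument": "INSTR-2", "price": 0.30},  # corporate action
}


def as_of(target: date) -> dict:
    """Rebuild the deal state as of a given date by replaying deltas."""
    state = dict(snapshot)
    for day in sorted(deltas):
        if day > target:
            break
        state.update(deltas[day])
    return state


print(as_of(date(2025, 8, 7)))
# {'deal_id': 'D-1', 'instrument': 'INSTR-1', 'quantity': 100, 'price': 0.5, 'status': 'corrected'}
```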
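And going back to the dependency-decorator idea described a moment ago, here is a heavily simplified sketch of dependency-tracked recomputation. The decorator, registry, and function names are illustrative assumptions; the real dependency graph in a system like Athena is far richer than this.

```python
# A heavily simplified sketch of dependency-tracked recomputation.
# The decorator and registry are illustrative; the real graph is far richer.
from collections import defaultdict

_dependents = defaultdict(set)   # input name -> calculations that depend on it
_registry = {}                   # calculation name -> function


def depends_on(*inputs):
    """Register a calculation and the inputs it depends on."""
    def wrap(fn):
        _registry[fn.__name__] = fn
        for name in inputs:
            _dependents[name].add(fn.__name__)
        return fn
    return wrap


@depends_on("instrument_price", "quantity")
def position_value(data):
    return data["quantity"] * data["instrument_price"]


@depends_on("position_value", "fx_rate")
def base_currency_value(data):
    return data["position_value"] * data["fx_rate"]


def invalidate(changed, data):
    """Recompute only the calculations downstream of a changed input."""
    for calc_name in _dependents[changed]:
        data[calc_name] = _registry[calc_name](data)
        invalidate(calc_name, data)   # cascade to calculations using this result


data = {"instrument_price": 1.0, "quantity": 100, "fx_rate": 1.25}
data["position_value"] = position_value(data)
data["base_currency_value"] = base_currency_value(data)

data["instrument_price"] = 1.5           # the market moves...
invalidate("instrument_price", data)     # ...and only affected values update
print(data["position_value"], data["base_currency_value"])  # 150.0 187.5
```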
Now let's talk about geographic distribution. In finance, this isn't just about performance; it's about regulatory requirements and disaster recovery. Our primary trading hubs are in New York and London, close to major exchanges. Each location has a primary database instance with three local replicas for I/O optimization. But here's the tricky part: we need synchronous replication within each region for consistency, but asynchronous replication between regions to handle network partitions. Why? When markets are moving fast, traders can't wait for cross-Atlantic network latency. They need local data to be consistent and fast. But we also need to ensure that if London goes dark, New York has all the data it needs to keep operating. This is a classic CAP theorem decision: in finance, we choose consistency and partition tolerance, accepting higher latency for global users during network issues.

This diagram shows potential locations of book records all across the US; you can consider each of these dots as a book. These are replicated across US geography because data loss events are geographically correlated: power outages, natural disasters, et cetera are usually limited by geography. And this is a representation of the Hydra DB. As you can see, each of the deals is replicated multiple times across different database instances. This is possibly what it looks like across the globe. Most deals are limited to certain regions because of legal entities, so you can see that data is replicated across the US, across Europe, and across Australia, and then there is a specific DB that maintains all of the trades that are cross-regional or cross-legal-entity. What this helps with is computing at scale: it lets us parallelize as much as possible, because by the nature of markets, trading is usually regional and legal-entity based.

Athena maintains a separate instance for cross-legal-entity trades, often using clever techniques to add a third leg to the trade so it can be computed efficiently, which makes for some very clever corner cases. For instance, if there's a trade that is cross-legal-entity, say between a legal entity in North America and a legal entity in Europe, Athena will create a deal associated with the North American legal entity, a deal associated with the European legal entity, and a third deal associated with the cross-legal-entity relationship between North America and Europe, which is stored in a completely separate instance. That way, the data that exists only in the North American markets and the data that exists only in the European markets can be computed in parallel, independently of each other.

Our end-of-day calculations used to be a single massive monolithic process: if anything failed, we would have to start over from the beginning. We completely redesigned this as a dependency-aware parallel pipeline. Look at this architecture: we break processing down into independent units. Instrument A reference data processing runs on one compute node, instrument B on another, and deal processing for different books runs in parallel. But here's the key insight: we maintain dependencies. The P&L completion job cannot start unless all of the book processing jobs are complete. But if book one's processing fails and book two's succeeds, we can restart the jobs for book one using the cached results from the reference data jobs. This fault-tolerance approach transformed our end-of-day processing: instead of eight-hour all-or-nothing runs, we have resilient pipelines that can recover from individual component failures. The reason we have separated instrument reference data processing from deal-state book processing entirely is that instrument A could exist in deals that live in both book one and book two, which means that instrument A's reference data processing needs to be complete before book one's or book two's processing can start.
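Here is a simplified sketch of such a dependency-aware end-of-day pipeline: reference-data jobs run first and are cached, book jobs run in parallel, a failed book can be retried against the cached reference data, and P&L completion waits for every book. This is an illustration of the staging idea under assumed job names, not the actual scheduler.

```python
# A simplified sketch of a dependency-aware end-of-day pipeline.
# Illustrative only; the real scheduler is far more elaborate.
from concurrent.futures import ThreadPoolExecutor


def load_reference_data(instrument: str) -> dict:
    return {"instrument": instrument, "curve": "..."}     # stand-in for real work


def process_book(book: str, ref_cache: dict) -> float:
    # Uses only cached reference data, so a retry never recomputes it.
    return sum(len(ref["curve"]) for ref in ref_cache.values())  # dummy "P&L"


def complete_pnl(book_results: dict) -> float:
    return sum(book_results.values())


instruments = ["INSTR-A", "INSTR-B"]
books = ["book-1", "book-2"]

with ThreadPoolExecutor() as pool:
    # Stage 1: reference data, in parallel, cached for reuse.
    ref_futures = {i: pool.submit(load_reference_data, i) for i in instruments}
    ref_cache = {i: f.result() for i, f in ref_futures.items()}

    # Stage 2: book processing, in parallel, independent of each other.
    book_futures = {b: pool.submit(process_book, b, ref_cache) for b in books}
    book_results = {}
    for book, future in book_futures.items():
        try:
            book_results[book] = future.result()
        except Exception:
            # Only the failed book is retried, against the cached reference data.
            book_results[book] = pool.submit(process_book, book, ref_cache).result()

# Stage 3: P&L completion starts only after every book has a result.
print("end-of-day P&L:", complete_pnl(book_results))
```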
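And to illustrate the cross-legal-entity "third leg" described above, here is one way such a booking could be represented. The entity names, shard labels, and fields are purely illustrative assumptions.

```python
# Purely illustrative: one cross-legal-entity trade booked as three deals,
# so the NA-only and EU-only books can still be processed in parallel.
trade = {"trade_id": "T-9", "buyer_entity": "ENTITY-NA", "seller_entity": "ENTITY-EU"}

deals = [
    {"deal_id": "D-9a", "trade_id": "T-9", "legal_entity": "ENTITY-NA",
     "shard": "NA"},        # lives with the North American data
    {"deal_id": "D-9b", "trade_id": "T-9", "legal_entity": "ENTITY-EU",
     "shard": "EU"},        # lives with the European data
    {"deal_id": "D-9c", "trade_id": "T-9", "legal_entity": "ENTITY-NA/ENTITY-EU",
     "shard": "CROSS"},     # the third leg, in the separate cross-entity instance
]
```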
Let me show you one of our most innovative solutions: compute building blocks, or CBBs. The traditional approach pulls all the data to a central server and does the computation there. This creates memory pressure, CPU contention, and I/O bottlenecks. Our CBB approach creates independent compute nodes: each one loads data into its own memory space, executes its computation independently, and returns results for aggregation. Think of it like AWS Lambda, but for financial calculations. Each CBB gets its own playing field to load data, run computations, and return results without affecting the main server. This approach gives us horizontal scaling for compute-intensive operations while keeping the main system responsive for user interactions.

So, for instance, consider the books all across the globe, and say we are running the book-service processing for the end-of-day calculations in Europe. Usually a book runs on multiple CBB nodes, but in this example each book could be sent to a particular CBB as a reference, and that CBB then communicates with the Hydra DB to load the book details and all of the deals in that one book. Those deals are unrelated to the deals in all of the other books; they could be related, but for the purpose of these P&L calculations, which happen on a book-by-book basis, they can be treated as independent. That gives you a huge amount of parallel processing and saves a lot of time. This is horizontal sharding, because we keep the data independent of each other. So for every computation, in order to optimize for in-memory computation and more parallelism, every operation is chunked into independently computable chunks, which form a lambda-like function that executes on its own CBB nodes, and the results are collated. For instance, instead of pulling the graphs for instrument one and computing metrics on the main server itself, another compute block is handed a way to load the data from Hydra, so it loads it in memory without affecting the compute of the main server. Basically, each node gets its own playing field.

Before we move on, let me touch on something critical: compliance architecture. In finance, compliance isn't an afterthought; it's a core architectural requirement. We learned this the hard way. Our solution was to physically separate operational data from compliance data. Our OLTP systems handle real-time trading with performance optimization; our OLAP systems handle regulatory reporting with long-term retention and query optimization. This separation prevents compliance queries from impacting trading operations while ensuring we can reconstruct any historical state for audit purposes. We also implemented automated compliance feed processing: every trade, every price change, every position update gets streamed to immutable audit logs. Regulators can access this data without touching our operational systems.

This is the last step, but it's possibly the most important one, and in certain cases it comes after the deal-state book processing. A story I have about how important this is in the financial domain: there was a time when someone was running a Python machine-learning analytics script on a bunch of deals, and it ended up choking a queue. The deals that were supposed to be processed were left in a "not yet processed" state; they had been processed by the deal service, but not by the analytics script, because of some bugs in the script. It choked the queue, and it led to us paying a hundred thousand dollars in fines to the regulators because we could not meet the regulatory timeframe.
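For the compliance feed idea just described, here is a minimal sketch of an append-only audit log kept outside the operational tables. It is illustrative only; the file-based log, field names, and event types are assumptions, not the actual audit system.

```python
# A minimal sketch of the compliance-feed idea: every operational event is
# also appended to an immutable, append-only audit log that lives outside
# the operational (OLTP) tables. Illustrative only.
import json
from datetime import datetime, timezone


def audit(event_type: str, payload: dict, log_path: str = "audit.jsonl") -> None:
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "type": event_type,
        "payload": payload,
    }
    # Append-only: existing lines are never rewritten, so the trail stays
    # immutable and can be replayed without touching operational systems.
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


audit("trade_created", {"trade_id": "T-1", "price": 99.5})
audit("price_update", {"instrument": "CORP-BOND-XYZ", "price": 99.7})
```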
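And here is a minimal sketch of the compute-building-block pattern described earlier: each book is handed to an isolated worker that loads its own data, computes, and returns a result for aggregation. The book names and the dummy P&L math are assumptions; real CBBs involve much more (placement, caching, C++ kernels).

```python
# A minimal sketch of the CBB idea: each book is processed in its own
# worker process, keeping the main service responsive. Illustrative only.
from multiprocessing import Pool


def load_deals_for_book(book: str) -> list:
    """Stand-in for fetching the book's deals from the geo-local database."""
    return [{"quantity": 100, "price": 1.0}, {"quantity": -50, "price": 2.0}]


def cbb_process_book(book: str) -> tuple:
    """Runs inside its own process: own memory space, own 'playing field'."""
    deals = load_deals_for_book(book)
    pnl = sum(d["quantity"] * d["price"] for d in deals)
    return book, pnl


if __name__ == "__main__":
    books = ["eu-corp-bonds", "eu-derivatives", "eu-credit"]
    with Pool(processes=3) as pool:
        results = dict(pool.map(cbb_process_book, books))  # books computed in parallel
    print(results)                                         # aggregated on the main process
```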
So, back to earth. Now let's address the elephant in the room: why Python for financial systems? I get this question a lot. Python is interpreted, Python has the GIL, Python is dynamically typed, all true. But in finance, these apparent weaknesses become strengths. Financial instruments are incredibly diverse: bonds, derivatives, swaps, credit default swaps, each with different attributes, different valuation models, and different risk characteristics. Python's duck typing and dynamic typing let us handle this diversity without rigid schemas; we can represent a bond and a CDS with the same processing logic. Python's dictionary structures work beautifully with our indexing requirements. And when we need raw performance, we drop down to C++ compute building blocks. The result is a system that's both flexible for rapid business-logic changes and performant for compute-intensive operations.
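To illustrate the duck-typing point, here is a minimal sketch of how different instrument types can flow through the same processing logic as long as they expose the same small interface. The class and method names are assumptions for illustration, not real valuation models.

```python
# Minimal illustration of duck typing across diverse instruments: anything
# that exposes a value() method can flow through the same processing logic.
class Bond:
    def __init__(self, notional, clean_price):
        self.notional, self.clean_price = notional, clean_price

    def value(self):
        return self.notional * self.clean_price / 100


class CreditDefaultSwap:
    def __init__(self, notional, spread_pv):
        self.notional, self.spread_pv = notional, spread_pv

    def value(self):
        return self.notional * self.spread_pv


def book_value(positions):
    # No rigid schema, no type checks: any instrument with value() works.
    return sum(p.value() for p in positions)


print(book_value([Bond(1_000_000, 99.5), CreditDefaultSwap(5_000_000, 0.012)]))
```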
Let me wrap up with the key principles we learned building Athena at scale. First, embrace financial domain complexity. Don't try to force financial data into simple relational models; the domain is inherently complex, and your architecture needs to reflect that reality. Build flexibility into your data models from day one and plan for frequent schema evolution. Second, optimize for change, not just scale. In most tech companies, data grows but doesn't fundamentally change. In finance, data mutates constantly, in unpredictable ways. Dependency-driven updates beat full recalculations, delta storage beats full snapshots, and event-driven architecture beats batch processing. Third, separate concerns physically. Don't try to optimize operational data and analytical data with the same approach; they have fundamentally different access patterns and requirements. Fourth, design for failure. In finance, failure isn't just an inconvenience; it's a regulatory event. As I mentioned before: build independent processing units, cache intermediate results, and have graceful degradation strategies.

Let me share some of the pitfalls we encountered so you don't have to repeat our mistakes. Premature optimization: we spent months optimizing the wrong bottlenecks because we didn't properly profile our system under realistic load. Start with clear business requirements, profile before optimizing, and measure the impact of every change. Ignoring data lineage: financial regulators don't just want your current data, they want to understand how you got there. Build audit trails and data lineage tracking into the architecture from the beginning; don't bolt on compliance as an afterthought. Third pitfall, underestimating operational complexity: multi-region deployments, monitoring distributed systems, handling partial failures. The operational overhead is significant, so plan for it and budget for it.

Looking ahead, the financial technology landscape is evolving rapidly. Stream processing technologies like Kafka are becoming the standard for real-time analytics, and event sourcing is gaining adoption for complete audit trails. On the regulatory side, we are seeing increasing demands for real-time reporting. Cloud adoption is accelerating, even in highly regulated industries. Data residency and sovereignty requirements are becoming more complex, and privacy regulations like GDPR are impacting how we handle financial data.

Building scalable financial systems is one of the most challenging problems in software engineering. You're dealing with regulatory complexity, massive scale, real-time requirements, and zero tolerance for errors. But it's also incredibly rewarding. When your system processes millions of trades flawlessly, when traders can react to market moves in milliseconds, when regulators can access audit trails instantly, you're enabling the global financial system to function more efficiently.

The key is to respect the domain complexity while applying sound engineering principles. Don't try to solve everything at once. Start with your core data model, build in flexibility and observability from the beginning, and scale incrementally.

For instance, this is what Athena basically looks like, and any trade management platform would look roughly the same. There are the trade feeds, there's a trade ingestion layer, and there's a trade and deal service, which eventually feeds into a book service and then into compliance. The trade and deal service interacts heavily with the distributed compute, which interacts with the database and also outputs the P&L and risk computations. The CBBs are also associated with event pipelines, which emit alerts, notifications, and corporate actions, some of which eventually land back in the queue of the trade and deal service, depending on your use case. They also connect to monitoring dashboards and other regulatory reports; sometimes the compliance services and the regulatory reports are connected, based on what sort of regulatory agency you're reporting to.

So this is what Athena's high-level scalable architecture looks like. We looked at the trade ingestion layer, we looked at queues, and we looked at the trade service, the deal service, and the book service, which is associated with the compliance service. We looked at ways to parallelize all of these services and to shard data, because most of the data is high scale but independent of each other; as a matter of fact, the risk and P&L calculations require that each book be independently weighed against the compute metrics. All of these services interact with the distributed compute, again allowing for even more parallelization, which interacts with the Hydra database, which is geo-distributed. That means a trade executed in a certain geographic region only touches a particular geo-located service instance, a geo-located compute block instance, and a geo-located Hydra instance. And Hydra is replicated in order to relieve I/O bottlenecks, which eventually feeds regulatory reporting and other monitoring services.

If you compare the two models, the monolithic approach and the highly scaled approach, they look almost the same, except that in the monolith everything sits behind HTTP polling and nothing is asynchronous, so anything blocked in between blocks every other request the service wants to handle at that time. In the scaled approach, on the other hand, because of our parallelization, if a node fails there are other nodes that can take over thanks to the replication of data. A lot of the data is independent and geo-distributed, which means a failure in London doesn't affect anything in New York, and a failure on one compute node doesn't mean the service is down for every request, which is not true of the monolith. Thank you so much.

Aroma Rodrigues

Software Engineer @ Microsoft



