Securing High-Frequency Trading at Scale: Event-Driven, Cloud-Native Order Management Systems

Abstract

Discover how to architect secure, event-driven trading systems that scale to millions of messages per second. Learn proven DevSecOps patterns, cloud-native strategies, and real-world case studies powering the future of high-frequency trading.

Summary

Transcript

This transcript was autogenerated. To make changes, submit a PR.

Hello everyone. My name is Janardhan Chejarla, and in this talk I'm going to talk about high-frequency trading, where speed is everything. We are tackling a big problem: how do you keep things secure when you have to be faster than lightning? If we add security tools, they slow down trading. We need a new way to be both fast and safe at the same time.

Think of three main goals in trading: speed, scale (you have to process millions of orders per second), and security, which you cannot compromise. Usually, when you focus on one of these, you end up sacrificing another. The main issue is the latency tax: every security check adds a delay. If a traditional firewall adds, say, 50 microseconds, we are going to lose money. We have to design systems that handle huge volume without getting slow. That's the objective, and this is where event-driven, cloud-native architecture comes in: it lets you design systems without compromising any of these goals.

So let's talk about the latency killer. Why is a standard computer too slow? Because the operating system, for example Linux or Windows, gets in the way of the network card. We use specialized tools such as DPDK or RDMA to skip the OS entirely and talk directly to the network card. This gives us pure speed, but it also means we bypass some of the OS-level security measures, so we must move security into the hardware itself.

For the absolute fastest part of the system, the order matching engine, we use a messaging system called Aeron, which gives sub-microsecond latency. Aeron is like extremely reliable, encrypted instant messaging for machines; it's built for speed. It uses a method called Raft consensus to ensure all our trading computers agree on the order of trades, even if one computer breaks.
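The talk does not show how Aeron's Raft-based clustering works internally. As a rough illustration of why Raft lets a cluster agree on trade order even when one node fails, here is a minimal Python sketch of just one piece of Raft, the leader's commit-index rule: an entry counts as committed once a majority of nodes have stored it and it belongs to the leader's current term. The function and parameter names are hypothetical and this is not Aeron Cluster's API; leader election, log replication, and persistence are omitted.

```python
from typing import Sequence


def advance_commit_index(
    match_index: Sequence[int],
    log_terms: Sequence[int],
    current_term: int,
    commit_index: int,
) -> int:
    """Return the leader's new commit index under Raft's majority rule.

    match_index:  highest log index stored on each cluster member,
                  including the leader itself (one entry per node).
    log_terms:    log_terms[i] is the term of log entry i; index 0 is a sentinel.
    current_term: the leader's current term.
    commit_index: the leader's current commit index.
    """
    n = len(match_index)
    # Sorted descending, the value at position n // 2 is the largest index
    # stored on at least n // 2 + 1 nodes, i.e. on a majority.
    candidate = sorted(match_index, reverse=True)[n // 2]
    # Raft only commits by counting replicas for entries from the current term.
    # Terms never decrease along the log, so if the candidate fails this check,
    # no smaller index above commit_index can pass it either.
    if candidate > commit_index and log_terms[candidate] == current_term:
        return candidate
    return commit_index


if __name__ == "__main__":
    terms = [0, 1, 1, 2, 2, 2]  # entries 1..5; the leader is in term 2
    # Leader at 5, one follower at 4, one lagging at 2: index 4 is on a majority.
    print(advance_commit_index([5, 4, 2], terms, current_term=2, commit_index=0))  # 4
    # One follower has crashed (stuck at 0): the other two still form a majority.
    print(advance_commit_index([5, 5, 0], terms, current_term=2, commit_index=0))  # 5
```

Because only a majority is required, one crashed node in a three-node cluster does not stall commits; losing two of the three nodes, however, stops progress until a majority is available again.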
Now let's talk about the cloud-native event bus and orchestration. For slower, high-volume tasks like analyzing market data or settling trades, we use tools like Kafka. This allows us to handle huge streams of information without slowing down the trading engine. We separate the tasks: one part handles the trade, which has to be very fast, and another handles, for example, record keeping, which doesn't need to be ultra-fast but still has to handle huge volume.

Back to the latency tax: let's look at some security patterns and how much latency, in microseconds, each of them adds. This chart shows how much delay common security tools add. A common pattern in modern microservices, the sidecar, adds about 150 microseconds. That's too much for high-frequency trading. We must bake security directly into the application code, using an in-process library, to keep the delay to five microseconds or less.

Observability is also one of the key aspects of a high-frequency trading system and all the applications connected to it, and we have to achieve it without much penalty, since we can't use the standard security tools. So how do you watch what's happening? We use a tool called eBPF, which lets us hook directly into the kernel without adding delay to the trading path. It lets us observe every packet and every action the system takes, for compliance, while maintaining our zero-latency goal. It's observation without the performance penalty.

Determinism itself is security. That brings us to event sourcing: anything that happens in the system, we need to capture as an immutable event. If you have to replay a scenario, something that happened in the system, these immutable events are very helpful.
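A minimal sketch of the event-sourcing idea, assuming a single in-memory log: every state change is appended as an immutable event, each event carries a SHA-256 hash of its predecessor so a later edit is detectable, and state (here, net position per symbol) is rebuilt only by replaying events up to a chosen sequence number. The event types, payload fields, and class and function names are hypothetical; a production system would persist the log, for example on an event bus like the Kafka setup described above, rather than keep it in memory.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List

GENESIS = "0" * 64  # stand-in "previous hash" for the first event


@dataclass(frozen=True)
class Event:
    seq: int          # position in the total order of the log
    kind: str         # hypothetical event types, e.g. "OrderAccepted", "Fill"
    payload: dict
    prev_hash: str    # hash of the preceding event, forming a chain
    event_hash: str   # hash over this event's fields plus prev_hash


def _digest(seq: int, kind: str, payload: dict, prev_hash: str) -> str:
    # sort_keys gives a deterministic serialization, so hashes are reproducible.
    body = json.dumps([seq, kind, payload, prev_hash], sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


@dataclass
class EventLog:
    events: List[Event] = field(default_factory=list)

    def append(self, kind: str, payload: dict) -> Event:
        # Append-only: the class deliberately offers no update or delete.
        seq = len(self.events) + 1
        prev = self.events[-1].event_hash if self.events else GENESIS
        event = Event(seq, kind, payload, prev, _digest(seq, kind, payload, prev))
        self.events.append(event)
        return event

    def verify(self) -> bool:
        """Recompute the chain; an edited or reordered event breaks it."""
        prev = GENESIS
        for expected_seq, ev in enumerate(self.events, start=1):
            if ev.seq != expected_seq or ev.prev_hash != prev:
                return False
            if ev.event_hash != _digest(ev.seq, ev.kind, ev.payload, ev.prev_hash):
                return False
            prev = ev.event_hash
        return True


def replay(events: List[Event], up_to_seq: int) -> Dict[str, int]:
    """Rebuild net position per symbol by replaying fills up to up_to_seq."""
    positions: Dict[str, int] = {}
    for ev in events:
        if ev.seq > up_to_seq:
            break
        if ev.kind == "Fill":
            sign = 1 if ev.payload["side"] == "BUY" else -1
            symbol = ev.payload["symbol"]
            positions[symbol] = positions.get(symbol, 0) + sign * ev.payload["qty"]
    return positions


if __name__ == "__main__":
    log = EventLog()
    log.append("OrderAccepted", {"id": "A1", "symbol": "XYZ", "side": "BUY", "qty": 100})
    log.append("Fill", {"id": "A1", "symbol": "XYZ", "side": "BUY", "qty": 100})
    log.append("Fill", {"id": "B7", "symbol": "XYZ", "side": "SELL", "qty": 40})
    print(replay(log.events, up_to_seq=2))  # {'XYZ': 100}
    print(replay(log.events, up_to_seq=3))  # {'XYZ': 60}
    print(log.verify())                     # True
```

Replaying to sequence 2 and then to sequence 3 reconstructs the position before and after the second fill, which is the kind of point-in-time reconstruction a post-mortem needs. Note that a hash chain alone does not detect truncation of the newest events; that would need an externally anchored head hash.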
You can replay the events of the past, and if you can't replay it, you can't prove what happened. That's the key. Take forensics, for example: if something went wrong in the system, you should be able to do a post-mortem analysis, where you replay the trading sequence to see exactly what happened during a certain time interval. Compliance is another aspect. Nowadays regulators demand proof of best execution, and deterministic logs provide evidence of the system's behavior.

So let's talk about compliance as code and real-world scale. By building security and resilience into the architecture, which we call compliance as code, we get some amazing results. Look at some real-world examples: the London Stock Exchange can handle close to 15 million messages per second, and it cut its system downtime from 40 hours to just about 2.5 hours per year. This shows that speed and reliability go hand in hand.

In short, on resilience and chaos engineering: in a high-frequency trading system, a system that cannot recover in milliseconds is a system that has already failed. You can think of it as a quote, but it's the core principle of high-frequency trading systems.

Beyond what we covered in this talk, there is also the future of high-frequency trading systems, and a lot still needs to be improved. The next frontier is moving security logic directly into the hardware to achieve zero CPU overhead. We need to implement firewalls, risk checks, and encryption directly in FPGA gates and bump-in-the-wire devices. Bad packets are then dropped at the hardware level, ensuring they never consume CPU cycles.
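To make the "compliance as code" and in-process risk-check ideas concrete, here is a minimal Python sketch of a pre-trade risk check written as an ordinary function called directly on the order path, with no network hop to a sidecar. The limit names and values are hypothetical, not any exchange's or regulator's rules, and Python is used only for illustration; the microsecond figures from the talk refer to native or hardware implementations, not to this code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RiskLimits:
    # Hypothetical limits for illustration; real values come from risk policy.
    max_order_qty: int
    max_notional: float
    price_band_pct: float  # allowed deviation from a reference price, in percent


@dataclass(frozen=True)
class Order:
    symbol: str
    side: str   # "BUY" or "SELL"
    qty: int
    price: float


def pre_trade_check(order: Order, ref_price: float, limits: RiskLimits) -> Optional[str]:
    """Return None if the order passes every rule, else the first rejection reason.

    Called as a plain in-process function on the order path: no sidecar,
    no extra network hop, and each rule is code that can be reviewed and tested.
    """
    if order.side not in ("BUY", "SELL"):
        return "invalid side"
    if order.qty <= 0 or order.qty > limits.max_order_qty:
        return "quantity outside limits"
    if order.qty * order.price > limits.max_notional:
        return "notional exceeds limit"
    band = ref_price * limits.price_band_pct / 100.0
    if abs(order.price - ref_price) > band:
        return "price outside band"
    return None


if __name__ == "__main__":
    limits = RiskLimits(max_order_qty=10_000, max_notional=1_000_000.0, price_band_pct=5.0)
    print(pre_trade_check(Order("XYZ", "BUY", 500, 101.0), 100.0, limits))     # None
    print(pre_trade_check(Order("XYZ", "BUY", 500, 110.0), 100.0, limits))     # price outside band
    print(pre_trade_check(Order("XYZ", "SELL", 20_000, 100.0), 100.0, limits))  # quantity outside limits
```

Because the rules are plain code, they can be version-controlled, reviewed, and unit-tested like any other change; checks of this kind are also what the talk proposes moving into FPGA gates.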
And of course we need to implement all of this for public clouds as well, not only private clouds. That's the end of this talk. Thank you for attending.

Slides


Conf42 DevSecOps 2025 - Online

December 04 2025 - premiere 5PM GMT

Janardhan Chejarla

Senior Technical Consultant @ Calypso Technology