Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone.
My name is Jonathan Chela.
In this talk I'm going to talk about high-frequency trading, where speed
is everything. We are tackling a big problem: how do you keep things secure
when you have to be faster than lightning?
If we add security tools, they slow down the trading.
We need a new way to be both fast and safe at the same time.
Think of three main goals in trading: speed, scale, and security.
You have to process millions of orders per second,
and at the same time you cannot compromise security. Usually, when
you focus on one goal, you end up sacrificing another.
The main issue is the latency tax: every security check adds a delay.
If a traditional firewall adds, let's say, 50 microseconds, we
are going to lose some money, right?
We have to design systems that handle huge volume without getting slow.
So that's the objective.
So here comes the event-driven, cloud-native architecture, where you can
design the systems without compromising on the points we just talked about.
So let's talk about the latency killer.
Why is the standard computer too slow?
Because the operating system, for example Linux or Windows,
gets in the way of the network card.
We use specialized tools such as DPDK or RDMA to skip the OS entirely
and talk directly to the network card.
This gives us pure speed, but it also means we bypass
some of the security measures.
We must move security to the hardware itself.
We use Aeron, which gives us sub-microsecond latency.
For the absolute fastest part of the system, which is
the order matching system, we use a messaging system called Aeron.
Aeron is like extremely reliable, encrypted instant messaging for machines.
It's built for speed,
and it uses a method called Raft consensus to ensure all our trading
computers agree on the order of the trades, even if one computer breaks.
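The idea of agreeing on trade order even when one computer breaks can be sketched as a majority-commit rule. This is a simplified illustration in Python, not Aeron's API and not a full Raft implementation: it leaves out leader election, terms, and follower catch-up, and the names `Node` and `replicate` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster member holding an ordered log of trades."""
    name: str
    alive: bool = True
    log: list = field(default_factory=list)

    def append(self, index, entry):
        # Accept the entry only if this node is up and the entry
        # lands in the next free slot, so every log has the same order.
        if not self.alive or index != len(self.log):
            return False
        self.log.append(entry)
        return True


def replicate(leader, followers, entry):
    """Leader appends the entry, then reports whether a majority holds it."""
    index = len(leader.log)
    leader.log.append(entry)
    acks = 1  # the leader counts as one acknowledgement
    for follower in followers:
        if follower.append(index, entry):
            acks += 1
    cluster_size = len(followers) + 1
    return acks > cluster_size // 2  # committed only with a strict majority
```

With three nodes, an entry still commits when one follower is down, because two out of three acknowledgements form a majority. With two followers down, it does not commit.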
So let's talk about the cloud-native event bus and orchestration. For
slower, high-volume tasks like analyzing market data or
settling trades,
we use tools like Kafka.
This allows us to handle huge streams of information without
slowing down the trading engine.
We separate the tasks: one part handles the trade, which has to be very fast,
and another handles record keeping, for example, which can be
high volume. It doesn't necessarily need to be ultra fast, but it
still has to handle huge volume.
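The separation between the fast trading path and the slower record-keeping path can be sketched with an in-memory queue. This is only an illustration under stated assumptions: the standard-library `queue.Queue` stands in for an event bus such as Kafka, and `match_order`, `record_keeper`, and `run_demo` are hypothetical names.

```python
import queue
import threading

events = queue.Queue()  # stand-in for a durable event bus (assumption)
ledger = []             # stand-in for the record-keeping store (assumption)


def match_order(order):
    """Hot path: do the minimum work, then hand the record off without waiting."""
    fill = {"order_id": order["id"], "qty": order["qty"], "status": "filled"}
    events.put_nowait(fill)  # enqueue and return immediately
    return fill


def record_keeper():
    """Cold path: drain events at its own pace, off the trading thread."""
    while True:
        event = events.get()
        if event is None:  # sentinel value tells the worker to stop
            break
        ledger.append(event)


def run_demo(n):
    """Push n orders through the hot path and return how many were recorded."""
    worker = threading.Thread(target=record_keeper)
    worker.start()
    for i in range(n):
        match_order({"id": i, "qty": 10})
    events.put(None)
    worker.join()
    return len(ledger)
```

The trading function never waits for the ledger write. It only enqueues the event, and the consumer catches up on its own thread.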
So, the latency tax. Let's look at some of these security patterns and
how much latency each of them adds, in microseconds.
This chart shows how much delay common security tools add.
A common security pattern called a sidecar, used in modern microservices,
adds about 150 microseconds.
That's too much for high-frequency trading.
We must bake security directly into the application code using an
in-process library to keep the delay to five microseconds or even less.
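As a rough sketch of what baking a check into the application code means, the function below runs as a plain in-process call with no proxy or network hop. The limits, symbols, and function names are hypothetical. The two latency constants are the figures quoted in the talk, not measurements of this code.

```python
SIDECAR_US = 150    # per-check sidecar delay quoted in the talk
IN_PROCESS_US = 5   # in-process delay quoted in the talk

MAX_QTY = 10_000                  # hypothetical per-order quantity limit
ALLOWED_SYMBOLS = {"ABC", "XYZ"}  # hypothetical set of tradable symbols


def check_order(order):
    """In-process check: an ordinary function call inside the trading process."""
    return order["symbol"] in ALLOWED_SYMBOLS and 0 < order["qty"] <= MAX_QTY


def added_latency_us(per_check_us, checks_per_order):
    """Total delay added to one order by a given number of security checks."""
    return per_check_us * checks_per_order
```

With two checks per order, the quoted figures give 300 microseconds for the sidecar pattern against 10 microseconds in process, a difference of 290 microseconds per order.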
At the same time, when we talk about high-frequency trading
and the other applications connected to it, observability is also
one of the key aspects of the system.
We have to achieve this observability
without much of a penalty, even though we can't use the standard security tools.
So how do you watch what's happening?
We use a tool called eBPF, which lets us hook directly into
the kernel without adding a delay.
It lets us observe every packet and every action the system takes, for
compliance, all while maintaining the zero-latency goal. It's observation
without the performance penalty.
So that is the goal, basically.
Determinism itself is security.
Event sourcing:
anything that happens in the system, we need to capture
as an immutable event.
If you have to replay a scenario because
something happened in the system,
these immutable events are very helpful.
You can replay the events to see what happened in the past, and
if you can't replay it, you can't prove what happened, right?
That's the key.
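A minimal sketch of this replay idea follows. Events are only ever appended, and the current state is rebuilt purely from the log. The hash chain is an added assumption, one common way to make tampering detectable, and is not something the talk specifies. All function and field names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first event


def _digest(prev_hash, event):
    """Hash an event together with the previous entry's hash."""
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()


def append_event(log, event):
    """Append an event, chaining it to the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash,
                "hash": _digest(prev_hash, event)})


def verify(log):
    """Return True only if no entry was altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


def replay(log):
    """Rebuild positions deterministically from the recorded events."""
    positions = {}
    for entry in log:
        e = entry["event"]
        signed_qty = e["qty"] if e["side"] == "buy" else -e["qty"]
        positions[e["symbol"]] = positions.get(e["symbol"], 0) + signed_qty
    return positions
```

Replaying the same log always yields the same positions, which is what makes the log usable as evidence. Editing a past event breaks the chain, so `verify` returns False.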
Okay.
And forensics: for example, if something happened in the system
or something went wrong, you should be able to do the post-mortem analysis,
where you need to replay the trading sequence and see exactly what happened
in a certain time interval.
Compliance is another aspect.
Nowadays, regulators are demanding proof of best execution,
and deterministic logs provide the evidence of the system's behavior.
So let's talk about compliance as code and real-world
scale. By building security and resilience into the architecture,
which we are calling compliance as code,
we get some amazing results.
For example, if you look at some
real-world examples:
The London Stock Exchange can handle close to 15 million
messages per second, and they cut the system's downtime from 40 hours to
just about 2.5 hours in the whole year.
This shows that speed and reliability go hand in hand.
In short, when you talk about resilience and chaos engineering
in a high-frequency trading system, a system that cannot recover in milliseconds
is a system that has already failed.
You can think of it as a quote, but that's the core principle of
a high-frequency trading system.
Beyond what we covered in this talk, there is also a future aspect
to high-frequency trading systems.
There is still a lot that needs to be improved.
The next frontier is moving security logic directly into the hardware,
where we need to achieve zero CPU overhead.
We need to implement the firewalls, risk checks, and encryption directly in
FPGA gates, as a bump in the wire.
Bad packets are dropped at the core level, ensuring that they
never consume the host CPU.
And of course, we need to implement all of this with
public clouds in mind, not only private clouds.
That's the end of this talk.
Thank you for attending.