Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone.
Welcome to my session.
My name is Samuel Bafi.
I'm a solutions architect with AWS.
Today I'm gonna be presenting about Amazon Aurora DSQL, which is a new relational database offering on AWS.
So let's start by talking a little bit about some of the current relational database challenges, right?
We can give an example of running a Postgres database on whatever service or platform.
It could be on your own EC2, it could be on-premises, or potentially it could be on RDS as a managed service as well.
One of the challenges that we have is scalability, right?
Traditional databases have capacity limitations: you have an instance, and that instance, that server, is your limit.
So customers would be constrained by the capacity limits of a traditional database.
And it becomes very hard to rightsize your application for the specific server that you need to configure when you're provisioning that instance.
Availability is also a challenge, because if you have only one server and that server goes down, you have lower resiliency that could potentially lead to unplanned downtime, and it could impact your database availability.
Of course, there are ways you can add read-only replicas that can potentially help alleviate some of those concerns, but those are not very easy to manage.
And there are a lot of pros and cons about that, right?
So those are the functional challenges.
When you look at the operational challenges, one of the things that is very common, and I keep hearing from many customers, is infrastructure management: patching and upgrades requiring a lot of engineering time to prepare the database, prepare the server, and test them.
That is a lot of engineering effort that goes just to keeping your database up and running.
There's also the complexity.
Now you have a lot of infrastructure: you need to make sure you're installing the operating system, installing the database like Postgres, and doing a lot of fine-tuning configuration, which not only requires engineering effort, but also requires expertise.
That becomes very challenging, right?
So with some, if not all, of these challenges in mind, AWS announced at re:Invent in December of 2024 Amazon Aurora DSQL, which is a cloud-native, serverless, distributed SQL database with virtually unlimited scalability and the highest availability on AWS.
In this talk, we're gonna talk about how that was possible, all the important behind-the-scenes architecture decisions that were made in order to make this available.
So you have virtually unlimited scaling.
That is one of the core concepts of Aurora DSQL, because you have the compute, and the other components of the system, independently managed and scaled.
You can have writes and reads also being scaled separately, and both will scale up and down as you need, right?
That allows you to have business continuity, because you can now have an active-active, multi-region, distributed relational database that is completely managed for you.
And we're gonna spend a lot of time in the talk today explaining how that actually works.
What does that mean?
Because we know that with traditional relational databases, actually all relational databases, there are a lot of locking issues, and how do you make sure of the consistency across different replicas in different regions?
How does that work?
Okay.
Another aspect that has been designed in from the beginning is serverless.
There is no server to provision.
There is no server to patch.
There is no software to install, maintain, or operate.
It's completely serverless, natively serverless.
And the fast-and-easy idea of Aurora DSQL is very much inspired by other serverless offerings in the AWS ecosystem.
So I'll give an example of creating a DynamoDB table.
You just create a DynamoDB table and you can start querying and actually inserting data into it.
The same idea is being replicated here with Aurora DSQL, which should feel very quick if you have familiarity with Postgres, because Aurora DSQL is a Postgres-compatible database where you can run many of the capabilities of Postgres.
If you are interested in the specific capabilities and limitations, please check our documentation, but let's dive into the first way you can run Aurora DSQL.
So the first one is you can run Aurora DSQL as a single-region cluster, if you do not have the need to run active-active multi-region applications, because those can be expensive and not every single application requires an active-active solution.
So if you are okay with a single region, as you may have been running other traditional databases in a single region, you can create DSQL as a single-region cluster.
By default, of course, because this is a managed offering on AWS, when you create a single-region cluster, it operates actively across three availability zones, always, right?
So you can see you have your VPC with your application, and you're gonna receive an endpoint for that specific database.
You're gonna use that endpoint both for reads and writes.
The compute, the transaction logs (which we're gonna talk about later in this presentation), and the storage are actually replicated independently across three availability zones.
This provides 99.99% availability, and all transactions are fast and local while maintaining the ACID properties of your database.
The transaction commits go across availability zones, ensuring transactions are durable, isolated, and atomic, maintaining the ACID properties.
So that is the single-region cluster.
When you create a cluster, you'll have an option for a single-region cluster or a multi-region cluster.
If you go with the multi-region cluster, Aurora DSQL delivers five nines of availability across multiple regions.
And the way this works is very unique and very interesting.
So multi-region clusters provide two regional endpoints, and in this scenario we are talking about a linked region.
The way it works is you go to your AWS console, or use the CLI or the API, and you can create an Aurora DSQL cluster and say: I want a secondary region as a linked region.
So in this example, you can see that we have three regions in this architecture.
We have region one and region two, where each region is gonna have its own unique endpoint.
We are gonna talk about how reads and writes work, but the good thing about having an endpoint per region is that all the reads done in that specific region, using that specific endpoint, are always gonna be local.
You don't need to go across regions, and that's one of the main benefits of DSQL.
But writes are gonna be synchronously replicated across regions at the time of commit; we are gonna explain what that means in a moment.
The regions are equal peers.
There is no leader or master node in this situation.
And because you have synchronous replication between regions, you always have an RPO of zero, which is really important for mission-critical applications.
Now you see here that we have also a witness region.
The witness region is just replicating the journal.
We will talk about the journal, but that is where the transaction logs of your database are.
And in the case of a failure of one specific region (we're gonna explain the failure case at the end of the presentation), you always are gonna have a quorum, because of this third region, the witness region.
On the witness region, you do not have an endpoint.
It's just a witness region that is there to actually replicate the transaction log for the quorum.
So you always have three copies of the journal up, right?
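The quorum idea above can be sketched with a few lines of Python. This is a toy model, not an actual DSQL API; the region names are hypothetical, and the point is just that with two full regions plus a witness, a commit can survive the loss of any one region:

```python
# Sketch: why a third (witness) region gives you a quorum.
# Region names are hypothetical; not an actual DSQL API.

def has_quorum(ack_regions, all_regions=("region-a", "region-b-witness", "region-c")):
    """A commit is durable once a majority of journal replicas acknowledge it."""
    majority = len(all_regions) // 2 + 1  # 2 of 3
    return len(set(ack_regions) & set(all_regions)) >= majority

# Both full regions acknowledge: quorum.
print(has_quorum(["region-a", "region-c"]))          # True
# One full region down, witness still acking: quorum survives.
print(has_quorum(["region-a", "region-b-witness"]))  # True
# Only one region acking: no quorum, the commit cannot proceed.
print(has_quorum(["region-a"]))                      # False
```

This is why the witness holds only the journal: it never serves reads or writes, it just contributes the third vote.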
So let's talk a little bit about the components of DSQL.
If we think about the components of DSQL, you have the front end.
The front end you can think of as the endpoint in each specific region that you're gonna talk to.
Of course, that front end is being served to you across multiple load balancers that are replicated across multiple availability zones.
But the interesting part here comes with the query processor.
The query processors are responsible for executing the customer's SQL, returning data in response to reads, buffering data in response to writes, and running the transaction protocol.
So that's where Postgres will be running.
Then we have adjudicators.
Adjudicators are responsible for deciding whether a transaction can be committed while following isolation rules (we're gonna explain how isolation rules work in a moment), and for working with the journal in order for the transaction to actually be committed into the storage.
Everything you see here from the adjudicator to the crossbar is only gonna be necessary if you're doing a write, right?
If you're doing a transaction that requires a write.
The journal is an ordered data stream that makes transactions durable and replicates data between regions and availability zones.
The crossbar is the component that replicates your data into the storage, and of course the storage is where the data is gonna be completely replicated across different storage partitions.
DSQL takes a very interesting approach of replicating the different pieces of data across different storage.
So you have the replication and performance benefits given to you by default.
You don't even see any of these components; we are just talking about them so we have an idea of how it actually works.
So let's talk a little bit about how different transactions, like a read transaction and a write transaction, operate, so you understand how these components, which potentially by now are still a little bit confusing, will be put in place and how they work together.
So if we look here, let's just try to illustrate how it all works together, so we reduce the complexity.
We'll follow a transaction from start to finish.
We are going to start with a read transaction and a basic select statement.
Okay, let's imagine that a user is in a specific region, us-east-1, and that user is looking to order pizza from a local restaurant in, let's say, Virginia, right?
So what happens there is the user decides what food he wants from the restaurant based on a specific rating.
In this case, let's select all the restaurants where rating equals four, right?
What happens behind the scenes at this point?
Because you have done a select statement, which is a read statement, how does DSQL manage that statement?
So let's look into that.
The client will connect to the front end, meaning a specific regional endpoint.
This could be a multi-region linked cluster, or it could be a single region; for the examples I'm gonna provide today, let's say they are all multi-region.
So when you make this specific request, that request will go through a load balancer in the front end, and then we will create a query processor.
That's where your transaction will actually happen.
The query processor will actually get a start time, and this is one of the very unique benefits and capabilities of DSQL and AWS.
This time, retrieved from a local clock, is using what we call the Amazon Time Sync Service, where AWS uses highly accurate global standard time by leveraging satellite GPS signals alongside atomic clock references.
This is crucial to ensuring that time in one region and time in another region are actually aligned, because the speed of light and how clocks are aligned can be problematic if you're not using atomic clocks and satellite communications to pinpoint that, right?
So when a query gets processed by the query processor, we have the time at the start, which comes from a local clock, right?
So you receive that data.
Then, in this case, because it's a read statement, a read transaction, the query processor will look at the shard map of the storage, where the data is being stored on the storage layer, and you go down the read path: you go directly to the storage.
It doesn't need to go through any adjudicator or any journal.
You go to the storage, and then, if you look here now, you return the data back to the query processor, and the query processor returns the data to the front end, which returns the data to the client.
These results, of course, will be merged before being sent to the customer.
There you go.
And you see the specific pizza place that you have selected.
Let's look at a more complex query, where maybe the user does not only want to run a simple select, but also wants to do an interactive transaction, potentially inserting data into the database.
So in the query, of course, you're gonna create a transaction.
In this case: select the restaurants with rating four, I actually want to see a specific restaurant id, and I want to see a specific item, right?
You're selecting the item, then you're saying, I want to order this item, which is a pizza, and then put this pizza into my orders table.
So you selected the restaurant, you chose the item you want, and you placed the order.
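Here is roughly what that interactive transaction looks like as code. The schema and table names are hypothetical (invented for this example), and sqlite3 stands in for a Postgres driver connected to DSQL, since DSQL is Postgres-compatible:

```python
import sqlite3

# Hypothetical schema for the talk's pizza example; sqlite3 is a stand-in
# for a Postgres connection to a DSQL endpoint.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (id INTEGER PRIMARY KEY, name TEXT, rating INTEGER)")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, restaurant_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY AUTOINCREMENT, item_id INTEGER)")
conn.execute("INSERT INTO restaurants VALUES (1, 'Virginia Pizza', 4)")
conn.execute("INSERT INTO items VALUES (10, 1, 'Margherita')")

# The interactive transaction: select the restaurant, pick the item, place the order.
with conn:  # BEGIN ... COMMIT
    (rid,) = conn.execute("SELECT id FROM restaurants WHERE rating = 4").fetchone()
    (item_id,) = conn.execute(
        "SELECT id FROM items WHERE restaurant_id = ?", (rid,)
    ).fetchone()
    conn.execute("INSERT INTO orders (item_id) VALUES (?)", (item_id,))

print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1
```

The point is that the selects and the insert all live inside one transaction, which is exactly the case the next slides walk through.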
Let's look at how this actually works behind the scenes with DSQL.
So again, we are gonna get a transaction start time using Amazon Time Sync, right?
What is gonna happen here is the query processor will get a snapshot of the data at that start time, the T-start, right?
That snapshot is gonna be loaded into the query processor, and then every single read and every single write will only be done within that specific snapshot.
The query processor is not touching the storage yet.
It's using optimistic locking, which allows concurrent writes to be done at the same time if they're not actually touching the same rows, right?
So in DSQL, the query processor acts as a holding tank for all these statements, waiting for a commit statement before it sends the full transaction to the adjudicators to be checked.
We're gonna talk a little bit about what the adjudicator does in a moment, right?
Think about the query processor being displayed here now.
The cool thing about how this works is, once you have the start of the transaction, you have that T-start time, and cross-region transactions proceed very similarly to a single-region transaction.
The read path is barely changed, so you can just grab the data from the storage nodes that have the shards.
Now, when you do the select, insert, and update, no cross-region interactions are required here, because of the optimistic locking that DSQL implements.
So the way this works is that latency is only incurred at commit time.
On the query processor you're gonna do your select, your insert, and your update.
And once you finally have your snapshot with all the data that needs to be committed, it is sent to the adjudicator.
We're gonna talk about the adjudicator in a moment, but it is sent to the adjudicator, and the goal of the adjudicator is to go across both regions that you have and make sure that, after your T-start, no other data has been transacted into your storage and your journal that conflicts with your specific request.
If there is, only one of them is gonna be committed; the other one is gonna be aborted and will need to retry.
But if not, the commit is when you incur the latency.
So you're not gonna incur the latency on every single select, insert, and update.
That is not how it works.
You only get the latency when you're doing the cross-region commit.
So this means that reads, writes, and updates are just as fast as they would be in a single-region database.
Only the commit part is where you incur the cross-region latency, and the cross-region latency could be between 15 to 100-plus milliseconds, depending on how far the regions that you have linked together are from each other.
If they're close to each other, within the US, they're gonna be a little bit faster, between 20 to 40 milliseconds.
If they're far apart, it could be 100-plus milliseconds at commit time, because the speed of light is something that we cannot expedite, at least for the time being.
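A quick back-of-the-envelope model makes the benefit concrete. The numbers below are illustrative, not measured: the claim is simply that the per-statement cost stays local, and the cross-region round trip is paid once at commit:

```python
# Illustrative latency model for an optimistic, commit-time-coordinated
# transaction. local_ms and commit_rtt_ms are made-up example values.

def transaction_latency_ms(n_statements, local_ms=1.0, commit_rtt_ms=60.0):
    # Every select/insert/update runs against the local snapshot...
    local = n_statements * local_ms
    # ...and the cross-region adjudication happens once, at COMMIT.
    return local + commit_rtt_ms

print(transaction_latency_ms(3))   # 63.0 -- 3 statements plus one commit round trip
print(transaction_latency_ms(30))  # 90.0 -- 10x the statements, same single commit cost
```

Compare that with a design that coordinates on every write: 30 statements at 60 ms each would be 1800 ms instead of 90 ms.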
So let's talk a little bit about the query processor, because there is a lot of innovation that has been created behind the scenes.
As you've seen, the query processor is doing a lot of work; it is the heart of where the DSQL architecture runs.
It runs inside what we call a Firecracker micro virtual machine, which is where the query processor is hosted within a server, in this case a bare metal instance, right?
The Firecracker microVM was created and built for Lambda in 2018, and this is the same technology that the DSQL query processors are using behind the scenes.
It's an open-source microVM that AWS has open-sourced, and we've used it to put a secure box around the Postgres engine.
So Postgres will be running on top of this query processor, which is a microVM.
The cool thing about this is that as your database grows in scale and demand, this can literally scale from one query processor at any given time, or zero if you don't have any requests, up to tens of millions of query processors being created.
Each transaction is gonna create a query processor.
Another good thing is the support of snapshot isolation.
In DSQL, we support an isolation level called snapshot isolation.
So what does this actually mean?
It means that each transaction operates on a consistent snapshot of the database as it exists at the start of the transaction.
So as you do the read from the storage, you see here on the right of the screen, a snapshot is being created in the query processor; that snapshot only exists on the query processor.
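A minimal sketch of what a snapshot read means, using a toy versioned store. This is not DSQL's actual storage format, just the idea: a reader sees the latest version of each key written at or before its T-start, and anything committed later is invisible to it:

```python
# Toy multi-version store: each key maps to (commit_time, value) versions.
storage = {
    "restaurant:1": [(5, "rating=3"), (20, "rating=4")],
}

def snapshot_read(key, t_start):
    """Return the newest version committed at or before t_start."""
    versions = [(t, v) for t, v in storage.get(key, []) if t <= t_start]
    return max(versions)[1] if versions else None

print(snapshot_read("restaurant:1", t_start=25))  # 'rating=4'
print(snapshot_read("restaurant:1", t_start=10))  # 'rating=3' -- the later commit is invisible
```

Two transactions with different T-starts can therefore each see a stable, consistent view without blocking each other.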
If you have any writes, updates, and inserts, then you will try to do the commit.
If it's multi-region, you use the adjudicator to make sure there is no conflict across the commits from different regions, right?
So the transaction begins and proceeds through the SQL execution phase, where the reads see the consistent snapshot loaded into the microVM.
When the write operations occur, like an insert or update, they are not immediately applied to the storage, and that is a very important thing to note, right?
This is very different from how other databases are architected.
There, if you would do an update or an insert, it would be automatically done in storage.
Here, because we are using optimistic locking, it uses the snapshot that it got from your read path, and every time there is a write, like an update or insert, it runs locally.
Of course, we spool these writes locally and create a private workspace for this transaction.
What this approach allows you to do is read your own writes, so subsequent reads within the same transaction can see pending changes without touching the storage.
That increases performance and scalability, which is one of the very incredible things that DSQL allows you to do.
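The private workspace idea can be sketched in a few lines. This is a toy model, not DSQL internals: writes go into a local buffer, reads check that buffer first (read-your-writes), and nothing touches storage until commit time:

```python
# Sketch of a query processor's private workspace: buffered writes plus
# read-your-writes over a consistent snapshot. Toy model, not DSQL internals.

class PrivateWorkspace:
    def __init__(self, snapshot):
        self.snapshot = dict(snapshot)  # consistent view captured at T-start
        self.write_set = {}             # buffered, uncommitted writes

    def read(self, key):
        # Read-your-writes: pending local changes win over the snapshot.
        return self.write_set.get(key, self.snapshot.get(key))

    def write(self, key, value):
        self.write_set[key] = value     # spooled locally; storage untouched

tx = PrivateWorkspace({"order:1": "pending"})
tx.write("order:1", "confirmed")
print(tx.read("order:1"))      # 'confirmed' -- visible inside the transaction
print(tx.snapshot["order:1"])  # 'pending'   -- the snapshot itself is unchanged
```

At commit, it is the `write_set` (the keys and their post-images) that gets shipped to the adjudicator.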
But as you are doing these writes locally on your query processors, there is a challenge, right?
What happens if there is another transaction that is trying to write, at the same time, to a row that your existing transaction has?
So let's say you have transaction A and transaction B: you need to have a capability in your database engine that can look across these specific transactions and decide, is there a conflict or is there not a conflict?
And that's what the adjudicator allows you to do.
The job of the adjudicator is to detect and resolve conflicts between transactions and ensure their writes are consistent.
Because you have a relational database, it needs to be ACID, and by having that, it needs to be consistent, right?
So when you look at transaction A, for example, when you create a payload, you have a T-start.
And remember, the T-start is the time when you actually received the reads from the storage, right?
Then you can do write sets with post-images; that is the payload.
You send the payload to the adjudicator.
The payload will contain the write sets, which are the items that you modified: copies of the table rows with the effects of the transaction applied.
The payload also contains the transaction start time, the T-start, which is a crucial element in committing or aborting the transaction.
So let's look at how this works.
You coordinate once, only at commit time.
Once the query processor has done all the statements it has been given, it sends everything to the adjudicator and pretty much says: dear adjudicator, here are the keys I intend to write, and here is my transaction start time.
If no other transactions have written these keys since the start time when I did my read, pick a time for the commit and write these changes to the journal.
Your friend, the query processor.
The adjudicator will never allow another conflicting transaction to pick a lower T-commit, right?
So if behind the scenes there was another transaction that comes a few seconds after this and says, oh, I actually have a transaction that started after the T-start but before the T-commit, the adjudicator will be like: you are not gonna be able to do that.
It's gonna abort your transaction, and your application needs to retry the transaction.
That is one of the things that you need to be aware of when building applications with DSQL: you need to retry if a transaction gets aborted by DSQL because there was a conflict.
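The retry loop your application needs can be sketched like this. `TransactionConflict` and the `run_transaction` callable are stand-ins invented for this example, not a real DSQL client API; the point is only the shape of the loop:

```python
# Sketch: re-run a transaction when the adjudicator aborts it for a conflict.
# TransactionConflict and the callables here are hypothetical stand-ins.

class TransactionConflict(Exception):
    pass

def with_retries(run_transaction, max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        try:
            return run_transaction()
        except TransactionConflict:
            if attempt == max_attempts:
                raise
            # A real client would add jittered exponential backoff here.

attempts = {"n": 0}
def flaky_commit():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransactionConflict("aborted: conflicting write detected")
    return "committed"

print(with_retries(flaky_commit))  # 'committed' on the third attempt
```

Re-running the transaction matters because the retry re-reads the storage and therefore sees the data the winning transaction just committed.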
I'm gonna show you in the multi-region active-active scenario that the adjudicator is the piece that goes across regions, right?
That is the piece that goes across regions.
And it shards different keys across those two regions, so it needs to send the data for a specific query across the regions to make sure it's actually able to commit.
So what if you have two specific transactions that are changing the same row, right?
Say transaction A starts at 10:09:33 and transaction B at 10:09:35.
The two transactions, A and B, start roughly at the same time, but transaction A sends its commit slightly before transaction B. What will happen in this case?
What would happen is the adjudicator looks at whether you are trying to write to the same row.
In this case, the adjudicator discovers intersecting writes, right?
It compares the payloads from the query processors, looking for any intersecting writes relative to each transaction's T-start time.
In this case, as you can see here, they are trying to update the same row.
So only transaction A is gonna be approved, because both can't change the same row at the same time.
Transaction A is gonna get committed, while transaction B must be aborted, and your application must retry.
Then, when your application retries, you actually go to the storage layer and retrieve the new data that transaction A has just updated.
So you don't have any loss of data or any inconsistency.
Now, the cool thing about the adjudicator is that if the transactions do not intersect (in this case you can see, maybe it's too small for you to see, but transaction A is trying to write item 93 and transaction B is trying to write item 97), even though some of the selects are exactly the same, both transactions are gonna be allowed to commit, and both commits will proceed, which is really good.
Another very interesting thing about DSQL: in traditional databases, durability happens at the storage layer.
Transactions are only committed once they are durably written to the storage layer; that is how traditional databases work.
So the storage layer is expected to be able to recover all committed transactions from storage after any failure.
These requirements add significant complexity to the database engines, including logging, coordination systems, APIs, and the need to keep storage consistent for recovery purposes.
In DSQL, the durability is given to the journal, and of course the journal, if you're using a multi-region active-active solution, is replicated across multiple regions.
And this is where, if you remember when I said we have the witness region, that witness region is a replication of the journal.
The journal is just where all the transaction logs for your database are stored.
So DSQL manages this complexity by making the journal responsible for short-term durability.
Transactions are considered committed once they are written to the journal, not once they are written to the storage.
The journal can scale horizontally, and transactions can be written to any journal; they are coordinated to provide a totally ordered stream of committed transactions.
Remember, the adjudicator makes sure that the priority and the order of the transactions are in place.
The journal now has these transaction logs, ordered.
And the crossbar component uses the journal's ordering to ensure that the updates applied to the storage are in the correct sequence, even when there are multiple journals.
So this is the cool thing, right?
Because now you have this journal that can scale horizontally, and any transaction can be written to any journal.
But because of the ordering that the adjudicator provides, the crossbar component uses that journal ordering to ensure that the storage also saves the data in the correct order.
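The merge the crossbar performs can be sketched as an ordered merge of several journals. The journal contents here are made up, and this is a toy model rather than DSQL's wire format; the idea is just that each journal is already ordered by commit timestamp, so merging them yields one total order for storage to apply:

```python
import heapq

# Toy journals: (t_commit, transaction id), each journal already ordered.
journal_1 = [(5, "tx-a"), (9, "tx-c")]
journal_2 = [(7, "tx-b"), (12, "tx-d")]

def crossbar_apply(*journals):
    applied = []
    # heapq.merge combines sorted streams into one sorted stream.
    for t_commit, tx in heapq.merge(*journals):
        applied.append(tx)  # in a real system: apply the write set to storage
    return applied

print(crossbar_apply(journal_1, journal_2))  # ['tx-a', 'tx-b', 'tx-c', 'tx-d']
```

Because the adjudicators hand out commit timestamps that respect conflicts, applying in timestamp order is enough to keep every storage replica consistent.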
And this is actually pretty cool, right?
So the payload and the timestamp are sent to the journal.
Once the journal acknowledges the write, the updates are then sent to the storage nodes.
The journal sends a success code to the query processor, which sends back a success code to the client.
From the user's perspective, the transaction is done.
And this is the interesting part, because now the journal is where the durability happens, right?
The journal is the place that says, okay, it's good to go, because it's replicated across multiple availability zones and multiple regions, and it has the consistency and the ordering that you need.
So we talked a lot about DSQL so far, and I have a couple more slides before I end the presentation.
Aurora DSQL uses optimistic concurrency control.
So rather than taking a lock when you start an insert and then doing multiple selects, inserts, and updates on the row, that is not how it works.
Not taking a pessimistic lock allows you to have concurrency at play here, which plays a crucial role in letting multiple transactions occur simultaneously without leading to inconsistency of data.
You might need to retry some of those transactions if they conflict within the same period, as I explained with the capability and the role of the adjudicator.
There are no deadlocks; conflicts will be detected at transaction commit by the adjudicator.
You have multi-region concurrency protection now, because every single piece of data that is committed, if you're using an active-active multi-region architecture, is guaranteed to be consistent across those regions.
And you have an efficient storage layer, freed from synchronous replication duties, because of the journal's capability.
This allows you as a developer to really design new capabilities and new design patterns.
There are multiple things here that we could talk about, but I'll show you some of the extra documentation that you can go and read.
There are great re:Invent videos also posted on the AWS channel.
What I want to show you is, behind the scenes, what the architecture for an active-active DSQL database looks like.
So when we look at an active-active architecture, you have three regions.
Like I said, you have two regions with endpoints, region A and region C in this diagram.
And region B is what we call the witness region.
You have one router in each of the regions that have endpoints; those are the endpoints.
And then, even though it shows one query processor here, one box, think of that as being the microVM from Firecracker, which is gonna run the Postgres engine there, right?
So you can have multiple query processors; each transaction is actually a query processor invocation on its own.
DSQL internally will actually host across multiple availability zones, right?
So realistically, the entire stack would be duplicated across three availability zones in each of these regions.
And here you have the adjudicator.
You can see that the adjudicator is the only component that requires you to have cross-region communication, at the commit of a write.
And you can see that the adjudicator will have specific keys that are led by a specific adjudicator, because that's how you guarantee the consistency across the transactions from multiple places, multiple regions.
Then you have the journal, and you can see the journal is actually replicated across all three regions, because that's where the durability happens.
So the journal is where the transaction logs are stored, and it's a critical key component of DSQL.
Then you have the storage, of course, which will read from the journal.
You have a crossbar, which is not on this diagram, that reads from the journal and actually makes those changes in the storage.
So the journal-only witness, which is region B, ensures that if there is a split between those two regions, the available side can continue to be available and consistent for clients on its side of the partition.
For deployments that go across large geographies, you can put the witness kind of in the middle, right?
So if you have, let's say, one region in the US and another region in Australia, you might wanna put the witness region maybe in the Middle East or in Europe, right?
That is a nice additional property, and you can actually optimize for latency as well, right?
So let's talk about what happens if region C goes down.
Knock on wood here, but let's say there is a catastrophe or something happens and region C goes down, which is very uncommon, but let's say that region goes down.
Any adjudicator leadership, which is short-lived state used for isolation, moves out of region C and remains in the healthy regions, because, remember, you need all the keys for the adjudicators to maintain control.
Now the journal is split across region A and region B, where region B is the witness.
So you still have that durable storage in two regions, and only the healthy side of the database remains available.
Of course, the endpoint of region C won't work, but the endpoint of region A will continue to work for reads and writes at any given time.
So you then need to move traffic to the healthy side of the partition.
You can just use that specific endpoint, or you can do Route 53 latency-sensitive routing with health checks and so on.
It depends on how you are using the endpoint in your application; it's completely up to you.
You can just have one specific endpoint for the region, or you can actually create a DNS record with Route 53 that will do the routing with health checks to the specific region that is working.
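The client-side version of that failover can be sketched generically. The endpoint names and the health-check callable below are hypothetical, invented for this example; in practice a Route 53 health-checked DNS record can make this decision for you at the DNS layer:

```python
# Sketch: pick the first healthy regional endpoint. Endpoint names and the
# is_healthy probe are hypothetical; Route 53 health checks can do this in DNS.

ENDPOINTS = ["cluster.region-a.example.com", "cluster.region-c.example.com"]

def pick_endpoint(endpoints, is_healthy):
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    raise RuntimeError("no healthy DSQL endpoint available")

# Simulate region C being down: traffic falls to region A's endpoint.
down = {"cluster.region-c.example.com"}
print(pick_endpoint(ENDPOINTS, lambda e: e not in down))  # region-a endpoint
```

Because RPO is zero, whichever endpoint you fail over to already has every committed transaction.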
So this is what I wanted to present to you about Aurora DSQL.
DSQL is currently in public preview; hopefully soon it'll be generally available, so that you will be able to run production workloads.
But if you have interest, please feel free to reach out on LinkedIn.
Again, my name is Samuel Bafi, but you can also scan these QR codes here.
We have the documentation for DSQL, some of the DSQL public web pages, and also the useful blog posts that we have shared.
And as I've mentioned, there are great re:Invent 2024 talks that are available for you to consume on YouTube, and I highly recommend you do that.
So let us know what you thought about that.
Hopefully you'll be building new, exciting, resilient, and highly available applications on top of DSQL.
And we are very excited to see what you can do.
So thank you so much for taking the time to watch my session.
And I'll see you soon.
Bye bye.