Conf42 Site Reliability Engineering (SRE) 2025 - Online

- premiere 5PM GMT

Amazon Aurora DSQL: The Future of Scalable, Distributed Relational Databases


Abstract

Amazon Aurora DSQL is a serverless, distributed SQL database that delivers active-active high availability, automatic scaling, and zero infrastructure management. It ensures 99.999% uptime, making it ideal for high-performance, mission-critical applications.


Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, welcome to my session. My name is Samuel Baruffi, and I'm a Principal Solutions Architect with AWS. Today I'm going to be presenting Amazon Aurora DSQL, which is a new relational database offering on AWS.

Let's start by talking a little bit about some of the current relational database challenges. Take the example of running a Postgres database on whatever service or platform: it could be on your own EC2 instances, it could be on-premises, or it could be on RDS as a managed service. One of the challenges is scalability. Traditional databases have capacity limitations: you have an instance, and that server is your limit. Customers are constrained by the capacity of that traditional database, and it becomes very hard to rightsize your application for the specific server you configure when provisioning the instance. Availability is also a challenge, because if you have only one server and that server goes down, you have lower resiliency, which can lead to unplanned downtime and impact your database availability. Of course, there are ways to use read-only replicas to help alleviate some of those concerns, but those are not easy to manage, and there are a lot of pros and cons to that approach. Those are the functional challenges.

On the operational side, one thing that is very common, and that I keep hearing from many customers, is infrastructure management: patching and upgrades require a lot of engineering time to prepare the database and the server and to test the changes, and that is a lot of engineering effort that goes just into keeping your database up and running. There is also complexity: you have a lot of infrastructure, you need to install the operating system, install the database like Postgres and its components, and do a lot of fine-tuning configuration, which not only requires engineering effort but also requires expertise. That becomes very challenging.

So with some, if not all, of these challenges in mind, AWS announced at re:Invent in December 2024 Amazon Aurora DSQL, a cloud-native, serverless, distributed SQL database with virtually unlimited scalability and the highest availability on AWS. In this talk we're going to cover how that was possible, and the important behind-the-scenes architecture decisions that were made to make it available. Virtually unlimited scaling is one of the core concepts of Aurora DSQL, because the compute, and the multiple stages within the compute, are independently managed and scaled. Writes and reads also scale separately, and both scale up and down as you need. That gives you business continuity, because you can now have an active-active, multi-region, distributed relational database that is completely managed for you. We're going to spend a lot of time in this talk explaining how that actually works, because we know that with traditional relational databases (really, all relational databases) there are a lot of locking issues, and ensuring consistency across different replicas in different regions is hard. How does that work?
Another aspect that has been designed in from the beginning is serverless. There is no server to provision, no server to patch, no software to install, maintain, or operate. It's completely, natively serverless. Fast and easy is the idea: Aurora DSQL is very much inspired by other serverless offerings in the AWS ecosystem. I'll give an example with DynamoDB: you just create a DynamoDB table and you can immediately start querying it and inserting data. The same idea is being replicated here with Aurora DSQL, and it should feel very familiar if you know Postgres, because Aurora DSQL is a Postgres-compatible database where you can use many of the capabilities of Postgres. If you're interested in the exact capabilities and limitations, please check our documentation.

Let's dive into the first way you can run Aurora DSQL: a single-region cluster. If you don't need to run active-active multi-region applications (those can be expensive, and not every application requires an active-active solution), and you're okay running in a single region, as you may have been doing with other traditional databases, you can create DSQL in a single region. By default, because this is a managed offering on AWS, when you create a single-region cluster it always operates actively across three availability zones. You have your VPC with your application, and you receive an endpoint for that specific database; you use that endpoint for both reads and writes. The compute, the transaction logs (which we'll talk about later in this presentation), and the storage are replicated independently across the three availability zones. This provides 99.99% availability, and all transactions are fast and local while maintaining the ACID properties of your database. Transaction commits go across availability zones, ensuring transactions are durable, isolated, and atomic.

So that's the single-region cluster. When you create a cluster, you have the option of a single-region cluster or a multi-region cluster. With a multi-region cluster, Aurora DSQL delivers five nines (99.999%) of availability across multiple regions, and the way this works is very unique and very interesting. Multi-region clusters provide two regional endpoints, and in this scenario we're talking about a linked region: in the AWS console, or using the CLI or the API, you create an Aurora DSQL cluster and say you want a secondary region as a linked region. In this example you can see three regions in the architecture: region one and region two each have their own unique endpoint. We'll talk about how reads and writes work, but the good thing about having a regional endpoint is that all reads done in that specific region, using that endpoint, are always local. You don't need to go cross-region, and that's one of the main benefits of DSQL. Writes, however, are synchronously replicated across regions at commit time; we'll explain what that means in a moment. The regions are equal peers: there is no leader or master node in this setup.
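To make the "just create it and start querying" idea concrete, here is a minimal sketch of connecting to a DSQL cluster with plain Postgres tooling from Python. The endpoint hostname, credentials, and table are placeholders, not from the talk; DSQL uses IAM-based auth tokens rather than static passwords, so treat the password value as an assumption to be replaced with a token generated via the AWS SDK or CLI.

import psycopg2

conn = psycopg2.connect(
    host="your-cluster-id.dsql.us-east-1.on.aws",  # placeholder endpoint
    port=5432,
    user="admin",
    password="<IAM auth token>",  # assumption: generated via the AWS SDK/CLI
    dbname="postgres",
    sslmode="require",            # connections are TLS-encrypted
)
conn.autocommit = True

with conn.cursor() as cur:
    # No servers to size or patch: create a table and start writing immediately.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS restaurants (
            id     INT PRIMARY KEY,
            name   TEXT,
            rating INT
        )
    """)
    cur.execute("INSERT INTO restaurants VALUES (1, 'Virginia Pizza Co', 4)")

The same code works whether the cluster behind the endpoint is single-region or multi-region; that choice is made at cluster creation time, not in the application.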
And because you have synchronous replication between regions, you always have an RPO of zero, which is really important for mission-critical applications. You'll also see here that we have a witness region. The witness region just replicates the journal (we'll define the journal shortly, but it's where the transaction logs of your database live). In case one specific region fails, and we'll walk through that failure scenario at the end of the presentation, you always still have a quorum, thanks to this third region. The witness region does not have an endpoint; it's only there to replicate the transaction log for quorum purposes. So you always have three copies up and running.

So let's talk a little bit about the components of DSQL. First, you have the front end: think of it as the endpoint in each specific region that you talk to. That front end is, of course, served to you across multiple load balancers replicated across multiple availability zones. The interesting part comes with the query processors. The query processors are responsible for executing the customer's SQL, returning data in response to reads, buffering data in response to writes, and running the transaction protocol; that's where Postgres runs. Then we have the adjudicators. Adjudicators are responsible for deciding whether a transaction can commit while following the isolation rules (we'll explain how isolation rules work in a moment), and for working with the journal so the transaction is actually committed to storage. Everything from the adjudicator down to the crossbar is only needed if you're doing a write, that is, a transaction that includes a write. The journal is an ordered data stream that makes transactions durable and replicates data between regions and availability zones. The crossbar is the component that replicates your data from the journal into the storage. And the storage is where the data lives, completely replicated across different storage partitions. DSQL takes a very interesting approach of spreading different pieces of data across different storage partitions, so you get the replication and performance benefits by default. You never actually see any of these components; we're only talking about them so we have an idea of how it all works.

So let's talk about how different transactions, a read transaction and a write transaction, operate, so you understand how these components, which may still be a little confusing at this point, are put in place and work together. To reduce the complexity, we'll follow a transaction end to end, starting with a read transaction and a basic SELECT statement. Imagine a user in a specific region, say us-east-1, who wants to order pizza from a local restaurant in, let's say, Virginia. The user decides what food they want based on the restaurant's rating; in this case, select all the restaurants where rating equals four. What happens behind the scenes?
You've issued a SELECT statement, which is a read. How does DSQL handle that statement? Let's look into it. The client connects to the front end, meaning a specific regional endpoint. This could be a multi-region linked cluster or a single-region cluster; for today's examples, let's say they are all multi-region. When you make the request, it goes through a load balancer in the front end, and then a query processor is created; that's where your transaction actually happens. The query processor records a start time, and this is one of the very unique capabilities of DSQL on AWS: the time retrieved from the local clock comes from the Amazon Time Sync Service. AWS provides highly accurate global standard time by leveraging satellite GPS signals alongside atomic clock references, which is crucial for ensuring that time in one region and time in another region are actually aligned. Because of the speed of light, keeping clocks aligned can be problematic if you don't have atomic clocks and satellite signals to pinpoint it.

So when a query gets processed by the query processor, we have the time at the start, taken from the local clock. Then, because this is a read statement, a read transaction, the query processor looks up the shard map of the storage to find where the data lives on the storage layer, and it follows the read path directly to the storage. It doesn't need to go through any adjudicator or any journal. It goes to the storage, the data is returned to the query processor, the query processor returns it to the front end, and the front end returns it to the client. The results are merged and sent back to the customer, and there you go: you see the specific pizza places you selected.

Let's look at a more complex query, where the user doesn't just run a simple select but an interactive transaction, potentially inserting data into the database. You create a transaction: select the restaurants with rating four, then look at a specific restaurant ID and a specific item. You select the item, then say you want to order this item, a pizza, and insert that pizza into your orders table. So you selected the restaurant, chose the item you want, and placed the order. Let's look at how this actually works behind the scenes in DSQL. Again, the transaction gets a start time, t_start, using Amazon Time Sync. The query processor takes a snapshot of the data as of that t_start. That snapshot is loaded into the query processor, and then every single read and every single write is done within that specific query processor; it's not touching the storage yet. It uses optimistic concurrency, which allows concurrent writes to proceed at the same time as long as they're not touching the same rows.
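Here is a minimal sketch of that interactive pizza-order transaction as application code, reusing the conn from the earlier sketch. The menu_items and orders tables and their columns are hypothetical, invented for illustration; the point is that the reads are served from the t_start snapshot and the INSERT stays buffered in the query processor until COMMIT.

conn.autocommit = False  # explicit transaction: t_start is taken when it begins

with conn.cursor() as cur:
    # Reads come from the consistent t_start snapshot, served locally.
    cur.execute("SELECT id, name FROM restaurants WHERE rating = 4")
    restaurant_id = cur.fetchone()[0]

    cur.execute(
        "SELECT item_id, price FROM menu_items WHERE restaurant_id = %s",
        (restaurant_id,),  # hypothetical table/columns
    )
    item_id = cur.fetchone()[0]

    # This write is buffered in the query processor's private workspace;
    # storage is not touched yet.
    cur.execute(
        "INSERT INTO orders (restaurant_id, item_id) VALUES (%s, %s)",
        (restaurant_id, item_id),
    )

# Only here does the adjudicator get involved, and only here does any
# cross-region latency apply.
conn.commit()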
In DSQL, the query processor acts as a holding tank for all these statements, waiting for a COMMIT statement before it sends the full transaction to the adjudicators to be checked. We'll talk about what the adjudicator does in a moment; for now, think of the query processor as the piece shown here in the diagram. The cool thing about how this works is that once you have the start of the transaction, that t_start, cross-region transactions proceed very much like a single-region transaction. The read path is barely changed: you just grab the data from the storage nodes that hold the shards. When you do the SELECT, the INSERT, and the UPDATE, no cross-region interactions are required, because of the optimistic concurrency that DSQL implements. The latency is only incurred at commit time. On the query processor you do your select, your insert, and your update, and once you finally have your snapshot with all the data that needs to be committed, it's sent to the adjudicator. We'll cover the adjudicator in a moment, but its goal is to go across both regions you have and make sure that, after your t_start, no other conflicting data has been committed into your storage and your journal. If there is a conflict, only one of the transactions is committed; the other is aborted and needs to retry. If not, the commit proceeds, and the commit is where you incur the latency. You're not incurring latency on every single select, insert, and update; that is not how it works. You only pay the latency when you do the cross-region commit. This means reads, writes, and updates are just as fast as they would be in a single-region database. Only the commit incurs the cross-region latency, which can range from 15 to over 100 milliseconds depending on how far apart the linked regions are. If they're close to each other within the US, it's faster, roughly 20 to 40 milliseconds; if they're far apart, it could be 100-plus milliseconds at commit time, because the speed of light is something we cannot expedite, at least for the time being.

So let's talk a little bit about the query processor, because there's a lot of innovation behind the scenes. As you've seen, the query processor does a lot of the work; it's the heart of where the DSQL architecture runs. It runs inside what we call a Firecracker microVM, which hosts the query processor within a server, in this case a bare-metal instance. The Firecracker microVM was created and built for Lambda in 2018, and this is the same technology that DSQL query processors use behind the scenes. It's a microVM that AWS has open-sourced, and we use it to put a secure box around the Postgres engine. Postgres runs inside this query processor microVM. The cool thing about this is that as your database grows in scale and demand, this can literally scale from zero query processors (if you have no requests) or one at any given time up to tens of millions of query processors being created.
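If you want to see that latency profile for yourself, a quick timing sketch (again reusing conn from above) makes the point: the statement round-trips stay local, and the cross-region cost lands on the COMMIT.

import time

conn.autocommit = False
with conn.cursor() as cur:
    t0 = time.perf_counter()
    cur.execute("UPDATE restaurants SET rating = 5 WHERE id = 1")
    t1 = time.perf_counter()

conn.commit()  # adjudication plus journal write happen here
t2 = time.perf_counter()

print(f"UPDATE round-trip: {(t1 - t0) * 1000:.1f} ms (local)")
print(f"COMMIT round-trip: {(t2 - t1) * 1000:.1f} ms (may include cross-region adjudication)")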
Each transaction, in effect, gets its own query processor. Another good thing is the support for snapshot isolation. DSQL supports an isolation level called snapshot isolation. What does this actually mean? It means each transaction operates on a consistent snapshot of the database as it existed at the start of the transaction. As you read from the storage (you can see this on the right of the screen), a snapshot is created in the query processor, and that snapshot exists only in the query processor. If you have any writes, updates, and inserts, you then try to commit; if it's multi-region, the adjudicator makes sure there is no conflict between commits across different regions.

So the transaction begins and proceeds through the SQL execution phase, where reads see the consistent snapshot held in the microVM. When write operations occur, like an insert or an update, they are not immediately applied to the storage, and that is a very important thing to notice. It's very different from how other databases are architected, where an update or insert would immediately be applied to storage. Here, because we are using optimistic concurrency, DSQL uses the snapshot it got from the read path, and every write, like an update or insert, runs locally. We buffer these writes locally, creating a private workspace for the transaction. What this approach allows you to do is read your own writes: subsequent reads within the same transaction can see the pending changes without touching the storage, which increases performance and scalability, one of the really incredible things DSQL enables.

But as you're doing these writes locally on your query processor, there is a challenge: what happens if another transaction is trying to write to the same row that your existing transaction has? Say you have transaction A and transaction B. You need a capability in the database engine that can look across these specific transactions and decide whether or not there is a conflict, and that is what the adjudicator does. The job of the adjudicator is to detect and resolve conflicts between transactions and ensure their writes are consistent. Because this is a relational database it needs to be ACID, and that means it needs to be consistent. Look at transaction A, for example: when you create a payload, you have a t_start, and remember, t_start is the time when you actually received your reads from the storage. Then you have the write set's post-images; that is the payload. You send the payload to the adjudicator. The payload contains the write set, which is copies of the table rows you modified with the effects of the transaction applied, and it also contains the transaction start time, t_start, which is the crucial element in committing or aborting the transaction. So let's look at how this works: you coordinate only once, at commit time.
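As a sketch of what snapshot isolation and read-your-writes look like from the client side, imagine two connections, conn_a and conn_b, opened the same way as in the first example (both names, and the row data, are placeholders):

conn_a.autocommit = False  # conn_a works inside an explicit transaction

with conn_a.cursor() as cur_a:
    cur_a.execute("INSERT INTO restaurants VALUES (2, 'Blue Ridge Pies', 4)")

    # Read-your-writes: the pending row is visible inside this transaction,
    # served from the query processor's private workspace.
    cur_a.execute("SELECT count(*) FROM restaurants WHERE id = 2")
    assert cur_a.fetchone()[0] == 1

    # A concurrent transaction runs on its own consistent snapshot and
    # cannot see the uncommitted row.
    with conn_b.cursor() as cur_b:
        cur_b.execute("SELECT count(*) FROM restaurants WHERE id = 2")
        assert cur_b.fetchone()[0] == 0
    conn_b.rollback()  # end conn_b's snapshot

conn_a.commit()  # only now does the row become visible to new snapshots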
Once the query processor has executed all the statements it was given, it sends the payload to the adjudicator and pretty much says: "Dear adjudicator, here are the keys I intend to write, and here is my transaction start time. If no other transaction has written these keys since the start time when I did my reads, pick a commit time and write these changes to the journal. Your friend, the query processor." What the adjudicator does is guarantee it will never allow another conflicting transaction to pick a lower t_commit. If behind the scenes another transaction comes along a few moments later with a conflicting transaction that started after that t_start but before the t_commit, the adjudicator says: you're not going to be able to do that. It aborts the transaction, and your application needs to retry it. That is one of the things you need to be aware of when building applications with DSQL: you need to retry if a transaction gets aborted by DSQL because there was a conflict. I'll show you in the multi-region active-active scenario that the adjudicator is the piece that goes across regions. It shards different keys across those two regions, so it needs to send the data for a specific commit across the regions to make sure it's actually allowed to proceed.

Now, what if you have two transactions changing the same row? Say transactions A and B start at roughly the same time, one at 10:09:33 and the other at 10:09:35, and transaction A's commit arrives slightly before transaction B's. What happens in this case? The adjudicator looks at whether they're trying to write to the same row. Here the adjudicator discovers intersecting writes: it compares the payloads from the query processors, looking at each write set, its t_start, and the proposed changes. In this case both transactions are trying to update the same row, so only transaction A is approved, because both cannot change the same row at the same time. Transaction A gets committed, transaction B must be aborted, and your application must retry. When your application retries, it goes back to the storage layer and retrieves the new data that transaction A just committed, so you don't have any data loss or any inconsistency. The cool thing about the adjudicator is that if the transactions do not intersect (in this case, and maybe it's too small for you to see, transaction A is writing item 93 and transaction B is writing item 97), then even if some of their selects are exactly the same, both transactions are allowed to commit, and both commits proceed, which is really good.

Another very interesting thing about DSQL: in traditional databases, durability happens at the storage layer. Transactions are only committed once they are durably written to the storage layer. The storage layer is therefore expected to be able to recover all committed transactions from storage after any failure.
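Since the talk stresses that your application must retry aborted transactions, here is one way that retry loop might look in Python. This is a sketch assuming psycopg2 surfaces the adjudicator's abort as a serialization failure (SQLSTATE 40001); check the DSQL documentation for the exact error codes, and note that the inventory table is hypothetical.

import random
import time

from psycopg2 import errors

def run_with_retry(conn, txn_fn, max_attempts=5):
    """Run txn_fn(cursor) in a transaction, retrying on OCC aborts.
    Assumes conn.autocommit is False."""
    for attempt in range(1, max_attempts + 1):
        try:
            with conn.cursor() as cur:
                txn_fn(cur)       # re-runs reads and buffered writes from the top
            conn.commit()         # adjudication happens here
            return
        except errors.SerializationFailure:
            conn.rollback()
            # Jittered backoff so competing retries don't collide again.
            time.sleep(random.uniform(0, 0.05 * attempt))
    raise RuntimeError("transaction aborted after repeated conflicts")

def place_order(cur):
    # Hypothetical conflicting write: two shoppers grabbing the same item.
    cur.execute("UPDATE inventory SET stock = stock - 1 WHERE item_id = 93")

run_with_retry(conn, place_order)

The important design point is that the whole function reruns, not just the COMMIT: on retry, the new transaction takes a fresh snapshot and re-reads the row transaction A just updated.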
These requirements add significant complexity to database engines, including logging, coordination systems, and APIs, plus the need to keep storage consistent for recovery purposes. In DSQL, durability is handled by the journal, and the journal, if you're using a multi-region active-active solution, is replicated across multiple regions. This is where, if you remember when I mentioned the witness region, that witness region holds a replica of the journal. The journal is simply where all the transaction logs for your database are stored. DSQL manages this complexity by making the journal responsible for short-term durability: transactions are considered committed once they are written to the journal, not once they are written to the storage. The journal can scale horizontally, and transactions can be written to any journal. The journals are coordinated to provide a totally ordered stream of committed transactions. Remember, the adjudicator makes sure the priority and the order of the transactions are in place, so the journals hold transaction logs that are ordered, and the crossbar component uses the journal's ordering to ensure that the updates applied to the storage are in the correct sequence, even when there are multiple journals. This is the cool thing: you have this journal that can scale horizontally, any transaction can be written to any journal, and thanks to the ordering the adjudicator provides, the crossbar component uses that journal ordering to ensure the storage also saves the data in the correct order.

And this is actually pretty cool: the payload and the timestamp are sent to the journal. Once the journal acknowledges the write, the updates are then sent to the storage nodes. The journal sends a success code to the query processor, which sends a success code back to the client. From the user's perspective, the transaction is done. This is the interesting part, because the journal is where the durability happens; the journal is the place that says "okay, good to go," because it's replicated across multiple availability zones and multiple regions, and it preserves the consistency and ordering you need.

We've talked a lot about DSQL so far, and I have a couple more slides before I end the presentation. Aurora DSQL uses optimistic concurrency control: rather than taking a lock when you start an insert and holding it while you do multiple selects, inserts, and updates on a row, which is the pessimistic approach, DSQL avoids locks entirely. That plays a crucial role in letting multiple transactions occur simultaneously without leading to data inconsistency; you might need to retry some of those transactions if they conflict within the same window, as I explained when covering the role of the adjudicator. There are no deadlocks; conflicts are detected at transaction commit by the adjudicator. You also get multi-region concurrency protection, because every piece of data that is committed in an active-active multi-region architecture is guaranteed to be consistent across those regions. And you get an efficient storage layer, decoupled from synchronous replication thanks to the journal. All of this really lets you, as a developer, design new capabilities and new design patterns.
There are multiple things we could talk about here, but I'll show you some extra documentation you can go and read; there are also great re:Invent videos posted on the AWS channel. What I want to show you is the behind-the-scenes architecture for an active-active DSQL database. Looking at an active-active architecture, you have three regions: two regions with endpoints, region A and region C in this diagram, and region B, which is what we call the witness region. You have a router in each of the regions that have endpoints; those are the endpoints. And even though the diagram shows one query processor, one box, think of that as the Firecracker microVM that runs the Postgres engine. You can have multiple query processors; each transaction is effectively its own query processor invocation. DSQL internally hosts everything across multiple availability zones, so realistically this stack is duplicated across three availability zones in each of these three regions. Here you have the adjudicator, and you can see the adjudicator is the only component that requires cross-region communication, at the commit of a write. You can see that each adjudicator leads specific keys, because that's how you guarantee consistency for transactions coming from multiple places and multiple regions. Then you have the journal, and you can see the journal is replicated across all three regions, because that's where durability happens. The journal is where the transaction logs are stored, and it's a critical, key component of DSQL. Then you have the storage, which is fed from the journal by the crossbar (not shown in this diagram), which reads from the journal and applies those changes to the storage.

The journal-only witness, region B, ensures that if there is a split between the two endpoint regions, the available side can continue to be available and consistent for clients on its side of the partition. For deployments that span large geographies, you can put the witness roughly in the middle. If you have, say, one region in the US and another region in Australia, you might want to put the witness region somewhere in the Middle East or Europe. That is a nice additional property: you can optimize for latency as well.

So let's talk about what happens if region C goes down. Knock on wood, but let's say there is a catastrophe and region C goes down, which is very uncommon, but let's say it happens. Any adjudicator leadership, which is short-lived state used for isolation, moves out of region C and remains in the healthy regions, because remember, you need all the adjudicator keys to keep that control. The journal is still replicated across region A and region B (region B being the witness), so you still have durable storage in two regions, and only the healthy side of the database remains available. Of course, the region C endpoint won't work, but the region A endpoint continues to serve reads and writes at any given time.
You then need to move traffic to the healthy side of the partition. You can either use that specific regional endpoint directly, or you can use Route 53 latency-based routing with health checks. How you consume the endpoints in your application is up to you: you can point at one specific regional endpoint, or you can create a DNS name with Route 53 that does the routing, with health checks, to whichever region is working.

So this is what I wanted to present to you about DSQL. DSQL is currently in public preview; hopefully it will soon be generally available so you can run production workloads on it. If you're interested, please feel free to reach out on LinkedIn. Again, my name is Samuel Baruffi. You can also scan these QR codes: we have the DSQL documentation, some of the DSQL public web pages, and useful blog posts that we have shared. And as I mentioned, there are great re:Invent 2024 talks available for you to watch on YouTube, and I highly recommend that you do. Let us know what you thought. Hopefully you'll be building new, exciting, resilient, and highly available applications on top of DSQL, and we are very excited to see what you can do. Thank you so much for taking the time to watch my session, and I'll see you soon. Bye bye.
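As a sketch of the Route 53 approach just described: one latency-based CNAME record per linked region, each guarded by a health check, so clients resolve a single DNS name to the nearest healthy endpoint. The hosted zone ID, record name, health check IDs, and endpoint hostnames below are all placeholders.

import boto3

route53 = boto3.client("route53")

def add_latency_record(region, endpoint, health_check_id):
    route53.change_resource_record_sets(
        HostedZoneId="Z0000000EXAMPLE",  # placeholder hosted zone
        ChangeBatch={
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "db.example.com",      # single name apps connect to
                    "Type": "CNAME",
                    "SetIdentifier": region,       # one record per region
                    "Region": region,              # latency-based routing key
                    "TTL": 60,
                    "ResourceRecords": [{"Value": endpoint}],
                    "HealthCheckId": health_check_id,  # unhealthy region drops out
                },
            }],
        },
    )

add_latency_record("us-east-1", "cluster-a.dsql.us-east-1.on.aws", "hc-id-east")
add_latency_record("us-west-2", "cluster-b.dsql.us-west-2.on.aws", "hc-id-west")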

Samuel Baruffi

Principal Global Solutions Architect @ AWS



