Conf42 Golang 2023 - Online

Create the distributed database on Kubernetes leveraging your existing monolithic database


Abstract

This talk walks you through an example that demonstrates how to deploy ShardingSphere-Operator, create a sharding table using DistSQL (Distributed SQL), and test the scaling and high availability (HA) of your new ShardingSphere-Proxy cluster.

Summary

  • Trista Pan is the co-founder and CTO of SphereEx, whose work centers on distributed databases and cloud databases. Today's talk introduces databases, especially distributed databases, on Kubernetes, and how to solve the issues around them.
  • Then we will talk about Kubernetes and databases, and about distributed database architecture. The solution can help you manage the tremendous amount of data stored in your existing PostgreSQL or MySQL. If I have time, I will give a step-by-step demo.
  • How to make existing monolithic databases distributed. Kubernetes was born for stateless applications and services. It helps automate deployment, scaling, and management of containerized applications. Today we will focus on the differences between stateless and stateful services.
  • Today I want to give another solution: how to leverage your existing PostgreSQL databases and put them on Kubernetes. A distributed database system is made of two important parts, computing and storage. We can upgrade your favorite existing PostgreSQL or MySQL databases into a distributed one.
  • Apache ShardingSphere is an open-source project that transforms any monolithic database into a distributed one. It also provides more great features, as introduced before, such as read/write splitting and auto-scaling. Today I will use ShardingSphere-Proxy for the demo.
  • ShardingSphere on Cloud is a repository that provides Helm charts and operators to help you deploy this database cluster and automatically scale it in and out. When you use ShardingSphere-Proxy, you are actually using a distributed database system. Your computing nodes can directly access your databases on Kubernetes.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, this is Trista. Today's talk introduces databases, especially distributed databases, on Kubernetes, and how to solve the issues around them. I'm Trista Pan, co-founder and CTO of SphereEx. This startup was built from an open-source project, so it's an open-source commercial company, and my area is distributed databases and cloud databases. I sometimes post about open source, the Apache Software Foundation, and distributed databases on my Twitter and LinkedIn, so if you have questions about today's topic or want to talk more, you can find me there. So let's get started. Today's talk includes the following items. First, the issues, because if there were no issues there would be no need to talk about a solution. Then we will talk about Kubernetes and databases, and about distributed database architecture. Based on that background, I can present a new idea, a solution for leveraging your existing PostgreSQL or MySQL, the popular open-source monolithic databases, on Kubernetes and upgrading them into a distributed database. Such a distributed system can help you with, for example, high availability or higher performance in TPS or QPS, and it can help you manage the tremendous amount of data stored in your existing PostgreSQL or MySQL. In the last part, if I have time, I will give a step-by-step demo. If I run out of time, I suggest you go through the slides yourself. All right. So the background, or the issue, is that our services have moved from a monolithic architecture to a distributed one, that is, a microservice or serverless architecture.
Then we consider leveraging the wonderful open-source platform Kubernetes to help us manage traffic and manage our microservices or serverless functions, right? On the infrastructure side, most of us are considering moving from on-premises to the cloud, because the cloud offers the best services to help you scale out and in, or scale up and down, your services and servers, right? So the middle layer is your databases: how do you deal with them? The first thing we consider is how to make our existing monolithic databases distributed: to give the database system scaling and high availability, to let it manage a tremendous amount of data, and to give you the best performance, right? In particular, people ask whether they could put their databases on Kubernetes, so that this wonderful platform can deploy and manage the database system the same way it does stateless applications and services. However, Kubernetes was born for stateless applications and services. It helps automate deployment, scaling, and management of containerized applications, especially stateless ones. What about our databases? As everyone knows, databases are very different from stateless services: we need to consider data persistence and how to manage the state of the primary and replica nodes of the database system. We also need to consider backup and restore, that is, how to back up our data and restore it to a specific point in time, right? Still, whether a service is stateless or stateful, every application needs monitoring, high availability, automatic deployment, security, and service quality.
Those are shared requirements from our users and our ops teams, right? But today we will focus on the differences between stateless services and stateful services, because what we want to solve today is how to put our databases on Kubernetes, that is, how to give our databases scaling, automatic deployment, and management features. Currently, database vendors provide solutions for running their distributed databases on Kubernetes. They leverage Kubernetes-native mechanisms such as PV, PVC, and StorageClass, use StatefulSets to deploy their databases, and rely on pod identity and similar mechanisms. I have to say it's a good way to put databases on Kubernetes. Your applications already live in Kubernetes, and if your databases also live there, your applications can access them directly within the same Kubernetes cluster, right? However, today I want to give another solution: how to leverage your existing PostgreSQL databases and put them on Kubernetes. For the first solution, as I said, you can use PostgreSQL operators or PV, PVC, and StorageClasses to deploy your databases, like any other stateful application, on Kubernetes. But today I will show another way to solve this. Let us first look at distributed databases. A distributed database system is made of two important parts: the computing part and the storage part. For example, look at the high-level architecture of MongoDB or CockroachDB.
These are popular distributed database systems, and as you can see, they all have a storage part, made of storage nodes, and a computing part, made of computing nodes. A computing node works like a data proxy or data router: it handles requests from the application, while the storage nodes (the shards) persist your data and manage the data part. In short, storage nodes handle the data and computing nodes handle the computing, right? That's the basic introduction to distributed database systems. Once we understand this architecture, we can consider how to upgrade your favorite existing PostgreSQL or MySQL databases into a distributed one. PostgreSQL and MySQL have been popular for many years; people love them and have already deployed them in production, right? So can we avoid overturning your database infrastructure and just upgrade it into a distributed one? That's another way to solve the distributed database problem. Here is the first solution: if your PostgreSQL cannot manage the enormous amount of data, requests become slower, and you want higher performance, that is, higher TPS or QPS, you can get rid of your PostgreSQL infrastructure and switch to a popular distributed database such as CockroachDB or Aurora. The other solution is to keep using PostgreSQL or MySQL in your production environment and treat your PostgreSQL clusters as the storage nodes, the storage part of the distributed database. All the PostgreSQL instances, which we can call storage nodes, persist your data and do the local computing.
Then we introduce global computing nodes into this distributed system. PostgreSQL works as the storage nodes, the new global computing nodes work as the database proxy, and together all of these elements become a distributed database, right? That way we upgrade your MySQL or PostgreSQL instances into a distributed database. The key question is: what can work as the global computing nodes? That is Apache ShardingSphere. I will introduce it later, but first here is the high-level solution for leveraging your PostgreSQL instances, upgrading them into a distributed database, and putting that distributed database on your Kubernetes cluster. As I said, ShardingSphere works as the database proxy, the computing nodes of this distributed system, and your PostgreSQL acts as the storage nodes that manage your data, while the computing nodes handle requests from your application, right? Because the two parts are independent of each other, you can put your computing nodes into your Kubernetes cluster. The Apache ShardingSphere computing nodes are stateless applications, and Kubernetes is born for stateless applications, right? So you can put the computing nodes into your Kubernetes cluster and fully leverage Kubernetes mechanisms to deploy and manage all of these stateless computing nodes. For the storage nodes, you have two options. The first option is to put your storage nodes, that is, your PostgreSQL database instances, into Kubernetes as well. The computing nodes access the storage nodes, and your application just sends requests to the computing nodes, right? That's the first option.
The second option, which I recommend because Kubernetes is currently not so good at managing stateful databases, is to use RDS on any cloud and deploy only ShardingSphere, the computing nodes of this distributed database system, into your Kubernetes cluster. Your application sends requests to the computing nodes, and the computing nodes do the global computing work and then read or persist data from or to the storage nodes, that is, your MySQL or PostgreSQL RDS, right? From your application's perspective, it is just accessing a single distributed database. Internally, though, that database is made of two parts, and you deploy the storage nodes and computing nodes independently in different places: the computing nodes live in Kubernetes and your RDS lives in your cloud, right? So what are the benefits of this solution? First, it lets you leverage your existing databases without totally changing your database infrastructure. Second, it upgrades them into a distributed database, so it can meet the new requirements for your database infrastructure. Third, because you bring ShardingSphere into your distributed database system, this open-source project provides more great features, for example data sharding, read/write splitting, SQL audit (that is, a SQL firewall), and elastic scaling (scale-out). Finally, it gives you another way to put your distributed database on your Kubernetes cluster, right? Plus, ShardingSphere provides operators and Helm charts.
So it gives you an out-of-the-box way to upgrade your database infrastructure into a distributed one and make it happen in your Kubernetes cluster. I have mentioned this open-source project, Apache ShardingSphere, many times, so what is it? It's an Apache Top-Level Project, and basically it's a database proxy. Here is its introduction: it transforms any monolithic database into a distributed one, and it also provides more great features, as I introduced before: read/write splitting, auto-scaling (scale-out), data sharding, SQL firewall or SQL audit, logging, and other features around your database. Because this project has been open source for more than five or six years, it has a mature community, so you don't need to worry about being the first person to use it. Many people have already tested this project, and there are many user cases and documents to help you get started quickly. That's the basic introduction to this project. Next, I will introduce the features, because that's the important part, the value of this project. Apache ShardingSphere offers two clients for you to choose from. The first one is ShardingSphere-JDBC, a Java driver for your Java application. When you bring ShardingSphere-JDBC into your application, it provides data sharding, elastic scale-out, distributed transactions, read/write splitting, data encryption, and data masking. The other client is ShardingSphere-Proxy, a database proxy. Whether your application is written in Java, Go, or PHP, you can use a standard database driver to access ShardingSphere-Proxy.
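Data masking, one of the features listed above, is easy to picture as a transformation applied to result values before they reach the application. The following minimal Go sketch shows one common masking rule, keeping the first N and last M characters and hiding the rest. The function name and parameters are hypothetical illustrations, not ShardingSphere's API.

```go
package main

import "fmt"

// keepFirstNLastM returns s with every character except the first n and the
// last m replaced by mask. It works on runes so multi-byte text is handled.
// The name and signature are hypothetical, chosen for this illustration.
func keepFirstNLastM(s string, n, m int, mask rune) string {
	r := []rune(s)
	if n < 0 || m < 0 || n+m >= len(r) {
		return s // nothing left to mask
	}
	for i := n; i < len(r)-m; i++ {
		r[i] = mask
	}
	return string(r)
}

func main() {
	// A phone-number value as it might come back from a storage node.
	fmt.Println(keepFirstNLastM("13812345678", 3, 4, '*')) // 138****5678
}
```

The point of doing this in the proxy rather than in each service is that every application reading through it sees the same masked values without code changes.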
To your application, ShardingSphere-Proxy or ShardingSphere-JDBC is simply a distributed database server. Behind it, ShardingSphere-Proxy or ShardingSphere-JDBC manages your MySQL, PostgreSQL, Oracle, or SQL Server database clusters. It doesn't just manage your database clusters: it turns them into a distributed database and enhances them with many useful features around your system. Here you can see all the features, all the supported databases, and the deployment architectures to choose from. Today I will use ShardingSphere-Proxy for the demo. At the beginning, your application accesses your primary PostgreSQL instance or replica PostgreSQL instances directly. Now your application doesn't need to care about replica or primary instances; it just accesses ShardingSphere. To the application there is only a single database server, the database proxy, and this proxy manages all of your database clusters, doing read/write splitting, data sharding, data masking, SQL audit, and all the other great features you want for your application. The next part is about ShardingSphere on Cloud. ShardingSphere is great, but you want to use it easily and deploy this stateless proxy, the computing node cluster, into your Kubernetes cluster. ShardingSphere on Cloud is a repository that provides Helm charts and operators to help you deploy this database cluster and automatically scale it in and out. As you can see, first you use the ShardingSphere operator charts to deploy ShardingSphere-Proxy into your Kubernetes cluster. You can also pick up PostgreSQL charts to deploy PostgreSQL into Kubernetes, but if you already have RDS on AWS or Google Cloud, you don't need the PostgreSQL chart, right?
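The shift described above, from the application choosing between primary and replica instances to the proxy choosing for it, is essentially read/write splitting. A minimal Go sketch, assuming writes always go to the primary and plain SELECT statements rotate round-robin across replicas. The type, the host names, and the naive SQL classification are hypothetical; a production proxy must also handle transactions, locking reads such as SELECT ... FOR UPDATE, and replica lag.

```go
package main

import (
	"fmt"
	"strings"
	"sync/atomic"
)

// ReadWriteSplitter is a hypothetical proxy component that hides the
// primary/replica topology from the application. Requires Go 1.19+ (atomic.Uint64).
type ReadWriteSplitter struct {
	Primary  string
	Replicas []string
	next     atomic.Uint64 // round-robin cursor over Replicas
}

// isRead is deliberately naive: only statements starting with SELECT count as
// reads. A real proxy also considers transactions, locking reads, and hints.
func isRead(sql string) bool {
	return strings.HasPrefix(strings.ToUpper(strings.TrimSpace(sql)), "SELECT")
}

// Target returns the instance that should execute sql: writes go to the
// primary, reads rotate across the replicas (or fall back to the primary).
func (s *ReadWriteSplitter) Target(sql string) string {
	if !isRead(sql) || len(s.Replicas) == 0 {
		return s.Primary
	}
	n := s.next.Add(1) - 1
	return s.Replicas[n%uint64(len(s.Replicas))]
}

func main() {
	s := &ReadWriteSplitter{Primary: "pg-primary", Replicas: []string{"pg-replica-0", "pg-replica-1"}}
	for _, q := range []string{
		"INSERT INTO t_user (user_id, name) VALUES (1, 'a')",
		"SELECT * FROM t_user WHERE user_id = 1",
		"SELECT count(*) FROM t_user",
		"UPDATE t_user SET name = 'b' WHERE user_id = 1",
	} {
		fmt.Printf("%-50s -> %s\n", q, s.Target(q))
	}
}
```

The application code in this sketch never changes when a replica is added; only the Replicas list held by the proxy does.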
As I mentioned before, you can use the RDS service from your database vendor and deploy only the computing nodes of this distributed database system into your Kubernetes cluster, right? Your computing nodes can directly access your RDS or the databases on Kubernetes. Now let me go through this solution in more detail. This slide shows how to deploy it. Second, after you deploy it, you will consider how to create a sharding table, because when you use ShardingSphere-Proxy, you are actually using a distributed database system. If you create a table, it's not a single table in one PostgreSQL instance; it's a distributed sharding table located across different PostgreSQL instances. To your application, though, it's just a single logical database or a single logical table. For example, this logical user table is made up of four subtables, or physical tables, that live in different PostgreSQL instances, and each PostgreSQL cluster has a primary node and replica nodes, right? So here there are two PostgreSQL clusters, each with one primary node and replica nodes. To your application there is only one logical table, t_user, but this t_user table has four physical tables, right? Here we use DistSQL, ShardingSphere's SQL dialect, to define this user table. If you just run CREATE TABLE t_order, it creates a single table. But with DistSQL you can create a sharding table. As you can see, we use the keyword SHARDING TABLE, not just TABLE, right? So it's easy to get used to DistSQL and use it to define and manage your sharding database system, that is, its logical tables and databases. Yeah.
So first you deploy it, and second you create a sharding database and sharding table. Then your application sends requests to the computing nodes of this distributed database system. For example, when the application sends a query here and the proxy receives it, the proxy goes through the following steps: it calculates which PostgreSQL instances own the data for this query, sends the query to the target instances (maybe one, maybe many), collects the local results, and then merges the sub-results into the final result, which it sends back to your application. That's the basic process for each query. The last part is the demo. I don't have enough time here, but as you can see, we deploy PostgreSQL as the storage nodes into Kubernetes, though you could also just use your RDS. Second, we deploy ShardingSphere-Proxy, then create the sharding table, insert some test data, and run queries to test whether it works. I have already done this demo myself, so the slides show how to deploy it, how to create the database and the table, how to let the proxy access your RDS or PostgreSQL instances, how to define your sharding table, how to insert the test data, and how to run queries to check that it works. All right, that's all for this talk. If you have questions, you can ask me here or find me on LinkedIn, GitHub, or Twitter. All right, thanks for your time. See you.

Trista Pan

Co-founder & CTO @ SphereEx



