Conf42 Cloud Native 2022 - Online

PostgreSQL Distributed & Secure Database Ecosystem Building

Video size:

Abstract

As the most popular open source relational database in the world, PostgreSQL keeps attracting the significant attention it deserves. With the ever increasing data storage and query requirements, new challenges are brought forward for horizontal elastic expansion and security of the PostgreSQL database.

How to provide existing PostgreSQL databases with incremental capabilities such as data sharding, data encryption and other functions is of great concern to many PostgreSQL users.

This will focus on introducing how to empower PostgreSQL thanks to the ecosystem provided by Apache ShardingSphere - an open source distributed database, plus an ecosystem users and developers need for their database to provide a customized and cloud-native experience.

ShardingSphere doesn’t quite fit into the usual industry mold of a simple distributed database middleware solution. ShardingSphere recreates the distributed pluggable system, enabling actual user implementation scenarios to thrive and contributing valuable solutions to the community and the database industry. The aim of ShardingSphere is the Database Plus concept.

Database Plus sets out to build a standard layer and an ecosystem layer above the fragmented database’s basic services. A unified and standardized database usage specification provides for upper-level applications, and the challenges faced by businesses due to underlying databases fragmentation get minimized as much as possible. To link databases and applications, it uses traffic and data rendering and parsing. It provides users with enhanced core features, such as a distributed database, data security, database gateway, and stress testing.

ShardingSphere uses a pluggable kernel architecture for Database Plus. That means there’s modularity, which provides flexibility for the user. Demos and notable use cases in production environments that are from the Asia equivalents of FAANG (Facebook, Amazon etc.) will be used to introduce the use and implementation of these functions for PostgreSQL databases.

Summary

  • Chester will give a talk about how to create your postgresQl PostgreSQl distributed secure database ecosystem postgresql instance. Talk will focus on how to use Apache sharding Sophia. If you have some question about my talk, you can contact me on the linking or GitHub or Twitter any channel you like.
  • Apache sharding Sophia is a project to transfer any database into a distributed database system. The project has a very active community on GitHub. If you want to learn more about this project, it has a website, Twitter linking any channel you can pick up.
  • Project has many features around your database. Today we just focus on sharding and data encryption. You can pick up some features and group these features into a complete new solution.
  • Sharding safe proxy like the gateway or like the proxy. The second part here is the sharding SaFi proxy. These end plus your postgres instance, they can group and become a distributed database. Another product is sharding Sophia GDBC. It's a lightweight GDBC driver.
  • Sharding Sophia proxy helps you manage all of the sharding of this distributed system. Data encryption can help to decrypt or encrypt your privacy information automatically or internally. How to create this solution or how to create that cluster?
  • These next one that you need to learn and use distributed SQL. It can help you to communicate with sharding Sophia. That means distributed SQL advanced feature, advanced understanding capability. Here I give the complete test example or demo for you to create this architecture.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, this is Chester. So I'm happy to meet you and give a talk about how to create your postgresQl PostgreSQl distributed secure database ecosystem postgresql instance. Here is something about myself. I'm the Trista, the cases co founder and these CTO. It's my job title, my area. It's about the distribute built database, about the data sharding, about the database AI management platform developing. So I'm a developer but apart from that I love open source and also I get involved a lot in Apache software foundation. I'm the member and incubator mentor to help to provide some ideas to other incubator projects. So today my talk will focus on how to use Apache sharding Sophia to create your secure and sharding database ecosystem with your postgresql. Well if today you have some question about my talk, you can contact me on the linking or GitHub or Twitter any channel you like. Yeah, so let's get started. So today's content is just to solve one issue. The issue that how to create the secure distributed postgreSQL database system. But before to import the solution. I just want to give some background about our postgresql. Yeah, I guess most application will use the PostgreSQL in the second part. That means you will create these cluster with your postgresql. You have the primary node and one or more replica nodes to help to improve the availability of your postgresql instance. Also your replica nodes can get some or share some traffic with your primary nodes, right? That's the standard usage for the database. But in this classic structure or architecture of your database we also can see more needs or requirements from the modern application or website. That means like high scalability or high availability. Also you want to get your readout as quick as possible and also how to get the elastic skill out with your growing up your application and about how to make your data encrypted and decrypted automatically. So that's the more needs or new requirements from the current your database architecture. So in order to solve such issues I want to introduce project, open source project or community to help us to deal with such issues. That is the Apache sharding Sophia. So what's the Apache sharding Sophia? Yeah, that's the first question I give the answer for myself. That is these ecosystem to transfer any database into a distributed database system and enhance these system with like sharding skew out distributed transaction or SQL audit or more enhancement features around this distributed system. Yeah that is definition we're spoken but actually I will give more introducing about the features or architectures but let me first add its community or this data on GitHub because that is very important for us to pick up or choose this project if it has a very active community and that means you can get more help from this community and that means this community receive a lot of user cases. So the same mission you may be met but will be stopped before. Yeah. So on GitHub, this community or this project receive more than 15,000 stars and have near like 318 contributors now and it released more than 14 variants. That means actually it's active or diverse or maybe have a lot of user cases. Right, so that's Apache sharding, Sophia's community and it's mostly developed by Java. That means if you interested in this project or you want to get some helps from the community or you want to do some contribution to this community, I guess that maybe is a good project for you to join in open source area. Yeah, so because this project has been open source for more than five years. So I guess it's so mature for us to create your sharding Sophia or sharding ecosystem around your postgreSQL or MySQL users. And if you want to learn more about this project, it has a website, Twitter linking any channel you can pick up. Yeah so that's these community part, the NAS part we will enter into the technique or about its feature or what this project can help us do. First. It's like the catalog of this project because this ecosystem it has many features around your database. It has two clients or products for user to choose. Also it support many traditional dbms. So here you can see for the feature part, maybe you want to use a feature like Vrise building shadow database for your load test and sharding database gateway, SQL audit or provides any features I mentioned here you can just use and by tell this system that you want to use one of these features and you can use the distributed SQL or configuration YaML file to tell this product or tell this application that you want to use sharding, you want to use the database gateway. But today we just focus on sharding and data encryption. There is the interesting part, I want to mention that although you will see many of the features but these features, apparently they are independent but you can uses some of them together. That means you can pick up some features and group these features into a complete new solution. Yeah. So today because we want to create a secure and sharding cluster with sharding Sophia with your postgres instance. So that means we want to group sharding and data encryption to create our cluster. So that's about the features part. The next part I want to say that this ecosystem or this community have two clients for you to choose. First one is sharing Sophia Proxy. Sharding safe proxy like the gateway or like the proxy, it can help you do some computing or to handle ways to manage your traffic over statements or queries. But you need to independently deploy it like the second diagram showed you. The second part here is the sharding SaFi proxy. The right column right. You can see you need a server and to deploy it. And then your sharding SaFi proxy these end plus your postgres instance, they can group and become a distributed database. Because this sharding surface proxy, it works as a server. So you can use like a CLI or some DB admin tools to connect to the sharding sufficient proxy and use it as a standard postgres instance or deserves. Currently this sharding Sophia proxy supports two dbms. First, MySQL. That means you can have a sharing Sophia proxy for MySQL. Alternatively you can have the sharing Sophia proxy for PostgreSQL. It's up to you. So today we will choose PostgreSQL to become this the storage nodes of this distributed system. Another product is sharding Sophia GDBC. It's a lightweight GDBC driver. So if you want to use sharding Sophia GDBC, that means it integrates with your application, your application and sharding Sophia GDBC will are deployed in the same machine or server. But sharding SUV GDBC, it implements most of these GDBC interface. That means sharding SUV GDBC can support PostgreSQL, MySQL, Oracle or SQL server. Yeah, so it has more sports for your database. All right, so today, like I say here, we choose the sharding data encryption features and choose the sharding field proxy for PostgreSQL to create our solution. Yeah. All right, so before we enter into the introducing of the solution, I want to give some case about how to underlying sharding with Sharding Sophia proxy and data encryption with sharding Sophia proxy. What's these meaning of such issues? First case is the sharding, right. In sharding case that means at the beginning you have the primary node for your postgresql replicate nodes of your postgresql. But when you import the sharding Sophia or deploy the sharding Sophia and all of the calculation just widows to these sharding Sophia proxy, that means these computing nodes and the proxy will do some calculation and to locate which PostgreSQL or which instance is for these target SQL or this target statement. So the Sharding Sophia proxy will help you manage all of the sharding of this distributed system. Yeah, but another case about data encryption. So what's the data encryption? Data encryption in this case, actually it's not a distributed system. Right, because you can see here the application just width one proxy and one proxy will just wit one PostgreSQL instance. But it's a feature or function that it can help to decrypt or encrypt your privacy information automatically or internally. Here, like you have a table named these user table and there is the telephone number column of this table. So you cannot store the plain text into your postgresql, right? So the sharding Sophia can help you to decrypt this privacy info and these store it in the postgresql. So these you can see the cipher text of your telephone number. Conversely, if your application want to get the exact plain text of your telephone number, the sharding SaFI proxy will first get the target cipher text and then decrypt the information and return these plain text to your end users. So that's the function of this case. The last one I just give a mention, that is the database gateway. Database gateway. That means here the computing nodes or sharding Sophia proxy, it's like the I five or like the NGX. It can help you manage your traffics or queries or statements. How? Here you have two postgres cluster, one of that. These primary nodes have two replicas. So you want proxy to automatically split or share your traffic among this privacy nodes. And two of the replica nodes, maybe like 20% of the requirements will be sent to the first replica, but remaining 18% of the sqls will visit the second replicas. But nor about the whole process of the traffic distribution. Your proxy or sharding Sophia proxy, just tell him that by your application about your uses or strategies around your traffic. Then sharding Sophia can help pass the SQL and then send the SQL to the target replica or your primary nodes. So that's the database gateway case. But today we just use the sharding plus data encryption to help us to solve today's issue. So entering today's case we just want to create these cluster distributed cluster. In this cluster you can see that your application in the top level and these your application standard. Visit your sharding Sophia proxy. That means the computing nodes and then the sharding proxy do some calculation and cases your sql understand SQL and resend this SQL or one sql to one of the Postgresql instance gather without merge out and then return the final readout to your application. But here there are two features mentioned. First it's the sharding because imagine that your proxy have the logic table. I mean the logic table, it's not these physical tables because the actor table or the physical table located in your two of the postgresql instance maybe named t user zero. From t user zero to t user one. And these same here in demo ds one, right. So there are two postgresql, demo ds zero, demo ds one. They all called storage nodes and the table actual table located in such postgresql instance. But for your applications, for your proxy these are only one table t user, right? So one table t user, it's my ping to four actual table or physical tables. Yeah, so that's the one keeps the next one is imagine this, users have the privacy information telephone number. Yeah, telephone number. We need to encrypt the plain text and send the cipher tax into your postgresql instance. So that's the encryption feature. So this case you use sharding and data encryption together. That's the value of this case. Then how to achieve that goal or how to create this solution? Well I gave the basic steps about how to create that cluster. First you need to prepare your storage nodes, that is the two PostgreSQl instance. Second, you need to deploy one proxy. Also you can deploy many proxies. You can group a proxies like the cluster and maybe your case need more computing nodes, need more computing power. Then you can deploy more than one proxy instance to help you balance all of the traffic from your application or web cases. So here in this case we just deploy one. And third step is to log in your proxy by the Db admin tools or GUI or CLI. Anyway you can visit your proxy or you can visit your postgresql server because as I before proxy, it works as the postgresql server, right? Yeah. So by step one, two, three you can create architecture on the left column and then that means by step one, two, three you create a distributed cluster and next one you need to buy using step of four five, six to fill this distribute the cluster with the sharding rule, sharding strategy, encryption rule, encrypt strategy or create many table, many schema, many database on these empty distributed database. That's clear, right? But how to tell these postgres to create like a sharding table not standard or single table to create more table with the encryption feature that means you need to use distributed SQL. Okay, I will give some details about this part. So by step four, five, six actually you make your cluster have the sharding feature, sharding function and the data encryption function. Though your application just witted these computing nodes, just witted the proxy and this distributed system can help you automatically sharding data and encrypt your data, decrypt your data. Yeah. So the step seven and eight will just give a test or input or insert some data or records and to do some test. So the NAS part or the later part I will give the folks on step two, how to deploy the proxy instance. Step five, how to use distributed SQL to create a sharding rule and encrypted rule. And the last one, these is some highlight on this part with your digital SQL. Yeah. So how to deploy your proxy. There are two ways for you to pick. The first one just uses these quickstart on sharding Sophia web page. It's easy, right? But there is a more efficient way for you to choose that is use the Sophia ex booting. Sophia ex boot is developers by the Python language and it can help you quickly one command to create your computing nodes and the governance center together. So by use these vs boot it can help you help you quickly create the cluster, the computing cluster and with one command. So that's the efficiency of this product. There are two way for you and I also give your links for you to visit after you deploy the sharding Sophia proxy. These next one that you need to learn and use distributed SQL. So what's these distributed SQL. When this word mentioned SQl that means it's a language like talk about your postgresql. If you want to make your postgreSQl instance to work like create a schema, create a database, create a table, you needed to use postgres and SQL dialect, right? And here distributed SQL, it's sharding Sophia SQL dialect for your postgres instance. Maybe you have other language, other SQL like MySQL dialect, Oracle dialect. Here it's these sharding Sophia dialogue card distributed SQL. So you use PostgreSQL SQL standard SQL to create a table to create alter the table or drop your schema. Yeah, it's very simple, right? But you use SQL to communicate with your database with your postgres instance here, the distributed SQL. It's the SQL like statement. It can help you to communicate with sharding Sophia. Although sharding Sophia can understand all the postgreSQL standard SQL sharding Sophia can also understand Sharding Sophia SQL dialogue. That means distributed SQL advanced feature, advanced understanding capability. But why do you want to uses them? Because as I said, your sharding Sophia plus your traditional dbms, they work together and collaborate with each other to create distributed database, cluster or a distributed database. So if you want to use the database, if you want to use this distributed base to create some sharding table or distributed table, distributed database and distributed schema or like make these distributed system have some data encryption SQL audit or some distributed privileges this advanced or the enhanced features. That means you need another language that distributes SQL to tell this distributed system. Could you get a point to run while? Yeah. So you can see some examples of distributed SQL when you first give a glance. It's very similar to your postgres SQL but it's a little different, right. So you can just log in by CLI to run this distributed SQL to help you before you need to five step five. You need to run distributed SQL to create a sharding rule and encrypt rule, right? That means to fill more metadata or create a sharding table in this cluster database. Cluster sharding system. Yeah. So that's these distributed SQL also plus it can help you manage this cluster here. Like this cases. Sorry, this case when you run some of the distributed SQL, it can also tell you that this sharding system, it has two storage nodes and one computer nodes and what status of each of the nodes. So use them to manage your cluster. Yeah, I have no more time. All right. I introducing the distributed SQL and value for these sharding system. You can see here like the show instance nodes to show you that what's these nodes in these system. And here I give the complete test example or demo for you to create this architecture. But I have no time to introduce them one by one. So just give you the skeleton by logging, add the resources or your storage nodes and then create the sharding table by these distributed SQL and insert some data for testing. And then here when you run the select statement on proxy, you will see that all of the logic information that this T user is a logical table and you can gather text plane of your telephone number but when you log in your physical postgresql instance you will found that all of the telephone number is encrypted into some string value. And also when you log in your actual postgresql you will found that there is no logic table t user but have one of the physical or actual table of this t user named t user zero or t user two. Yeah so that's the magic of this proxy or it really helping user to deal with a lot of these process about the sharding about the SQL audit et cetera. Yeah so that's these complete case of today's solution. If you want to have a these you can just copy and paste and also if you have any questions you can just contact me on my twitter or my linking or just visit the sharding Sophia GitHub and raise an issue. And also I'm waiting for your pr maybe if you want to join this community so that's fair time. See you next time.
...

Trista Pan

CTO & Co-Founder @ SphereEx

Trista Pan's LinkedIn account Trista Pan's twitter account



Awesome tech events for

Priority access to all content

Video hallway track

Community chat

Exclusive promotions and giveaways