Conf42 DevOps 2024 - Online

The MariaDB Evolution: Just a Fork of MySQL?

Abstract

MariaDB is MySQL with a different name… or is it? Learn the history of MySQL and MariaDB, and explore technical topics such as storage engines, data masking, SQL and NoSQL, HA, automatic failover, and more. This talk widens the perspective on database usage, management, and deployment.

Summary

  • Alejandro Duarte is a software engineer working in developer relations at MariaDB plc. He is working on a new book called MariaDB for Developers. The talk covers the history, present, and future of MariaDB, and migration.
  • It all started in the 60s with General Electric and the Integrated Data Store (IDS). Edgar Codd proposed the relational model, which, oversimplifying, is like tables. IBM DB2, Oracle, and the other main databases in the market adopted SQL. Even though it is spelled SQL, many still pronounce it "sequel".
  • By the late 80s, open source was well established with the GNU project and the General Public License. After buying Sun Microsystems, Oracle owns not only the Oracle database but also the most popular open source database, MySQL. The community saw this as a risk that could hurt the project, or stop or reduce innovation.
  • MariaDB comes with several storage engines. InnoDB is the general-purpose one, ColumnStore is for analytical workloads, MyRocks is for write-heavy workloads, and S3 helps optimize storage in the cloud, among many others. A short video explains what makes MariaDB unique.
  • MariaDB Enterprise is made for production and is built on top of open source software. It offers a longer maintenance window, up to eight years, and the possibility to run non-blocking backups.
  • MaxScale is a database proxy that sits between a client and a database. It can decide, for example, where to send a query in a cluster of multiple database servers. It also understands NoSQL and Kafka, and makes it possible to combine data from multiple applications in a single query.
  • Today you can deploy MariaDB anywhere, including any cloud. Looking into the future, the teams are working a lot on Kubernetes deployments and orchestration, and on AI capabilities.
  • Migrating to MariaDB is very easy if, for example, you do it from MySQL. 75% of the Fortune 500 companies use MariaDB, and it has more than 1 billion downloads on Docker Hub. MariaDB is much more than a fork of MySQL. Try it out.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, and welcome to this talk, The MariaDB Evolution: Is It Just a Fork of MySQL? Well, spoiler alert, it is not. It's a bit more than that. My name is Alejandro Duarte and I work in developer relations for MariaDB plc. I'm a software engineer. I have been writing code for almost 30 years, I believe, and I published these three books about web development with Java and a framework called Vaadin, which is very, very interesting. But I'm working on a new book right now called MariaDB for Developers. So if you're interested, take a screenshot of this and you'll get a notification when the book becomes available. But today we're going to talk about the MariaDB ecosystem, the sea lion ecosystem. So we're going to see the historical context in which both MySQL and MariaDB were born. We're also going to talk about storage engines, right? So a bit more technical stuff. We're going to talk about MariaDB Enterprise, because that's what you want to use when you move to production, especially if you want to automate things such as failovers. We are also going to briefly touch on the present and future of MariaDB, on migration, and on who uses MariaDB. Okay, so let's start with the history of relational databases, and it's going to be very brief. So it all started in the 60s with General Electric and the Integrated Data Store, IDS. That's the very first database we know of. It is not a relational database, it's another kind of database. But that was the very first one. And that led to the development of something called the CODASYL database model, which basically was a set of extensions to the COBOL programming language so that developers could query the databases using nested loops and pointers. So they needed to think about data structures, algorithms, all this kind of stuff. That means they had to rewrite this CODASYL code on every schema change. So Edgar Codd realizes this and proposes the relational model, which, oversimplifying, is like tables.
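To make the contrast between navigational access and the relational model concrete, here is a minimal sketch. It is not CODASYL and not MariaDB: it uses plain Python loops for the "nested loops" style and Python's built-in SQLite module for the declarative style, and the tables, columns, and data are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: (id, name) and (id, customer_id, amount).
customers = [(1, "Ada"), (2, "Linus")]
orders = [(10, 1, 25.0), (11, 2, 40.0), (12, 1, 15.0)]


def totals_navigational():
    """Hand-written nested loops: the program decides how to walk the data."""
    result = {}
    for customer_id, name in customers:
        for _order_id, order_customer_id, amount in orders:
            if order_customer_id == customer_id:
                result[name] = result.get(name, 0.0) + amount
    return result


def totals_relational():
    """Declarative query: we say what we want, the engine picks the plan."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    rows = conn.execute(
        "SELECT c.name, SUM(o.amount) "
        "FROM customers c JOIN orders o ON o.customer_id = c.id "
        "GROUP BY c.name"
    ).fetchall()
    conn.close()
    return dict(rows)


if __name__ == "__main__":
    print(totals_navigational())
    print(totals_relational())
```

Both functions compute the same totals. The difference is that the loop version encodes the access path in application code, which is what has to be rewritten when the data layout changes, while the SQL version leaves that choice to the database.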
So you have columns and then you have rows. That's what modern databases use. And he was a mathematician, so he formalized this through something called relational calculus and relational algebra, which is what databases actually use. Although modern databases are not purely based on relational algebra; they have some more concepts there. But this is the basis. All right? And this kind of theory allows you to demonstrate that it is possible to build query optimizers. And yes, they also built these query optimizers, which all relational databases have. And a database contains tons of algorithms and data structures, right? Like trees and hash tables and so forth. And it knows your data, so it can make very, very good decisions on the plan to access that data on disk. Much better than what a programmer would be able to do. Now, all this is theory until the first implementations start to appear. So more or less in the mid-70s: at IBM, for example, System R, which more than a product is a project, a research project investigating databases. So they started to implement this, to experiment with it. Ingres at the University of California, the precursor of PostgreSQL. Oracle, a very famous database. Mimer, another academic project, in Sweden, at Uppsala University, I believe. And the predominant query language was called QUEL. So let's try to remember this word, QUEL, that is, querying using the English language, all right? That was the main language there. Now later, by the end of the 70s, maybe at the beginning of the 80s (don't pay too much attention to the exact location of this vertical line in the timeline), the scientists, researchers, and programmers at IBM started to think about what would be the best way to query relational databases. So what's the best way to specify queries in a relational environment? That's what SQUARE stands for.
And more than a language, it was kind of a game they had. Like I said before, they were trying to figure it out: hey, I found this way, maybe I come up with this idea, how can we combine these? And yeah, maybe it was also a language, but they were using this scientific notation, right? So subscripts and superscripts. This is hard to type on a computer keyboard. So they redefined it and created something called SEQUEL, which is the sequel of QUEL, right? So they are playing with the words: this is like an improved version of QUEL. Maybe that's why they named it like that. Now this can be implemented and used in computers. However, SEQUEL was a trademark of some company in the UK, I think it was some aircraft-related company. So they could not use this name, but they removed the vowels from the word and, well, SQL is born. So even though it is spelled SQL, you still pronounce it "sequel". We still pronounce it "sequel". Some of us do; some pronounce it "S-Q-L". It doesn't really matter. It is here today. It's the best. It's not perfect, but so far nobody has come up with a better language than SQL. Now IBM DB2, Oracle, and the main databases in the market started to adopt this language, SQL, and it became a standard, I believe in 1986 or 87, around those two years, with ANSI and ISO. Now, to give you a bit of perspective on what's going on in the industry: by the late 80s, open source is pretty well established with, for example, the GNU project. They created something called the General Public License, which means that if you release software under the GPL, you also have to provide the source code, and people can modify it, but if they modify it, they also have to publish that source code. So it's like the source code is always going to be available. That's the GPL. Now Linux is being developed here in Finland by Linus Torvalds. It was published at some point under the GPL.
Postgres, at the University of California, Berkeley, the academic project trying to build this relational database, is under development. Unfortunately, it doesn't use the GPL. It is still open source, with a very permissive license. However, there are no free SQL databases, because Postgres wasn't designed to support SQL. This changes with the very first free SQL database, which was called mSQL, or Mini SQL. It offered better performance than Postgres, and SQL. It is still in use in embedded devices. In fact, the latest version was published, well, on that date. So it's not very active in development, but it's still in use. However, there's no open source SQL database, because with this one, yeah, you can use SQL, but you cannot see the source code, so you don't have that option. This changes with MySQL and its creator, Michael Widenius. So he was working with his company and his colleagues and he just wanted to, you know, provide good services and good products to his customers. And he created something called UNIREG to manage databases. And on top of that, he started to develop his own SQL layer, so to speak. And later they called it MySQL and published the source code, opened the source code. It's a very fast database. He wanted a very fast database: performant, easy to use. Two things that you can still see today in MySQL and MariaDB. Yeah, then in the 90s it had its limitations, but it turned out to be a great fit for websites. So we can say that it helped shape the Internet as we know it today. And it was released at some point under the GPL license. That means it cannot be closed again. So MySQL gains popularity very quickly. In the next decade, a company called Innobase develops this module for MySQL, let's call it a module, InnoDB, that solves the limitations, and a company is created to provide services and that kind of stuff. But then Oracle buys Innobase, which, like I said, employs the developers who are writing the code for InnoDB.
Oracle bought that, okay, that was in 2005. Then later Sun Microsystems buys MySQL, the company, MySQL Finland AB, in 2008. And then I guess some of you remember what happened next, or can kind of guess where this is going. Oracle buys Sun Microsystems. So that was announced in 2009 and effective in 2010, I believe in January or something like that. And now Oracle at this point owns not only the Oracle database, which is the most successful commercial database, but also the most popular open source database, MySQL. So the community, and especially Michael Widenius, realizes that this is a risk for the project, for MySQL, and at the least there could be some conflict of interest, right? That's just natural. And this could maybe even hurt the project, or maybe stop innovation or reduce it. I'm not going to be the judge of that, but I'm going to show you this conversation on the official MySQL community Slack: I didn't see much innovation in the 8.1 innovation release notes. There are plenty of deprecations. Is that the new definition of innovation at Oracle? Well, of course they are joking, and I'll let you decide whether there is truth in these jokes. Props to Oracle, because the project is still alive and they are still innovating. I'm not sure how much, though, but the project is alive. However, Michael Widenius forks this code. That is, he takes the code, copies it and publishes it somewhere else, in another repository, and creates a new project. And many of the developers of MySQL, the original developers of MySQL, then moved to this new project, to MariaDB. Right, so that's how MariaDB was born, as a fork of MySQL. Indeed, it was a fork of MySQL, and it was supposed to be a drop-in replacement for MySQL, and it was for some time. Nowadays, let's say they are highly compatible. There are no two other databases that are as compatible as MySQL and MariaDB. However, the projects have diverged. Right.
And as you can see, the way I see it at least, these are the guys who built MySQL. They are now working on MariaDB. So it's more like a change in the name, and then the other company continued to keep the name, and obviously some of the developers and stuff, and both projects benefit from each other, I would say, at the development level. Anyway, the first release was MariaDB 5.1.38. We are on 11-something, so it's been a long ride since then. It has to honor the GPL license, obviously. And so that means it's going to continue to be protected just like MySQL, at least in terms of availability of the source code. About development, we don't know, right? I mean, you saw the conversation on Slack. Now, in MariaDB's case, it gained popularity very quickly and it became the default database in many Linux distributions. And you can see it here, for example, in the Debian popcon, the popularity contest: you have to install this package on your Linux machine and then it sends data on what packages you have installed. So you see the MariaDB server package gaining on and overtaking MySQL in number of installations. And it's not just on Linux; on Windows you also see more and more installations. This tells me, in my opinion, that more developers are using MariaDB as well. So not only in production; developers are choosing MariaDB too. Now, the MariaDB Foundation was created to protect the source code from being controlled by one large entity, so that innovation continues to happen. Also, the MariaDB Corporation was founded. Now it's called MariaDB plc. And they offer services but also products, most of them open source. For example, faster connectors, connectors being like drivers or APIs for Java, Node.js, C, or Python to connect to MariaDB. And they are faster than those in MySQL. Now, they also created additional storage engines. So what is that, storage engines? Let's talk a little bit about storage engines.
And there are many, many storage engines. Here you see InnoDB again, right? So in fact I said it's a module and yeah, that's true, it's a module for MariaDB. So it's something you plug into MariaDB. MariaDB comes with several of these, not all of them, already there when you install it, but you can add more or remove them if you want. InnoDB comes with it. That's the one that you are going to use most of the time. You have ColumnStore, right in the middle, for analytical workloads. So that's for, say, the average of the numbers in these columns. It's going to be much faster than other storage engines. So for reporting and analytics, you can do these with MariaDB as well. You have MyRocks, initially created by Facebook, for workloads that are write-heavy, where you have tons of writes. And maybe the opposite, Aria, which is like Maria without the M: many reads but very few writes. You can store data in memory, as you can see there; CSV; you have Spider for database sharding, that is, dividing the data across multiple nodes so that your database can grow even more. You can optimize storage in the cloud with S3, and there are many others. Okay, so let me show you this. So let's say we have an application in which people can make tons of comments, right? And we expect quite a lot of those. So we create a table, comments, with some columns there, and then we say engine equals MyRocks. Done. This is optimized for writes. We're probably going to save money on storage. Now, in the same database we have categories, with some columns there, but categories don't ever change. Maybe they change every ten years or whatever. So we can say engine equals Aria. To be honest, you will probably use InnoDB here, but you can use any of the others, even MEMORY, and then just load them when the database starts or, I don't know, use any kind of strategy. You can do this with MariaDB: you can have in the same database these two kinds of tables with different storage engines.
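The two tables just described, and a query that joins them, could look roughly like the sketch below. The slide itself is not in the transcript, so the column names and the filter value are assumptions. Also an assumption to verify against your server: to my knowledge MyRocks is selected with ENGINE=ROCKSDB in MariaDB and needs its plugin installed (SHOW ENGINES lists what is available). The Python wrapper only holds the statements and runs them through whatever DB-API cursor you pass in; it does not connect to anything by itself.

```python
# Write-heavy table: MyRocks (engine name ROCKSDB is an assumption to verify).
COMMENTS_DDL = """
CREATE TABLE comments (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    category_id INT NOT NULL,
    body TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=ROCKSDB
"""

# Read-mostly table: Aria (InnoDB would also be a reasonable choice here).
CATEGORIES_DDL = """
CREATE TABLE categories (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
) ENGINE=Aria
"""

# One query touching both storage engines; the filter value is made up.
JOIN_QUERY = """
SELECT c.*, cat.name AS category_name
FROM comments c
JOIN categories cat ON cat.id = c.category_id
WHERE cat.name = 'databases'
"""


def apply_schema(cursor):
    """Create both tables using any DB-API cursor connected to MariaDB."""
    for statement in (CATEGORIES_DDL, COMMENTS_DDL):
        cursor.execute(statement)


if __name__ == "__main__":
    # Without a server at hand, just print the SQL so it can be pasted
    # into a MariaDB client.
    for statement in (CATEGORIES_DDL, COMMENTS_DDL, JOIN_QUERY):
        print(statement.strip() + ";\n")
```

The point of the sketch is only that the engine is chosen per table with the ENGINE clause, while the query that joins the tables does not mention engines at all.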
And since they are on the same server, in the same database, you can run a query like select some columns, let's say all the columns, from comments join categories, mixing this data, and add a condition to filter the data. And as you can see, in the same SQL query we have two storage engines. That's pretty cool. Okay, if you want to learn a little bit more about the different kinds of workloads that MariaDB supports and what makes MariaDB unique, this is a good video, a very short video, where I quickly mention some of these things. Anyway, so let's talk about production, because production is very important, right? So MariaDB Enterprise is made for that. And it's built on top of open source software. Okay, now, it's called the enterprise subscription, and it includes something called MariaDB Enterprise Server, which is based on the community server, which is free. And it offers a longer maintenance window, up to eight years. I believe the community server is like one year or so; don't take my word for it, check the policies online. But it's got to be something like that. It's a big difference in the maintenance window. It also offers the possibility to run non-blocking backups so that operations continue: even while you are taking a backup, you don't need to stop operations. Enterprise audit, if you have to comply with some certifications or this kind of stuff. Same with security, any kind of, what's the word for it, you know, when you have to comply with some policies or standards. MariaDB Enterprise offers more options for this. It also offers something called MaxScale, which is a database proxy. By the way, in MySQL, the name My comes from Michael Widenius's daughter. So he has a daughter called My. I don't know, maybe in Swedish it would be pronounced like "me"; let me know, if you speak Swedish, how you would pronounce that. And he also has another daughter called Maria. So you have MariaDB. And he also has a son called Max. And so you have MaxScale.
So that's an interesting fact right there for you. Let's talk about MaxScale then. It's a database proxy. That means that it's something that sits between a client, in this case a web server with a web application, and the database. The web server or the application is talking to the proxy, directly, physically, to the proxy. But it thinks it's talking to the database server, right? And the database server thinks it's replying to the client. That's what a proxy is, generally speaking. And I call it intelligent because it understands SQL. So it can make decisions on, for example, where to send a query if there's a cluster of multiple database servers, or what to reply if it needs to modify the results somehow. This is all configurable. That's the idea of a database proxy. It also understands NoSQL. So maybe you have a web application in Java or Node.js and it uses the MongoDB driver. So it has to use MQL, the MongoDB Query Language. So instead of using MongoDB, you can send those queries to MaxScale, and MaxScale translates them to SQL and stores the data in MariaDB. That's pretty cool. So you have all the data in a relational database. The advantage being that if you have other applications that use a relational database, you have all the data in one single database. So you can use one single query to join the data from multiple applications of a different nature, like SQL and NoSQL. That's pretty cool. If you want to experiment with this, I have this video where you get access to a Docker Compose file that spins up all the services, MaxScale included, and you don't have to do much. You can just run a query using the MongoDB Query Language and then another one using SQL, but you don't really have MongoDB. And then you can combine the data, both SQL and NoSQL data, so to speak, in a single SQL query. It's pretty cool. Did I mention that MariaDB MaxScale is intelligent?
Well, it also understands Kafka. Now here I put the database server on the other side. And basically what this enables is something like change data capture, that is, sending database change events, events like changes in the schema or in the data, through Kafka to any other kind of system, including MariaDB. You can send them to another MariaDB database if you want. For example, one that has ColumnStore while this one has InnoDB. That one would be for analytics. I don't know, there are many possibilities. So that's CDC, and you can do the opposite. You can do data ingestion, that is, storing data that comes through Kafka in MariaDB. Pretty cool. Now, this is a very interesting use case: read/write splitting. So let's say you have these two database servers right here and you configure MariaDB replication, which you can learn with this video. This code takes you to the channel, so you can just subscribe. There are plenty of interesting videos, especially by my colleagues. I have to say they're top-notch experts in database technology. Anyway, so you have configured this, you put MaxScale here, and you configure it, it's very easy actually, so that it sends the writes to the primary and the reads to the replica. So everything you write to the primary is, because of MariaDB replication, going to be available in the replica. So you can read from the replica instead of from the primary. And then your web application, or your applications, just send the SQL to, or connect to, the MaxScale proxy. Remember, it's a proxy. So the application thinks it's talking to a database, and it thinks it's talking to one database. Look at the connection string, in the case of Java; that's the example, but in other programming languages the parameters would be similar. It thinks it's just one endpoint. That's it, I'm going there. It's one. But actually there are two nodes. In fact, we can add a new one. And the web application?
You don't need to restart it, you don't need to stop it. Nothing. It continues to work. It just can now work more efficiently with reads. In this case we are scaling reads horizontally, and you can also remove the replicas later when you don't need them, to save money, for example, in the cloud. In fact, you can change the whole thing. You can change this to, now, three different clusters or availability zones, or even clouds if you want. And data is replicated there, I don't know, with InnoDB nodes and ColumnStore nodes for analytics. The application still doesn't know. It continues to use the same connection string. It thinks it's one logical database. In fact there are many nodes, as you can see there. So this is topology isolation: the topology is isolated so that it can evolve. Okay, so automatic failover, which is pretty cool. Let's say this is the primary server in this cloud, and this one is managing all the writes. So if it fails, then we cannot write data anymore. That's bad. But MaxScale detects this automatically. You don't have to do anything. And then it picks another one, reconfigures it, and promotes it as the new primary. That means that the web application continues to write data. Maybe there's a short delay in some of the write operations while this reconfiguration is taking place, but it doesn't fail. Then later maybe the failed node recovers, or you restart it, or it restarts automatically, whatever. MaxScale detects this and now reconfigures it as a new replica. So all of a sudden you have the same capacity again, assuming the nodes have the same capacity and are identical. You can do a switchover, through a UI that MaxScale offers, a web-based GUI, or the command line, or your own script or configuration files, to kind of restore it to where it was before, manually. You can do that. That was automatic failover. Let's talk a little bit about the present and future of MariaDB. Today you can deploy MariaDB anywhere.
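The read/write splitting and automatic failover ideas described above can be modeled in a few lines. This is a toy sketch, not MaxScale code and not its configuration: the class name, method names, node names, and the "first word of the statement" rule are all assumptions made for illustration, and a real proxy parses SQL and monitors servers far more carefully.

```python
class ToyProxy:
    """Toy model of a proxy that splits reads and writes and handles failover."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._next = 0  # round-robin counter for reads

    def route(self, sql):
        """Return the node a statement would be sent to."""
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self.replicas:
            node = self.replicas[self._next % len(self.replicas)]
            self._next += 1
            return node
        # Writes (and anything we are unsure about) go to the primary.
        return self.primary

    def add_replica(self, node):
        """Scale reads out; clients keep using the same single endpoint."""
        self.replicas.append(node)

    def node_failed(self, node):
        """Promote a replica if the primary fails, or drop a failed replica."""
        if node == self.primary:
            if not self.replicas:
                raise RuntimeError("no replica available to promote")
            self.primary = self.replicas.pop(0)
        elif node in self.replicas:
            self.replicas.remove(node)

    def node_recovered(self, node):
        """A recovered node rejoins as a replica, not as the primary."""
        if node != self.primary and node not in self.replicas:
            self.replicas.append(node)


if __name__ == "__main__":
    proxy = ToyProxy("db1", ["db2"])
    print(proxy.route("INSERT INTO comments (body) VALUES ('hi')"))
    print(proxy.route("SELECT * FROM comments"))
    proxy.node_failed("db1")
    print(proxy.primary)
    proxy.node_recovered("db1")
    print(proxy.replicas)
```

The application side of this model only ever calls `route`, which mirrors the point in the talk: the client sees one logical endpoint while nodes are added, removed, promoted, or demoted behind it.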
So Docker, for example. I believe it was with Docker Swarm that I did this deployment of MariaDB on this Raspberry Pi cluster that I built. I don't have it close to me right now, but it's pretty cool, because you can disconnect one of these cables and the whole thing continues to operate. MaxScale is replicated. I think I have two nodes, I think it's the two top nodes there. I installed MaxScale there and it replicates the configuration. So I configure one, and the other one changes accordingly. So it's pretty cool. You can deploy in the cloud, obviously, any cloud. Looking into the future, the teams are working a lot on Kubernetes deployments and orchestration, and on AI capabilities. I'm not going to talk too much about it, so stay tuned for news on these two fronts. Migrating to MariaDB is actually very easy if, for example, you do it from MySQL. And these are the other servers that we see people migrate from the most to MariaDB. But this is boring, this documentation. What I wanted to show you is actually a feature that MariaDB has. So you can say set SQL mode equals Oracle, or put this in a configuration file somewhere with a slightly different syntax. Now MariaDB understands Oracle. Well, it doesn't understand all of the Oracle dialect. That would be crazy. But it helps with migration a lot because you get closer to it, so you need to change fewer things. The same with PostgreSQL and the same with SQL Server. So that's a pretty cool tool for migration. Who uses all this cool stuff? Well, here you see some usage around the world. So you see Asia, United States, Germany, Brazil, Mexico. Well, there are many. These are the countries with the most downloads, right? But MariaDB is used everywhere, globally. And remember, it's open source and it's backed by these companies, which are huge. So this project is not going to disappear. I don't need to mention or say anything about these companies. You recognize them.
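As a small illustration of the SQL mode feature mentioned in the migration part above, the sketch below switches the compatibility mode for one session. `sql_mode` and the values ORACLE, MSSQL, and POSTGRESQL are MariaDB settings; the helper name and the idea of passing in a DB-API cursor are assumptions for the example. Note that setting `sql_mode` this way replaces the session's previous mode flags, so treat it as a starting point and check the documentation for the exact behavior of each mode.

```python
# Compatibility modes named in the talk (Oracle, PostgreSQL, SQL Server).
COMPAT_MODES = {"ORACLE", "POSTGRESQL", "MSSQL"}


def use_compat_mode(cursor, mode):
    """Set the session's sql_mode to one of the compatibility modes.

    `cursor` is any DB-API cursor connected to MariaDB (hypothetical here).
    """
    mode = mode.upper()
    if mode not in COMPAT_MODES:
        raise ValueError(f"unsupported compatibility mode: {mode}")
    # The value comes from the fixed set above, so it is safe to inline.
    cursor.execute(f"SET SESSION sql_mode = '{mode}'")


if __name__ == "__main__":
    class PrintingCursor:
        """Stand-in cursor that prints statements instead of running them."""

        def execute(self, sql):
            print(sql)

    use_compat_mode(PrintingCursor(), "oracle")
```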
Some notable users: Wikipedia. When you read something on Wikipedia, you are reading information stored in MariaDB. Samsung. If you have Samsung devices and you log into their networks, you are using MariaDB. Nokia, Red Hat, Google, DBS. DBS is a huge bank in Asia. They migrated from Oracle to MariaDB and they are very happy because they are saving a lot of money and they gained some functionality as well. These are some notable users, but actually 75% of the Fortune 500 companies use MariaDB. So most of them use MariaDB. Now, it's not only big companies, because MariaDB has more than 1 billion downloads on Docker Hub. That's quite a lot. Now, in conclusion, we saw the MariaDB evolution, from the 60s, when you stored data on tape, this kind of stuff, up to today, where, maybe even with a few clicks, you have the database running in the cloud, sometimes fully managed, or on your Raspberry Pis, I don't know. I wouldn't recommend going to production with a Raspberry Pi, although I bet it has been done. It might even work. I don't know. I don't know about databases, though. But it is fun to do for experimentation. Anyway, we saw this. I want to leave you with this message. Nobody says Ubuntu is a fork of Debian, or Microsoft SQL Server is a fork of Sybase, unless they are making a historical remark. In the same way, MariaDB is much more than a fork of MySQL, and I hope you saw why this is true and learned something about MariaDB or databases. Try it out. It's a lot of fun. These are my coordinates. Feel free to reach out. I'll be happy to hear from you. Thank you, and enjoy the rest of the conference.

Alejandro Duarte

Developer Relations Engineer @ MariaDB



