Conf42 Python 2021 - Online

Event-driven applications: Apache Kafka and Python

Abstract

What’s better than a pizza example to show how Python and Apache Kafka, a streaming platform, work together to enable reliable real-time data integration for your event-driven application? Let’s dig into the problems Kafka solves, its Python libraries, and its prebuilt connectors together!

Code and data go together like tomato and basil; not many applications work without moving data in some way. As our applications modernise and evolve to become more event-driven, the requirements for data are changing. In this session we will explore Apache Kafka, a data streaming platform, to enable reliable real-time data integration for your applications. We will look at the types of problems that Kafka is best at solving, and show how to use it in your own applications. Whether you have a new application or are looking to upgrade an existing one, this session includes advice on adding Kafka using the Python libraries, along with code examples to use (and a bonus discussion of pizza toppings). With Kafka in place, many things are possible, so this session also introduces Kafka Connect, a selection of pre-built connectors that you can use to route events between systems and integrate with other tools. This session is recommended for engineers and architects whose applications are ready for next-level data abilities.

Apache Kafka is becoming the de-facto standard for streaming applications, with many companies basing their whole data pipeline on Kafka. In microservices contexts, Apache Kafka plays the fundamental role of decoupling producers and consumers of information, thus making sure data pipelines are scalable and reliable. I’m a Developer Advocate at Aiven.io, a leader in the managed Apache Kafka sector, spending most of my time in developers’ shoes building reproducible examples of data pipelines. I’ve been blogging and speaking around the world since 2014 about various aspects of the data lifecycle: from producing applications to analytics, visualisation and streaming platforms.

Summary

  • In this session we will see how you can build event-driven applications using Apache Kafka and Python. Kafka is a tool that makes this communication easy and reliable at scale. We live in a fast world and we don't want to wait for batch time.
  • With Aiven, you can create your open source data platform across many clouds. For Kafka, a message is just a series of bytes. How can you send that series of bytes to Kafka? A demo shows how to do it.
  • How to create and produce messages to Kafka. By default, when a consumer attaches to Kafka, it starts consuming from the moment it attached; we will see how to change this behaviour later on. The whole producer-consumer pipeline works, and there is no end time in streaming.
  • With Kafka we have the concept of partitions. A partition is just a way of taking events of the same type, belonging to the same topic, and dividing them into sub-topics, sub-logs. Partitions are good because they ease the trade-off between disk space and log size.
  • When a consumer reads a message, the message is not deleted from Kafka, which makes it available for other applications to read. You will probably want to integrate Kafka with an existing set of data tools, databases and data stores; Kafka Connect lets you do that.
  • Kafka Connect allows you to send data from a Kafka topic to any other data target. Here we show how to do that with the Aiven web UI. In less than an hour we saw how to produce and consume data, how to use partitions, and how to define multiple applications via multiple consumer groups.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi, and welcome to this talk. I'm Francesco Tisiot, developer advocate at Aiven. In this session we will check how you can build event-driven applications using Apache Kafka and Python. If you are here at Conf42, I believe you are somehow familiar with Python, but on the other side you may start wondering: what is Kafka? Why should I care about Kafka? Well, the reality is that if you are into Python, you are either creating a new shiny app or you inherited an old rusty app that you have to support, maybe extend, maybe take to the new world. No matter if it's new or old, shiny or rusty, I've never seen an application, a Python program, working in complete isolation. You will have components written in Python that need to talk with each other, or you are exposing this old application to the world, so you have this old application talking with another application. Let me tell you a secret: Kafka is a tool that makes this communication easy and reliable at scale. Why should we use Kafka? Well, we used to have what I call the old way of building applications, where an application at a certain point had to write the data somewhere. Where did the application write to? Usually it was a database, but the application wasn't writing every single record to the database as it arrived: it was taking records, packaging a few of them up, and then pushing them to the database. At the same time, when it was reading from the database, it was reading a batch of records and then waiting a little bit before reading the following batch. This means that every time we were using such a way of communicating, we were adding a custom delay, called batch time, between when the event was available in the application and when it was pushed to the database, or when the event was available in the database and when it was read by the application. You can imagine batch time going from a few milliseconds or seconds to minutes or hours, depending on the application and the use case. Now we live in a fast world and we don't want to wait for batch time. We want to build event-driven applications: applications that, as soon as an event happens in real life, want to know about it, want to start parsing it in order to extract the relevant information, and probably want to push the output of their processing to another application, which will most likely be another event-driven application, creating a chain of such applications. And we want to do it immediately. But let's take a step back and try to understand what an event is. We are all used to mobile phones, and we are all used to notifications. A notification tells us that an event happened: we receive a message, we make a payment with our credit card and we receive a notification. Someone else steals our credit card details and makes a payment, and we receive a notification for that too. As you might understand, we cannot wait a batch time of two hours, ten minutes or five minutes: we want to know immediately that someone stole our credit card, and we, as people, react like an event-driven application by immediately phoning up our bank to block the credit card. This is why events and event-driven applications are important. But even without going into the digital world, we have been used to events happening in real life for a long time.
Just imagine when your alarm beeps in the morning: you wake up and you act as an event-driven application. And you are not only receiving events passively, you are creating events. Just think: when you change the time of your alarm, that will change your actions in the future. Going back to mobile phones, especially in this time of pandemic, we have all been used to ordering food from an app. From the time that you open the app, you select the restaurant, you select which pizzas you want, then you create an order. This will create a chain of events, because the order will be taken by the app and sent to the restaurant, which will act as an event-driven application and create the pizzas for you. And once the pizza is ready, boom, another event, probably for the delivery people to come and pick it up and take it to your place. So why are event-driven applications important? Because, as I said, we live in a fast world and the value of information is strictly related to the time it takes to be delivered. If we go back to the credit card example, I cannot wait half an hour or five hours before knowing that my card has been stolen; I want to know immediately. But even if we talk about food — and you know I'm Italian, so I'm passionate about food — if we are waiting for our pizzas at home and we want to know where the pizza is, where the delivery person is, the information about the position of the delivery person is useful only if the delay between the time it is taken from the delivery person's mobile phone and the time it lands on the map on my mobile phone is minimal, say 10 seconds. I couldn't care less what the position was ten minutes ago. The information has value only if delivered on time. So we need a way to deliver this information, to create this sort of communication between components, in real time, highly available and at scale. How can we do that? Well, we can use Apache Kafka. What is Apache Kafka? The idea of Apache Kafka is really, really simple: the basic idea is the idea of a log file. A log file where, as soon as an event is created, we store it. Event number zero happens: we store it as a message in the log file. Event number one happens: we store it after event zero, then two, three, four and so on. Kafka has the concept of a log file which is append-only and immutable. This means that once we store event zero in the log, we cannot change it. It's not like a record in a database that we can go and update: once the event is there, it's there. If something changes the reality represented by event zero, we store that change as a new event in our log. Of course, we know that events take multiple shapes and different types: I could have events regarding pizza orders and other events regarding delivery positions. Kafka allows me to store them in different logs, which in Kafka terms are called topics. Even more, Kafka is not meant to run on a single huge server; it's meant to be distributed. This means that when you create a Kafka instance, most of the time you will create a cluster of nodes, which are called, in Kafka terms, brokers. And the log information will be stored across the brokers in your cluster not only once, but multiple times. The number of times each log is stored in your cluster is defined by a parameter called replication factor. In our example we have three copies of one of the logs.
So a replication factor of three; and two copies of the other log, so a replication factor of two. Why do we store multiple copies of each log? Because we know that computers are not entirely reliable, so we could lose a node; but still, as you can see, we are not going to lose any data. So let's have a deeper look at what happens with Kafka. We have our three nodes, and let's go to a very simple version where we have just one topic with two copies, so a replication factor of two. Now let's assume that we have a failing node. What happens? Kafka will detect that this node has been failing and will check which logs are available in the cluster. There is a log with only one remaining copy but a replication factor of two, so Kafka at that point will take care of creating a second copy in order to keep the number of copies equal to the replication factor. Once we have the second copy, even if we now lose a second node, we are still not losing any information. So as of now we understood how Kafka works and what Kafka is. But Kafka is something that stores events: what is an event for Kafka? Well, for all that matters to Kafka, an event is just a key-value pair, where you can put whatever you want in the key and in the value. Kafka doesn't care: for Kafka it's just a series of bytes. You could go from very simple use cases, where you put a max temperature label in the key and 35.3 as the value, to going wild and putting JSON documents in both the key and the value: in the key, the restaurant receiving your pizza order and the phone line used to make the call, and in the value, the order id, the name of the person calling and the list of pizzas. Usually the payload, the message size, is around one megabyte, and you can use, like in this case, the JSON format, which is really cool because I can read through it, but it is on the other side a little bit heavy when you push it over the wire, because every field contains both the field name and the field value. If you want a more compact representation of the same information, you can use formats like Avro or Protobuf, which detach the schema from the payload and use a schema registry in order to tell Kafka how the message was compacted, so when I read the message I can ask for the schema and recreate the message properly. These are just methods: for all that matters to Kafka, you are sending just a series of bytes. So how can you send that series of bytes to Kafka? Well, you are trying to write to Kafka, and in this case we will probably have a Python application writing to Kafka, which is called a producer, and, remembering what we said earlier, it writes to a topic or multiple topics. In order to write to Kafka, all the producer has to know is where to find Kafka (a list of hostnames and ports), how to authenticate (do I use SSL? Do I use SASL? Do I use other methods?), and then, since I have my information available as, for example, JSON objects, how to encode that information into the raw series of bytes that Kafka understands. On the other side, once I have my data in Kafka, I want to read it, and if I want to read from a topic with my Python application, that application is called a consumer. How the consumer works is that it will read message number zero and then communicate back to Kafka: hey, number zero done, let's move the offset to number one.
It will read number one and move the offset to number two; read number two, move the offset to number three. Why is communicating back the offset important? Because, again, computers are not entirely reliable, so the consumer could go down. The next time that the consumer pops up, Kafka still knows up to which point that particular consumer read in that particular log, so it will probably send it message number three, because that was the first one not yet read by the consumer. In order to consume data from Kafka, the consumer has to know roughly the same information as the producer: where to find Kafka (hostname and port), how to authenticate, how to decode the data (before we were encoding, now we need to decode), and also which topic or topics to read from. That was a lot of content, so let's look at the demo. What we will look at here is a series of notebooks that I built in order to make our life easier. The first notebook actually allows me to create automatically, within Aiven, all the resources that I need for the follow-up; you can run this yourself, and you will have access to this series of notebooks later on. As of now, let me show you that I pre-created two instances, one of Kafka and one of PostgreSQL, that we will use later on. With Aiven, you can create your open source data platform across many clouds. In this case we created a Kafka instance, but instead of showing you something that is already created, let me create a new instance. As you can see, you can create not only Kafka but a lot of other open source data platforms, and once you select which data platform you want, you can select the cloud provider and, within the cloud provider, the cloud region, so you can customise this to your needs. At the bottom you can also select the plan, driving the amount of resources and the associated cost, which is all inclusive. Finally, you give a name to the instance and, after a few minutes, the instance will be up and running for you to have a look at. The good thing about Aiven is not only that you can create instances on demand: if you have an instance like this, you can upgrade it when a new version of Kafka comes out, you can change the plan to upscale or downscale, or you can migrate the whole platform, while the service is online, to a different region within the same cloud or to a completely different cloud provider. So now, instead of talking about Aiven, let's talk about how to create and produce messages to Kafka. Let's start with a producer. The first thing that we will do is install kafka-python, which is the basic default library that allows us to connect to Kafka. Then we will create a producer. We create a producer by saying where to find Kafka (a list of hostname and port), how to connect (using SSL and the three SSL certificate files), and how to encode the information, how to serialize it: for both key and value we will take the JSON and turn it into a series of bytes encoded in ASCII. So let's create this. Okay, and now we are ready to send our first message. We will send a pizza order — again, I'm Italian, so pizza is key — an order from myself, ordering a pizza margherita. Okay, now the order, the message, is sent to Kafka. How can we be sure about that? Well, let's create a consumer now. Let's move the consumer on the right and close the list of notebooks. We create a consumer.
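As a rough illustration of the producer notebook just described, here is a minimal sketch assuming the kafka-python library and Aiven-style SSL certificate files; the hostname, file paths and topic name are placeholders, not the actual values used in the demo:

```python
import json
from kafka import KafkaProducer

# Minimal sketch: connection details and certificate paths are placeholders.
producer = KafkaProducer(
    bootstrap_servers="kafka-demo.aivencloud.com:12345",  # where to find Kafka
    security_protocol="SSL",                              # how to authenticate
    ssl_cafile="ca.pem",
    ssl_certfile="service.cert",
    ssl_keyfile="service.key",
    # how to serialize: JSON -> series of bytes (ASCII), for both key and value
    key_serializer=lambda k: json.dumps(k).encode("ascii"),
    value_serializer=lambda v: json.dumps(v).encode("ascii"),
)

# Send the first pizza order to a topic (topic name is illustrative)
producer.send(
    "francesco-pizza",
    key={"caller": "Francesco"},
    value={"name": "Francesco", "pizza": "Margherita"},
)
producer.flush()  # make sure the message actually leaves the client
```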
All we have to say, apart from the group id that we will check later (I'm calling it client one, but I can call it whatever I want), is the same properties needed to connect: hostname and port, SSL with the three certificates, and how to deserialize the data, from the raw series of bytes back to JSON, with the two deserializer functions. Okay, so let me create the consumer. Now I can check which topics are available in Kafka, and I can see there are some internal topics together with a nice Francesco pizza topic that I just created for this purpose. I can subscribe to it and now I can start reading. We can immediately see two things when reading. The first one is that the consumer thread never ends. This is because we want to be there, ready: as soon as an order comes into Kafka, we want to be ready to read it, and there is no end time in streaming, there is no end date — we will always be there, ready to consume the data. The second thing we can notice is that even though we sent the first pizza order from Francesco, we are not receiving it here. Why is that? Because by default, when a consumer attaches to Kafka, it starts consuming from the time that it attached; it doesn't go back in history. This is the default behaviour and we will see how to change it later on, but just bear in mind that this is the default. In order to show you that the whole producer-consumer pipeline works, I'm going to send another couple of events: an order for Adele with a pizza Hawaii, the pineapple pizza, and an order for Mark with a pizza with chocolate. I'm Italian, and neither of those is what I would call a right choice for pizza; however, I respect your right to order whatever you want — just try not to do that in Italy. Okay, so let's produce the two orders and, if everything works, we should see them appearing on the consumer side immediately. There we are: we see both Adele's and Mark's orders arriving on the consumer side. Our pipeline is working. So now let's go back to a few more slides and talk about the log size. We want to send a huge number of messages to a log. But I told you that the log is stored on a broker: does this mean that we cannot have more messages than the biggest disk on the biggest server in our cluster? Well, that is not going to work. If we want to send massive amounts of events, we don't want the trade-off between disk space and amount of data: we don't want to have to purchase a huge disk in order to store the whole topic on one disk, and on the other side we don't want to limit the number of events that we send to a particular topic because of disk space. We are lucky, because Kafka doesn't impose that trade-off on us. With Kafka we have the concept of partitions. A partition is just a way of taking events of the same type, belonging to the same topic, and dividing them into sub-topics, sub-logs. For example, if I have my pizza orders, I could partition them based on the restaurant receiving the order, because I want all the records for, say, Luigi's restaurant (the blue ones) to be together, but I don't really care about the ordering between the orders of Luigi's restaurant and the orders of Mario's (the yellow ones) or Francesco's (the red ones). Now, why are partitions good for this kind of disk space trade-off? Because the partition is what is actually stored on a node.
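Before continuing with partitions, here is a hedged sketch of the consumer side just shown, under the same assumptions as the producer sketch (kafka-python, placeholder connection details); the group id and topic name are illustrative:

```python
import json
from kafka import KafkaConsumer

# Minimal sketch: connection details and certificate paths are placeholders.
consumer = KafkaConsumer(
    bootstrap_servers="kafka-demo.aivencloud.com:12345",
    security_protocol="SSL",
    ssl_cafile="ca.pem",
    ssl_certfile="service.cert",
    ssl_keyfile="service.key",
    group_id="client-1",  # the consumer group id, discussed later in the talk
    # how to deserialize: raw bytes -> JSON, for both key and value
    key_deserializer=lambda k: json.loads(k.decode("ascii")),
    value_deserializer=lambda v: json.loads(v.decode("ascii")),
)

print(consumer.topics())           # list the topics available in the cluster
consumer.subscribe(["francesco-pizza"])

# The loop never ends: in streaming there is no end time.
# By default we only see messages produced after the consumer attached.
for message in consumer:
    print(message.key, message.value)
```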
So this means that if we want a huge amount of events landing in a topic, we just need more partitions, to fit the whole topic onto smaller disks. And again, since Kafka is distributed, the partitions will be stored across our cluster in a number of copies; in this case the number of copies is the same for every partition, because the replication factor is defined at topic level. And even if we lose a node, we will not lose any data from any of the partitions, because it will be available in the other copies on the other brokers. We said initially that we push data to Kafka and it will be stored in the log forever. Well, this is not entirely true, because we can set what are called topic retention policies, which say for how long we want to keep the data in Kafka. We can define that based on time: I want to keep the data in Kafka for two weeks, six hours, 30 minutes, or forever. Or we can define it based on log size: I want to keep the events in Kafka until the log reaches 10 GB, and then delete the oldest chunk. I can also use both, and the first threshold to be hit, between time and size, will dictate when the oldest set of records is deleted. So we understood that partitions are good. How do you select a partition? Usually it's done with the key component of the message: what Kafka does by default is hash the key and take the result of the hash in order to select one partition, ensuring that messages having the same key always land in the same partition. Why is this useful? Well, let me show you what happens when you start using partitions; it matters for ordering. Take this little example: I have my producer, which produces data to a topic with two partitions, and then I have a consumer. Let's assume a very simple use case where I have only three events: the blue one happening first, the yellow one second, the red one third. When pushing this data into our topic, the blue event lands in partition zero, the yellow event lands in partition one, and the red event lands in partition zero again. Now, when reading data from the topic, it could happen — it will not always be the case, but it could happen — that I read the events in this order: blue one first, red one second, yellow one third. So, if you check, the global ordering is not correct. Why is that? Because when we start using partitions we have to give up on global ordering: Kafka ensures the correct ordering only per partition. This means that we have to start thinking about which subsets of events need their relative ordering preserved and which don't. If we go back to our pizza order example, it makes sense to keep all the orders of the same restaurant together, because we want to know which person ordered before or after the other, but we don't really care whether an order for Luigi's pizza was placed before another order for Mario's pizza. So we understood that partitions are good because they ease the trade-off between disk space and log size, but partitions are bad because we have to give up on global ordering. However, if you think about a log with a single partition, it's just one unique thread appending one event after the other, and you can think of the throughput as being limited by that single thread doing the work. Now, if we have more partitions, we have multiple independent threads that can append data one after the other.
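As a rough, hedged illustration of partitions and retention settings (not code from the talk), this sketch creates a two-partition topic with a replication factor and a time-plus-size retention policy using kafka-python's admin client; the topic name, sizes and connection details are made up:

```python
from kafka.admin import KafkaAdminClient, NewTopic

# Placeholder connection details, as in the previous sketches.
admin = KafkaAdminClient(
    bootstrap_servers="kafka-demo.aivencloud.com:12345",
    security_protocol="SSL",
    ssl_cafile="ca.pem",
    ssl_certfile="service.cert",
    ssl_keyfile="service.key",
)

# Two partitions to spread the log over smaller disks, two copies of each,
# and a retention policy: whichever threshold is hit first wins.
admin.create_topics([
    NewTopic(
        name="francesco-pizza-partitioned",
        num_partitions=2,
        replication_factor=2,
        topic_configs={
            "retention.ms": str(14 * 24 * 60 * 60 * 1000),  # two weeks
            "retention.bytes": str(10 * 1024 ** 3),          # ~10 GB per partition
        },
    )
])
```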
So you can think — I know that there are other components involved — that the throughput of those three processes is roughly three times the throughput of the original single process. This means that we can have many more producers producing data to Kafka, and also many more threads consuming data from Kafka. But we still want to consume all the events of a certain topic without consuming the same event twice. How does Kafka handle that? It does so by assigning a non-overlapping subset of partitions to each consumer. If those last few words didn't make a lot of sense to you, let's check: in this example we have two consumers and three partitions. What Kafka will do is assign the top, blue partition to consumer one, and the yellow and red partitions to consumer two, ensuring that everything works as expected. Even more — let me focus on this one — if consumer one now dies, Kafka will understand that after a timeout and redirect the blue arrow from consumer one, which died, to the consumer that is still available, consumer two. So now let me show you this behaviour, partitioning, in a demo. This time we create a new producer, which is similar to the one above, nothing different, apart from the fact that we now use the Kafka admin client to connect to the admin side of Kafka and create a new topic with two partitions; we will force the number of partitions here. Okay. Now, on the other side, we have a producer with a topic of two partitions; let's create two consumers that compete with each other to consume all the messages from that topic. So let's move consumer one here and consumer two there. I'm telling Kafka that I'm attaching those two consumers to the same topic. Since I have two partitions and two consumers, what Kafka should do is assign one consumer to the top partition and one consumer to the bottom partition. Let me check that the top one has started, and let me start the bottom consumer too; now both consumers are started. Let me go back to the producer and send a couple of messages. If you remember what I told you before, the partition is selected with the key: Kafka hashes the key and selects the partition. What I'm doing here is using two records with slightly different keys, id one and id zero, so I'm expecting them to land in two different partitions. Let me try this out. Exactly: I receive one record on the top consumer and one record on the bottom consumer. If you wonder what those two flags mean: the top consumer is reading from partition zero, offset zero, so the first record of partition zero, and the bottom consumer is reading from partition one, offset zero, the first record of partition one. If I now send another couple of messages from the same producer, reusing the same keys, then, since messages with the same key end up in the same partition, I'm expecting Mark's order to land in the same partition as Frank's, because they share the same key, and the same for Jan and Adele. Let's check this out. Run this and, as expected, Mark lands in the same partition as Frank, and Jan in the same partition as Adele: offset one, the second record of partition zero, and offset one, the second record of partition one. Let's also check the last bit: what happens if a consumer now fails? Kafka will understand that after a timeout and should redirect partition zero as well to the consumer that is still alive.
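A hedged sketch of that partitioning demo, under the same assumptions as before (kafka-python, placeholder connection settings, illustrative topic and group names); to reproduce the two competing consumers, the consumer part would be run in two separate notebooks or processes sharing the same group id:

```python
import json
from kafka import KafkaConsumer, KafkaProducer

conn = dict(  # placeholder connection settings, shared by all clients
    bootstrap_servers="kafka-demo.aivencloud.com:12345",
    security_protocol="SSL",
    ssl_cafile="ca.pem",
    ssl_certfile="service.cert",
    ssl_keyfile="service.key",
)

producer = KafkaProducer(
    key_serializer=lambda k: json.dumps(k).encode("ascii"),
    value_serializer=lambda v: json.dumps(v).encode("ascii"),
    **conn,
)
# Same key -> same partition: a later order with key {"id": 0} joins Frank's
# partition, and one with key {"id": 1} joins Adele's.
producer.send("francesco-pizza-partitioned", key={"id": 0}, value={"name": "Frank"})
producer.send("francesco-pizza-partitioned", key={"id": 1}, value={"name": "Adele"})
producer.flush()

# Start two of these (e.g. in two notebooks): both share the same group_id,
# so Kafka assigns each of them a non-overlapping subset of the partitions.
consumer = KafkaConsumer(
    "francesco-pizza-partitioned",
    group_id="pizza-makers",
    key_deserializer=lambda k: json.loads(k.decode("ascii")),
    value_deserializer=lambda v: json.loads(v.decode("ascii")),
    **conn,
)
for message in consumer:
    # partition and offset show which sub-log this group member is reading
    print(message.partition, message.offset, message.value)
```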
So let's check now what happens if I send another two events to the topic: I'm expecting to read both of them on the bottom consumer. Let's check this out. Exactly: now I'm reading both from partition zero and partition one with the consumer that was left alive. So far everything is working as expected. Let's go back to a few more slides. So far we saw a pretty linear way of defining data pipelines: we had one or more threads of producers, Kafka, and one or more threads of consumers that were competing with each other in order to read all the messages from a Kafka topic. For example, going back to the pizza case, those two consumers could be two pizza makers who compete with each other to consume all the orders from the pizza order topic: they don't want to make the same pizza twice, so they don't want to read the same pizza order twice. However, when they read a message, the message is not deleted from Kafka, and this makes it available for other applications to read. So, for example, I could have a billing person who wants to receive a copy of every order in order to make the bill, and who wants to read from the topic at their own pace. How can I manage that? Well, with Kafka it's really simple: it's the concept of consumer groups. I define the two pizza makers as part of the same consumer group, and then I create a new application, called, say, billing person, and set it as part of a new consumer group. Kafka will understand that it's a new application and will start sending a copy of the topic data to this new application, which will read at its own pace and has nothing to do with the two pizza makers. Let's check this out again; let's go back to the notebook. What we will do now is create a new consumer that is part of a new consumer group. If we go back to the original consumer, we had this group id that I told you we would check later — well, it's time now. The top consumer was called pizza makers: this is how you define which consumer group an application belongs to. The new consumer is called billing person, so for Kafka it's a completely new application reading data from the topic. If you remember, with the original consumer we managed to attach to the topic but read not from the beginning, only from the time when we attached, so we were missing order number one. Now, with this new application, we also set auto offset reset equal to earliest: we are saying that we are attaching to a topic in Kafka and we want to read from the beginning. So when we now start this new application we should receive the two messages above plus the first message, the original Francesco order. There we are: we are receiving all three messages, since we started reading from the beginning. Now, if we go to the original producer and send a new event to the original topic, we should receive it in both places, because those are two different applications. Let's try this out. Exactly: we receive the new order both in the top and in the bottom application, everyone getting the data, because they are different applications reusing the same topic. Now that we also understood consumer groups, let's check a little bit more. What we saw so far was us writing some code in order to produce or consume data. However, it's very unlikely that Kafka will be your first data tool in your company.
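Sticking with the same assumptions (kafka-python, placeholder settings), the billing-person consumer might differ from the pizza-makers one only in its group id and offset reset policy; a minimal sketch:

```python
import json
from kafka import KafkaConsumer

# A new group_id makes Kafka treat this as an independent application that
# receives its own copy of the topic; auto_offset_reset="earliest" makes it
# start from the beginning of the log instead of from the attach time.
billing = KafkaConsumer(
    "francesco-pizza",                     # illustrative topic name
    group_id="billing-person",
    auto_offset_reset="earliest",
    key_deserializer=lambda k: json.loads(k.decode("ascii")),
    value_deserializer=lambda v: json.loads(v.decode("ascii")),
    bootstrap_servers="kafka-demo.aivencloud.com:12345",  # placeholders
    security_protocol="SSL",
    ssl_cafile="ca.pem",
    ssl_certfile="service.cert",
    ssl_keyfile="service.key",
)

for message in billing:
    print(message.offset, message.value)   # sees all orders, old and new
```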
You will probably want to integrate Kafka with an existing set of data tools: databases, data stores, any kind of data tool. And believe me, you don't want to write your own code for each of those connectors. There is something that solves this problem for you, and it's called Kafka Connect. Kafka Connect is a prebuilt framework that allows you to take data from existing data sources and put it into a Kafka topic, or take data from a Kafka topic and push it to a set of data sinks, all driven by just a config file and one or more Kafka Connect threads; it lets you turn existing systems into event-driven applications. If we go back to one of the initial slides, where we had our producer writing data to a database: we now want to include Kafka in the picture, but we don't want to change the original setup, which is still working. How can we include Kafka? With Kafka Connect and a change data capture solution, we can monitor all the changes happening in a set of tables in the database and propagate those changes as events, as messages, into Kafka. Very, very easy. We can also use Kafka Connect to distribute events. For example, if we have an application that already writes to Kafka, and we have the data in a Kafka topic: does a team need the data in a database, any JDBC database? With Kafka Connect we can just ship the data to a JDBC database. They want another copy in a PostgreSQL database? There we are. They want a third copy in BigQuery? Really easy. They want a fourth copy in S3 for long-term storage? Just another Kafka Connect thread. Again, all you need to do is define a config file: which topic you want to take data from and where you want to bring it. And with Aiven, Kafka Connect is also a managed service, so you only have to figure out how to write the config file. So let's check this as a demo too. If we go back to our notebook, we can now check the Kafka Connect one; let me close a few of my notebooks. What we are doing here is creating a new producer, and we will now create a different topic: within each message we are going to send both the schema of the key and the value and the payload of the key and the value. Why do we do that? Because we want to make sure that our Kafka Connect connector understands the structure of the record and populates the downstream application — in this case a PostgreSQL table — properly. So we define the schema of the key and the value, and then we create three records: we push three messages to Kafka containing both the schema and the value itself, with Frank ordering a pizza margherita, Dan ordering a pizza with fries and Jan ordering a pizza with mushrooms. Okay, so we pushed the data into a topic. Now we want to take the data from this topic and push it to PostgreSQL. I have everything scripted to do that, but I want to show you how you can do it with the Aiven web UI. If I go to the config, I can check that there is a Kafka Connect setup waiting for me. This is the config file that you have to write in order to send data from Kafka to any other data target; a sketch of what such a file might look like is shown below. So let's check it out. What do we have here? We send the data to a PostgreSQL database, using this connection, with a very secure new PG user and a very secure new password, one two three. And what do we want to send to this PostgreSQL database? A topic called Francesco pizza schema, and we are going to call this connector sink Kafka postgres.
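For illustration only, here is a sketch of what a JDBC sink connector configuration of this kind might contain, written as a Python dict mirroring the JSON you would paste into the console or POST to the Connect REST API; the connector class name, connection URL and credentials are assumptions, not the actual values from the demo:

```python
import json

# Hypothetical JDBC sink connector config; class name, connection details
# and credentials are placeholders, not the ones used in the talk.
connector_config = {
    "name": "sink-kafka-postgres",
    "connector.class": "io.aiven.connect.jdbc.JdbcSinkConnector",  # assumed class
    "topics": "francesco-pizza-schema",
    "connection.url": "jdbc:postgresql://pg-demo.aivencloud.com:5432/defaultdb",
    "connection.user": "newpguser",
    "connection.password": "password123",
    # The messages carry schema + payload, so the JSON converter can build the table.
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "true",
    "auto.create": "true",  # create the target table if it does not exist
}

print(json.dumps(connector_config, indent=2))  # the JSON to paste into the UI
```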
Additionally we want to say that the value is JSON and that we are using a JDBC sink connector — we are pushing data over JDBC — and that, if the target table doesn't exist, it should be auto-created. Let me show you the database side: I have a PostgreSQL database that I'm connecting to for this demo. It has a default DB, the public schema, and the set of tables is empty. Okay, now let me take the config file that I've been talking about and go to the Aiven web UI. I go into my Kafka service; there is a connectors tab where I can create a new connector. I select a JDBC sink, sending the data to a JDBC database, and I could fill in all the information here; or, since I have my config file, I can copy and paste it into the configuration section, and the UI parses the information and fills in all the details. As shown before, I'm creating a connector named sink Kafka postgres, the class is the JDBC sink connector, the value format is JSON, and I'm sending the Francesco pizza schema topic. So now we can create the new connector, and the connector is running. This means that the data should now be available in PostgreSQL; let me check that. We can see that my database now has some tables: let's refresh the list of tables, and I see my Francesco pizza schema table, which has the same name as the topic. If I double-click on it and select the data, I can see the three orders from Frank, Dan and Jan being populated correctly. Let's go back to the notebook, back to our Kafka Connect one, and send another order, for Giuseppe ordering a pizza Hawaii. This is sent to Kafka; let's go back to the database, refresh, and the order for Giuseppe ordering a pizza Hawaii is there too. So Kafka Connect managed to create the table, populate the table, and it keeps populating the table as soon as a new row arrives in the Kafka topic. Going back to the last few slides: I believe that in less than an hour we saw a lot of things. We saw how to produce and consume data, how to use partitions, how to define multiple applications via multiple consumer groups, and what Kafka Connect is and how to make it work. Now, if you have any questions, let me give you some extra resources. First of all, my Twitter handle: you can find me at ftisiot, and my messages are open, so if you have any question regarding any of the content that I've been talking about so far, just shout. Second, if you want a copy of the notebooks that I've been showing you, you can find them as an open source GitHub repository. If you want to try Kafka but you don't have a nice streaming data source that you can use to push data to Kafka, I created a fake pizza producer, which creates even more complex fake pizza orders than the ones I've been showing you. The last one: if you want to try Kafka but you don't have Kafka, check out Aiven.io, because we offer it as a managed service, so you can just start your instance, we will take care of it, and you will only have to take care of your development. I hope this session was useful for you to understand what Kafka is and how you can start using it with Python. If you have any questions, I will always be available to help; just reach out on LinkedIn or Twitter. Thank you very much, and ciao from Francesco.
...

Francesco Tisiot

Developer Advocate @ Aiven.io



