Conf42 Golang 2023 - Online

Let's Go Build Cloud Native Pulsar Apps with Go

Abstract

If you are building Cloud Native Apps with Go, you need a Cloud Native Streaming and Messaging platform to supercharge your apps. Go is a first class citizen in the Apache Pulsar world. From clients to functions, Go with Pulsar.

Summary

  • David Kjerrumgaard is a developer advocate at StreamNative and a committer on the Apache Pulsar project. He discusses how to develop cloud native Pulsar applications in Go and, after the application has been developed and tested locally, walks you through the deployment process.
  • Apache Pulsar is a cloud native messaging and event streaming platform. It takes full advantage of the capabilities unique to cloud environments. Data is stored in a separate layer based on another Apache project called BookKeeper. This two-tiered architecture enables elastic scalability on both tiers.
  • Apache Pulsar supports multiple protocols and is the first messaging system to do so. The concept of namespaces allows you to easily administer a Pulsar cluster. Its four subscription types support either streaming or messaging delivery semantics.
  • Go is one of the three supported languages for Pulsar functions. Pulsar functions provide an easy way to develop stream processing applications with just a few lines of code and a configuration file. Let's explore why Go is a good fit for cloud native Pulsar application development.
  • Now let's turn our attention to the consumer application. Just like the producer application, we import the Pulsar Go client. The consumer topic is again public/default/purchases. We've added a separate function called get consumer configuration, which uses two new constants: one for the name of the consumer topic and one for a unique subscription name.
  • To deploy a cloud native application, we must first containerize it. The most common technology used for containerization is Docker. In this talk, I'm going to use a different technology known as buildpacks, which eliminates the need for a Dockerfile entirely.
  • The go-producer and go-consumer images are now built. We can test them locally with docker run. Successful local runs are a great indication that our Docker images were built correctly using the buildpacks library.
  • Now that we've created the config maps, the final step is to deploy the application itself. We will do this by using the kubectl apply command and specifying the deployment manifest. Then we'll look at what has happened underneath the covers.
  • Apache Pulsar is a messaging and event streaming platform designed for cloud native environments, and Go is a good fit for developing cloud native applications. All the code for this demonstration is available at the GitHub repo shown here.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello. My name is David Kjerrumgaard and I'm a developer advocate at StreamNative. I'm also a committer on the Apache Pulsar project and have over a decade of experience with event streaming and event driven architecture. I'm also the author of two books, including Pulsar in Action from Manning, which is available for free download at the link shown here. If you like my talk and want to learn more about Apache Pulsar, I encourage you to download a copy.
Let's start with a quick outline of the topics I'm going to cover during this talk. I will start with a quick introduction to Apache Pulsar, covering what it is and how it is different from other messaging and event streaming systems you may have encountered in the past. Next, I will explain why Go is well suited for developing cloud native applications that interact with Apache Pulsar. After discussing the why of Go and Pulsar, I will demonstrate how to develop cloud native Pulsar applications in Go. And finally, after we've developed and tested the application locally, I will walk you through the deployment process.
In this section, we will explore Apache Pulsar to give you a better understanding of how it can be used inside your applications. Developed at Yahoo in 2012, Apache Pulsar is a cloud native messaging and event streaming platform. First of all, it was architected to take full advantage of the capabilities unique to cloud environments, including elastic scalability, high availability, and redundant components. Secondly, it is the only platform that supports both traditional messaging semantics, like ActiveMQ or RabbitMQ, as well as event streaming semantics, like Apache Kafka.
Apache Pulsar provides a publish and subscribe model for messaging that allows producers and consumers to exchange messages asynchronously. Producers and consumers are completely decoupled from one another and only interact with the Pulsar message broker, which acts as an intermediary. Producers and consumers exchange messages via topics that are used to store the messages until they are consumed.
Apache Pulsar is the only messaging system with a two-tiered architecture in which the component that serves the messages is separate from the component that stores the messages. Pulsar's serving layer consists of one or more brokers which are entirely stateless. This means that no data is stored on the brokers themselves. Instead, data is stored in a separate layer based on another Apache project called BookKeeper. This design has several advantages, including the ability for any broker to serve requests for any topic in the system at any time. This allows for automatic rebalancing of load across the broker layer, among many other things. The stateless nature of the brokers also allows you to dynamically increase or decrease the number of brokers based on your current workload. Similarly, the BookKeeper layer consists of one or more nodes known as bookies, and storage can easily be expanded simply by adding new nodes to the cluster. Tying these two layers together is a metadata storage layer which keeps track of where the data is stored for each topic inside BookKeeper.
Pulsar's unique architecture enables it to provide several capabilities that distinguish it from other messaging systems. Separating the storage of messages from the serving of messages means you can offload the storage to other network accessible storage such as S3, which is not only cost effective, but also allows you to retain data for longer periods of time.
Pulsar's two-tiered design also enables elastic scalability on both tiers, so you can exploit Kubernetes features such as auto scaling to take full advantage of those capabilities. Last but not least, it was also developed to support georeplication and multitenancy from the start, and it provides stronger data durability guarantees by flushing all messages to disk before acknowledging them.
Next, let's take a look at how we logically structure data within a Pulsar cluster. As I mentioned earlier, Pulsar supports georeplication. In order for Pulsar clusters to georeplicate with one another, they must both belong to the same logical Pulsar instance. Within each Pulsar cluster, the first level of the hierarchy is tenants, as shown here in the green boxes. Each of these represents a different organizational unit and has its own separate administrator who can control who can and cannot access data within the tenant. Underneath each tenant exist namespaces, as shown here. These are logical groupings of topics that have similar policies, data storage requirements, data access requirements, and things of that nature. The concept of namespaces allows you to easily administer a Pulsar cluster. Within the namespaces themselves are multiple topics, as shown here. Topics, as we mentioned earlier, are the lowest-level messaging channel between producers and consumers; they store messages and pass them back and forth between producers and consumers.
Now let's take a look at the messaging semantics within Apache Pulsar. As you can see on the left, Apache Pulsar supports multiple different protocols and is the first messaging system to do so. This means that you can have an MQTT client, a RabbitMQ client, a Pulsar client, and a Kafka client all publishing to or consuming from the same Apache Pulsar topic. We achieve this through what's known as pluggable protocol handlers. This makes it a very flexible messaging system. On the right you can see the four subscription types supported by Apache Pulsar. Together they support either streaming or messaging delivery semantics, which makes Pulsar very versatile. At the top there are key_shared, failover, and exclusive, all of which provide the streaming semantics you're accustomed to when coming from an Apache Kafka system. These allow you to read data in order and process it in the order it was produced to the system. At the bottom you can see the shared subscription type, which supports a more traditional messaging consumption pattern, such as a work queue in which the messages are distributed across the consumers to get higher throughput on that particular topic.
Now that we've reviewed Apache Pulsar, let's explore why Go is a good fit for cloud native Pulsar application development. It is worth noting that Go itself has some advantages when it comes to cloud native development, including its efficiency and scalability, and it is already a popular choice for cloud native development due to its support for RESTful APIs along with several third party libraries and development frameworks. Apache Pulsar provides a Golang client library with a simple and intuitive API for interacting with the Pulsar cluster. In addition, Go is one of the three supported languages for Pulsar functions, which is a lightweight, serverless framework similar to AWS Lambdas.
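Before moving on, here is a minimal sketch (not the demo code) of how those four subscription types map onto the official Go client; the broker URL, topic, and subscription names are illustrative assumptions:

    package main

    import (
        "log"

        "github.com/apache/pulsar-client-go/pulsar"
    )

    func main() {
        // Connect to a local broker (assumed URL).
        client, err := pulsar.NewClient(pulsar.ClientOptions{URL: "pulsar://localhost:6650"})
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()

        // pulsar.Exclusive, pulsar.Failover, and pulsar.KeyShared give the
        // in-order streaming semantics described above; pulsar.Shared gives
        // work-queue style messaging semantics.
        consumer, err := client.Subscribe(pulsar.ConsumerOptions{
            Topic:            "persistent://public/default/orders", // illustrative topic
            SubscriptionName: "order-workers",                       // illustrative name
            Type:             pulsar.Shared,
        })
        if err != nil {
            log.Fatal(err)
        }
        defer consumer.Close()
    }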
Pulsar functions provide an easy way for you to develop stream processing applications with just a few lines of code and a configuration file. Even though I won't be covering Pulsar functions in this talk, they are something you may want to explore in the future.
As I mentioned previously, Apache Pulsar provides an officially supported Go client library that you can use to create producers, consumers, and readers. You can install the Pulsar library using the go get command as shown here. API documents are also available on the GoDoc page listed here.
We will begin the development process by making a directory called lets-go-pulsar. Next, we will change into that directory and create a subdirectory for the consumer and another one for the producer. I will then change into the producer directory and initialize it as a Go module using the go mod init command, giving it the name pulsar-go-producer. Next, we will fetch the Pulsar client library using the go get command we showed on the previous slide. This will download all the necessary binaries we need to use the Pulsar client. Next, we'll repeat the process for the consumer application: we'll change into the consumer directory, initialize that module using go mod init, and name the module pulsar-go-consumer. Finally, we'll use the go get command from our history to complete the process.
Now let's take a closer look at the code that we're writing. First, we'll start with the producer code, which is in the producer folder and starts with the main function called producer. The first step we take when connecting to a Pulsar cluster is to get the configuration. We then use this configuration to create what's called a Pulsar client. This client in turn can be used to create a producer, as we see here. We'll walk through the process of getting the client config and how we've made it dynamically configurable using a utility that lives in the util module. The Pulsar util module comes with a function called get clients, which takes in a set of client options. These client options are then passed to the constructor for the Pulsar client to create a new client object. We have preconfigured a method to return the client configuration that is passed into the get clients method, and made it configurable through a constant called pulsar broker URL, which gives us the endpoint of the broker we're going to connect to, along with some various configuration options that we've hard coded. Similarly, there's a function called get producer config, which we use to get the producer configuration based on a constant called producer topic. These constants are defined in a file called constants.go and are just string variables that map to property values that we're going to pass in via a properties file. So, for example, the pulsar broker URL constant is mapped to the property called client service URL. This property in turn is defined in our producer properties file, as you can see here. The client service URL points to a Pulsar cluster running in my local Kubernetes cluster, and for the producer we have the topic called public/default/purchases, which again follows the tenant, namespace, and topic name convention we saw earlier. Once we've created a client, we call create producer, again passing in these configuration values. In this case, we specify the topic name, and that's it for our logic.
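Here is a rough sketch of the producer setup just described, with the broker URL and topic reduced to hard-coded constants for brevity (the demo reads these values from a properties file via its util module; all names here are assumptions):

    package main

    import (
        "log"

        "github.com/apache/pulsar-client-go/pulsar"
    )

    // Assumed constants; the demo maps these to properties read from a file.
    const (
        pulsarBrokerURL = "pulsar://localhost:6650"               // client service URL property
        producerTopic   = "persistent://public/default/purchases" // tenant/namespace/topic
    )

    // getClientOptions mirrors the util function described above, returning
    // the options that are passed to the Pulsar client constructor.
    func getClientOptions() pulsar.ClientOptions {
        return pulsar.ClientOptions{URL: pulsarBrokerURL}
    }

    func main() {
        client, err := pulsar.NewClient(getClientOptions())
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()

        producer, err := client.CreateProducer(pulsar.ProducerOptions{Topic: producerTopic})
        if err != nil {
            log.Fatal(err)
        }
        defer producer.Close()
    }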
We loop, randomly selecting both a key and an item that is passed through onto the Pulsar topic. Then we use the producer's send method to send a Pulsar message, shown here. Its properties include a payload, which is a raw byte array of the data, in this case one of the items that has been purchased (a book, an alarm clock, et cetera), and a key, which is a string containing one of the usernames. Since this is an infinite loop, we continuously publish messages, and we sleep for 5 seconds in between publications just so we can see the messages come in at a slower pace.
Now let's turn our attention to the consumer application. Just like the producer application, the first thing you can see is that we import the Pulsar Go client. The first step we take is to get the Pulsar configuration and use that to create a Pulsar client. This client in turn can be used to subscribe to a topic, shown here, to start listening to inbound messages on that topic. We then create a message channel and continuously listen on it, and as each new message comes in, we print out a descriptive message with the name of the consumer that received it, a unique message ID, and both the key and the raw payload of the message content itself. Finally, when we're done printing out that content, we must acknowledge the message to let the broker know that we've received and successfully processed it so that it won't get redelivered. In the event of unsuccessful processing, we can also use what's called a negative acknowledgment to force the broker to redeliver the message.
The consumer also has a utility method, shown here, similar to what we saw in the producer, so I won't spend much time on it other than to point out that we've added a separate function called get consumer configuration, which uses two new constants: one for the consumer topic, the name of the topic we want to consume from, and one for a unique subscription name. These constants are defined here and map, as they did previously in the producer section, to two properties in the configuration we're going to pass in, making these properties dynamic. The consumer topic is again public/default/purchases, so we're listening to the same topic that we're producing to, and we've created a unique subscription name. That way, if the consumer ever gets disconnected and reconnects using the same subscription name, it will pick up immediately where it left off without losing any messages.
Now that we've reviewed the code, let's build and test it locally. We'll start by changing into the producer directory and using the go build command to build the producer application. In a different window, we'll change into the consumer directory and use go build to build the consumer application. Once these binaries have been created, the next step is to run them locally. We'll run the producer and we can see it connects and starts generating messages. It generates a first message and then another; batteries and then a gift card are produced. Now we'll go to the other window and start the consumer running locally. We can see that it read previous messages that had been published to the topic, but it most recently read the book and batteries messages that were sent, and as soon as the producer sent the alarm clock message shown at the top, it showed up at the bottom.
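Stepping back from the demo run for a moment, the publish loop and receive loop just walked through can be condensed into a single sketch like this (assumed URL, topic, and subscription names; the demo reads them from properties files and randomizes the payload and key):

    package main

    import (
        "context"
        "fmt"
        "log"
        "time"

        "github.com/apache/pulsar-client-go/pulsar"
    )

    func main() {
        client, err := pulsar.NewClient(pulsar.ClientOptions{URL: "pulsar://localhost:6650"})
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()

        producer, err := client.CreateProducer(pulsar.ProducerOptions{
            Topic: "persistent://public/default/purchases",
        })
        if err != nil {
            log.Fatal(err)
        }
        defer producer.Close()

        // Publish loop: the demo randomly picks an item and a user name;
        // here both are fixed for brevity.
        go func() {
            for {
                if _, err := producer.Send(context.Background(), &pulsar.ProducerMessage{
                    Payload: []byte("book"),
                    Key:     "david",
                }); err != nil {
                    log.Println(err)
                }
                time.Sleep(5 * time.Second)
            }
        }()

        // A durable, named subscription lets a reconnecting consumer resume
        // exactly where it left off.
        consumer, err := client.Subscribe(pulsar.ConsumerOptions{
            Topic:            "persistent://public/default/purchases",
            SubscriptionName: "purchases-sub",
        })
        if err != nil {
            log.Fatal(err)
        }
        defer consumer.Close()

        // Receive loop: print each message's ID, key, and payload, then ack it;
        // consumer.Nack(msg) would force the broker to redeliver instead.
        for cm := range consumer.Chan() {
            msg := cm.Message
            fmt.Printf("id=%v key=%s payload=%s\n", msg.ID(), msg.Key(), string(msg.Payload()))
            consumer.Ack(msg)
        }
    }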
Back to the demo: similarly, the gift card message shows up down here at the bottom as well. So we can see that the applications are in sync; they're sharing data across the same topic, which is great.
In order to deploy a cloud native application, we must first containerize it. Containers not only bundle the application along with all its dependencies into a single deployable unit, they also provide an isolated environment for the application to run in inside your cloud native environment. The most common technology used for containerization is Docker. However, this usually requires you to create and maintain a separate Dockerfile inside your codebase solely for the purpose of creating the container itself. In this talk, I'm going to use a different technology known as buildpacks that simplifies the container building process by eliminating the need for a Dockerfile entirely. Instead, buildpacks automatically detect the language and framework your application is using and will containerize it for you with a single command.
Before we start using the tool, I wanted to show you the website where you can get more information on buildpacks. If you go to buildpacks.io, you'll see a lot of information, including getting started videos, a detailed section on why you want to use cloud native buildpacks specifically, and a little historical reference on the project itself, which was designed in 2011 to solve a very critical problem. Now, in order to perform the next steps, you first have to have the buildpack software installed, which you can access through the start tutorial link, and it starts with the assumption that you have Docker installed. If you don't have Docker installed already, then please do so. Next, you can install the pack CLI, which is the tool used to run buildpacks and which we'll use in the next steps. As you can see, they have multiple installation methods for your particular OS distribution; whether you're using Linux, macOS, or Windows, there's a process for you. Since I'm using macOS and I have Homebrew installed, I chose that path, but please refer to the documentation for your operating system for details on how to get it installed.
So let's switch back to our console and put these buildpacks to work. We'll start by moving up a directory and using the pack tool to build the producer application. You call the pack build command, giving it the name of the Docker image you want it to build; in this case it's go-producer. You also have to specify the name of the builder you want to use; in this case it's the standard builder, which is tagged as a Docker image. As we shall see, it downloads this builder first to build the container itself. Last, we specify the path of the application we want to build, and it begins downloading the buildpack components. Similarly, we'll switch down to the consumer shell and run the same command to build the client application. In this case, we're specifying the name of the Docker image to be go-consumer. Everything else remains the same, including the builder. As you can see at the top, progress is being made on downloading the buildpack's builder Docker image itself, which is the logical component that does all the building. Again, we specify the path for the consumer, and it begins downloading and building that application as well.
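As a sketch, the two pack invocations look roughly like this; the builder shown is the Paketo base builder, one common choice, and it and the paths are assumptions rather than the demo's exact arguments:

    pack build go-producer --builder paketobuildpacks/builder:base --path ./producer
    pack build go-consumer --builder paketobuildpacks/builder:base --path ./consumer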
This process will go on for a bit and, depending on your download speed, will take some time to download all this information and get these buildpack binaries down and ready to build. We can see now that we're finally finishing the download of the buildpack images themselves, and there we can see that the newer images have been downloaded. Once the buildpacks have been downloaded, you can see that it analyzes and detects the type of package that we're trying to build. In this case it sees that it's Go and needs to download a Go runtime, a Go build path, et cetera, in order to build this application. So it goes out and fetches these using curl and internally runs the go build command, along with some caching information, to start building the Go binaries inside the cache itself. This process will go on for a little bit until it has successfully built the Go application, and when it is done building the Go binary, it will start building a Docker image around this particular application, as we've seen here. So let's wait a little bit for this to finish up. There, it's finally done. You can see it's finished the build: it's building the go-producer image shown at the top and the go-consumer image here at the bottom. Part of the caching layer it already had is built on the Go runtime image, which it reuses, along with a Go mod / Go path library image, before finally building the images.
go-producer and go-consumer are now built, and we can use these Docker images. We can test them locally with docker run. Slight typo here; let's retry it again with Docker, and we can run the producer locally. So now we're running the exact same application we built locally, but as a Docker container, and we can confirm that it works. Again we see the output showing that it has created a producer, connected to the same Pulsar broker, and is sending additional messages. By the message ID, you can see we're picking up where we left off. Let's also test the consumer Docker image. You can see we're in sync again; we're consuming the most recent messages, everything's up to date, and we can see the information coming through. So this is a great indication that our Docker images have been built successfully using the buildpacks library. We'll let these run for a little bit longer just to confirm everything's up and running, then kill them and move on.
Once the buildpack process has completed, you'll have two Docker images on your local machine. In order to use these images outside of your desktop environment, you'll need to push them to a container repository; we will walk through that process next. Let's return to our development shells. We'll go back to the shell where we created the go-producer Docker image and add a tag to it. We'll prepend it with my Docker account ID shown here and give it the same name, go-producer. Once I've tagged it, the next step is to push it up to the Docker Hub repository, which is why I've prepended my account name. Once that is done, we will repeat the process for the go-consumer Docker image that we've created.
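The tag-and-push sequence for both images is sketched below, with <account> standing in for your Docker Hub account ID:

    docker tag go-producer <account>/go-producer
    docker push <account>/go-producer
    docker tag go-consumer <account>/go-consumer
    docker push <account>/go-consumer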
Down below we can see that the producer image was successfully published to Docker Hub, so we're going to tag the consumer in a similar fashion, adding my Docker Hub account ID as a prefix, giving it the image name go-consumer, and then pushing it up to the Docker Hub repository, where we'll be able to access this image from our Kubernetes environment.
In order to deploy our cloud native application, we will create a deployment manifest that specifies the container images to use, the resources our application will require, et cetera. Since our application is configurable by properties, we will also use config maps and volume mounts as part of the application deployment. Let's take a look at this deployment manifest in detail. You can see that it is located in a separate folder called deployment inside the project itself, and the file is called k8-deployment. First we can see that it specifies the deployment template and uses the pulsar let's-go metadata label so we are able to identify this resource quickly. We'll notice that we're going to deploy the producer and consumer together, so there will be a total of two containers inside the single pod. You can see that we'll be using the images that we just tagged and pushed, and we'll also be using a separate mount path to get the properties that are dynamically configurable. As you'll recall from when we looked at the code, all of these properties map to labels, and this allows us to dynamically change things like the broker URL for the Pulsar cluster, the name of the topic we publish to, et cetera. We will access this information through a config map on a mount point. Here we specify some resource constraints; since our application is very light, we won't need very much. We also always want to pull the images, since they may not be available locally. We do a similar configuration for our consumer application. Again, we're going to use the image that we tagged and pushed previously, and we're going to have a mount path for the consumer properties. As you may recall from when we went through the code for the consumer, I showed the resource manager code, which assumes the consumer properties file exists at a known path; the resource manager reads that properties file, and we're making sure the mount maps to that path. This way it will automatically pick up these values and use whatever we want to change. Last but not least, we specify two different volume mounts: one for the consumer configuration, which maps to the consumer config map we'll create in a minute, and one for the producer configuration, with the key being the file name itself, consumer properties for the consumer config map and producer properties for the producer config map.
As I mentioned, our deployment will use Kubernetes objects known as config maps to store the application properties. This allows us to dynamically change the configuration of our application without modifying the code. We will mount the config maps as properties files in a known location so our application will be able to read them in the format it expects. Now let's walk through the process of creating the config maps from a property file. Let's return to our shell environment and use kubectl to create the config maps. Be sure that your kubeconfig is pointing to the proper Kubernetes environment.
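Before running those commands, here is a compact sketch of the kind of deployment manifest just walked through; every name, path, and limit here is an assumption for illustration, not the demo's actual file:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: pulsar-lets-go-deployment   # assumed name
      labels:
        app: pulsar-lets-go
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: pulsar-lets-go
      template:
        metadata:
          labels:
            app: pulsar-lets-go
        spec:
          containers:
            - name: producer
              image: <account>/go-producer   # tagged and pushed earlier
              imagePullPolicy: Always
              resources:
                limits:
                  cpu: 100m
                  memory: 128Mi
              volumeMounts:
                - name: producer-config
                  mountPath: /resources      # assumed path the code reads from
            - name: consumer
              image: <account>/go-consumer
              imagePullPolicy: Always
              volumeMounts:
                - name: consumer-config
                  mountPath: /resources
          volumes:
            - name: producer-config
              configMap:
                name: producer-config-map
            - name: consumer-config
              configMap:
                name: consumer-config-map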
We use the create configmap command, give it the name of the config map we want to create, and specify the file where all the properties exist for our producer. We'll get an indication that the config map was properly created. Let's switch to the consumer environment and run the same command. This time we'll create the config map named consumer config map and point it to the consumer directory where the properties files exist. Once they have both been successfully created, we can use kubectl to look at the config maps and verify that they have been created in the environment we want. There we can see that they're listed, the consumer config map and the producer config map, which matches the description in our deployment manifest file. Let's take a look at one of these config maps to get a better understanding of its contents. Here we can see that the consumer properties file is mapped as we expected. The consumer properties came over, including the consumer topic and subscription name, along with the client service URL to connect to the Pulsar cluster. All this information will be accessible from the key called consumer properties at runtime when our consumer connects.
Now that we've created the config maps, the final step is to deploy the application itself. We will do this by using the kubectl apply command and specifying the deployment manifest we looked at earlier. You can see that we've got an indication that the deployment was successfully created. Let's start looking at what has happened underneath the covers. First, we'll list the pods to see that they're just being created; there will be two containers, one for the producer and one for the consumer. We can verify that the deployment is listed by doing a kubectl get deployments. Next, let's describe the deployment by using the kubectl describe command along with the full name of the deployment itself, in this case the pulsar let's-go deployment. This returns a lot of information that we can look at, including details on the number of replicas, the labels we specified, the images that are going to be used for the producer and the consumer, the resource limits that we've requested, the mount points, et cetera. The image for the consumer and all the config maps are as expected: the consumer config map, the producer config map, et cetera.
Let's go back and look at the pods again. We can see that they're now both up and running, so let's explore what's going on inside. Let's describe the pod itself, which should list the two different containers. We'll grab the pod name, which is dynamically assigned, and copy and paste it in there. If you ever have issues deploying the application, you can look at the event logs here to see what's going on. We can see it deployed successfully: the config maps are mounted, the images are being used, and the containers were started. So both the producer and the consumer have been created and started, which is great. Now let's look at some logs and verify again through our command line that information is being displayed. We'll get the logs, specifying first the container for the let's go Pulsar producer to confirm that messages are being generated.
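For reference, the kubectl sequence from this section, sketched with assumed file and object names:

    kubectl create configmap producer-config-map --from-file=producer/producer.properties
    kubectl create configmap consumer-config-map --from-file=consumer/consumer.properties
    kubectl get configmaps
    kubectl apply -f deployment/k8-deployment.yaml
    kubectl get pods
    kubectl describe pod <pod-name>       # event log shows mount/start issues
    kubectl logs <pod-name> -c producer   # per-container logs
    kubectl logs <pod-name> -c consumer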
Back in the logs, we can see that the data is picking up where it left off, producing some additional information; every 5 seconds a new message is being published. Now let's change gears and look at the consumer container within the pod. We can see that it's receiving messages as well. This is a good indication that the application has been deployed and picked up all the configuration properties as we expected.
So let's summarize a few points that we've covered during this talk. First, Apache Pulsar is a messaging and event streaming platform that's designed for cloud native environments, and Go is a good fit for developing cloud native applications that use Pulsar, thanks to Apache Pulsar's Go client library. I also showed you that buildpacks are a great tool for containerizing your Go applications without the need to maintain a separate Dockerfile, and walked you through the process of packaging and deploying a cloud native Go application that interacts with Apache Pulsar. If you want to learn more, all the code for this demonstration is available at the GitHub repo shown here.
...

David Kjerrumgaard

Developer Advocate @ StreamNative
