Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello. My name is David Kjerrumgaard and I'm a developer advocate at StreamNative. I'm also a committer on the Apache Pulsar project and have over a decade of experience with event streaming and event-driven architecture. I'm also the author of two books, including Pulsar in Action from Manning Publications, which is available for free download at the link shown here.
If you like my talk and want to learn more about Apache Pulsar, I encourage
you to download a copy.
Let's start with a quick outline of the topics I'm going to cover during this
talk. I will start with a quick introduction to Apache Pulsar,
covering what it is and how it is different from other messaging and event
streaming systems you may have encountered in the past.
Next, I will explain why Go is well suited for developing cloud native applications that interact with Apache Pulsar. After discussing the why of Go and Pulsar, I will demonstrate how to develop cloud native Pulsar applications in Go.
And finally, after we've developed and tested the application locally,
I will walk you through the deployment process.
In this section, we will explore Apache Pulsar to give you a better understanding
of how it can be used inside your applications.
Developed at Yahoo in 2012, Apache Pulsar is a cloud
native messaging and event streaming platform.
First of all, it was architected to take full advantage of the capabilities unique to cloud environments, including elastic scalability, high availability, and redundant components. Secondly, it is the only platform that supports both traditional messaging semantics, like ActiveMQ or RabbitMQ, as well as event streaming semantics, like Apache Kafka. Apache Pulsar provides a publish-and-subscribe model for messaging that allows producers and consumers to exchange messages asynchronously. Producers and consumers are completely decoupled from one another and interact only with the Pulsar message broker, which acts as an intermediary.
Producers and consumers exchange messages via topics that
are used to store the messages until they are consumed.
Apache Pulsar is the only messaging system with a two-tiered architecture in which the component that serves the messages is separate from the component that stores the messages. Pulsar's serving layer consists of one or more brokers, which are entirely stateless; no data is stored on the brokers themselves. Instead, data is stored in a separate layer based on another Apache project called BookKeeper.
This design has several advantages, including the ability for any broker to
serve requests for any topic in the system at any time.
This allows for automatic rebalancing of load across the broker
layer, among many other things. The stateless nature of
the brokers also allows you to dynamically increase or decrease the number of
brokers based on your current workload.
Similarly, the BookKeeper layer consists of one or more nodes known as bookies. Storage can easily be expanded simply by adding new nodes to the cluster. Tying these two layers together is a metadata storage layer, which keeps track of where the data for each topic is stored inside BookKeeper.
Pulsar's unique architecture enables it to provide several capabilities that distinguish it from other messaging systems. Separating the storage of messages from the serving of messages means you can offload the storage to network-accessible storage such as S3, which is not only cost effective but also allows you to retain data for longer periods of time. Pulsar's two-tiered design also enables elastic scalability on both tiers, so you can exploit Kubernetes features such as autoscaling to take full advantage of those capabilities. Last but not least, it was also developed to support geo-replication and multi-tenancy from the start, and it provides stronger data durability guarantees by flushing all messages to disk before acknowledging them. Next, let's take a look at how we
logically structure data within a Pulsar cluster. As I mentioned earlier, Pulsar supports geo-replication. In order for Pulsar clusters to geo-replicate with one another, they must both belong to the same logical Pulsar instance. Within each Pulsar cluster, the first level of hierarchy is tenants, as shown here in the green boxes. Each of these represents a different organizational unit and has its own administrator who can control who can and cannot access the data within that tenant.
Underneath each tenant exist namespaces, as shown here. These are logical groupings of topics that have similar policies, data storage requirements, data access requirements, and things of that nature. The concept of namespaces makes it easy to administer a Pulsar cluster. Within the namespaces themselves are multiple topics, as shown here. Topics, as we mentioned earlier, are the lowest-level messaging channels between producers and consumers, storing messages and carrying them back and forth between producers and consumers.
Now let's take a look at the messaging semantics within Apache
Pulsar. As you can see on the left,
Apache Pulsar supports multiple different protocols and is the first messaging system to do so. This means that you can have an MQTT client, a RabbitMQ client, a Pulsar client, and a Kafka client all publishing to or consuming from the same Apache Pulsar topic. We achieve this through what are known as pluggable protocol handlers, which makes Pulsar a very flexible messaging system.
On the right you can see the four subscription types supported
by Apache Pulsar. These all support either
streaming or messaging delivery semantics, which makes it very
versatile. As you can see at the top, there are Key_Shared, Failover, and Exclusive, all of which provide the streaming semantics you're accustomed to when coming from an Apache Kafka system. These allow you to read data in order and process it in the order it was produced to the system. At the bottom, you can see the Shared subscription type, which supports a more traditional messaging consumption pattern, such as a work queue, in which each consumer is handed a subset of the messages to achieve higher throughput on that particular topic. Now that we've reviewed Apache Pulsar, let's explore why Go is a good fit for cloud native Pulsar application development. It is worth
noting that Go itself has some advantages when it comes to cloud native
development, including its efficiency and scalability, and it
is already a popular choice for cloud native development due to its support for RESTful APIs, along with several third-party libraries and development frameworks. Apache Pulsar provides a Go client library offering a simple and intuitive API for interacting with a Pulsar cluster.
In addition, Go is one of the three supported languages for Pulsar Functions, which is a lightweight, serverless framework similar to AWS Lambda. Pulsar Functions provide an easy way to develop stream processing applications with just a few lines of code and a configuration file. Even though I won't be covering Pulsar Functions in this talk, they're something you may want to explore in the future. As I mentioned previously,
Apache Pulsar provides an officially supported Go client library that
you can use to create producers, consumers, and readers.
You can install the Pulsar library using the go get command as shown here. API documentation is also available on the GoDoc page listed here.
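The install command for the official client is:

```sh
go get -u "github.com/apache/pulsar-client-go/pulsar"
```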
We will begin the development process by making a directory called lets-go-pulsar. Next, we will change into that directory and create a subdirectory for the consumer and another one for the producer. I will then change into the producer directory and initialize it as a Go module using the go mod init command, giving it the name pulsar-go-producer.
Next, we will fetch the Pulsar client library using the go get command shown on the previous slide. This will download all the packages we need for the Pulsar client. We'll then repeat the process for the consumer application: change into the consumer directory, initialize that module using go mod init, naming it pulsar-go-consumer, and finally use the go get command from our history to complete the process.
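Putting those setup steps together, the shell session looks roughly like this (the directory and module names are my rendering of what's spoken in the demo):

```sh
mkdir lets-go-pulsar && cd lets-go-pulsar
mkdir producer consumer

cd producer
go mod init pulsar-go-producer
go get -u "github.com/apache/pulsar-client-go/pulsar"

cd ../consumer
go mod init pulsar-go-consumer
go get -u "github.com/apache/pulsar-client-go/pulsar"
```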
Now let's take a closer look at the code we're writing. First, we'll start with the producer code, which is in the producer folder and begins with the main function of the producer.
The first step we take when connecting to a Pulsar cluster is to get the client configuration. We then use this configuration to create what's called a Pulsar client, and this client in turn can be used to create a producer, as we see here. We'll walk through the process of getting the client config and how we've made it dynamically configurable using a utility function that lives in the util module. The Pulsar util module comes with a function called get clients, which takes in a set of client options. These client options are in turn passed to the constructor for the Pulsar client to create a new client object. We have preconfigured a method to return the client configuration that is passed into the get clients method, and we've made it configurable through a constant called pulsar broker URL, which tells us the endpoint of the broker we're going to connect to, along with some other configuration options that we've hard coded.
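As a minimal sketch of that pattern (the helper name and hard-coded URL here are illustrative; in the demo the URL comes from a properties file):

```go
package main

import (
	"log"

	"github.com/apache/pulsar-client-go/pulsar"
)

// getClient mirrors the util function described above: it assembles the
// client options and hands them to the Pulsar client constructor.
func getClient(serviceURL string) (pulsar.Client, error) {
	return pulsar.NewClient(pulsar.ClientOptions{
		URL: serviceURL, // e.g. "pulsar://localhost:6650"
	})
}

func main() {
	client, err := getClient("pulsar://localhost:6650")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()
}
```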
Similarly, there's a function called get producer config, which we will use to get the producer configuration based on a constant called producer topic. These constants are defined in a file called constants.go and are just string variables that map to property values we pass in through a properties file. So, for example, the pulsar broker URL constant maps to the property called client service URL. This property in turn is defined in our producer properties file, as you can see here. The client service URL points to a Pulsar cluster running in my local Kubernetes cluster, and for the producer we have the topic public/default/purchases, which follows the tenant, namespace, and topic naming we saw earlier.
Once we've created a client, we call create producer, again using those configuration values; in this case, we specify the topic name. And that's it for our setup. The rest of our logic loops through randomly selecting both a data value for a key and an item that is passed through onto the Pulsar topic. Then we use the producer's send method to send a Pulsar message, shown here. Its properties include a payload, which is a raw byte array of the data (in this case, one of the items that has been purchased: a book, an alarm clock, et cetera) and a key, which is a string containing one of the usernames.
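A self-contained sketch of this producer logic might look like the following; the broker URL and the sample users and items are illustrative stand-ins for the values the demo reads from its properties file:

```go
package main

import (
	"context"
	"log"
	"math/rand"
	"time"

	"github.com/apache/pulsar-client-go/pulsar"
)

func main() {
	client, err := pulsar.NewClient(pulsar.ClientOptions{
		URL: "pulsar://localhost:6650", // from the client service URL property
	})
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	producer, err := client.CreateProducer(pulsar.ProducerOptions{
		Topic: "persistent://public/default/purchases", // from the producer topic property
	})
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()

	users := []string{"alice", "bob", "carol"}
	items := []string{"book", "alarm clock", "batteries", "gift card"}

	for {
		// Randomly pick a username for the key and an item for the payload.
		_, err := producer.Send(context.Background(), &pulsar.ProducerMessage{
			Key:     users[rand.Intn(len(users))],
			Payload: []byte(items[rand.Intn(len(items))]),
		})
		if err != nil {
			log.Println("send failed:", err)
		}
		time.Sleep(5 * time.Second) // pace the stream so we can watch it
	}
}
```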
Since this is an infinite loop, we continuously publish messages, sleeping for 5 seconds between publications just so we can watch the messages come in at a slower pace. Now let's turn our attention to the consumer application.
Just like in the producer application, the first thing you can see is that we import the Pulsar Go client. The first step we take is to get the Pulsar configuration and use that to create a Pulsar client. This client in turn can be used to subscribe to a topic, shown here, to start listening for inbound messages on that topic. We then create a message channel and continuously listen on that channel; as each new message comes in, we print out a descriptive line with the name of the consumer that received it and a unique message ID, and we display both the key and the raw payload of the message content itself.
Finally, when we're done printing that content, we must acknowledge the message to let the broker know that we've received and successfully processed it, so that it won't get redelivered. In the event of unsuccessful processing, we can instead use what's called a negative acknowledgment to force the broker to redeliver the message.
The consumer application also has a utility module, shown here, similar to what we saw in the producer, so I won't spend much time on it other than to point out that we've added a separate function called get consumer configuration, which includes two new constants: one for the consumer topic, the name of the topic we want to consume from, and one for a unique subscription name. These constants are defined here and, as in the producer section, map to two properties in the configuration file we pass in, making these properties dynamic. The consumer topic is again public/default/purchases, so we're listening on the same topic we're producing to, and we've created a unique subscription name. That way, if the consumer ever gets disconnected and reconnects using the same subscription name, it will pick up immediately where it left off without losing any messages.
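A matching sketch of the consumer, again with illustrative values for the broker URL, topic, subscription name, and channel size:

```go
package main

import (
	"fmt"
	"log"

	"github.com/apache/pulsar-client-go/pulsar"
)

func main() {
	client, err := pulsar.NewClient(pulsar.ClientOptions{
		URL: "pulsar://localhost:6650",
	})
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Buffered channel the client pushes inbound messages onto.
	channel := make(chan pulsar.ConsumerMessage, 100)

	consumer, err := client.Subscribe(pulsar.ConsumerOptions{
		Topic:            "persistent://public/default/purchases",
		SubscriptionName: "my-subscription", // durable: reconnects resume here
		MessageChannel:   channel,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer consumer.Close()

	for cm := range channel {
		msg := cm.Message
		fmt.Printf("Received message id=%v key=%s payload=%s\n",
			msg.ID(), msg.Key(), string(msg.Payload()))
		consumer.Ack(msg) // or consumer.Nack(msg) to force redelivery
	}
}
```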
Now that we've reviewed the code, let's build and test it locally. We'll start by changing into the producer directory and using the go build command to build the producer application. In a different window, we'll change into the consumer directory and use go build to build the consumer application.
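The build steps are just the standard Go toolchain commands (the binary names follow the module names we chose earlier):

```sh
cd producer && go build      # emits the pulsar-go-producer binary
cd ../consumer && go build   # emits the pulsar-go-consumer binary
```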
Once these binaries have been created, the next step is to run them locally. We'll run the producer and see that it connects and starts generating messages: it generates a first message, then another, and the batteries and gift card messages are produced.
Now we'll go to the other window and start the consumer locally. We can see that it reads previous messages that had been published to the topic, most recently the book and batteries messages. Then, as soon as the producer sends the alarm clock message shown at the top, it shows up at the bottom, and similarly the gift card message shows up at the bottom as well. So we can see that the applications are in sync, sharing data across the same topic, which is great.
In order to deploy a cloud native application, we must first containerize it. Containers not only bundle the application along with all its dependencies into a single deployable unit, they also provide an isolated environment for the application to run in inside your cloud native environment.
The most common technology used for containerization is Docker. However, this usually requires you to create and maintain a separate Dockerfile inside your codebase solely for the purpose of creating the container itself. In this talk, I'm going to use a different technology known as buildpacks, which simplifies the container building process by eliminating the need for a Dockerfile entirely. Instead, buildpacks automatically detect the language and framework your application is using and containerize it for you with a single command. Before we start
using the tool, I wanted to show you the website where you can get more information on the buildpacks tool itself. If you go to buildpacks.io, you'll see a lot of information, including getting-started videos, a detailed section on why you'd want to use Cloud Native Buildpacks specifically, and a little historical reference on the project itself, which began in 2011 to solve a very critical problem.
Now, in order to perform the next steps, you must first have the buildpack software installed, which you can access through the start tutorial link. The tutorial starts with the assumption that you have Docker installed; if you don't have Docker installed already, please do so. Next, you can install the pack CLI, which is the tool used to build the containers and which we'll use in the next steps. As you can see, there are multiple installation methods for your particular OS distribution; whether you're using Linux, macOS, or Windows, there's a process for you. Since I'm using macOS and have Homebrew installed, I chose that path, but again, please refer to the documentation for your operating system for details on how to get it installed.
So let's switch back to our console and put these buildpacks to work. We'll start by moving up a directory and using the pack tool to build the producer application.
The command invokes pack with the build subcommand, giving it the name of the Docker image we want it to build; in this case, it's go-producer. You also have to specify the name of the builder you want to use; in this case, it's the standard builder, version one, which is tagged as a Docker image. As we shall see, pack downloads this builder first in order to build the container itself. Last, we specify the path of the application we want to build, and it begins downloading the buildpack components. Similarly, we'll switch down to the consumer shell and run the same command to build the consumer application, this time specifying the image name go-consumer. Everything else remains the same, including the builder. As you can see at the top, progress is being made on downloading the buildpacks builder Docker image itself, which is the logical component that does all the building. Again, we specify the path for the consumer, and it begins downloading and building that application as well.
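The two invocations look roughly like this; the builder image is my reading of the "standard builder, version one" used in the demo, and any recent Go-capable builder should work:

```sh
pack build go-producer --builder gcr.io/buildpacks/builder:v1 --path ./producer
pack build go-consumer --builder gcr.io/buildpacks/builder:v1 --path ./consumer
```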
This process will go on for a bit and, depending on your download speed, will take some time to pull down all of the buildpack images and get them ready to build. Now we can see that we're finally finishing the download of the buildpack images themselves, and the updated images have been downloaded.
Once the buildpacks have been downloaded, you can see that pack analyzes the project and detects the type of package we're trying to build. In this case, it sees that it's Go and needs to download a Go runtime, the Go build tooling, et cetera, in order to build this application. So it fetches these using curl and internally runs the go build command, along with some caching logic, to build the Go binaries inside the cache. This process will go on for a little while until it has successfully built the Go application, and when it is done building the Go binary, it will start building a Docker image around this particular application, as we see here. So let's wait a little bit for this to finish up.
There, it's finally done. You can see it has finished the build and is now assembling the final images: at the top it's building the go-producer image, and at the bottom the go-consumer image. Part of the caching layer it already had is based on the Go runtime image, which it reuses, along with a cached Go modules layer. Finally, the go-producer and go-consumer images are built.
Now that we have these Docker images, we can test them locally by running them with Docker. There's a slight typo here; let's retry with docker run, and now the producer is running locally. We're running the exact same application we built earlier, but inside a Docker container, and we can confirm that it works. Again, we see the output showing that a producer was created, that we've connected to the same Pulsar broker, and that we're sending additional messages; by the message IDs, you can see we're picking up where we left off.
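The smoke test itself is just a docker run of the freshly built image (depending on how your broker is exposed, you may also need to adjust the container's network settings):

```sh
docker run --rm go-producer
```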
Let's also test the Docker consumer image as well. You can see we're in sync again: we're consuming the most recent messages, everything is up to date, and the information is coming through. This is a great indication that our Docker images were built successfully using buildpacks.
We'll let these run for a little bit longer just to confirm everything's up
and running. Now let's kill them and move on.
Once the buildpack process has completed, you'll have two Docker images on your local machine. In order to use these images outside of your desktop environment, you'll need to push them to a container registry. We will walk through that process next. Let's return to our development shells. We'll go back to the shell where we created the go-producer Docker image and add a tag to it. We'll prepend my Docker Hub account ID, shown here, and give it the same name, go-producer. Once I've tagged it, the next step is to push it up to the Docker Hub repository, which is why I've prepended my account name.
Once that is done, we will repeat the process for the go-consumer Docker image we created. Down below, we can see that the producer image was successfully published to Docker Hub, so we tag the consumer in a similar fashion, adding my Docker Hub account ID as a prefix and giving it the image name go-consumer, and then push it up to the Docker Hub repository, where we'll be able to access these images from our Kubernetes environment.
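The tag-and-push sequence looks like this, with <dockerhub-account> standing in for your own Docker Hub account ID:

```sh
docker tag go-producer <dockerhub-account>/go-producer
docker push <dockerhub-account>/go-producer

docker tag go-consumer <dockerhub-account>/go-consumer
docker push <dockerhub-account>/go-consumer
```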
In order to deploy our cloud native application, we will create a deployment manifest that specifies the container images to use, the resources our application will require, et cetera. Since our application is configurable via properties, we will also use config maps and volume mounts as part of the application deployment.
Let's take a look at this deployment manifest in detail. You can see that it is located in a separate folder called deployment inside the project itself, and the file is called k8s-deployment. First, we can see that it specifies a Deployment with a pod template that uses the pulsar-lets-go metadata label, so we are able to identify this resource quickly.
You'll notice that we're going to deploy the producer and consumer together, so there will be a total of two containers inside a single pod. You can see that we'll be using the images we just tagged and pushed, and we'll also be using a mount path to supply the dynamically configurable properties. As you recall from when we looked at the code, all of these properties map to keys, and this allows us to dynamically change things like the broker URL for the Pulsar cluster, the name of the topic we publish to, et cetera. We will access this information through a config map on a mount point. Here we specify some resource constraints; since our application is very light, we won't need much. We also set the pull policy to always pull the images, since they won't be available locally.
We do a similar configuration for our consumer application. Again, we're going to use the image that we tagged and pushed previously, and we'll have a mount path for the consumer properties file. As you may recall from the consumer code, the resource manager reads its properties from resources/consumer properties, so we make sure the mount maps to that location. This way, it will automatically pick up whatever values we want to change. Last but not least, we specify two different volumes, one for the consumer configuration, which maps to the consumer config map we'll create in a minute, and one for the producer configuration, with the key being the file name itself: consumer properties for the consumer config map and producer properties for the producer config map.
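A trimmed sketch of what such a manifest might look like; all of the names, labels, and mount paths here are illustrative reconstructions of what's described above, not the demo's exact file:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pulsar-lets-go-deployment
  labels:
    app: pulsar-lets-go
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pulsar-lets-go
  template:
    metadata:
      labels:
        app: pulsar-lets-go
    spec:
      containers:
        - name: producer
          image: <dockerhub-account>/go-producer
          imagePullPolicy: Always
          volumeMounts:
            - name: producer-config
              mountPath: /resources  # where the app expects producer.properties
        - name: consumer
          image: <dockerhub-account>/go-consumer
          imagePullPolicy: Always
          volumeMounts:
            - name: consumer-config
              mountPath: /resources  # where the app expects consumer.properties
      volumes:
        - name: producer-config
          configMap:
            name: producer-config-map
        - name: consumer-config
          configMap:
            name: consumer-config-map
```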
As I mentioned,
our deployment will use Kubernetes objects known as config maps to store
the application properties. This allows us to dynamically change the configuration
of our application without modifying the code.
We will mount the config maps as properties files in a known location so
our application will be able to read them in the format it expects.
Now let's walk through the process of creating the config maps from a properties file. Let's return to our shell environment and use kubectl to create the config maps; be sure that your kubeconfig is pointing to the proper Kubernetes environment. We use the create configmap command, give it the name of the config map we want to create, and specify the file where all the properties for our producer exist. We'll get an indication that the config map was properly created.
Let's switch to the consumer environment and run the same command. This time we'll create the config map named consumer config map and point it at the consumer directory where all the properties files exist.
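The two commands look like this; the config map names and file paths are illustrative:

```sh
kubectl create configmap producer-config-map \
  --from-file=producer/resources/producer.properties
kubectl create configmap consumer-config-map \
  --from-file=consumer/resources/consumer.properties
```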
Once they have both been successfully created, we can use kubectl to look at the config maps and verify that they exist in the environment we expect.
There we can see that they're listed, the consumer config map and the producer config map, which matches the references in our deployment manifest file. Let's take a look at one of these config maps to get a better understanding of their contents.
Here we can see that the consumer properties file is mapped as we expected. The consumer properties came over, including the consumer topic and subscription name, along with the client service URL used to connect to the Pulsar cluster. All this information will be accessible from the key called consumer properties at runtime when our consumer connects. Now that we've created the config maps,
the final step is to deploy the application itself.
We will do this by using the kubectl apply command and specifying the deployment manifest we looked at earlier. You can see that we've got an indication that the deployment was successfully created.
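The deploy-and-verify sequence is roughly:

```sh
kubectl apply -f deployment/k8s-deployment.yaml   # manifest path as described above
kubectl get deployments
kubectl get pods
```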
Let's start looking at what has happened underneath the covers. First, we'll list the pods to see that they're just being created; there will be two containers, one for the producer and one for the consumer. We can verify that the deployment is listed by doing a kubectl get deployments. Next, let's describe the deployment by running the kubectl describe command along with the full name of the deployment itself; in this case, it's pulsar-lets-go-deployment.
This returns a lot of information that we can look at, including details on the number of replicas, the labels we specified, the images that will be used for the producer and the consumer, the resource limits we've requested, the mount points, et cetera, along with all the config maps as expected: the consumer config map, the producer config map, and so on.
Let's go back and look at the pods again. We can see that they're both up and running now, so let's explore what's going on inside by describing the pod itself, which should list the two different containers. Let's grab the pod name, which is dynamically assigned, copy it, and paste it into the describe command. If you ever have issues deploying the application, you can look at the event log shown here to see what's going on. We can see it deployed successfully: the config maps are mounted, the images are being used, and both the producer and the consumer containers were created and started.
So that's great. Now let's look at some logs and verify through the command line that information is being displayed. We'll get the logs, specifying first the container for the producer, to confirm that messages are being generated. We can watch the data picking up where it left off, producing additional messages; every 5 seconds a new message is published. Now let's change gears and look at the consumer container inside the same pod. We can see that it's receiving messages as well. This is a good indication that the application has been deployed and picked up all the configuration properties as we expected.
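The log commands look like this, with <pod-name> standing in for the dynamically assigned pod name and the container names matching the manifest sketch above:

```sh
kubectl logs <pod-name> -c producer
kubectl logs <pod-name> -c consumer
```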
So let's summarize a few points that we've covered during this talk. First, Apache Pulsar is a cloud native messaging and event streaming platform designed for cloud native environments, and Go is a good fit for developing cloud native applications that use Pulsar, thanks to Apache Pulsar's Go client library. I also showed you that buildpacks are a great tool for containerizing your Go applications without the need to maintain a separate Dockerfile, and walked you through the process of packaging and deploying a cloud native Go application that interacts with Apache Pulsar. If you want to learn more, all the code for this demonstration is available at the GitHub repo shown here.