Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hey there, I'm Rizomatu, a backend engineer, and today I'll be
walking us through distributed caching and queuing in the cloud.
Let's get started.
So for folks who are new to all of this, a cache is basically
a high speed storage layer that temporarily stores frequently
accessed data to improve performance and reduce latency.
So instead of going to a slow source, let's say for instance a database,
file system, or a standard API, every time you request data that you know
is frequently accessed, you can basically store a copy of that data
in a cache so that future requests can be served much faster.
This is the typical setup for most applications, right?
You have a client that makes a request to the server, which then makes a
request to a database to fetch data, which is then sent back to the client.
So this works just fine for most applications, but on high traffic
systems, this doesn't scale well.
If it's not obvious already, the issue with this kind of setup is that, over time,
the database becomes a bottleneck as all client requests end up hitting the database.
This is where we can leverage a cache to scale this solution, right?
So basically, instead of the server reaching out to the database each time
for data that we know is frequently accessed, the server can just
reach out to the cache to say, hey, is this record in the cache?
If the record is available in the cache, the server just returns it
to the client from the cache.
But in the case where the record isn't available in the cache, the server makes
a request to the database to fetch the data, writes it back to the cache, then
returns it to the client so that future requests can be served from the cache.
This way we can improve the read performance for client
requests to the server.
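To make that flow concrete, here's a minimal sketch of the cache-aside pattern in Python with the redis-py client. The key format and the fetch_user_from_db helper are illustrative assumptions, not something from the talk.

```python
import json
import redis

cache = redis.Redis(host="localhost", port=6379)

def get_user(user_id, fetch_user_from_db, ttl_seconds=300):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        # Cache hit: serve straight from memory, skipping the database.
        return json.loads(cached)
    # Cache miss: go to the database, then write the result back to the
    # cache so future requests for this user are served from Redis.
    user = fetch_user_from_db(user_id)  # assumed database helper
    cache.set(key, json.dumps(user), ex=ttl_seconds)
    return user
```

The TTL here is one simple way to keep cached copies from going stale forever; how long it should be depends on how fresh the data needs to be.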
So if you're wondering why retrieving data from the cache is faster
than retrieving it from the database, it's simply because data in the
cache is usually written to faster storage, like memory, unlike data in
a database, which is written to disk.
So most times, as the application scales and grows with time, a single caching
instance isn't enough to manage the load you have or your caching demand.
This is where distributed caching comes in.
Basically, distributed caching spreads cache data across multiple nodes, instead
of storing it on a single node or machine.
This ensures scalability, fault tolerance, and prevents cache overload.
There are different strategies out there for distributed caching.
We have sharding, replication, load balancing, consistent
hashing, and some others out there.
But for now, I'll just focus on the first three on this list because of time.
Consistent hashing is a bit complicated and might take us a lot of time.
So what's sharding?
Sharding is a data partitioning technique that distributes cache data
across multiple nodes instead of storing everything on a single machine.
In a distributed caching environment, sharding ensures scalability and
efficient memory usage by dividing the cache data across multiple instances.
So on this diagram, as you can see, we have a cache router, which
is basically responsible for assigning a cache request to the right node.
In this case, we have a range-based sharding setup, which means that we assign
requests to nodes based on the key range.
Keys in the range 1 to 1,000 are assigned to node A,
1,000 to 2,000 to node B, and so on.
So basically what this does is split the cache data across each
of these nodes, saving only a range of these key-value pairs on a single node.
This way we can scale out our caching system to support more memory
or storage space, basically.
So let's say we have a constraint of five gigabytes per node.
By adding two extra nodes, we're able to increase that limit to 15 gigabytes,
which comes in very handy for scaling.
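Here's a rough sketch of what a range-based cache router like the one in the diagram could look like; the node addresses and key ranges are placeholder assumptions.

```python
# Key ranges mapped to cache nodes, mirroring the diagram above.
SHARDS = [
    ((1, 1000), "cache-node-a:6379"),
    ((1001, 2000), "cache-node-b:6379"),
    ((2001, 3000), "cache-node-c:6379"),
]

def route(key: int) -> str:
    """Return the node responsible for the range this key falls into."""
    for (low, high), node in SHARDS:
        if low <= key <= high:
            return node
    raise KeyError(f"no shard configured for key {key}")

print(route(42))    # cache-node-a:6379
print(route(1500))  # cache-node-b:6379
```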
Up next, we have replication.
Replication is a data duplication technique that copies cache data
across multiple nodes to ensure high availability, fault tolerance,
and improved read performance.
It allows cache systems to continue functioning even
if some of the nodes fail.
So basically, no single node is a bottleneck in most replication setups.
So in this case, what we have in this diagram is a geo-replication setup, right?
We have a cache router that is responsible for assigning a cache
request to the right node.
Here we have primary node A, which is basically the node
responsible for write requests.
This is a primary-secondary setup.
So the primary node is responsible for write requests and for
synchronizing the data across the secondary nodes, while the secondary
nodes are just there to serve and optimize read requests.
So in this case, we're trying to optimize for latency, right?
What happens is that the cache router assigns a node to a cache request
based on which node is closer to the backend or the user making the request.
So if a user is making a request from Africa, we simply route the cache
request to the cache node in Africa.
If they're making a request from Europe, we do the same thing.
So basically it's a way to reduce latency using replication.
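A tiny sketch of that routing decision might look like this; the region names and node addresses are made up for illustration, and a real cache router would also handle failover.

```python
PRIMARY = "cache-primary:6379"          # handles all writes
REPLICAS = {                            # read replicas per region
    "africa": "cache-replica-af:6379",
    "europe": "cache-replica-eu:6379",
    "america": "cache-replica-us:6379",
}

def pick_node(operation: str, user_region: str) -> str:
    if operation == "write":
        # Writes go to the primary, which syncs to the secondaries.
        return PRIMARY
    # Reads go to the replica closest to the user to cut latency,
    # falling back to the primary if no replica covers that region.
    return REPLICAS.get(user_region, PRIMARY)

print(pick_node("read", "africa"))   # cache-replica-af:6379
print(pick_node("write", "europe"))  # cache-primary:6379
```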
Up next, we have load balancing.
Load balancing evenly distributes cache requests across multiple cache nodes
to prevent overloading a single node.
This improves response time and ensures high availability.
It also ensures that no single cache node becomes a bottleneck in the system.
In this diagram we can see that we have a cache load balancer,
which is responsible for routing requests to the correct node.
The data is then synchronized across the other nodes, right?
Let's say we make a write request that adds a cache value to node A. The data gets
saved to node A, then synchronized to B and C, just to ensure that if
a read request is routed to any of the other nodes, the cache data will
be available there for retrieval.
So basically, the way most load balancing systems work is that they use
one of several algorithms out there, but a very
popular one is round robin, right?
Basically, what the load balancer does is that for each request,
it just cycles from one node to the next.
Let's say we have five requests to our caching load balancer, right?
The first request goes to node A, the second one goes to node B, the
third one to node C, the fourth one goes back again to node A, and so forth.
So that's how round robin works in a nutshell.
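In code, round robin can be as simple as cycling over the node list; this is just a sketch with made-up node names.

```python
import itertools

nodes = ["node-a", "node-b", "node-c"]
rotation = itertools.cycle(nodes)  # repeats A, B, C, A, B, C, ...

for request_number in range(1, 6):
    print(f"request {request_number} -> {next(rotation)}")
# request 1 -> node-a
# request 2 -> node-b
# request 3 -> node-c
# request 4 -> node-a
# request 5 -> node-b
```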
Before I proceed, I would like to mention that the caching
strategies I've mentioned can either be implemented on the application
layer or come with the caching system itself.
It's highly recommended to go with a caching strategy implemented by
the caching system you're using, right?
If you're using Redis, for instance, Redis Cluster has sharding and replication
out of the box, which means you don't have to implement this yourself;
you just have to configure your Redis Cluster to replicate or shard
your data based on your requirements.
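For example, with redis-py's cluster client (available in recent redis-py versions), the application code stays almost unchanged; the client and the cluster handle slotting keys onto the right shard. The host and port below are placeholders, assuming a Redis Cluster is already running.

```python
from redis.cluster import RedisCluster

# Connect to any node in the cluster; the client discovers the rest.
rc = RedisCluster(host="localhost", port=7000)

# The client hashes the key to a slot and sends the command to the
# node that owns that slot; replication is configured on the cluster,
# not in application code.
rc.set("user:42", "Ada")
print(rc.get("user:42"))
```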
So these are some of the popular caching systems out there.
The most popular ones are Redis and Memcached, right?
So Redis is an open source, in-memory data store.
It also offers advanced data structures, replication, and sharding, and is widely
adopted for high performance caching.
Redis is quite popular; when a lot of folks hear caching, the first
thing that comes to their mind is Redis.
Then we have Memcached, which is basically a simple key-value
store that is very fast for lookups.
Some cloud providers out there do offer their own solutions, right?
For instance, AWS has Amazon ElastiCache, Google Cloud has
Memorystore, and Azure has Azure Cache for Redis.
These are similar to Redis Cloud; they're fully managed solutions.
So basically, let's say you use Amazon ElastiCache.
This means that you don't have to bother about the little details of managing your
Redis cluster or instance or whatever.
Same thing with Google Cloud Memorystore: this is managed for you.
You just have to configure how you want to scale your caching
system or your caching components.
Alternatively, you can go with the self-hosted approach, where you
purchase a server online, run Redis or Memcached on it, and connect
it to your backend system.
This can be a bit painful to manage, but if it meets
your needs, it's also an option.
So that is it for caching.
What is a queue?
A queue is a messaging system that enables asynchronous communication
between components by temporarily storing messages until they are processed.
Most times in your application or your backend system, right,
you have different services that need to communicate asynchronously
to carry out different tasks.
For instance, let's say you have an e-commerce website
where you let users place an order online and you send them
a receipt to their email after placing an order.
So basically, when a user sends a request to place an order, you don't
really want to send them the receipt immediately, as it can be computationally
expensive to compile the receipt to PDF and also send them a mail for it.
So one thing you could do is have a separate component or
service in charge of sending mails and PDFs to users;
then your main backend system sends a message to
the email component to compile the mail and send it to them.
This way you separate the processing, keeping things
asynchronous and ensuring that user requests are not delayed
by having them wait for you to compile PDFs and emails before showing them a
success message on your API or in a UI.
In a typical event-driven setup, we have a producer,
which is responsible for publishing events or tasks through a message queue.
Then you have the message queue itself.
Basically, what a message queue does is hold the tasks until there's
an available consumer to dispatch them to.
Most of them usually have a first-in, first-out approach, although
this ordering isn't always guaranteed.
So just keep that in mind.
And you have a consumer, which is basically a component or system
that picks up events from the message queue, or receives events from
the message queue, and works on them.
In the e-commerce receipts example, right,
you have a producer, maybe the order service, that sends a message
to the queue, then it gets consumed by your email or messaging service,
basically to send the user the receipt.
The way most consumers work is that they subscribe to topics or
events on the message queue, saying, okay, I'm interested in this topic.
So when this kind of topic comes across, the queue sends it to the right consumer.
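Here's a minimal in-process sketch of that producer/queue/consumer flow using only Python's standard library; in a real system the queue would be a broker like RabbitMQ or Kafka rather than queue.Queue, and the service names are assumptions.

```python
import queue
import threading

order_events = queue.Queue()  # stands in for the message queue

def order_service(order_id: str) -> None:
    # Producer: accept the order and publish an event, then return
    # immediately so the user isn't kept waiting on the receipt.
    order_events.put({"order_id": order_id, "email": "user@example.com"})
    print(f"order {order_id} accepted")

def email_service() -> None:
    # Consumer: pick events off the queue and do the slow work
    # (compile the receipt PDF, send the email) asynchronously.
    while True:
        event = order_events.get()
        print(f"sending receipt for {event['order_id']} to {event['email']}")
        order_events.task_done()

threading.Thread(target=email_service, daemon=True).start()
order_service("A-1001")
order_events.join()  # wait until the receipt event has been handled
```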
As the demand on your queuing system grows, a single queue becomes a
bottleneck, right, if all messages are routed through that single queue.
Distributed queuing addresses these challenges by spreading traffic
across multiple queue partitions.
This allows for high volume asynchronous processing, and also
enables multiple producers and consumers to operate concurrently,
ensuring scalability and resilience under high load, basically.
So this is a quick diagram just explaining what you could expect, or
how distributed queuing works, right?
In this case, you have multiple producers.
Mind you, even in the previous case of a single message queue,
you can have multiple producers and multiple consumers as well;
nothing stops that.
So the most important thing here is the broker or load balancer, right?
This part of the system is basically responsible for taking events from
the producers and routing them to the right partition.
In this case, we have six partitions, right?
Basically, this is a scaled solution saying, okay, we have this number of
partitions to accept as many messages from the producers as we need.
So if the demand grows, you simply add more partitions to the broker,
and the broker is in charge of allocating which events get sent to a
particular partition and determining how they get sent to a consumer as well.
This way we can scale our queuing components as much as we want by
simply adding more partitions, or removing partitions if the load drops.
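As a rough illustration of what a broker might do under the hood, one common approach is hashing a message key so that messages with the same key always land on the same partition; this sketch assumes six partitions like the diagram and isn't how any specific broker implements it.

```python
import hashlib

NUM_PARTITIONS = 6  # matches the six partitions in the diagram

def assign_partition(message_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Hash the key and take it modulo the partition count, so the same
    # key is always routed to the same partition.
    digest = hashlib.md5(message_key.encode()).hexdigest()
    return int(digest, 16) % num_partitions

for key in ["order-1", "order-2", "order-3"]:
    print(key, "-> partition", assign_partition(key))
```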
These are some of the popular queuing systems out there.
We have Apache Kafka, which is a high throughput distributed streaming
platform designed for real time event streaming, and it's widely adopted for
its scalability and fault tolerance.
Then you also have RabbitMQ, which is quite mature as well and supports
complex routing and different protocols.
It's a very popular queuing system as well.
Then you have NATS, which is a lightweight, high performance messaging
system that's quite popular in cloud native applications, as it offers
simplicity and low latency.
I would like to mention that the distributed queuing strategy, or
managing partitions, is usually done by the broker, which in this
case is RabbitMQ or Apache Kafka.
So most times you don't have to go into the little details
of managing this yourself.
But in a worst case scenario where this doesn't meet your needs, you
can implement it on the application layer, or create a service
for managing your distributed queuing strategy.
So it's up to you, but it's best to go with the strategy or the
implementation that comes with your broker or queue load balancer.
Different cloud providers do offer a queuing service as well.
AWS has Amazon SQS, Google Cloud has Pub/Sub, and Azure has Service Bus.
Alternatively, you could run Kafka, RabbitMQ, or NATS on a
server you purchased online and connect your application to it.
And yeah, you could have that as well, although, similarly to caching,
self-hosting could be a pain to manage and scale, but it is also an
option in case it meets your needs.
Thank you for sticking around.
If you enjoyed this talk, feel free to reach out to me on
either LinkedIn or Twitter.