Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hey there, I'm Rizomatu, a backend engineer, and today I'll be
walking us through distributed caching and queuing in the cloud.
Let's get started.
So for folks who are new to all of this, a cache is basically
a high speed storage layer that temporarily stores frequently
accessed data to improve performance and reduce latency.
So instead of going to a slow source, let's say for instance a database,
file system, or a standard API, every time you request data that you know
is frequently accessed, you can basically store a copy of that data
in a cache so that future requests can be served much faster.
This is the typical setup for most applications, right?
You have a client that makes a request to the server, which then makes a
request to a database to fetch data, which is then sent back to the client.
So this works just fine for most applications, but on high traffic
systems, this doesn't scale well.
If it's not obvious already, the issue with this kind of setup is that, over time,
the database becomes a bottleneck as all client requests end up hitting the database.
This is where we can leverage a cache to scale this solution, right?
So basically, instead of the server reaching out to the database each time
for data that we know is frequently accessed, the server can just
reach out to the cache to say, hey, is this record in the cache?
If the record is available in the cache, the server just returns it
to the client from the cache.
But in the case where the record isn't available in the cache, the server makes
a request to the database to fetch the data, writes it back to the cache, then
returns it to the client so that future requests can be served from the cache.
This way we can improve the read performance for client
requests to the server.
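To make that flow concrete, here's a minimal sketch of the cache-aside pattern in Python with the redis-py client. The key format and the fetch_user_from_db helper are illustrative assumptions, not something from the talk.

```python
import json
import redis

cache = redis.Redis(host="localhost", port=6379)

def get_user(user_id, fetch_user_from_db, ttl_seconds=300):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        # Cache hit: serve straight from memory, skipping the database.
        return json.loads(cached)
    # Cache miss: go to the database, then write the result back to the
    # cache so future requests for this user are served from Redis.
    user = fetch_user_from_db(user_id)  # assumed database helper
    cache.set(key, json.dumps(user), ex=ttl_seconds)
    return user
```

The TTL here is one simple way to keep cached copies from going stale forever; how long it should be depends on how fresh the data needs to be.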
So if you're wondering why retrieving data from the cache is faster
than retrieving it from the database, it's simply because data in the
cache is usually written to faster storage, like memory, unlike data in
a database, which is written to disk.
So most times, as the application scales and grows with time, a single caching
instance isn't enough to manage the load you have or your caching demand.
This is where distributed caching comes in.
Basically, distributed caching spreads cache data across multiple nodes, instead
of storing it on a single node or machine.
This ensures scalability, fault tolerance, and prevents cache overload.
There are different strategies out there for distributed caching.
We have sharding, replication, load balancing, consistent
hashing, and some others out there.
But for now, I'll just focus on the first three on this list because of time.
Consistent hashing is a bit complicated and might take us a lot of time.
So what's sharding?
Sharding is a data partitioning technique that distributes cache data
across multiple nodes instead of storing everything on a single machine.
In a distributed caching environment, sharding ensures scalability and
efficient memory usage by dividing the cache data across multiple instances.
So on this diagram, as you can see, we have a cache router, which
is basically responsible for assigning a cache request to the right node.
In this case, we have a range-based sharding setup, which means that we assign
requests to nodes based on the key range.
Keys in the range 1 to 1,000 are assigned to node A,
1,000 to 2,000 to node B, and so on.
So basically what this does is split the cache data across each
of these nodes, saving only a range of these key-value pairs on a single node.
This way we can scale out our caching system to support more memory
or storage space, basically.
So let's say we have a constraint of five gigabytes per node.
By adding two extra nodes, we're able to increase that limit to 15 gigabytes,
which comes in very handy for scaling.
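Here's a rough sketch of what a range-based cache router like the one in the diagram could look like; the node addresses and key ranges are placeholder assumptions.

```python
# Key ranges mapped to cache nodes, mirroring the diagram above.
SHARDS = [
    ((1, 1000), "cache-node-a:6379"),
    ((1001, 2000), "cache-node-b:6379"),
    ((2001, 3000), "cache-node-c:6379"),
]

def route(key: int) -> str:
    """Return the node responsible for the range this key falls into."""
    for (low, high), node in SHARDS:
        if low <= key <= high:
            return node
    raise KeyError(f"no shard configured for key {key}")

print(route(42))    # cache-node-a:6379
print(route(1500))  # cache-node-b:6379
```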
Up next, we have replication.
Replication is a data duplication technique that copies cache data
across multiple nodes to ensure high availability, fault tolerance,
and improved read performance.
It allows cache systems to continue functioning even
if some of the nodes fail.
So basically, no single node is a bottleneck in most replication setups.
So in this case, what we have in this diagram is a geo-replication setup, right?
We have a cache router that is responsible for assigning a cache
request to the right node.
Here we have primary node A, which is basically the node
responsible for write requests.
This is a primary-secondary setup.
So the primary node is responsible for write requests and for
synchronizing the data across the secondary nodes, while the secondary
nodes are just there to serve and optimize read requests.
So in this case, we're trying to optimize for latency, right?
What happens is that the cache router assigns a node to a cache request
based on which node is closer to the backend or the user making the request.
So if a user is making a request from Africa, we simply route the cache
request to the cache node in Africa.
If they're making a request from Europe, we do the same thing.
So basically it's a way to reduce latency using replication.
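A tiny sketch of that routing decision might look like this; the region names and node addresses are made up for illustration, and a real cache router would also handle failover.

```python
PRIMARY = "cache-primary:6379"          # handles all writes
REPLICAS = {                            # read replicas per region
    "africa": "cache-replica-af:6379",
    "europe": "cache-replica-eu:6379",
    "america": "cache-replica-us:6379",
}

def pick_node(operation: str, user_region: str) -> str:
    if operation == "write":
        # Writes go to the primary, which syncs to the secondaries.
        return PRIMARY
    # Reads go to the replica closest to the user to cut latency,
    # falling back to the primary if no replica covers that region.
    return REPLICAS.get(user_region, PRIMARY)

print(pick_node("read", "africa"))   # cache-replica-af:6379
print(pick_node("write", "europe"))  # cache-primary:6379
```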
Up next, we have load balancing.
Load balancing evenly distributes cache requests across multiple cache nodes
to prevent overloading a single node.
This improves response time and ensures high availability.
It also ensures that no single cache node becomes a bottleneck in the system.
In this diagram we can see that we have a cache load balancer,
which is responsible for routing requests to the correct node.
The data is then synchronized across the other nodes, right?
Let's say we make a write request that adds a cache value to node A. The data gets
saved to node A, then synchronized to B and C, just to ensure that if
a read request is routed to any of the other nodes, the cache data will
be available there for retrieval.
So basically, the way most load balancing systems work is that they use
one of several algorithms out there, but a very
popular one is round robin, right?
Basically, what the load balancer does is that for each request,
it just cycles from one node to the next.
Let's say we have five requests to our caching load balancer, right?
The first request goes to node A, the second one goes to node B, the
third one to node C, the fourth one goes back again to node A, and so forth.
So that's how round robin works in a nutshell.
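In code, round robin can be as simple as cycling over the node list; this is just a sketch with made-up node names.

```python
import itertools

nodes = ["node-a", "node-b", "node-c"]
rotation = itertools.cycle(nodes)  # repeats A, B, C, A, B, C, ...

for request_number in range(1, 6):
    print(f"request {request_number} -> {next(rotation)}")
# request 1 -> node-a
# request 2 -> node-b
# request 3 -> node-c
# request 4 -> node-a
# request 5 -> node-b
```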
Before I proceed, I would like to mention that the caching
strategies I've mentioned can either be implemented on the application
layer or come with the caching system itself.
It's highly recommended to go with a caching strategy implemented by
the caching system you're using, right?
If you're using Redis, for instance, Redis Cluster has sharding and replication
out of the box, which means you don't have to implement this yourself;
you just have to configure your Redis Cluster to replicate or shard
your data based on your requirements.
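For example, with redis-py's cluster client (available in recent redis-py versions), the application code stays almost unchanged; the client and the cluster handle slotting keys onto the right shard. The host and port below are placeholders, assuming a Redis Cluster is already running.

```python
from redis.cluster import RedisCluster

# Connect to any node in the cluster; the client discovers the rest.
rc = RedisCluster(host="localhost", port=7000)

# The client hashes the key to a slot and sends the command to the
# node that owns that slot; replication is configured on the cluster,
# not in application code.
rc.set("user:42", "Ada")
print(rc.get("user:42"))
```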
So these are some of the popular caching systems out there.
The most popular ones are Redis and Memcached, right?
So Redis is an open source, in-memory data store.
It also offers advanced data structures, replication, and sharding, and is widely
adopted for high performance caching.
Redis is quite popular; when a lot of folks hear caching, the first
thing that comes to their mind is Redis.
Then we have Memcached, which is basically a simple key-value
store that is very fast for lookups.
Some cloud providers out there do offer their own solutions, right?
For instance, AWS has Amazon ElastiCache, Google Cloud has
Memorystore, and Azure has Azure Cache for Redis.
These are similar to Redis Cloud; they're fully managed solutions.
So basically, let's say you use Amazon ElastiCache.
This means that you don't have to bother about the little details of managing your
Redis cluster or instance or whatever.
Same thing with Google Cloud Memorystore: this is managed for you.
You just have to configure how you want to scale your caching
system or your caching components.
Alternatively, you can go with the self-hosted approach, where you
purchase a server online, run Redis or Memcached on it, and connect
it to your backend system.
This can be a bit painful to manage, but if it meets
your needs, it's also an option.
So that is it for caching.
What is a queue?
A queue is a messaging system that enables asynchronous communication
between components by temporarily storing messages until they are processed.
Most times in your application or your backend system, right,
you have different services that need to communicate asynchronously
to carry out different tasks.
For instance, let's say you have an e-commerce website
where you let users place an order online and you send them
a receipt to their email after placing an order.
So basically, when a user sends a request to place an order, you don't
really want to send them the receipt immediately, as it can be computationally
expensive to compile the receipt to PDF and also send them a mail for it.
So one thing you could do is have a separate component or
service in charge of sending mails and PDFs to users;
then your main backend system sends a message to
the email component to compile the mail and send it to them.
This way you separate the processing, keeping things
asynchronous and ensuring that user requests are not delayed
by having them wait for you to compile PDFs and emails before showing them a
success message on your API or in a UI.
In a typical event-driven setup, we have a producer,
which is responsible for publishing events or tasks through a message queue.
Then you have the message queue itself.
Basically, what a message queue does is hold the tasks until there's
an available consumer to dispatch them to.
Most of them usually have a first-in, first-out approach, although
this ordering isn't always guaranteed.
So just keep that in mind.
And you have a consumer, which is basically a component or system
that picks up events from the message queue, or receives events from
the message queue, and works on them.
In the e-commerce receipts example, right,
you have a producer, maybe the order service, that sends a message
to the queue, then it gets consumed by your email or messaging service,
basically to send the user the receipt.
The way most consumers work is that they subscribe to topics or
events on the message queue, saying, okay, I'm interested in this topic.
So when this kind of topic comes across, the queue sends it to the right consumer.
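Here's a minimal in-process sketch of that producer/queue/consumer flow using only Python's standard library; in a real system the queue would be a broker like RabbitMQ or Kafka rather than queue.Queue, and the service names are assumptions.

```python
import queue
import threading

order_events = queue.Queue()  # stands in for the message queue

def order_service(order_id: str) -> None:
    # Producer: accept the order and publish an event, then return
    # immediately so the user isn't kept waiting on the receipt.
    order_events.put({"order_id": order_id, "email": "user@example.com"})
    print(f"order {order_id} accepted")

def email_service() -> None:
    # Consumer: pick events off the queue and do the slow work
    # (compile the receipt PDF, send the email) asynchronously.
    while True:
        event = order_events.get()
        print(f"sending receipt for {event['order_id']} to {event['email']}")
        order_events.task_done()

threading.Thread(target=email_service, daemon=True).start()
order_service("A-1001")
order_events.join()  # wait until the receipt event has been handled
```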
As the demand on your queuing system grows, a single queue becomes a
bottleneck, right, if all messages are routed through that single queue.
Distributed queuing addresses these challenges by spreading traffic
across multiple queue partitions.
This allows for high volume asynchronous processing, and also
enables multiple producers and consumers to operate concurrently,
ensuring scalability and resilience under high load, basically.
So this is a quick diagram just explaining what you could expect, or
how distributed queuing works, right?
In this case, you have multiple producers.
Mind you, even in the previous case of a single message queue,
you can have multiple producers and multiple consumers as well;
nothing stops that.
So the most important thing here is the broker or load balancer, right?
This part of the system is basically responsible for taking events from
the producers and routing them to the right partition.
In this case, we have six partitions, right?
Basically, this is a scaled solution saying, okay, we have this number of
partitions to accept as many messages from the producers as we need.
So if the demand grows, you simply add more partitions to the broker,
and the broker is in charge of allocating which events get sent to a
particular partition and determining how they get sent to a consumer as well.
This way we can scale our queuing components as much as we want by
simply adding more partitions, or removing partitions if the load drops.
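As a rough illustration of what a broker might do under the hood, one common approach is hashing a message key so that messages with the same key always land on the same partition; this sketch assumes six partitions like the diagram and isn't how any specific broker implements it.

```python
import hashlib

NUM_PARTITIONS = 6  # matches the six partitions in the diagram

def assign_partition(message_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Hash the key and take it modulo the partition count, so the same
    # key is always routed to the same partition.
    digest = hashlib.md5(message_key.encode()).hexdigest()
    return int(digest, 16) % num_partitions

for key in ["order-1", "order-2", "order-3"]:
    print(key, "-> partition", assign_partition(key))
```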
These are some of the popular queuing systems out there.
We have Apache Kafka, which is a high throughput distributed streaming
platform designed for real time event streaming, and it's widely adopted for
its scalability and fault tolerance.
Then you also have RabbitMQ, which is quite mature as well and supports
complex routing and different protocols.
It's a very popular queuing system as well.
Then you have NATS, which is a lightweight, high performance messaging
system that's quite popular in cloud native applications, as it offers
simplicity and low latency.
I would like to mention that the distributed queuing strategy, or
managing partitions, is usually done by the broker, which in this
case is RabbitMQ or Apache Kafka.
So most times you don't have to go into the little details
of managing this yourself.
But in a worst case scenario where this doesn't meet your needs, you
can implement it on the application layer, or create a service
for managing your distributed queuing strategy.
So it's up to you, but it's best to go with the strategy or the
implementation that comes with your broker or queue load balancer.
Different cloud providers do offer a queuing service as well.
AWS has Amazon SQS, Google Cloud has Pub/Sub, and Azure has Service Bus.
Alternatively, you could run Kafka, RabbitMQ, or NATS on a
server you purchased online and connect your application to it.
And yeah, you could have that as well, although, similarly to caching,
self-hosting could be a pain to manage and scale, but it is also an
option in case it meets your needs.
Thank you for sticking around.
If you enjoyed this talk, feel free to reach out to me on
either LinkedIn or Twitter.