Designing Robust and Scalable Distributed Applications – Architectural Patterns, Challenges, and Best Practices

Abstract

Organizations are under increasing pressure to build highly available, fault-tolerant, and scalable distributed applications. This presentation explores key architectural patterns such as microservices, event-driven systems, and serverless computing, alongside their respective trade-offs. Ashis will address challenges related to data consistency, fault tolerance, and network latency, providing practical solutions to enhance reliability and performance.

Attendees will gain actionable insights on building resilient distributed systems that can manage large-scale data while maintaining operational efficiency.

Summary

Transcript

This transcript was autogenerated. To make changes, submit a PR.

Good morning, everyone. My name is Ashis, and I bring 23 years of experience in IT, specializing in data engineering and in building real-time and near-real-time applications, business intelligence, data warehousing, and other applications. I'm currently employed at Mastercard as a data architect. Throughout my career, I have designed and implemented complex data solutions for major organizations across industries, including British Telecom, Barclays, and many other Fortune 500 companies. My main expertise lies in creating scalable data architectures, leading end-to-end data integration efforts, and driving advanced analytics to empower data-driven decisions. As a data architect, I have had the privilege of working with cutting-edge tools and methodologies, such as cloud technologies and architectural patterns like microservices and event-driven architectures, to solve real-world challenges.

Today I'll be discussing the design of distributed applications, emphasizing the importance of robust and scalable architectures. As we explore the architectural patterns, we'll identify common challenges and the best practices that can guide us in creating effective solutions. Understanding these elements is crucial for ensuring that our applications can handle growth and complexity while maintaining performance and reliability.

Let's go to the next slide. Here's the agenda for today's presentation. We'll start with an introduction to distributed applications, then the architectural patterns, the common challenges, and the best practices and solutions. We'll try to cover a case study, and then we'll have the summary and a question-and-answer session at the end.

A distributed system is a collection of independent computers that appears to its users as a single coherent system.
These computers, or nodes, work together, communicate over a network, and coordinate their activities to achieve a common goal by sharing resources, data, and tasks.

We are mostly accustomed to centralized systems. The main difference is that in a centralized system, all data and computational resources are kept and controlled in a single central place: a server. Applications and users connect to this hub in order to access and handle data. Although this configuration is easy to maintain and secure, it can become a bottleneck if too many users access it simultaneously or if the central server malfunctions. A distributed system, on the other hand, disperses data and resources over several servers, frequently across various physical locations. This gives better scalability and reliability, since the system can keep functioning in the event of a component failure. However, because of the numerous points of interaction, distributed systems can also be difficult to secure and administer.

We have all heard of various kinds of distributed systems. The terminology is probably familiar: client-server architecture, three-tier architecture, microservices architecture, event-driven architecture, service-oriented architecture. These are the common patterns we hear about every day in the software world. Distributed applications are designed to improve performance, scalability, and reliability. Examples such as Netflix, Amazon, and Spotify handle vast amounts of data and user requests, showcasing the importance of distributed systems in modern technology and software architecture.
This diagram shows, at a very high level, how the nodes are connected to a server, and how another cluster is connected to it over the network, so that both work as a single coherent system.

Let's move on to the next slide, which is about the key principles of designing distributed applications. Take any major application, say a banking application or Gmail. The key principles followed when designing such a distributed application are scalability, availability and reliability, fault tolerance, maintainability, and security. Scalability ensures that the system can handle increased load by adding resources. Reliability ensures that the system can recover from failures. Fault tolerance ensures that the system can continue to operate even if some components fail. Performance, which is connected to scalability, ensures that the system can respond quickly to user requests. These are the major principles we need to keep in mind while designing distributed applications.

In this slide, I have listed the main architectural patterns. There are many patterns, but these are the most commonly used and important ones. Before we go there: we have heard a lot about architectural patterns, so what are they? In software design we keep encountering similar kinds of problems. Architectural patterns are designs that provide solutions to those common design problems.
For building distributed applications, the major architectural patterns followed in modern applications are microservices, event-driven architecture, CQRS (Command and Query Responsibility Segregation), and service mesh. We'll go into these in detail later, but very briefly: microservices involve breaking an application down into much smaller components that can be independently developed, deployed, and scaled. Command and Query Responsibility Segregation is a design pattern that segregates read and write operations on a data store. This approach allows each model to be optimized independently. There are drawbacks to this model as well, but where it applies, following the pattern gives real benefits. A service mesh is a dedicated infrastructure layer that manages communication between the different microservices in a software application. It provides tools and capabilities to handle tasks like load balancing, security, and monitoring, ensuring efficient and secure interactions between those services. In the next few slides, we'll go into a little more detail on these architectures and cover as much as we can.

Microservices architecture is a method of software development where an application is broken down into smaller, independent, loosely coupled services. Each service is designed to fulfill a specific business function and can be developed, deployed, and maintained independently. This approach contrasts with the traditional monolithic architecture, where all functionality is tightly integrated into a single code base. The first key principle of the microservices pattern is independent development and deployment: each component can be developed and deployed independently, which helps with faster development cycles. The next is small, focused teams.
Each team can be given a specific service, so teams can work independently. Then there is technology diversity: different microservices can be built on different technologies, which lets teams leverage their different skills. Fault isolation: if one microservice fails, it doesn't affect the entire application. And definitely scalability, which is a huge advantage of microservices: each microservice can scale independently, so we don't have to scale the whole application.

How do microservices work? They communicate with each other through well-defined APIs, often using REST, event streaming, or message brokers. Each service has its own database, which ensures data autonomy and reduces dependencies. This promotes flexibility, ease of development, and easy maintenance. To summarize the benefits: modularity and decoupling, scalability, technology diversity, rapid deployment, and continuous delivery.

However, microservices come with challenges as well. Managing multiple microservices is complex and may require careful orchestration and monitoring. Testing the system as a coherent whole can be difficult because of the interdependencies. Data consistency across microservices can also be challenging, because these are all independent services. Network latency matters too: there is a lot of communication between the services, which can increase latency.

That covers the microservices architecture very briefly. Let's move on to the next slide, where we'll briefly cover the event-driven architecture.
Event-driven architecture, also known as EDA, is a software design pattern where system components communicate by generating, detecting, and responding to events. Events represent significant occurrences, such as user actions or changes in system state. In EDA, the components are decoupled, which allows them to operate independently. When an event occurs, a message is sent, triggering the appropriate response in other components.

The key principles here start with decoupling, which is one of the major advantages of EDA. The decoupled relationship between components, front end and back end for example, allows systems to share information without knowing about each other. Producers can send events without knowing which consumers will receive them, and consumers can receive events without sending requests to producers. Next is asynchronous communication: components operate independently and asynchronously in response to events, which allows for parallel processing and real-time responsiveness. Then event handling: events are typically handled using a publish-subscribe (pub/sub) model. Components interested in certain types of events subscribe to them, while components that generate events publish them. This model allows for flexible, scalable communication between components.

Let's briefly touch on the main components of EDA. An event source is any component or system that generates events, such as sensors, databases, or user interfaces. An event broker, or event bus, which we have probably heard about many times, is intermediary middleware that facilitates the communication of events between components. It handles the distribution, filtering, and routing of events. A publisher is a component that generates events and sends them to the event bus.
A subscriber is a component that expresses interest in specific types of events and subscribes to them. Subscribers listen for events on the event bus and respond accordingly. An event handler is a piece of code or logic associated with a subscriber that specifies how to respond when a particular type of event is received.

The benefits of EDA, as with the other patterns, include flexibility and agility: it adapts easily to changing requirements. It supports scalability by allowing components to operate independently, so systems can handle increased load or larger data sets by adding more components or resources. Real-time responsiveness is one of the major strengths of EDA: events are handled as they occur, which is why EDA is used in many financial and IoT applications. It also enhances modularity.

Like any other architectural pattern, EDA has drawbacks as well. It is definitely a complex system. Maintaining the order of events and ensuring consistency across the system can be very complex. Debugging and tracing are also challenging compared to other architectures.

Use cases for EDA include financial services, as I said, for real-time processing of transactions and fraud detection; e-commerce, for order placement and inventory updates; the Internet of Things, which is a very good use case; telecommunications, for network monitoring and event-driven communication with network components; and healthcare, for monitoring patient data and handling medical alerts. It's a vast topic, but we've briefly covered the major components, their benefits, and the challenges. In the next slide, we'll move to the next architecture, which is the CQRS pattern.
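As a rough illustration of the publish-subscribe flow described for EDA, here is a minimal in-process sketch. It is not from the talk: the `EventBus` class, the `order_placed` event name, and the two handlers are hypothetical, and the bus delivers events synchronously inside one process, whereas a real event broker would typically deliver them asynchronously over the network.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = dict
Handler = Callable[[Event], None]


class EventBus:
    """Hypothetical in-memory event bus: routes events to subscribed handlers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        # A subscriber registers interest in one type of event.
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: Event) -> int:
        # The publisher does not know who the subscribers are;
        # the bus looks them up and invokes each event handler.
        handlers = self._subscribers.get(event_type, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


inventory_log: List[str] = []
fraud_log: List[str] = []


def reserve_stock(event: Event) -> None:
    # Event handler owned by a (hypothetical) inventory component.
    inventory_log.append(f"reserve {event['sku']}")


def check_fraud(event: Event) -> None:
    # Event handler owned by a (hypothetical) fraud-detection component.
    fraud_log.append(f"check order {event['order_id']}")


bus = EventBus()
bus.subscribe("order_placed", reserve_stock)
bus.subscribe("order_placed", check_fraud)

# One published event reaches both subscribers.
delivered = bus.publish("order_placed", {"order_id": 1, "sku": "A-42"})
```

The publisher only names the event type, so a third subscriber could be added later without touching the code that publishes `order_placed`, which is the decoupling benefit the pattern is chosen for.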
In this slide, we'll briefly cover the CQRS architecture pattern. Command and Query Responsibility Segregation, CQRS, separates read and write operations so that each can be optimized. In a traditional architecture, a single data model is often used for both read and write operations. This approach is straightforward and suited to basic CRUD operations, but in modern applications it poses a lot of challenges, mainly around performance.

Some of the issues with that approach are as follows. Data mismatch: the read and write representations of data often differ, and some fields that are required during updates might be unnecessary during read operations. Lock contention. Performance problems: the traditional approach can have a negative effect on performance because of the load on the data store and the data access layer, and because complex queries have to retrieve a lot of information, which impacts reads and writes at the same time. Security challenges: it can be difficult to manage security when entities are subject to both read and write operations, and this overlap can expose data in unintended contexts. There are instances where that has happened.

The solution to these problems is the CQRS pattern, where we separate the write operations, or commands, from the read operations, or queries. That's what the name says: commands and queries. Commands update data; queries retrieve data. CQRS is useful in scenarios that require a clear separation between commands and queries, so it's not applicable in all scenarios. But wherever there is an opportunity, we should use it.
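To make the command/query split concrete, here is a small sketch under stated assumptions. The `DepositMoney` command, the `WriteModel` and `ReadModel` classes, and the account example are all hypothetical and not from the talk. The read model is updated synchronously here for simplicity; in many real CQRS systems the read side is updated asynchronously and is only eventually consistent.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DepositMoney:
    """A command: a request to change state."""
    account_id: str
    amount: int


class ReadModel:
    """Query side: holds a denormalized, read-optimized view."""

    def __init__(self) -> None:
        self._summaries: Dict[str, str] = {}

    def apply_balance_changed(self, account_id: str, balance: int) -> None:
        # Called by the write side to keep the view up to date.
        self._summaries[account_id] = f"{account_id}: {balance}"

    def get_summary(self, account_id: str) -> str:
        # Queries only read; they never modify state.
        return self._summaries.get(account_id, f"{account_id}: 0")


class WriteModel:
    """Command side: validates commands and owns the authoritative state."""

    def __init__(self, read_model: ReadModel) -> None:
        self._balances: Dict[str, int] = {}
        self._read_model = read_model

    def handle(self, command: DepositMoney) -> None:
        if command.amount <= 0:
            raise ValueError("deposit amount must be positive")
        balance = self._balances.get(command.account_id, 0) + command.amount
        self._balances[command.account_id] = balance
        # Project the change into the read model (synchronously in this sketch).
        self._read_model.apply_balance_changed(command.account_id, balance)


read_side = ReadModel()
write_side = WriteModel(read_side)
write_side.handle(DepositMoney("acc-1", 50))
write_side.handle(DepositMoney("acc-1", 25))
summary = read_side.get_summary("acc-1")
```

Because validation lives only on the command side and the read side stores a ready-to-serve view, each side can be given its own schema, permissions, and scaling policy.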
Separating the read model from the write model simplifies system design and implementation, provided the pattern is applicable to the use case, by addressing the specific concerns of data writes and data reads. It increases clarity, scalability, and performance, because those concerns are now separated.

To summarize the benefits of CQRS: Independent scaling: we can scale the command side and the query side separately. Optimized schemas: they are either write-optimized or read-optimized, so there is a real gain in performance. Security: by separating reads and writes, you can ensure that only the appropriate domain entities or operations have permission to perform write actions on the data. Separation of concerns. Simpler queries: when you store a materialized view in the read database, the application can avoid writing very complex queries. Anybody who has written a complicated query knows how the joins can grow and how big the query can get. So if the design allows us to use this pattern, we should definitely use it.

In the next slide, we'll briefly go over the service mesh architecture. A service mesh is a dedicated infrastructure layer that manages all the communication between the microservices in a software application. It provides tools and capabilities to handle tasks like load balancing, security, and monitoring, ensuring efficient and secure interaction between those services. It has two main components: the data plane and the control plane.
The data plane consists of sidecar proxies that handle the actual transfer of data between the services. They also implement features like load balancing, service discovery, and traffic routing. The control plane acts as a central management layer. It allows administrators to define and configure services within the mesh, with specific parameters like service endpoints, routing rules, load-balancing policies, and security settings. This architecture has benefits, namely service discovery, load balancing, traffic management, security, and monitoring and observability. It has challenges as well. One of the main ones is complexity. It involves a lot of communication between the microservices, so there is a performance overhead as well. There is operational overhead too: a lot of effort goes into deploying, configuring, and maintaining this architecture. And the learning curve is steep.

Designing a distributed application comes with several challenges that we have to address: network latency, data inconsistency, service discovery complexity, distributed transactions and eventual consistency, and security. These are all common challenges in designing a distributed application.

In this slide, I have listed the various strategies for increasing scalability, which we are probably well aware of. Horizontal scaling means adding more nodes or computers. Vertical scaling means adding more resources within a particular computer. Then there is stateless versus stateful design; we should prefer stateless designs. And load balancing, where the available tools follow strategies such as round robin, least connections, and geo-based routing.
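Two of the load-balancing strategies just mentioned, round robin and least connections, can be sketched in a few lines. This is an illustrative toy, not any particular tool's implementation: the class names and the `node-a`/`node-b`/`node-c` identifiers are hypothetical, and real load balancers add health checks, weights, and concurrency control.

```python
import itertools
from typing import Dict, List


class RoundRobinBalancer:
    """Hands out nodes in a fixed rotating order."""

    def __init__(self, nodes: List[str]) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        self._cycle = itertools.cycle(nodes)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Hands out the node with the fewest active connections."""

    def __init__(self, nodes: List[str]) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        self._active: Dict[str, int] = {node: 0 for node in nodes}

    def acquire(self) -> str:
        # Ties go to the node listed first, because min() keeps the
        # first minimum and dicts preserve insertion order.
        node = min(self._active, key=self._active.get)
        self._active[node] += 1
        return node

    def release(self, node: str) -> None:
        # Call when a connection to the node is closed.
        self._active[node] -= 1


rr = RoundRobinBalancer(["node-a", "node-b", "node-c"])
rr_picks = [rr.pick() for _ in range(4)]

lc = LeastConnectionsBalancer(["node-a", "node-b"])
first = lc.acquire()
second = lc.acquire()
lc.release(first)
third = lc.acquire()
```

Round robin ignores how busy each node is, which works well when requests cost about the same; least connections adapts when some requests hold a node for much longer than others.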
There are tools available that help us implement all these scalability strategies, so we should use them.

In this slide, I have listed the best practices normally followed when designing distributed applications. Designing for failure is the first and most fundamental practice: we assume that the application can fail in several ways, and we design to withstand that. Then using stateless components; using CI/CD; using observability and monitoring tools, which has become a very important topic nowadays; practicing fault tolerance, security, and load balancing; event-driven communication; domain-driven design; and adopting chaos engineering. These are the things we should definitely take into consideration when we design distributed applications.

With this slide, we have come near the end of the presentation. Here I've tried to summarize the key takeaways. When we talk about architectural patterns, it all depends on the right architecture choice: we have to choose a pattern based on its fit and suitability for a particular use case. Continuous refinement is also very important. Even if an architecture is already implemented, there can be enhancements or opportunities to use other patterns as the application matures, so we always have to be open to that. There will also be challenges, and we have to deal with them proactively. As architects, we have to anticipate the challenges over the next three to four years and plan accordingly.
One thing that comes up very strongly with microservices, event-driven systems, and cloud deployments is investing in strong monitoring and observability tools. This is critical.

With this, we have come to the end of the presentation, so I'm opening the floor for questions and answers. This is an opportunity to clarify any points we discussed today. Your insights and inquiries are extremely valuable to me, so please feel free to share your thoughts. You can also drop me an email or reach out to me individually later on. Please feel free to ask any questions you may have. I really appreciate your time and your patience today. Thank you so much. Have a great day.

Conf42 Site Reliability Engineering (SRE) 2025 - Online

April 17 2025 - premiere 5PM GMT

Ashis Chowdhury

Lead Software Engineer @ Mastercard
