Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone.
Welcome to our session on Resilient by Design: a data-driven migration from
monoliths to event-driven microservices.
Hello, I'm Amlin Go.
I have over two decades of experience in architecting software solutions,
with a focus on modernization from monolithic applications to
microservices-based, event-driven architecture.
So today we are tackling one of the most pressing challenges in modern system
architecture: the transition from monolithic systems to event-driven microservices.
This transformation, while promising unparalleled scalability and
agility, demands meticulous planning and execution to maintain system
stability throughout the journey.
In this talk, we will explore a data-driven framework
designed to address these challenges head-on, from understanding system domains to
designing robust schemas and implementing progressive migration techniques.
We will highlight strategies to ensure resilience, fault
tolerance, and seamless operations.
So let's dive in and discover how resilience is engineered by design.
While microservices offer undeniable benefits like modularity, scalability,
and fault isolation, the road to adopting this architecture is far from straightforward.
Transitioning from a monolithic system to an event-driven
microservices architecture comes with its own set of challenges.
These include handling data consistency, managing distributed systems, and ensuring
system resilience during the migration.
This is where SRE principles and data-driven strategies play a crucial
role in overcoming these challenges.
Okay, let's get into some facts.
Migrating to a new architecture is no small feat, and the numbers
speak for themselves about the challenges enterprises encounter.
68% of enterprises identify stability concerns
as their primary migration challenge. Downtime or disruptions
during this process can severely impact business operations, making
reliability a top priority.
43% of migrations fail on the first attempt due to insufficient architecture
planning. A rushed or poorly designed approach often leads to
incomplete transitions or disruptions.
37% of projects face cost overruns.
Traditional migration methods often underestimate the complexity,
leading to unplanned expenditure.
With stability concerns, first-time migration failures, and cost overruns
posing significant challenges,
it's clear that traditional migration methods often fall short.
So how can we overcome these hurdles and ensure a smooth transition?
The answer lies in adopting a well-defined, data-driven approach.
So enter our event-driven migration framework.
This framework is designed to guide enterprises through the complexities of
migrating from monolithic systems to resilient, scalable, event-driven microservices.
It addresses these challenges head-on by combining robust architecture
principles with real-time data insights.
So let's explore how this framework can pave the way for
successful, future-proof migrations.
So let's start at the base:
domain-driven design.
Domain-driven design is a strategic methodology for modeling
complex systems by aligning software architecture with business needs. It
emphasizes the use of bounded contexts to create clear service boundaries that
encapsulate specific business logic, ensuring the modularity and scalability
essential for microservices migration.
Additionally,
ubiquitous language fosters seamless communication between developers and
domain experts, reducing misunderstanding and improving collaboration. By leveraging
tools like context mapping and event storming, domain-driven design simplifies
identifying subdomains and their interactions, enabling a structured
approach to managing complexity and building resilient systems.
This makes domain-driven design a vital framework for transitioning to an
event-driven microservices architecture.
Next comes event sourcing.
Event sourcing is a technique that preserves the entire
system history through immutable event logs, allowing every change
to be logged
and recorded. These logs are invaluable for reliable state reconstruction,
making systems robust and fault tolerant by enabling precise recovery of the
system state at any point in time.
Key use cases include debugging issues, auditing for compliance, and replaying events
to recover from failures or errors.
Tools and frameworks like Apache Kafka, EventStore, and Axon Framework provide the
necessary infrastructure to implement event sourcing efficiently, ensuring a
resilient and auditable architecture.
Next are our event-driven patterns.
Event-driven architecture is a design approach that ensures
responsiveness and reliability in distributed systems.
By enabling components to communicate through events, it fosters decoupling,
making systems more scalable and adaptive.
Architectural safeguards like the circuit breaker pattern prevent
cascading failures by isolating problematic services, while bulkheads
compartmentalize workloads
to protect critical resources.
Additionally, fault isolation mechanisms ensure failures in one
microservice don't impact others,
maintaining overall system stability.
These patterns not only enhance resilience, but also align with the SRE
principles of reliability, fault tolerance, and proactive problem mitigation.
Next is continuous delivery.
Continuous delivery is essential to resilience, and it also aligns with the SRE principles
of reliability, fault tolerance, and proactive problem mitigation,
while reducing time to market for updates.
Automated testing serves as the foundation of safe deployments,
ensuring high code quality and catching issues early in the development cycle.
Rollback capabilities further ensure quick restoration of system functionality
in case of deployment failures. Tools like Jenkins, GitLab, and AWS CodePipeline
are widely used to implement these practices effectively, driving stability
and innovation in development processes.
So distributed systems are the backbone of modern applications, but they come
with their own share of complexities,
complexities that often challenge the site reliability
engineers tasked with ensuring stability and reliability.
From managing state changes to debugging issues and ensuring fault tolerance,
maintaining uptime in such systems requires robust tools and approaches.
This is where event sourcing shines as an alternative pattern that not
only reduces complexity, but also aligns perfectly with SRE principles.
Today we'll explore how event sourcing can transform the way we manage system state
and reliability, focusing on capturing events, building
event logs, reconstructing state, and enabling projections.
So event sourcing aligns seamlessly
with SRE principles by reducing system complexity and enhancing reliability.
It begins with capturing events: recording all state changes as
immutable, timestamped objects to ensure accurate tracking and observability.
This is followed by building an append-only event log,
which serves as the system's definitive source of truth, providing resiliency and
auditability. Then comes state reconstruction:
the current application state is derived by sequentially processing the event
stream, enabling swift debugging and efficient incident response,
which is critical for maintaining uptime.
Lastly, projections
allow tailored data models to be generated from the same event sequence, improving
operational insights and empowering SREs with actionable metrics to proactively
manage reliability and performance.
Together, these practices create a robust, fault-tolerant architecture
well suited to SRE goals.
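To make those four steps concrete, here is a minimal Python sketch. It is an illustration only, not code from the talk: the shopping cart domain, the event names, and the `EventLog` class are hypothetical, and a production system would keep the log in a durable store rather than an in-memory list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class Event:
    """Capturing events: an immutable, timestamped record of one state change."""
    kind: str                      # hypothetical names: "ItemAdded", "ItemRemoved"
    data: dict[str, Any]
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class EventLog:
    """Append-only log: events can be appended and read, never edited."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def read(self) -> tuple[Event, ...]:
        return tuple(self._events)


def rebuild_cart(events) -> dict[str, int]:
    """State reconstruction: fold the events, in order, into the current state."""
    cart: dict[str, int] = {}
    for e in events:
        sku = e.data["sku"]
        if e.kind == "ItemAdded":
            cart[sku] = cart.get(sku, 0) + e.data["qty"]
        elif e.kind == "ItemRemoved":
            cart.pop(sku, None)
    return cart


def units_added_projection(events) -> int:
    """Projection: a different read model derived from the same event sequence."""
    return sum(e.data["qty"] for e in events if e.kind == "ItemAdded")


log = EventLog()
log.append(Event("ItemAdded", {"sku": "A1", "qty": 2}))
log.append(Event("ItemAdded", {"sku": "B7", "qty": 1}))
log.append(Event("ItemRemoved", {"sku": "A1"}))

current_cart = rebuild_cart(log.read())
total_units_added = units_added_projection(log.read())
```

Replaying only a prefix of the log, for example `rebuild_cart(log.read()[:2])`, gives the state as it was at that earlier point, which is the property that helps with debugging and audits.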
So we explored how event sourcing reduces complexity and enhances reliability
by aligning with the core principles of site reliability engineering.
But how does this play out in practice? To truly understand its impact,
let's dive into a real-world example: the migration of a Fortune 500 retail
company's system from a monolithic to an event-driven microservices architecture.
This case study highlights the practical application of event
sourcing in solving challenges related to system downtime,
scalability, and incident response.
It showcases how SRE practices, combined with event-driven patterns, enabled
the company to achieve operational excellence, reduce complexity, and
build a system resilient enough to handle millions of daily transactions.
Let's look at the challenge we faced, the solution we implemented, and the
key outcomes from this migration.
So let's delve into the practical application of event sourcing and
microservices migration through the journey of this use case.
Let's take a closer look at how site reliability engineering principles
guided the migration journey from a monolithic to an event-driven architecture.
This transformation was carried out in four phases.
First, during the assessment phase, the team modeled domains
and identified service boundaries.
This strategic effort not only broke down the monolithic system into manageable
components, but also ensured modularity, a critical factor for minimizing risk
and maintaining operational stability.
Second, the focus shifted to event schema design, where the team
created 47 distinct event types, along with a versioning strategy.
By structuring event data with clarity, we laid the foundation
for precise tracking and debugging, empowering SREs with enhanced
observability for issue resolution.
The third phase, which ran in parallel,
saw the development of microservices alongside the legacy monolithic
application, ensuring the existing system remained functional during the transition.
This dual approach aligned with the SRE principles of minimizing
disruption and maintaining uptime while introducing new components.
Finally, the progressive migration phase enabled incremental traffic shifting to
the new architecture with zero downtime.
This gradual rollout ensured stability, allowing SREs to monitor performance
and address any issues proactively while keeping reliability intact.
These four phases demonstrate the power of combining event-driven architecture
with SRE practices to create a system that is both scalable and resilient.
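The incremental traffic shifting in the fourth phase can be sketched in a few lines of Python. The talk does not describe how requests were routed, so everything here is an assumption: the `route` function, the use of a customer ID as the routing key, and the hash-based bucketing are one common way to do it, not the case study's implementation.

```python
import hashlib


def bucket_for(key: str) -> int:
    """Map a stable key (here, a hypothetical customer ID) to a bucket in 0..99."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route(key: str, percent_to_new: int) -> str:
    """Send the lowest `percent_to_new` buckets to the new service, the rest to the monolith."""
    return "microservice" if bucket_for(key) < percent_to_new else "monolith"


# Raising the percentage step by step moves more customers over.
# A customer who was already moved stays on the new path at every higher percentage.
rollout_steps = [0, 10, 50, 100]
path_per_step = [route("customer-42", p) for p in rollout_steps]
```

Hashing a stable key, instead of picking randomly per request, keeps each customer on one side of the split, and setting the percentage back to 0 returns all traffic to the monolith.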
So now let's explore the measurable outcomes of this migration and
the tools that made it possible.
Event schema design is crucial for ensuring reliability and scalability
in an event-driven architecture.
Key best practices include explicit versioning, which incorporates schema
versions in event metadata to maintain compatibility during updates.
Domain-aligned events leverage the ubiquitous language of the business domain
for naming events, ensuring clarity and alignment with domain logic.
Temporal context embeds creation timestamps and causal
metadata within the events, enabling accurate tracking and sequencing.
Lastly, events should be self-contained, including all necessary context within
the payload to minimize dependencies and streamline processing.
These practices collectively enhance observability, fault tolerance,
and operational efficiency.
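As a rough illustration of those four practices together, here is a small Python sketch of an event envelope. The field names, the `EventEnvelope` class, and the `OrderPlaced` and `PaymentCaptured` events are hypothetical; they are not taken from the 47 event types mentioned in the case study.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class EventEnvelope:
    # Domain-aligned name, taken from the business vocabulary.
    event_type: str
    # Explicit versioning: the schema version travels in the metadata.
    schema_version: int
    # Self-contained payload: everything a consumer needs to process the event.
    payload: dict[str, Any]
    # Temporal context: creation timestamp in UTC, ISO 8601 format.
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Causal metadata: the event that caused this one, and the shared flow ID.
    causation_id: Optional[str] = None
    correlation_id: Optional[str] = None

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


placed = EventEnvelope(
    "OrderPlaced", 1,
    {"order_id": "o-1", "total_cents": 4200, "currency": "USD"},
    correlation_id="flow-1",
)
paid = EventEnvelope(
    "PaymentCaptured", 1,
    {"order_id": "o-1", "amount_cents": 4200, "currency": "USD"},
    causation_id=placed.event_id,
    correlation_id=placed.correlation_id,
)
wire_format = paid.to_json()
```

Because `PaymentCaptured` carries the amount and currency itself, a consumer does not need to look up the original order to process it.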
So having established the importance of well-designed event schemas,
the next step is to explore how to enhance system reliability
with resilience pattern implementation.
Resilience patterns such as circuit breakers, bulkheads, and fault isolation
mechanisms are integral to building robust, event-driven systems. For SREs,
these patterns not only help in mitigating failures
but also align with the core principles of uptime, observability,
and proactive incident management.
So let's now dive into the resilience patterns, discussing their role in
ensuring high availability and how they can be effectively implemented
in a distributed architecture.
So resilience patterns are essential for maintaining system stability,
reliability, and fault tolerance in distributed architectures.
The circuit breaker pattern prevents system overload by failing fast
when dependencies are unhealthy, reducing cascading failures by 85%
and enabling automatic recovery with configurable thresholds for each service.
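A minimal Python sketch of a circuit breaker follows. It is an assumption-laden illustration rather than the implementation from the case study: the `CircuitBreaker` class, its parameters, and the injected `clock` (used here so the example runs without waiting) are all hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without reaching the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # configurable per service
        self.reset_timeout = reset_timeout          # seconds before a trial call
        self._clock = clock
        self._failures = 0
        self._opened_at = None                      # None means the circuit is closed

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, func, *args, **kwargs):
        state = self.state
        if state == "open":
            raise CircuitOpenError("dependency unhealthy; failing fast")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, opens the circuit.
            if state == "half-open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


now = [0.0]  # fake clock so the example does not sleep
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])


def inventory_down():
    raise ConnectionError("inventory service down")


for _ in range(2):
    try:
        breaker.call(inventory_down)
    except ConnectionError:
        pass
state_after_failures = breaker.state

try:
    breaker.call(lambda: "never runs")
    rejected_fast = False
except CircuitOpenError:
    rejected_fast = True

now[0] = 10.0                      # the reset timeout has elapsed
state_after_timeout = breaker.state
recovered = breaker.call(lambda: "ok")
state_after_success = breaker.state
```

The half-open state is what gives the automatic recovery: after the timeout, one trial call is let through, and its outcome decides whether the circuit closes or opens again.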
The bulkhead pattern isolates components to contain failures
within bounded contexts, utilizing resource pool isolation, thread pool
segregation, and request rate limiting to protect critical services.
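One simple way to sketch a bulkhead in Python is a separate concurrency limit per workload. This is an illustration under assumptions: the `Bulkhead` class and the `checkout` and `recommendations` workloads are hypothetical, and a semaphore stands in for the dedicated thread or connection pools a real system might use.

```python
import threading
from contextlib import contextmanager


class BulkheadFullError(Exception):
    """Raised when a compartment has no free slots."""


class Bulkhead:
    def __init__(self, name, max_concurrent):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    @contextmanager
    def slot(self):
        # Reject immediately instead of queueing, so a saturated
        # compartment cannot tie up callers of other compartments.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(f"{self.name} bulkhead is full")
        try:
            yield
        finally:
            self._slots.release()


# Each workload gets its own compartment with its own limit.
checkout = Bulkhead("checkout", max_concurrent=2)
recommendations = Bulkhead("recommendations", max_concurrent=1)

rejected = False
checkout_still_available = False
with recommendations.slot():           # recommendations is now saturated
    try:
        with recommendations.slot():   # a second caller is turned away
            pass
    except BulkheadFullError:
        rejected = True
    with checkout.slot():              # the critical path is unaffected
        checkout_still_available = True
```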
Meanwhile, retry with backoff handles
transient failures through intelligent retry mechanisms, leveraging exponential
backoff algorithms, jitter for load distribution, and dead letter queues to
manage unprocessed events effectively.
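Here is a small Python sketch of retry with exponential backoff, jitter, and a dead letter queue. The function names, the default delays, and the use of a plain list as the dead letter queue are assumptions made for the example; the talk does not give these details.

```python
import random
import time


def backoff_delays(attempts, base=0.1, cap=5.0, rng=random.random):
    """Exponential backoff with jitter: each delay is random, up to min(cap, base * 2**n)."""
    return [rng() * min(cap, base * (2 ** n)) for n in range(attempts)]


def process_with_retry(event, handler, dead_letters, max_attempts=4,
                       base=0.1, cap=5.0, sleep=time.sleep):
    """Try the handler up to max_attempts times; park the event if every attempt fails."""
    delays = backoff_delays(max_attempts - 1, base, cap)
    for attempt in range(max_attempts):
        try:
            return handler(event)
        except Exception as exc:
            if attempt == max_attempts - 1:
                # Out of attempts: keep the event and the error for later inspection.
                dead_letters.append({"event": event, "error": repr(exc)})
                return None
            sleep(delays[attempt])


dead_letters = []          # stand-in for a real dead letter queue
calls = {"n": 0}


def flaky_handler(event):
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("broker busy")   # transient failure
    return f"handled {event['id']}"


def broken_handler(event):
    raise ValueError("bad payload")         # never succeeds


no_wait = lambda seconds: None              # skip real sleeping in the example
result = process_with_retry({"id": 1}, flaky_handler, dead_letters, sleep=no_wait)
process_with_retry({"id": 2}, broken_handler, dead_letters, sleep=no_wait)
```

The jitter matters when many consumers fail at the same moment: without it they would all retry on the same schedule and hit the recovering dependency together.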
Together, these patterns align with SRE principles
by improving system stability, scalability, and operational
efficiency.
The next critical step for ensuring system reliability is real-time monitoring.
In event-driven architecture, real-time monitoring
provides the visibility required to maintain operational excellence
and proactively address issues before they impact users. From an SRE perspective,
this strategy proves pivotal in tracking key metrics, detecting
anomalies, and ensuring adherence to service level objectives.
So let's delve into how real-time
monitoring complements resilience patterns and supports a proactive
approach to system reliability.
So real-time monitoring strategies align with SRE principles by
enhancing visibility and enabling
proactive reliability management. Aggregation centralizes metrics while
preserving context through correlation IDs, ensuring accurate analysis across
services. Alerting triggers notifications in
response to service level objective
violations, enabling rapid intervention before issues escalate.
Then comes investigation.
Investigation facilitates tracing request flows across distributed
services, helping pinpoint failures and streamline debugging efforts.
Lastly, instrumentation embeds telemetry in every service and event
flow, providing continuous insights into system performance and health.
Together, these practices ensure effective monitoring and operational
excellence in complex architectures.
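The aggregation, alerting, and investigation steps can be sketched together in Python. This is a simplified illustration with hypothetical names (`Sample`, `Aggregator`, `check_slo`) and an availability target chosen for the example; a real deployment would rely on a metrics and tracing platform rather than in-process lists.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    service: str
    correlation_id: str   # ties the sample back to one request flow
    ok: bool
    latency_ms: float


class Aggregator:
    """Aggregation: collect samples centrally while keeping their correlation IDs."""

    def __init__(self):
        self._by_service = defaultdict(list)

    def record(self, sample):
        self._by_service[sample.service].append(sample)

    def availability(self, service):
        samples = self._by_service[service]
        if not samples:
            return 1.0
        return sum(s.ok for s in samples) / len(samples)

    def failed_flows(self, service):
        return [s.correlation_id for s in self._by_service[service] if not s.ok]


def check_slo(agg, service, target):
    """Alerting: return an alert when availability drops below the SLO target, else None."""
    observed = agg.availability(service)
    if observed >= target:
        return None
    # Investigation: hand over the correlation IDs of the failed flows to trace.
    return {"service": service, "target": target, "observed": observed,
            "trace_these": agg.failed_flows(service)}


agg = Aggregator()
for i in range(10):
    # Two of ten order requests fail in this made-up data set.
    agg.record(Sample("orders", f"flow-{i}", ok=i not in (3, 7), latency_ms=20.0))
    agg.record(Sample("catalog", f"flow-{i}", ok=True, latency_ms=5.0))

orders_alert = check_slo(agg, "orders", target=0.95)
catalog_alert = check_slo(agg, "catalog", target=0.95)
```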
So now that we have explored the strategies for real-time monitoring,
it's important to evaluate how different migration
methodologies stack up in the context of reliability and scalability.
From an SRE perspective, selecting the right migration approach is crucial
to minimizing downtime, maintaining system stability, and meeting
service level objectives.
In the next section, we will compare popular migration methodologies,
analyzing their strengths, limitations, and alignment with SRE principles
to help determine the best fit for a resilient architecture.
So when evaluating migration methodologies through the lens of reliability
and scalability, it's clear that traditional approaches and event-driven
strategies often produce vastly different outcomes.
The traditional approach often relies on a big bang cutover strategy,
which involves extended downtime
windows, migrating a monolithic database, and relying on
manual verification processes.
While this method can work in some cases, it has limited rollback
capabilities, posing significant risk to system stability.
On the other hand, our event-driven approach prioritizes incremental service migration,
enabling zero-downtime deployment and ensuring system availability
throughout the transition, with data synchronization powered by events,
automated canary analysis for gradual testing, and instant rollback mechanisms.
So this methodology aligns with our SRE principles and delivers
the required results. By adopting this event-driven migration strategy,
organizations can minimize risk, maintain uptime, and achieve a more
scalable and reliable architecture.
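The automated canary analysis and instant rollback mentioned above can be illustrated with a short Python sketch. The decision rule, the thresholds, and the rollout steps are assumptions chosen for the example, not figures from the talk.

```python
def canary_decision(baseline, canary, max_error_delta=0.01, min_requests=100):
    """Compare error rates of the canary and the baseline.

    Returns "wait" (not enough canary traffic yet), "promote", or "rollback".
    """
    if canary["requests"] < min_requests:
        return "wait"
    baseline_rate = baseline["errors"] / max(baseline["requests"], 1)
    canary_rate = canary["errors"] / canary["requests"]
    if canary_rate - baseline_rate > max_error_delta:
        return "rollback"
    return "promote"


def next_traffic_percent(current, decision, steps=(1, 5, 25, 50, 100)):
    """Move to the next rollout step on promote; drop to zero on rollback."""
    if decision == "rollback":
        return 0
    if decision == "wait":
        return current
    higher = [s for s in steps if s > current]
    return higher[0] if higher else 100


baseline = {"requests": 1000, "errors": 5}          # legacy path, 0.5% errors
healthy = canary_decision(baseline, {"requests": 200, "errors": 1})
unhealthy = canary_decision(baseline, {"requests": 200, "errors": 10})
too_early = canary_decision(baseline, {"requests": 50, "errors": 0})
```

In a big bang cutover there is no equivalent of the `rollback` branch; here it is one comparison away at every step.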
With the migration methodologies compared, the next step is to outline
a clear path of execution through an implementation roadmap.
A well-defined roadmap ensures a structured transition.
So now that we have established the importance of a structured migration
process, let's walk through the key stages of our implementation roadmap, which
is designed to ensure reliability and scalability while adhering to SRE principles.
Number one is domain analysis.
We begin by mapping business domains to bounded contexts and
identifying clear service
boundaries.
This ensures modularity and lays a solid foundation for decomposing
the monolith into microservices.
Then comes event storming.
This step involves close collaboration with domain experts to
document core events and commands.
This phase ensures that the architecture aligns with real-world business processes,
enabling clearer system behavior modeling.
Next is schema design.
Here we define event schemas and implement a compatibility strategy with
versioning to handle future changes.
This step is crucial for maintaining system integrity and ensuring
consistent communication between services.
Then comes infrastructure setup.
With the schemas ready, we deploy our event broker, set up the observability
platform, and create robust
continuous integration and continuous deployment pipelines.
These elements form the backbone of a reliable and scalable event-driven
architecture.
And then comes incremental migration.
Finally, we migrate one bounded context at a time using progressive delivery.
This approach allows us to validate each step, ensuring zero
downtime while maintaining system stability, and that is what we
are doing as site reliability engineers.
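Going back to the schema design step of the roadmap, one common way to implement a compatibility strategy with versioning is upcasting, where older events are converted to the latest schema as they are read. The talk does not say which strategy was used, so treat this Python sketch as an assumption; the `PaymentCaptured` event and its field change are hypothetical.

```python
LATEST_VERSION = 2


def upcast_v1_to_v2(event):
    """Hypothetical change: v2 replaces a dollar "amount" with cents plus a currency."""
    payload = dict(event["payload"])             # copy, so the stored event is untouched
    payload["amount_cents"] = round(payload.pop("amount") * 100)
    payload.setdefault("currency", "USD")
    return {**event, "schema_version": 2, "payload": payload}


# One upcaster per version step; adding v3 later means adding one more entry.
UPCASTERS = {1: upcast_v1_to_v2}


def upcast(event):
    """Apply upcasters in order until the event reaches the latest schema version."""
    while event["schema_version"] < LATEST_VERSION:
        event = UPCASTERS[event["schema_version"]](event)
    return event


old_event = {
    "event_type": "PaymentCaptured",
    "schema_version": 1,
    "payload": {"amount": 12.5},
}
new_event = {
    "event_type": "PaymentCaptured",
    "schema_version": 2,
    "payload": {"amount_cents": 900, "currency": "EUR"},
}

upgraded = upcast(old_event)
unchanged = upcast(new_event)
```

Consumers then only need to understand the latest version, while the event log itself stays immutable.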
Thank you.
Thank you everyone.
Thank you for joining today.
In conclusion, we have explored the transition to event-driven architecture,
emphasizing the importance of SRE principles in ensuring reliability,
scalability, and resilience.
From implementation roadmaps to migration methodologies and
resilience patterns, these strategies collectively pave the way for
modern, high-performing systems.
Thank you for your time and your attention today.
It's been a pleasure discussing these impactful practices with you. Wishing
you continued success in driving innovation and reliability in your systems.
Thank you.