Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi everyone.
My name is Navin Carri.
I'm truly excited to be speaking today at Conf42 Machine Learning 2025.
Today I'll be discussing a mission-critical topic: optimizing high
availability and disaster recovery strategies for cloud SAP systems.
In today's digital world, downtime isn't just inconvenient, it's
expensive, disruptive, and can cause major operational risks.
To give you a sense of just how important SAP systems are, SAP forms the
backbone of critical business operations for organizations around the world.
In fact, about 84% of Fortune 500 companies rely on SAP to manage their
enterprise resource planning functions, covering everything from finance
and supply chain to human resources and customer service.
And it's not just about big enterprises.
Over 92 million users interact with SAP platforms every single
day across more than 180 countries.
This is incredible scale.
A key point to highlight here is when SAP systems face downtime, it's not
just a technical issue, it's a major business risk with a global ripple effect.
So building resilience into SAP environments isn't just optional.
It is absolutely essential.
Let's dive into the key strategies that ensure SAP systems are available,
resilient, and ready for anything.
First, let's take a moment to understand the real world
impact of SAP system downtime.
Every minute a core SAP system is offline costs an average of $8,750, and without
proper high availability strategies, incidents can stretch beyond four hours.
This adds up quickly, leading to over $2.2 million in potential losses per incident.
Beyond the financial implications, the operational impact is even more alarming.
87% of organizations experience significant supply chain disruptions
within just eight hours of SAP downtime.
And 64% report customer-facing impacts like service delays and
transaction failures within the first two hours of system unavailability.
These cascading effects underline a vital truth.
Downtime isn't just a technical inconvenience.
It's a major business threat.
It disrupts not only internal processes, but also
damages customer trust, brand loyalty, and revenue streams.
This clearly shows us that building resilience isn't optional.
It's mission critical.
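As a rough check on the figures quoted above, the following minimal Python sketch multiplies the per-minute cost by the outage length. It assumes the cost stays constant for the whole outage, which is a simplification; the function and variable names are illustrative only.

```python
COST_PER_MINUTE_USD = 8_750  # average per-minute cost quoted in the talk


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Estimate outage cost, assuming a constant cost per minute."""
    return minutes * cost_per_minute


# Cost of an outage lasting exactly four hours.
four_hour_cost = downtime_cost(4 * 60)
print(f"4-hour outage: ${four_hour_cost:,.0f}")

# How long an outage must last to reach the $2.2 million figure.
minutes_to_2_2m = 2_200_000 / COST_PER_MINUTE_USD
print(f"$2.2M is reached after about {minutes_to_2_2m:.0f} minutes")
```

Under this assumption, four hours comes to $2.1 million, so the $2.2 million figure corresponds to an incident running a little past the four-hour mark.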
So now the key question is: how do we build resilience now that we have
seen the real impact of downtime?
Let's dive deep into the two key strategies that organizations
use to build resilience:
high availability and disaster recovery.
Although they sound similar, they serve very distinct,
but equally important roles.
First, let's start with high availability.
High availability focuses on maintaining continuous system operations, even
during component failures. It eliminates single points of failure within the
primary production environment using redundancy and automated failover.
Typically it operates within the same region,
leveraging availability zones. It usually adds 15 to 22% to
the base infrastructure cost, which is a smart investment to avoid costly downtime.
Let's discuss some examples of high availability.
If a single SAP server fails, HA ensures that the system continues functioning seamlessly.
Users do not even realize anything went wrong.
For example, if a storage device fails, the
secondary storage component takes over, and users
see their transactions continue seamlessly.
With HA, component failover is seamless.
Now let's discuss DR, disaster recovery.
Disaster recovery addresses large-scale disruptions, things that can bring
down an entire environment, such as data center-wide outages or natural disasters.
DR enables business continuity by recovering SAP operations in alternative
geographic locations. It usually spans multiple regions to ensure
true geographic isolation.
However, implementing DR can increase total cost by 28 to 47%, depending on the
organization's recovery objectives.
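To make these cost ranges concrete, here is a small Python sketch. The base infrastructure cost is a hypothetical figure, and applying both percentage ranges to the same base is a simplifying assumption; the talk does not spell out exactly how the two overheads combine.

```python
def overhead_range(base_cost: float, low_pct: float, high_pct: float) -> tuple:
    """Return the (low, high) added cost for a percentage range of a base cost."""
    return base_cost * low_pct / 100, base_cost * high_pct / 100


base_cost = 1_000_000  # hypothetical annual base infrastructure cost in USD

# Percentage ranges quoted in the talk.
ha_low, ha_high = overhead_range(base_cost, 15, 22)
dr_low, dr_high = overhead_range(base_cost, 28, 47)

print(f"HA adds ${ha_low:,.0f} to ${ha_high:,.0f}")
print(f"DR adds ${dr_low:,.0f} to ${dr_high:,.0f}")
```

Comparing these added costs with the cost of a single multi-hour outage is one way to frame the investment decision.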
Let's consider an example of DR. If a hurricane wipes
out a data center in Florida,
DR allows recovery from a secondary data center, let's say in Texas or
wherever it is, quickly and reliably.
And here is the important insight: organizations
that implement both HA and DR strategies experience
an average of 35% fewer operational disruptions compared to those
with only partial implementations.
For example, if a company implements only HA, it doesn't get the benefit
of those 35% fewer operational disruptions.
So organizations need to implement both to achieve that reduction.
When SAP landscapes are properly configured, companies can achieve up to
99.595% system availability even in the face of failures.
Now that we understand the differences between HA and DR, let's move forward
and see how organizations are actually putting
high availability into practice by looking at the stats.
87% of production SAP environments today have some form of basic
high availability measure in place, which is great progress.
But here is where the gap becomes clear.
Only 53% of these organizations have fully documented and properly
tested disaster recovery plans.
And this is concerning because, even though 76% of respondents
classify their SAP systems as business critical, many still operate without
full disaster recovery protection.
This tells us something important.
High availability alone isn't enough.
Without disaster recovery, organizations are still exposed to significant risk,
especially during large-scale disruptions.
High availability and disaster recovery must go hand in hand to truly
protect business operations.
Now let's go deeper into exactly how we build
effective high availability,
starting from the very foundation: the infrastructure layer.
Building true high availability starts with a strong foundation, and that foundation
is infrastructure redundancy.
We start with compute redundancy
by deploying SAP applications across multiple availability zones.
This way, if one zone or hardware component fails, the application continues
running seamlessly in another.
Then we have network redundancy, which means setting up redundant
network paths, load balancers, and multiple virtual interfaces.
This eliminates single points of network failure,
which are often silent risks. And of course, storage redundancy:
leveraging cloud storage platforms that automatically replicate
data, achieving extremely high durability levels, even up to 99.99%.
Each of these, compute, network, and storage, acts as a safety net.
Individually, they are powerful.
But together they form a solid and resilient high availability foundation.
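One way to see why redundancy at each layer matters is the standard availability arithmetic for redundant and stacked components. The sketch below assumes component failures are independent, which real deployments only approximate, and the per-component availabilities are hypothetical values chosen for illustration, not figures from the talk.

```python
def redundant_availability(single: float, copies: int) -> float:
    """Availability when any one of `copies` independent components suffices."""
    return 1 - (1 - single) ** copies


def stacked_availability(*layers: float) -> float:
    """Availability when every layer must be up at the same time."""
    total = 1.0
    for layer in layers:
        total *= layer
    return total


# Hypothetical single-component availabilities, for illustration only.
compute = redundant_availability(0.99, 2)   # application deployed in two zones
network = redundant_availability(0.995, 2)  # two independent network paths
storage = redundant_availability(0.999, 2)  # two replicated storage copies

overall = stacked_availability(compute, network, storage)
print(f"compute={compute:.6f} network={network:.6f} storage={storage:.6f}")
print(f"overall={overall:.6f}")
```

The point of the sketch is that each redundant layer is far more available than a single component, while the overall figure is still bounded by the weakest layer.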
And it's not just about hardware infrastructure. Application-level
redundancy is just as important to maintain true resilience.
Before moving on to application resilience, above the infrastructure
foundation, we have one more layer:
database resilience,
which is one of the most critical parts of an SAP environment.
Let's discuss database resilience. Across leading
database platforms, there are proven high availability solutions designed
specifically for minimizing downtime.
In this presentation, let's discuss SAP HANA System Replication, Oracle Data Guard,
and SQL Server Always On.
For SAP HANA, System Replication maintains a real-time standby
instance and enables recovery times as fast as two to five minutes.
It is extremely popular.
About 84% of SAP HANA deployments rely on it, and 65% of them
have fully automated failover.
Oracle Data Guard provides synchronous or near-synchronous
replication for SAP systems using Oracle databases.
Typically, it achieves recovery times within
three to seven minutes after a primary database failure.
Let's move on to SQL Server.
Always On is used for SAP
on SQL Server platforms and implements availability groups.
It achieves average recovery times of around four minutes after a
failure, and is adopted by roughly
71% of SQL Server-based SAP environments.
So across all databases, whether it's SAP HANA,
Oracle, or SQL Server,
the message is consistent.
Replication and failover are not optional.
They must be tightly integrated into any serious high availability strategy.
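The recovery times quoted above can be compared against a recovery time target with a short Python sketch. The target value and the helper name `meets_rto` are hypothetical; the minute ranges are the ones mentioned in the talk, and the check conservatively uses the upper end of each range.

```python
# Recovery-time ranges in minutes, as quoted in the talk.
RECOVERY_MINUTES = {
    "SAP HANA System Replication": (2, 5),
    "Oracle Data Guard": (3, 7),
    "SQL Server Always On": (4, 4),  # quoted as an average of about four minutes
}


def meets_rto(solution: str, rto_minutes: float) -> bool:
    """True if the upper end of the quoted recovery range fits within the target."""
    _, worst_case = RECOVERY_MINUTES[solution]
    return worst_case <= rto_minutes


rto_target = 5  # hypothetical recovery time objective in minutes
results = {name: meets_rto(name, rto_target) for name in RECOVERY_MINUTES}

for name, ok in results.items():
    print(f"{name}: {'within' if ok else 'exceeds'} a {rto_target}-minute target")
```

Real recovery times depend on configuration and workload, so figures like these are a starting point for testing rather than a guarantee.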
And now that we have covered database redundancy, let's move
forward and explore how we build resilience for the application layer.
Now moving up the stack, let's talk about application-level
redundancy.
At this layer, we create resilience by
load balancing multiple SAP servers.
This ensures that if one server fails, the others instantly take over and maintain
about 95% of normal transaction throughput without users even noticing.
For SAP Central Services, that is, ASCS and SCS instances, we
deploy clustering solutions.
These allow us to avoid single points of failure, enabling recovery
times averaging just 2.5 minutes.
And it doesn't stop there.
Enqueue replication ensures that critical in-flight transactions are protected
even if the primary enqueue server goes down.
All of these are backed by automated failover mechanisms, which can detect
component failure and initiate recovery within 12 to 30 seconds.
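The load-balancing behaviour described here can be illustrated with a minimal Python sketch of a round-robin pool that skips servers marked unhealthy. The class name `AppServerPool` and the server names are hypothetical, and this is not how SAP's own load distribution is implemented; it only shows the general idea of routing around a failed server.

```python
class AppServerPool:
    """Round-robin pool that routes only to servers currently marked healthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._index = 0

    def mark_down(self, server):
        """Record a failed health check for a server."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Return a known server to rotation after it recovers."""
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        """Return the next healthy server in round-robin order."""
        if not self.healthy:
            raise RuntimeError("no healthy application servers")
        for _ in range(len(self.servers)):
            server = self.servers[self._index % len(self.servers)]
            self._index += 1
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy application servers")


pool = AppServerPool(["app1", "app2", "app3"])  # hypothetical server names
pool.mark_down("app2")                          # simulate a failed server
routed = [pool.next_server() for _ in range(4)]
print(routed)
```

In a real landscape the health state would come from periodic checks, and the detection interval is what drives the 12 to 30 second figure mentioned above.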
What we have discussed so far are
the infrastructure components.
But what happens if the entire region or data center goes down?
That's where disaster recovery comes into play.
Building resilience isn't just about preventing small failures, it's also about
being ready for larger regional disruptions.
That's where disaster recovery strategies add a critical second layer of protection.
At the foundation, we start with regular backups, ensuring we always have recent,
reliable copies of our SAP data.
Next, we implement cross-regional replication, which ensures that even
if an entire region suffers an outage, our data is still protected
and ready in another location.
And finally, we bring it all together with recovery orchestration, automating
the entire recovery workflow.
So when a disaster strikes, recovery is fast, smooth,
and minimizes human error.
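Recovery orchestration, in its simplest form, is an ordered runbook that runs each step automatically and stops at the first failure. The Python sketch below illustrates that idea; the step names and the placeholder actions are hypothetical and stand in for whatever real backup, replication, and startup tooling an organization uses.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_recovery(steps: List[Step]) -> List[str]:
    """Run recovery steps in order; raise as soon as one step reports failure."""
    completed = []
    for name, action in steps:
        if not action():
            raise RuntimeError(f"recovery step failed: {name}")
        completed.append(name)
    return completed


# Hypothetical runbook; each action is a placeholder that simply succeeds.
runbook: List[Step] = [
    ("verify latest backup", lambda: True),
    ("activate replica in secondary region", lambda: True),
    ("start SAP application servers", lambda: True),
    ("redirect users to secondary region", lambda: True),
]

completed = run_recovery(runbook)
print(completed)
```

Encoding the order and the stop-on-failure rule in code is what removes the manual decision points where human error tends to creep in.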
When all three layers, backup, replication, and orchestration,
are executed properly, mature disaster recovery strategies can
shrink recovery times by 72%.
That means cutting downtime from an average of 18.7 hours to 5.2
hours, making a huge difference in how fast
an organization can bounce back.
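The 72% figure follows directly from the two recovery times quoted. A minimal Python check, with an illustrative helper name:

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage decrease going from `before` to `after`."""
    return (before - after) / before * 100


# Average recovery times in hours, as quoted in the talk.
reduction = percent_reduction(18.7, 5.2)
print(f"Recovery time reduced by {reduction:.1f}%")
```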
Now let's explore how the major cloud providers, AWS, Azure, and GCP,
enhance high availability and disaster recovery in the cloud.
When it comes to resilience,
major cloud providers offer powerful tools, but each has its own strengths.
Starting with AWS: AWS holds 37% market share.
78% of users adopt multi-AZ deployments, boosting availability
from 99.95% to 99.98%. EBS snapshots and Elastic Disaster Recovery services
achieve faster recoveries,
with RPOs of one to five minutes and RTOs of 15 to 30 minutes.
Moving to Azure, with 32% market share.
Azure's availability zones deliver up to 99% availability.
Site Recovery and Azure Backup ensure RPOs of five to 15 minutes
and reliable,
SAP-certified backups. And GCP: GCP holds 24% market share.
Regional persistent disks and cross-regional backup copies are
widely adopted and enable fast, seamless recovery across regions.
The key takeaway is that every cloud has the building blocks of HA and DR. Choosing
the right fit for your SAP landscape
is what truly makes the difference.
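Availability percentages are easier to compare when converted into expected downtime per year. The Python sketch below does that conversion for the 99.95% and 99.98% figures mentioned for AWS; it assumes a 365-day year and treats availability as a simple fraction of total time.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def annual_downtime_minutes(availability_percent: float) -> float:
    """Expected minutes of downtime per year at a given availability percentage."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


single_az = annual_downtime_minutes(99.95)
multi_az = annual_downtime_minutes(99.98)

print(f"99.95% availability: about {single_az:.0f} minutes of downtime per year")
print(f"99.98% availability: about {multi_az:.0f} minutes of downtime per year")
```

Seen this way, a change of a few hundredths of a percent corresponds to a couple of hours of avoided downtime per year.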
Now let's move on from the cloud responsibilities to how we
should design and implement these solutions for real world success.
Now that we have seen what tools and strategies are available, let's talk
about how to implement them effectively.
First, begin with the business requirements.
Start with a business impact analysis to prioritize what's crucial.
Typically, organizations allocate eight to 15% of their SAP infrastructure
budget towards resilience.
Second, layer your defense: protect at every layer,
infrastructure, database, and application.
Multi-layer protection increases success rates to about 92%,
compared to 63% for single-layer setups.
Third, automate where possible. Automation speeds up recovery
and reduces human error.
Automated processes improve the success rate to 89%, versus 62% for manual effort.
And finally, test regularly. Organizations that perform
quarterly recovery drills nearly
double their success rate during actual disaster recovery events
compared to those that test once a year.
Ultimately, it's about building muscle memory, because when a real crisis
hits, practice makes resilience.
And finally, after all the planning and preparation, how do we
know if it is truly resilient?
That's where key metrics like recovery time objective
and recovery point objective come into play.
As we near the end of today's session, let's focus on how we measure resilience.
There are two critical metrics
every organization must define: recovery time objective and recovery point
objective. Recovery time objective, or RTO, is
the maximum acceptable time to restore system functionality after a disruption.
Recovery point objective, or RPO, is the maximum acceptable amount of data loss, measured
in time. But it doesn't stop there.
Continuous improvement through regular testing and refinement is
essential to strengthen recovery capabilities.
Organizations that clearly define and optimize
these metrics
achieve recovery times 42%
closer to the business requirements, experience 58% fewer business
disruptions during recovery operations,
and on average recover 2.6 times faster, reducing recovery
from 12.3 hours down to 4.7 hours.
As cloud and SAP technology evolves, mastering RTOs and RPOs will be key
for maintaining competitive advantage and ensuring true business continuity.
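To show how RTO and RPO can be checked after a recovery drill, here is a minimal Python sketch. The objective values, timestamps, and names such as `RecoveryObjectives` and `evaluate_recovery` are hypothetical; the sketch simply measures downtime against the RTO and the age of the last recoverable data against the RPO.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    rto: timedelta  # maximum acceptable time to restore service
    rpo: timedelta  # maximum acceptable data loss, measured in time


def evaluate_recovery(objectives, failure_time, last_good_data_time, restored_time):
    """Compare a measured recovery against the defined RTO and RPO."""
    downtime = restored_time - failure_time
    data_loss = failure_time - last_good_data_time
    return {
        "downtime": downtime,
        "data_loss": data_loss,
        "rto_met": downtime <= objectives.rto,
        "rpo_met": data_loss <= objectives.rpo,
    }


# Hypothetical objectives and drill timestamps, for illustration only.
objectives = RecoveryObjectives(rto=timedelta(minutes=30), rpo=timedelta(minutes=5))
report = evaluate_recovery(
    objectives,
    failure_time=datetime(2025, 1, 15, 10, 0),
    last_good_data_time=datetime(2025, 1, 15, 9, 57),
    restored_time=datetime(2025, 1, 15, 10, 25),
)
print(report)
```

Recording these two measurements for every drill gives a simple trend line for the continuous improvement described above.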
And with that, we have completed our journey through optimizing
high availability and disaster recovery for SAP systems.
Let's move on to wrap up today's discussion.
Thank you.