Optimizing High Availability & Disaster Recovery for Cloud SAP Systems: Strategies for Continuity

Abstract

Learn how to safeguard your cloud-based SAP systems with advanced High Availability & Disaster Recovery strategies. Discover cutting-edge techniques to eliminate downtime, protect critical data, and ensure business continuity, even in the face of cyberattacks, outages, or disasters. Don’t miss it!

Transcript

This transcript was autogenerated. To make changes, submit a PR.

Hi everyone. My name is Naveen Karuturi, and I'm truly excited to be speaking today at Conf42 Machine Learning 2025. Today I'll be discussing a mission-critical topic: optimizing high availability and disaster recovery strategies for cloud SAP systems. In today's digital world, downtime isn't just inconvenient; it's expensive, disruptive, and can cause major operational risks.
To give you a sense of just how important SAP systems are: SAP forms the backbone of critical business operations for organizations around the world. In fact, about 84% of Fortune 500 companies rely on SAP to manage their enterprise resource planning functions, covering everything from finance and supply chain to human resources and customer service. And it's not just about big enterprises. Over 92 million users interact with SAP platforms every single day across more than 180 countries. This is incredible scale. A key point to highlight here is that when SAP systems face downtime, it's not just a technical issue; it's a major business risk with a global ripple effect. So building resilience into SAP environments isn't optional. It is absolutely essential. Let's dive into the key strategies that keep SAP systems available, resilient, and ready for anything.
First, let's take a moment to understand the real-world impact of SAP system downtime. Every minute a core SAP system is offline costs an average of $8,750, and without proper high availability strategies, incidents can stretch beyond four hours. This adds up quickly, leading to over $2.2 million in potential losses per incident. Beyond the financial implications, the operational impact is even more alarming. 87% of organizations experience significant supply chain disruptions within just eight hours of SAP downtime, and 64% report customer-facing impacts like service delays and transaction failures within the first two hours of system unavailability. These cascading effects underline a vital truth.
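The downtime cost quoted above is simple arithmetic, and a minimal Python sketch makes it easy to rerun with your own numbers. The function name `downtime_cost` and the example durations are illustrative assumptions; only the $8,750 per minute figure comes from the talk.

```python
# Average cost per minute of SAP downtime, as quoted in the talk.
COST_PER_MINUTE_USD = 8_750


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Return the estimated cost of an outage lasting `minutes` (hypothetical helper)."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


if __name__ == "__main__":
    # Example durations are assumptions chosen for illustration.
    for hours in (1, 4, 4.25):
        cost = downtime_cost(hours * 60)
        print(f"{hours} h outage: ${cost:,.0f}")
```

At exactly four hours the estimate is $2.1 million, so the "over $2.2 million" figure corresponds to an incident that runs somewhat past the four-hour mark.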
Downtime isn't just a technical inconvenience. It's a major business threat. It not only disrupts internal processes but also damages customer trust, brand loyalty, and revenue streams. This clearly shows us that building resilience isn't optional. It's mission critical.
So now the key question is: how do we build resilience? Now that we have seen the real impact of downtime, let's dive deep into the two key strategies that organizations use to build resilience: high availability and disaster recovery. Although they sound similar, they serve very distinct but equally important roles.
First, let's start with high availability. High availability (HA) focuses on maintaining continuous system operations even during component failures. It eliminates single points of failure within the primary production environment using redundancy and automated failover. Typically it operates within the same region, leveraging availability zones. It usually adds 15 to 22% to the base infrastructure cost, which is a smart investment to avoid costly downtime. Let's discuss some examples of high availability. If a single SAP server fails, HA ensures that the system continues functioning seamlessly; users do not even realize anything went wrong. As another example, if a storage device fails, the secondary storage component takes over and users' transactions continue without interruption. With HA, component failover is seamless.
Now let's discuss DR, disaster recovery. Disaster recovery addresses large-scale disruptions, things that can bring down an entire environment, such as data-center-wide outages or natural disasters. DR enables business continuity by recovering SAP operations in alternative geographic locations. It usually spans multiple regions to ensure true geographic isolation.
However, implementing DR can increase total cost by 28 to 47%, depending on the organization's recovery objectives. Let's consider an example of DR. If a hurricane wipes out a data center in Florida, DR allows recovery from a secondary data center, let's say in Texas, quickly and reliably.
Here is the important insight: organizations that implement both HA and DR strategies experience an average of 35% fewer operational disruptions compared to those with only partial implementations. For example, a company that operates only HA does not get the benefit of those 35% fewer operational disruptions; it needs to implement both. When SAP landscapes are properly configured, companies can achieve up to 99.595% system availability even in the face of failures.
Now that we understand the differences between HA and DR, let's look at how organizations are actually putting high availability into practice, based on the stats in the graph. 87% of production SAP environments today have some form of basic high availability measure in place, which is great progress. But here is where the gap becomes clear: only 53% of these organizations have fully documented and properly tested disaster recovery plans. This is concerning because, even though 76% of respondents classify their SAP systems as business critical, many still operate without full disaster recovery protection. This tells us something important: high availability alone isn't enough. Without disaster recovery, organizations are still exposed to significant risk, especially during large-scale disruptions. High availability and disaster recovery must go hand in hand to truly protect business operations.
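To make an availability percentage like the one quoted above more tangible, it can be converted into expected downtime per year. The sketch below is a generic conversion, not something shown in the talk; the helper name `annual_downtime_hours` and the sample percentages are assumptions, and it assumes a 365-day year with all unavailability counted as downtime.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored in this sketch


def annual_downtime_hours(availability_percent: float) -> float:
    """Convert an availability percentage into expected hours of downtime per year."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Sample targets are illustrative, not figures from the talk.
    for target in (99.0, 99.9, 99.99):
        hours = annual_downtime_hours(target)
        print(f"{target}% availability allows about {hours:.2f} hours of downtime per year")
```

Each additional "nine" cuts the yearly downtime budget by a factor of ten, which is why small differences in the availability figure matter so much for a business-critical system.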
Now let's go deeper into exactly how we build effective high availability, starting from the very foundation: the infrastructure layer. True high availability starts with a strong foundation, and that foundation is infrastructure redundancy. We start with compute redundancy, deploying SAP applications across multiple availability zones. This way, if one zone or hardware component fails, the application continues running seamlessly in another. Then we have network redundancy, which means setting up redundant network paths, load balancers, and multiple virtual interfaces. This eliminates single points of network failure, which are often silent risks. And of course, storage redundancy: leveraging cloud storage platforms that automatically replicate data, achieving extremely high durability levels, even up to 99.99%. Each of these layers (compute, network, storage) acts as a safety net. Individually, they are powerful. But together they form a solid and resilient high availability foundation. And it's not just about hardware infrastructure. Application-level redundancy is just as important for maintaining true resilience.
Before moving on to application resilience, we have one more layer above the infrastructure foundation: database resilience, which is one of the most critical parts of an SAP environment. Across the leading database platforms, there are proven high availability solutions designed specifically for minimizing downtime. In this presentation, let's discuss SAP HANA, Oracle Data Guard, and SQL Server Always On.
For SAP HANA, System Replication maintains a real-time standby instance and enables recovery times as fast as two to five minutes. It is extremely popular: about 84% of SAP HANA deployments rely on it, and 65% of them have fully automated failover.
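The idea of automated failover mentioned above can be pictured with a small, vendor-neutral Python sketch. This is not how SAP HANA System Replication or any cluster manager is actually implemented; `FailoverMonitor`, its callbacks, and the threshold of three failed checks are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FailoverMonitor:
    """Promote the standby after several consecutive failed health checks."""

    check_primary: Callable[[], bool]    # assumed probe: True when the primary is healthy
    promote_standby: Callable[[], None]  # assumed hook that performs the takeover
    failure_threshold: int = 3           # illustrative value, not from the talk
    consecutive_failures: int = 0
    failed_over: bool = False

    def poll(self) -> bool:
        """Run one health check; return True only if this call triggered a failover."""
        if self.failed_over:
            return False  # takeover already happened, nothing more to do
        if self.check_primary():
            self.consecutive_failures = 0  # a healthy check resets the counter
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.promote_standby()
            self.failed_over = True
            return True
        return False


if __name__ == "__main__":
    # Simulated probe results: one healthy check followed by three failures.
    probe_results = iter([True, False, False, False])
    monitor = FailoverMonitor(
        check_primary=lambda: next(probe_results),
        promote_standby=lambda: print("standby promoted to primary"),
    )
    for _ in range(4):
        monitor.poll()
```

Requiring several consecutive failures before promoting the standby is a common way to avoid failing over on a single transient network glitch, at the cost of a slightly longer detection time.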
Oracle Data Guard provides synchronous or near-synchronous replication for SAP systems using Oracle databases. Typically, it achieves recovery times within three to seven minutes after a primary database failure. Let's move on to SQL Server. Always On, used for SAP on SQL Server platforms, implements availability groups. It achieves average recovery times of around four minutes after a failure and is adopted by roughly 71% of SQL Server-based SAP environments. So across all databases, whether it's SAP HANA, Oracle, or SQL Server, the message is consistent: replication and failover are not optional. They must be tightly integrated into any serious high availability strategy.
Now that we have covered database redundancy, let's move up the stack and talk about application-level redundancy. At this layer, we create resilience by load balancing across multiple SAP application servers. This ensures that if one server fails, the others instantly take over and maintain about 95% of normal transaction throughput without users even noticing. For SAP Central Services, that is, the ASCS and SCS instances, we deploy clustering solutions. These allow us to avoid single points of failure, enabling recovery times averaging just 2.5 minutes. And it doesn't stop there. Enqueue replication ensures that critical in-flight transactions are protected even if the primary enqueue server goes down. All of this is backed by automated failover mechanisms, which can detect a component failure and initiate recovery within 12 to 30 seconds.
What we have discussed so far covers individual infrastructure components. But what happens if the entire region or data center goes down?
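The load-balancing behavior described for the application layer can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not SAP's own dispatching logic: the `RoundRobinBalancer` class and the server names are hypothetical, and health status is set manually rather than by a real probe.

```python
from itertools import cycle
from typing import Dict, Iterable


class RoundRobinBalancer:
    """Rotate requests across application servers, skipping any marked as down."""

    def __init__(self, servers: Iterable[str]) -> None:
        self._servers = list(servers)
        if not self._servers:
            raise ValueError("at least one server is required")
        self._healthy: Dict[str, bool] = {name: True for name in self._servers}
        self._ring = cycle(self._servers)

    def mark_down(self, server: str) -> None:
        if server not in self._healthy:
            raise KeyError(server)
        self._healthy[server] = False

    def mark_up(self, server: str) -> None:
        if server not in self._healthy:
            raise KeyError(server)
        self._healthy[server] = True

    def next_server(self) -> str:
        """Return the next healthy server, trying each server at most once."""
        for _ in range(len(self._servers)):
            candidate = next(self._ring)
            if self._healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy application servers available")


if __name__ == "__main__":
    # Server names are placeholders for illustration only.
    balancer = RoundRobinBalancer(["app1", "app2", "app3"])
    balancer.mark_down("app2")  # simulate a failed application server
    print([balancer.next_server() for _ in range(4)])
```

With one of three servers down, traffic keeps flowing to the remaining two, which is the property the talk relies on when it says users do not notice a single server failure.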
A regional or data-center-wide outage is where disaster recovery comes into play. Building resilience isn't just about preventing small failures; it's also about being ready for larger regional disruptions. That's where disaster recovery strategies add a critical second layer of protection. At the foundation, we start with regular backups, ensuring we always have recent, reliable copies of our SAP data. Next, we implement cross-region replication, which ensures that even if an entire region suffers an outage, our data is still protected and ready in another location. And finally, we bring it all together with recovery orchestration, automating the entire recovery workflow, so that when a disaster strikes, recovery is fast and smooth and human error is minimized. When all three layers (backup, replication, and orchestration) are executed properly, mature disaster recovery strategies can shrink recovery times by 72%. That means cutting average downtime from 18.7 hours to 5.2 hours, making a huge difference in how fast an organization can bounce back.
Now let's explore how the major cloud providers, AWS, Azure, and GCP, enhance high availability and disaster recovery in the cloud. When it comes to resilience, the major cloud providers offer powerful tools, but each has its own strengths. Starting with AWS: AWS holds 37% market share. 78% of users adopt Multi-AZ deployments, boosting availability from 99.95% to 99.98%. EBS snapshots and AWS Elastic Disaster Recovery achieve faster recoveries, with RPOs of one to five minutes and RTOs of 15 to 30 minutes. Moving to Azure, with 32% market share: Azure's availability zones deliver up to 99% availability, and Azure Site Recovery and Azure Backup ensure RPOs of five to 15 minutes and reliable, SAP-certified backups. And GCP holds 24% market share.
Its regional persistent disks and cross-region backup copies are widely adopted and enable fast, seamless recovery across regions. The key takeaway is that every cloud has the building blocks for HA and DR; choosing the right fit for your SAP landscape is what truly makes the difference.
Now let's move on from the cloud providers to how we should design and implement these solutions for real-world success. Now that we have seen what tools and strategies are available, let's talk about how to implement them effectively. First, begin with business requirements. Start with a business impact analysis to prioritize what's crucial. Typically, organizations allocate eight to 15% of their SAP infrastructure budget to resilience. Second, layer your defenses: protect at every layer, infrastructure, database, and application. Multi-layer protection raises success rates to about 92%, compared with 63% for single-layer setups. Third, automate where possible. Automation speeds up recovery and reduces human error. Automated processes achieve a success rate of 89%, versus 62% for manual efforts. And finally, test regularly. Organizations that perform quarterly recovery drills nearly double their success rate during actual disaster recovery events compared to those that test once a year. Ultimately, it's about building muscle memory, because when a real crisis hits, practice makes resilience.
And finally, after all the planning and preparation, how do we know if our landscape is truly resilient? That's where key metrics like recovery time objective and recovery point objective come into play. As we near the end of today's session, let's focus on how we measure resilience. There are two critical metrics every organization must define: recovery time objective (RTO) and recovery point objective (RPO). Recovery time objective is the maximum acceptable time to restore system functionality after a disruption.
Recovery point objective is the maximum acceptable amount of data loss, measured in time. But it doesn't stop there. Continuous improvement through regular testing and refinement is essential to strengthen recovery capabilities. Organizations that clearly define and optimize these metrics achieve recovery times 42% closer to business requirements, experience 58% fewer business disruptions during recovery operations, and on average recover 2.6 times faster, reducing recovery from 12.3 hours down to 4.7 hours. As cloud and SAP technologies evolve, mastering RTOs and RPOs will be key to maintaining competitive advantage and ensuring true business continuity.
And with that, we have completed our journey through optimizing high availability and disaster recovery for SAP systems. That wraps up today's discussion. Thank you.
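As a companion to the RTO and RPO definitions in the closing section, here is a minimal Python sketch of how a recovery drill could be scored against those two objectives. The class names, field names, and sample timestamps are all assumptions made for illustration; the talk does not prescribe a particular way of recording drills.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict


@dataclass(frozen=True)
class RecoveryObjectives:
    rto: timedelta  # maximum acceptable time to restore service
    rpo: timedelta  # maximum acceptable data loss, measured in time


@dataclass(frozen=True)
class RecoveryDrill:
    outage_start: datetime
    service_restored: datetime
    last_recoverable_point: datetime  # newest data that survived the outage

    @property
    def recovery_time(self) -> timedelta:
        return self.service_restored - self.outage_start

    @property
    def data_loss_window(self) -> timedelta:
        return self.outage_start - self.last_recoverable_point


def meets_objectives(drill: RecoveryDrill, objectives: RecoveryObjectives) -> Dict[str, bool]:
    """Compare the measured recovery time and data loss window with the targets."""
    return {
        "rto_met": drill.recovery_time <= objectives.rto,
        "rpo_met": drill.data_loss_window <= objectives.rpo,
    }


if __name__ == "__main__":
    # Sample values are invented for the example.
    drill = RecoveryDrill(
        outage_start=datetime(2025, 5, 8, 10, 0),
        service_restored=datetime(2025, 5, 8, 10, 25),
        last_recoverable_point=datetime(2025, 5, 8, 9, 57),
    )
    targets = RecoveryObjectives(rto=timedelta(minutes=30), rpo=timedelta(minutes=5))
    print(meets_objectives(drill, targets))
```

Recording each drill this way gives a simple trend line over time, which fits the talk's point that regular testing is what brings actual recovery times closer to the business requirement.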

Conf42 Machine Learning 2025 - Online

May 08 2025 - premiere 5PM GMT

Naveen Karuturi

Senior SAP Basis Administrator @ Pacific Gas and Electric Company
