Conf42 Site Reliability Engineering (SRE) 2025 - Online

- premiere 5PM GMT

Ensuring Financial System Reliability: SRE Principles in SAP S/4HANA Implementation


Abstract

Discover how SRE principles transformed a global manufacturer’s financial backbone from fragile legacy systems to a resilient SAP platform. Learn actionable strategies that revolutionized availability and turned finance teams from skeptics to passionate SRE evangelists.


Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi everyone, and welcome. Thank you for joining me today. In today's session we'll discuss how an SRE approach has transformed financial operations by addressing system latency, global access disparities, and reliability concerns. Let's explore the key strategies and the results achieved through this transformation.

First, the challenge: what problems were we facing in the financial systems that called for SRE? The first is unpredictable performance during the financial close. System response time increased drastically, from an average of two seconds to 30 seconds per transaction. This latency delayed the consolidation of financial statements, leading to missed deadlines and more manual intervention. For instance, the finance team had to wait for extended periods to complete each transaction, which not only slowed down the entire process but also led to overtime work and increased operational costs.

The second challenge is global availability. Teams located in different regions, such as Asia and Europe, faced inconsistent access to financial systems. For example, during peak hours, users in Asia frequently experienced timeouts and slow responses, while users in Europe did not face any such issues. This disparity in system availability hindered the ability of global teams to collaborate effectively and maintain continuous operations. As a result, critical financial processes were delayed and the overall efficiency of the organization was compromised.

The next issue is reliability. The financial system experienced unexpected outages, particularly during critical periods such as month-end close. These outages led to incomplete data entries and disrupted financial workflows, requiring extensive manual reconciliation to correct errors.
For instance, a sudden system crash during the final stages of financial reporting caused data loss and inconsistencies that took several days to resolve. The result was increased operational risk, reduced confidence among stakeholders, and potential financial losses due to delayed reporting and decision making. As an example, at British Airways in 2020 an SAP outage disrupted payroll and invoicing, cascading into 80 million in losses.

Moving on, the next topic is the SRE approach to finance transformation. Here we are looking at three approaches. The first is to reframe financial systems as services. Financial transactions and reporting processes were treated as services with defined reliability and performance targets, similar to how customer-facing applications are managed. This shift in perspective ensures that the financial systems receive the same level of attention and resources as other critical business services, leading to improved reliability and performance.

The second approach is to define clear reliability targets. As an example, we set a service level objective (SLO) of 99.99% uptime for the financial system, meaning it could only be down for approximately four minutes per month. This high standard ensured that the system was almost always available, reducing the risk of downtime during critical periods and improving overall user satisfaction. Visa sets theirs at 99.999% to align with business-critical operations, which reduces overall system downtime and latency.

The third approach is to implement continuous measurement.
Integrate real-time monitoring tools to track system performance and user experience, and enable proactive issue detection through predictive analysis and anomaly detection. For example, continuous monitoring flags an increase in API response time, prompting the system to automatically scale up resources before users experience any delay. This proactive scaling maintains performance standards and prevents potential slowdowns.

The next topic is establishing financial SLOs. Again, there are four parts. The first is availability metrics. Ensure that critical functions such as ledger updates or financial reporting are almost always available, with downtime of less than four minutes per month. This high availability target minimizes disruptions and ensures that finance teams can rely on the system to complete their tasks on time.

The next is performance SLOs. Set a performance SLO where 95% of financial transactions should complete within two seconds, ensuring quick and efficient processing. This target helps maintain a smooth workflow and reduces the time finance teams spend waiting for transactions to complete. Coca-Cola mandated sub-two-second processing times for any SAP FI transaction during quarter closes.

The third is continuous refinement. Every quarter, the team should review system performance data and adjust the SLOs based on observed trends and business needs, ensuring continuous alignment with operational goals. This regular review process allows teams to adapt to changing requirements and continuously improve system reliability.

The fourth is business-aligned SLO goals. During month-end close, the availability SLO was raised to 99.9995%, allowing for only two minutes of downtime, to ensure smooth and uninterrupted financial operations. This higher target ensures that the system remains highly reliable during the most critical periods, reducing the risk of delays and errors.

The next topic is automated monitoring and alerting. We have three approaches to how alerts are raised for the financial processes. The first is intelligent alerting. Alerts for issues affecting month-end close processes are prioritized over less critical functions, ensuring rapid response to high-impact problems. This prioritization helps teams focus on the most critical issues first, which minimizes business disruption.

The second is proactive detection. Monitoring tools detect increases in transaction-processing latency and raise alerts before users are affected, allowing for preemptive action. This proactive approach helps prevent issues from cascading and ensures a smooth user experience.

The third is comprehensive coverage. We monitored not just system uptime and performance, but also completion rates of financial processes like reconciliations and report generation. This comprehensive monitoring approach provides a holistic view of system health and performance.

Moving on, the next topic is incident response. To respond to incidents quickly, we defined three approaches. The first is finance-specific playbooks. Develop playbooks for common issues like transaction failures and data inconsistencies, detailing step-by-step resolution procedures and escalation paths.
These playbooks provide clear guidance for teams and ensure quick and effective incident resolution.

The second is cross-functional response teams. Form teams with both IT and finance experts who can quickly diagnose and resolve issues, with finance experts, for example, identifying the impact of a system error on financial reporting. This cross-functional approach ensures that incidents are addressed from both the technical and the business perspective.

The third important one is blameless postmortems. After a system outage, conduct a postmortem to analyze root causes and identify improvements, focusing on process and system issues rather than blaming individuals. This approach fosters a culture of continuous improvement and helps prevent similar issues in future.

The next topic is the technical results. Again, the numbers speak. The first is system availability. We achieved near-perfect uptime, significantly reducing downtime and ensuring continuous availability of the financial system. This improvement enhanced user confidence and reduced the risk of business disruption. As an example, SAP S/4HANA Cloud achieved 99.99% uptime for Luhan after migrating from ECC 6.0; previously it was at 97%.

The second technical result is incident resolution. We reduced the average time to resolve incidents from four hours to one hour, minimizing business disruption. This faster resolution time improved overall system reliability and user satisfaction.

The third is automation. With 90% of recovery procedures automated, human intervention is minimized, which reduces errors and speeds up recovery. Automation ensures consistency and reliability, which is essential for maintaining critical business operations without interruption.
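
To put the availability figures from this talk side by side (97%, 99.99%, and 99.999%), here is a minimal Python sketch of the downtime arithmetic. The function name `allowed_downtime_minutes` and the 30-day month are assumptions made for illustration; the talk does not describe how the figures were computed.

```python
def allowed_downtime_minutes(slo_percent: float, period_days: int = 30) -> float:
    """Return the downtime, in minutes, that an availability target permits.

    Hypothetical helper for illustration; assumes a 30-day month by default.
    """
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in the range (0, 100]")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - slo_percent / 100)


if __name__ == "__main__":
    # Availability targets mentioned in the talk.
    for target in (97.0, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability allows about {minutes:.2f} minutes per 30 days")
```

Under these assumptions, 99.99% works out to roughly 4.3 minutes per month, which matches the "approximately four minutes" quoted earlier, while 97% allows about 21.6 hours.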
The next topic is business impact: how SRE affects the business. The first is a predictable month-end close. We achieved a predictable month-end close process, with financial statements consistently completed on time, improving stakeholder confidence. This predictability reduced stress on finance teams and ensured timely reporting. Some companies, after adopting this SRE approach, reduced their financial close from 10 days to five days.

The second is reliable reporting. We eliminated system errors in financial reports, ensuring accurate and reliable financial data for decision making. This improvement enhanced the credibility of the financial reports and supported better business decisions.

The third business impact is real-time analytics. The system provided real-time access to financial data, enabling timely and informed business decisions. This real-time access supported agile decision making and improved overall business performance. As an example, Maersk uses SAP S/4HANA plus Power BI for real-time container cost tracking, cutting reporting errors by 90%. That is a big achievement for Maersk.

The next topic is cultural transformation. Again, there are three aspects. The first is shared ownership. Both IT and finance teams participated in regular reliability reviews and jointly prioritized system improvements, fostering a culture of shared responsibility. This collaboration ensures that both technical and business perspectives are considered in decision making.

The next is data-driven decisions. Use the data from monitoring tools to identify areas needing investment, such as upgrading infrastructure or optimizing processes, ensuring resources are allocated effectively.
This data-driven approach supported informed decision making and optimized resource use.

The third transformation is continuous improvement. Conduct monthly reviews of incidents and near misses to identify trends and implement improvements, continuously enhancing system reliability. This regular review process ensures that lessons learned are applied and that the system is continuously optimized. That is one of the major transformations we need to consider: we need to keep improving so that the issues we faced earlier do not recur in the near future.

Moving on, the next topic is key implementation lessons. There are four. The first is to start with business metrics. Focus on metrics like transaction completion time and financial close duration, which directly impact business operations. This focus ensures that reliability improvements are aligned with business goals.

The next is to balance feature velocity and reliability. An SLO allows for a certain number of errors or issues, so it balances the need for new features against maintaining system reliability. This balance ensures that innovation does not compromise system stability. We can tolerate a couple of issues during a particular month-end.

The next is investing in cross-training. Conduct training sessions where SRE engineers learn about the financial processes and, similarly, finance teams are educated on reliability principles. This cross-training ensures that both teams understand each other's priorities and can work together effectively.
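
The lesson above about balancing feature velocity and reliability rests on the idea that an SLO tolerates a limited number of misses. As a rough Python sketch of that idea, applied to the performance target from this talk (95% of transactions within two seconds): the names `ErrorBudget` and `latency_error_budget` are hypothetical, and the talk does not describe any specific implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Hypothetical container for an error budget over one review window."""

    allowed: int  # transactions permitted to miss the latency target
    used: int     # transactions that actually missed it

    @property
    def remaining(self) -> int:
        return self.allowed - self.used

    @property
    def exhausted(self) -> bool:
        # True once every permitted miss has been spent (or overspent).
        return self.remaining <= 0


def latency_error_budget(latencies_s, threshold_s=2.0, target=0.95):
    """Compute the error budget for a list of transaction latencies in seconds.

    Defaults mirror the target in the talk: 95% within two seconds.
    """
    total = len(latencies_s)
    # The small epsilon guards against floating-point results like 4.9999999.
    allowed = math.floor(total * (1 - target) + 1e-9)
    used = sum(1 for seconds in latencies_s if seconds > threshold_s)
    return ErrorBudget(allowed=allowed, used=used)


if __name__ == "__main__":
    # Made-up sample: 96 fast transactions and 4 slow ones.
    budget = latency_error_budget([1.0] * 96 + [3.0] * 4)
    print(budget.allowed, budget.used, budget.remaining, budget.exhausted)
```

A common convention in SRE practice, assumed here rather than taken from the talk, is to slow or pause feature releases while `exhausted` is true and resume once the budget recovers.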
The final one is to measure business outcomes. Monitor not just system uptime and performance, but also the impact on financial operations, such as the timeliness and accuracy of financial reporting. This comprehensive measurement approach ensures that technical improvements are translated into business benefits.

The next topic is expanding SRE across business systems. The first is procurement systems. Implement reliability targets and monitoring for procurement processes to ensure timely and accurate order processing. This expansion would improve the efficiency and reliability of supply chain operations. Boeing applies SRE to SAP Ariba, slashing PO processing latency by 50%.

The next is manufacturing operations. Apply SRE practices to manufacturing systems to minimize downtime and ensure consistent production output. This approach would enhance production reliability and reduce operational disruptions.

The third is HR platforms. Ensure high availability and performance of HR systems such as payroll and employee self-service platforms. This would improve employee satisfaction and operational efficiency.

The final one is customer service. Improve the reliability of customer service platforms and e-commerce systems to enhance customer satisfaction and reduce service disruptions. This focus would support better customer experiences and business growth.

This concludes our presentation. Thank you so much for joining today. By addressing system latency, global access disparities, and reliability concerns, we have significantly improved our financial operations. Our SRE approach has enhanced system availability, reduced incident resolution times, and fostered a culture of continuous improvement.
We look forward to expanding these principles across all other business systems for greater efficiency and reliability. I appreciate your attention and engagement. If you have any questions, please reach out to me on LinkedIn. Thank you, everyone.

Pavan Kumar Bollineni

Manager @ Deloitte


