Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hey.
Hi everyone.
Welcome.
Thank you for joining me today.
In today's session we'll discuss how an SRE approach has transformed financial
operations by addressing system latency, global access disparities,
and reliability concerns.
Let's explore the key strategies and results achieved
through this transformation.
Moving on to the next one, what we are looking at is the challenge.
What are the challenges we are currently facing
in the financial systems that needed SRE?
The first one is unpredictable performance
during the financial close.
The system response time increased drastically, from an average of two
seconds to 30 seconds per transaction.
This latency caused delays in the consolidation of financial statements,
leading to missed deadlines and an increase in manual intervention.
For instance, a finance team had to wait for extended periods to
complete each transaction, which not only slowed down the entire
process but also led to overtime work and increased operational costs.
The second challenge is global availability issues.
Teams located in different regions, such as Asia and Europe, faced
inconsistent access to financial systems.
For example, during peak hours, users in Asia frequently experienced timeouts
and slow responses, while users in Europe did not face any such issues.
This disparity in system availability limited the ability of the global
teams to collaborate effectively and maintain continuous operations.
As a result, critical financial processes were delayed and the overall efficiency
of the organization was compromised.
The next issue is reliability concerns.
The financial system experienced unexpected outages, particularly during
critical periods such as month-end close.
These outages led to incomplete data entries and disrupted
financial workflows, requiring extensive manual reconciliation
to correct errors.
For instance, a system crash during the final stages of
financial reporting caused data loss and inconsistencies,
which took several days to resolve.
This resulted in increased operational risk, reduced
confidence among stakeholders, and potential financial losses due to
delayed reporting and decision making.
As an example, at British Airways in 2020, an SAP outage disrupted
payroll and invoicing, cascading into 80 million in losses.
Okay.
Moving on.
The next one is the SRE approach to finance transformation.
Here we are looking at three different approaches to how
finance can be transformed with SRE.
The first one is to reframe financial systems as services.
Financial transactions and reporting processes were treated as
services with defined reliability and
performance targets, similar to how customer-facing
applications are managed.
This shift in perspective ensures that the financial systems receive the
same level of attention and resources as other critical business
services, leading to improved reliability and performance.
The second approach is to define clear reliability targets.
As an example, set a service level objective (SLO)
of 99.99% uptime for the financial system, meaning it could only be down
for approximately four minutes per month.
This high standard ensures that the system is almost always reliable,
reducing the risk of downtime
during critical periods and improving overall user satisfaction.
Visa sets theirs at 99.999% to align with business-critical operations,
which reduces overall system downtime
and latency.
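To make the "four minutes per month" figure concrete, here is a minimal sketch of the arithmetic behind an availability target. The function name and the 30-day month are assumptions for illustration, not something from the talk.

```python
def allowed_downtime_minutes(slo_percent: float, days: int = 30) -> float:
    """Return the downtime budget, in minutes, for an availability target.

    Assumes a window of `days` full days (30 by default, an illustrative choice).
    """
    total_minutes = days * 24 * 60
    return total_minutes * (1 - slo_percent / 100)


# Compare the two targets mentioned in the talk.
for target in (99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% -> {minutes:.2f} minutes of downtime per 30 days")
```

For 99.99% this works out to roughly 4.3 minutes over 30 days, which matches the approximate figure quoted above.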
The next approach is to implement
continuous measurement.
Integrate real-time monitoring tools to track system performance
and user experience, and
enable proactive issue detection through predictive
analysis and anomaly detection.
For example, continuous monitoring flags an increase in API response
time, prompting the system to automatically scale up resources
before users experience any delay.
This proactive scaling maintains performance standards and
prevents potential slowdowns.
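The scale-up trigger described here could be sketched as a rolling average compared against a baseline. This is only an illustration: the class name, the 2-second baseline, the 1.5x warning ratio, and the 5-sample window are all assumed values, and a real setup would call the platform's own autoscaling mechanism instead of returning a flag.

```python
from collections import deque
from statistics import mean


class LatencyWatcher:
    """Flags sustained latency growth using a rolling average (hypothetical helper)."""

    def __init__(self, baseline_s: float = 2.0, warn_ratio: float = 1.5, window: int = 5):
        self.baseline_s = baseline_s      # assumed normal response time
        self.warn_ratio = warn_ratio      # assumed tolerance above baseline
        self.samples = deque(maxlen=window)

    def observe(self, response_time_s: float) -> bool:
        """Record one sample; return True when a scale-up should be requested."""
        self.samples.append(response_time_s)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and mean(self.samples) > self.baseline_s * self.warn_ratio


watcher = LatencyWatcher()
readings = [1.8, 2.1, 2.9, 3.6, 4.2, 5.0]  # made-up API response times in seconds
decisions = [watcher.observe(r) for r in readings]
print(decisions)
```

Waiting for a full window before deciding is a deliberate choice in this sketch: it avoids scaling on a single slow request.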
The next one is establishing financial SLOs.
Again, we have four different approaches here.
The first one is availability metrics.
Ensure that critical functions, such as
general ledger updates or financial reporting, are almost always
available, with downtime of less than four minutes per month.
This high availability target minimizes disruptions and ensures
that finance teams can rely on the system to complete their tasks on time.
The next one is performance SLOs.
Set a performance SLO where 95% of financial transactions
should complete within two seconds,
ensuring quick and efficient processing. This target helps
maintain a smooth workflow and reduces the time finance teams
spend waiting for transactions to complete.
Coca-Cola mandated sub-two-second processing times for any SAP FI
transaction during quarter closes.
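A performance SLO of this shape can be checked directly from recorded transaction durations. The sketch below is an illustration under assumed names and made-up sample data; the 95% and two-second values are the ones from the talk.

```python
def fraction_within(durations_s: list[float], threshold_s: float) -> float:
    """Share of transactions that completed within the threshold."""
    if not durations_s:
        raise ValueError("no transactions recorded")
    return sum(d <= threshold_s for d in durations_s) / len(durations_s)


def meets_performance_slo(durations_s: list[float],
                          threshold_s: float = 2.0,
                          target_fraction: float = 0.95) -> bool:
    """True when at least `target_fraction` of transactions met the threshold."""
    return fraction_within(durations_s, threshold_s) >= target_fraction


# Made-up samples: 95 fast transactions and 5 slow ones.
healthy = [0.5] * 95 + [10.0] * 5
# One more slow transaction pushes the system below the target.
degraded = [0.5] * 94 + [10.0] * 6

print(meets_performance_slo(healthy), meets_performance_slo(degraded))
```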
The third one is continuous refinement.
Every quarter, the team should review system performance data and
adjust the SLOs based on observed trends and business needs,
ensuring
continuous alignment with operational goals.
This regular review process allows the teams to adapt to changing
requirements and continuously improve system reliability.
The fourth one is business-aligned SLO goals.
During month-end close, the SLO for availability
was increased to 99.995%,
allowing for only about two minutes of downtime, to ensure smooth and
uninterrupted financial operations.
This higher target ensures that the system remains highly reliable during
the most critical periods, reducing the risk of delays and errors.
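One way to picture a business-aligned target is a rule that picks the stricter availability SLO when the calendar is inside the close window. This is a sketch only: the two percentages come from the talk, while the function name and the choice of "last five days of the month" as the close window are assumptions.

```python
import calendar
from datetime import date

NORMAL_SLO = 99.99       # percent, the everyday target from the talk
MONTH_END_SLO = 99.995   # percent, the stricter month-end target from the talk
CLOSE_WINDOW_DAYS = 5    # hypothetical: treat the last 5 days as the close period


def availability_target(day: date) -> float:
    """Return the availability SLO (percent) that applies on a given day."""
    last_day = calendar.monthrange(day.year, day.month)[1]
    in_close_window = day.day > last_day - CLOSE_WINDOW_DAYS
    return MONTH_END_SLO if in_close_window else NORMAL_SLO


print(availability_target(date(2024, 1, 10)))  # mid-month
print(availability_target(date(2024, 1, 29)))  # inside the assumed close window
```

Over a 30-day month, 99.995% leaves a little over two minutes of downtime (43,200 minutes x 0.00005 = 2.16 minutes), consistent with the figure quoted above.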
Okay.
The next one is automated monitoring and alerting.
We have three different approaches to how the system
sends alerts for the financial processes.
The first one is intelligent alerting.
Alerts for issues affecting month-end close processes are prioritized over
less critical functions, ensuring rapid response to high-impact problems.
This prioritization helps the teams focus on the most critical issues first,
which minimizes business disruption.
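The prioritization described here can be sketched as a sort on business process first and severity second. The process names, priority values, and alert messages below are hypothetical; a real system would take them from its own alerting configuration.

```python
from dataclasses import dataclass

# Hypothetical ranking: lower number means handled first.
PROCESS_PRIORITY = {
    "month_end_close": 0,
    "reconciliation": 1,
    "ad_hoc_reporting": 2,
}


@dataclass
class Alert:
    process: str
    message: str
    severity: int  # 1 is the most severe


def triage(alerts: list[Alert]) -> list[Alert]:
    """Order alerts by business process priority, then by severity."""
    return sorted(
        alerts,
        key=lambda a: (PROCESS_PRIORITY.get(a.process, 99), a.severity),
    )


incoming = [
    Alert("ad_hoc_reporting", "Dashboard slow", 2),
    Alert("month_end_close", "Ledger posting failing", 2),
    Alert("reconciliation", "Batch delayed", 1),
    Alert("month_end_close", "Close job timeout", 1),
]
for alert in triage(incoming):
    print(alert.process, "-", alert.message)
```

Unknown processes fall to the back of the queue (priority 99) in this sketch, so a missing mapping never outranks month-end close.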
The second one is
proactive detection.
Monitoring tools detect increases in latency
in transaction processing and raise alerts before users are affected,
allowing for preemptive action.
This proactive approach helps prevent issues from cascading and
ensures a smooth user experience.
The next one is comprehensive coverage. Monitor
not just system uptime and performance, but also completion
rates of financial processes like reconciliations and report generation.
This comprehensive monitoring approach provides a holistic view
of system health and performance.
Okay, moving on.
The next one is incident response.
To respond to incidents effectively,
we defined three different
approaches.
The first one is finance-specific playbooks.
Develop playbooks for common issues like transaction failures
and data inconsistencies, detailing step-by-step resolution procedures
and escalation paths.
These playbooks provide clear guidance for teams and ensure quick
and effective incident resolution.
The second one is cross-functional response teams.
Form teams with both IT and finance experts who can quickly
diagnose and resolve issues, such as finance experts identifying the impact
of a system error on financial reporting.
This cross-functional approach ensures that incidents
are addressed from both the technical and business perspectives.
The third important one is blameless postmortems.
After a system outage, conduct a postmortem to analyze root
causes and identify improvements, focusing on process and system factors
rather than blaming individuals.
This approach fosters a culture of continuous improvement and
helps prevent similar issues in the future.
The next one is technical results.
Again, the numbers speak.
The first one is system availability.
We achieved near-perfect uptime, significantly reducing downtime
and ensuring continuous availability of the financial system.
This improvement enhanced user confidence and reduced
the risk of business disruption.
As an example, SAP S/4HANA Cloud achieved 99.99% uptime for
Luhan after migrating from ECC 6.0.
Previously it was at 97%.
The second technical result is incident resolution.
We reduced the average time to resolve incidents from four hours to one hour,
minimizing business disruption.
This faster resolution time improved overall system reliability
and user satisfaction.
The third point is automation.
With 90% of recovery procedures automated, human
intervention is minimized, reducing errors and speeding up recovery.
Automation ensures consistency and reliability, which is essential
for maintaining critical business operations without any interruption.
The next one is business impact.
Again, how is it impacting the business, and how are businesses using SRE?
The first one is a predictable month-end close.
We achieved a predictable month-end close process, with financial statements
consistently completed on time, improving stakeholder confidence.
This predictability reduced stress on finance teams and
ensured timely reporting.
Some companies, after adopting this SRE approach, reduced their financial
close from 10 days to five days.
The second one is reliable reporting.
We eliminated system errors in financial reports, ensuring
accurate and reliable financial data for decision making.
This improvement enhanced the credibility of the financial reports
and supported better business decisions.
The third business impact is real-time analytics.
The system provided real-time access to financial data, enabling timely
and informed business decisions.
This real-time access supported agile decision making and improved
overall business performance.
As an example, Maersk uses SAP S/4HANA plus Power BI for real-time container cost
tracking, cutting reporting errors by 90%.
That is a big achievement for Maersk. The next one
is cultural transformation.
Again, we have three different approaches to how the culture transformed.
The first one is shared ownership.
Both IT and finance teams participated in regular
reliability reviews and jointly prioritized system improvements, fostering
a culture of shared responsibility.
This collaboration ensures that both technical and business perspectives
were considered in decision making.
The next one is data-driven decisions.
Use the data from monitoring tools to identify areas needing investment,
such as upgrading infrastructure or optimizing processes,
ensuring that resources are allocated effectively.
This data-driven approach supported informed decision making and
optimized resource use.
The third transformation is continuous improvement.
Conduct monthly reviews of incidents and near misses to identify trends,
implement improvements, and continuously enhance system reliability.
This regular review process ensures that lessons learned are applied and
that the system is continuously optimized.
That is one of the major
transformations we need to consider.
We need to keep adding continuous improvements so that
whatever issues we faced earlier do not recur in the near future.
Moving on, the next one is key implementation lessons.
We have four different lessons here.
The first is to start with business metrics.
Focus on metrics like transaction completion time and financial
close duration, which directly impact business operations.
This focus ensures that reliability improvements are
aligned with business goals.
Okay.
The next one is to balance feature velocity and reliability.
Allow for a certain number of errors or issues,
effectively balancing the need for new features with
maintaining system reliability.
This balance ensures that innovation does not compromise system stability.
We can accept a couple of issues that come up during a particular month-end.
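A common SRE way to express "allowing a certain number of errors" is an error budget. The talk does not spell out the mechanism, so the sketch below is an assumption: it reuses the 99.99% figure as a hypothetical transaction success target, and the function names and sample counts are made up.

```python
def allowed_failures(total_transactions: int, slo_percent: float) -> float:
    """Number of failed transactions the SLO tolerates over the period."""
    return total_transactions * (1 - slo_percent / 100)


def budget_remaining(total_transactions: int, failed: int, slo_percent: float) -> float:
    """Failures still available before the SLO is breached (negative if overspent)."""
    return allowed_failures(total_transactions, slo_percent) - failed


def release_decision(total_transactions: int, failed: int, slo_percent: float) -> str:
    """Hypothetical policy: ship features while budget remains, otherwise stabilize."""
    if budget_remaining(total_transactions, failed, slo_percent) > 0:
        return "ship features"
    return "focus on reliability"


# Made-up month: one million transactions against a 99.99% success target.
print(release_decision(1_000_000, 40, 99.99))
print(release_decision(1_000_000, 150, 99.99))
```

With one million transactions the budget is about 100 failures, so 40 failures leaves room for new features while 150 signals that stability work should come first.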
Okay.
The next one is to invest in cross-training.
Conduct training sessions where SRE engineers learn
about financial processes,
and similarly, the finance teams are educated on reliability principles.
This cross-training ensures that both teams understand each
other's priorities and can work together effectively.
The final one is to measure business outcomes. Monitor not just
system uptime and performance, but also the impact on financial
operations, such as the timeliness and accuracy of financial reporting.
This comprehensive measurement approach ensures that technical improvements
are translated into business benefits.
The next one is expanding SRE across business systems.
The first one is procurement systems.
Implement reliability targets and monitoring for procurement
processes to ensure timely and accurate order processing.
This expansion would improve the efficiency and reliability
of supply chain operations.
Boeing applies SRE to SAP Ariba, slashing PO processing latency by 50%.
The next one is manufacturing operations.
Apply SRE practices to manufacturing systems to minimize downtime and
ensure consistent production output.
This approach would enhance production reliability
and reduce operational disruptions.
The third one is implementing SRE on HR platforms.
Ensure high availability and performance of HR systems such as payroll
and employee self-service platforms.
This implementation would improve employee satisfaction
and operational efficiency.
The final area where we can apply SRE is customer service.
Improve the reliability of customer service platforms and e-commerce
systems to enhance customer satisfaction and reduce service disruptions.
This focus would support better customer experiences
and business growth.
This concludes our presentation.
Thank you so much for joining today.
By addressing system latency, global access disparities, and
reliability concerns, we have significantly improved our financial operations.
Our SRE approach has enhanced system availability, reduced incident
resolution times, and fostered a culture of continuous improvement.
We look forward to expanding these principles across all
other business systems for greater efficiency and reliability.
I appreciate your attention and engagement.
If you have any questions please reach out to me on my LinkedIn.
Thank you.
Thank you everyone.