Conf42 Site Reliability Engineering (SRE) 2025 - Online

- premiere 5PM GMT

AI-Powered Mainframe Modernization: Enhancing Reliability in Legacy System Transformation


Abstract

Discover how AI revolutionizes mainframe modernization while ensuring reliability. Learn practical strategies for using machine learning to transform legacy systems, reduce incidents, and overcome technical challenges—all while maintaining operational stability throughout your modernization journey.


Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. This is Sanath Chilakala. As you know, mainframes have been around in multiple industries over the last several decades. They have been very robust and scalable, supporting operations like financial transactions, insurance processing, and healthcare transactions. At some point, however, we have to modernize these mainframe applications, and the best approach available right now is an AI-powered one. So let's see how AI enhances reliability in this legacy system transformation.

As enterprises continue to rely on mainframes for mission-critical applications, AI offers groundbreaking approaches to modernize these systems while maintaining operational stability. Today we are going to talk about the technologies and tools AI offers to enable this transformation journey, focusing primarily on site reliability engineering principles. There are some key capabilities from AI that are used as part of this journey: automated code analysis, intelligent testing, and predictive monitoring. We will look at how site reliability engineering teams harness these capabilities to ensure the transition happens seamlessly, even in the most critical environments.

Now let's talk about some of the challenges we encounter during this modernization process. One is legacy dependencies: there is a lot of logic and code built over the years, with limited documentation, that we need to identify and track down, plus very complex interdependencies between systems built over decades. Next is the knowledge gap. Most of these mainframe systems are built on COBOL, a language so dated today that we no longer have enough people with the right skills to manage and maintain these systems, or even to understand the existing code. Next is operational risk. As mentioned, these mainframes run highly transactional operations, so the risks and stakes are very high whenever the transition requires downtime. Last is performance. During modernization the expectations are usually very high: the new system should not be subpar, but should match or outperform the old legacy transaction system, and scalability factors need to be taken into consideration during the transition.

Next is how AI enables us to determine what kind of resources and scaling are required for a specific migration. On the graph you can see three different lines. One is the traditional tracking method: without AI, you come up with a prediction of how the system is supposed to be scaled as part of the modernization. The second, in green, is the AI-driven approach, where AI analyzes the existing system, identifies the dependencies, and predicts how many resources need to be allocated. When you then compare against actual usage, you see that actual usage correlates closely with what the AI predicted. By leveraging these AI techniques you can reach accuracy levels of about 95%, while even the best and most mature traditional implementations achieve only 60 to 70% accuracy.
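To make the resource-prediction idea concrete, here is a minimal sketch of this kind of AI-driven capacity forecasting. It is not the tooling behind the graph: it assumes you have historical utilization samples, stands them in with synthetic data, and trains a scikit-learn regressor; the feature names and the MIPS target are my own illustrative choices.

```python
# Minimal sketch of AI-driven capacity prediction (illustrative only).
# Assumes historical utilization samples; features and target are hypothetical.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Hypothetical features: [transactions/sec, batch jobs, DB connections]
X = rng.uniform(0, 1, size=(500, 3))
# Synthetic "actual usage" target (MIPS consumed) for demonstration
y = 120 * X[:, 0] + 40 * X[:, 1] + 25 * X[:, 2] + rng.normal(0, 5, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor().fit(X_train, y_train)

# Compare predicted vs. actual usage, mirroring the comparison in the talk
pred = model.predict(X_test)
print(f"prediction accuracy: {1 - mean_absolute_percentage_error(y_test, pred):.1%}")
```

On real telemetry, the same pattern, features from workload metrics and a target from observed consumption, is what lets the predicted curve track actual usage.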
Next, let's talk about how natural language processing plays a key role in understanding the existing documentation. What NLP brings into the picture is that it can ingest any kind of manuals and documentation available, learn from them, and provide actionable insights to determine how reliability and scalability need to come into the picture. At the same time, by reading through this documentation, it identifies the different dependencies and builds a semantic analysis and mapping around the different components of the implementation. The last piece is knowledge graphing: creating a map across the different sources of information identified in the documentation. NLP with AI gives us the capability to parse this unstructured documentation and surface very valuable insights, which are key for any kind of migration activity.
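As a rough illustration of the knowledge-graph idea, the sketch below links component mentions extracted from documentation sentences into a networkx graph. The regex heuristic and all component names are hypothetical stand-ins; a real pipeline would use a proper NLP entity-extraction model rather than an ALL-CAPS pattern.

```python
# Sketch: building a dependency knowledge graph from unstructured docs.
import re
import networkx as nx

docs = [
    "Program PAYROLL01 calls subroutine TAXCALC and reads file EMPMAST.",
    "TAXCALC depends on copybook TAXRATES.",
    "Batch job NIGHTLY01 invokes PAYROLL01 after settlement.",
]

graph = nx.DiGraph()
for sentence in docs:
    # Crude heuristic: treat ALL-CAPS tokens as component identifiers
    components = re.findall(r"\b[A-Z][A-Z0-9]{3,}\b", sentence)
    # Link each component to the next one mentioned in the same sentence
    for src, dst in zip(components, components[1:]):
        graph.add_edge(src, dst, evidence=sentence)

# Downstream impact of touching one component during migration
print(sorted(nx.descendants(graph, "NIGHTLY01")))
```

Even this toy graph already answers the question that matters for migration planning: what else is affected if we move this component.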
Next, let's take an actual case study from the financial services industry. The project scope: a 30-year-old legacy core banking system comprising around 5.2 million lines of complex COBOL code and 400 mission-critical dependent applications built up over the decades, with a strict zero-downtime requirement because this is a financial services environment.

How did AI enable this migration? First, comprehensive automated dependency mapping: as mentioned earlier, using NLP to understand the documentation and create those semantic mappings. Next, AI was used for advanced predictive incident prevention, which means leveraging AI to foresee what kinds of issues might come up during the migration, catch them ahead of time, and save the debugging effort by resolving them before they occur. Next was real-time performance monitoring with alerting: while migrating the system piece by piece, real-time monitoring let us see any dip in performance or new spikes we were not aware of, and generated alerts so users could understand where things were going wrong. The last piece, and the most critical one when moving from a legacy to a modern system, is testing. Manual testing has always skewed overall project and migration planning over the years; AI enables automated, intelligent test generation that does the testing for us while we transition the code from one system to another.

These are the outcomes that were achieved: a 72% reduction in critical production incidents, a migration timeline cut by 14 months, overall cost savings of $4.3 million, and 99.998% system uptime maintained throughout the migration.

Next, as I mentioned earlier, testing is critical during any kind of migration initiative, and AI offers four capabilities when building a testing framework. First is test generation: it produces comprehensive test scenarios based on the documentation, the migration plan, the user stories, and the product documentation. Second is automated execution: once the code is ready, AI provides the tools to pick it up, run it through the test scenarios, and tell us whether it is working, including which part of the code is causing issues. Third is pattern recognition: based on the results accumulated over a period of testing, it recognizes patterns across the testing outcomes, which lets us use that predictive insight in future code migrations to write better code and run better migration activities. The last piece is test refinement: as we train the model on what the different test scenarios should be, and as those scenarios are exercised against the code, it self-improves, producing better test scenarios and better test coverage.
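As a minimal sketch of the automated testing idea, assume you can invoke both the legacy routine (for example over a bridge API) and its modernized replacement; then a property-based tool such as Python's hypothesis library can generate inputs and check that both agree. The interest-calculation functions and their signatures here are my own stand-ins, not the case study's actual tooling.

```python
# Sketch: auto-generated equivalence tests for a migrated routine.
# legacy_interest() stands in for a call into the legacy COBOL system;
# modern_interest() is the rewritten logic under test. Both are hypothetical.
from decimal import Decimal, ROUND_HALF_UP
from hypothesis import given, strategies as st

def legacy_interest(principal_cents: int, rate_bps: int) -> int:
    # Reference behavior: half-up rounding, matching mainframe decimal math
    amount = Decimal(principal_cents) * Decimal(rate_bps) / Decimal(10_000)
    return int(amount.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

def modern_interest(principal_cents: int, rate_bps: int) -> int:
    # Migrated implementation; must reproduce the legacy result exactly
    amount = Decimal(principal_cents) * Decimal(rate_bps) / Decimal(10_000)
    return int(amount.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

@given(st.integers(min_value=0, max_value=10**9),
       st.integers(min_value=0, max_value=10_000))
def test_outputs_match(principal_cents, rate_bps):
    # Hundreds of generated cases per run, no hand-written fixtures
    assert modern_interest(principal_cents, rate_bps) == \
        legacy_interest(principal_cents, rate_bps)
```

Running this under pytest exercises the pair across generated edge cases (zero amounts, maximum values, rounding boundaries), which is exactly the coverage manual test plans tend to miss.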
Next is the implementation framework: how do we actually roll out AI? The first step is the assessment phase. In any migration activity the assessment phase is key, because that is where you catalog all existing systems, identify modernization candidates, and establish reliability baselines using automated discovery tools. Next, we start with a pilot: take a simple MVP and modernize it using these AI-driven reliability improvements, scaling and refining the approach as part of that implementation. Once the pilot is successful, the next step is integration at a larger scale: we embed AI tooling within existing SRE practices, focusing on code analysis, predictive monitoring, and automated testing capabilities. The last step is scaling the deployment. With the pilot and the integration in place, we leverage AI to determine how much scaling is required for the solution to work the way it is supposed to, or to perform even better, extending the proven approaches across the enterprise and maintaining continuous learning loops to improve modernization outcomes.

Then, what are some common challenges and mitigation strategies to consider during this migration? First, data quality issues. When we move from one system to another, we have to take into account that the old system ingests and reads data in a different way, so we need to implement robust data-cleansing pipelines and verification algorithms to ensure AI systems receive reliable inputs, and we have to establish data quality scoring mechanisms that trigger human review for edge cases, along the lines of the sketch below.
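Here is the data quality gate referenced above, as a minimal sketch; the checks, weights, and review threshold are illustrative assumptions rather than a prescribed configuration.

```python
# Sketch: data quality scoring with a human-review trigger for edge cases.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    account_id: str
    balance_cents: Optional[int]
    currency: str

# (name, check, weight) triples; weights sum to 1.0
CHECKS = [
    ("has_account_id", lambda r: bool(r.account_id), 0.4),
    ("has_balance",    lambda r: r.balance_cents is not None, 0.4),
    ("known_currency", lambda r: r.currency in {"USD", "EUR", "GBP"}, 0.2),
]
REVIEW_THRESHOLD = 0.8  # below this, route to a human reviewer

def quality_score(record: Record) -> float:
    # Weighted fraction of checks the record passes
    return sum(w for _, check, w in CHECKS if check(record))

def route(record: Record) -> str:
    return "auto-migrate" if quality_score(record) >= REVIEW_THRESHOLD else "human-review"

print(route(Record("ACC-001", 120_000, "USD")))  # auto-migrate
print(route(Record("ACC-002", None, "JPY")))     # human-review
```

The threshold is what keeps the pipeline honest: clean records flow through automatically, while ambiguous ones get a human look instead of silently feeding bad inputs to the AI.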
Next is technical debt. There is typically a lot of technical debt involved in any migration activity. We need to use AI to quantify and categorize that technical debt, creating prioritized remediation roadmaps, and then implement automated refactoring tools to systematically address the highest-impact issues first. Prioritization is key for any kind of technical debt: whatever technical backlog we have, we need to prioritize it and roll out the fixes accordingly.

Then come the skill gaps. As I mentioned earlier, the primary problem with legacy mainframe systems is finding people who still have COBOL understanding and knowledge. So we need to identify the limited set of mainframe veterans available, pair them with AI specialists to do a proper knowledge transfer, and make sure the AI is properly educated by those mainframe veterans, bridging both the product gap and the technology gap.

Now let's talk about some future directions, such as how quantum computing applications could be a game changer for this kind of migration activity. First, complex optimization problems: quantum algorithms will dramatically accelerate optimization challenges in resource allocation and performance during mainframe transitions, solving in minutes what currently takes days. Quantum computing gives you the capability to solve certain problems extremely fast. We are not there yet today, but once quantum computing arrives, it will help with many scenarios beyond modernization as well, from space travel to automated language learning. Next is enhanced security modeling. Quantum-resistant cryptographic systems will safeguard sensitive data during migration, while quantum simulation will identify potential security vulnerabilities impossible to detect with classical computing. Beyond the traditional encryption methods we follow today, this takes protection to the next level: proactively identifying system security vulnerabilities, identifying the remediations that can be put in place, and providing stronger encryption and cryptographic capabilities to safeguard data while transitioning it from one system to another. The last capability is system behavior prediction. Quantum machine learning models will achieve unprecedented accuracy in predicting system behavior under load, enabling precise capacity planning during critical migration phases. Whether it is system behavior, capacity planning, or anything else in each migration phase, it can deliver unprecedented accuracy when you need predictability in that scenario. Early research now suggests that quantum approaches could reduce complex migration timelines by 30 to 40% while improving reliability outcomes. That is a big number.

Then, practical guidance for SRE teams. Why is AI a major player in any SRE environment? First, it reduces incidents by 89%: across multiple implementations it has been validated that the average reduction in critical incidents during modernization, when using AI-driven reliability tools, is 89% compared to traditional approaches. Then there is ROI, which is a major factor in any investment, and every mainframe modernization project has to be backed by an investment. With AI as a major part of the migration activity, analysis has shown that the ROI generated is around 3.7 times what a traditional migration activity would typically provide. Finally, using AI assistance we can save the time and planning effort required for analysis: documented cases show around 65% time savings during the analysis and planning phases.

To conclude: AI plays a major role in reliability and scalability during any complex migration activity. To all of you thinking about migrating a legacy system, please make sure AI is implemented as part of that effort, and use these inputs to help your leadership understand the different benefits AI can offer during any kind of migration activity. Thank you so much for everyone's time. Have a lovely day. Thank you. Bye.
...

Sanath Chilakala

Director, Digital Solution Architecture @ NTT DATA

Sanath Chilakala's LinkedIn account


