Conf42 Site Reliability Engineering (SRE) 2025 - Online

- premiere 5PM GMT

Ransomware Readiness: Backup and Recovery in the Cloud

Abstract

Don’t let ransomware hold your data hostage! Learn battle-tested techniques to build an unbreakable defense using immutable backups, automated recovery, and zero-trust architecture. Transform your backup infrastructure from a liability into your strongest security asset.

Summary

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hey everyone, thank you for joining me today for this critical discussion on ransomware readiness, backup, and recovery in the cloud. My name is Sharri pti, and I'm a principal cloud and security architect, currently working for Trade and Track two, with over 15 years of experience across cloud, security engineering, DevSecOps, and cloud solutions architecture.

So what is ransomware? Ransomware refers to a business model, and a wide range of associated technologies, that unauthorized users employ to extort money from entities. Attackers exploit unauthorized access or system vulnerabilities to reach your data and then restrict the rightful owner from accessing it. Ransomware encrypts your data, making it inaccessible until a ransom is paid, and attackers often demand payment in cryptocurrency.

Before we dive in, let's start with some context. Ransomware attacks have exploded in recent years; it's hard to turn on the news without seeing another company getting hit. According to the Sophos 2024 report, 59% of the roughly 5,000 organizations surveyed were hit by ransomware in the past year, and those are just the attacks that were reported. Many incidents go unreported for reasons like brand reputation and stock price impact. Looking deeper into the report, 94% of victims said attackers targeted their backups. That means attackers are not just going after your primary data, but also your backup infrastructure. And 32% of victims whose data was encrypted also had data stolen: once your environment is compromised, attackers exfiltrate your data so they can hold you hostage over it. Finally, 70% of attacks result in data encryption, meaning that once attackers have exfiltrated data out of your network, they encrypt what remains.
Organizations whose attacks began with the exploitation of an unpatched vulnerability report considerably more severe outcomes than those where the attack started with compromised credentials. Based on these statistics, the average initial ransom demand is $2 million. And 98% of the organizations that were attacked recovered their encrypted data, either by paying the ransom or from their backups; lately the trend has slowly started shifting toward restoring from backups, which is a good sign. Next is recovery: the average recovery cost is $2.7 million, excluding the ransom payment, and 30% of these organizations took more than a month to recover. Once compromised, whether due to lack of backups, lack of readiness, missing playbooks, or no proper incident management, it can take more than a month to recover from the incident. That means your organization's applications could be down for 30 days or more; that's a lot, and in some scenarios it goes beyond that.

Today's objective is straightforward but crucial: to ensure you understand the critical nature of disaster recovery planning in the context of modern ransomware threats. We will explore cutting-edge backup strategies, battle-tested techniques, zero-trust architectures, and automated recovery workflows, transforming your backup infrastructure from a potential liability into your strongest security asset.

So let's take a look at the ransomware landscape. Ransomware is not a static problem; the landscape is constantly evolving. You are playing chess with thinking humans on the other side who can quickly adapt.
Just in the last year, we have seen threat actors significantly evolve their tools, techniques, and procedures, including ransomware-as-a-service and sophisticated AI-based techniques. For example, we have witnessed the rise of intermittent encryption, which speeds up the encryption process and evades detection. Ransomware operations have become almost industrialized, with different specialized groups handling different parts of the attack, from initial incursion through to negotiation specialists. Threat actor objectives are also expanding beyond money; some groups are now taking sides in geopolitical conflicts. Meanwhile, cyber insurance is falling short: major global underwriters are reducing or refusing ransomware coverage and adding sublimits to policies, and we have seen cases where payouts were denied by classifying ransomware as an act of war.

The attack pattern is well established and repeats across organizations, usually as a chain of events. Attackers gain initial access through infection vectors including phishing emails with malicious attachments, drive-by downloads, vulnerabilities in remote access protocols, exploited vulnerabilities in operating systems or language frameworks, compromised credentials, flawed public-facing applications (something like an exposed S3 bucket), or weak authentication setups. From there, they get into your cloud infrastructure and cloud storage and perform discovery actions. Then they move to the next stage in the chain: data destruction, data exfiltration, data encryption. Along the way they establish persistence, move laterally throughout the network, exfiltrate data before encrypting it, then deploy the encryption and demand a ransom. In some cases the attackers are so persistent that they stay inside the organization for months, two to three months, before acting.
They carry out these operations slowly in the background without any detection, and once they hold the entire footprint they want, that's when they start the encryption process. And here is what is particularly concerning: in most organizations, it's much easier to block inbound traffic than outbound traffic, making data exfiltration a significant risk. Before they encrypt, attackers usually try to exfiltrate your data so they can make an extortion demand, threatening to expose your sensitive data publicly. That's the chain of events that can happen.

Coming to attack scenarios, they follow the same pattern we just talked about. First, data encryption: attackers lock down critical data and files, making them inaccessible. Then system disruption: they halt your operational systems or render them unusable, disrupting critical systems and services, which means you cannot operate anything inside your organization's IT infrastructure. Then data exfiltration: sensitive data is stolen before encryption so they can demand ransom payments under the threat that it will be released to the public. And finally, extortion demands: they demand ransom payments, in cryptocurrency, to restore access, with threats of public data leaks. They'll promise to hand over the decryption key once you pay, but it never goes that way.

Now for the common pitfalls in ransomware response. When an attack happens, organizations often make critical mistakes that compound the damage. The first is inadequate crisis communications.
Too often, organizations lack a clear crisis communication plan. In the midst of an attack, you need to know exactly who is involved and who takes which specific actions. According to a PwC (PricewaterhouseCoopers) study, 69% of organizations that successfully navigated a ransomware attack had documented and practiced communication plans. Having that practice and documentation will really help you when you're in need.

The next pitfall is insufficient logging and monitoring. Security teams depend on access to logs to build incident response timelines; that's how they know when the attacker entered the network or when the data was compromised. But many organizations discover too late that their logging was inadequate. Research found that organizations with proper security monitoring detected breaches 74 days faster on average than those without.

The next is being unprepared for ransom decisions. Many crucial questions arise when an incident happens: Who can authorize a ransom payment? Has your board approved such payments? Do you have cryptocurrency available? Are there legal implications? The US Treasury Department has sanctioned several ransomware groups, making payments to them potentially illegal and taking that option completely off the table.

Then there is underestimating recovery times. Organizations often assume recovery will be quick, but comprehensive incident response takes a lot of time, especially without preparation. IBM's Cost of a Data Breach report indicates that the average time to identify and contain a breach is 277 days; that's over nine months. Those are the common pitfalls in ransomware response, and without a proper response plan you will be in serious trouble when a real incident happens.

And then there are backup failures, because this is perhaps the most devastating pitfall.
Many organizations discover too late that their backups are compromised or encrypted, incomplete or too outdated to be useful, inaccessible when needed most, or unable to be restored in a timely manner. The key takeaway: it's not if, but when; organizations should be ready for ransomware situations.

So how do you do recovery planning, and what are the considerations for a recovery strategy? When planning your recovery strategy, four key factors will impact both effectiveness and cost. One is scope: how much data needs protection. The second is RPO, the recovery point objective: how much data can you afford to recreate or lose? The third is RTO, the recovery time objective: how quickly must you recover, and what is the cost of downtime? The last one is copies: how many backup copies do you need to maintain? Different workloads will have different requirements, and these are the factors you derive in order to design, or optimize, your recovery strategy.

The critical thing you need first is data classification: make sure your organization already has a data classification defined. Taking one step back, I mentioned RPO and RTO as part of the strategy. These are critical in defining the kind of strategy you want to design, and you define them based on the data classification. For RPO, the question is how much data you can afford to recreate or lose when a disaster happens; if the disaster happened a couple of days back, can you withstand two days of data loss? That's your RPO. For RTO, the question is how quickly you must recover and what the cost of downtime is; that means, if your application is down, how long you can afford that downtime.
Can you afford one hour? Eight hours? Ten hours? That's your RTO.

Coming back to data classification: it is the critical input for deriving your RPO, RTO, scope, and number of copies. So how do you decide criticality? You need to understand what type of data you have, who owns it, and whether it is critical for your business. What does critical actually mean? Does its loss impact your customers? Does it impact your business revenue? You can determine data criticality with a simple flow chart. Assume your data is completely lost: could the business move forward without it, or would there be an impact? If there would be an impact, ask whether the data can be recreated. If it cannot be recreated, that is your truly critical data, and you should have a backup plan for it. If the data can be recreated, ask how long recreation would take, minutes or hours; based on that, further guidance is required. And if there would be no impact from losing the data, then it is not critical data. That's how you simply follow the chart and determine what critical data you have; first, you need to know your data.

Then there are different tiers, with different RPOs and RTOs defined for each, depending on the data classification. Generally, mission-critical systems require true disaster recovery solutions, typically at the highest cost; examples are payment processing and critical infrastructure, which typically need an RPO and RTO of less than 10 minutes.
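The classification flow chart just described can be sketched as a tiny function. This is only a sketch; the function and branch labels are my own, but the decision order follows the talk: impact, then recreatability, then recreation time.

```python
# Decision flow: impact? -> recreatable? -> how long to recreate?
def classify_data(impact_if_lost: bool, can_recreate: bool,
                  recreate_hours: float = 0.0) -> str:
    if not impact_if_lost:
        return "not critical"                    # business moves on without it
    if not can_recreate:
        return "critical - needs a backup plan"  # truly critical data
    # Recreatable: how long it takes (minutes vs. hours) drives further guidance
    return f"further guidance required (recreate in ~{recreate_hours}h)"

print(classify_data(True, False))      # critical - needs a backup plan
print(classify_data(False, True))      # not critical
print(classify_data(True, True, 8.0))  # further guidance required (recreate in ~8.0h)
```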
That means without these systems you'll be losing a lot of money. Next are business-critical systems; these require robust disaster recovery solutions. Examples are ERP systems or customer-facing applications, and generally the RPO and RTO for this type of system is less than one hour. The next tier is business-operational systems, which can often use standard backup solutions with an RPO and RTO of less than eight hours; examples are internal collaboration tools and reporting systems. Even if they are down, the impact on customers or revenue generation is limited. Last but not least are administrative systems, where standard backup solutions are sufficient; examples are HR systems or documentation repositories. These typically have an RPO and RTO of 24 to 48 hours.

Based on the RPO, RTO, data classification, and system criticality you define, you can design your resilience strategy: how resilient you want your setup to be. This slide gives you three scenarios: a backup scenario, with its cost and typical RPO and RTO; disaster recovery, with its typical RPO and RTO; and high availability. Remember that every minute of downtime costs money. According to Gartner, the average cost of IT downtime is $5,600 per minute, which extrapolates to over $300,000 per hour. This will help you determine, for the kind of application and data you have, which resilience strategy to build: backups, with RPO and RTO within hours; disaster recovery, with RPO and RTO within minutes; or near real time, within seconds.
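To put numbers on the tiers and the Gartner figure above, here is a small sketch. The tier targets follow the talk; the dictionary and helper names are my own.

```python
# RPO/RTO targets per classification tier (minutes), as described in the talk.
TIER_TARGETS = {
    "mission-critical":     {"rpo_min": 10,      "rto_min": 10},       # payment processing
    "business-critical":    {"rpo_min": 60,      "rto_min": 60},       # ERP, customer-facing
    "business-operational": {"rpo_min": 8 * 60,  "rto_min": 8 * 60},   # collaboration, reporting
    "administrative":       {"rpo_min": 48 * 60, "rto_min": 48 * 60},  # HR, documentation
}

COST_PER_MINUTE = 5_600  # USD, Gartner average cost of IT downtime

def worst_case_downtime_cost(tier: str) -> int:
    """Cost if an outage lasts the full RTO budget for that tier."""
    return TIER_TARGETS[tier]["rto_min"] * COST_PER_MINUTE

print(worst_case_downtime_cost("mission-critical"))   # 56000  -> 10 minutes down
print(worst_case_downtime_cost("business-critical"))  # 336000 -> the "over $300K" per hour
```

Note how one hour at the Gartner rate is $336,000, which is where the "over $300,000 per hour" figure comes from.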
That last one is basically high availability, which is designed into the architecture and operational mechanisms from the start. Obviously, there is a cost associated with each of these strategies.

Now to the next part, which is my favorite: a foundational element of ransomware readiness, the 3-2-1-1-0 backup rule. This is an evolution of the traditional 3-2-1 backup strategy, designed specifically to enhance data protection against modern threats like ransomware while ensuring reliable recoverability. Back in the days when I worked on infrastructure hosted on-prem, 3-2-1 was the standard. Now everything is cloud; most organizations are moving toward cloud, and a lot of attacks are happening in that same space, so there's a need to evolve the traditional rule to adapt to modern threats like ransomware.

Let's go into a little more detail on the 3-2-1-1-0 backup rule. The starting number is three: maintain three copies of your data, including production, that is, one primary copy (production) and two backup copies. This redundancy ensures that even if one or two copies are compromised, you still have a reliable backup. The next number is two: use two different types of storage media; store backups on at least two different storage types. Traditionally that meant hard drives, SSDs, or tapes; now, with the cloud, you can use cloud storage like block storage or object storage, based on your data classification and data tier. This diversification minimizes the risk of simultaneous failures due to hardware or software issues.
Next is one: keep one copy offsite. At least one backup should be stored in a separate location, such as a remote facility or the cloud. This protects against local disasters like fires or floods that could destroy all on-premises data. Even in the cloud, regions and availability zones can be hit by natural disasters, fires, floods, earthquakes, so you need to be prepared for that. The next one is: store one copy offline or make it immutable. This part is the newer addition for ransomware readiness, and it is crucial for ransomware protection. One backup must be either offline, air-gapped, completely disconnected from networks, making it inaccessible to ransomware, or immutable, meaning it cannot be altered or deleted even by administrators, adding an additional layer of security. The last part is zero: ensure zero backup errors through regular verification; regularly test that backups are error-free and can be successfully restored. According to a Veeam survey, 40% of recovery attempts fail, and 77% of organizations don't test their backups frequently enough.

I mentioned a couple of terms there: one is immutable, the other is air gap. Let's see what those terms mean. Immutable backups apply a write once, read many (WORM) model to backup data, providing immutability to recover from accidental or malicious deletions. Once data is written to a WORM-compliant backup system, it cannot be altered or deleted. This immutability is your first line of defense against both accidental deletions and malicious tampering. The next concept is the air-gap vault, which is the key to secure backups, because immutability alone isn't always enough; as we just said, immutability is your first line of defense, but on its own it's not always enough.
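Pulling the whole 3-2-1-1-0 rule together, here is a small self-check sketch over a set of backup copies. The copy descriptions and field names below are hypothetical examples, not a real tool.

```python
# Check a set of backup copies against the 3-2-1-1-0 rule described above.
def satisfies_3_2_1_1_0(copies, zero_verification_errors):
    three = len(copies) >= 3                          # 3: three copies of the data
    two = len({c["media"] for c in copies}) >= 2      # 2: two different media types
    one_offsite = any(c["offsite"] for c in copies)   # 1: one copy offsite
    one_locked = any(c["offline"] or c["immutable"]   # 1: one copy offline or immutable
                     for c in copies)
    return (three and two and one_offsite and one_locked
            and zero_verification_errors)             # 0: zero verification errors

copies = [
    {"media": "block",  "offsite": False, "offline": False, "immutable": False},  # production
    {"media": "object", "offsite": False, "offline": False, "immutable": False},  # local vault
    {"media": "object", "offsite": True,  "offline": False, "immutable": True},   # air-gap vault
]
print(satisfies_3_2_1_1_0(copies, zero_verification_errors=True))      # True
print(satisfies_3_2_1_1_0(copies[:2], zero_verification_errors=True))  # False: only 2 copies
```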
With the sophistication of recent ransomware threats, we need to take it a step further, and that's where the concept of the air-gap vault comes in. Imagine a secure, isolated location for your immutable backups. This vault is logically separated from your primary environment, making it significantly harder for threats to reach and compromise your recovery data. With an air-gap vault, the immutable backup copies are locked by default and further protected through encryption using service-provider-owned keys, not customer-managed or customer-created keys, with the highest level of encryption, like AES-256. Encrypting recovery points with a service-provider-owned key not only safeguards against accidental or unwanted deletion of user-managed keys, but also reduces operational overhead and key-management costs for users.

The logically air-gapped vault also simplifies sharing backups for restore purposes across accounts using Resource Access Manager (RAM). Customers can share the vault data with specific accounts, including cross-organization and cross-region, for faster direct restores. Once the vault is shared, the backups can be restored directly, removing the step where backups are first copied into the destination account; this reduces operational overhead, time to recover from a data loss event, and the cost of extra copies.

These vaults can be configured in a couple of modes: governance mode and compliance mode. What's the difference? Governance mode is intended to allow a vault to be managed only by users with sufficient IAM privileges; that means only specific administrators can administer it. The vault lock is deletable if needed, but while it's in place, all recovery points in the vault are locked and cannot be deleted until the lock is lifted.
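As a sketch of how such a vault lock is configured: the field names below are the real parameters of AWS Backup's `put_backup_vault_lock_configuration` API, but the vault name and day values are hypothetical, and the block only builds the request rather than calling AWS.

```python
# Build a vault-lock configuration request for AWS Backup.
def vault_lock_request(vault_name, min_days, max_days, cooling_off_days=None):
    req = {
        "BackupVaultName": vault_name,
        "MinRetentionDays": min_days,  # recovery points cannot be deleted earlier
        "MaxRetentionDays": max_days,  # nor retained longer
    }
    if cooling_off_days is not None:
        # ChangeableForDays is the grace (cooling-off) period: once it elapses,
        # the lock becomes a compliance-mode lock that nobody, including the
        # root user, can remove. Leaving it unset keeps a governance-mode lock
        # that sufficiently privileged IAM users can still delete.
        req["ChangeableForDays"] = cooling_off_days
    return req

req = vault_lock_request("data-bunker-vault", min_days=30, max_days=365,
                         cooling_off_days=3)
# A privileged caller would then apply it with:
#   boto3.client("backup").put_backup_vault_lock_configuration(**req)
print(req["ChangeableForDays"])  # 3
```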
So that's governance mode. Compliance mode is a little different: it is designed for backup vaults that are expected to never be deleted or altered until the data retention period is complete. When a vault in compliance mode is locked, the lock can't be changed because it's immutable; no one, including the root user or your service provider, can modify or remove it. However, you can define a grace period, also known as a cooling-off period, before the vault locks and becomes immutable. The only way to remove the lock is to terminate the account, and doing so also deletes all your previous backups. If needed, accounts can be restored by cloud service providers within certain time limits; for example, in AWS, if your account is deleted, you can restore it within 90 days, which is an ample amount of time.

An implementation tip: AWS Backup, and Google Cloud, now offer logical vaults that are immutable by default with a compliance-mode vault lock. These vaults support cross-account sharing with Resource Access Manager and use service-owned KMS keys to protect against key compromise. Along with these two providers, there are third-party solution providers, vendors like Veeam and others, that support this kind of air-gap vault solution.

Moving on to advanced backup architecture: let's discuss how to implement a ransomware-resilient backup architecture. Before I go into the architecture diagram, a word on the overall approach: it is multi-layered. In the security world there is a concept known as defense in depth, and we take a very similar approach here in terms of making it harder for bad actors to compromise backups.
Defense in depth is like the analogy of a medieval castle, where you have multiple layers of defense: a moat filled with water, big gates, thick walls. The backup approach is no different; you want to make it as hard as possible for bad actors to achieve their objectives. So: implement local backup vaults for operational backup needs. Create logically air-gapped vaults in dedicated accounts to store immutable backups; you can implement these logically air-gapped vaults for multi-cloud environments as well as for a standalone cloud provider design. Design a cross-region vault architecture to provide protection against regional disruption. We call this out specifically because if you keep your backups in the same region, a disaster in that region, a natural disaster like a flood, fire, or earthquake, takes your backups with it; a cross-region design serves both purposes, disaster recovery as well as ransomware backup and recovery. Then establish a clean room: a new account for validating data restored from the air-gap vault.

Looking at the diagram: I have the main production workload accounts in region one, with different types of workloads, object stores, block storage, compute resources, relational databases, NoSQL databases, graph DBs, document DBs. All of these are backed up to a local backup vault, encrypted with either a customer-managed key or a service-provider key. This local vault helps for operational purposes: say an issue happens with your database and you want to restore yesterday's backup, or the snapshot taken right before the incident.
So not a security incident, but any operational incident that happened internally, say a deployment that corrupted the data and you want to roll it back; in those scenarios, the local backup vault will help you.

Next, we design the air-gap vaults to live in a dedicated account, so that even if your primary account gets compromised, the attacker cannot reach your air-gap vault account. The air-gap vault account is completely isolated and unreachable, with no network path from your organizational or production network into the dedicated account, because no network access is needed here. It can be in the same region or cross-region; for demonstration purposes, I show it here in region two. We call this the data bunker account: a centralized account to manage your air-gap vault data. It is protected with the service provider's own keys rather than custom keys, so even if an attacker gains access to the account, they can do nothing without the keys; since the keys are not customer-created, it is practically impossible for the attacker to get access to them. That makes it much harder for the attacker to even touch the air-gap vaults. And even if the account itself is compromised, the backups are immutable; they cannot be touched, altered, or tampered with. All an attacker can do is delete the account, and even then, as we discussed a little earlier, accounts can be restored within a certain number of days or months, so you still have a chance of recovering your data. And then the next crucial point:
You have the data bunker account where you keep your backups. If an incident happens, say a ransomware attack, and you need to restore your air-gap vault backups, do not restore back into your primary production account, because you never know; the attackers may still have access to your network somewhere. Assume it has been compromised, and never put your restored backups back into the same account, or it will be compromised again and the whole cycle repeats. So that's the first rule: assume the primary account is compromised and don't go back to it. Instead, either recover within the backup account itself, or create a fresh account and deploy into it clean, so you are sure the attacker has no access and only you, or your automation, does.

Assuming the backups themselves are good is also a big mistake, so you need security controls for your backup systems. Implement multi-factor authentication for backup system access. Use role-based access control with the principle of least privilege. Enable geofencing to restrict access to specific IP ranges, because not everybody needs access to your backups; only the administrators or the corresponding teams doing that specific job, and you can enforce that through geofencing. Set up detailed audit logging for all backup system activities, so that when backups are initiated or completed, those actions are logged and you have an audit trail. And deploy anomaly detection to identify suspicious backup behavior.
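The geofencing control mentioned above can be sketched as an IAM deny policy on backup actions from outside an approved IP range. This is only a sketch: the CIDR is a documentation placeholder, and in practice you would scope `Resource` down to your vault ARNs.

```python
import json

# Deny AWS Backup actions from any source IP outside the approved admin range.
geofence_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyBackupOpsOutsideApprovedRange",
        "Effect": "Deny",
        "Action": "backup:*",
        "Resource": "*",
        "Condition": {
            # NotIpAddress + aws:SourceIp: matches requests from OUTSIDE the range
            "NotIpAddress": {"aws:SourceIp": ["203.0.113.0/24"]}
        }
    }]
}

print(json.dumps(geofence_policy, indent=2))
```

Because it is an explicit `Deny`, this statement overrides any `Allow` a stolen credential might otherwise carry.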
You want to make sure that if somebody tries to tamper with your backup systems or vaults, anomaly detection is in place to identify that kind of behavior.

The next thing is automated backup testing. A backup isn't a backup until you have successfully restored it, which means a crucial but often overlooked component is regular, automated testing of your backups. According to Gartner, 30% of organizations that test their disaster recovery plans find critical gaps that would have prevented successful recovery, so this kind of automated backup testing is pretty important. In this design, you can do the automated testing in a separate account, call it a forensic account, where you restore the backup onto the actual resource type you want the data restored to, using Resource Access Manager (RAM). You don't need to copy anything manually; with the air-gap vault and RAM, you simply grant access to a cross-account or external account you trust, configuring the policy so that your vault's backups can be accessed. Once you restore the backups in that account, test the restored data: validate data integrity, verify recoverability metrics, and schedule regular test runs. You can also add an additional step on top: scan the restored data so you know it is clean and free of malware or ransomware. Then set up alerts on whether the automated backup test completed successfully.
That way you are informed, and if something goes wrong, you have a way to track it and pay immediate attention to that step or alert. Then automatically clean up the test resources: once the automated backup testing is done, make sure you clean up, because leaving resources running keeps incurring cost. This isn't a one-time iteration; it keeps happening on whatever schedule you set for backup validation, daily, weekly, or monthly. And make sure you document the results for compliance purposes.

The last piece is how you actually recover. When a real incident happens, you now have all the precautionary steps in place: the air-gap vaults, the vault locks, the service-provider-owned KMS keys, and the automated restore testing capabilities, including scanning of the restored resources and data integrity validation. From there, recovery is a one-step operation. If an attack happens and you cannot use your primary production environment, you simply create a new recovery account, a clean room, grant it access through RAM from your air-gap vault, and restore the backups directly. You don't even need to copy the backups from the air-gap vault account into the new clean room; that's the beauty of the air-gap vault lock. You just assume the role, restore into the new clean room, and your applications and all your data are restored and ready to go. That's the design of an effective ransomware recovery architecture.
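The data-integrity validation step in the testing workflow above can be sketched minimally: fingerprint the data when the backup is taken, fingerprint the restored copy, and compare. The function names and sample data are my own; a real pipeline would run this inside the separate forensic account after restoring through RAM.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """SHA-256 digest of the data, stored alongside the recovery point."""
    return hashlib.sha256(data).hexdigest()

def restore_test_passed(source_digest: str, restored: bytes) -> bool:
    """Compare the restored copy against the digest taken at backup time."""
    return fingerprint(restored) == source_digest

backup = b"customer-orders-2025-01-31"
digest_at_backup_time = fingerprint(backup)
print(restore_test_passed(digest_at_backup_time, backup))             # True
print(restore_test_passed(digest_at_backup_time, b"tampered bytes"))  # False -> alert
```

A mismatch is exactly the kind of event that should trigger the alerting described above, since it means the recovery point is corrupted or has been tampered with.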
As I mentioned earlier, restore testing is a critical piece that you need to consider, and it is most often overlooked. Backup restore testing lets you assess the recoverability of your business data against data-loss events and prove your recovery posture for compliance using custom-defined restore testing plans. Organizations often have different compliance requirements to follow depending on their industry, such as financial services or healthcare, and this really helps in maintaining a strong posture. Periodic restore tests of the resources being backed up will give you much more confidence in your restore capabilities. Moving on to the next section: what factors impact backup costs? There are four areas: scope, how much data needs protection; frequency, how often backups should occur; retention, how long backups should be stored; and copies, how many redundant copies are necessary. Your cost can vary based on these. Looking at the resiliency options you can choose in cloud environments: backup and restore has the lowest associated cost, because your RPO and RTO are typically measured in hours. Then there is pilot light, which is more ready-to-go: the Recovery Point Objective can be minutes and the Recovery Time Objective around an hour, meaning your data loss is within a tolerable window of a few minutes while your restore time is within one to two hours. Then warm standby increases the cost further, because you are almost maintaining a secondary environment, but you get good RPO and RTO in minutes.
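The four cost factors above can be turned into a rough back-of-the-envelope estimate. This is purely illustrative: it assumes incremental backups after one initial full backup, and every number plugged in below is a made-up example, not a vendor price.

```python
def monthly_backup_storage_gb(scope_gb, daily_change_rate,
                              backups_per_day, retention_days, copies):
    """Rough stored-GB estimate from the four cost factors:
    scope, frequency, retention, and copies.

    Assumes one full backup plus incrementals sized by the daily
    change rate; multiply by copies for redundancy."""
    full = scope_gb                                             # initial full
    incrementals = (scope_gb * daily_change_rate
                    * backups_per_day * retention_days)         # daily deltas
    return (full + incrementals) * copies


# Example: 1 TB scope, 2% daily change, 1 backup/day,
# 30-day retention, 2 redundant copies.
estimate = monthly_backup_storage_gb(1024, 0.02, 1, 30, 2)
```

Multiplying the result by your provider's per-GB-month rate for each storage tier gives a first-order cost comparison before you commit to a retention policy.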
Then coming to the last one, active-active. Traditionally in on-prem infrastructure architectures we used to have active-active: an active data center and a disaster recovery data center, with global load balancers for every application. If something goes down in one data center, the other automatically picks up and traffic is rerouted, so within seconds there is no customer impact. Your RPO and RTO are close to real time, but the cost is much higher because you are paying for twice your production environment, so that's something to be wary of. Then coming to cost optimizations and use cases: where does the majority of your data sit in your environments? Mainly there are three kinds of storage. One is block storage; to take an AWS analogy, EBS volumes, ideal for high-performance applications requiring fast access to data. The second is object storage, best for unstructured data and large-scale storage needs like logs; the AWS analogy is S3. The third is relational or non-relational databases, suitable for applications needing complex queries and transactions; typically this is RDS in AWS, whether MySQL, Postgres, or MSSQL, or a NoSQL database like Cassandra. These are the usual places where you will be holding your data. Based on how your application is designed and which kind of cloud storage you are using for your use cases, we can derive the cost optimization best practices.
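The four resiliency tiers just walked through (backup-and-restore, pilot light, warm standby, active-active) form a cost ladder against RPO/RTO. A small helper can make that trade-off explicit; the minute figures below are order-of-magnitude illustrations taken from the discussion above, not guarantees for any workload.

```python
# Illustrative RPO/RTO (minutes) and relative cost for the four tiers
# discussed above, ordered cheapest-first. Figures are assumptions.
DR_TIERS = [
    # (name, rpo_minutes, rto_minutes, relative_cost)
    ("backup-and-restore", 24 * 60, 24 * 60, 1),
    ("pilot-light",        60,      4 * 60,  2),
    ("warm-standby",       5,       30,      3),
    ("active-active",      0,       1,       4),
]


def cheapest_tier(max_rpo_min, max_rto_min):
    """Pick the lowest-cost tier that still meets the RPO/RTO targets."""
    for name, rpo, rto, _cost in DR_TIERS:  # cheapest-first scan
        if rpo <= max_rpo_min and rto <= max_rto_min:
            return name
    return "active-active"  # only real-time replication can satisfy this
```

The point of the helper is the shape of the decision, not the numbers: tighten either objective and the function climbs the cost ladder for you.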
In terms of block storage, something like EBS: know your change rate and projected growth, so you know how much data you'll be accumulating, and retain backups only as long as needed, because this can cost you more than expected. Use cold storage when appropriate: if your data is not operational and you only need it for rare access, cold storage can reduce your storage costs. Then for object storage, something like S3: combine continuous backups with periodic backups, delete older backup object versions, clean up expired delete markers, and adjust backup frequency and retention as needed. Those are a few ways you can optimize costs in the object storage class. The third one is relational database storage. You can leverage free backups during continuous backup retention periods; some providers offer an initial amount of backup storage for free, so you only pay for backups kept beyond the continuous retention window. Also factor in the cost of backup copies, because depending on your database size you can incur a lot of cost when you keep a larger number of copies, and that impacts your cost directly. Finally, assess and understand your current cloud backup costs: what your current storage and cloud operation costs are, and what kind of backup solution you are planning, like the architectures we just discussed for foolproof ransomware readiness. That will give you a much better understanding of your backup costs.
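For the block storage advice above ("retain backups only as long as needed"), a sketch of a snapshot-pruning job. The 30-day window and the dry-run default are assumptions; the filtering logic is separated from the AWS calls so it can be tested without credentials.

```python
"""Sketch: prune EBS snapshots past a retention window.
Retention period and ownership filter are assumptions; dry-run by default."""
from datetime import datetime, timedelta, timezone


def expired(snapshots, retention_days, now=None):
    """Return SnapshotIds whose StartTime is older than the cutoff."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]


def prune_snapshots(retention_days=30, dry_run=True):
    import boto3  # lazy import: keeps expired() testable offline

    ec2 = boto3.client("ec2")
    snaps = ec2.describe_snapshots(OwnerIds=["self"])["Snapshots"]
    doomed = expired(snaps, retention_days)
    for snapshot_id in doomed:
        if not dry_run:
            ec2.delete_snapshot(SnapshotId=snapshot_id)
    return doomed  # report what was (or would be) deleted
```

Running it with `dry_run=True` first and reviewing the returned list is the safe pattern before letting any deletion loose on backup data.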
Implement lifecycle policies to automatically move older backups to lower-cost storage tiers while maintaining appropriate accessibility, because you don't want these ever-growing storage needs sitting in hot storage. You can move data to infrequent-access or Glacier-style storage classes, which reduces storage costs; that's the core of the cost optimization best practices for storage. Moving on: building a ransomware incident response plan. Despite your best efforts, a ransomware attack may still occur, so it's essential to have a robust incident response plan in place to minimize the impact and facilitate a safe recovery. There are a few areas to focus on. First, define roles and responsibilities: establish a cross-functional incident response team, clearly define who makes critical decisions, include technical, legal, communications, and executive stakeholders, and document backup escalation contacts for each role. Believe me, these things will help you a lot during a real incident. Second, create communication protocols. When systems get seized, your normal organizational communication channels can be down, depending on what type of ransomware attack happened; your administrative tools or platforms may be encrypted or otherwise impacted. In those cases you don't have a way to communicate the regular way, so you should create your communication protocols in advance: establish secure out-of-band communication channels, develop templates for internal and external communications, include notification procedures for regulatory requirements, and document contact information for law enforcement and third-party experts.
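The lifecycle-policy advice at the start of this section, plus the earlier S3 tips (delete older versions, clean up expired delete markers), can be expressed as one S3 lifecycle configuration. The prefix, day thresholds, and bucket name are assumptions for illustration; the rule-building is kept pure so it can be inspected before applying.

```python
"""Sketch: S3 lifecycle policy that tiers backups to cheaper storage and
cleans up noncurrent versions and expired delete markers. The prefix and
day thresholds below are assumed example values."""


def backup_lifecycle_rules(ia_after=30, glacier_after=90,
                           noncurrent_expire=180):
    """Build a lifecycle configuration for objects under backups/."""
    return {
        "Rules": [{
            "ID": "tier-and-clean-backups",
            "Status": "Enabled",
            "Filter": {"Prefix": "backups/"},
            # Hot -> infrequent access -> archive as backups age.
            "Transitions": [
                {"Days": ia_after, "StorageClass": "STANDARD_IA"},
                {"Days": glacier_after, "StorageClass": "GLACIER"},
            ],
            # Expire old object versions left behind by overwrites/deletes.
            "NoncurrentVersionExpiration": {
                "NoncurrentDays": noncurrent_expire,
            },
            # Remove delete markers with no remaining noncurrent versions.
            "Expiration": {"ExpiredObjectDeleteMarker": True},
        }]
    }


def apply_lifecycle(bucket):
    import boto3  # lazy import: rule-building stays testable offline

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=backup_lifecycle_rules())
```

One caveat worth checking against your provider's docs: combining transitions, noncurrent-version expiration, and delete-marker cleanup in a single rule has its own validation constraints, so test the configuration on a non-production bucket first.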
Documenting recovery procedures is also important. You need to create a step-by-step recovery playbook so everyone knows who needs to do which parts and in what sequence; include decision trees for various attack scenarios, document restore procedures from immutable backups, and maintain up-to-date system configuration documentation. And then the last one: practice regularly. This is the most often overlooked item. Unless you test your backups and they're valid, you cannot call them backups. So practice regularly: conduct tabletop exercises quarterly, run full recovery simulations at least annually or semi-annually, update processes based on exercise findings, and involve executives in practice scenarios. Remember that in the heat of an incident, documented procedures are invaluable. The National Institute of Standards and Technology reports that organizations with documented incident response plans resolve incidents 40% faster than those without. Coming to the end, let me go over the conclusions and key takeaways from this session. To wrap up today's discussion, let's emphasize these critical points. First, ransomware is evolving and inevitable; it's not going away, and it's not a matter of if, but when. The threat landscape continues to evolve, with attacks becoming more sophisticated and more demanding. It doesn't matter what kind of shift-left strategy you follow, or what predictive, protective, or detective tools and technologies you implement: the scale at which ransomware is evolving is enormous, and its sophistication has only grown recently with the introduction of AI. So you know for sure it's not if, but when. Second, backups are your last line of defense. Believe me.
But only if backups are properly protected, tested, and recoverable. Adopting the 3-2-1-1-0 rule that we talked about, zero-trust ransomware architecture, and immutable backups provides a solid foundation for ransomware resilience. Third, recovery capabilities must be tested: untested backups often fail when needed most, so regular testing is essential to ensure your recovery capabilities are functional. Fourth, preparation reduces impact: organizations with documented and practiced response plans recover faster and experience less financial impact. And last but not least, incident response requires cross-functional collaboration: technical teams, executives, legal, communications, and other stakeholders must work together seamlessly during an incident. I would like to close with this thought: ransomware readiness isn't just about technology. It's about people, process, and preparation. The organizations that handle ransomware incidents most effectively are those that have invested time in planning, practicing, and preparing before an attack occurs. Thank you for attending today's conference. I hope this session is useful for you and your organization. Thank you.

Srihari Pakalapati

Principal Cloud & Security Architect @ Trader Interactive

