Conf42 DevSecOps 2021 - Online

Patterns for Encrypting Data at Rest in Cloud-native Applications

Abstract

Enterprises across industries have begun to rely on data to drive business decisions, processes and workflows. Data comes in different types, and there is a plethora of data storage solutions for cloud-native applications. Data is stored and processed in a highly distributed fashion to fuel analytics, AI/ML and edge/IoT use cases. These factors make it challenging to secure the data and protect sensitive information.

Encryption is the de facto mechanism for protecting data from malicious users. Encrypting data at rest is a fundamental requirement for many organizations. This talk introduces the different patterns for encrypting data at rest, the relative merits and demerits of each approach, and the challenges and solutions. Attendees will gain a good understanding of the different techniques and which ones to use for different use cases.

Summary

  • In cloud native applications, data at rest refers to data that resides in some sort of storage. Today many applications make use of a huge amount of data, so encrypting that data is vital from a security standpoint.
  • My primary job is to architect and develop cloud native AI/ML platforms. These are platforms used for running different types of machine learning models. Many machine learning applications rely on huge amounts of data. When data is stored, it needs to be encrypted in order to protect sensitive information.
  • What exactly is cloud native? Is it just jargon, or does it have any real meaning? Cloud native applications are typically developed as decoupled microservices. They are also containerized applications, because of the benefits containers offer.
  • A cloud native application can be divided into four layers. At the topmost layer is the microservice, which encapsulates the business logic of the application itself. Beneath it is the database layer. The third layer is the volume on which the database is running. The fourth layer is the actual infrastructure layer, which consists of the disks themselves.
  • There are four different layers at which you can implement encryption at rest. Data can be encrypted by the microservice itself before it is stored in the database, by the database, or at the volume level. The last layer for doing the encryption is the disk itself. Each of these layers has merits and demerits.
  • The pattern of encrypting at the microservice level has the least attack surface. The next pattern is encryption by the database itself. This predominantly requires the database containers to be run in privileged mode. It is very much a plug-and-play solution if you find the right fit of database solution.
  • The third pattern is volume-level encryption. Public cloud providers invariably provide volume services that have an inherent capability for encryption. The final pattern to consider is disk-level encryption, which has the highest attack surface.
  • I hope you will be able to choose the right type of solution for encrypting data at rest. With that, I come to the end of this presentation. If you have any questions, you can reach out to me on Twitter.
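
The microservice-level pattern summarized above can be illustrated with a small Python sketch. It is not taken from the talk: the `FieldEncryptor` class, the `key-N` naming and the in-memory key store (standing in for an external key management service) are all hypothetical, and the cipher is a toy keystream built from the standard library purely so the example runs without extra packages. A real service would presumably use a vetted authenticated cipher from a maintained cryptography library. What the sketch does show is the bookkeeping the talk describes: the microservice encrypts before storing, and each stored record remembers which key encrypted it.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 over key, nonce, counter). Illustration only, not for production."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


class FieldEncryptor:
    """Hypothetical helper: encrypts values inside the microservice and tracks key IDs."""

    def __init__(self) -> None:
        self._keys = {}            # assumption: stands in for an external key management service
        self._active_key_id = ""
        self.rotate_key()

    def rotate_key(self) -> str:
        """Create a new key and make it the one used for new records."""
        key_id = f"key-{len(self._keys) + 1}"
        self._keys[key_id] = secrets.token_bytes(32)
        self._active_key_id = key_id
        return key_id

    def encrypt(self, plaintext: str) -> dict:
        """Return the record that would be written to the database."""
        key = self._keys[self._active_key_id]
        nonce = secrets.token_bytes(16)
        data = plaintext.encode("utf-8")
        ct = bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))
        tag = hmac.new(key, nonce + ct, hashlib.sha256).hexdigest()
        return {
            "key_id": self._active_key_id,  # lets us find the right key later
            "nonce": nonce.hex(),
            "ciphertext": ct.hex(),
            "tag": tag,
        }

    def decrypt(self, record: dict) -> str:
        """Look up the key by ID, verify integrity, then decrypt."""
        key = self._keys[record["key_id"]]
        nonce = bytes.fromhex(record["nonce"])
        ct = bytes.fromhex(record["ciphertext"])
        expected = hmac.new(key, nonce + ct, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, record["tag"]):
            raise ValueError("record was modified or the wrong key was used")
        return bytes(a ^ b for a, b in zip(ct, _keystream(key, nonce, len(ct)))).decode("utf-8")


if __name__ == "__main__":
    enc = FieldEncryptor()
    first = enc.encrypt("alice@example.com")
    enc.rotate_key()
    second = enc.encrypt("bob@example.com")
    # The database only ever sees these opaque records.
    print(first["key_id"], first["ciphertext"])
    print(second["key_id"], second["ciphertext"])
    print(enc.decrypt(first), enc.decrypt(second))
```

Because the database stores only the opaque `ciphertext` field, it cannot sort or search on the original values, which is the limitation the talk warns about for this pattern.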

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi, I am Senthil from Ericsson. Today my talk is going to be about patterns for encrypting data at rest in cloud native applications. Data at rest refers to data that resides in some sort of storage. Nowadays many applications make use of a huge amount of data, for instance machine learning applications. These applications have to process a huge amount of data, and it is vital from a security standpoint to encrypt the data at rest. This is more or less a primary requirement for many organizations that handle data. So in this talk, I'm going to talk about what cloud native applications are and how to look at encrypting data at rest using a layered approach, and I will also be talking about patterns for encrypting data at rest. So let's get started. A brief introduction about me: I work at Ericsson, and my primary job is to architect and develop cloud native AI/ML platforms. These are platforms that are used for running different types of machine learning models in a highly distributed fashion and at massive scale. Many machine learning applications rely on a huge amount of data, and this data can be in batch format or in real-time streaming format. Whatever it is, when data is processed by any cloud native application, it needs to be stored in some sort of storage. And when data is stored, it needs to be encrypted in order to protect sensitive information. So that is what I'm going to talk about today. By the way, I am also the organizer of Kubernetes Community Days Chennai 2022, which is coming up sometime next year, and we are gearing up to host the very first KCD Chennai event. I am the maintainer of an open source project called kube-fledged, which is all about caching container images directly on the Kubernetes worker nodes. I am a tech blogger, I am fairly active on Twitter, and I am a big fan of the Money Heist series on Netflix.
And by the way, season five part two is launching on the third of December, so don't miss it. The agenda for my talk has three sections. In the first section I will talk about what a cloud native application is: why cloud native application development is very popular, and how data is stored, processed and transferred in a typical cloud native application. The second section is about data encryption at different layers of a cloud native application. This is where I will split the application into different layers, and we will see how we can implement data encryption solutions at these different layers. And third, I will be talking about various patterns for encrypting data at rest. You might have heard the term cloud native quite often, right? So what exactly is cloud native? Is it just jargon, or does it have any real meaning? Cloud native is fundamentally a way of developing applications. It is not about applications targeted at a specific deployment environment, and it is not about applications targeted at specific cloud providers. When we say cloud native applications, these are applications that predominantly have more or less these four elements in them. First of all, DevOps: adopting the DevOps principles in developing the application, in maintaining the application, and in how different teams collaborate with each other to produce the final application. That is a key element of a cloud native application. Second, you will always see continuous integration and continuous delivery. This is a process by which the software is developed incrementally, with a high degree of automation, so that you are able to develop features and push these features even into production on a continuous basis.
So that is another salient feature of cloud native applications. Third, cloud native applications are typically also containerized applications, because containers deliver many of the benefits that are expected of a cloud native application. Containers are quick to start, containers can be easily packaged and run in the same manner on a multitude of platforms and environments, and containers are now the de facto standard for packaging and distributing cloud native applications. And the fourth dimension of cloud native applications is microservices. Cloud native applications are typically developed as decoupled microservices. Each microservice has the business logic and also the data store required for storing its data. Microservices expose their business logic to other microservices, and also to the external environment, via clearly defined APIs. So now you have an understanding of what cloud native applications are. Now let's talk about encryption. In order to explain encryption, let me first dissect the entire cloud native application into four different layers. At the topmost layer we have the microservice, which encapsulates the business logic or the programming logic of the application itself. Underneath the microservice layer, we have the database layer. This layer is responsible for storing the data. Invariably, microservices rely on some sort of storage for processing information and for storing the state of the application. Microservices can also be developed as stateless microservices, which means they will not hold the state information or the data themselves. But typically, microservices also have a database in which they store the state of the application, and sometimes also the state of the environment in which the application is running. And the third layer is the volume on which the database is running.
This is typically a volume that is carved out of a physical disk or a virtual disk. Volumes are where databases create and store their files. And the fourth layer is the actual infrastructure layer, which consists of the disks themselves. These could be physical disks found in direct-attached storage on a server, or they could be virtual disks that are created and supplied by the cloud service provider. Whatever it is, the disk is the bottommost layer within the entire layered architecture. When you see encryption through the prism of these different layers, data can be encrypted in any of these four layers. For instance, data can be encrypted by the microservice itself before it is stored in the database. In this case, apart from the business logic, the microservice will also have the logic for encrypting and decrypting the data by itself. It will also rely on some sort of key management system: either it will manage the keys itself, or it will use an external key management service to manage the keys. The microservice itself will keep track of which keys were used for encrypting which piece of data, so it knows how to decrypt the data. So all the logic for encrypting and decrypting the data is taken care of by the microservice itself, and the database, or any layer underneath it, is not doing any sort of encryption or decryption. In the second case, where the database encrypts the data, the microservice doesn't perform any sort of encryption or decryption. It handles plaintext data, and it is the responsibility of the database to do the encryption. Again, databases can offer many advanced features. For instance, certain databases will be able to generate and manage their own keys, whereas certain databases will rely on an external service to do that.
And again, if you are talking about databases that are consumed as a managed service from a cloud provider, there could be databases that support user-managed keys, and there could be databases that support only the cloud provider's way of managing the keys. There could also be performance-related implications that you have to keep in mind, because certain database engines are capable of offering very good performance even on encrypted data, but certain database engines are not that performant. So you will have to be very careful in determining whether the performance that the database supports will be suitable for you. You should also consider the overhead that you need to bear in terms of managing the keys, and the repercussions in case the database is breached. Those are the other considerations that you need to take care of. The third case is volume-level encryption. This is where you simply run your database and microservice as is, and both these layers will still be handling plaintext; the entire responsibility of encrypting the data lies at the volume level. This is where you will typically be using a solution from the storage provider, which will be responsible for encrypting and decrypting the data. For instance, if you are talking about a Kubernetes environment, then you will probably make use of a CSI-based provisioner for provisioning the volumes, and provisioners come with different feature sets. So you may have to check whether a volume provisioner supports encryption at the volume level or at a storage class level, and what is suitable for your use case, and choose the solution accordingly. And finally, the last layer for doing the encryption is the disk itself. These disks could be either physical disks or virtual disks.
Whatever it is, the encryption and decryption take place at the disk level. Typically this is implemented using certain kernel modules, and these kernel modules intercept the data that is being written to the file system on the disk. These kernel modules are capable of managing the keys and encrypting and decrypting the data. So as you see, there are four different layers at which you can implement encryption at rest, and each of these layers has merits and demerits. That is what we are going to talk about in the subsequent slides. So let's get into the patterns for encrypting data at rest, starting with encryption by the microservice itself. What happens in this way of encrypting data at rest? We saw that earlier: this is the case in which the application microservice itself has the logic and the responsibility for encrypting and decrypting the data. In these cases you will have to watch out for things like sorting and searching of data. If your application does a lot of sorting and searching of data, this will not work at the database level, because the data is stored in encrypted form in the database, so the database engine doesn't know how to sort the data and will not be capable of searching it. So you will have to consider whether your application can be written in such a fashion that it can tolerate this limitation. Also, if you have existing applications that are already talking to a database, it could be expensive to redesign them to introduce the logic for encrypting and decrypting the data in the application. And by the way, this pattern of encrypting at the microservice level has the least attack surface.
The reason is that the data is encrypted at the very first layer, where it is generated, and as the data passes through the underlying layers it passes in encrypted form, which means the attack surface in this case is the least. So you get a high degree of protection for your data. Of course, with this pattern you should be very mindful of key management issues, because key management is now the responsibility of the application. The application can rely on other microservices to perform the key management activities, but at the end of the day the application is still accountable and responsible for the key management related activities. So this is something that you will have to keep in mind if you are going to choose this pattern. The next pattern is encryption by the database itself. This is where the database itself has the necessary capability to do the encryption and the decryption. By the way, this predominantly requires the database containers to be run in privileged mode, and this might not be suitable for use cases with special security requirements. Your organization might have security guidelines that prevent you from running privileged containers in production. With this kind of pattern you will have to choose carefully and see whether the database is capable of encrypting the data when it is run without privileged mode. Most of the databases that provide the encryption functionality use a tool called dm-crypt. Typically these databases have written some wrappers around dm-crypt in order to provide functions and capabilities that the database engine can use. So you will have to be aware of what kind of mechanism the database employs: whether it uses dm-crypt kind of solutions, or whether the database has its own solution for encrypting and decrypting data.
With this kind of pattern you will see limited support in open source software. If you are used to using open source software as your database solution, you will see that not every open source solution has this capability, and you may have to invest in commercial plugins or enterprise-licensed versions of your database if you choose this pattern. At the end of the day, it is a decision that you will have to make considering the benefits and the overhead that you will have to bear in terms of cost and complexity. Database encryption is very simple, because you don't have to rewrite your applications, and you also don't have to consider changes to your storage solution or changes to your infrastructure in order to encrypt data at rest. So it is very much a plug-and-play solution if you find the right fit of database solution for your application's needs. The third pattern that I want to present is volume-level encryption, and you will typically find this pattern widely used in public cloud environments. Public cloud providers invariably provide you with volume services that have an inherent capability for encryption, and certain public cloud providers also give you a mechanism for managing the keys yourself rather than the cloud provider managing the keys. So that could be another sweet spot for you to consider. Besides the public cloud providers' managed services for volume encryption, there are third-party storage providers, and many of these storage providers support volume-level encryption. Sometimes they have implemented their own key management solution for managing the keys, and sometimes they allow you to bring your own key management solution, which the storage provider will talk to. But invariably we are seeing many such storage providers supporting volume-level encryption.
So this could be a choice for you if you are able to choose the storage provider and if you have control over the storage infrastructure on which your application is running. By the way, CSI plugins also have support for encryption. They expose some APIs of the storage provider, but not all CSI plugins expose the complete encryption feature set of the storage provider, so there are some limitations that you might encounter in CSI plugins. If you are deploying Kubernetes applications that rely on CSI plugins to provision volumes, then you will have to look carefully at what support is provided by the CSI plugin, or sometimes you may have to write your own CSI plugin that has the complete functionality you require for data encryption at rest. Many OSS solutions are available that support volume-level encryption, so if you are into open source software, this could be a very viable solution for doing volume-level encryption. One key disadvantage of volume-level encryption is that if you are deploying your application into an infrastructure or an environment over which you don't have much control, that environment might not have the ability to do volume-level encryption. Then your application cannot assume that whatever volumes it consumes will be encrypted. So if you want to ensure end-to-end security of the applications that you deliver to your customers, and you want to enforce certain rules on how the storage provisioner should work, but it is not feasible for you to enforce those rules, then you may have to consider the previous patterns for applying encryption. The final pattern that you should consider is disk-level encryption. Again, this is sometimes not feasible, because you may not have control over the disks; disks might not be fully exposed to your applications.
You may be able to consume only volumes at the application level. And if the encryption is happening at the disk level, then unless you have tight control over or visibility into your infrastructure and environment, it might not be feasible for you to do disk-level encryption. This pattern also has the highest attack surface. That's because the encryption happens only at the bottommost layer. At the microservice layer, at the database layer and at the volume layer, everything is plaintext. Only when the data reaches the disk is it encrypted, which means the attack surface is high. The attacker can still steal the data at the microservice level, at the database level, or even at the volume level. By the way, this is considered to be the simplest solution. The reason is that disk-level encryption has been around for a while, many of the disk-level encryption solutions are very mature, and you have lots of tooling to automate this kind of encryption. So this could turn out to be the simplest solution for your needs, if you have the required amount of control and visibility over the environment. You also have the luxury of a standardized format for hard disk encryption, for instance LUKS, which is a Linux-based format for disk encryption. This means your applications could be highly portable, because you rely on a standardized format of disk encryption. So these are some of the advantages that you gain from disk-level encryption. But again, as I said earlier, you will have to have good control and visibility over the infrastructure. If you are deploying your application into an environment that you can design, in which the infrastructure portion is something that you can design upfront, in which you can enforce certain rules and bring in your disk-level encryption solutions, then this could be the simplest solution.
But again, the attack surface of this pattern is very high. So if you are looking for a highly sophisticated, highly secure solution, then this might not be suitable for you. These are some of the considerations that you may want to keep in mind. Now, that's it; I am more or less at the end of my presentation. We talked about what cloud native applications are and what the salient features of cloud native applications are. Then we saw the different layers of a cloud native application and how encryption can happen in these different layers. And finally, we talked about the various patterns and, within each of these patterns, the considerations that you should be aware of, the benefits, and the disadvantages. Using this information, I hope you will be able to choose the right type of data-encryption-at-rest solution for your needs. With that, I come to the end of this presentation. Thank you so much for watching this talk. Please connect with me on Twitter, and if you have any questions, you can reach out to me there. Thank you so much, have a nice day.
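
The attack-surface comparison made in the talk can be written down as a small Python model. This is only a sketch of the reasoning, not something shown in the presentation, and the function names are hypothetical: the four layers are ordered top to bottom, and for a given encrypting layer, every layer above it handles plaintext while every layer below it only ever sees encrypted data. The encrypting layer itself necessarily handles plaintext as well.

```python
# Layers from the talk, ordered from top (business logic) to bottom (infrastructure).
LAYERS = ["microservice", "database", "volume", "disk"]


def layers_seeing_plaintext(encrypting_layer: str) -> list:
    """Layers above the encrypting layer; data passes through them unencrypted."""
    return LAYERS[: LAYERS.index(encrypting_layer)]


def layers_seeing_only_ciphertext(encrypting_layer: str) -> list:
    """Layers below the encrypting layer; they only ever hold encrypted data."""
    return LAYERS[LAYERS.index(encrypting_layer) + 1:]


if __name__ == "__main__":
    for layer in LAYERS:
        exposed = layers_seeing_plaintext(layer) or ["none"]
        protected = layers_seeing_only_ciphertext(layer) or ["none"]
        print(
            f"encrypt at {layer:<12} | plaintext above: {', '.join(exposed):<32}"
            f" | ciphertext only below: {', '.join(protected)}"
        )
```

Read this way, encrypting at the microservice leaves no higher layer exposed, while encrypting at the disk leaves the microservice, database and volume layers all handling plaintext, which matches the ordering of least to highest attack surface given in the talk.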

Senthil Raja Chermapandian

Principal Software Engineer @ Ericsson



