Conf42 Python 2023 - Online

Protecting Sensitive Data and Machine Learning Models

Abstract

The area where data is being processed and held in memory is nearly always unencrypted and may be at higher risk of cyberattacks. This talk will share how engineers can deploy functions containing proprietary algorithms, models, and secrets while keeping your intellectual property confidential.

Summary

  • Justin will talk about protecting sensitive data and machine learning models. He'll show a couple of demos with our Python SDK called Pycape. You can pick up Pycape right now and try Cape out. We're looking for feedback on all aspects.
  • Confidential computing is a broad set of technologies that allows data to be protected while in use. It is important to keep encryption and key management as essential primitives when developing a confidential computing platform. It can still be quite complicated to do these things in a seamless way.
  • Cape is primarily built upon AWS Nitro Enclaves, a service that allows users to deploy code to a locked-down container. Through an attestation process, the person who is triggering the code to run inside the enclave can confirm that they are in fact talking to an enclave. Everything is fully auditable.
  • Cape has been working on the bleeding edge of confidential computing for over four years. Our platform helps developers easily protect their data and their users' data. Cape provides many SDKs for encrypting data and interacting with the Cape system. We plan on supporting as many languages as possible.
  • Pycape is the Cape SDK written in Python. It provides all the core functionality of encrypting data and deploying and running the function. At Cape we hope to write all common features of our SDKs in Rust. Building common components helps reduce the surface area of potential bugs.
  • Today we're going to use this image classification model, an ONNX model, to help with our examples. It uses a ResNet model pre-trained on ImageNet. We'll be using the ONNX Runtime today to run our machine learning model. The first step is to deploy our script.
  • Next up we can deploy with the experimental CLI package of Pycape. It currently wraps the Golang CLI command to make it easy to deploy directly from Python. Let's see what the model thinks. So we just need to run python run_prediction.py.
  • Next up, I'll show you a demo of our encrypt functionality. This can be used standalone or we can use it to call a Cape function. One last example here of how to encrypt for someone else. Hope I inspired you to give Cape a try and see how this could be helpful.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi everyone, Justin here from Cape Privacy. Today I'm going to be talking about protecting sensitive data and machine learning models. More specifically, I'm going to be talking about confidential computing, how Cape enables confidential computing, and show a couple of demos with our Python SDK called Pycape.
So who am I? I've been working at Cape for about five years. I've learned a lot about confidential computing as an emerging technology and how it can be used to protect sensitive data and machine learning models. I'm mostly a backend developer, but I have some experience working on the front end with Node.js and JavaScript, and I've also spent some time building out SDKs written in Python and some other languages.
You can pick up Pycape right now and try Cape out. We're looking for feedback on all aspects, so I encourage you to try it out and let us know what you think. We'll provide links to documentation, getting started guides, and a link to join our Discord so you can easily try Cape out and get started today.
So let's get started by talking about what confidential computing really is. My favorite analogy is to compare it to how ubiquitous encryption at rest and encryption in transit are. Confidential computing is a broad set of technologies that allows data to be protected while in use. Data is kept private and is not leaked to whatever party, such as a cloud provider, is actually processing the data. There are many different technologies that can complement each other to help provide a confidential computing system, technologies such as multiparty computation, fully homomorphic encryption, and trusted execution environments, or enclaves. Today, Cape's main focus is enclaves.
Before digging into enclaves and Cape's main service, I want to talk about how important it is to keep encryption and key management as essential primitives when developing a confidential computing platform.
Even though thinking about encryption and key management is common, and should be more common, for software developers, it can still be quite complicated to do these things in a seamless way. When implementing encryption for your application, there are a lot of decisions to be made. If using AES, which mode should you use, for example GCM or CBC? Security: depending on what is chosen, security could be better or worse. Efficiency: depending on the methods chosen, one could be more efficient than the other. And finally, how to pack all the required data before sending it to a consumer.
Here's an example from the Pycape source code of some vague looking acronyms that should probably be understood before using them. You can see SHA-256, OAEP and MGF1. Also, should you be passing something for the label or not? Ideally, you would understand all these acronyms, the choices that must be made and the implications of using them if you implemented them yourself. While you can likely look up good defaults for encryption parameters, without some background knowledge it can still be difficult to make the right choices or even understand what you're deciding. It'd be much easier if a library could make the best choices for you. In addition, the library can provide education and the option to configure things if needed. This is our goal with Pycape and our other SDKs.
Key management can be quite hard as well. Depending on the cloud provider you are using, you would have to consider which products best fit the needs of your software. AWS, for example, has KMS and Secrets Manager, and even other products that help manage keys or other secrets. Simplifying this to just one way to do key management would be quite powerful. So from the outset, one of Cape's major goals was to make this easier to manage for the average software developer. Software developers need to consider the importance of keeping data safe in their day to day activities.
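To make one of those acronyms less opaque: MGF1 is the mask generation function that OAEP padding relies on, and it is little more than a hash function run in counter mode. The sketch below follows the definition in RFC 8017 with SHA-256, using only the Python standard library. It is an illustration rather than Pycape's code, the function name mgf1 is our own, and in practice a vetted cryptography library should do this for you.

```python
import hashlib


def mgf1(seed: bytes, length: int, hash_name: str = "sha256") -> bytes:
    """Mask generation function MGF1 as defined in RFC 8017, appendix B.2.1."""
    digest_size = hashlib.new(hash_name).digest_size
    if length > (2**32) * digest_size:
        raise ValueError("mask too long")
    output = b""
    counter = 0
    while len(output) < length:
        # Hash the seed followed by a 4-byte big-endian counter.
        output += hashlib.new(hash_name, seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    # Truncate to exactly the requested number of bytes.
    return output[:length]


mask = mgf1(b"example seed", 48)
print(len(mask))  # 48
```

Even this small helper involves choices (hash function, output length) that interact with the rest of the OAEP parameters, which is the kind of detail a library with good defaults can hide.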
Alright, next we're going to talk about Nitro Enclaves. Cape is primarily built upon AWS Nitro Enclaves. Nitro Enclaves is a service that allows users to deploy code to a locked-down container. No one can see what is running inside the container. AWS cannot see and, when running on Cape's platform, we cannot see what the users are running. Through an attestation process, the person who is triggering the code to run inside the enclave can confirm that they are in fact talking to an enclave and that it is running the code that they think it is. Everything is fully auditable.
Nitro Enclaves is a flexible platform because you can take virtually any container and turn it into an enclave image file, often referred to as an EIF. This file is used to deploy your code to the secure container. The EIF contains the whole OS file system that is generated from the Docker container, plus metadata to assist with attestation. It can also be signed with a private key to prove that it came from a specific source.
EIFs contain metadata that help tell a user where the EIF came from. EIFs can be signed by a private key to prove the identity of the entity creating the EIF. A hash of the signing certificate is also stored in a platform configuration register, or PCR. The PCRs contain hashes of important information about the EIF. These help enable the attestation process so the user knows exactly what is running when communicating with the enclave.
Since I've mentioned it a bit already, I'm going to go over what attestation is from a high level. Attestation is the process where a user can communicate with an enclave and prove the enclave is what the user thinks it is. During communication, the enclave sends what is called an attestation document. This document contains all the information needed to prove what software the enclave is running. The PCRs are one of the most important aspects of the document.
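The PCR comparison at the heart of attestation can be sketched as follows. This is a simplified illustration, not Cape's or AWS's implementation: the attestation document is assumed to be already parsed into a mapping of PCR index to digest, and verifying the document's signature is out of scope here. EXPECTED_PCRS, pcrs_match and the placeholder digests are hypothetical.

```python
import hmac
from typing import Mapping

# Hypothetical expected measurements, for example recorded when the EIF was built.
# Nitro Enclaves PCR values are SHA-384 digests (48 bytes).
EXPECTED_PCRS = {
    0: bytes.fromhex("aa" * 48),  # PCR0: measurement of the enclave image file
    8: bytes.fromhex("bb" * 48),  # PCR8: measurement of the signing certificate
}


def pcrs_match(document_pcrs: Mapping[int, bytes], expected: Mapping[int, bytes]) -> bool:
    """Return True only if every expected PCR is present with the same value."""
    ok = True
    for index, value in expected.items():
        found = document_pcrs.get(index, b"")
        # compare_digest avoids leaking where two values differ.
        if not hmac.compare_digest(found, value):
            ok = False
    return ok


print(pcrs_match(dict(EXPECTED_PCRS), EXPECTED_PCRS))  # True
```

A client would refuse to send any data to the enclave if this check fails, since a mismatch means the enclave is not running the image the client expected.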
Along with the PCRs, the document is signed by a root AWS certificate, which also must be verified to confirm the authenticity of the document.
Next up, an overview of Cape. Cape has been working on the bleeding edge of confidential computing for over four years. Our platform helps developers easily protect their data and their users' data. We provide three main entry points into our system. Here are those entry points.
Encrypt. As previously mentioned, while processing data securely, it is just as important to provide simple encryption primitives for developers to use to easily encrypt their data before sending it to be processed. You can easily encrypt your data for yourself or another entity you trust.
Deploy. Deploy is used to send a Python function to Cape to eventually be run inside the Nitro enclave. This is the secure processing code that will eventually be used to process your previously encrypted data.
Run. Run is used to run the function that you just deployed. Here, you pass a function ID and the encrypted data, and in return you get some output depending on what your function does. There are some concrete examples coming up soon.
Cape provides many SDKs for encrypting data and interacting with the Cape system. We provide SDKs in Python, JavaScript (either from the browser or with Node.js), and Java. We also have a CLI tool written in Golang. We plan on supporting as many languages as possible, so keep an eye out for new languages and let us know if there's another language you'd like to see an SDK written in.
So next we're going to talk about the core components written in Python, Pycape and Cape functions. Pycape is the Cape SDK written in Python. It provides all the core functionality of encrypting data and deploying and running the function. There is currently a small component written in Rust that is used to automatically encrypt the data before it is sent to the backend, whether you've previously encrypted it or not.
As a side note, at Cape we hope to write all common features of our SDKs in Rust. Building common components helps reduce the surface area of potential bugs while also making the resulting code more auditable because it's not spread out amongst different SDKs.
So, Cape functions, what are those? The functions that are deployed to Cape are also currently written in Python. We have utilities for packaging your Python dependencies and your main code all together. The resulting directory is what is then uploaded to Cape using the deploy function. Here's an example of what the script must look like. As you can see, it contains a function called cape_handler that accepts a byte string input and then returns something, either a string or byte string. So all Cape functions will have this format and you'll see in the demo next what that looks like exactly.
It's demo time, finally. Okay, let's transition to the terminal. Today we're going to use this image classification model, an ONNX model, to help with our examples. It uses a ResNet model pre-trained on ImageNet. For those of you who don't know what ONNX is, it stands for Open Neural Network Exchange. It is an ecosystem for machine learning and AI. We'll be using the ONNX Runtime today to run our machine learning model.
The first step is to deploy our script. So here is the simple Python script we're deploying to Cape. So first, all our imports. We import json, numpy and onnxruntime. We load the ResNet model with this inference session. We open an imagenet_classes text file, which is all the classifications that are in ImageNet. I think there's a thousand of them. And then get top five classes uses the softmax function to print out the classes in a nice way. So finally we have the main section. It's the cape_handler, which will end up being called inside the enclave. It takes input bytes. It takes those bytes and puts them inside of a numpy buffer.
And then it puts it inside something that ONNX will understand. And then finally we run the actual model. We take the output from that model and we generate the top five classes. Then we return the classes in a JSON blob. So after that we can actually deploy our model.
So the next thing we need to do is prepare for deployment. First we create the deployment folder. To make things a little bit easier, we define an environment variable called ONNX_RESNET_DEPLOY. We make that directory and then we can copy a few things into it. So we copy the app file we just looked at, app.py. We can copy the model, which is a directory. It contains everything ONNX needs to be able to run the model itself. And then we need to copy that imagenet_classes text file that we were just looking at.
Okay, so the next part is a little more complicated, but we need to add the ONNX Runtime dependencies. We need to make sure it's added using a proper Python environment because that is what we're running inside the enclave. To do this, we use Docker to install the requirements inside the deployment directory. I'm just going to copy and paste this one because it's a bit more complicated, but here you can see it's setting up some volume connections, setting the working directory to build, and then it's actually using this python:3.9-slim-bullseye Docker container. And then next we're just simply installing the ONNX Runtime, and the target is that volume we just connected to our deployment directory. So yeah, it'll take a few seconds. While it looks a bit complicated, we hope to add helper functions to assist with this step in the future.
Next up we can deploy with the experimental CLI package of Pycape, which currently wraps the Golang CLI command to make it easy to deploy directly from Python. In the future we may add a more native solution here. So let's take a look.
Okay, so we're doing all our imports. Specifically, the important one is to import the experimental package as cli. Then down here we're calling cli.deploy with that directory we created earlier. And finally it prints out the function ID, which we'll need to use to run. Okay, so let's run that function. We'll export it to an environment variable so we can use it easily later, and then there.
So while that runs, let's talk about what's happening here. It's zipping the directory, connecting directly with the enclave over a WebSocket connection, running the attestation process, encrypting the function and then sending it directly to the enclave to be securely processed and stored for later use. The reason it was a bit slow here is because of the size of the transfer over the Internet. Machine learning models can take quite a bit of space sometimes. Okay, there, now it's done and we can try running it next.
So to be able to run, we need to get an authentication token that is specifically scoped to the function we just created. So we'll export that to another environment variable. So here you can see we export it to the token environment variable. We call the Cape CLI command token create. We pass the name and we pass the function ID that we created earlier. By the way, the function token looks like this; it's basically just a random string. With the next script we'll be able to actually run the deployed function.
Let's go over the script we'll be using to run the prediction first. So once again at the top we have all our imports. A new one here is torch, or PyTorch. We'll be using that to preprocess the image. After that we load our environment variables, so the token and function ID that we've been setting along the way. And here's the preprocessing of the image file that I mentioned. So this just makes it so it's in a format that the ResNet model will understand. We could do this inside the enclave, but just to simplify a bit, we're doing it here in the user script.
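The idea of turning an image into bytes the function can read back can be sketched without any third-party packages. This is not the demo's PyTorch code: resizing and cropping are left out, the channel statistics are the commonly used ImageNet values (the talk does not say which ones it uses), and the names preprocess_pixels and decode_floats are hypothetical.

```python
from array import array

# Commonly used ImageNet channel statistics (an assumption, see above).
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def preprocess_pixels(pixels):
    """Normalize (r, g, b) tuples in 0..255 and pack them as float32 bytes.

    The output is channel-major: all reds, then all greens, then all blues.
    """
    values = array("f")
    for channel in range(3):
        for pixel in pixels:
            scaled = pixel[channel] / 255.0
            values.append((scaled - MEAN[channel]) / STD[channel])
    return values.tobytes()


def decode_floats(payload: bytes):
    """What the receiving side would do: read the bytes back as float32."""
    values = array("f")
    values.frombytes(payload)
    return list(values)


payload = preprocess_pixels([(255, 0, 0), (0, 255, 0)])
print(len(payload))  # 2 pixels * 3 channels * 4 bytes = 24
```

Because the function only ever sees bytes, the user script and the function have to agree on this layout, whichever side ends up doing the preprocessing.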
So we create a Cape context object here, and then we create a handle for the function and the token. So here we're actually preprocessing the image. You can see here we're loading a dog JPEG file, and cape.run is where we actually run the model in the background. So we're passing the function and token and then the input bytes, which is the preprocessed image file. So the top five classes are returned, and then we load the JSON and then we print them in a nice way. So this print function will end up printing the percentage of each class.
Okay, before we run it, let's quickly see what image we're putting into the model. All right, so there's a dog, a pretty cute dog. Pretty sure that's a golden retriever puppy. So let's see what the model thinks. So we just need to run python run_prediction.py. So this is doing much of the same stuff as deploy. It's setting up the WebSocket connection, attesting, and then encrypting the image. Okay, now it's done. So we'll see the confidence, as a percentage, of each class that it thinks the image we just put in could be. So you can see the winner here is golden retriever at 39.7%. Definitely not a tennis ball or a clumber, whatever that is.
Next up, I'll show you a demo of our encrypt functionality. This can be used standalone or we can use it to call a Cape function. When calling a Cape function with encrypted input, no changes are needed for the function. Let's look at a simple example first. So we can take a look at this encrypt.py file and we can see how simple it is. So first we import Cape, we create the Cape context, and we just encrypt this simple string, hello world. By using cape.encrypt we get a ciphertext back, which is a byte string, and then we just print out encrypted and then decode the ciphertext to a string so we can look at it nicely. Okay, so now we can run this and we can see the encrypted string. It's a base64 encoded string that starts with a prefix, cape.
This is helpful to track where the string came from. And also Cape can detect the string when it is passed into the Cape function to automatically decrypt it inside the enclave.
Now we can see how the image classifier model would work with the encrypted input. Here we can see that it's mostly the same except for this one encrypt line. If we scroll down here we can see it. So we're just calling cape.encrypt with the input bytes and assigning it to the input bytes again. And just as before, we pass it into the cape.run function. We quit out of here and then we can run it. We should get the exact same output as before, even though the input was encrypted. There we go, there's the results. Golden retriever again.
I'm going to show one last example here of how to encrypt for someone else. So say there's a service that you trust to decrypt your data, then you can encrypt it for that service specifically. So let's take a look. So I created a simple script here that takes one argument, which is the username of the user who you want to encrypt for. So just like before we call cape.encrypt, and we have a message that says hi and then the name, and the username as sys.argv, and then we just decode it to text again. And then let's see, we can do capedocs, which is just our Cape user, and we can see the encrypted string there. It looks just like before.
All right, that's it for demos. We're going to go back to the slideshow for one second. Here are some links to the documentation and our Discord invite link. Please check them out. All right, thanks everyone for listening. Hope I inspired you to give Cape a try and see how this could be helpful in your day to day activities. As mentioned before, we're looking for feedback and are always ready to help out if needed, so please reach out. Bye everyone.
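As a recap of the function side of the demo, here is a minimal sketch of the bytes-in, string-out shape that the talk describes for cape_handler, together with a softmax-based top five. It is not the demo's app.py: the ONNX model call is replaced by a stand-in that treats the input as raw float32 scores, and CLASSES, softmax and get_top5_classes are hypothetical names and data chosen for illustration.

```python
import json
import math
import struct

# Hypothetical labels; the demo reads about a thousand from a text file.
CLASSES = [
    "golden retriever", "tennis ball", "clumber",
    "Labrador retriever", "cocker spaniel", "kuvasz",
]


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def get_top5_classes(scores):
    """Return the five most likely (label, probability) pairs."""
    probs = softmax(scores)
    ranked = sorted(zip(CLASSES, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:5]


def cape_handler(arg: bytes) -> str:
    # Stand-in for the model: read the input as little-endian float32 scores.
    count = len(arg) // 4
    scores = struct.unpack(f"<{count}f", arg[: count * 4])
    top5 = get_top5_classes(scores)
    return json.dumps([{"class": name, "probability": p} for name, p in top5])


example_input = struct.pack("<6f", 3.0, 1.0, 0.5, 2.0, 0.0, -1.0)
print(cape_handler(example_input))
```

Since the handler only deals in bytes and strings, the caller is responsible for serializing the input and parsing the JSON that comes back.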
...

Justin Patriquin

Software Engineer @ Cape Privacy



