Conf42 Chaos Engineering 2022 - Online

Chaos Engineering: At the age of AI and ML


Abstract

AI is omnipresent in our everyday lives; embedded AI is embraced across almost every type of business fabric. At this juncture, we are setting our course on the value frontier by reimagining how we operate and putting AI at the heart of everything we do. At the same time, adversarial machine learning evasion and poisoning attacks are introducing new complexities and challenges into model testing strategies. As chaos engineering has become a no-brainer testing approach, adversarial ML techniques raise a unique question: does ML model testing still benefit from chaos engineering, and how do we adapt chaos engineering strategies? This session will guide you through building an approach to chaos engineering in the age of AI and ML.

Summary

  • Chaos engineering gives real-time feedback on the behavior of your distributed systems. Embedded AI is embraced in almost all types of business fabric. How do we adapt chaos engineering strategies within the ML lifecycle? This session will guide you through building an approach to chaos engineering in the age of AI.
  • Everything fails all the time. Machine learning introduces its own complexity: a model that performs and predicts well can still be undermined by adversarial input, with significant business impact. The agenda also covers AWS SageMaker model debugging and monitoring, and chaos engineering as a continuum: product thinking versus every time, everywhere.
  • There are four key categories where we tend to use machine learning more than anything else. Use ML when you can't code it. Use ML when you can't scale it, replacing repetitive tasks that need human-like expertise. Use ML where you need to adapt or personalize. The fourth category is where you can't track it, like automated driving. But adversarial input can also be fatal.
  • AI and ML systems are actually very complex. It's not because of a complex algorithm; it is because the workflow has many steps. Two key stages are model evaluation and model test. Unlike traditional programs, ML systems are not deterministic; they are stochastic and probabilistic.
  • Adversarial machine learning is a machine learning method that crafts inputs to trick machine learning models. Three kinds of attacks are predominantly observed: evasion, poisoning and model extraction. Nathalie will take you through how to detect adversarial samples. She will also show how to handle adversarial inputs within model management and the model lifecycle.
  • Sometimes adversarial samples are very sophisticated and difficult to distinguish from normal samples. Adversarial samples can have devastating business impact. To detect adversarial samples, we need to use the representations produced by the deeper layers of a deep neural network.
  • Amazon SageMaker is a fully managed machine learning service that you can use to more easily build, train and deploy machine learning models. The main features used here are SageMaker Debugger and SageMaker Model Monitor, which show how you can detect adversarial inputs.
  • System design for detecting adversarial inputs using SageMaker Model Monitor and Debugger: every hour, a monitoring job compares the layer representations captured during inference against those recorded during training. You can also use Amazon SageMaker Studio to get further insights.
  • We need to build systems that embrace failure as a natural occurrence. Let's build confident ML systems that withstand turbulent conditions and adversarial inputs every time they run. And I would like to thank you all for your time today.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Chaos engineering gives you real-time feedback into the behavior of your distributed systems, and observing changes, exceptions and errors in real time allows you to not only experiment with confidence, but respond instantly to get things working again. Hello friends. Thank you everybody for taking the time to join our session today. It's me, Soumen Chatterjee, partner solutions architect at AWS, and my colleague Nathalie, senior applied scientist at AWS, taking you through chaos engineering in the age of AI and ML. We know that AI is omnipresent in our everyday life. Embedded AI is embraced in almost all types of business fabric. AI is at the heart of everything we do. Chaos engineering, on the other hand, has become a no-brainer testing approach. Adversarial ML techniques and attack spectrums raise a unique question: if ML model testing still benefits from a chaos engineering approach, how do we adapt chaos engineering strategies within the ML lifecycle? This session will guide you through building an approach to chaos engineering in the age of AI. Let's quickly take you through the key topics on our agenda for the session today: everything fails all the time; introducing machine learning and its complexity; how a model that performs and predicts well may still fail on adversarial input, introducing a significant business impact; AWS SageMaker model debugging and monitoring; and chaos engineering as a continuum, product thinking versus every time, everywhere. Can you guess what will happen here? There is a man walking, who then decides to go over the fence. Let's see what happens. Still time to guess. See? Yes. That is very, very familiar, isn't it? In our everyday life, when we write our programs and build our models, we run into very similar situations. Sometimes it is very unpredictable. One simple thing can shut down your whole system, or it can shut down your program, or it can produce a completely incorrect prediction. So everything fails all the time. That's really a great quote from the CTO of Amazon.com. What about machine learning? Can it fail too? Let's have a quick review of what ML does. So what is machine learning? We all know what machine learning is in a simple way: using technology to discover trends and patterns, using complex mathematical computation to build predictive models based on factual past data. So past data, statistics and probability theory are the key tools used to build machine learning models and make predictions. Where traditional business analytics aims at answering questions about past events, machine learning aims at answering questions about the possibilities or probabilities of future events. So when should you use machine learning? There are four key categories, I would say, where we tend to use machine learning more than anything else. Category one: use ML when you can't code it, for complex tasks where deterministic solutions don't suffice, like recognizing speech or images. Category two: use ML when you can't scale it, replacing repetitive tasks needing human-like expertise; examples would be recommendations, spam and fraud detection, and machine translation. Category three is where you need to adapt or personalize, like a recommendation engine or personalization engine. And the fourth category is where you can't track it, like automated driving. And imagine, all these categories, as you know, are very much dependent on data; your model will only be as good as your data.
So in category one, for example, you are in a manufacturing unit or a car manufacturing company, or it could be a production lab for fast-moving goods or a food supply company, and they are heavily dependent on machine learning, especially computer vision and images, for production quality detection or finding faults in the system. And guess what? If a program manipulates your model through a different set of data inputs, that will change the prediction or the behavior of your model, and that could be fatal. Sometimes that could be fatal, right? If it is a manufacturing company and you are not able to detect your production issues through the images, that could be a fatal thing. These kinds of things we call data drift: if someone manages to drift your data to a different segment, that can change the behavior of your model. Similarly, in category two, a lot of content that should be caught as spam can be made to look like legitimate, good content and land in your normal folders, like email spam, right? And you and we are all facing that every day. You will see a lot of content bypassing the filters and coming to our main folder that would be a good candidate for spam. Now imagine the fourth category, where you are building automated driving. A lot of things, I would say, if not everything, are AI and ML dependent, right? If your model is not able to detect the right speed limit from the speed signs outside, or the stop signs, or is not able to detect the objects on the way, that could be another fatal example. It could be life-threatening for the users of that particular automated car. So these are scenarios where machine learning has great examples and has been adopted significantly across the industry, but at the same time adversarial input, I would say, that threat vector, is introducing many different kinds of challenges for your industry and how that impacts your business. And that's one of the key things we want to introduce today. A little later, Nathalie will take you through some great examples of adversarial input data, how that impacts your machine learning and its prediction quality, and how that disturbs the model prediction. So AI and ML systems are actually very complex. It's not because the algorithm is complex; it is because the process is iterative and has many steps, especially stage one itself, the preparation stage, where you collect and prepare training data. And one of the key criteria for this data, for a very successful, accurate, or I would say high-performing model, is that you need to collect representative data or samples. And that's not an easy stage, actually. The build is relatively easy in terms of the model, which we call the algorithm, right? That's the heart of your code, but it's not the heart of the whole thing. The model or algorithm is just one part of a successful ML ecosystem. Data, how you train and tune, how you manage your versions, how you train and debug, are the key parts of any successful model delivery, and then how you manage, scale and monitor its predictions and accuracy is another complex aspect of a successful model. So, traditional software and program testing. Traditional programs are deterministic, as you all know, right? They are based on a fixed set of heuristic rules.
Generally, testing traditional software includes unit testing, regression testing and integration testing. But in the world of ML, ML systems are not heuristic; they are stochastic and probabilistic. What that means is that, starting from left to right, every stage, like pre-train, post-train, integrate, passes a flow of data through and produces the model as its output. And at every stage, every different data set or data flow changes and refines the final model. So the model learns from the data provided and used for training, all the time. Now, coming to the context of chaos engineering: in chaos engineering we follow all the usual steps, right? But there are two key stages worth noting that are quite different between traditional software and machine learning. There is a state called steady state, right? But in an ML system there is no such steady state, because models are not steady. If you think that your model is a static model and it is functioning at, maybe, 98% accuracy and everybody is happy, it may not continue to function like that. Similarly, in the verify stage, we inspect metrics and plots summarizing model performance, rather than verifying that certain tests passed or failed. So it is quite different, especially these two stages, compared to other software. Moving to the next one: model evaluation and model test. ML systems, unlike traditional programs, do not produce a report of specific behaviors and metrics, such as how many tests passed or failed or whether there is 100% code coverage, and so on. Instead, what do we do here? We perform model evaluation as performance analysis, whereas a model test is an approach to error analysis; developing a model test for ML systems can offer a systematic approach towards error analysis. For a machine learning model, we inspect metrics and plots summarizing the model's performance over evaluation data sets. So now we are entering the adversarial machine learning stage, and it can get more complicated, obviously, when adversarial input gets introduced. ML models are vulnerable to such inputs. Adversarial machine learning is a machine learning method that crafts inputs to trick machine learning models and strategically alter the model output, and there are three kinds of attacks predominantly observed: evasion, poisoning and model extraction. The types of detection methods are individual input samples and distribution shifts. So, chaos in practice: watch this for another few seconds and you will really enjoy it. Chaos engineering is the discipline of experimenting on a distributed system in order to build confidence in the system's capability to withstand turbulent conditions in production. Break your systems on purpose, find their weaknesses and fix them before they break when least expected. So I'm going to hand over to my colleague Nathalie now. She will take you through how to detect adversarial samples. She has prepared a few fantastic examples from our lab. She will also take you through the AWS SageMaker tools for model monitoring and debugging, and how to handle adversarial input in model management and the model lifecycle. Over to you, Nathalie. Thank you. Let's start with an example first. Quite often, adversarial samples are very sophisticated and difficult to distinguish from normal samples.
On this slide here, I'm showing you an image that I have taken from the Caltech 101 test data set. We can see that the model correctly predicts the image class starfish. Next, I use the same image and apply an attack on it. I'm using the attack technique PGD, which stands for projected gradient descent. This technique uses an epsilon parameter to define the amount of noise that is added to the input. The higher the amount of noise, the more likely the attack is going to be successful. What we see now is that we can barely distinguish the original test image on the left from the adversarial sample on the right, but the model no longer predicts the correct image class. Next, I increase the epsilon parameter. Again, the model does not predict the correct image class, and we cannot see much of a difference between the images. When I increase the epsilon parameter even further, to 0.5, we finally see some artifacts that have been introduced into the input image. Adversarial samples can have devastating business impact. Imagine, for instance, you have an autonomous driving application, and as part of this application you have a traffic sign classification model. The image that I'm showing here is taken from the German traffic sign data set. We see on the left side the original input image, which shows a speed limit sign of 80 km/h. Again, in the image in the middle we can barely see a difference, but the model can no longer correctly predict it: it now predicts the traffic sign as a stop sign. And when we compute the difference between the two, which is indicated here on the right, we see some difference in the inputs. So now, how can we detect these adversarial samples? Let's take a look at the model input distributions. To do that, we use t-SNE. t-SNE stands for t-distributed stochastic neighbor embedding. It's a technique that allows you to take highly dimensional data and map it into a two- or three-dimensional space. For the image that I'm showing here, I have taken a set of test images, indicated as orange data points, and then applied the PGD attack on them. Those samples are the adversarial samples, indicated as blue data points. Then, for each of these images, I compute the t-SNE embedding and visualize them all in this two-dimensional space. So each data point that you see here in the image represents the embedding for an input image. What we see is that there is no difference between the orange and the blue data points. That means that if we now used a technique to distinguish adversarial and normal inputs in the input space, that is, on the images themselves, you would not be able to distinguish them, because the distributions look very similar. When you look at deep neural networks, these are models that consist of multiple layers. Each layer learns different kinds of features of the inputs, and they create different representations. So the same analysis that I have shown you on the previous slide, I now repeat for the different representations produced by different layers in the model. Layer zero corresponds to the input layer, so that is basically the input images, and again we don't see much of a difference between adversarial and normal samples. Next, I take the activation outputs of layer four and repeat the analysis, and again there is not much of a difference. In layer eight we now see that normal and adversarial samples cluster slightly differently, and the same we observe in layer twelve.
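To make the PGD example above concrete, here is a minimal sketch of how such adversarial samples can be crafted. It is only an illustration, assuming a PyTorch image classifier and using the open-source Adversarial Robustness Toolbox (ART); the model, class count, epsilon values and placeholder data are assumptions, not the exact setup used in the talk.

```python
# Sketch: crafting PGD adversarial samples with ART against a PyTorch classifier.
# Model, class count and data are illustrative placeholders.
import numpy as np
import torch
import torchvision
from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import ProjectedGradientDescent

model = torchvision.models.resnet18(num_classes=101)  # e.g. Caltech 101 classes
model.eval()

classifier = PyTorchClassifier(
    model=model,
    loss=torch.nn.CrossEntropyLoss(),
    input_shape=(3, 224, 224),
    nb_classes=101,
    clip_values=(0.0, 1.0),
)

# x_test: normal test images as floats in [0, 1], shape (N, 3, 224, 224)
x_test = np.random.rand(8, 3, 224, 224).astype(np.float32)  # placeholder data

# The eps parameter controls how much noise the attack may add; larger eps makes
# the attack more likely to succeed but also more visible in the image.
for eps in (0.01, 0.1, 0.5):
    attack = ProjectedGradientDescent(estimator=classifier, eps=eps,
                                      eps_step=eps / 10, max_iter=40)
    x_adv = attack.generate(x=x_test)
    preds_clean = classifier.predict(x_test).argmax(axis=1)
    preds_adv = classifier.predict(x_adv).argmax(axis=1)
    flipped = (preds_clean != preds_adv).mean()
    print(f"eps={eps}: {flipped:.0%} of predictions changed by the attack")
```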
When we go to layer 14 we see an even larger difference, and in layer 15 we see a clear distinction between the adversarial and the normal samples. I have done this analysis on a ResNet-18 model, so layer 15 was the penultimate layer in my model; the penultimate layer is the layer before the classification is done. And when we create adversarial samples, the goal is to change the model prediction, which means that before the outputs go into the classification layer, they have to form different representations in order to lead to a different classification. What we also observe is that in the initial layers we cannot distinguish well between adversarial and normal samples, because the initial layers learn mainly basic features of the inputs, while the deeper layers of the model learn more complex patterns of the input data. So this analysis shows us that if we want to detect adversarial samples, we need to use the representations produced by the deeper layers of a deep neural network. What we can do now is apply a statistical test to distinguish between these distributions. We can use a two-sample test using MMD, which stands for maximum mean discrepancy. MMD is a kernel-based metric that allows you to measure the similarity between two distributions. The distributions we are going to compare are the layer representations captured from the intermediate layers of the deep neural network. We capture them during the validation phase of training, because these representations represent the normal samples, and then during inference we capture the same layer representations and check whether the inference data matches the data seen during training. Now I would like to show you how you can detect adversarial inputs using Amazon SageMaker Model Monitor and Debugger. Basically, the analysis that I have shown on the previous slide we are now going to deploy on Amazon SageMaker and run in production. First, let me give a brief overview of what Amazon SageMaker is. SageMaker is our fully managed machine learning service that you can use to more easily build, train and deploy machine learning models. When you think about machine learning, it's not just about creating and training a model; there are many different steps involved. For instance, you need to create the training data set, build and train models, perform hyperparameter tuning, then maybe compile the model for faster inference, and then deploy the model to the cloud or to the edge. Amazon SageMaker provides features for each step in this machine learning lifecycle. As part of the workflow that I'm going to show in a few slides, the main features I'm going to use are SageMaker Debugger and SageMaker Model Monitor. Let's take a brief look at what SageMaker Model Monitor is. Model Monitor is a feature of Amazon SageMaker that allows you to detect data drift. Once you have trained a model on Amazon SageMaker, you can deploy it as an endpoint. Now, when users interact with your endpoint, Model Monitor will automatically capture requests and predictions and upload them to an Amazon S3 bucket. Model Monitor will also create a baseline processing job: it takes the training data and computes some statistics on it. So, for instance, assume you have a tabular data set.
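As a rough illustration of the MMD two-sample test described above, the following NumPy sketch compares two sets of layer representations using an RBF kernel and a simple permutation test. The representation arrays are random placeholders standing in for the activations captured from the network; this is not the production detector from the talk.

```python
# Sketch: two-sample test on layer representations using MMD with an RBF kernel.
import numpy as np

def rbf_kernel(x, y, gamma):
    # Pairwise squared distances between rows of x and y, mapped through an RBF
    d2 = np.sum(x**2, 1)[:, None] + np.sum(y**2, 1)[None, :] - 2.0 * x @ y.T
    return np.exp(-gamma * d2)

def mmd2(x, y, gamma=None):
    """Biased estimate of squared MMD between samples x and y."""
    if gamma is None:
        # Median heuristic for the kernel bandwidth
        z = np.vstack([x, y])
        d2 = np.sum(z**2, 1)[:, None] + np.sum(z**2, 1)[None, :] - 2.0 * z @ z.T
        gamma = 1.0 / np.median(d2[d2 > 0])
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

# train_repr: penultimate-layer activations recorded during training/validation
# infer_repr: the same layer's activations captured during inference
train_repr = np.random.randn(200, 512)          # placeholder representations
infer_repr = np.random.randn(200, 512) + 0.5    # shifted, mimicking adversarial drift

stat = mmd2(train_repr, infer_repr)

# Permutation test: shuffle the pooled samples to estimate the null distribution
pooled = np.vstack([train_repr, infer_repr])
rng = np.random.default_rng(0)
null = []
for _ in range(200):
    idx = rng.permutation(len(pooled))
    null.append(mmd2(pooled[idx[:200]], pooled[idx[200:]]))
p_value = (np.sum(np.array(null) >= stat) + 1) / (len(null) + 1)
print(f"MMD^2 = {stat:.4f}, p-value = {p_value:.3f}")
```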
This baseline processing job will check the different columns in your tabular data set, such as the minimum, maximum and average values, and then in the deployment phase you can specify a scheduled monitoring job that runs once an hour or once a day. You can specify the monitoring interval, and this job will take the requests and predictions and compare them against the baseline. As an example, let's assume that in your tabular data set column one was always between zero and ten during training, and now during inference it has values from -100 to +100. Model Monitor would automatically detect this problem, record these violations and statistics in an output file that is uploaded to Amazon S3, and also publish some metrics to Amazon CloudWatch. Let's take a brief look at what SageMaker Debugger is. SageMaker Debugger is a feature that provides utilities to record and load tensors from your model training. It's typically used for training, but you can also use it for inference. It comes with an API called smdebug: an open-source, framework-agnostic and concise API to record and load tensors. It supports the major machine learning frameworks and also provides the concept of rules to automatically detect issues, which is very useful during training. You can also customize and extend SageMaker Debugger. If you use Debugger as part of Amazon SageMaker, you can use built-in rules, offload the rule analysis to separate instances, and specify rule actions and notifications. As part of the workflow to detect adversarial inputs, I mainly use the smdebug API to capture the layer representations. Let's take a brief look at the smdebug API. With just a few lines of code, you can enable Debugger to capture certain layers from your model. As shown on the previous slides, I was analyzing t-SNE embeddings computed on the different representations produced by the deep neural network; I use the activation outputs. So what I do is specify a Debugger hook configuration, where I give a regular expression for all the tensors that I want collected, and the output path where this data should be uploaded. So with just a few lines of code, I can enable Debugger and capture this data. And once you have captured the data, you can easily access it using the same API: you specify where the data has been recorded, and with the object that is created, the trial object, you can iterate over all the inference requests that have been recorded, access the tensors, and run a computation on them. So now I would like to show you the system design for detecting adversarial inputs using SageMaker Model Monitor and Debugger. First, I train the model on Amazon SageMaker and enable Debugger to capture the layer representations. These tensors are uploaded to Amazon S3. Once the model has been trained, I deploy it as an endpoint on Amazon SageMaker, and in the endpoint I also have Debugger enabled to capture the layer representations during inference. Now my users may interact with the model and send inference requests, and my model performs predictions. The layer representations as well as the model inputs are recorded in Amazon S3. Then I use a custom Model Monitor to run the two-sample test using MMD. I run this every hour.
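Here is a rough sketch of what enabling Debugger and reading tensors back through the smdebug trial object can look like. The S3 paths, regular expression, training script name, IAM role and instance settings are hypothetical placeholders, not the exact configuration used in this workflow.

```python
# Sketch: recording intermediate layer outputs with SageMaker Debugger and
# loading them back with the smdebug API. Paths and names are illustrative.
from sagemaker.pytorch import PyTorch
from sagemaker.debugger import DebuggerHookConfig, CollectionConfig
from smdebug.trials import create_trial

hook_config = DebuggerHookConfig(
    s3_output_path="s3://my-bucket/debugger-tensors",   # hypothetical bucket
    collection_configs=[
        CollectionConfig(
            name="layer_outputs",
            parameters={
                "include_regex": ".*relu_output|.*avgpool_output",  # assumed layer names
                "save_interval": "1",
            },
        )
    ],
)

estimator = PyTorch(
    entry_point="train.py",            # hypothetical training script
    role="MySageMakerRole",            # hypothetical IAM role
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="1.12",
    py_version="py38",
    debugger_hook_config=hook_config,
)
# estimator.fit({"train": "s3://my-bucket/train"})

# After the tensors have been captured, load them as a "trial" and iterate
trial = create_trial("s3://my-bucket/debugger-tensors")
for step in trial.steps():                                   # one step per recorded batch/request
    for name in trial.tensor_names(regex=".*avgpool_output.*"):
        activations = trial.tensor(name).value(step)         # numpy array of layer outputs
        # ...feed these representations into the MMD two-sample test sketched above
```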
So every hour it will take the layer representations that were recorded during training and compare them with the layer representations recorded during inference. It then runs the two-sample test, and if an issue is found, it records some metrics and writes an output file to Amazon S3 with the recorded violations. Basically, this custom Model Monitor will output which images were most likely adversarial, so as a user you can download this file from Amazon S3 and perform further investigations. You can also use Amazon SageMaker Studio to get further insights. Amazon SageMaker Studio is a machine learning IDE, and you can check, for instance, the execution of each of these model monitoring jobs. What we see here is that in the first hour the model monitoring job did not find any issue, and in the subsequent hour I sent adversarial samples against the endpoint, so an issue was detected by Model Monitor. The custom model monitoring container also publishes metrics to Amazon CloudWatch, such as the inference requests processed and a detection rate, indicated as the orange line here, that shows how many of the inference requests were detected as adversarial. This is computed every hour, so you can use it to determine whether an attacker was active in a specific time frame. What we see here is that the detection rate was roughly 100% from 5:00 a.m. to 6:00 a.m. and then started dropping, so around 7:00 a.m. the attacker was no longer active. SageMaker Debugger stores the tensors in Amazon S3, so as a further step you can, with just a few lines of code, use the smdebug API to do some additional analysis. You can create the trial object to access the tensors recorded during inference, iterate over each inference request, and visualize, for instance, the t-SNE embeddings for each of these tensors to see how the distribution of the representations during inference compares with the ones recorded during training. With that, I would like to conclude my session. Thank you, Nathalie, for taking us through those great examples of adversarial input and how to deal with it. We need to build systems that embrace failure as a natural occurrence; another fantastic example and quote I always refer to from our Amazon CTO. So, chaos engineering as a continuum: let's build confident ML systems that withstand turbulent conditions and adversarial inputs every time they run, not just in production or at any particular moment in time. And I would like to thank you all for your time today, your interest in our session, and for spending time with us. This is not the end: if you really want to know more or want to stay in touch, please feel free to reach out to us or connect with us through LinkedIn. Once again, thanks everybody.
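As a final illustration of the follow-up analysis mentioned above, the sketch below embeds training-time and inference-time layer representations with t-SNE and plots them together. The arrays are placeholders standing in for activations loaded, for example, via the smdebug trial object; the plot labels and parameters are assumptions for illustration only.

```python
# Sketch: comparing training-time and inference-time layer representations with t-SNE.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

train_repr = np.random.randn(300, 512)          # placeholder: activations from training/validation
infer_repr = np.random.randn(300, 512) + 0.5    # placeholder: activations captured during inference

# Fit t-SNE on the pooled representations so both sets land in the same 2-D space
pooled = np.vstack([train_repr, infer_repr])
embedding = TSNE(n_components=2, perplexity=30, init="pca",
                 random_state=0).fit_transform(pooled)

n = len(train_repr)
plt.scatter(embedding[:n, 0], embedding[:n, 1], s=8, c="tab:orange",
            label="training (normal)")
plt.scatter(embedding[n:, 0], embedding[n:, 1], s=8, c="tab:blue",
            label="inference (suspected adversarial)")
plt.legend()
plt.title("t-SNE of penultimate-layer representations")
plt.show()
```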
...

Soumen Chatterjee

Partner Solution Architecture @ AWS

Soumen Chatterjee's LinkedIn account

Nathalie Rauschmayr

Senior Applied Scientist @ AWS

Nathalie Rauschmayr's LinkedIn account


