Conf42 JavaScript 2021 - Online

Developing Spidey Senses: Anomaly detection for JavaScript apps

Abstract

Anomaly detection is the process of identifying unexpected items or events in data sets. It’s about detecting deviations from the expected pattern of a dataset. It’s like having “spidey senses” for your apps that can detect when there’s danger or something is not right. Attend this session to learn about anomaly detection in JavaScript with the Cognitive Services API, become a superhero, and save the day.

Summary

  • Anomaly detection for JavaScript applications. What is this spidey sense? Most likely you've heard about Spider-Man. It's that tingling sensation on the back of Peter Parker's skull. If you're a new web developer and JavaScript developer, we're here to help you understand.
  • What is anomaly detection? It's that gut feel, vibe, or intuition that you learn through time. We'll do anomaly detection specifically for time series. And then we'll do some demos and some takeaways.
  • Anomaly detection is identifying unexpected items and events which are different from what is normal. There are two kinds of causes of outliers: artificial (non-natural) or natural. Sometimes it could be rule-based systems, sometimes it's statistical techniques, and sometimes you would use machine learning.
  • The Internet of Things has a lot of time series data, because whatever data you collect from sensors comes from a specific time. There are different time series anomaly types: outlier, spike and level shift, pattern change, and seasonality. These systems are becoming more and more critical day to day.
  • Azure Cognitive Services is AI for every developer, without the need for machine learning expertise. Today we're focusing on the decision capability, and there's this Anomaly Detector, which identifies potential problems early on.
  • In order to call the Anomaly Detector API, you need to use the Anomaly Detector client. Before I can use Anomaly Detector, I need to create an instance of it through the Azure CLI. And what I want to do is force the last data point to be anomalous in this case.
  • It has C#, JavaScript, and Python SDK clients. There are Docker containers. You can integrate it with Power BI, or with Azure Databricks if you want streaming data. There is also another Azure Cognitive Service called Metrics Advisor. It can diagnose anomalies and help with root cause analysis.
  • Ron Dagdag: The best superpower that you can give to your project is anomaly detection. Anomaly Detector is an API to detect anomalies that automatically adapts and learns from new data sets without needing training data. Feel free to test out your new superpowers that you just learned today.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Good morning, good afternoon, good evening, wherever you are in our virtual world. My name is Ron Dagdag, and I will be talking about developing spidey senses: anomaly detection for JavaScript applications. Let's get started. So what is this spidey sense? Most likely you've heard about Spider-Man. It's that tingling sensation on the back of Peter Parker's skull that gives him the ability to sense or react to danger. It increases his ability to figure out and detect clones, to navigate if he is impaired and can't see anything, and to find secret passageways and different hidden and lost objects. It actually helps him fire his web shooters and swing instinctively. And I think it helps him change into his costume. And the really amazing part is the real spidey sense that spiders have. It's called hyper awareness. It's these long, thin hairs, called trichobothria, that actually allow them to detect low-level vibrations, even from sound. And that's the interesting part of it: they can even detect insects up to 3 meters away because of that. And every time I see spiders with all these different hairs, it feels different. It gives me the heebie-jeebies. Yes. And then of course, if you're a new web developer and JavaScript developer and you're still starting out, trying to cast your web out there in the World Wide Web and there's nothing coming out, it's okay. We're here to help you understand. What is this anomaly detection? Okay, what is this spidey sense? It's that gut feel, that vibe or intuition that you learn through time. Right? You learn from the past. As a developer, being a developer for more than 20 years now, you get that sense or feeling about a project, whether it can become successful or not. I guess you learn it through time and you learn it through the different experiences from the past. And you kind of build that intuition about what the technology can deliver in terms of requirements and those things. 
So today we'll be talking about what anomaly detection is and what time series data is. We'll do anomaly detection specifically for time series. And then we'll do some demos and some takeaways. Okay, let's get started. What is anomaly detection? It is identifying unexpected items and events which are different from what is normal. It's so weird, like the pandemic, right? It's not something that we're used to. I guess we're getting used to it by now, so it's becoming the new normal, right? Sometimes an anomaly is called an outlier. The assumptions are that anomalies rarely occur in your data set and that their features differ significantly from the normal instances. There are two kinds of causes of outliers: artificial (non-natural) or natural. One cause could be data entry errors. Think about 100,000 versus 1 million. That extra zero makes a whole lot of difference, and that is an outlier. Measurement errors, which are very common. Experimental errors, when you start late in the sprint: you start collecting late, even though you're supposed to collect the whole data set at certain intervals. Intentional outliers: one good example is if you ask high school or college students about their consumption of alcohol, most likely they will underreport it, for whatever reason. So it depends on how you collect your data. Data processing errors happen where you extract from one service to put it in another service, or pass one data set to another data set. Sometimes you may encounter extraction errors that cause outliers. Sampling errors: in this case, you're trying to report the height of all the athletes, but most of your data set is basketball players, so your data gets skewed and that may cause some outliers. And of course there are natural outliers, when it is not artificial and wasn't caused by data processing or data collection. So at the end of the day, you have an input data stream. 
This area right here is what you're trying to detect. These are good data. And of course you have some defective data right there, and you're trying to analyze whether this data is anomalous or not. Most of the time it's good data. But sometimes you encounter data that is defective and you haven't figured out that it's defective, so it goes to this column. Hopefully, most of the time you'll be able to detect the things that are defective. And sometimes the data is not really defective, but you still catch it here. So it just depends on how you implement this anomaly detector. Sometimes finding anomalies in a data set is kind of like finding a needle in a haystack. If the needle is that big, yeah, that is easy. Most of the time it's not that big. So what are the different methods for doing anomaly detection? Sometimes it could be rule-based systems, sometimes it's statistical techniques, and sometimes you would use machine learning. We'll go through each one of these. Rule-based systems: most likely you already know this. It's where you specify specific rules and assign thresholds or limits. For example, at a certain temperature level, you want to send an alert when it reaches a certain threshold, or send an alert if it goes below a certain value. The advantage and disadvantage of it, in a way, is that it requires an experienced industry expert to detect known anomalies. So you have to interview people and ask, what do you think are the possible problems that cause this type of issue? And if the value goes past a certain threshold, then we can alert on certain conditions. Right. The disadvantage of rule-based systems is that they do not adapt as patterns change. Once you set up the formula and the rules, the system will not adapt, because you have to go in and modify that logic again. And of course, it does require data labeling and knowing: 
"Okay, this data set is anomalous; this data set is not anomalous." For statistical techniques, it's where you flag the data points that deviate from common statistical properties. This is where you calculate the mean, the median, or quantiles, or in some other cases rolling averages or moving averages. Most likely, if you're buying stocks, or buying and selling securities, it gives you the moving averages. You use a lot of statistical techniques to identify if something is out of the ordinary, and of course where the trend is going. Right. You can also have simple moving averages, which are sometimes called low-pass filters. One good example of that is Kalman filters. There's a formula specifically for that. Sometimes it's histogram-based outlier detection that can be implemented. The advantage of statistical techniques is that they're more interpretable, and sometimes more useful, than machine learning methods. It's easy to explain to someone, to one of the bosses: this is the formula I use, I use the mean and the median, and this is how we detect anomalies. For machine learning methods, you can do anomaly detection as supervised, unsupervised, or self-supervised. Supervised is more like decision trees. Unsupervised is where we talk about k-means and hierarchical clustering. Self-supervised is when we start talking about autoencoders. We're not going to cover the formulas for each one. I'm just showing you the different machine learning methods here. But we want to know: when do we use anomaly detection versus supervised learning? You would use anomaly detection techniques when you have a very small number of positive examples and a very large number of negative examples. For supervised learning, most likely you have a large number of both positive and negative examples, and you have enough positive examples for the algorithm to learn. 
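The statistical approach described above (a moving average plus a measure of spread) can be sketched in a few lines of JavaScript. This is a minimal illustration, not code from the talk: the function name `rollingZScoreAnomalies`, the window size, the threshold of 3 standard deviations, and the sample readings are all assumptions chosen for the example.

```javascript
// Flags the indexes of points that sit more than `threshold` standard
// deviations away from the mean of the preceding `windowSize` points.
// All names and defaults here are illustrative assumptions.
function rollingZScoreAnomalies(values, windowSize = 5, threshold = 3) {
  const flagged = [];
  for (let i = windowSize; i < values.length; i++) {
    const window = values.slice(i - windowSize, i);
    const mean = window.reduce((sum, v) => sum + v, 0) / windowSize;
    const variance =
      window.reduce((sum, v) => sum + (v - mean) ** 2, 0) / windowSize;
    const std = Math.sqrt(variance);
    // A perfectly flat window has zero spread, so any change from it deviates.
    const deviates =
      std === 0
        ? values[i] !== mean
        : Math.abs(values[i] - mean) / std > threshold;
    if (deviates) flagged.push(i);
  }
  return flagged;
}

// Made-up temperature readings: steady values with one spike at index 7.
const readings = [34.1, 34.3, 34.2, 34.4, 34.2, 34.3, 34.1, 134.0, 34.2, 34.3];
console.log(rollingZScoreAnomalies(readings)); // [ 7 ]
```

Because the rule is just "mean and standard deviation over the last few points", it is easy to explain, which is the interpretability advantage mentioned above. The trade-off is that a spike inflates the window it later falls into, so nearby points are judged against a distorted baseline.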
For the anomaly detection type, sometimes it's hard to learn from positive examples, compared to supervised learning. And sometimes the anomalies have not been discovered yet. So as much as possible you want to use anomaly detection techniques for this rather than supervised learning, because with supervised learning, future positive examples may not be similar to your training set, and the model might not know how to detect them. So when would you use anomaly detection techniques? Fraud detection. Manufacturing: engines or machinery have a certain routine, or a machine just goes through its cycles, and if it's out of that cycle, that's when you know it's anomalous. Monitoring data centers would be a good use case for anomaly detection, and so is the Internet of Things, which I will explain a little bit more. For supervised learning: email spam classification. Why is that? Because there are a lot of good examples of what is spam and what is not spam. You're detecting a certain type of email, and you can use those examples for supervised learning. Weather prediction: there are specific criteria for identifying weather, so that would be a good example for supervised learning. And cancer classification: because an expert already knows what they're looking for, specific cancer cells and those kinds of things, it might make sense to use supervised learning rather than anomaly detection. For machine learning, sometimes it could be density-based anomaly detection, where you cluster the data set based on the nearest neighbors. The normal data points occur in a dense neighborhood, meaning they're closer to each other. Anything outside of it is anomalous, because it's not close to the center of your data set. 
Clustering-based detection assumes that data points that are similar tend to belong to clusters, measured from the local centroid, and anything farther away can be detected as anomalous. You can also use a Gaussian distribution, where you calculate, for any given data point, the probability of that data point being normal. These are all the normal points in terms of the Gaussian distribution right here. Anything very far outside of that would be considered anomalous, or an outlier. Support vector machine based anomaly detection is also a good formula. At the end of the day, it tries to split your data in two. On this side is your normal data; anything outside of that line is most likely anomalous. Those are the different anomaly detection techniques. All right, so let's try to do a simple anomaly detection, and we're going to focus on JavaScript. Let me pull in my data set. Right now I'm using this Jupyter notebook, and under the Jupyter notebook I'm running a TypeScript application. I think this is more of a JavaScript application right here. The reason I'm showing it this way is so I can execute line by line and show you the results. In this case I'm using stats-analysis. It's an NPM package. I have this array of numbers right here, and what I would like to do is filter out the outliers and just keep the ones that are normal. In a way they cluster together, right? They're kind of close to each other, and these are so far away from the rest of the data set that they are considered outliers. So I run it this way too, and it gives you the results. The result here is that all the outliers are taken away, leaving just the good data. That's the simplest code I could find to start using outlier detection in a JavaScript application. Okay, let's go back to the presentation. 
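The notebook code from the demo is not reproduced in this transcript. As a rough stand-in for the "filter out the outliers, keep the normal values" step, here is a dependency-free sketch that uses the interquartile range (Tukey fences). It is not the NPM package's implementation, which may use a different method; the function names and the sample array are assumptions made for illustration.

```javascript
// Linearly interpolated quantile of an array that is already sorted ascending.
function quantile(sorted, q) {
  const pos = (sorted.length - 1) * q;
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower);
}

// Keeps only the values inside [Q1 - k * IQR, Q3 + k * IQR].
// k = 1.5 is the conventional Tukey fence; it is a tunable assumption.
function filterOutliersIqr(values, k = 1.5) {
  const sorted = [...values].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  const low = q1 - k * iqr;
  const high = q3 + k * iqr;
  return values.filter((v) => v >= low && v <= high);
}

// Made-up numbers: most cluster around 12 to 15, two are far away.
const numbers = [12, 14, 13, 15, 14, 13, 98, 12, 15, 14, -40];
console.log(filterOutliersIqr(numbers));
// [ 12, 14, 13, 15, 14, 13, 12, 15, 14 ]
```

The original order of the surviving values is preserved, which matters if the array is later treated as a time series.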
Okay, so let's talk about time series data. Time series data is a series of data points indexed in time order. Good examples are logs, stock market data, sales data, or sensor-related data. At the end of the day, what we're talking about here is any data captured with a timestamp. So you have your timestamp and then the value, timestamp and value, timestamp and value, right? And you can have multiple values, as long as they're indexed against time. This is very common: if you start looking at log files, you'll see they're all time series based. Of course, the Internet of Things has a lot of time series data, because whatever data you collect from sensors comes from a specific time, right? The Internet of Things is happening because you have increased data volume: you can start collecting data from these sensors, and the sensors are getting cheaper. And of course there is increased data speed, meaning the networking to collect this data and send it to the cloud to get processed is possible now. It's very important, because the data that you're collecting from these sensors is moving very fast. But failures happen, and these systems are becoming more and more critical day to day. Right. Tell me about it. Whenever our Google Home or Alexa device is down, we have trouble turning off the TV, and we have to find that remote control again. Little things here and there, but it's becoming critical in our household. So whenever it's the Internet of broken things, it feels something like this: trying to debug what actually happened on that data stream that you are receiving. There are different time series anomaly types. It could be an outlier, a spike and level shift, a pattern change, or seasonality. We'll go through each one of these. An outlier would look something like this, right? 
You have your data set, your time series data as you received it through time, and of course the values of each point. And then there is a spike here, an outlier, and this is out of the ordinary. So this is what you want to detect. It could be a spike and level shift. One good example: the data goes along at this level, and then suddenly it shifts up. What happened? Sometimes you want to detect this area right here, where you're detecting that spike. And of course, a level shift on its own is also possible. Notice how the data is flowing through like this, and now it's lower. Why did that level shift? Pattern changes look something like this. The way I kind of imagine it is you're watering your garden and there's a specific flow of water coming out of the hose. Suddenly someone steps on it, or there's a kink in the hose, and the water just slows down. You want to know when that happened, where it happened, those kinds of things. So you're trying to detect pattern changes because of that. And of course seasonality: you have to consider that too whenever you're detecting anomalies. If you think about it, certain times of the year have seasonality. Around summertime, of course, ice cream sales are higher compared to the winter months. Here in the United States, during football season or around the Super Bowl, pizza sales are higher compared to any other time. Everyone wants to watch their favorite game, those kinds of things. So you have to consider that as part of your data set and identify whether there's seasonality around it. What you're trying to do here in terms of time series is to detect these types of instances where something is out of the ordinary. So this is the pattern, and suddenly this data is outside of the pattern you would expect, and this one too. 
And through time you have the series, and based on this data set you identify whether the last point is an anomaly or not. It depends on how far off it is, and you have to specify a sensitivity: how sensitive you want to be in triggering an anomaly. Okay, so far what I've been talking about is called univariate, where you have one variable in a time series data set. But there's also the concept of multivariate, where you have different time series and you're trying to identify whether this one is out of the ordinary or that one is out of the ordinary. This is more complex to implement compared to univariate, so we're going to focus on univariate today. But I just want to let you know that sometimes, depending on what the needs are, you might need to implement a multivariate system. Okay. Azure Cognitive Services is AI for every developer, without the need for machine learning expertise. At the end of the day, what it is, is an API call. The different Azure Cognitive Services have different capabilities. Today we're focusing on the decision capability, and there's this Anomaly Detector right here, which identifies potential problems early on. So that's where it's more about decision making. We're going to focus on anomaly detection. Anomaly Detector can detect anomalies as they occur in real time, and it can also detect anomalies as a batch. So you have a choice when you pass your data to this API: do you want it in real time or as a batch? It automatically adapts and learns from new data sets, and you can fine-tune its sensitivity for detecting anomalies, so there are settings that you can adjust. These are REST APIs. It does not require machine learning expertise and it does not need labeled data. That's the crazy part about this: you don't need to send training data. You just call the API, send your data, and it will detect anomalies based on a time series data set. 
It automatically identifies and applies the best-fitting model for you in the background. It actually has this gallery of algorithms, and a lot of these I do not know how to implement. Sometimes it uses the Fourier transform, which is kind of like what you'd see on the computer vision side, extremes, all these different algorithms that it has implemented. But the interesting part of Anomaly Detector is that it classifies which type of algorithm it's going to use. If it figures out your data set has some seasonality in it, it uses the algorithms related to seasonality. If it has coarse granularity without seasonality, it uses a different set of algorithms. And it's doing this every time you call the API. So that's the interesting part: it's trying out different algorithms all at the same time, too. There are some limitations on how you use the Anomaly Detector API. The data granularity is either daily, hourly, minutely, monthly, weekly, or yearly. The series of data points that you pass in looks something like this: a JSON document with a series array, where each item has the timestamp and the value. The minimum is 12 items in this array and the maximum is 8,640, and you specify the granularity. The interesting part is if you want every five minutes, you have to specify a custom interval so that it knows that, hey, this is every five minutes. Okay, so there are two ways to call the Anomaly Detector API. It's either through a client SDK (C#, Python, or Node; today I'm going to demo the Node client SDK), or through the REST API, so it can support any language as long as you can make HTTP or REST calls. So let's start with our demo. This Jupyter notebook right here is actually running on one of my Raspberry Pis. 
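The request shape and limits described above (a series of timestamp and value pairs, a granularity, 12 to 8,640 points, and a custom interval for five-minute data) can be sketched as a small builder. The field names `series`, `timestamp`, `value`, `granularity`, and `customInterval` follow the Anomaly Detector v1.0 REST request as best recalled here and should be checked against the official reference; the helper name `buildDetectRequest` and the sample data are hypothetical.

```javascript
// Limits and granularities as stated in the talk.
const MIN_POINTS = 12;
const MAX_POINTS = 8640;
const GRANULARITIES = ["yearly", "monthly", "weekly", "daily", "hourly", "minutely"];

// Hypothetical helper: turns [{ timestamp, value }] readings into a request body.
function buildDetectRequest(points, granularity, customInterval) {
  if (!GRANULARITIES.includes(granularity)) {
    throw new RangeError(`Unsupported granularity: ${granularity}`);
  }
  if (points.length < MIN_POINTS || points.length > MAX_POINTS) {
    throw new RangeError(
      `Series must have ${MIN_POINTS} to ${MAX_POINTS} points, got ${points.length}`
    );
  }
  const body = {
    series: points.map((p) => ({
      timestamp: new Date(p.timestamp).toISOString(),
      value: p.value,
    })),
    granularity,
  };
  // For example, granularity "minutely" with customInterval 5 means every five minutes.
  if (customInterval !== undefined) body.customInterval = customInterval;
  return body;
}

// Made-up data: 12 readings taken every five minutes.
const start = Date.UTC(2021, 9, 1, 0, 0, 0);
const points = Array.from({ length: 12 }, (_, i) => ({
  timestamp: start + i * 5 * 60 * 1000,
  value: 34 + (i % 3) * 0.1,
}));
const body = buildDetectRequest(points, "minutely", 5);
console.log(body.series.length, body.granularity, body.customInterval); // 12 minutely 5
```

Validating the point count before sending avoids a round trip to the service for a request that the limits above would reject anyway.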
And this Raspberry Pi has this Sense HAT, so I can get temperature data of the room, and it also has some LED pixels, so I can display whether the data that we've collected is anomalous. And then we display something here. Okay, so before we start, I can show you the package.json that I'm using for this. In order to call Anomaly Detector, there's the NPM package @azure/ai-anomaly-detector, and of course ms-rest-js, which we would need. dotenv allows us to read environment variables. Then this sense-hat package, which allows me to talk to the Raspberry Pi HAT. It's called the Sense HAT. And then this was the stats-analysis package I demoed a few minutes ago. Okay, let's look at this Sense HAT right here. What I'll do is clear all my outputs. Not yet. Well, I just wanted to show you how I ran it a while ago. And right here, see how I'm running it: this TypeScript kernel. I'm actually using tslab to be able to have TypeScript and JavaScript running in Jupyter notebooks. Right here is the version of tslab I'm using. This one right here is node-sense-hat. I would like to get the LEDs on that matrix, and then I wanted to read some acceleration data. I'll show you what the output looks like. Let me try to run that. So, notice how the acceleration data looks something like this. It reads it. In this case, I was able to get the temperature from my Raspberry Pi right here and display that value. And then what I'm doing here is I read every minute, and every minute I push the reading into an array, and after that I have something like this: each value with its timestamp. So this is the time series data that I collected. This is where I was running it: I get a reading every minute and then make it look something like this. 
So once I've got my time series data, it's time to process it and send it to Anomaly Detector. That requires me to use this AI Anomaly Detector client SDK. I need this core-auth package to be able to pass the credentials. Before I can use Anomaly Detector, I need to create an instance of Anomaly Detector through the Azure CLI. These are the commands I ran to create the resource group and the Cognitive Services instance, and then to get the keys. There are two things that you need in order to call the API: the endpoint, meaning the URL where you make the call, and the access key or API key. So that's what I'm doing here. I have those in this config or environment file, and I just loaded it into memory. In order to call the Anomaly Detector API, you need to use this Anomaly Detector client. You specify the endpoint and then you pass the key to this Azure key credential, and it gives you the Anomaly Detector client. Once you have that client, you can pass things to it. What this one right here is doing is sending a data set and running detection on the last entry of that data set. So I have to send a certain set of data, a time series data set, and that's why I'm putting this into the body, and I'm identifying that my data set is every minute. This gives me a response saying whether the last item on my list is anomalous or not: true if it's anomalous, or false if it's not. So you can actually run this. Of course, the important part is to run this first, right? Initialize the Anomaly Detector client. Now I can call it right here, and that's what I did. It tells me right here that the last point on my list, which is row 15, is not detected as an anomaly. And then what I did here is I'm creating a new instance, this one's new points. And what I want to do is get the last item. 
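The demo uses the Node client SDK, whose exact code is not shown in this transcript. As a hedged alternative, here is a sketch of the same last-point detection through the REST route mentioned earlier in the talk. The URL path and the `Ocp-Apim-Subscription-Key` header follow the v1.0 REST API as best recalled here and should be verified against the official reference; the helper names and the environment variable names are hypothetical, and the global `fetch` assumes Node 18 or later.

```javascript
// Builds the last-point detection URL from the resource endpoint.
// The path is an assumption based on the v1.0 REST API.
function lastDetectUrl(endpoint) {
  return endpoint.replace(/\/+$/, "") + "/anomalydetector/v1.0/timeseries/last/detect";
}

// Sends the series and resolves with the service response, which is expected
// to include a boolean `isAnomaly` for the last point.
async function detectLastPoint(endpoint, apiKey, body) {
  const response = await fetch(lastDetectUrl(endpoint), {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Ocp-Apim-Subscription-Key": apiKey,
    },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`Anomaly Detector returned HTTP ${response.status}`);
  }
  return response.json();
}

// Copies a series and overwrites its last value, mirroring the step in the
// demo where the final reading is changed to force an anomaly.
function withForcedLastValue(series, value) {
  const copy = series.map((p) => ({ ...p }));
  copy[copy.length - 1].value = value;
  return copy;
}

// Example wiring (not executed here; it needs a real resource and key):
// const endpoint = process.env.ANOMALY_DETECTOR_ENDPOINT; // hypothetical name
// const apiKey = process.env.ANOMALY_DETECTOR_KEY;        // hypothetical name
// const result = await detectLastPoint(endpoint, apiKey, {
//   series: withForcedLastValue(series, 134),
//   granularity: "minutely",
// });
// console.log(result.isAnomaly);
```

Keeping the URL construction and the data tweak as pure functions makes them easy to check without network access, while the single `fetch` call stays small enough to swap for the client SDK.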
This is the last item on my list, right? So 34.275. And I just want to force it to be anomalous, right? So in this case it will be 134 instead of 34. Now my new points look something like this, where these are normal and this one is outside of normal, abnormal. So this one should be detected as anomalous. This one right here is just some constant that I want to pass to, what do you call these, let's go back there, to my LEDs, and I want to put an X if it's detected as anomalous. Okay, let's go back, and I'd like to show you what that looks like. Let me try to set it up real quick. I want to make sure that you can actually see what it's going to do. So let's try to run this one again. Come on, set it up. See if we can fit all that data. So when I run this, if the last item on my list is anomalous, I set the pixels to a cross, so this one would have an X on it. And let's see what happens. Boom. There it is. Well, it's kind of hard to see, the LEDs right there are a little bit too bright if you ask me, but it has an X. That means there is an anomaly there. Let me clear that up. And there you go, it cleared it. Okay. Isn't that cool? What just happened? What we did was read data from our sensor. Right here, I'm using JavaScript to read data from the temperature sensor of this Raspberry Pi. Then I collected it into an array. I used the Anomaly Detector API to send the data set that I collected, and it gave me a result that says the last item on my list is anomalous. And then I sent an alert to say, hey, there's something wrong with my data set: I set the pixels on this Raspberry Pi to show an X, and then I cleared it out. Cool. All right, so let's go back to the presentation. So where can you use the Anomaly Detector API? It has C#, JavaScript, and Python SDK clients. There are Docker containers. 
You can actually integrate it with Power BI, or with Azure Databricks if you want streaming data. So there are a lot of use cases where you could integrate Anomaly Detector. So where can... we already talked about that. Those are just different links. The cool thing is there are Docker containers, so you can easily integrate it into your application and run it at the edge. There is also another Azure Cognitive Service called Metrics Advisor. Metrics Advisor has a web portal where you can actually diagnose anomalies, and it helps with root cause analysis. It's more of a software-as-a-service application where you can collect time series data from different data sources and detect anomalies from there. You can configure it to send alerts, and it will help you find the root cause of the issue. All right, so the best superpower that you can give to your project is anomaly detection, which sometimes is called spidey sense. If you're interested in learning more about what I did today, if you want to get the code, this is the GitHub link where you can download the code. So just to recap: what is anomaly detection? It is the process of identifying unexpected items or events in a data set. What is time series data? It's a series of data points indexed in time order. And then today I demonstrated the Anomaly Detector API. It's an API to detect anomalies that automatically adapts and learns from new data sets without needing training data. Cool. If you're interested in learning more about me, my name is Ron Dagdag. I'm a lead software engineer at Spacee. I'm a fifth-year Microsoft MVP awardee. The best way to contact me is through Twitter, at Ron Dagdag, or connect with me through LinkedIn, Ron Dagdag. Thanks for geeking out with me about spidey senses and anomaly detection. And now that you've been bitten by this virtual spider, feel free to test out your new superpowers that you just learned today. Thank you very much. 
I appreciate your time and have a good day.

Ron Lyle Dagdag

Lead Software Engineer @ Spacee



