Transcript
Hello, my name is Emmanuel Bakari, and today I'll be taking you through
exploring the math of thresholds.
Thresholds are bounds, upper and lower bounds.
And I'll be showing methods using numerical methods, which in mathematics are
iterative methods for predicting where something should be.
So, who am I?
My name is Ivano Ery.
People call me back, man.
I'm a senior developer engineer at Twilio.
I also, on the side, build solutions at Baseline HQ Cloud.
One of them is CostCraft, where we use some of these methods for things
like right-sizing requests and limits in Kubernetes, automatic right-sizing.
So you can check out the product there,
CostCraft at Baseline HQ Cloud, if you like it.
What is a threshold?
A threshold is a boundary at a certain time that defines a minimum or maximum.
So it can either be an upper bound or a lower bound.
Metrics define the values that you then use, right?
To check whether the threshold is exceeded or not at time T.
And then if it's breached for the assessing period, usually
that's when you can take action.
In this case I'm showing what an anomaly band is, which has upper
and lower bound maxima, right?
This is a scatter plot, but here the band values are drawn out, right?
So this is a predictive model.
So as you may notice, no two threshold values are the same, right?
That is very different from what you would've defined as a threshold.
And that's because the ones we are aware of are static thresholds, right?
Which is just a classic line through the noise, right?
They're fixed, they're easy to estimate.
They're also easy to diagnose.
Like you can look at it straight up and know whether the value is there or not,
but they're also prone to noise, right?
They can fail.
They don't adjust well to the situations of your system, things like organic
growth, et cetera. And they're also single-variate, right?
You can either monitor CPU or memory, but with dynamic thresholds,
you can use CPU and memory together to define, okay, fine,
we need to scale the system.
Right?
So autoscaling models usually employ dynamic thresholds, and dynamic thresholds,
in this case, change with the metric observations.
So they're boundary based, right?
So that can be upper or lower, right?
In the case of the anomaly bands I just showed you, they're always causal.
So they rely on past and present data and they are feedback loop driven.
That's basically what the causal sense means, right?
It's control systems, that kind of stuff.
So they're multivariate, they adapt well, but they're complex to define, right?
You can't basically say this is where it's going to be, right?
They need more effort at the start,
because you also have to understand how your system is modeled.
But they're less prone to false positives as a result of that.
So definitions are done.
So what's the process like for static thresholds?
So start off with a static threshold.
You have a bunch of data points, right?
So you need to filter out outliers, right?
To start off, right?
Clean up the data, that kind of pre-processing, if you want to.
It's not strictly a required step, but you can do so with
things like IQRs and Z-scores, right?
Just to see how well spread out your data is.
Standard deviation also comes in here, but that's on the Z-score side.
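As a rough illustration, here's a minimal Python sketch of that Z-score filtering step; the latency numbers and the three-sigma cutoff are made up for the example, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up latency samples (ms): mostly steady, with a few injected spikes.
samples = rng.normal(loc=120.0, scale=10.0, size=500)
samples = np.append(samples, [480.0, 510.0, 620.0])  # obvious outliers

# Z-score filtering: keep points within three standard deviations of the mean.
z = (samples - samples.mean()) / samples.std()
clean = samples[np.abs(z) < 3.0]

print(f"kept {clean.size} of {samples.size} samples")
```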
The second thing you do after that is define your aggregate
statistic, which is: given all these data points that are present,
how do we reduce them to one value?
You can do that with percentiles.
You can also do that with a linear aggregate like the mean,
which is just an average. There are also geometric kinds of means, but in
this case I mean a normal linear mean.
So once you've done all of that, then you can reduce your mean error
using something like the law of large numbers, which I'll explain in a bit.
But this is the entire process for static thresholds with numerical methods.
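And a small hedged sketch of that aggregation step, collapsing a window of cleaned samples into one candidate value; the method names and the window values here are illustrative, not from the talk.

```python
import numpy as np

def aggregate(values: np.ndarray, method: str = "p99") -> float:
    """Collapse a window of cleaned metric samples into a single candidate value."""
    if method == "p99":
        return float(np.percentile(values, 99))
    if method == "mean":  # a plain linear mean, i.e. an average
        return float(values.mean())
    if method == "max":
        return float(values.max())
    raise ValueError(f"unknown aggregate: {method}")

window = np.array([118.0, 122.5, 130.1, 125.4, 119.8, 127.3])
print(aggregate(window, "p99"), aggregate(window, "mean"))
```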
So why do we need to filter for outliers? It's because
static thresholds don't adapt.
So that means that from the start, you need to be sure that this is the normal
band of your system, one that isn't really influenced by those outliers, right?
So that the skewness of your distribution,
the kurtosis, is reasonably fair, right?
That it's a fairly bell-curve distribution.
Outlier filtering basically guarantees that you reduce the chance of
a false negative or a false positive.
I can keep it going from there.
Now, interquartile ranges, as we've spoken about before: that filtering
process is very simple, right?
And I recommend starting with this, and then you can then move into
more extreme methods, right?
But this is very straightforward.
The idea is that you take what the first and third quartiles would be, right?
You subtract them, which gives the range, the interquartile range, and
then you basically define the minimum and maximum using the first quartile minus
1.5 times that interquartile range,
and the third quartile plus 1.5 times the interquartile range.
So it helps you stabilize those numbers before you then go into
predicting, you know, where they should be.
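A minimal sketch of those interquartile-range fences in Python; the sample values are invented, and the 1.5 multiplier is the classic Tukey choice mentioned above.

```python
import numpy as np

def iqr_bounds(values: np.ndarray) -> tuple[float, float]:
    """Tukey fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

samples = np.array([12.0, 14.2, 13.1, 11.8, 250.0, 12.9, 13.4, 12.2])
lo, hi = iqr_bounds(samples)
filtered = samples[(samples >= lo) & (samples <= hi)]
print(lo, hi, filtered)  # the 250 spike falls outside the fences
```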
Now, the law of large numbers, as you've seen me mentioning, is basically a
law that says the average of a lot of results will converge towards the value
that should be the result of the experiment.
So if you had a lot of data points spread out, and say you took
the P99 of the first five minutes,
then of the next five minutes, then of the next five minutes, in that
rolling-window fashion, you might get one in the first window, one in the
second, two in the third, one in the fourth, one in the fifth, right?
That two is the outlier.
But because of the fact that you have a lot of samples broken down, if you
took the average of everything, it would converge below two and towards one.
And the more samples you add, the larger the probability that it tends
towards one, which is your ideal mean.
So that's the idea of the law of large numbers.
It reduces the error case.
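Here's a small simulation of that rolling-window idea, just to make the convergence concrete; the 90/10 split between ones and twos is an assumption for the sketch.

```python
import numpy as np

rng = np.random.default_rng(7)

# Made-up per-window P99 estimates: mostly 1, with the occasional 2-ish spike.
window_p99s = rng.choice([1.0, 2.0], size=1000, p=[0.9, 0.1])

# Running average of the window estimates: by the law of large numbers it
# stays below 2 and settles near the true expectation (here 1.1).
running_mean = np.cumsum(window_p99s) / np.arange(1, window_p99s.size + 1)
print(running_mean[[9, 99, 999]])  # after 10, 100, and 1000 windows
```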
So the way that you start it off is that you define a trend for
how the system should behave.
In this case it's a cron job that runs three times every minute.
It can fail every time, it can pass once, it can pass twice,
or it can pass all the time.
You take those numbers and then you define what would be a logic table, right,
with those sets of occurrences:
can it fail all the time, can it pass all the time,
and everything in between, in that order. And then you basically throw it into
what that trend formula would be, because it's a static threshold.
If it's predictive, you can do this.
If it's non-predictive, then you can basically go about that P99 case I
mentioned of just sampling by time, right?
In this case, I already know what the value would be at that time,
and I can use that to estimate what that trend would be like,
and then just generate my own data set for it.
So yeah, you run multiple iterations of that, right?
Using percentiles, or you can use an aggregate function as we've already
spoken about: mean, max, or a percentile.
And then as you begin to average that trend, you will get towards
the value that meets that threshold.
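To make that concrete, here's a hedged sketch of the kind of experiment described here: simulated cron jobs that each run three times a minute, aggregated per iteration with a percentile, with the standard error shrinking as iterations grow. The pass probability and the P90 choice are assumptions, not the talk's actual numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_experiment(n_jobs: int, n_iterations: int, p_pass: float = 0.7) -> np.ndarray:
    """Each iteration: n_jobs cron jobs each run 3 times; record passes out of 3."""
    passes = rng.binomial(n=3, p=p_pass, size=(n_iterations, n_jobs))
    # Aggregate each iteration to one value, e.g. the P90 across jobs.
    return np.percentile(passes, 90, axis=1)

for iters in (100, 200, 250):
    trend = run_experiment(n_jobs=10, n_iterations=iters)
    std_err = trend.std() / np.sqrt(iters)  # shrinks as iterations grow
    print(iters, round(float(trend.mean()), 3), round(float(std_err), 4))
```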
You can also guarantee the convergence using standard deviation, right?
So in this case, this is a sample of what that looks like
when I do one and ten iterations of each run.
One in the sense that there's only one cron job running, and ten in the sense
that there are ten cron jobs running
those three iterative steps every minute, either health checks or
whatever you use to define it.
And then this is what that random occurrence would look like
if I take the P90 or the P99.999.
You notice the P90 has higher deviation, right?
So this would be nice if your system, say, had a lot of spikes, right,
and was very jagged.
And the P99.999 works in the sense that it's very predictable,
it's always available, and you want to have very strict SLOs.
So the standard deviation will obviously capture this, right, where you can see
the standard deviation for P90 is quite high.
And as you then add more iterations, you'll notice it goes down.
So this also means that we're tending towards that convergence.
So this is where the law of large numbers plays a good part
in running experiments like this,
because you can then define a way to aggregate
that value towards getting the threshold,
and also analyze whether you're getting much closer
towards the actual mean that you want, rather than it increasing.
In that case, you would see the standard deviation go up,
like in this case where you can see at a hundred and two hundred it decreases,
then at two fifty with ten jobs it increases again.
So with more iterations, P99.999 obviously has more variance, but
the standard deviation is quite low.
So you can obviously use that to judge based on your expectations.
So yeah.
But yeah, here's a graph of what the experiment looks like.
You notice that the P90 threshold is close enough at
one to two hundred iterations, right?
And it obviously shows what that range of values would be.
So you can take all this data and then be able to define what that
threshold value should be.
We know where a heuristic estimate would fit in.
And then this is basically how you apply the law of large numbers, right?
And yeah, you can basically then define that static threshold.
So that's basically what the law of large numbers is.
It's a simple method for approximating a static threshold to predefined behavior.
Even if it's not predefined, you can obviously use the P99
to aggregate over that timeline.
It allows you to get clear methods on where that value should be
at any given point in time.
And then you can obviously define it based on your aggregate function,
whether it's a sum, whether it's a percentile, whether it's an average.
And then convergence across multiple data points implies the limit that captures
where that threshold should be, and so you don't have to guess, right?
You've already proven that out, right,
over a very large period of time with multiple iterations, experiments,
and checks.
So yeah, static thresholds are very easy, right?
What about dynamic thresholds?
How do we get there?
Now, dynamic thresholds in platforms, these are usually called anomaly
detection, but it's the same thing, right?
So you want to basically capture whether the behavior of the system is
outside what a predictive model would say it should be in, right?
So again, dynamic, right?
It's not fixed.
So the idea is that you can build them on supervised models, right?
So in this case you get into machine learning or statistical models,
so things like logistic regression fall in here. You also have
algorithms like clustering and unsupervised learning, and also
curve fitting, which is just heuristics, right?
You basically say, this is how I feel the trend should be, and then
I can find values that match it, and then I can use that to
judge the fit against whatever behavior that system actually exhibits.
So I'll talk about curve fitting, because it's the one that I've
done and can share examples of.
But all the other methods, I'll talk about them in a bit.
So the basics of curve fitting are that you have a series that exhibits
some wave-like, waveform fashion, and then you're trying to
find what trend would fit it, right?
You can say it's a sine wave, a tan wave, a cos wave, whichever way you want
to express that waveform; even if it's exponential, it works the exact same way.
So you start off like this: you basically define your example.
In this case I've generated random data showing
hourly sales in millions.
It goes up and down across the hours.
All of this is basically randomly distributed, but it still
follows a sine sort of fashion.
Then we find a waveform that would fit it, right?
And here you can see what the normal waveform would be, and here you can see
what the fitted curve would be.
The deviation is reasonably low, and
the fit is around 70%, right?
So how do we get here?
The idea is, remember that use case that we defined.
We obviously have to estimate whether the fit is good or not.
You can use the chi-square goodness of fit, or standard deviation, which
is used in the chi-square goodness of fit.
And the chi-square goodness of fit is better because it also takes into
consideration the number of samples, as opposed to standard deviation, where
I'm just checking the range of deviation of the samples from the mean.
So you get there by taking A, B, and C. A can be the amplitude,
B, I guess, the phase multiplier, and then C is the constant, right?
So you take all of those together, you then iterate across different values of
A, B, and C to see which one fits better, and then you basically
check what the error there would be.
Or in this case, you can use the fit formula, right,
to judge how well it actually fits the curve that you defined.
For curve fitting, the variables are bounded, right?
You can bound them based on, say, the amplitude of the curve, right,
like how high it goes.
And then you can obviously use that to find, okay, fine,
if A looks like it's tending towards the fit, then you can change B and
C to see whether it fits better or not.
So that would be the regressive model for error correction.
So yeah, it's very simple, it's very straightforward.
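Here's a minimal curve-fitting sketch along those lines, assuming SciPy is available. The sine model A*sin(B*x) + C, the bounds, and the fit measures are my own illustrative choices, not the exact formula from the talk.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)

# Made-up hourly sales (in millions): a sine-ish daily trend plus noise.
hours = np.arange(48, dtype=float)
sales = 5.0 + 2.0 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.4, hours.size)

def model(x, a, b, c):
    """a = amplitude, b = phase/frequency multiplier, c = constant offset."""
    return a * np.sin(b * x) + c

# Bound the parameters, e.g. the amplitude can't exceed the observed range.
bounds = ([0.0, 0.0, 0.0], [np.ptp(sales), 1.0, sales.max()])
(a, b, c), _ = curve_fit(model, hours, sales, p0=[1.0, 0.2, sales.mean()], bounds=bounds)

residuals = sales - model(hours, a, b, c)
chi_sq = np.sum(residuals**2 / model(hours, a, b, c))  # Pearson-style chi-square
fit = 1.0 - residuals.var() / sales.var()              # rough "percent fit"
print(round(a, 2), round(b, 3), round(c, 2), round(chi_sq, 2), round(fit, 2))
```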
There are also other methods for doing dynamic thresholds, right?
Based on that machine learning phase, where either you're doing anomaly bands,
or in this case you just have one curve trying to approximate
what that threshold should be.
You can use logistic regression methods,
KNNs, you can use naive Bayes; support vector machines also fit in here.
For these ones you would have classified data,
properly labeled, and everything else.
So this is more supervised.
Then there's the case where you have unsupervised methods, right?
So you can use isolation forests.
The case that I showed before, of those anomaly bands
with the upper and lower fits, right?
You can obviously then use density-based scans.
In this case, this example uses PCA for filtering, alignment,
and all those sorts of bits.
So this is also an example where you can do
anomaly detection on random scatter-plot data with unsupervised learning
methods like density-based scans and PCA.
Yeah.
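And a hedged sketch of that unsupervised flavor, assuming scikit-learn: scale the points, project with PCA, then let DBSCAN's noise label (-1) mark the anomalies. The two metrics, the cluster parameters, and the injected anomalies are all invented for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)

# Made-up two-metric observations (say CPU% and memory%), plus a few anomalies.
normal = rng.normal(loc=[50.0, 60.0], scale=[5.0, 4.0], size=(300, 2))
anomalies = rng.uniform(low=85.0, high=100.0, size=(5, 2))
X = np.vstack([normal, anomalies])

# Scale, project with PCA for alignment, then density-cluster; DBSCAN labels
# low-density points as -1, which we treat as anomalies.
X_scaled = StandardScaler().fit_transform(X)
X_proj = PCA(n_components=2).fit_transform(X_scaled)
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_proj)

print("flagged as anomalies:", int(np.sum(labels == -1)))
```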
The summary of all of this is that, to basically use numerical methods, you can
identify the business opportunities like the cases that I've mentioned with
dynamic autoscaling, or, say, in the non-technical sense where you're just trying
to see whether user orders have increased so that you can scale operations.
All of these things, you can use them to basically predict what that value
should be and whether it lies outside the fit.
And then you can obviously take action over time.
Organic growth and past data will influence these models,
so they would just auto-fit over time, but at least now you can
capture those trends without having to get paged every minute or two.
Yeah. So, confirm whether
you need a dynamic or static threshold.
Determine what the noise level should be.
You can use scatter plots, you can use clustering methods, you can
use IQR, you can use Z-scores.
There are various ways to actually check whether your dataset is
fit for these methods or not.
And then from there, just iterate until the error is minimized.
A lot of these things are iterative methods, Monte Carlo and the like,
whichever way we choose to define them.
These are just methods; they don't particularly define the approach,
but it's a good way to think about thresholds and statistics
when you're modeling monitoring systems or just trying to improve
the way that you iterate everywhere.
Yeah, thank you all.
Again, you don't need to use all of these.
Sometimes eyeballing might be worth more than all of these, but if your
systems behave in very weird ways,
it might be worth dealing with statistics to save you months of headache.
Thank you all.
Go forward and explore thresholds.
Thank you.