Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone.
So I'm here to talk mostly about how we can do carbon-aware Kubernetes scheduling instead of traditional Kubernetes scheduling. You might be using EKS, AKS, or GKE (Google Kubernetes Engine), whatever cloud Kubernetes platform you're on, or any bare-metal Kubernetes.
I think the future is that we have to be aware of building sustainable infrastructure and not harming the planet. I've been researching this, and I've come up with a framework that talks about what can be implemented to improve the cost effectiveness, sustainability, and environmental friendliness of the infrastructure as a whole.
So basically the current challenge is that there is a lot of cloud computing happening: with AI booming, machine learning models are being trained in the cloud, and lots and lots of EC2 and EKS resources are being used.
In traditional Kubernetes resource management, there is a scheduler, and based on the resource requests it has and the deployments that we define, it schedules the pods, and whatever workloads we have run on those pods.
But the key issues here are these: data centers account for approximately 1% of global electricity consumption. That is a lot. And the current Kubernetes schedulers only prioritize performance and availability of the clusters; they don't schedule based on sustainability or carbon awareness of the infrastructure. The cloud infrastructure also often runs inefficiently: sometimes there are idle instances just lying around, with no proper usage being made of them, and that costs us in many ways. We are impacted cost-wise, environment-wise, and even availability-wise.
So here is the solution that I was thinking of and developing. Google has also come up with such APIs before, I think, but not as a fully grown Kubernetes solution; they were just trying to see how carbon-aware scheduling could be done on the pods. And our solution is a carbon-aware scheduler.
There are four major things to it.
Firstly, it integrates with carbon intensity data APIs. There are many data sources we can integrate our Kubernetes cluster with through such APIs, gathering real-time information about the energy sources, whether it is wind energy, coal energy, or hydro energy. We will be able to identify what kind of energy it is, which gives us real-time carbon awareness. The second thing is predictive energy modeling.
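As a rough illustration of that first integration step, here is a minimal Rust sketch, assuming the reqwest and serde crates (with the blocking and json features). The endpoint and field names are hypothetical: real providers such as Electricity Maps or WattTime each have their own APIs and authentication.

```rust
use serde::Deserialize;

// Shape of a carbon-intensity reading; fields are illustrative assumptions.
#[derive(Deserialize, Debug)]
struct CarbonSignal {
    region: String,
    grams_co2_per_kwh: f64,    // current grid carbon intensity
    renewable_percentage: f64, // share of wind/solar/hydro on the grid
}

fn fetch_carbon_signal(region: &str) -> Result<CarbonSignal, reqwest::Error> {
    // Placeholder URL, not a real provider endpoint.
    let url = format!("https://carbon-api.example.com/v1/intensity/{region}");
    reqwest::blocking::get(&url)?.json::<CarbonSignal>()
}

fn main() {
    match fetch_carbon_signal("us-east-1") {
        Ok(signal) => println!("{signal:?}"),
        Err(err) => eprintln!("carbon API unavailable: {err}"),
    }
}
```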
So we'll be deploying machine learning models that forecast the energy consumption. Say, if I have batch workloads, how much energy do they consume? If I have very critical production workloads, how much energy do they consume? If I have simple standalone jobs, what is their energy consumption? The ML models predict that energy consumption. The third thing is the Rust-powered performance framework, where the critical components are implemented in Rust. They deliver a memory-safe, concurrent processing architecture with minimal overhead, ensuring sustainability without compromising on any performance. Rust has been coming up a lot, and many people are trying to play around with it to implement the best strategy possible.
The fourth one is SLAs. All our applications have to meet their SLAs, service level agreements, and violating them or failing to honor them may cost companies millions. So when you are building such algorithms, you have to be really aware of your applications' SLAs, and based on that, you have to design your system.
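As a small illustration of that SLA awareness (a sketch with assumed field names, not the framework's actual types), the scheduler can refuse to defer a workload whenever the proposed delay would exceed its agreed start window:

```rust
use std::time::Duration;

// How long this workload is allowed to wait before starting.
struct SlaPolicy {
    max_start_delay: Duration,
}

fn can_defer(sla: &SlaPolicy, proposed_delay: Duration) -> bool {
    proposed_delay <= sla.max_start_delay
}

fn main() {
    let batch_sla = SlaPolicy { max_start_delay: Duration::from_secs(4 * 3600) };
    // A 2-hour shift to a greener window fits this batch job's SLA...
    assert!(can_defer(&batch_sla, Duration::from_secs(2 * 3600)));
    // ...but a 6-hour shift would break the agreement.
    assert!(!can_defer(&batch_sla, Duration::from_secs(6 * 3600)));
    println!("SLA checks passed");
}
```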
In the core framework architecture here, mainly we have the carbon-aware scheduler, the workload analyzer, and the metrics collector. What the carbon-aware scheduler mainly does: it is completely Rust-based, and it tries to replace the traditional Kubernetes scheduler. Kubernetes has its own scheduler that schedules the pods based on the ReplicaSets and Deployments, whatever we define. This is like the next version of it, a carbon-aware scheduler that optimizes: it integrates with the carbon data sources, the APIs that I was talking about before, and based on that, it schedules the workloads.
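A minimal sketch of that scoring idea, assuming each candidate node is annotated with a grid carbon intensity (the node names and numbers here are made up for illustration):

```rust
// A candidate node with its current grid carbon intensity.
struct NodeInfo {
    name: &'static str,
    grams_co2_per_kwh: f64,
}

// Pick the node with the lowest carbon intensity, like a scoring phase.
fn pick_greenest(nodes: &[NodeInfo]) -> Option<&NodeInfo> {
    nodes.iter().min_by(|a, b| {
        a.grams_co2_per_kwh
            .partial_cmp(&b.grams_co2_per_kwh)
            .expect("carbon intensity is never NaN")
    })
}

fn main() {
    let nodes = [
        NodeInfo { name: "node-hydro", grams_co2_per_kwh: 24.0 },
        NodeInfo { name: "node-coal", grams_co2_per_kwh: 820.0 },
        NodeInfo { name: "node-wind", grams_co2_per_kwh: 11.0 },
    ];
    if let Some(best) = pick_greenest(&nodes) {
        println!("schedule onto {}", best.name);
    }
}
```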
The next one is the workload analyzer. As I was saying before, when your workload is running, you have to understand its energy consumption, what kind of energy it needs. Based on that, you can categorize your workloads, whether it is batch processing or a different kind of workload, and you also have to estimate the energy it takes. That is what the workload analyzer does.
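Here is a small sketch of what such categorization plus estimation might look like; the workload classes and per-core wattage figures are illustrative assumptions, since real numbers would come from measurement:

```rust
// Illustrative workload categories, roughly matching the talk's examples.
#[derive(Debug)]
enum WorkloadClass {
    Batch,      // deferrable, e.g. nightly data processing
    Critical,   // latency-sensitive production service
    Standalone, // simple one-off jobs
}

// Very rough per-core power draw by class (assumed figures).
fn estimate_watts(class: &WorkloadClass, cpu_cores: f64) -> f64 {
    let watts_per_core = match class {
        WorkloadClass::Batch => 8.0,
        WorkloadClass::Critical => 15.0,
        WorkloadClass::Standalone => 5.0,
    };
    watts_per_core * cpu_cores
}

fn main() {
    for class in [WorkloadClass::Batch, WorkloadClass::Critical, WorkloadClass::Standalone] {
        println!("{class:?} @ 4 cores ~ {} W", estimate_watts(&class, 4.0));
    }
}
```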
And the third one is the metrics collector. With the metrics collector, you mainly want to understand how things are performing, right? What are the CPU and memory usages, what is the energy consumption, and which type of energy source is being used? There are different kinds of metrics you can categorize by, so you know that this particular workload utilized energy from hydro, this one from coal, this one from solar, and this one from wind. With those different sources known, you can decide where to schedule things: if you have very high-availability or high-performance workloads, you can schedule them on the higher-cost nodes, considering they'll always be available, but batch workloads or minimal, not-so-important, non-critical workloads you can run on the low-energy-consumption nodes, and that way you can save on cost.
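A minimal sketch of a metrics sample that tags each workload with its node's energy source, so reports can show how much compute ran on coal versus hydro (types and numbers are illustrative):

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum EnergySource { Hydro, Coal, Solar, Wind }

// One collected sample: resource usage plus the node's energy source.
#[derive(Debug)]
struct WorkloadSample {
    pod: String,
    cpu_millicores: u64,
    memory_mib: u64,
    source: EnergySource,
}

fn main() {
    let samples = vec![
        WorkloadSample { pod: "etl-job-1".into(), cpu_millicores: 500, memory_mib: 256, source: EnergySource::Hydro },
        WorkloadSample { pod: "api-frontend".into(), cpu_millicores: 1500, memory_mib: 1024, source: EnergySource::Coal },
    ];
    // Tally CPU usage by energy source for a simple sustainability report.
    let mut by_source: HashMap<EnergySource, u64> = HashMap::new();
    for s in &samples {
        println!("{s:?}");
        *by_source.entry(s.source).or_insert(0) += s.cpu_millicores;
    }
    println!("{by_source:?}");
}
```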
And the next one is the Rust implementation. I don't have to say much about why we picked Rust for carbon-aware computing: it has memory safety without any garbage collection, predictable performance, concurrency, a low resource footprint, and compile-time guarantees. There are lots of benefits to using Rust.
The next one is the machine learning models for energy prediction. So what kinds of machine learning models actually predict the energy consumption of the workloads? The first one is gradient-boosted decision trees, the gradient boosting algorithm. The next is RNNs; you can use them to analyze the temporal patterns inside the workloads to predict future energy consumption trends, how much energy is needed, and all of that. The last one is reinforcement learning, which continuously takes in feedback loops and accordingly tries to improve toward carbon reduction or lower energy consumption. So RL is also included here.
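As a very simplified illustration of that feedback loop (an RL-style score update, not a full reinforcement learning implementation), a policy can nudge per-node preferences toward placements that beat a baseline carbon cost:

```rust
use std::collections::HashMap;

// Illustrative policy: per-node preference scores, nudged by observed carbon cost.
struct SchedulerPolicy {
    node_scores: HashMap<String, f64>, // higher = more preferred
    learning_rate: f64,
}

impl SchedulerPolicy {
    fn update(&mut self, node: &str, observed_g_co2: f64, baseline_g_co2: f64) {
        // Positive reward when the placement beat the baseline carbon cost.
        let reward = baseline_g_co2 - observed_g_co2;
        let score = self.node_scores.entry(node.to_string()).or_insert(0.0);
        *score += self.learning_rate * reward;
    }
}

fn main() {
    let mut policy = SchedulerPolicy { node_scores: HashMap::new(), learning_rate: 0.1 };
    // Feedback: a placement on "node-wind" emitted 90 g against a 200 g baseline.
    policy.update("node-wind", 90.0, 200.0);
    policy.update("node-coal", 310.0, 200.0);
    println!("{:?}", policy.node_scores);
}
```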
The next thing is the carbon-aware scheduling algorithms. What kinds of algorithms together make up the core of the system? The first one is carbon data integration, where you integrate the carbon intensity APIs or data sources so that you understand where the energy is coming from. The next one is workload classification: you can classify workloads as batch workloads, regular non-critical workloads, or stateless workloads; some workloads get triggered based on API calls, so you can classify them accordingly.
The next thing is temporal optimization. Once these different workloads are identified, they are potentially rescheduled to execute during periods of low carbon intensity or higher renewable-energy availability (there is a small sketch of this after the list). Then, alongside temporal optimization, there is spatial optimization, where the non-critical workloads are assigned to the nodes where the lowest carbon emissions take place, and other workloads are assigned to the higher renewable-energy category, say wind energy or solar energy, so you can categorize based on that. And the fifth is resource efficiency: it definitely helps to improve how the resources on the nodes are used, which improves availability, eventually improves cost, and reduces energy consumption.
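As an illustration of the temporal optimization idea, here is a minimal Rust sketch (assumed, not from the talk's framework) that picks the lowest-carbon start hour from a forecast while still respecting a job's deadline:

```rust
// Given an hourly gCO2/kWh forecast, find the greenest start hour
// that is still no later than the job's deadline.
fn best_start_hour(forecast: &[f64], deadline_hour: usize) -> usize {
    forecast[..=deadline_hour.min(forecast.len() - 1)]
        .iter()
        .enumerate()
        .min_by(|a, b| a.1.partial_cmp(b.1).expect("no NaN in forecast"))
        .map(|(hour, _)| hour)
        .unwrap_or(0)
}

fn main() {
    // Hypothetical gCO2/kWh forecast for the next 8 hours.
    let forecast = [450.0, 430.0, 210.0, 120.0, 140.0, 380.0, 400.0, 460.0];
    // A batch job that must start within 6 hours lands on hour 3,
    // the greenest window inside its deadline.
    println!("start at hour {}", best_start_hour(&forecast, 6));
}
```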
So these five algorithms are really the core. You have to integrate with the carbon-aware APIs, you have to do the workload classification, and you have to understand how much energy your workloads need; that is done with the machine learning algorithms, which give you that prediction. Based on that, the scheduler we have defined automatically routes the workloads onto the specific node types.
Say, suppose I have three or four node types, where this node is running on wind energy, this one on solar, this one on coal, and this one on hydro, so different sources of energy. Based on priority, the scheduler is intelligent enough to understand: hey, this one has to go here; this one is critical, so it has to go there.
That is how the carbon-aware scheduling algorithms are defined. Coming to the next part: integration with the Kubernetes ecosystem. We usually have a scheduler, we have a metrics server, we have custom resource definitions, and there is a Kubernetes operator. All of these are standard Kubernetes entities, but with a bit more enhancement you can implement and plug these into your existing Kubernetes systems.
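For the custom resource definition piece, here is a sketch using the kube-rs derive macro (assumes the kube, schemars, serde, and serde_yaml crates; the CarbonPolicy kind and its fields are hypothetical, not a published CRD):

```rust
use kube::{CustomResource, CustomResourceExt};
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};

// The derive generates a `CarbonPolicy` custom resource from this spec.
#[derive(CustomResource, Deserialize, Serialize, Clone, Debug, JsonSchema)]
#[kube(group = "scheduling.example.com", version = "v1", kind = "CarbonPolicy", namespaced)]
struct CarbonPolicySpec {
    /// Maximum grid intensity (gCO2/kWh) this workload will tolerate.
    max_carbon_intensity: f64,
    /// Whether the workload may be delayed to a greener window.
    deferrable: bool,
}

fn main() {
    // Print the generated CRD manifest that an operator would install.
    println!("{}", serde_yaml::to_string(&CarbonPolicy::crd()).expect("serialize CRD"));
}
```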
Coming to the next slide, it's about the deployment and implementation strategy, which is the key critical part whenever you're trying to implement these techniques in your own environments. The first thing is that you definitely have to pick the non-critical workloads. Begin with batch processing jobs or non-time-sensitive workloads that can be easily shifted to low-carbon periods. You can implement this through your CI/CD pipelines. If there are any data processing jobs, you can start there and collect baseline metrics, so that you will understand how much shifting toward carbon-aware scheduling you are actually able to do.
The next phase is production monitoring: extend to the production workloads as well, but in monitoring-only mode. You are not directly enabling the carbon-aware scheduling; you're just running it in monitoring mode and analyzing how much energy each particular workload needs. If you integrate with the carbon data sources, you at least understand the limitations, what kind of scale it needs to go to, and which workloads have to go where. You analyze all of that and collect the data and metrics in this second phase.
In the third phase you actually schedule the workloads: you enable carbon-aware scheduling for some stateless services, for applications where there is still a little bit of leeway with the SLAs, which are not very critical. You can configure these stateless applications with carbon preferences, implement canary deployments or rolling deployments, and then monitor how they're performing: is there any breakage, are there any interruptions occurring with the workloads while they're being scheduled? You can monitor all of that.
The fourth phase is full implementation. Only after you have collected the metrics from both non-prod and production environments are you able to gather them together, sit with your teams, discuss, and then go ahead with the full production rollout. Suppose this has been implemented: there will definitely be around a 40% carbon reduction, considering you'll be moving to renewable energy sources. There will be energy savings, since you're not scheduling too many nodes for a simple task but going in a controlled fashion, and there will be cost reduction as well, around a 15% cost reduction.
That applies in all areas. Now for a case study: one of the global financial companies, and say Google, are already starting up with such carbon-aware scheduling processes with Rust in the backend. They're trying to move the non-critical workloads to the higher renewable-energy resources to save some costs, and they're even okay with little interruptions and such. I'm sure that will not happen, but even if it does, they're okay with it, and they're trying to improve the implementation as much as possible and to open-source it to other companies as well.
Yeah, so this is an open-source ecosystem; the framework, the algorithms, and everything have been developed, and we're trying to encourage people from all over to contribute to it as well. Then there are the future research directions.
You can have hardware-level integration. You can even have edge computing adaptations: you can implement this on all your edge computing devices, like mobile phones, or if not mobile phones, then any devices on site inside factories, wherever they're located. Industry-specific models can also be developed for the big large-scale industries, and there is the global policy framework: the EU, the US, and different countries around the world still have no proper policy formulated; that is still in progress.
So yeah, you can get started today if you're interested. Firstly, understand how you can see the carbon footprint data, how to integrate with those APIs, and how the consumption patterns occur. The second step is to implement the non-destructive components: start with observability and analysis tools so that you'll be able to understand the impact in your environment, how you would actually benefit if you integrate them, and how you can actually create sustainability.
The third thing is to try this carbon-aware scheduling as a pilot program: work it on your test cluster, gather the metrics, and see if it is helping or not. Then you can scale it across your organization to the different teams. So all in all, I can say that carbon-aware scheduling is definitely a benefit now and in the future, and we are saving our earth for the future generations.
Yep.
Thank you.