Transcript
            
            
              This transcript was autogenerated. To make changes, submit a PR.
            
            
            
            
              Hello everyone.
            
            
            
              So I'm here to talk mostly about how we can do carbon-aware
            
            
            
              Kubernetes scheduling instead of the traditional Kubernetes scheduling.
            
            
            
              You might be using EKS, AKS, or GKE, the Google



              Kubernetes Engine, whatever cloud platform's Kubernetes you're
            
            
            
              using, or any bare metal Kubernetes.
            
            
            
              I think the future is that we have to be aware of building



              sustainable infrastructure and not harming the planet.
            
            
            
              I've been researching that, and I've come up with a framework. I



              will talk about what can be implemented to improve the cost-effectiveness,



              sustainability, and environmental friendliness of the infrastructure as a whole.
            
            
            
              So basically the current challenge is that there is a lot of cloud



              computing happening, with AI booming and machine learning



              models being trained on it.



              Lots and lots of cloud resources, EC2 and EKS, are being used.
            
            
            
              So in traditional Kubernetes resource management, how it usually happens is



              that there is a scheduler, and based on the requests that it



              has and the deployments that we have,



              it schedules the pods, and whatever workloads we



              have usually run on those pods.
            
            
            
              But the key issues here are these.



              Data centers account for approximately 1% of



              global electricity consumption.



              That is a lot.
            
            
            
              And the current Kubernetes schedulers only prioritize



              performance and availability of the clusters; they don't



              schedule based on sustainability or carbon awareness, and
            
            
            
              the cloud infrastructure often runs inefficiently.



              Sometimes there are static instances that are just lying around.



              They are not being properly used, and that costs us in many ways.
            
            
            
              Eventually we are getting impacted cost-wise.



              We are getting impacted environment-wise, and even availability-wise.
            
            
            
              So here is the solution that I was thinking of and developing.



              So basically Google has also come up with such APIs before, I think, but not



              a fully grown Kubernetes solution.



              They were just trying to see how it can work,



              how carbon-aware scheduling can be done for the pods.
            
            
            
              And our solution is carbon-aware



              scheduling.
            
            
            
              There are four major things to it.
            
            
            
              Firstly, it integrates with carbon intensity data APIs.



              There may be many data sources that we



              can integrate our Kubernetes with, and then gather real-time



              information about the energy source, whether it is



              wind energy, coal energy, or hydro energy. We will



              be able to identify what kind of energy it is, and that gives us real-time carbon



              awareness. The second is predictive energy modeling.
            
            
            
              So we'll be deploying machine learning models, which will



              forecast the energy consumption.



              Say, suppose I have batch workloads: how much energy do they consume?



              If I have very critical production workloads, how much energy do they consume?



              If I have just simple standalone jobs, what is the energy consumption?
            
            
            
              The ML models predict that particular energy consumption. Then



              there is the Rust-powered performance



              framework, where the critical components are implemented in Rust. They



              deliver memory-safe, concurrent processing with very



              minimal overhead, ensuring sustainability



              without compromising on performance.
            
            
            
              So Rust has been coming up a lot, and many people



              are playing around with it to implement the best strategy possible.
            
            
            
              The next one is



              SLAs.
            
            
            
              Yeah.
            
            
            
              All our applications have to meet their SLAs, the service level



              agreements, and breaching them may cost



              companies millions.



              So when you are building these algorithms you have to be



              really aware of your application's SLAs.
            
            
            
              And based on that, you have to design your system.
            
            
            
              In the core framework architecture here, we mainly have



              the carbon-aware scheduler, the workload analyzer, and the metrics collector.
            
            
            
              So here is what this carbon-aware scheduler mainly does.



              It is completely Rust-based, and it tries to replace



              the traditional Kubernetes scheduler.



              Kubernetes has its own scheduler; based on the ReplicaSets and Deployments,



              whatever we define, it schedules the pods.
            
            
            
              So this is like the next version of it, a carbon-aware scheduler



              that optimizes for carbon. It integrates with the carbon data sources,



              the APIs that I was talking about before,



              and then, based on that, it schedules the workloads.
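
The node selection described above can be sketched as a filter step followed by a score step, which is the same two-phase shape the default Kubernetes scheduler uses. This is a minimal illustration in Python for brevity, not the Rust scheduler from the talk; the `Node` fields, node names, and carbon intensity values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu: float          # free CPU cores on the node
    carbon_intensity: float  # gCO2eq/kWh of the node's grid (assumed to come from a carbon data API)


def pick_node(nodes, cpu_request):
    """Return the feasible node with the lowest carbon intensity, or None."""
    # Filter: keep only nodes that can fit the request.
    feasible = [n for n in nodes if n.free_cpu >= cpu_request]
    if not feasible:
        return None
    # Score: lower carbon intensity wins; ties go to the node with more free CPU.
    return min(feasible, key=lambda n: (n.carbon_intensity, -n.free_cpu))


# Hypothetical cluster with made-up intensity values.
nodes = [
    Node("node-coal", free_cpu=8.0, carbon_intensity=820.0),
    Node("node-wind", free_cpu=2.0, carbon_intensity=12.0),
    Node("node-hydro", free_cpu=4.0, carbon_intensity=24.0),
]

chosen = pick_node(nodes, cpu_request=3.0)
print(chosen.name)  # node-wind is too small for 3 cores, so node-hydro is chosen
```

A real scheduler would weigh carbon against other scores such as affinity and spreading rather than using it as the only criterion.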
            
            
            
              The next one is the workload analyzer.
            
            
            
              Yeah.
            
            
            
              As I was saying before, the point here is that



              when your workload is running, you have to understand



              its energy consumption, what kind of energy consumption it needs.



              So based on that, you can categorize



              your workloads, whether it is batch processing



              or a different kind of workload.



              And you also have to estimate the energy that it takes.



              So that is the workload analyzer.
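
As a rough sketch of what a workload analyzer could compute, the snippet below classifies a workload and estimates its energy as average power times runtime. It is written in Python for brevity; the workload kinds, field names, and numbers are assumptions for illustration, not the talk's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    kind: str               # hypothetical labels: "batch", "stateless", "critical"
    avg_power_watts: float  # assumed to be measured or predicted elsewhere
    runtime_hours: float


# Assumption: only batch workloads may be delayed or moved freely.
DEFERRABLE_KINDS = {"batch"}


def estimate_energy_kwh(w):
    """Energy (kWh) = average power (W) * runtime (h) / 1000."""
    return w.avg_power_watts * w.runtime_hours / 1000.0


def is_deferrable(w):
    """True when the workload's kind is one we allow to be shifted."""
    return w.kind in DEFERRABLE_KINDS


job = Workload("nightly-report", "batch", avg_power_watts=250.0, runtime_hours=2.0)
print(estimate_energy_kwh(job), is_deferrable(job))  # 0.5 True
```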
            
            
            
              And the third one is the metrics collector.
            
            
            
              With the metrics collector, mainly you want to understand



              how it is performing, right?



              What is the CPU and memory usage, what is the energy consumption,



              and which type of



              energy source it is using.



              So there are different kinds of metrics that you can categorize.
            
            
            
              So you know that this particular workload has utilized



              energy from hydro.



              This particular workload has used energy from coal.



              This particular workload has used solar.



              This particular one has used wind energy.
            
            
            
              So there are different kinds of sources, so that you know where you want to



              schedule those critical workloads. Say, suppose you have very



              high-availability or high-performance workloads: you can schedule them



              on the medium- or high-cost nodes, considering they'll always



              be available. But some batch workloads, or some



              not-so-important, non-critical workloads,



              you can run on the low-energy-consumption nodes, so



              that way you can save cost.
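
To make the per-source metrics concrete, here is a minimal Python sketch that totals estimated emissions by energy source, using emissions = energy (kWh) times carbon intensity (gCO2eq/kWh). The sample tuples and intensity values are invented for illustration; a real collector would read them from cluster metrics and a carbon data API.

```python
from collections import defaultdict


def emissions_by_source(samples):
    """Sum grams of CO2eq per energy source.

    samples: iterable of (workload, source, energy_kwh, intensity_g_per_kwh).
    """
    totals = defaultdict(float)
    for _workload, source, energy_kwh, intensity in samples:
        totals[source] += energy_kwh * intensity
    return dict(totals)


# Hypothetical samples: two workloads ran on hydro, one on coal.
samples = [
    ("api", "hydro", 2.0, 24.0),
    ("etl", "coal", 1.0, 820.0),
    ("report", "hydro", 0.5, 24.0),
]

print(emissions_by_source(samples))  # {'hydro': 60.0, 'coal': 820.0}
```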
            
            
            
              And the next one is the Rust implementation.



              I don't have to say much about why we picked Rust for



              carbon-aware computing: it has memory safety without any garbage collection.



              It has predictable performance.



              It has concurrency, it has a low resource footprint, and



              it has compile-time guarantees.



              So there are lots of benefits to using Rust.
            
            
            
              The next one is the machine learning models for energy prediction.
            
            
            
              So what kinds of machine learning models can actually predict the energy
            
            
            
              consumption of the workloads?
            
            
            
              So the first one is gradient-boosted decision



              trees, the gradient boosting algorithm.



              The next is RNNs.
            
            
            
              You can use them to analyze the temporal patterns in the workloads, to



              predict the future energy consumption trends, how much



              energy is needed, and all of that.



              The last one is reinforcement learning, which continuously



              takes in the feedback loops



              and accordingly tries to improve towards carbon reduction



              or lower energy consumption.
            
            
            
              RL is also included here.
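
The models named above (gradient-boosted trees, RNNs, reinforcement learning) are too large for a short example, so the Python sketch below swaps in simple exponential smoothing as a deliberately simpler stand-in. It only illustrates the idea of forecasting the next energy reading from past readings; the function name, the smoothing factor, and the sample history are assumptions.

```python
def exp_smooth_forecast(history, alpha=0.5):
    """One-step-ahead forecast using simple exponential smoothing.

    history: past energy readings (for example, kWh per hour), oldest first.
    alpha:   weight given to the newest reading, between 0 and 1.
    """
    if not history:
        raise ValueError("history must not be empty")
    level = history[0]
    for x in history[1:]:
        # Blend the new observation with the running level.
        level = alpha * x + (1 - alpha) * level
    return level


# Hypothetical hourly energy readings for one workload.
print(exp_smooth_forecast([10.0, 12.0, 14.0], alpha=0.5))  # 12.5
```

A production system would replace this with a trained model and would evaluate forecast error before trusting it for scheduling decisions.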
            
            
            
              The next thing is the carbon aware scheduling algorithms.
            
            
            
              So what kinds of algorithms together make



              up the core of the system?
            
            
            
              The first one is the carbon data integration, where you have to
            
            
            
              integrate the carbon intensity APIs or data sources so that you understand
            
            
            
              where that energy is coming from.
            
            
            
              And the next one is workload classification.



              You can classify them as batch workloads, regular



              non-critical workloads, or stateless workloads. Some workloads get



              triggered by API calls, so you can classify them accordingly.
            
            
            
              The next thing is temporal optimization.



              Once the deferrable workloads are identified, they're potentially



              rescheduled to execute during periods of low carbon intensity or



              higher renewable energy availability.
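
Temporal optimization as described above can be sketched as picking the start hour whose window has the lowest forecast carbon intensity while still finishing before a deadline. This Python snippet is a minimal illustration; the hourly forecast values, the function name, and the deadline convention (an hour index the job must finish by) are assumptions.

```python
def best_start_hour(forecast, duration_hours, deadline_hour):
    """Pick the start hour with the lowest total forecast intensity.

    forecast[h] is the predicted gCO2eq/kWh for hour h.
    The job runs for duration_hours and must finish by deadline_hour.
    """
    latest_start = min(deadline_hour, len(forecast)) - duration_hours
    if duration_hours <= 0 or latest_start < 0:
        raise ValueError("job does not fit before the deadline")

    def window_cost(start):
        # Sum of intensities over the hours the job would occupy.
        return sum(forecast[start:start + duration_hours])

    return min(range(latest_start + 1), key=window_cost)


# Hypothetical 8-hour forecast with a clean window in the middle.
forecast = [400, 380, 200, 150, 160, 300, 420, 450]
print(best_start_hour(forecast, duration_hours=2, deadline_hour=8))  # 3
print(best_start_hour(forecast, duration_hours=2, deadline_hour=4))  # 2
```

The tighter deadline in the second call shows the trade-off: the job still runs in the cleanest window available, but a less clean one than it could have used with more slack.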
            
            
            
              So there is temporal optimization, and then there is



              spatial optimization, where you have deferrable workloads



              and non-deferrable workloads.



              These non-deferrable workloads are assigned to the nodes where the



              lowest carbon emissions take place.



              The deferrable workloads are probably assigned to the higher renewable energy category,



              say wind energy or solar energy.
            
            
            
              So you can categorize based on that. Then there is resource efficiency.



              It definitely helps to improve



              the resource usage on the node.



              So that improves the availability.



              Eventually it'll improve the cost, definitely, and it'll



              reduce the energy consumption.
            
            
            
              So these five algorithms are really the core.



              You have to integrate with the carbon-aware APIs, and you have to



              do the workload scheduling.
            
            
            
              You have to understand how much energy is needed for your workloads.
            
            
            
              That is done with the machine learning algorithms.
            
            
            
              They'll give you that prediction, and based on that,



              whatever scheduler we have defined will automatically route



              the workloads onto the specific node types.



              Say, suppose I have three or four types of nodes, where this node



              is powered by wind as its energy source.



              This is coming from solar.



              This is coming from coal, this is coming from hydro.



              So, different sources of energy.
            
            
            
              So based on the priority, the scheduler is intelligent enough to
            
            
            
              understand, Hey, this has to go here.
            
            
            
              This is critical.
            
            
            
              This has to go here.
            
            
            
              So that is how the carbon-aware scheduling algorithms are defined. Coming



              to the next part, integration with the Kubernetes ecosystem.



              We usually have a scheduler, and we have a metrics server.



              We have custom resource definitions, and there is a Kubernetes operator.
            
            
            
              So all of these are existing Kubernetes components, but there are a few enhancements



              on top of them, and you can implement them and plug



              them into the existing Kubernetes



              systems.
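
One standard way to plug a custom scheduler into an existing cluster is the Pod field `spec.schedulerName`, which tells Kubernetes which scheduler should place that Pod. The Python sketch below builds such a manifest as a plain dictionary. The scheduler name `carbon-aware-scheduler` and the label `example.com/deferrable` are hypothetical placeholders, not names from the talk.

```python
import json


def pod_manifest(name, image, scheduler_name="carbon-aware-scheduler", deferrable=False):
    """Build a Pod manifest that opts in to a non-default scheduler."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            # Hypothetical label a carbon-aware scheduler could read.
            "labels": {"example.com/deferrable": str(deferrable).lower()},
        },
        "spec": {
            # Real Kubernetes field; the value here is an assumed scheduler name.
            "schedulerName": scheduler_name,
            "containers": [{"name": name, "image": image}],
        },
    }


manifest = pod_manifest("nightly-report", "busybox", deferrable=True)
print(json.dumps(manifest, indent=2))
```

Pods that do not set `schedulerName` keep using the default scheduler, which makes this a low-risk way to opt in one workload at a time.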
            
            
            
              Coming to the next slide, it's about the deployment and implementation



              strategy, which is the key critical part whenever you're



              trying to implement these algorithms or techniques in your own environments.
            
            
            
              The first thing is that you definitely have to pick the non-critical workloads



              when you first start hosting workloads there.



              Begin with batch processing jobs or non-time-sensitive workloads that can be



              easily shifted to low-carbon periods.



              So you can implement this through your CI/CD pipelines.
            
            
            
              And then, suppose there are any data processing jobs, you can implement



              it there and collect baseline metrics.



              You can do that so that you understand how much you are able



              to actually shift with carbon-aware scheduling, or not.
            
            
            
              The next thing is production monitoring: extend



              to the production workloads as well, but in monitoring-only mode,



              so that you are not directly enabling the carbon-aware scheduling;



              you're just putting it there in monitoring mode, and then you're



              analyzing how much energy a particular workload needs. Or, if



              you integrate with the carbon sources, you at least understand the



              limitations, what kind of scale it needs to reach, and



              which workloads have to go where. You at least analyze that and collect all that



              data and metrics in the second phase.
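
A monitoring-only phase can be approximated by recording, for each workload, both where it actually ran and where a carbon-aware scheduler would have placed it, and then comparing the estimated emissions. The Python sketch below shows that comparison; the dictionary keys and sample values are assumptions for illustration.

```python
def shadow_report(placements):
    """Compare actual emissions with what the recommended placement would emit.

    Each placement is a dict with energy_kwh, actual_intensity, and
    recommended_intensity (both intensities in gCO2eq/kWh).
    """
    actual = sum(p["energy_kwh"] * p["actual_intensity"] for p in placements)
    recommended = sum(p["energy_kwh"] * p["recommended_intensity"] for p in placements)
    # Guard against division by zero when nothing was recorded.
    saving_pct = 0.0 if actual == 0 else 100.0 * (actual - recommended) / actual
    return {"actual_g": actual, "recommended_g": recommended, "saving_pct": saving_pct}


# Hypothetical log: one workload could have moved to a cleaner node, one could not.
placements = [
    {"energy_kwh": 1.0, "actual_intensity": 400.0, "recommended_intensity": 200.0},
    {"energy_kwh": 1.0, "actual_intensity": 400.0, "recommended_intensity": 400.0},
]

print(shadow_report(placements))
# {'actual_g': 800.0, 'recommended_g': 600.0, 'saving_pct': 25.0}
```

Nothing is rescheduled in this mode; the report only tells you what enabling the scheduler would be expected to save.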
            
            
            
              The third phase is where you actually schedule the workloads: you



              enable carbon-aware scheduling for some stateless services, for some



              applications where there is still a little bit of leeway with the



              SLAs, which are not very critical.



              Then you can configure these stateless applications



              with carbon preferences. You can implement some canary deployments or



              rolling deployments, and then you can monitor how they're performing,



              whether there is any breakage or any interruptions occurring with



              the workloads while they're being scheduled.
            
            
            
              So you can monitor all of that.
            
            
            
              The fourth phase is the full implementation.



              Only after you have collected the metrics from both non-prod and production



              environments are you able to



              gather them together, sit with your teams, discuss, and then go



              ahead with the full production phase.
            
            
            
              So say, suppose this has been implemented: then there will definitely be a 40%



              carbon reduction, considering you'll be moving to renewable energy sources.



              There will be some energy savings, since you're not scheduling too



              many nodes for a simple task; you go in a controlled fashion.



              And then there will be cost reduction as well, like a 15% cost reduction,



              definitely, in all areas. Next is the case study.
            
            
            
              So one of the global financial companies, or say Google, is already



              starting up with such carbon-aware scheduling processes, with Rust



              in the backend, and they're trying to move the non-critical workloads



              to the higher renewable energy resources to save some costs.



              They're even okay with a few interruptions and such.



              I'm sure that will not happen,



              but even if it does, they're okay with it, and they're



              trying to improve the implementation as much as possible and trying



              to open source it for other companies as well.
            
            
            
              Yeah, so this is an open source ecosystem, and the framework,



              the algorithms, and everything have been developed, and we are



              trying to encourage people from all over to contribute to it as



              well. Next, the future research directions.
            
            
            
              Yeah, you can have hardware-level integration.



              You can even have edge computing adaptations.



              You can implement this on all your edge computing devices, like mobile phones, or,



              if not mobile phones, any devices on-site inside



              the factories, wherever they're located.
            
            
            
              Some industry-specific models can also be developed for big, large-scale



              industries. And then there is the global policy framework.



              The EU, the US, and different countries around the world still do not



              have a proper policy formulated; that is still in progress.
            
            
            
              So yeah, you can get started today if you're interested.
            
            
            
              Firstly, understand



              how you can see the carbon footprint data, how to integrate with those APIs,



              and how the consumption patterns occur.



              And then implement.



              The second step is to implement the non-disruptive components.
            
            
            
              Start with observability and analysis tools so that you'll



              be able to understand the impact in your environment: if



              you integrate them, how you would actually benefit from it, and how you



              can actually create sustainability.
            
            
            
              The third thing is you can try carbon-aware scheduling as a pilot



              program and then get onto it.



              Run it on your test cluster and then gather the metrics



              to see if it is helping or not.



              Then you can scale it across your organization to the different teams.
            
            
            
              So all in all, I can say that carbon-aware scheduling is definitely a benefit



              now and in the future, and we are saving our earth for future generations.
            
            
            
              Yep.
            
            
            
              Thank you.