Conf42 Site Reliability Engineering (SRE) 2024 - Online

From Error Budgets to Carbon Budgets: Empowering SREs for Eco-Efficiency

Abstract

The planet needs action, and your IT holds the key!

SREs can define, monitor, and enforce carbon budgets and push IT teams to cut emissions, save cash, and avoid greenwashing traps.

Summary

  • Pini Reznik is CEO and co-founder of re:cinq, a startup building a carbon reduction platform. In this talk he covers carbon reduction in the context of IT, and SRE specifically. The world is still emitting more and more carbon equivalents: about 56 billion tonnes a year.
  • The EU recently released a new directive against greenwashing. "Follow the sun", a term borrowed from support, means migrating workloads toward greener energy. Collecting data about carbon emissions is an early step, and becoming sustainable is a process of maturing.
  • The next level of tooling is profiling. There are many ways to reduce emissions; autoscaling is always very important because otherwise we tend to over-provision. It's not just about energy consumption, it's also about embodied carbon.
  • A banana generates about 110 grams of CO2e. Cloud is about 1% of global electricity and a quarter of a percent of the total carbon footprint. He recommends How Bad Are Bananas? to get a feel for how carbon footprints are calculated, and Building Green Software as a next step at the technical level.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello, my name is Pini Reznik. I'm CEO and co-founder of re:cinq. We are a startup building a carbon reduction platform. Today I'm going to talk about carbon reduction in the context of IT, and SRE specifically. First question: how bad are bananas? That comes from a very interesting book, which I really recommend, about the carbon footprint of all kinds of different things. Since this is not an interactive presentation, just keep the question in mind and I'll answer it at the end. Is climate change the real thing? It's a stark question, although at this point it has been answered a thousand times. There is a consensus among scientists that the climate is changing. There is tons of research, and generally there is agreement that if the temperature keeps rising it will be really bad, and the further it goes, the worse it will become. Right now the temperature has increased by about 1 degree Celsius or so. If it gets to 4 degrees, it will be really bad, and if it gets to 8 degrees, it would be properly terrible. As we can expect, the richer countries, the ones with more industry and more cars, emit more. So you can see that Europe, the US, and others emit more, but now China, India, and others are joining the trend, as expected. A somewhat optimistic sign is that the rich countries, like the US and the European Union, have been significantly reducing their emissions for a couple of decades already. But it's not good enough, because emissions in countries like India and China are growing. That's why the world overall is still emitting more and more carbon equivalents: in total about 56 billion tonnes a year.
An interesting thing is that you can see COVID in 2020, when we stopped traveling almost entirely, and that produced about a 5% reduction in carbon emissions. But a year later we picked it up like it never happened, so we are still emitting more than before COVID. What are governments doing? Some of them are doing a lot, specifically the EU and the UK, more than anyone, and a few others. The biggest agreement we have had is the 2015 Paris climate agreement, a commitment by most countries in the world to keep warming well below 2 degrees Celsius and ideally to 1.5 degrees. There are also other things governments are investing in, like the Sustainable Development Goals and ESG. Each of them has an element of climate change control, but they go beyond it: the Sustainable Development Goals are also about poverty, clean water, food, and so on, and ESG also covers social and governance topics. Environmental issues also include things like plastics, which are bad for us but not for the climate. So there are a lot of things and quite a lot of confusion, but generally a lot of regulation is coming from governments. The piece of regulation that is probably most relevant for us here in Europe is the EU Green Deal, and specifically Fit for 55: a commitment by all EU countries to reduce net carbon emissions by 55% by 2030 compared with 1990, which will put massive pressure on everyone to cut emissions. The reporting part is the Corporate Sustainability Reporting Directive (CSRD), which is now a requirement for large enterprises and will gradually become a requirement for everyone. Of course, carbon emissions reporting is only the first step towards actual reduction. Who needs the reporting? Who needs the numbers?
They are just the first step, a baseline so we can set reduction targets. After that, we can expect fines for those who don't reduce fast enough, or carbon taxes that increase the cost of carbon-heavy services and products. So we can expect a significant increase in the cost of not reducing carbon emissions. Let's try to understand a bit more: what are carbon emissions in IT, and where do they come from? The IT industry is not something people usually talk about in this context, but it actually emits more carbon-equivalent gases than the airline industry, which is surprising because the airline industry is all over the news. Currently the IT industry emits about 3 to 5% of all carbon-equivalent gases, and if we do nothing, that is expected to grow to 14% by 2040, because of AI and cloud, and because IT is becoming a central part of every other industry. There is almost no industry today that doesn't use a significant amount of IT. So, as I said earlier, we need to reduce emissions by 50 to 55% within six years and by 90% within 25 years, while delivering ten or a hundred times more IT services. This kind of do-more-with-less challenge is quite a big problem. Emissions are generally divided into three scopes. Scope 1 is directly burning things that produce carbon emissions, like fuel in a car, or producing cement or steel; these are direct emissions. Scope 2 is indirect, through electricity: somebody else produces the electricity and emits those gases. Scope 3 is indirect, through consumption of goods and services. For example, I buy cloud services from AWS, Google, or Microsoft. They give me compute services and produce their own scope 1 and 2 emissions, and those become my scope 3.
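The reduction targets quoted in the talk (55% within six years and 90% within 25 years, while IT services grow ten to a hundred times) can be turned into yearly numbers. The sketch below is illustrative arithmetic only: it assumes both targets are measured from today's emission level and that the cut is spread evenly as a constant yearly percentage; the function names are hypothetical.

```python
# Illustrative arithmetic, assuming targets are measured from today's
# emissions and achieved through a constant yearly percentage cut.


def annual_reduction_rate(total_reduction: float, years: int) -> float:
    """Constant yearly cut r such that (1 - r) ** years == 1 - total_reduction."""
    if not 0.0 <= total_reduction < 1.0:
        raise ValueError("total_reduction must be in [0, 1)")
    if years <= 0:
        raise ValueError("years must be positive")
    return 1.0 - (1.0 - total_reduction) ** (1.0 / years)


def required_intensity_cut(total_reduction: float, service_growth: float) -> float:
    """Fractional cut in emissions per unit of IT service needed when total
    emissions must shrink by `total_reduction` while service volume grows
    by the factor `service_growth`."""
    if service_growth <= 0:
        raise ValueError("service_growth must be positive")
    remaining_total = 1.0 - total_reduction            # share of today's emissions still allowed
    intensity_ratio = remaining_total / service_growth  # emissions per unit vs. today
    return 1.0 - intensity_ratio


if __name__ == "__main__":
    print(f"55% in 6 years  -> {annual_reduction_rate(0.55, 6):.1%} per year")
    print(f"90% in 25 years -> {annual_reduction_rate(0.90, 25):.1%} per year")
    for growth in (10, 100):
        cut = required_intensity_cut(0.90, growth)
        print(f"90% total cut with {growth}x more services -> {cut:.1%} less CO2e per unit")
```

Under those assumptions, the six-year target needs roughly a 12.5% cut every year, the 25-year target about 8.8% a year, and a 90% total cut with ten times more services means emissions per unit of service must fall by about 99%.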
When we report on carbon emissions, we need to report on all three scopes, and when we reduce emissions, we need to reduce them across all three scopes too. Reducing one but not the others has some value, but it's not the whole job. We also need to remember that my scope 1 and 2 is somebody else's scope 3. If I produce emissions, I'm not doing it for myself; I'm doing it to generate goods and services that I sell to somebody else, and those emissions become scope 3 for them. Regulation is moving in the direction that everyone should report everything. That means we will need to report very detailed, accurate, real-time emissions to our customers, and we will need to collect the same from our supply chain. It will be like the nutrition label on food, but it needs to be very accurate. It also needs to be detailed and real-time so we can act on it. If it only comes once a month or so, there is no feedback loop, and we can't use it to take action and see the result. So what we really need from our software and IT service providers is for them to report everything. And by everything I mean not just the electricity they consume, but also the embodied carbon in their hardware, the buildings, the cooling and heating, and everything else: all the emissions they produce while generating services for us need to be attributed to us. Then we can calculate our own footprint, pass it on to our consumers, and put it all together. For this, the Green Software Foundation created the Software Carbon Intensity (SCI) specification. The Green Software Foundation is a nonprofit organization promoting green software: a culture and practices of sustainability in IT.
The general idea of SCI is that there are four terms. E is the energy consumed by a piece of software. When it runs, it uses CPU, RAM, networking, storage, and so on, all of which need energy, and we need to calculate that energy precisely for the specific application. I is the carbon emitted per kilowatt-hour, basically the carbon intensity of the grid we draw energy from. In Poland, for example, about 80% is coal, in Germany about 50%, and Norway and Sweden run almost entirely on green energy. M is the carbon emitted during production of the hardware, the embodied carbon. All of this is per R, a functional unit of work, for example an API call, a tweet, or some other calculation: for each of those units, I need to know how much carbon was emitted specifically for it. There are also wrong ways to reduce emissions, or rather to not reduce them. The first is greenwashing: talking about reducing emissions without actually doing it. A lot of companies do it, but governments really don't like it because it achieves nothing. It's all talking, not doing; it's pretending and hiding from responsibility. There are many ways to hide emissions and pretend everything is okay, and you can read more online if you are interested. Governments are introducing more and more directives and rules to eliminate it. A few weeks ago, the EU released a new directive against greenwashing: you can't just say you're green, you actually need to prove it, and companies are being fined for various bad practices. Another way to get it wrong is carbon offsets, although they're not always wrong. With a carbon offset, you continue generating carbon emissions.
But you buy an offset: you pay for something that reduces carbon emissions elsewhere, for example somebody who plants trees or captures carbon. Then you put it in your accounting and pretend that you didn't emit those gases. Most of them are not real. I really recommend watching the very funny segment by John Oliver on this. It's about 25 minutes and explains carbon offsets very well. Another approach is "follow the sun". The term comes from support: with people around the globe, there is always someone supporting during their daytime. The similar principle here is that where the sun is shining there is more solar energy, so the energy mix is greener. So theoretically we can relocate our workloads between countries and enjoy the benefits of greener energy; this is called carbon awareness. But it doesn't always pay off. First, it's not that easy to relocate services: it's quite a lot of work, and we need to consider how much emissions we generate while doing that work. Then we potentially need capacity in several regions, which brings embodied carbon and idle energy consumption, so it may not reduce emissions at all. And if everyone relocates to Norway, either the cost of services in Norway goes up because everyone wants to be there, or Norway can't generate a sufficient amount of green energy and starts using non-green energy. These are all difficult questions. So this strategy is not bad per se, but it's not a solution on its own.
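The trade-off described for follow the sun can be made explicit: moving a workload to a greener grid only helps if the operational saving outweighs the one-off cost of moving it, including extra capacity kept in the target region. The sketch below is a hypothetical decision rule, not a real scheduler; the region names, grid intensities, and overhead figure are made-up illustrative inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    grid_intensity_g_per_kwh: float  # carbon intensity of the local grid (gCO2e/kWh)


def total_emissions_g(energy_kwh: float, region: Region, overhead_g: float = 0.0) -> float:
    """Operational emissions of running the workload in `region`, plus any one-off overhead."""
    return energy_kwh * region.grid_intensity_g_per_kwh + overhead_g


def choose_region(energy_kwh: float, current: Region, candidates: list[Region],
                  migration_overhead_g: float) -> Region:
    """Pick the lowest-emission region for the next scheduling period.

    Staying in `current` costs nothing extra; moving anywhere else is charged
    `migration_overhead_g`, a hypothetical stand-in for engineering work, data
    transfer, and the embodied and idle emissions of capacity in the new region.
    """
    best = current
    best_total = total_emissions_g(energy_kwh, current)
    for region in candidates:
        if region.name == current.name:
            continue
        candidate_total = total_emissions_g(energy_kwh, region, migration_overhead_g)
        if candidate_total < best_total:
            best, best_total = region, candidate_total
    return best


if __name__ == "__main__":
    # Illustrative numbers only, not real grid measurements.
    coal_heavy = Region("region-a", 600.0)
    mostly_hydro = Region("region-b", 50.0)
    overhead = 20_000.0  # gCO2e charged for moving (hypothetical)

    for energy in (10.0, 100.0):
        choice = choose_region(energy, coal_heavy, [coal_heavy, mostly_hydro], overhead)
        print(f"{energy:>5.0f} kWh -> run in {choice.name}")
```

With these made-up inputs, a small 10 kWh job stays where it is because the move costs more than it saves, while a 100 kWh job is worth moving; the same reasoning is why relocating everything to the greenest grid is not automatically a win.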
If you're not optimizing and just chasing greener energy, you're ignoring embodied carbon and all kinds of other things. It's a legitimate strategy, but not automatically a good one. So what should we do then? We can divide the work into three major areas. The first is energy efficiency: how we use less energy to do our work while applications are running. The second is hardware efficiency: how we reduce the amount of hardware, and with it the embodied carbon in that hardware. The third is carbon awareness: migrating our workloads to follow greener energy. Not all of this must come at the same time; we cannot expect to become awesome and green overnight. It's a process, and although we need to act dramatically faster than we are doing now, it's impossible to do it overnight. A great project here is the Maturity Matrix by the Green Software Foundation, which shows that this is a process of maturing: there are quite a lot of things we can start with and then gradually improve over time. What I would recommend doing immediately is, first, organizational: acquiring knowledge, convincing management to invest in this, and making it strategic. The second is collecting information, because in our experience most companies don't collect data about carbon emissions, or don't even know how to collect it. So how does this connect to SRE, and why could SRE be so valuable in achieving these goals? If we look at the actual purpose of SRE, it's the balance between speed and reliability. Sustainability is not mentioned anywhere. There is continuous improvement of software reliability and of developer productivity, but sustainability is not part of it.
When we look at SRE, there are the four golden signals typically used to define SLIs, SLOs, and SLAs: latency, traffic, errors, and saturation. Those are the key signals, and they don't reflect sustainability. So what we recommend is adding a fifth golden signal: SCI, the software carbon intensity I explained earlier. It's not easy to calculate right now, but it is a precise metric that can be calculated for each piece of software: the energy consumed by the application multiplied by the carbon intensity of the grid, plus the embodied carbon, per unit of work. If we attach an SCI to every application, we can start measuring and improving it alongside the other four signals. Then it's very important to change our objectives: not just speed of development and reliability of applications, but also keeping our planet healthy. We have to keep measuring and reducing our emissions, and SREs are positioned exactly where they can have immediate impact. If we define carbon emissions as an objective, we can define SLOs and SLIs in terms of carbon intensity, continuously measure emissions, and identify and reduce anti-patterns. Just as with reliability metrics, we would systematically identify waste and emissions and then systematically reduce them. As I said, accurate measurement is very difficult, but there are some measurement tools that can help, and that's the beginning; we expect those tools to expand and new tools to appear. One example is Kepler for Kubernetes.
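SCI as a fifth golden signal can be computed the way the Green Software Foundation specification defines it, SCI = ((E × I) + M) per R, and then tracked against a target the same way an error budget is. The sketch below assumes E arrives as an energy reading in joules for the measurement window (for example from an exporter such as Kepler) and that M is the embodied carbon already apportioned to this workload for that window; `CarbonSLO`, its method names, and all numbers are hypothetical.

```python
from dataclasses import dataclass

JOULES_PER_KWH = 3_600_000.0


def joules_to_kwh(joules: float) -> float:
    """Convert an energy reading in joules to kilowatt-hours."""
    return joules / JOULES_PER_KWH


def sci(energy_kwh: float, grid_intensity_g_per_kwh: float,
        embodied_g: float, functional_units: float) -> float:
    """Software Carbon Intensity in gCO2e per functional unit: ((E * I) + M) / R."""
    if functional_units <= 0:
        raise ValueError("R (functional units) must be positive")
    return (energy_kwh * grid_intensity_g_per_kwh + embodied_g) / functional_units


@dataclass(frozen=True)
class CarbonSLO:
    """Hypothetical carbon SLO: average SCI must stay at or below the target."""
    target_g_per_unit: float

    def budget_g(self, units: float) -> float:
        """Total gCO2e allowed for the traffic actually served in the window."""
        return self.target_g_per_unit * units

    def budget_remaining(self, emitted_g: float, units: float) -> float:
        """Fraction of the carbon budget left; negative means it is overspent."""
        budget = self.budget_g(units)
        return (budget - emitted_g) / budget

    def burn_rate(self, sci_g_per_unit: float) -> float:
        """Above 1.0 means emitting faster than the SLO allows."""
        return sci_g_per_unit / self.target_g_per_unit


if __name__ == "__main__":
    # Illustrative window: 36 MJ of measured energy, a 300 g/kWh grid,
    # 1 kg of apportioned embodied carbon, one million API requests.
    energy_kwh = joules_to_kwh(36_000_000.0)
    requests = 1_000_000
    per_request = sci(energy_kwh, 300.0, 1_000.0, requests)

    slo = CarbonSLO(target_g_per_unit=0.005)
    emitted = per_request * requests
    print(f"SCI: {per_request:.4f} gCO2e/request")
    print(f"burn rate: {slo.burn_rate(per_request):.2f}")
    print(f"budget remaining: {slo.budget_remaining(emitted, requests):.0%}")
```

With these inputs the service emits 0.004 gCO2e per request against a 0.005 target: a burn rate of 0.8, with 20% of the budget left. Alerting on burn rate, as with error budgets, is one way an SRE team could enforce such a target.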
Kepler collects information about energy consumption within Kubernetes and exports it through Prometheus. It doesn't show carbon emissions directly, but it shows the energy consumed, estimated from hardware and kernel-level metrics, which is much more accurate and much closer to real time. It can be integrated with Grafana and other standard monitoring systems, so you can see the energy consumed by applications within your clusters. Another tool is Scaphandre, which focuses mostly on bare metal and hardware, reading energy data through the kernel; you can find all the details on the Scaphandre project. Both are open source projects, and both are fairly advanced but not yet covering everything. At re:cinq, we are working on a project called ether, a carbon observability tool that also integrates with Kepler, but additionally looks directly into virtual machines and other resources in the cloud and on-prem. It generates a detailed view of energy consumed and embodied carbon, all converted into carbon emissions based on the actual energy grid. We also check tools like Electricity Maps to understand the current energy mix in the grid at any given moment. This way we can collect accurate, detailed, real-time information about the carbon emissions of different IT systems on-prem and in the cloud, and show it to users. This one is also open source: you can find it on GitHub, and I encourage you to give it a star and ask questions. The next level of tooling is profiling. There are quite a few profiling tools that can effectively convert every line of code into energy consumption and then into carbon emissions. There are two really good talks on this, one by Ng and one from Firefox.
The links will be available in the presentation. Those talks go into detail on debugging or profiling an application line by line, seeing how much emissions each line produces, and figuring out how algorithms, coding styles, or data structures affect sustainability. Of course, programming languages differ too: generally, interpreted languages are heavier than compiled ones. This comparison is not very up to date, and it's not just about languages but also about other tools, like databases; each of them has a different carbon footprint. Once we have a general understanding of how much each tool in our system contributes to emissions, we can start taking action. There are many ways to reduce emissions. The most obvious one is to reduce waste: kill zombies, downscale your clusters, and shut down things you're not using. Some clouds are greener than on-prem, or maybe the other way around; it depends and needs to be measured. There are many ways to improve both energy consumption and the energy mix. Autoscaling is always very important, because otherwise we tend to over-provision and run unneeded resources. There is also a big difference in energy consumption between x86 and ARM. Those are just examples of patterns that can be improved quite easily. For example, if the same cluster runs a very stable load together with some spiky cron jobs, those could be separated and run in different ways; there are different strategies to deal with it. Basically, if you separate the loads, or provision the hardware for those spikes in advance or dynamically, you can often reduce the size of a cluster, sometimes by 50%, with no impact on the actual applications.
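The cluster-splitting example can be checked with simple node-hour arithmetic. The sketch below assumes identical nodes, uses node-hours as a rough proxy for both energy and the share of embodied carbon, and ignores provisioning delays; the function names and the 10-node, 2-hour figures are hypothetical.

```python
def node_hours_shared(stable_nodes: int, burst_nodes: int, hours: float = 24.0) -> float:
    """One cluster permanently sized for the stable load plus the cron-job peak."""
    return (stable_nodes + burst_nodes) * hours


def node_hours_split(stable_nodes: int, burst_nodes: int, burst_hours: float,
                     hours: float = 24.0) -> float:
    """Stable load on a fixed pool; burst capacity exists only while the jobs run."""
    return stable_nodes * hours + burst_nodes * burst_hours


def saving(stable_nodes: int, burst_nodes: int, burst_hours: float) -> float:
    """Fraction of daily node-hours saved by splitting instead of sizing for the peak."""
    shared = node_hours_shared(stable_nodes, burst_nodes)
    split = node_hours_split(stable_nodes, burst_nodes, burst_hours)
    return 1.0 - split / shared


if __name__ == "__main__":
    # Hypothetical: 10 nodes of steady traffic, cron jobs needing 10 more nodes for 2 h a day.
    print(f"node-hours saved: {saving(10, 10, 2.0):.0%}")
```

In this made-up case, splitting saves about 46% of node-hours per day, in the same range as the reduction mentioned in the talk; the saving shrinks as the cron jobs run longer or as the burst gets smaller relative to the steady load.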
We need to make sure we only request the resources we actually need. CPU and RAM are the most energy-consuming components, so we shouldn't request more than we need. There are all kinds of things to optimize, like container images and JVM settings: not grabbing too much memory, not building massive images. All of that translates into more energy consumption, which translates into more carbon emissions. What we often see is people deploying too many test environments. I understand that continuous integration and continuous delivery seem to require this, but they actually don't. If you provision an environment on each PR or each commit and don't really use it or run tests on it, what's the point? Maybe it doesn't cost you much because you have already pre-committed to those servers, but it still consumes energy, and you're just emitting more carbon. Another example: if your business doesn't require high availability, don't build it, because it means not just additional cost but also additional emissions: more redundancy, more zones, and so on. Different tools and databases also differ, and some are heavier than others. Do you really need something like Cassandra if you can manage with a small Postgres? It very much depends on the use case, and there's a lot more. The Green Software Foundation's Green Software Patterns are also quite useful here. And we always need to remember that it's not just about energy consumption, it's also about embodied carbon. Here is an example of a Dell server with about 7 tonnes of embodied emissions over the course of its life. Those emissions happened before it was even sold: when you buy the server, the embodied carbon has already been emitted. Now the question is how long you will keep it. If you keep it for four years, that's about 1.75 tonnes a year.
If you keep it for ten years, it's 700 kilograms per year. That's a big difference, and buying new servers also costs money. Of course there is a balance: newer servers are a bit more energy efficient, so it's still not black and white. We need to understand energy consumption and embodied carbon in detail, and then make decisions based on the combination. In the case of this specific server, about 17% of lifecycle emissions on average come from embodied carbon. Another example: Amazon recently decided to extend the lifespan of its servers to six years, which saved not just a lot of emissions but also $900 million. In our experience, we can quite easily reduce emissions by at least 50% just by removing infrastructure waste, then potentially another 20 to 30% by optimizing the software, and more by replacing tools with less resource-intensive ones. That's about it. Now, back to the question about bananas. Bananas are actually not very bad. A single banana generates on average about 110 grams of CO2e; it depends on whether it was flown or not, among other considerations. A couple of other examples: an email goes from 0.3 grams for a spam email that no one opens (still a bit of emissions, because of sending and so on) all the way up to 26 grams for an email that takes you ten minutes to write, with the screen time that involves, sent to 100 people who each open it and look at it for 3 seconds. With all the servers, screen time, and everything around it, sending things to more people generates more emissions. So every time you do something there is an impact, especially at scale.
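The server-lifetime arithmetic from the Dell example can be made explicit. The sketch below assumes roughly 7,000 kg of embodied emissions, which matches the per-year figure quoted in the talk (700 kg a year over ten years), and adds a hypothetical payback check for the balance the speaker mentions: replacing a server early only pays off in carbon once the newer machine's energy savings exceed its own embodied emissions. The energy figures and grid intensity in the example are made up, and the function names are hypothetical.

```python
def embodied_per_year_kg(embodied_kg: float, lifetime_years: float) -> float:
    """Embodied emissions amortized evenly over the years the server is kept."""
    if lifetime_years <= 0:
        raise ValueError("lifetime_years must be positive")
    return embodied_kg / lifetime_years


def replacement_payback_years(new_embodied_kg: float, old_kwh_per_year: float,
                              new_kwh_per_year: float, grid_kg_per_kwh: float) -> float:
    """Years until a more efficient replacement has saved as much operational
    carbon as was emitted to manufacture it. Returns infinity if it saves nothing."""
    saved_kg_per_year = (old_kwh_per_year - new_kwh_per_year) * grid_kg_per_kwh
    if saved_kg_per_year <= 0:
        return float("inf")
    return new_embodied_kg / saved_kg_per_year


if __name__ == "__main__":
    for years in (4, 6, 10):
        print(f"kept {years:>2} years: {embodied_per_year_kg(7_000, years):,.0f} kg CO2e/year embodied")

    # Hypothetical: the new server uses 1,000 kWh/year less on a 0.3 kg/kWh grid.
    payback = replacement_payback_years(7_000, 5_000, 4_000, 0.3)
    print(f"carbon payback of an early replacement: {payback:.1f} years")
```

With these illustrative inputs, amortized embodied emissions drop from 1,750 kg a year at four years to 700 kg at ten, and an early replacement would take over 20 years to pay back its own embodied carbon, which is why measuring both sides matters before deciding.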
A typical laptop or phone is similar: for both, about 83 to 84% of emissions on average are embodied rather than from energy consumed. Energy use is only about 15% in one case and only 2% in the other, and the services around them, like Gmail and networking, are about 15%. Cycling is an interesting one. People think there are no emissions, but there are embodied emissions in the bike itself, and when we cycle we expend our own energy. If we fuel it with bananas, it's 40 grams per mile; if we fuel it with air-freighted asparagus, it's 4.7 kilograms per mile, more than a hundred times as much. It basically shows that air-freighted asparagus is the worst. Generally speaking, beef is the worst meat in terms of emissions, chicken is much lower, and generally grown food is less impactful. Cryptocurrencies are a big deal. They are 0.12% of total world carbon emissions. That doesn't sound like a lot, but this is not really essential; all this cryptocurrency mining is quite a wasteful way to create value. I'm not expressing my opinion about cryptocurrency or blockchain technologies, but in general we are wasting a lot of money and resources and generating a lot of emissions. Cloud altogether is about 1% of global electricity and a quarter of a percent of the total carbon footprint, and in total it's 1.4 billion tonnes. So overall it's 3 to 5%, depending on how it grows. I recommend reading these books. How Bad Are Bananas? is a really good book to get a feeling for how carbon footprints are calculated. The best, and probably the only, technical book on green software is Building Green Software by Anne Currie, Sarah Hsu, and Sara Bergman. It's a great book.
It was released just a few weeks ago. Among the other books, there's one that is still probably the best book at a high level, and Not the End of the World, which I really recommend if you want an optimistic and realistic feel. The last one is just an interesting one. It's not a very easy read; you'll see. Look into it: it's interesting, but not like the others. Thank you for listening. The last thing I'd say: if you're wondering what to do next, I recommend reading Building Green Software. I think it's an excellent book that will give you a good overview at the technical level. And start measuring: try to at least roughly understand the emissions generated by your systems. Thank you, and please get in touch if you have more questions.

Pini Reznik

Co-founder & CEO @ re:cinq
