Conf42 Site Reliability Engineering 2022 - Online

The State of DevOps and Observability in 2022

Abstract

There are many opinions on DevOps, open source, and observability, but what is actually being practiced? What can we learn from the collective experience of the community? We went and surveyed over 1000 engineers across the globe about their DevOps practices, challenges, and more, with special focus on enterprise observability. This session will share data and insights from the survey, with key trends (compared to previous years’ DevOps Pulse surveys), points of interest, and challenges that developers experience on a daily basis.

This session will help you learn from the collective experience and emerging best practices in the community, to help guide decisions on processes, tooling and architecture choices.

The survey analyzes topics such as:

  • What are your challenges with running Kubernetes in production?
  • How long does it take to troubleshoot production issues?
  • Which tools do you use for ticketing, event correlation and notifications?
  • Who is responsible for ensuring observability?
  • How do enterprises handle shared services? And much more.

Summary

  • Dotan Horovits is the principal developer advocate at Logz.io. He talks about the state of DevOps and observability in 2022, using data from DevOps Pulse, a yearly questionnaire that people like you answer.
  • AWS has increased its market share from 66% on the last survey to 71% this time around. Azure and Google Cloud have significantly bumped up their adoption, but they're still quite significantly behind AWS. VMware is perhaps showing some strong signs.
  • Over 50% of respondents said that more than half of their apps are containerized, and 30% said that over three quarters of their apps are containerized. The respondents are not just young startups, but also enterprises with 5000 employees or more. It's definitely happening across the board.
  • People reported challenges with Kubernetes across the board. The top difficulties were with security and with monitoring and troubleshooting. People also reported issues with networking, cluster management and storage. In short, everyone's moving to containers, but many still don't know how to manage them.
  • Logs and metrics are still the most common signals. Nearly half of the companies are doing something with distributed tracing. One fifth of the people use all of the above. Among those who don't yet use distributed tracing, 70% are planning to start using it in the coming one or two years.
  • 68% of respondents say they're getting better with MTTR, the mean time to resolution or mean time to recovery. But the actual MTTR numbers are less optimistic. Nearly two thirds of the people take more than an hour to reach full recovery.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. Glad to be back at Conf42 SRE 2022, and thanks for inviting me again this year to speak. I hope that means I wasn't too boring last year. This year I'd like to talk to you about the state of DevOps and observability in 2022. I'm going to use the data from DevOps Pulse. This is the yearly survey that we run at Logz.io, my company. Essentially it's a questionnaire that people like you answer. Over 1000 people answered the last survey, from various companies, all the way from startups with a dozen employees to enterprises with 5000 employees or more, from different countries and different industries. First, I'm really glad to say that on the gender diversity side, we're improving. On the last survey, I shared that 86% of those who answered were male. This year I'm glad to say that only 79% are male and we have 15% female. So that's encouraging. And with that, you got the first stat of the survey. The survey covers many areas. In this very short talk, I'd like to use the coming minutes to look into some common assumptions around cloud, cloud native and DevOps, on what people use, what the issues are, and what solutions they use, and check how these assumptions hold up in light of the results. A word about myself. My name is Dotan Horovits. I'm the principal developer advocate at Logz.io. At Logz.io, we provide a cloud native observability platform that's based on popular open source stacks such as Elasticsearch, OpenSearch, Prometheus, Jaeger, OpenTelemetry and so on. I've been around for quite some time, as a developer, a solutions architect, and a product manager. I'm also an advocate of open source software, open standards and communities in general, and the CNCF, the Cloud Native Computing Foundation, in particular. I co-organize the local CNCF chapter in Tel Aviv. So if you're around, do join one of our monthly meetups.
I also run a podcast called OpenObservability Talks, so if you're interested in open source, DevOps and observability, do check it out on all your favorite podcast apps. And in general, you can find me everywhere at @horovits. So if you are tweeting anything interesting out of this talk, do feel free to tag me. And let's go straight to the first assumption. Everybody is in AWS. What do you think? True? False? It was very clear on the survey: AWS still rules. In fact, in this survey, AWS has increased its market share from 66% on the last survey to 71% of respondents running on AWS this time around. Also, Azure and Google Cloud have significantly bumped up their adoption, from around 11 to 12 percent on the last survey to around 30 percent this time, as you can see. In fact, last time Azure was in second place; on this survey, Google Cloud takes second place. So 32% on Google Cloud and 29% on Azure, as you can see. Still, we need to remember that they're quite significantly behind AWS, and most of the rest are pretty much non-existent. Perhaps VMware, as you can see, is showing some strong signs. So that's about public clouds and cloud infrastructure. Let's go on to the next assumption. Everything is containerized. What do you think? True? False? It's definitely happening. Over 50% said that more than half of their apps are containerized, and more impressive is that 30% said that over three quarters of their apps are containerized. So that's very impressive. And do remember, it's not just young startups who answer this survey, but also enterprises with 5000 employees or more. So it's definitely happening across the board. And if we talk about containerization, then obviously the next topic is Kubernetes. So let's go on to the next assumption. Kubernetes is a piece of cake. Just give me YAML and I'll manage your containers. What do you think? True? False? Not really true. People reported challenges across the board.
The top difficulties that people reported in this survey were with security and with monitoring and troubleshooting. But people also reported issues with networking, cluster management and storage. You name it, you can see that here on the screen. In short, everyone's moving to containers, as we've seen, but many still don't know how to manage them, how to do it right: simple and production ready. And as we said, monitoring and troubleshooting is a top challenge. Which leads me to the next assumption. Monitoring and troubleshooting: just use metrics and logs, you fool. What's new? We've been doing that for ages. Right? Indeed, logs and metrics are still the most common. 80 to 90% use them. Not surprising, as you can see here. There's another bar here for all of the above. So if you sum up the bars for the specific signals and the bar for all of the above, you'll see that it's around 88% for logs and 80% for metrics. So they're definitely there. Interestingly, distributed tracing has increased its adoption to around 48%. Nearly half of the companies are doing something with distributed tracing. That, for me, was astonishing. And it actually continues the strong momentum we've seen in the previous years' surveys. So 48% this year, 26% on the last survey, and 19% the year before that. You can definitely see the trend. It's happening. Tracing has very strong momentum. Another interesting thing to mention here is APM which, although perceived as traditional tooling, is widely used: 43% of the users use APM. And maybe the most impressive on this year's survey, at least for me, is that 21% use all of the above. More than one fifth of the people use all of the above. And that's a significant step towards adoption of full observability.
I've been preaching that for quite some time, if you follow my blogs, articles and podcast, and it's really encouraging to see that people realize that logs are not enough, not even logs and metrics, and that you need the combination of signals and the correlation of data to actually gain observability into your system. Going back to the trend around distributed tracing adoption: among those who don't yet use distributed tracing, 70% are planning to start using it in the coming one or two years. 70%. So, to summarize, tracing definitely stands out as a central tool for monitoring microservices and distributed systems, of course augmenting logs and metrics, as we said. To be honest, we've seen strong momentum on past surveys as well, and people expected to adopt it very quickly. We've seen slightly slower adoption than expected. However, you've seen the numbers: it's definitely picking up. And let's move on to the next assumption. We're getting better on our MTTR, the mean time to resolution or mean time to recovery. True? False? What do you think? When we asked people on the survey, 68% said that they're getting better with MTTR. You can see the breakdown here: 14% said that they greatly reduced MTTR, 23% said they were making great strides in reducing MTTR, and 31% said that they're slowly making progress. All in all, 68% indicated that they're improving on their MTTR. That's a very positive and encouraging answer, right? However, the actual MTTR numbers are, how shall I say it, less optimistic. When we actually asked for the numbers, around 64% of the survey's respondents reported that their MTTR during production incidents was over an hour. Over an hour, for 64%, nearly two thirds of the people. And if you compare that to last year, it's increased from 47% on the last report to 64% this year. So it's not just high, it's also increasing at a very fast pace. So I'm not sure how much better we are getting at MTTR reduction.
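To make the two MTTR figures above concrete, here is a minimal Python sketch of how a team might compute them from its own incident log. The incident timestamps are hypothetical and are not survey data; the function names are made up for illustration. The last line only re-adds the self-reported breakdown quoted in the talk.

```python
from datetime import datetime, timedelta

# Hypothetical incident records as (detected_at, recovered_at) pairs.
# These values are illustrative only and do not come from the survey.
incidents = [
    (datetime(2022, 3, 1, 9, 0), datetime(2022, 3, 1, 9, 40)),
    (datetime(2022, 3, 4, 14, 0), datetime(2022, 3, 4, 16, 30)),
    (datetime(2022, 3, 9, 22, 15), datetime(2022, 3, 9, 23, 45)),
    (datetime(2022, 3, 15, 3, 0), datetime(2022, 3, 15, 3, 20)),
]


def mean_time_to_recovery(records):
    """Average time from detection to recovery across all incidents."""
    durations = [recovered - detected for detected, recovered in records]
    return sum(durations, timedelta()) / len(durations)


def share_over(records, threshold):
    """Fraction of incidents whose recovery took longer than the threshold."""
    slow = sum(1 for detected, recovered in records
               if recovered - detected > threshold)
    return slow / len(records)


mttr = mean_time_to_recovery(incidents)
over_an_hour = share_over(incidents, timedelta(hours=1))

# Self-reported breakdown quoted in the talk: greatly reduced, great strides,
# slowly making progress (in percent of respondents).
improving_share = 14 + 23 + 31

print(f"MTTR: {mttr}")
print(f"Share of incidents over one hour: {over_an_hour:.0%}")
print(f"Respondents reporting improvement: {improving_share}%")
```

Tracking the share of incidents above a threshold next to the mean is one way to see whether a perceived improvement shows up in the measured numbers, which is the gap the survey results point to.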
So let's summarize the takeaways. So far we've seen that everyone's moving their workloads to containers and to Kubernetes, but is still experiencing many challenges operating Kubernetes in production. The top challenges were with security and with monitoring and troubleshooting. Around a third of the respondents reported that. We're also far from taming the MTTR. In fact it's increasing, perhaps as a side effect of the growing adoption of Kubernetes and cloud native architectures. And we've seen that nearly two thirds of the people take more than an hour to reach full recovery. Speaking about monitoring challenges, distributed tracing is rising in popularity for monitoring and troubleshooting microservices, alongside logs and metrics, of course. We've also seen that more people use full observability, leveraging logs, metrics, traces and APM. Over one fifth of the people, 21%, are already using all of the above. So that's it for this survey. But wait, what about security? What about data volumes, cost, open source, team structure? Don't worry, you can find the full survey at this link, with all of the above topics and even more, so do check it out. I prepared a short link for you so it'd be easy to remember. Or you can just take a screenshot: bitly DevOps 2022, and you can find the full results there. You can also look at the surveys from past years, so it's interesting to see the trends year over year as well. And of course you're more than welcome to share your feedback. You can find me at @horovits. So if you have any feedback on the survey, on the talk, on my insights, or maybe on my misinterpretation of it, or anything else, feel free to reach out to me on Twitter, LinkedIn, Medium, whichever. I'd be more than happy to catch up. I'm Dotan Horovits, and thank you very much for listening.

Dotan Horovits

Principal Developer Advocate @ Logz.io
