Conf42 Kube Native 2023 - Online

Metrics that Matter - Moving from Easy to Impactful


Abstract

Metrics are the bane of many organizations, which fixate on measurements that don’t matter or that drive improper behaviours. In this session, we walk through a simple grouping for metrics, where the groupings not only call out the metrics but also their limits, and help guide you to better metrics.

Summary

  • Joel Tosi: I was tired of being the person that said all metrics were bad. People want metrics. They want to know if they're making good investments. We need to be able to guide better metrics for organizations.
  • When we commit to an idea, we want to minimize variability and deliver quickly. We have to be careful that we don't suboptimize the system at the cost of the whole. For quick groupings for organizations I work with, I want them to know where they're at.
  • Simple metrics are where a lot of organizations, if they don't have metrics at all, could find a good place to start. If you're in the simple space, getting to directional is better; beyond that are impactful and economic metrics.
  • Walter Shewhart came up with a way, using statistics, to distinguish between common cause variation and special cause variation. If you make a change, you would use Shewhart charts to say whether your change actually made a difference. Moving from simple and directional to impactful requires new thinking.
  • Many organizations want more predictability, but they don't monitor their variability. When you have high variability, predictability is out the window. Variability also leads to large queue times, people and teams waiting. The root of the problem is the variability.
  • Another area I work with extensively and do a lot of research on lately is this idea of cognitive load. Many, many teams have too much cognitive load. Getting better context closer to the team, closer to where the work is done, is the solution.
  • And lastly, I spend a lot of my time looking at social learning. By supporting social learning, we actually lower the dependency upon team members. The whole team learns together across skills and across contexts. And really look, metrics are always going to improve.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
This is Joel Tosi. We'll be talking about metrics that matter. There are the slides, all that fun stuff, all the social media stuff. Let's jump into it. All right, some quick background so you know where I'm coming from with this. I do a lot of work helping organizations, and of course, just as I'm sure you all are asked, people ask me all the time for metrics. For a while there, I was the guy that kept on saying, those metrics are bad. You can't just look at cycle time. You can't just look at burn down charts. You can't just look at CPU percentages. And I kept on saying, those are bad metrics, you need to do better. And then I realized I couldn't just be the person that said all the metrics were bad. I actually had to have some ideas on what was better. So anyway, that's where I'm coming from. I was tired of being the person that said all metrics were bad, and I wanted to be the person that said, here are some better ideas. Let's see what we can do about it. Quick takeaway for you: look, you can't ignore metrics, as much as we want to, as much as we want to say, make it easier, make it better. People want metrics. They want to know if they're making good investments. They want to be able to prove that ideas are working. So we need to be able to guide better metrics for organizations. When we can't measure what we should, we measure what we can, and then, of course, we optimize for the wrong thing. So let's do better.

All right, first thought for you, and then we'll jump into some fancy math. We're talking about orienting and grouping. I'll put this up for you real quick, just so we're all on the same page. This is a quick idea of a value stream. On the left, the customer wants something, a business hypothesis. On the right, the customer gets it and everybody's happy. In the middle, you can see some ideas around variability. When ideas are cheap, for example, before code has been written, deployed, or supported, we probably want to exploit variability. We want to take more ideas and find out if the ideas should be invested in before it's too late. Once we commit to an idea, we want to minimize variability and deliver quickly. We'll get through those ideas around variability a little bit more later. What I want us to think about, though, right off the bat, is that when we're thinking about metrics, we have to be careful that we don't suboptimize the system, that we don't optimize one aspect at the cost of the whole. For example, if we're focusing on how quickly we can deploy servers on that far side of the value stream, the deployment and operations side, if we're focusing only on the quickness of deployment but not on the whole value stream, we could be suboptimizing the delivery. So again, think about these things in context, and we'll get back into more of this later on.

For quick groupings for organizations I work with, I want them to be able to know where they're at and to be able to think about what could be better. So I did these groupings right off the bat. Let's talk about simple metrics. This is where a lot of organizations, if they don't have metrics at all, could find a good place to start: simple things that organizations can start capturing, easy to count, easy to collect. How many defects do we have? How many teams are doing automated deployments? What's our rate of delivery? Not very interesting, but very simple metrics that are very isolated in that value stream. Now, again, if you don't have any metrics, this gives you someplace to start.
Are they interesting over long periods of time? Probably not. They don't tell you a whole story. But if you don't have anything, knowing your defect rate would be a good start. Assuming you're already there and you want to get to a better space, you could look at this idea of directional metrics. If you take simple metrics and add a period of time, a time horizon, then we can start talking about directional metrics. Is our defect rate going down over time? Are we increasing our code coverage? Is there a percentage reduction in defects? Is our cycle time decreasing? So what we can say here is, given some investment or some focus, we've invested in infrastructure automation, has that actually reduced our cycle time? We've invested in test driven development or better test automation, has that significantly reduced our defects? So again, if we can start looking at causation and correlation over time, that's where I start seeing these directional metrics come in. If you're in the simple space, getting to directional is better. If you're in the directional space, you should be looking at what might be better still.

Before we get to what's better in that space, I love this book from Don Reinertsen, The Principles of Product Development Flow. If you haven't read it, I highly recommend it. What I love about Don Reinertsen's book, among many things, is this quote: when cycle times are long, innovation happens so late that it becomes imitation. If the cycle times for our teams are months and quarters or years, it's hard to say, why aren't you being innovative? You can't be innovative. You learn too late what the next things might be, and your ability to adapt to the market is out the window. So again, I put that up there for context: cycle times are interesting in the context of making better products.

And so if we think about this last grouping here, impactful or economic metrics, these metrics actually require intentionality. So not just cycle time reduction, but cycle time reduction for a delivery that actually mattered. Now, if you go back to that earlier value stream we talked about, where there was a little bit of product discovery and product framing and then delivery and operations, reduction of cycle time for a delivery that mattered is interesting because it actually goes across that whole value stream. It talks about finding products that are interesting and then being able to deliver them in an efficient and effective manner. Systemic cost reductions, lowering the cost of deployments or testing or the overall running of the business, stopping bad ideas, reducing queues, reducing toil inside of an organization: these are impactful and economic metrics. And these are more interesting than just finding out if we have fewer defects. These are more interesting than just counting how many story points a team is delivering. This is actually saying, is the work we're doing making a difference? And I find this to be super interesting. It's also very challenging for a lot of organizations, because to actually measure items that are impactful or economic, you have to agree, as an organization, on what these items actually mean. You'd have to agree on what a delivery that mattered actually is. You'd have to agree on what systemic cost reductions are out there. You'd have to agree on what you do with bad ideas and not just have sunk cost fallacies. You'd actually have to agree on why queues are bad and why toil is bad.
So you have to have this higher level agreement to actually get to this point. Again, if you have nothing, simple is good. If you have simple, directional is better. If you have directional, impactful is a good place to get to. That being said, how would you actually know, if you made a change, that these metrics were improving? How do you separate signal from noise? To actually move from simple and directional to impactful requires new thinking. And so this is where we actually need some math, and this dapper young gentleman here, Walter Shewhart. If you've heard of Shewhart charts, process behavior charts, or control charts, this is what we're talking about. Walter Shewhart came up with a way, using statistics, to distinguish between common cause variation and special cause variation. Put more succinctly, if you make a change, you would use Shewhart charts to say whether your change actually made a difference. Here's where it comes from, right? The way you actually deliver value is a system. How you go about building and deploying your products is actually a system. That's how you work internally. If you do nothing at all, a stable system will continue to deliver within a given range. You might have a delivery every two weeks. You might have a certain number of defects. You might have a certain market share. If you change nothing at all, you will get repeatable results within a certain range. So our goal is to not react to noise. Just because we invested in infrastructure automation, are we actually seeing a reduction in toil? Are we actually seeing more stable environments? So how do we actually know what to expect? Let's talk about how to do these.

Here's a quick example. Imagine you had a new product released. This could be product, this could be tech, this could be deployments, this could be defects, but in this example, we're just using a product that we're trying to sell. And your sales day to day kind of look like this: eight on the first day, six the second day, ten the third day, six the fourth day. You can kind of see how this plays out, right? If we were to graph that on a time graph, with time being the x axis and conversions being the y axis, it would look a little bit like this, the red line in the middle being the average over the period of time. Now, imagine that was our number of conversions per day, and on day eleven we had 14 conversions. Obviously, we would say whatever we did on day eleven was awesome. We should do that again. Whatever we released, the team should be celebrated and get raises. But hold on. On day twelve, there were only four sales. I guess we have to let that team go, because they're just not performing as well as we thought they would. But on day 13, it goes to eleven, and you can see how this goes, right? So if we didn't use any kind of analysis, and we said on day eleven we released a new version of the product, we might celebrate for no reason. How would we know whether it actually made a difference or not? That's where you use these Shewhart charts. Now, the math is relatively straightforward. You can see at the top here, around 15, there's a yellow dotted line, and then there's actually a dotted line at zero; those are the upper and lower bounds of the stable system. In essence, any values within the upper and lower bounds are going to happen naturally through common cause variation, not special cause. To calculate those upper and lower bounds, we find the average moving range.
We take the delta between day one and day two, between day two and day three, between day three and day four, and so on; those deltas are the moving ranges. We sum them up and divide by nine, because there are nine deltas, and that gives us the average moving range. Once we have that value, we multiply it by 2.66. Why 2.66? You can read the book yourself; for our purposes, it's just a constant. The story stays the same: we take that average moving range, multiply it by 2.66, add it to the existing average, the red line, and subtract it from the red line to get the upper and lower bounds. Now that we have the upper and lower bounds, you can see the upper bound is a little bit above 15. The lower bound would actually be negative, but we'll say zero, because you're not going to have negative sales. Now we know this system should produce between zero and 15 sales on any given day, and that would be normal variation. We can see that on day eleven, when we launched the new product, with 14 sales, it actually didn't matter. And on day twelve, with four sales, it didn't matter. On day 13, it didn't matter. That's common cause variation. The change we made did not matter. Kind of sad. But what's interesting about this is we can't celebrate changes that don't make a difference. And so we should use process control charts to actually say, are the things we're doing making a difference, and can we back it up statistically?

Key takeaways for you with this idea of Shewhart charts. Be intentional with what you're measuring; know what you're measuring and whether it's making a difference or not. More frequent data points obviously make this easier. If you're trying to see whether we're increasing the stability of our environments and you only get data points once a week, it's going to take a while for you to actually be able to predict stability. If you're trying to look at, say, defects, it's the same thing. If you're looking at product releases and you're trying to see if a new feature makes a difference, but you only check once a month, you're going to need more time to get enough data. So you need to figure out how to get more frequent data. Again, this process, process control charts, Shewhart charts, process behavior charts, whatever you'd like to call them, works for product releases, process releases, and tech releases as well. It's just math. It works. All right, know your actual problem, just like we talked about there.
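To make that limit calculation concrete, here is a minimal sketch in Python of the XmR-style calculation described above. Only days 1 through 4 (8, 6, 10, 6) and the follow-up days 11 through 13 (14, 4, 11) come from the example in the talk; the remaining baseline values are illustrative assumptions chosen to land near the limits mentioned (an upper bound a little above 15, a lower bound clamped at zero).

```python
# Minimal sketch of the XmR (process behavior chart) limit calculation.
# Days 1-4 and 11-13 come from the talk; the rest of the baseline is assumed.
baseline = [8, 6, 10, 6, 9, 5, 10, 7, 8, 6]  # conversions for days 1-10

average = sum(baseline) / len(baseline)  # the red line in the chart

# Moving ranges: absolute deltas between consecutive days (9 deltas for 10 points).
moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
average_moving_range = sum(moving_ranges) / len(moving_ranges)

# Natural process limits: average +/- 2.66 * average moving range.
upper_limit = average + 2.66 * average_moving_range
lower_limit = max(0.0, average - 2.66 * average_moving_range)  # sales can't be negative

def is_signal(value: float) -> bool:
    """Outside the limits suggests special cause; inside is common cause noise."""
    return value > upper_limit or value < lower_limit

print(f"Limits: {lower_limit:.1f} to {upper_limit:.1f}")
for day, conversions in [(11, 14), (12, 4), (13, 11)]:
    verdict = "signal" if is_signal(conversions) else "common cause noise"
    print(f"Day {day}: {conversions} conversions -> {verdict}")
```

With these assumed numbers, all three follow-up days fall inside the limits, which is exactly the "the change did not matter" conclusion from the talk.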
Now, I quickly went through some groupings: simple, directional, and impactful metrics. We talked about how to know whether the changes we're making in these metrics are actually provable. I want to leave you with some ideas of what I think are actually more interesting metrics than just cycle time, or even cycle time that matters, or reducing toil. Here's where my energy is most recently. This is a huge one for me, this idea of predictability versus variability. If you remember back to that first slide with the value stream, we talked about exploiting variability on the left side and minimizing variability on the right side. Many organizations want more predictability, but they don't monitor their variability. This ties in a little bit to the previous area, where we talked about process control charts and Shewhart charts.

What I want you to think about is that many organizations have high variability in the delivery side of the value stream, the build, test, deploy, release side. What this high variability looks like: think about when you go to test your next release. Is the test setup always predictable? Is the execution? If you run the tests over and over again, are the results the same? Is the setup of the data easy and predictable? Is the access consistent? What we see with a lot of organizations is that the tests are unpredictable in the value stream for product delivery. They're unpredictable because sometimes the lower environments are up, sometimes they're down. Sometimes the dependencies are available, sometimes they're not. Sometimes the firewalls are blocking things in lower environments, sometimes they're not. Sometimes the data is ready, sometimes it's not. Sometimes the data has changed. Now, when we have high variability on the right side of the value stream, the delivery side of the execution, and people are asking for more predictability, when will it be done, how long will it take, the problem is not getting more predictability. The problem is getting less variability. And this is something that I work with organizations on over and over, and you can do it as well, and I would hope that you do: help people realize how much variability they have. When you have high variability, predictability is out the window. You're all obviously very good at math. If 80% of the time your code works as expected, and 80% of the time all the tests pass as expected, and 80% of the time the build works as expected, and 80% of the time the deployment works as expected, and 80% of the time the environments work as expected, and you chain those five events together, the overall predictability is not 80%. It is 0.8 to the fifth power, which is about 0.33, so roughly a third. Does the work ever get through that whole chain of events without having problems? If you want to be predictable and you're doing your best guesses knowing that only about one out of three times the work is actually going to get through successfully, the problem isn't how do you get more predictability. The problem isn't add more process to become more predictable. The root of the problem is the variability. So we have to address variability. Variability also leads to large queue times, people and teams waiting, and this is very expensive. So again, think about whether we're worried about predictability or worried about variability, and make sure we're solving the right problem.
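As a quick back-of-the-envelope check on that chained-probability point, here is a small sketch. The five stages and the 80% figure come from the talk; treating them as independent success probabilities is an assumption for illustration.

```python
# Back-of-the-envelope: end-to-end predictability of a chained delivery pipeline.
# Stage names and the 80% figure come from the talk; independence is assumed.
stage_reliability = {
    "code works as expected": 0.80,
    "tests pass as expected": 0.80,
    "build works as expected": 0.80,
    "deployment works as expected": 0.80,
    "environments work as expected": 0.80,
}

end_to_end = 1.0
for probability in stage_reliability.values():
    end_to_end *= probability

# 0.8 ** 5 ~= 0.328: the work gets through the whole chain only about a third
# of the time, even though every individual stage looks "80% fine".
print(f"End-to-end predictability: {end_to_end:.1%}")
```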
Another area I work with extensively, and have done a lot of research on lately, is this idea of cognitive load. Many, many teams, especially now that everybody is DevOps and DevSecOps and everybody does everything, the sheer number of contexts that teams are grappling with is through the roof. You can see it in this example here, not exactly a great piece of code, through no fault of their own. The problem is the team is working with too many contexts. There's too much work happening inside here, where the code and the architecture actually aren't even quite set up right. And so teams have too much cognitive load. The repos and the code bases get large. You can see bad couplings across teams and deployments, and this is a cognitive load problem. So then the question becomes, how do we reduce cognitive load for teams? It could be rearchitecting; that tends to be what needs to happen quite a bit. It could be replatforming. But again, if we start looking at the cognitive load on teams, measuring it, and figuring out ways of reducing it, that simplifies the work teams do significantly. It makes the work flow more smoothly, it makes everybody's job less stressful, and we just end up in a better space. So again, I'm looking now at cognitive load on teams, how I capture it, and how I help teams lower their cognitive load. And that just makes everybody's day better.

I also look at information lead time quite a bit. This is really interesting for me, because the people closest to the work should decide the work to do and how to do the work, right? So in the very center of the bubble here, I've done this with a few organizations where we'll put this up visually, and we'll see, when a problem comes in, or an idea comes in, or the team themselves observes something interesting, how far up do they need to go to get an answer to a question: can we do this? Would this be a better idea? Is this an option instead? And so if you think about this, if the team needs to ask the manager's approval, and the manager needs to ask the business's approval, and the business has to ask the execs' approval to make a change that the team identified, then we actually have an information lead time problem. The team doesn't have enough information to make a decision close to the identification of the problem. Now, what can you do about this? This isn't about getting the execs to answer questions faster, or the business needing more autonomy. The root of how we solve information lead time is providing context further down into the circles, closer to the work. If you have an information lead time problem, getting better context closer to the team, closer to where the work is done, is the solution. It sounds very easy, and it is very difficult to do. We have to get the right context from the executives to the business around why investments are being made a certain way. It also gets into prioritization aspects. The business has to give context to management around the returns they're looking for and why they're identifying these opportunities to focus on right now. And the managers have to be able to bridge that gap to the team, and the team has to want that information and know what to do with it. They have to want that ownership of the product and the problem space. So again, information lead time is way more interesting to me than counting cycle time.

And lastly, I spend a lot of my time looking at social learning. Now, what I like about social learning is not only that it's better for the team, that we end up with a more skilled team; we've also discovered that by supporting social learning, we actually lower the dependency on individual team members. To be very clear, what I mean by social learning is that the team that is working on products learns together. This doesn't mean that just the engineers learn one thing and just the testers learn another thing. It means the whole team learns together, across skills and across contexts.
It might sound silly, but I can't tell you the number of times where a nontechnical person, a business analyst maybe, would say, I don't know why the code looks like that, can you explain it to me? And then, through explaining it to the business analyst, they ask a question about the product that then helps the engineer. And conversely, if an engineer is working on some kind of deployment and they're trying to explain things to a test engineer, and the test engineer says, but how do I test it? Like this, you have this nice bridge of understanding, and this idea of social learning just amplifies a team's ability to get work done. So I really like this idea of social learning. There's this item down here at the bottom; you can see it says diffusion index. The diffusion index is a metric that I actually use with learning teams, and it looks at the gap, not between the highest performer and the lowest performer, that's the wrong way of looking at it, but between the highest skilled and the lowest skilled on a team. What we mean by that is that teams and people self-assess, and we look at the gap between the people that self-assess their skills as the highest versus the people that assess their skills as the lowest. And what we found is that when we shrink that gap between the perceived highest skilled and the perceived lowest skilled across a team, across skills, we tend to have less reliance on a single person to make a decision. We tend to have less reliance on a single person to do a certain facet of work. And so, all of a sudden, those silos that exist within a team start to shrink. So, again, I love this idea of social learning. I love this idea of the diffusion index. I love this idea of measuring the gaps in perceived skills within a team and looking at how we address those gaps. Because, again, once we increase the capabilities of teams, we increase the capabilities of organizations, and now, all of a sudden, work is easier. People are less stressed out. We're not working as many hours, because there isn't one person waiting on another person to make a decision, who is waiting on yet another person to make a decision. Work gets more enjoyable, work flows more smoothly, people are less stressed out. And that is a wonderful thing.

So I gave you these groupings of metrics. We went through some math about how we know whether the changes we're making actually make sense. And then I ended with where my interest is lately. And really, look, metrics are always going to improve. Metrics are always going to get better. So always be thinking about what might be more interesting to you, to your organization, and to your team.

If you're looking for books, I love these books. The first one is by a good friend, Mark Graban: Measures of Success: React Less, Lead Better, Improve More. Mark is a wonderful person in the lean community, looking at statistical analysis and statistical controls. How do you actually get continuous improvement in teams? I've learned a ton from Mark. The Shewhart charts and the examples come exactly from this book: Understanding Variation: The Key to Managing Chaos by Donald Wheeler. If you're wondering why 2.66 is the multiplier, it's explained in the book. If you're wondering what happens if the charts are nonlinear, if they're exponential or parabolic, this book will get into how to handle those types of situations.
Again, we're looking for the story, and we're just looking at how we create those upper and lower bounds to separate signal from noise. And the last book there, The Principles of Product Development Flow by Don Reinertsen. Again, I can't recommend it enough. A good economics book, a good way of looking at U-curve optimizations and other types of metrics that are probably more interesting than just counting defects, counting deploys, or monitoring CPU uptime. Lots of good stuff inside there as well.

So to recap: you can help everybody get better metrics. Understand where you're at and how you can improve, always think about the questions you're trying to answer, and think about what other ways of getting there might be. Be careful with metrics. Make sure you see the same reality as the people you're sharing data with. Sometimes people don't see the same reality, so we need to talk to the data, not to emotion; make sure we see the same reality and are going to a better place. If you're making changes, make sure your changes actually matter. And fundamentally, maybe ask yourself: are we actually learning anything, or are our metrics just reinforcing what we already think and believe? That's what I've got. I'll be on Discord if you want to chat; love to hear what questions you have. Thanks for having me. The slides are at that link, and the slides are also with Conf42. Love to hear what other metrics you all have and what's working for you. Thanks much.
...

Joel Tosi

Co-Founder @ Dojo & Co



