Conf42 Incident Management 2022 - Online

Incidents: the customer empathy workshop you never wanted


Abstract

Organizations are focusing on incidents more than ever but failing to leverage them to their full potential. By framing incidents and post-incident reviews as customer empathy-building opportunities, we can facilitate more creative technical problem-solving, unlock improvements to the response process, and surface organizational agility that otherwise might have gone unnoticed. This talk will deliver actionable methods to increase customer empathy before, during, and after an incident.

Summary

  • Ryan McDonald is a responder advocate for FireHydrant. He says incidents have been a persistent part of his professional life. Working in customer success as a customer success engineer, directly with customers, has been enlightening.
  • When we focus on customer empathy, we can improve customer experience during an incident. This can lead to better working relationships with other responder organizations, and better processes and product improvements from incidents. Here are some concrete actions that have helped me drive these outcomes in the organizations that I've been part of.
  • Collecting and displaying customer impact prominently during an incident is the next thing that adds a ton of value. Oftentimes in incident response, the goal is not to resolve the issue or even understand it, but just to stop the bleeding, to stop the pain for our customers. This can directly impact business metrics.
  • The next point is to not just collaborate, but partner with your support organization. By proactively building rapport with support, you can help smooth over a lot of these things. Create really clean interfaces and clear expectations for communications with support.
  • When it comes to partnering with your support organization, try and let support behind the curtain more during incidents. Support can provide additional evidence through direct testing. Intentionally nurturing your support folks creates a pool of candidates that can lead to your next great hire.
  • The first thing is making space for all of the responders. Bringing them in to these post incident reviews or incident retrospectives can be a great way to ensure that you understand what their concerns are. These things can feed into both your processes and your product in ways that you might not expect.
  • For sufficiently large incidents, plan in advance on having multiple meetings or sessions to cover different themes. Incidents aren't optional in the complex domains where we live and work. Taking the time to really dig deep and learn about the different ways that things are or are not working can be key.
  • The last item here is to capture all of your improvements, not just the technical ones. By broadening our incident retrospectives, we can have deeper learning, better process, and better product improvements.
  • By partnering with your support organization, we can have more aligned and better responses. By cracking open your retrospectives, you can enable better process and product prioritization. I like to think about customer empathy as an organizational lubricant. Let me know how it goes.

Transcript

This transcript was autogenerated.
Welcome to Incidents: the customer empathy workshop you never wanted. I'm Ryan McDonald, responder advocate for FireHydrant. I am in the delightful position of getting to chat with FireHydrant's customers and the broader community about all aspects of incident management and response. It's a total hoot. So thanks to both FireHydrant and to Conf42 for the opportunity to be here. That's nice, I imagine you think, but how does someone get such a silly job? Incidents have been a persistent part of my professional life. Regardless of the domain, they seem to follow me. Consequently, I ended up falling in love with them. I started my professional life as an Outward Bound instructor, leading mountaineering, rafting, and climbing expeditions in the western United States for anywhere from 22 to 30 days. And as you can imagine, we had incidents. Thankfully, they were infrequent, but when we did have them, they tended to be pretty large affairs: life-and-limb emergencies, people being lost, search and rescue skills being required. So kind of a big deal. It was great. I loved that job and I loved that role. A quick joke: what's the difference between an Outward Bound instructor and a large Domino's pizza? A large Domino's pizza can feed a family. Consequently, I caved to capitalistic urges and ended up getting into tech. From my first experience in a software outage as an intern, I remember being both intrigued and, I must confess, a little bit confused and entertained. A deploy was botched and people were running around trying to roll it back. Meanwhile, I'm off on the side saying, so the website is down, but nobody's dying, no one's bleeding out, right? Transitioning from the notion of lost teenagers in the woods, compound fractures, or helicopter evacs to this world was something that seemed fascinating, both lowering the stakes and increasing the complexity and, sort of, the interest there.
This love of incidents continued through my time as an engineer in various process roles, and formally or informally, I ended up involved in or responsible for incident management training programs or other aspects of incident process. I finally doubled down on incidents and ended up as a founding member of the incident command group at Twilio. My time at Twilio was amazing and stressful and high impact and great, and over the course of that, I had the need for an incident management tool, and I was lucky enough to meet the folks here at FireHydrant. And as a lot of you know, the time and place that you need an incident management tool oftentimes is pretty stressful. So I was at Twilio for a while and then eventually left. Took a little bit of time off, and FireHydrant reached out and asked if I would be interested in a customer-facing role, which is not something that I had really considered; that was not on my career roadmap or bingo card. But working in customer success as a customer success engineer directly with customers has been an enlightening process. I've had the privilege and the opportunity to dive in with a ton of different organizations and learn so, so much. With that, I'm excited to share some of that learning with you all. So we'll go ahead and dive in. We'll start with a story that might seem familiar to some of you. So your day begins and you wake up. It's feeling like one of those above-average kind of days, right? The coffee is hitting you just right. Everything seems to be lining up. There's a strong possibility of this being a serious flow-state kind of thing, a real destroyed-the-to-do-list kind of scenario. So you sit down, crack open your IDE, and get into your first task. Not too much later, your phone buzzes. PagerDuty informs you that your product has other plans for your day. You remind yourself it's an above-average day. Things are going well. How bad can this be?
You crack your knuckles, jump in, and start to jump through the hoops that are required to verify what exactly is happening with your systems. You also promise yourself that you'll actually check into those transient alerts from two weeks ago. But lo and behold, after a little bit of digging, you realize this is a live one. This is not only blowing up your pager, it's also impacting your customers. Remembering that things are good, it's a good day, things are lined up, you go ahead and kick off the incident management process and start investigating the issue. You're getting real serious. Out of nowhere, some rando from customer support comes literally screaming into your chat. And you've probably heard these questions before, right? What's the impact? Like, just getting unnecessarily deep into something that you've only just started on. And at this point your chill is destroyed. Your day feels like it's trending towards dumpster fire. What started with such promise is proving to be just another on-call shift that you're not going to be enthused about. End scene. If this sounds familiar, take a moment; please enjoy this relaxing GIF. Many of us with time in seat will hear this story and think about process changes first, right? An industrial-strength RACI to clarify responsibilities. Maybe the introduction of a new role, some kind of incident commander type to wrangle stakeholders or serve as a firewall. While one would be justified to consider these kinds of changes, I would like to propose an alternate take. I would argue that the above story is at least partly due to a deficit in customer empathy. And if you're not familiar with the term, this is the action of understanding, being aware of, being sensitive to, and vicariously experiencing the feelings of your customers. So that's kind of a mouthful. And it does include words like feelings, which obviously feel like they fall into that kind of nebulous, squishy side of things in our land.
And why am I talking about this in the course of incident management? Shouldn't we be digging into technical solutions to high-severity, ARR-compromising incidents? What can customer empathy possibly bring to the table when we start to put it into focus in the course of our incident management process? I would argue quite a few things. I think there's a handful of outcomes that we can land. So when we begin to focus on customer empathy, I've observed the following outcomes: improving your customers' experience during an incident, better working relationships with other responder organizations, gaining deeper learning from those incidents, and better processes and product improvements from your incidents. So I'd like to share with you all some concrete actions that have helped me drive these outcomes in the organizations that I've been part of. While none of these are mind-bending, I do find that when framed through the lens of customer empathy, we'll end up landing in a few places that might not be typical or intuitive when you're thinking about learning from an incident. Let's dig in. So first, missed expectations are the basis for basically every relationship issue that's ever been. So naturally, it shouldn't be any different with our customers or with incidents. The first thing that we can do is get more exposure to the product. So take some time and actually use your product, and encourage others to use the product as the customers might. This doesn't have to be a deep dive. You don't need to become a power user of your product, but just digging in and trying to get a sense for the basic workflows of your product can add a ton of value. And we'll get into this more later as to what that value actually is.
But if you don't have the option to use the product, there are some alternative routes to get that kind of experience of the day-to-day usage of the tool: you can ride along with folks in the customer-facing orgs, right? So whether that's customer support, customer success, or user research, there are a ton of options. All of those folks are going to have a sense for how your product is being used, common use cases, and also common friction points. Whatever avenue it is that you choose to take, it really is all about gaining that fluency in your tool. This can be invaluable when it comes time to serve as the translation layer from whoever is reporting that issues are taking place to figuring out what's actually happening. It can also really help during an incident when you're trying to more acutely describe what that impact is to a broader audience. So collecting and displaying customer impact prominently during an incident is the next thing that I've found adds a ton of value. And this is one of those things that is nice to say, but can be very difficult to prioritize in the course of an incident. And I've found that it helps to bifurcate efforts initially: one person digs into what the issue looks like from the back end, and the other tries to clarify what the experience is like for customers. Building space into your incident response process to collect that customer impact and then display it can also take a huge load off of responders throughout the course of an incident. For stakeholders and other responders, oftentimes this is the first question that they ask. So that's sort of the no-brainer side of things. But there is also this idea that if we keep this prominently placed in front of everyone, it can help frame the experience. So as we add more people to the incident, the fact that customers are experiencing pain is front and center.
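As a sketch of what "collect and display customer impact" could look like in practice, here is a minimal, hypothetical impact record a responder might fill in and pin in the incident channel. The field names and example text are my own illustration, not a feature of any particular tool:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CustomerImpact:
    """A small, pinnable record of what customers are actually feeling."""
    symptom: str                 # what customers see, in their own words
    who: str                     # which customers, or what fraction
    since: str                   # when the pain started
    workaround: Optional[str] = None

    def summary(self) -> str:
        # One line, so it stays readable when pinned at the top of a channel.
        line = f"IMPACT: {self.symptom} | WHO: {self.who} | SINCE: {self.since}"
        if self.workaround:
            line += f" | WORKAROUND: {self.workaround}"
        return line

impact = CustomerImpact(
    symptom="Dashboard loads time out after ~30s",
    who="~15% of EU customers",
    since="09:42 UTC",
)
print(impact.summary())
```

The point of the structure is the prompts: a responder filling it in has to answer "who is hurting and since when," which is exactly the framing the talk argues should stay front and center.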
This isn't a technical exercise. This isn't some kind of logical problem. The people who pay us are experiencing issues, and here's what those are like. And I think centering and grounding responders in that idea can help make the experience more relatable. It also can help increase the urgency. It can be easy for these sometimes to turn into long sojourns, trying to understand a technical problem instead of thinking about mitigation, which is our end goal. Oftentimes in incident response, the goal is not to resolve the issue or even understand it, but just to stop the bleeding, to stop the pain for our customers. One other thing that we can do to help us understand customer expectations is to consider something like the field of chaos engineering, and what kind of value we can get from looking at our application when we inject faults, when things are going poorly. I'll even go so far, sometimes, as to encourage people, if you have downtime during an incident, to use your application in a degraded state. Other folks can speak more deeply to implementing programs like this, but really using your tool when it's not at its best is a great way to help build and develop customer empathy. So by understanding our customers' expectations and what their experience is like, we can drive urgency and ensure that we stay focused, making incidents potentially suck less for customers. And obviously mitigation, trying to drive that notion of mitigation, is a part of that. But really, when you get down to the dollars and cents of it, this can directly impact business metrics. Net retention and ARR are things that can suffer if customers don't feel like you understand what they're going through on those bad days and are responding accordingly. Upsells can depend upon that sentiment. And really, at the end of the day, as a customer, it's hard to stick with an organization that you don't like, right?
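To make the chaos-engineering idea a little more concrete, here is a toy fault-injection sketch in that spirit: a decorator that makes a code path fail some fraction of the time, so the team can deliberately experience the app the way a customer would during an outage. The function names, error rate, and exception type are all illustrative assumptions, not part of any real chaos tooling:

```python
import functools
import random

def inject_faults(error_rate=0.1, exc=TimeoutError, rng=None):
    """Decorator that randomly raises an exception, simulating a
    degraded dependency during a chaos-style game day."""
    rng = rng or random.Random()
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if rng.random() < error_rate:
                raise exc(f"injected fault in {fn.__name__}")
            return fn(*args, **kwargs)
        return wrapper
    return decorator

# Exercise the "send" path with a 100% fault rate so responders feel
# exactly what customers would see when this dependency is down.
@inject_faults(error_rate=1.0)
def send_message(payload):
    return "sent"

try:
    send_message({"to": "customer@example.com"})
except TimeoutError as e:
    print(f"degraded-state drill: {e}")
```

In a real program you would inject faults at the infrastructure level rather than with a decorator, but even this toy version forces the question the talk cares about: what does the customer actually experience when this fails?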
And feeling like your responders understand what's happening is a great way to ensure that customers are feeling heard and understood, and that all of those comms really can land for them. Awesome. So our next point is to not just collaborate with, but partner with, your support organization. And to start with, customer support is hard. In the story earlier, I painted customer support as a boogeyman of sorts. They emerged out of the ether to pull us from that delightful flow state, and that's not an entirely fair characterization. Customer support is exceptionally difficult. Before we dive in, I've got a hard-hitting analysis, the kind that you come to these conferences for. I have captured an actual request from a customer-facing team during the early stages of an incident. Get ready. All right. Okay, so obviously, the initial take here is a little heavy-handed, right? It's a little aggressive. Thankfully, with the power of technology, we can break this down into its component parts. There's nuance here if we really get into the details. If we look at this frame by frame, what initially comes across as anger, or maybe even aggression, we realize quickly devolves into desperation. The fact of the matter is that customer support, customer success, and other account executives are under immense pressure during incidents. They are the middleman in a bum deal, where they have very little to no control and have to serve as a sponge for customer angst. So let's dig in a little bit here and see what we can do to make some of these interactions a little bit more productive. First, by proactively building rapport with support, you can help smooth over a lot of these things. If you remember, earlier I encouraged the idea of ride-alongs with customer support. Surprise: little do you know, those are rapport-building activities, right? Digging in and getting interested in their domain can go a long way towards helping to build that initial relationship.
Not only that, but there are a lot of informal opportunities: engaging with the support organization during larger events, inviting them to happy hours, off-sites if possible, all of those kinds of things. Another way that you can think about building that reciprocal vibe with support is to figure out ways to leverage them inside of your company, outside of their support job. So for example, are there ways that support could be leveraged for internal or even customer-facing trainings? An example: support at SendGrid ran our product orientation. All new employees were required to go through this orientation, where these folks would do a deep dive into how the app works and why it does what it does. And by doing that early and setting up support as kind of a tent pole inside of the organization, one that really deeply understood not only the tech but the customer experience, it gave them the opportunity to be more available to everyone else. So we were able to build a ton of rapport by putting those folks in a situation where they could flex their skills, just in a different context. And then lastly, I can't speak for everyone in support, but support is a great way to get into tech with a fairly limited background in technology. Consequently, a lot of these folks are excited and hungry for mentors. Those of us that have been in the field for a while can provide an amazing resource. Just sitting down and grabbing lunch with people, right? Chatting about what their goals are, where they want to go with their career. Anywhere in that spectrum, really, these are all great options for building rapport with support. So once you've taken the time to get to know your support coworkers, the next thing that you can do, especially in the context of an incident, is to create really clean interfaces and clear expectations for communications with support and other stakeholders.
So by setting expectations in your process for when you will try to have descriptions of the impact to customers, you can help avoid those frustrated confrontations like we've been talking about. Support's primary goal is to manage the expectations of customers, and input from engineers is a huge currency in that, helping them really describe and empathize with customers and the experience that they're having. The other thing to consider is that once you set those expectations, sometimes they're hard to meet. So don't beat yourself up over it, but instead communicate. There's nothing worse than a vacuum in the middle of an incident. So don't put support folks or customer-facing groups in the position of having to come bother you or dig into a space where you're doing some kind of technical investigation. Communicate as much as you can. Let them know: hey, we don't have something yet, but we will, or we hope to in 30 minutes, and we'll check in sooner if we've got something. Another added benefit of these types of regular communication cadences is that the broader stakeholder group can feel supported as well. Execs love feeling like they're in the loop, and obviously the careful dance is to keep them in the loop, but just far enough away. These regular communication cadences can help them build confidence and avoid dropping into the middle of some kind of deep technical issue. All right, and then last here, when it comes to partnering with your support organization, and I alluded to this earlier a little bit: try and let support behind the curtain more during incidents. I think it's easy to think about support as a stakeholder, or someone who simply needs information. Consequently, they can end up feeling like they're playing second fiddle to the responding engineers or other responders.
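One way to make the "never leave a vacuum" habit concrete is a status-update template that always says something and always commits to a next check-in time, even when there is no new finding yet. The function, wording, and timings here are hypothetical, a minimal sketch of the cadence described above:

```python
from datetime import datetime, timedelta

def status_update(impact, eta_minutes=30, now=None):
    """Format a periodic stakeholder update. The key habit: always say
    *something*, and always commit to a next check-in time, so support
    and execs never sit in a communication vacuum."""
    now = now or datetime.now()
    next_checkin = now + timedelta(minutes=eta_minutes)
    body = impact or "No confirmed impact details yet; investigation continues."
    return (f"[{now:%H:%M}] {body} "
            f"Next update by {next_checkin:%H:%M}, sooner if we learn more.")

# Example with a fixed clock so the output is predictable:
t = datetime(2022, 6, 1, 12, 0)
print(status_update(None, now=t))
print(status_update("Email sends delayed ~5 min for ~2% of senders.", now=t))
```

Even the "we don't know yet" message carries information: it tells support that someone is on it and exactly when to expect the next word, which is most of what they need to manage customer expectations.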
By bringing support closer into the fold and engaging them in the process, there can be a bunch of win-wins. The only caveat here that I think is worth calling out is that oftentimes the models for measuring productivity in support can be pretty substantially different than in engineering. Oftentimes that looks something like how many tickets are being passed through your queue in a given period of time. So I think it's worth advocating both with and for support, once you understand what that model looks like, to ensure that they have the freedom and the flexibility to engage strictly in an incident and not be distracted by other work. So really it's advocating for them inside of your organization to have them brought in, and describing how valuable and how useful they can be. So once you've done that and you've helped them achieve that level of focus, so that they can show up as their best selves during that incident, what can you do? How does support actually fit into the response? I think there are a number of different ways. Not only handling tickets that are coming in and ensuring that the correct communications are going out to those impacted customers, but support can also provide additional evidence through direct testing. These are folks that are expert at using the app from a customer's perspective. So by bringing them in, you can have them test different cases, which adds data and information to the quiver that you can draw on throughout the course of an incident. Additionally, they can pull in information about the impact from customers. Direct customer reports sometimes can help you avoid red herrings and focus your investigation efforts. Lastly, support can also help in testing as you begin to roll changes out to mitigate issues, getting them in there in lower environments and having them play with changes behind feature flags.
However it is that you all have things set up, they can be a person to go out and verify, which can take some energy and some of the attention off of your responding engineers' plates. There is actually a bonus outcome from this increased interaction and rapport that's being built with support: they can actually become a secret hiring bench. If you work in a high-growth company, there's no end to that company's appetite for a whole variety of roles, including program, project, and product managers, incident commanders, customer success managers, even potentially engineering, like more junior engineers. A sufficiently senior support person can bring so much domain and organizational knowledge to roles like these that no outside hire could ever dream of supplying. I have so many friends that started in support and are now adding a truly mind-blowing amount of value in the organization that they came up in and grew up in. So intentionally nurturing your support folks creates a pool of candidates that can lead to your next great hire. By partnering with your support organization, it'll feel so much better when you meet those folks on bad days, in those incidents. Everyone knows, right, the worst time to meet someone is in a crisis situation. So figure out the nuances of how your teams work together and what does and doesn't work; try and start that stuff before you're in the middle of an issue. The other added benefit here is that when folks feel like they're collaborating and working as a team, you can have far more aha moments. The number of times, for example, at SendGrid, where our deeply engaged support team would come to us with findings that helped accelerate our investigation efforts? It happened frequently. So give yourself the opportunity to take advantage of these collaborative aha moments, of having a well-oiled team instead of an adversarial group.
And the last point here: broaden your incident retrospectives. Retrospectives for incidents can sometimes be thought of as strictly technical exercises. If you are digging in and a root cause analysis or some technical finding is your only goal, you may be leaving quite a bit on the table. So here are some ways that we can help broaden that idea of what a retrospective is for and how you use it. The first thing is making space for all of the responders. So after you've engaged customer support in more of a responder role, bringing them in to these post-incident reviews or incident retrospectives can be a great way to ensure that you understand what their concerns are, what their experience was like, and in turn, what kind of feedback they were getting from customers. And all of these things can feed into both your processes and your product in ways that you might not expect. If getting those folks into an incident retrospective is challenging, which it is (scheduling retrospectives is difficult; it's hard to even get the core group of responders, oftentimes, in an organization that's fairly busy), you can always consider doing a quick check-in with them, even an async check-in, just to get a sense for what their experience was like, and then bring that experience as a proxy to the broader group of responders during the actual retrospective activities. Next item: incident retrospectives, plural. This can be a bit of a hot take for some folks, but for sufficiently large incidents, plan in advance on having multiple sessions, whether those are meetings or async exercises, to cover different themes in the course of the response. So, for example, have a technical or architectural deep dive, trying to understand the root cause if that's a thing that you're into, but then also have a session to think about the broader process.
How did you interface with the broader organization? Or potentially another session focused expressly around customer communication: what did that cadence and that pattern look like for folks? Incidents, they're not optional, right? Not here in these complex domains where we live and work. So taking the time to really dig deep and learn about the different ways that things are or are not working can be key during these non-optional investments. The last item here is to capture all of your improvements, not just the technical ones. I think it can be easy to focus on action items to make the application better or to keep these types of incidents from happening again. But at the end of the day, there are a whole bunch of things that you can take away from an incident that are far beyond that. So, for example, how do we think about our cadence of communication with customers? Are there opportunities to improve our products so that if something like this were to happen again, we could degrade gracefully instead of failing outright? How are our error messages? Did things make sense from a customer perspective during the course of the incident? Making sure that that messaging is really clean and clear and crisp can be helpful, not only for customers, so they feel confident that something is happening on the other side of the screen when something fails, but also for ensuring that it's a good signal to the folks internally who might be raising it to the people responding to the incident. So all of these things, whether they're small changes to the application, changes to our process, changes to our architecture, or broader organizational structures, can get loaded into various backlogs and can be invaluable during prioritization exercises.
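To make "capture all of your improvements, not just the technical ones" concrete, here is a tiny sketch of a retrospective follow-up record whose category field deliberately goes beyond "technical." The categories, field names, and example items are my own illustration of the idea, not any particular tool's schema:

```python
from dataclasses import dataclass

# Illustrative categories: the point is that process, communication,
# and product follow-ups get captured alongside the technical fixes.
CATEGORIES = {"technical", "process", "communication", "product"}

@dataclass
class ActionItem:
    title: str
    category: str
    owner: str = "unassigned"

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")

def by_category(items):
    """Group retrospective follow-ups so prioritization exercises see
    the non-technical work alongside the technical fixes."""
    grouped = {}
    for item in items:
        grouped.setdefault(item.category, []).append(item)
    return grouped

items = [
    ActionItem("Fix retry storm in sender service", "technical", "alice"),
    ActionItem("Add status-update cadence to the runbook", "process"),
    ActionItem("Clarify the error message shown on send failure", "product"),
]
print({k: len(v) for k, v in by_category(items).items()})
# → {'technical': 1, 'process': 1, 'product': 1}
```

Forcing every item into one of these buckets is a cheap way to notice when a retrospective produced only technical actions, which is often a sign the broader themes above never got discussed.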
So engaging folks like your PMs and TPMs, the folks that serve as kind of the glue of a lot of your organization, and ensuring that they have a clear understanding of what happened and what findings came out of those retrospective exercises, can be exceptionally useful. So by broadening our incident retrospectives, we can have deeper learning, better process, and better product improvements. From that, I like to think about incident retrospectives as a flywheel for improvement. By considering more perspectives, especially those that are closer to customers, you're opening up that aperture, capturing improvements to processes and products that otherwise might never really have been considered. All right, so we have a handful of actions and some outcomes we'd expect. By understanding the customer's expectations, we can create more satisfying incident experiences. By partnering with your support organization, we can have more aligned and better responses, right, that feel better, that function better. And then lastly, by cracking open your retrospectives, you can enable better process and product prioritization, capitalizing on valuable and potentially expensive learning opportunities. I like to think about customer empathy as an organizational lubricant, right? Using incidents to drive this mentality, and then better leveraging your support organization, can add all manner of value. So give a few of these actions a try, and feel free to hit me up. Let me know how it goes. Here are my socials, and I'd love to chat with you all on Discord about any of this stuff. So thank you again for the opportunity to chat with you all, and looking forward to next time.

Ryan McDonald

Responder Advocate @ FireHydrant



