Video size:

0:09 Miko Pawlikowski

Hello and welcome to another episode of Conf42Cast: The Tech Podcast From A Neighboring Galaxy. My name is Miko Pawlikowski and today I'm super excited to have not one but two amazing guests. With me today are Christina Tan, the strategy at Blameless, and Emily Arnott, blog content writer at Blameless. Hello to both of you. How are you doing?

0:33 Christina Tan

Doing great.

0:33 Emily Arnott

Doing pretty well.

0:35 Christina Tan


0:36 Miko Pawlikowski

Thanks so much for coming. Thanks for finding time for this, busy schedules and all. So as you might have noticed, it's our new tradition to confuse our guests with a first question. And yours for today is, for each of you, if you could live anywhere in our Solar System at all, where would you be and why?

0:55 Emily Arnott

I'd like to live on a comet, maybe one of the comets in the Kuiper belts. I just think that'd be a lot more exciting than any planet with like a boring elliptical orbit. Instead, you get a really cool like zany, 20-year cadence fly around the Sun type thing, seems pretty cool. Seems pretty chill too. Comets are nice and cold.

1:14 Miko Pawlikowski

They are. Don't need an atmosphere?

1:18 Emily Arnott

You don't have an atmosphere. How can you have disastrous climate change? It seems pretty... It's a win-win.

1:24 Miko Pawlikowski

No atmosphere, no problem. I like that.

1:28 Christina Tan

Breathing is optional.

1:30 Miko Pawlikowski

Definitely. What about you, Christina?

1:33 Christina Tan

I want to live on Pluto. Because it's the idea of having a more objective view of the Solar System rather than be in the middle of it. There might be some interesting observations from the outside.

1:45 Miko Pawlikowski

That's fair enough. That's a very good answer. Thank you both. Alright, so hopefully, you're confused enough by now. So I wanted to talk about quite a few things. And, not the least of them, the talk you're giving Conf42 SRE, about the elephant in the blameless war room, accountability. And there's a lot of very, kind of, charged words here, "blame" and, well, I kind of helped with the "less" part of it, and accountability. So can you talk a little bit about, to begin with, what is really accountability? How you define it and why it's important to see it certain ways.

2:26 Christina Tan

Absolutely. Accountability can be defined, regardless of our definition, it can be interpreted in many ways. So, I think in a traditional sense, people, when they hear accountability, they think of holding someone accountable. It typically implies some negative punishment or outcome as a result of an event. And it's a very fear-inducing word. And it also implies specific attribution to a specific person. And, so if you're holding something or someone accountable, there's usually a person at the end receiving end of that. And that is, you know, traditionally how we think about how some leaders are, you know, interacting with the team when it comes to software reliability issues. Now, we know that many companies have advanced beyond that. But at the end of the day, wanting to blame and wanting to find a reason, to explain why something went wrong, is a very human thing to do. I think blamelessness is very much an ideal that we can all work hard to move towards, but never quite complete, much like perfection. And that's a good path to be on. So, you know, in this talk, Emily and I try really hard to define accountability in a way that is more constructive, that makes space for psychological safety. That makes it possible for things to be blameless, which is really, you know, defining it as taking the ownership to improve things moving forward. Rather than trying to change the past of an unwanted outcome.

3:59 Miko Pawlikowski

Okay. And if you can maybe give like a sneak peek? How do you achieve that? Because, like you said, it's kind of a natural tendency, you want to understand what happened, right? You want to make sure that that doesn't happen again. And there's always this little voice in your head that says, 'Okay, so whose responsibility was it to provide that', right? So can you share some pointers on how you can achieve a blameless kind of culture, I guess, is the word here.

4:28 Emily Arnott

The main thing we try to focus on is empathy. That when people have blameful behaviours, when they're looking to point fingers, when they're looking to find who's responsible and punish them, they have goals behind those things that probably overlap with anyone else's goals that are involved. And probably the stress that they're going through. There might be some shame, some frustration, a lot of fear. Those things are a common ground held by everyone too. So, from that empathetic perspective, we then kind of work through, first, what kind of perspectives do they have on the incident that leads them to making certain assumptions on how blame will affect the outcome of the incident? And then once we've kind of built up their worldview, we crossed the bridge to a more reasonable worldview, that essentially espouses systematic changes over individual changes. So, instead of firing someone, instead of blaming someone, look into why the mistake was made, look into what resources were maybe out of date or unavailable. Look into flaws in the system that can lead to misunderstandings. And if you kind of do this, and you present this as an alternative to blame, then you'll be able to alleviate their concerns and alleviate their stress without needing to resort to the blame for behavior.

5:43 Christina Tan

And very much, one of the things I loved was when Gary Pulaski, who used to run reliability at Evernote, he would say, "It's not about who tripped over the power cord. It's about asking: How come the power cord was there for somebody to trip over in the first place?". And you know, sometimes, for leaders, depending on their background and training, may not understand enough of the nuance of the system to be able to come up with solutions that have greater granularity or nuance than kind of working with just the people. So people might be the smallest kind of the unit of dial, if you will, to make change, when really there are other ways. For example, if you mix up testing and production environment, rather than find the engineer that makes that mistake, maybe you can make the distinction between the two environments a lot more obvious.

6:35 Miko Pawlikowski

Yeah. It really resonated with me when you use the word "empathy". Because I think it's very undervalued these days. And I think, partly because of how with social media it's very easy to kind of reduce a person on the other side of a Twit to this one single feature and this 140 characters because you're not facing them face to face. So do you think that is kind of also a factor of what makes it a bit more difficult for the companies who have to deal with that kind of culture, just just being replicated from the society at large?

7:14 Emily Arnott

Yeah, I think that's a really good point. That's a good observation. And as more and more we work remotely, we work through Slack, people are being reduced to textual conversation, or very purpose-driven, ad hoc zoom interactions. And it becomes difficult to remember that the person in the other position has just as full of life as you, all sorts of personal concerns, all sorts of stressors. And then it can be very isolating, you might end up thinking, 'I'm the only one that understands this properly, I'm the only one that knows how severe this is'. And then from that isolated perspective, it's very easy to start assuming the worst from people and assuming intentions, where they might be totally innocent. Where they might just be trying to solve the same problems as you. So we really do think that empathy is the foundation of all of this. That you're not going to be able to get anywhere until you understand why people are making the choices they're making.

8:10 Miko Pawlikowski

And for everybody who can't see Emily right now, she has this amazing pillow of a very large cat that immediately makes one wants to bond over that cat pillow. If it was just a Twit, you know, we wouldn't have that connection. So, just to illustrate, you know, the point that you're making, which I totally agree with. But that brings me to the question of, all of that sounds like a moral cultural kind of shift rather than technological. In which case, I would expect this to be addressed more through training and books and presentations, rather than necessarily a company. So, can you explain a little bit how Blameless as a company, and by the way the name is really cool, actually plays into that and what it offers.

9:07 Christina Tan

Yeah, happy to. So, Blameless is a company that really speed up the maturing of other companies' reliability practices. And we do this by informing the people who are responsible for reliability on how to better, more effectively interact with each other for all things reliability-related. And I think, because of that, because this is where our product is oriented. While it is about the system, ultimately, the goal is to serve the people who are using the system. The company culture and mission and the set of activities that we do are geared towards that too. So Blameless is not just a product company. Blameless is also a company consisted of everyone who's really passionate about fostering and evangelizing for blameless culture at other companies and that other pillar is very important. Because our mission is to be the most trusted company for optimizing software reliability and team resilience. We really want to bring the focus back to the socio of the socio-technical system.

10:12 Emily Arnott

So, it's like a very practical example, we, as we were saying before, it's all about looking for these other contributing factors, rather than just pointing fingers at one another. And Blameless can really easily facilitate this, because it organizes all sorts of contextual information around an incident. It helps situate the incident in the context of the impact that it has on business. So that it's so much easier to point at any cause besides an individual and make meaningful changes instead of just trying to fire someone.

10:41 Miko Pawlikowski

That makes sense. But the first thing I actually saw when I went to your website was end-to-end SRE platform.

10:48 Christina Tan


10:49 Miko Pawlikowski

And that makes me wonder, you know, what else is on offer to make it end-to-end, rather than just tools for more context, essentially. Because I think this is really what you're describing here. For more context around all of the incidents.

11:03 Christina Tan

Yeah. So, end-to-end really is the description of our approach to solving, leveling up companies' reliability practices. So, Blameless is not just about retrospectives, post mortems. It's not just about SLOs. It's not just about incident management. But really, we want to treat reliability as a team sport by bringing together reliability practices, believing that the whole is greater than the sum of its parts. So, I can talk about the parts. The parts, you know, we have a Slack bot that helps you structure the incident response process to make it more systematic. So essentially, automating some of the playbook. We also have a blameless post mortem, which we changed to retrospectives to respect kind of what's happening with a pandemic. To help, you know, teams do the retrospective in a structured way and look at consistently, what are the customers that are impacted? What are the services that are consistently impacted? And how that may affect the prioritization decisions on the leadership's part as they think about planning for headcount. And so that goes into the reliability insights, which is our data visualization piece of the product. And there's also SLOs, which really is connecting operations work to customer value and customer impact. And all these are feed into each other. These components don't work in isolation, but help boost the effectiveness of the other components into. So, we want to help companies better respond, learn and also adapt through more through enhance reliability practices.

12:42 Emily Arnott

One of the things we really have cored to our philosophy is that incidents are learning opportunities. That when something goes wrong, it isn't just a purely negative thing, but it's kind of an investment in making your system stronger going forward. So all of our tools are kind of oriented around getting the absolute most you can out of every incident.

13:00 Miko Pawlikowski

Definitely. So it makes me wonder, so who's the target audience that you offer blameless to? Is it the developers, the SREs, the leadership? Who is it really for?

13:13 Christina Tan

It's for engineers and leaders who are responsible for reliability. And also, you know, as an outcome of that, reliability is a company-level initiative. That's why, you know, when we talk about SLOs, we love talking about SLOs as a foundation of conversation between the product team, the development team and the operations team. So there are stakeholders that are users too, that will come across or benefit from the Blameless platform. But ultimately, the people that we're selling to are the leaders and engineers who are responsible for reliability.

13:48 Miko Pawlikowski

That's awesome. And since we have you, your unique perspective as a strategy person at Blameless. Can you give us a sneak peek of what's next for the company and where you're going? Where are you heading?

14:02 Christina Tan

Well, yes. We just raised $30 million for our CSB. So we're definitely in a phase of growth and hiring more great talents to realize the vision of really elevating the reliability practices. And so what that really means, I think that's why Emily and I are both so personally passionate and invested in the mission, is we want to continue to empower the people. And that's really a different approach and also strategy, compared to a lot of companies in the market today. That's very fixated on the system, on the data. But at the end of the day, it's people that are using the system and are working with the softwares. So we want to help them show the business value of their work. For example, I remember talking to the head of SRE at Uber, former head of SRE at Uber. And I asked him, "What does your career progression look like?". And he said, "It actually doesn't go much further beyond this", because he's not really set up to, for success to take any higher positions of leadership beyond head of SRE. And that's a very poignant point. And he would say that, you know, SRE is like the plumbing of the internet, when things go well, nobody notices. The second something goes wrong, all the executives are hovering. And I've seen, at other companies, where the head of infra is frankly, the bearer of bad news, 'We just had another incident, this just happened, here are the customers impacted'. That's a terrible position for executives to be in. And so, think about the power it has to give them visibility and to make it easier for reporting, to share the good news, all the improvements that are happening on operations teams. For example, look at how much faster we're responding to incidents, look how much we're learning from these post mortems or retrospective. Look at all the action items that came out of it as a result and all the changes that we've made as from these learning opportunities, like Emily says. So, that's the path that we're on that we'll continue to pursue.

16:09 Miko Pawlikowski

That makes a lot of sense. And, frankly, that's really good to hear. One of the realizations that I had, as we were speaking, was that we're throwing around a lot of SRE turns, you know, the SLO here and SLA there, and a post mortem and incident and all of that. Would you like to talk a little bit about where you see the entire field of SRE going and evolving and where you expect it to go? What direction? As a content writer, I'm pretty sure you have strong opinions about that, Emily. Can you share some of your thoughts of what grinds your gears and what you're happy is actually happening with the field as a whole?

16:56 Emily Arnott

Well, what I'm finding is that companies are always kind of following the same sorts of maturity curves. Where at first, they're maybe aware of SRE principles, but they don't really know how to access them. And then it takes quite a bit of growing from the ground. It takes a lot of champions, a lot of advocates to just start slowly implementing practice by practice, until you can get to a position where you have very sophisticated service level indicators that reflect user happiness. You have retrospectives that are totally comprehensive and serve as this wonderful learning library. And all these other really idyllic things that SREs are always preaching about. And I think what kind of bothers me is that there's a big gap there. Were if you look up SRE stuff, it's usually being written by people like Google or Netflix that have huge resources and a lot of support, and a lot of investment in these things. And they kind of assume that anyone reading it will have those resources as well. And I think that makes SRE very intimidating for people. And it takes them a while to realize that no, this is something achievable by organizations of every size and every maturity. And so I'm hoping that in the future, there's going to be a lot more content around growing into SRE, instead of just what it'll look like at the final stages, to help bridge that gap and to help SREs spread wider and wider. Because I think it's not going to get any less important, software reliability. I think COVID is an excellent example of where software and just global crises interact. Like if you are relying on an application to understand the infection rates in your area, whether it's safe to go out, whether you can go get vaccinated, these are services that we cannot have fail. Likewise, for upcoming things with like global warming or other pandemics, or political things. I feel like software is going to have more and more of an integral role in these major world events. So I think every organization needs to get on board with the idea that reliability is feature number one.

19:09 Miko Pawlikowski

Definitely. I think it really resonates with me what you said about people getting intimidated, because they're not Netflix-scale, or they're not as big or potentially as advanced. I saw that a lot with people tipping their toes into chaos engineering, and immediately thinking, 'Oh, yeah, but we don't have that scale and we don't have that much engineering'. And it was one of the biggest challenges to convince them that they can find good value even without reaching that scale, potentially, ever. But one other thing interested me, you mentioned the pandemic and I'm curious how Blameless survived and thrived during that difficult time. What were what would you say were things that made it, basically, adapt very quickly.

20:03 Emily Arnott

I think, probably our biggest strength is that we already had a very remote team. So I'm based in Canada, just near Toronto. Whereas most of the team is out in San Francisco. So saying, I think one of the best strengths of Blameless is that we've always had a very remote and distributed team. So I'm just outside of Toronto and Canada, whereas most of the team is located in San Francisco. So right from the start, we needed to have like the infrastructure to connect people remotely, not just in terms of functional work, but in terms of culture as well, that we we wanted to establish bonds, even with the remote team. Just because it helps so much with like psychological safety and establishing a blameless culture and all these other lovely things that we're always advocating. And I find just the more remote work gets, the more you have to rely on software, of course. Which again, really emphasizes how much reliable communication software and video conferencing software and the more reliable it is, the more work you can just get done.

21:06 Christina Tan

Yeah, some of the things that we've done are, include a virtual escape rooms for the entire company, where we broke the team down into teams of three or four. We actually didn't get to finish in time, because some of the puzzles were so hard. And our most dedicated engineers took an extra two to three hours after the call to get all the way to the end of the puzzle, so that they could finally escape. That was amazing. And what we've also noticed is that, just by having more interactive time during the all hands, really gets people a lot more involved and engaged, both in the chat and also on the actual call. For example, today, one of our new hires, did a self-intro and shared three fun facts, one of which included something about riding a dirt bike, and everyone talked about their biking experience, as we introduce ourselves to the new hire. So that was a really great bonding moment. And other times, we would just allocate dedicated time to have 10 minutes, we call it zoom roulette, where you're placed in a Zoom room with someone you don't know, to two to three people or you got to catch up on, you know, what did you do over the weekend? What's something you look forward to at work? It's just having to be a lot more intentional about those watercooler conversations.

22:26 Miko Pawlikowski

Love that. And I just wanted to say, I'm always surprised how many people independently came up with the Zoom roulette idea. And how many companies actually implemented that and a lot of great success. So it's interesting, you know, great minds think alike. So I've got a puzzle for you. So, if you need Blameless to implement a blameless culture, does it mean that at the beginning Blameless wasn't blameless, and then it's reinvented itself as blameless? Basically, what came first, the chicken or the egg?

23:04 Christina Tan

I think, like I said, being blameless is not a black and white dial. And it's very much a progression and can be measured in terms of maturity. And so, we always start by meeting our customers where they are. And, both Emily and I, firmly believe that you can't drive change with shame or criticism. You can't drive lasting change that way. And so, rather than even if we see signs of blame, we will try to uncover some of the underlying assumptions that lead to blame and explain the context that we see or the knowledge that we have that we've seen work at other companies as a point of reference. Rather than just going to say, 'No, no, this is the new way'. And so change takes time. And that's really, that's why this is a journey and not a series of well, it is a series of milestones, but it's not a black and white switch that we could just turn on. So, I would say it starts with leadership being open minded to explore tools that will make their engineering team more effective. And then it ends with, throughout the process, to see, 'Oh, look, we can make our team even more effective through blameless culture principles'. And we definitely, we got a lot of sales through readers of the blog posts. And that's one place where Emily does a phenomenal job really advocating for the importance of virtue of having a blameless culture.

24:32 Emily Arnott

In fact, there's one blog post I'd like to highlight. This was written quite a while ago, but it really is still relevant today, where we break down our own journey to reliability excellence. And it involves a phase of starting to eat our own dog food, practicing what we preach. Goes over both cultural things and implementing tools like SLOs. So, if you're curious about that, how a company that espouses all of these things actually implemented themselves, check it out. It's a good read.

25:05 Christina Tan

And I actually do, on behalf of the industry in this space, want to stress that you can't buy reliability and resilience from just a tool alone. It really does take leadership commitment and changes in mentality over time to drive the full transformation. And that's what Blameless as a company, on top of the product, will assist customers with. But really, tooling is one part of the change. And so we can't say, we're not the tool that will magically make SRE happen, but we will most definitely elevate the maturity of your reliability practices.

25:46 Miko Pawlikowski

Definitely. And I love the "eating your own dog food" aspects, especially from a cat person.

25:56 Emily Arnott

I always hear this term. Why isn't 'eating your own dog food'? Why can't it be something nice? 'Eating your own filet mignon'. Reliabilitie's a lot nicer than dog food, you know?

26:07 Miko Pawlikowski

Well, I've seen some pumper dogs that had a pretty good diet. Anyway, before I let you both off the hook, and it's been a pleasure, I want to squeeze one more tidbit of wisdom. And the question is, if you were to recommend one single habit or skill or methodology, or anything really, but a single one to pick up into next year, what would be your personal recommendation?

26:39 Emily Arnott

I think for me, I would advocate starting to have a mentality around documentation, retrospective recording. Again, this is something where I think there's an intimidating gap that we show, this is the ideal retrospective, and it seems like a lot of work. But if you just start making a habit of noting down, 'Okay, this is what worked when we solved this incident' and building like a run book or a guide book out of that, it can make a profound difference. And it's an excellent point to start building on from there.

27:11 Christina Tan

I would actually throw a curveball at you, Miko, and say 'persuasion'. Soft skills can be learned and developed. And SRE, by nature, is a very consultative and change-driving role. And so, it's actually critical to influence, sometimes without authority, sometimes with authority, to help others see the point of view. Because, I think within the SRE industry, there's so much great thinking and advanced, mature thinking about how to examine system failures. But I'm not sure if that's widely shared outside of the realm of SRE. And I think companies, as a whole, can really benefit from this perspective. And so I think having great, strong persuasion skills, we'll do a lot to progress to change that SREs want to see.

28:03 Miko Pawlikowski

And I think that's very well-said. Christina Tan, strategy at Blameless and Emily Arnott, blog content writer at Blameless. Thank you so much for your time. It's been an absolute blast.

28:15 Christina Tan

Thank you, Miko.

28:15 Emily Arnott

Thank you so much for having us. It was a lot of fun.

28:17 Miko Pawlikowski

We'll see you next time.

Awesome tech events for

Priority access to all content

Video hallway track

Community chat

Exclusive promotions and giveaways