Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hey there.
I'm Paige Cruz, a former SRE, current open source observability advocate at
Chronosphere, and forever Bravoholic.
Now, today I'm sharing how the Real Housewives have taught me valuable
lessons about running effective postmortems, and even understanding
the nature of reality itself.
Let's dive in.
For those unfamiliar, The Real Housewives follows groups of
affluent women in different cities as they navigate relationships,
businesses, and social lives.
Now, what makes reality TV fascinating to me is how it reflects the multiple
layers and perspectives on reality, from the housewives themselves to the
producers, and even us as viewers at home.
My central argument today is that the power of an effective postmortem, like
a productive Housewives reunion, isn't about establishing the objective truth.
It's about engaging with multiple, valid, incomplete, and yeah, sometimes
competing perspectives to uncover deeper insights about our system,
our responses, and ourselves.
Now, there are plenty of ways to adapt traditional postmortem
templates and meeting approaches accordingly, which I'll be sharing.
But the genesis of this talk came to mind while I was
rewatching Beverly Hills season one.
There was a typical she said, she said situation between Kyle and Camille.
Kyle claims
she simply asked, "Why would they film you solo in Hawaii?"
Camille claims that Kyle asked
this question instead: "Why would they be interested in filming you without Kelsey?"
Kelsey Grammer, her celebrity husband.
Both women remained convinced of their interpretations as the truth
of what happened, and each filtered the conversation through their own
context, mental model, and biases.
The kicker: this conversation happened off camera, off mic, so
there was no objective evidence.
And fans often try to decide who was right.
Was Camille insecure about being in her celebrity husband's shadow,
and thus primed to hear dismissal?
Was Kyle familiar with production logistics, having grown up on Hollywood
sets, and genuinely curious about practical filming concerns?
What made this conflict illuminating, both during the season and at
the reunion where they rehashed
this, was how it revealed the underlying dynamics and motivations.
Camille was feeling vulnerable as her marriage was ending, and Kyle is
always concerned about her position of security within the friend group, and
it was really seeing how their different backgrounds and perspectives created
this perfect communication storm.
And those insights were only able to emerge by giving both parties space to
share their perspectives rather than crowning one version as the truth.
So, reunions. Every season ends with one, and it's where the whole
cast reconvenes to rehash the major events of the season and address
simmering, unresolved conflicts.
And here again, multiple realities exist simultaneously.
As viewers at home, we see this polished production with a narrative arc: beginning
with pleasantries, diving into the drama cast member by cast member, with carefully
placed commercial breaks at peak tension moments, and ending with everybody coming
together, even if they're still holding onto grudges, clinking their glasses
and raising a toast to next season.
This is a view that's a little bit closer to what the Housewives
see during the reunion.
They're split, pitted against each other across two couches.
They're tired from having woken up at the crack of dawn to get to set,
get into hair, makeup, and styling, sitting for hours under hot lights
with a whole crew directing their attention, focus, and energy to
what's gonna be a fairly dramatic rehashing of events.
They're all sitting there knowing that next season's contract, and continuing
to keep their jobs and cash their paychecks, depends on making a splash.
This mirrors our technical postmortems.
There are the engineers who were on call, remembering the stress
and pressure of troubleshooting.
There are engineers who weren't involved at all and perhaps only
see the final post-mortem report.
There are managers focusing on impact, team dynamics, and velocity, and as
always, there are executives who sometimes misguidedly seek assurance
that this can never happen again.
Despite sharing the same incident, each person experiences and prioritizes
different aspects of the response based on their roles, focus and responsibilities.
Even our objective evidence, telemetry data, monitors, dashboards, runbooks,
have already been filtered by what we collectively decided was important enough
to track, instrument or write down.
Recognizing the validity of these multiple perspectives is going to
create the foundation for open sharing, empathy building, and productive
learning, which is exactly what we want the outcomes of our postmortem to be.
So pour yourself whatever beverage helps you process this new paradigm.
And let's dive into seven lessons from The Real Housewives universe
that can transform your postmortems from reliability theater into
valuable learning opportunities.
Lesson one, bring receipts.
When a housewife makes a bold claim at a reunion or during the season, castmates
immediately demand, "Show me the receipts."
That's evidence like timestamps with texts, screenshots, or rolling back the
tape to reveal exactly who threw the wine on whom and at what point in the evening.
I have your perfect formula: receipts, proof, timeline, screenshot,
everything.
In that clip, Heather was accusing a fellow castmate
of being behind an Instagram account that had been trolling
and harassing the women for years.
Her revelation landed precisely because her evidence was irrefutable.
In SRE, our receipts are more like monitors, telemetry data
like traces, logs, or metrics,
errors, dashboards, maybe the series of updates posted to the status page,
all of which are crucial pieces for reconstructing an incident timeline.
But here's where we should diverge from the Housewives.
Our evidence shouldn't be kept secret for a dramatic reveal
or used for a gotcha moment.
I believe that our incident reports should be a portal allowing anyone
to explore the incident and system interactions on their own terms and time.
Imagine a future engineer who's really hungry and curious to learn
more about the system, who stumbles across your incident writeup.
How far could they explore and engage with the data that you have presented?
I'd say if you're providing a screenshot of a dashboard,
go ahead and provide a link that's pinned to the exact time
range for that incident duration.
Allow people to see data,
get curious, and then follow the breadcrumbs.
It's really powerful to see something interesting, be able to go review
the data, inspect queries, maybe tinker with aggregations or time
windows, and really engage with this material in a hands-on way.
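As a minimal sketch of the pinned-link tip, assuming a Grafana-style dashboard whose URL accepts `from` and `to` query parameters in epoch milliseconds: the function name, dashboard URL, incident window, and 15-minute padding below are all hypothetical and only for illustration, so adjust them to whatever your own dashboard tooling expects.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def pinned_dashboard_link(base_url, start, end, padding=timedelta(minutes=15)):
    """Return a dashboard URL pinned to the incident window plus some padding.

    Assumption: the dashboard tool reads `from` and `to` query parameters
    as epoch milliseconds (Grafana-style). Other tools use other parameters.
    """
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("use timezone-aware datetimes to avoid ambiguity")
    if end < start:
        raise ValueError("incident end is before incident start")

    # Pad both sides so readers can see what "normal" looked like around it.
    window_start = start - padding
    window_end = end + padding
    params = {
        "from": int(window_start.timestamp() * 1000),
        "to": int(window_end.timestamp() * 1000),
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical incident window and dashboard URL, for illustration only.
incident_start = datetime(2024, 3, 5, 14, 0, tzinfo=timezone.utc)
incident_end = datetime(2024, 3, 5, 16, 30, tzinfo=timezone.utc)
link = pinned_dashboard_link(
    "https://grafana.example.com/d/abc123/checkout-latency",
    incident_start,
    incident_end,
)
print(link)
```

Pasting a link like this next to the screenshot lets a future reader land on the same time window and start tinkering with queries and aggregations from there.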
My other pro tip is to add enough context and details so that engineers
who are outside of your team or domain area of expertise are able to
understand the gist of what happened.
This is all about making your receipts accessible, not weaponized.
Lesson two.
It's not just the facts that matter, but our feelings as well.
I don't think that he's good for you.
How do you know what's good for me?
That's my opinion.
Meet Tamra Judge of Real Housewives of Orange County.
Dramatic, yes, but also deeply human.
This wasn't a petty argument over shoes or a missing party invitation.
She was sharing her real feelings and opinions about the man her friend
Vicki was dating, who'd embroiled Vicki in,
among other things, a very long con fabricating a cancer diagnosis.
Just as emotions run high at reunions,
they also shape our incident coordination,
response, and reflection. Feelings during an incident can span a wide
spectrum, from despair to vigilance, to shock at discovering that deprecated
API isn't actually deprecated, to pride
watching your team confidently dive in, investigate, and collaborate.
And when reflecting on incidents, we often focus on the negative, the
stress, fear, anxiety, and exhaustion.
This negativity bias causes our brains to hold onto those bad experiences
more strongly than the positive ones.
And while we shouldn't invalidate these challenging emotions, postmortem
reports, in my opinion and experience, typically overemphasize them.
Three ways that you can counter this negativity bias are asking:
What prevented this situation from worsening?
What aspects of our response are you most proud of?
And where did we get lucky?
These questions reveal strengths that we can deliberately amplify, not just
highlight weaknesses and deficiencies that need to be addressed through
a laundry list of action items.
Lesson three, facilitation is a skill.
Let's roll the tape.
During your fight with Carol, you got in a jab saying, at least I'm not
50 years old, but you stopped there.
What was the rest of your insult before you cut yourself off?
That's Andy Cohen, host and executive producer of The Real Housewives.
What you just witnessed isn't facilitation.
It's provocation designed to generate viral moments at the expense of
participants comfort and dignity.
This approach creates the exact opposite of psychological safety.
It's an environment where vulnerability is weaponized,
mistakes are spotlighted for entertainment, and participants arrive
on the defense and armored for attack.
In our world, this is a bit of an anti-lesson, because for technical
postmortems, psychological safety is the foundation upon which
all of our learning is built.
It does mean that we need to create conditions where people can speak
honestly about what they saw, what they did, and what they
thought during an incident without fear of career consequences,
punishment, or public humiliation.
When this type of psychological safety exists, people shift from defending
their actions to examining them objectively, alongside others, and
this transformation isn't just nice.
It's necessary for meaningful learning to occur, and you can cultivate this
by modeling curiosity rather than judgment when asking questions,
explicitly stating that the goal is to learn, not to blame, and acknowledging
your own mistakes and limitations.
Balancing airtime is another crucial aspect of facilitation.
This chart comes from an account called the Bravo Analyst that
tracks the screen time each cast member gets throughout the season.
What makes this data so fascinating to me is that all of the women are filming
roughly the same amount of time and hours, but as we can see, some dominate
the final edit and others barely appear.
This reveals the invisible hand of production, the deliberate choices made
behind the scenes about whose stories get amplified, whose conflicts drive
the narrative, and whose perspectives are going to shape the viewer's
ultimate understanding of events. In our postmortems, the facilitator plays
this producer role, and without careful attention to balancing airtime, the
loudest voices, often the ones with the most seniority or sway, can dominate.
This means different perspectives get filtered out, not because
they lack value, but because they don't have space to be expressed.
Notice how Andy's question, "What was the rest of your insult?",
would immediately put anybody on the defensive.
It frames the housewife as the aggressor and invites judgment
rather than understanding.
In contrast, effective postmortem questions focus on
the how rather than the why.
Why questions often come across as accusatory and trigger defensiveness.
"Why didn't you check the logs?"
implies a failure or deficiency.
How questions, however, invite description and exploration.
"How did you approach troubleshooting at that point?"
opens space for the responder to share their context and thought process.
And I'll just add, facilitation is absolutely a skill that can be learned
and honed through practice and feedback.
If you're looking to develop this skill in yourself, I highly recommend
the Debriefing Facilitation Guide from Etsy, circa 2016 but totally evergreen.
And lastly, as a facilitator,
your mindset and energy set the tone for the entire meeting and the
vibe of the room, and it doesn't have to be stuffy or sterile.
At SREcon, Katie Wilde from Snyk shared that she begins every incident review by
playing a specially selected song that she thinks relates or ties to the outage.
This change transformed postmortems into anticipated
meetings, creating a buzz around
what song Katie was gonna pick and play.
I think this is a fabulous way of getting people primed, open, feeling
creative, and ready to share.
If you need musical inspiration from the Housewives universe: for an incident
relating to a cloud spend surprise, try
It's Expensive to Be Me by Erika Jayne.
If you've got a security breach, Insecure by Candiace will have your back.
And for delayed logs causing a log jam,
Don't Be Tardy for the Party by Kim Z.
This brings us to lesson four,
the meeting before the meeting.
Part of SRE involves influencing without authority, and I've interpreted
this mostly as strategic networking, because while formal postmortem
reports and meetings are essential,
significant value can emerge from conversations outside
those official channels.
I haven't heard from her in a couple of months.
She sends me a text yesterday basically trying to silence me.
It was so manipulative.
It was so calculated.
Here, Kyle was attempting to control the narrative through backroom deals and
secret pacts, which fans refer to as self-producing, or what in the corporate
world we might call the meeting before the meeting.
The other thing I noticed in that clip is that Dorit's splashy, dramatic reveal was
likely egged on by her own producer, who was incentivized to get high ratings.
And producers, again, remain largely invisible despite
wielding enormous influence.
They're the ones asking leading questions in confessionals.
They create situations designed to trigger conflict, and ultimately are
the ones who decide what narrative threads to develop and air and what
gets left on the cutting room floor.
But in SRE, I do think we could put on that producer hat and wield narrative
shaping power for positive outcomes.
Amy Tobey introduced me to the concept of one-on-one debriefs in her brilliant
talk at SREcon, One on One SRE.
Unlike Kyle's manipulative texting, these one-on-one debriefs strengthen
working relationships and deepen collective understanding. Where Kyle
sought to silence perspectives,
these one-on-ones deliberately create space for voices that
might otherwise remain unheard.
Amy's approach includes some powerful questions, like:
What surprised you most about this incident?
Were you able to access the support you needed during the response?
Did our tools and documentation serve you effectively?
And my favorite?
Did you practice self-care before and during the incident?
This question rarely appears in standard postmortem templates, which
is wild because responder wellbeing fundamentally affects everything from
incident duration to the quality of the long-term solutions proposed.
This is just a handful of the recommended questions from Amy, the
ones that resonated most with me.
I really encourage you to watch the full talk or leaf through her slides
for the full list.
I tried this one-on-one debrief approach
after an incident we affectionately called Kafkapocalypse, which was a major multi-hour
incident resulting in data loss.
Consciously producing these pre-meeting conversations and weaving together
the resulting insights uncovered layers of understanding
that just wouldn't have surfaced from our typical group-setting,
single-hour postmortem meeting.
What was most important to me was that the developers who experienced
acute stress felt heard, and that their experience was meaningfully
acknowledged and received by the org.
This brings us to lesson five, let the mouse go,
which became a memorable, legendary Housewives moment from Sutton Stracke.
Said it five years ago when I met her.
Let the mouse go.
Let the mouse go.
Seriously.
The backstory is that another castmate had invited everyone to a party and said there
would be a special guest in attendance,
somebody
they all knew. Sutton,
anxiety spiraling, had somehow convinced herself this special guest
was somebody she was actively having a business conflict and beef with.
It wasn't, and the whole thing could have been forgotten except Dorit
kept bringing it up, exaggerating Sutton's reactions and remarks at
every opportunity culminating in this explosive blowup at an all cast dinner.
The takeaway here is don't be the Dorit.
Know when to let the mouse go.
What do I mean by that?
We all have our reliability soapboxes or longstanding systemic issues
we'd like to see addressed and have been bemoaning for years.
And if this is you, it is really tempting,
I will admit.
It is so tempting to use these incidents and impacts as validation
that you were right all along.
And dear God, could somebody just listen to you, give you
a roadmap, and fund a team?
Let's be real.
Unlike the infamous psychic Allison DuBois from the Dinner Party from Hell, you
aren't clairvoyant and no, you couldn't predict exactly when and how an incident
could unfold or what the impact would be.
If you could, you would be called AIOps and at least triple
your current take home pay.
So remember that these I told you so moments actively undermine
our collective learning goals.
And they shift the focus from understanding what happened
to assigning credit and blame.
I'll note this isn't about suppressing legitimate concerns.
It's about strategic focus and supporting the goals of the postmortem at hand.
Lesson six: timebox it.
If you've ever suffered through a three-part Real Housewives reunion filled
with endless flashbacks and circular arguments, you understand the pain of a
meeting that doesn't know when to end.
Let's see how co-facilitator Andy handles the long day shooting at a reunion.
I've had an incredibly difficult last two years.
Don't you dare.
Don't you dare.
I'm sorry I didn't.
Don't you dare.
Okay.
I like being able to have that banter with my friends.
Am I boring you?
I didn't have the desire.
Am I boring you?
No, I'm sorry.
I swallowing you.
Are you me?
Are you kidding?
I'm sorry again.
I'm really two.
Sorry.
I'm like, sorry.
Let me pull it back so I can not be long winded.
Andy says, we're well past the point of you being long-winded.
Postmortem meetings, like reunions, need clear time boundaries,
and this isn't revolutionary.
It's basic meeting courtesy.
If deeper discussions emerge that require more time than you have allocated,
go ahead and schedule those separately, but don't let them derail your
primary objectives. As a participant,
if you know that you tend towards verbosity, gather your thoughts ahead
of time, and when you're speaking, keep an eye on the clock to note how
long you have been chattering away for.
Or perhaps challenge yourself to listen first and only speak
after multiple people have shared.
On the flip side, if you're the facilitator, you should probably
have a sense of who might send things off track and prepare strategies
to reroute the conversation.
Respect everyone's time by staying focused on your learning goals and on time.
No mega multi-hour marathon meetings, please.
We do not need a three part postmortem.
Finally, lesson seven,
most important of all:
avoiding the blame game.
The way we talk about blame, at least within tech and reliability, has evolved
across my career span, from blame as the default, which unfortunately is where
many organizations are stuck today, to blamelessness, which I think also
somehow lost its way after becoming buzzworded, leading us to today's
measured blame-aware approach. At least,
that's what we were using the last time I checked the blogs.
And I'm happy to give you an apology, Meredith.
Here's a few things that I'd like an apology for.
Number one,
Gotta love SLC.
In that clip, Angie wasn't willing to hold herself accountable until
Meredith apologized for the very long list of grievances on that prop scroll.
Angie clearly wasn't seeking understanding or forward movement.
She was casting blame and withholding accountability.
Blame seeks to find the person or people or team responsible
for causing the incident.
It creates a retributive culture focused on punishment, which erodes
trust, harms relationships, and importantly prevents learning.
It also increases risk secrecy,
a new term to me, which, as I've read, is the tendency to hide mistakes
or information that might get those fingers pointed at you.
The alternative is a restorative culture founded on trust,
learning, and accountability.
It seeks to understand what, not who, was responsible for an incident.
Here's the truth.
People don't come to work intending to cause incidents.
Developers don't wake up and say, what level of sev should I cause today?
And similarly in the housewives world, castmates don't join franchises planning
to betray decades long friendships.
People make decisions that make sense given their understanding,
available information and pressures that they're under.
Recognition of this shared humanity is where our valuable learning can begin.
Here's a particular passage from Sidney Dekker's
Stop Blaming, a passage that I clearly vibed with on my first read.
People didn't make bad choices.
They probably had bad choices, and your org has a gigantic role in having handed
them out.
Based on my emphatic highlights,
I wonder if you can guess which type of culture I was working in at the time.
This book, Stop Blaming, outlines how blameful cultures harm us, and
importantly provides a roadmap for building a restorative culture.
I highly recommend you check it out, spin up a book club at your company, and invite
the executives to read this one with you.
Now, these seven lessons from the Fabulous Ladies of the Real
Housewives can transform how we in SRE approach postmortems.
Both of our worlds involve humans trying to construct meaning from complex,
evolving, and emotionally charged situations with limited information.
And I'd be remiss not to mention that our worlds do differ in a couple key ways.
First, Housewives are incentivized for drama and viral moments.
Their paychecks literally depend on it.
We, thankfully, are not rewarded for table flips or throwing drinks, but instead for
success metrics around system stability,
learning, and continuous improvement.
And the second difference is obviously the fashion.
Now, I'm not gonna co-sign all of these outfits, but you
cannot deny they've got style.
I have yet to walk into an incident postmortem where folks showed up
in color-coordinated floor-length gowns or sky-high Louboutins.
I'm not opposed to it.
Maybe this is another lesson we could take from the ladies, but beneath these
surface differences lies a profound truth.
In both reality TV and technical incidents, there is not a
single objective reality.
There are only perspectives: incomplete, biased, and uniquely valuable.
The most powerful insights and learnings emerge when we listen deeply to each
other's experiences, and when we approach complexity with humility and curiosity.
Next time you're in a postmortem, remember to bring the receipts, honor
those emotions, facilitate with care.
Try out those incident one-on-one debriefs.
Know when to let the mouse go.
Respect time boundaries, and resist the blame game.
If this type of practical and tactical information on making operations,
on call, and incident response more effective and sustainable for us humans
is your jam, check out my podcast Off Call at off
call dot simple cast.com for more.
Season two is dropping very soon.
Thank you.