How your engineering team can get more from incident reviews

You can’t build software without encountering incidents – from critical bugs to full-blown outages, dealing with incidents is an inevitable part of the process.

As a result, you’ll find no shortage of articles telling you how to write a review – or as they’re commonly know...

  • Do we know what happened?
  • Do we feel confident about how we detected this incident?
  • Was it easy and straightforward to mitigate, rather than difficult and slow?
  • Do we understand the takeaways, aka the lessons learned?
  • Do we feel confident that we can prevent t...

Another problem with the format was that it didn’t create natural openings in the conversation for other engineers to jump in. By focusing on a single event with set questions, we often overlooked commonalities and trends across incidents. This made it difficult for other engineers to understand what...

We decided to start experimenting.

We looked at the main improvements we wanted to see:

  • We wanted everyone to feel truly safe
  • We wanted to open up the conversation, encouraging dialogue and collective learning
  • We wanted to observe patterns and move beyond i...

Facilitation is a learned and practised skill – hard to perfect but hugely rewarding when done right.

Our three desired outcomes for every review are:

  • Time flies
  • Everyone stays engaged
  • Everyone grows, even the facilitator

We set a goal to generate at least four strong talking points: two based on the particular incident under review and two that addressed common themes across incidents. At the start of every meeting we would launch a poll listing each of our talking points and ask participants to vote for the topic they’d most like t...

It was clear that the question-and-answer format wasn’t working. So how could we guide constructive conversations that brought about real learning? It has always been team practice to assess submitted reports before each review to make sure they meet our standards for a good report. We decided to...

To check whether these techniques were working, we started posting an automated survey to our incident review Slack channel after each review. The survey asked participants to anonymously agree or disagree with the following statements:

  • This meeting was a good use of my time
  • I think we ...

The progress we’ve seen to date has been really promising. We still have outstanding problems, such as how to tie these learnings into actionable items on our roadmap, or how to keep attendance strong during particularly busy weeks. But the aim of our program is to build an honest, accountable, and ...

