Postmortems at a glance

Incidents are inevitable. Repeating them is a choice.

A postmortem helps engineering teams turn failure into insight. It documents what went wrong, why it happened, and what needs to change to prevent it from happening again. But more than that, it’s a cultural tool—it tells your team, “We learn, we improve, and we don’t hide from mistakes.”

Done right, postmortems create real change—not just reports that collect dust.

A postmortem template

incident.io has a solid breakdown of what a postmortem should include:

Summary – What happened, in plain English
Timeline – Key events in order
Impact – Who was affected, and how
Mitigations – What reduced the blast radius
Contributors – Who helped, and how
Root Cause(s) – Why it happened
Follow-ups – What we’re doing to prevent it again
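If you want the template to be more than a checklist in a doc, one option is to model it as a small data structure. The sketch below is only an illustration: the Postmortem class, its field names, and the sample incident are made up for this example and don't come from incident.io or any other tool. It mirrors the sections above and reports which ones are still empty.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Postmortem:
    # Hypothetical class: one field per section of the template above.
    summary: str = ""
    timeline: list[str] = field(default_factory=list)
    impact: str = ""
    mitigations: list[str] = field(default_factory=list)
    contributors: list[str] = field(default_factory=list)
    root_causes: list[str] = field(default_factory=list)
    follow_ups: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Illustrative draft with invented content, not a real incident.
draft = Postmortem(
    summary="Checkout returned errors for some customers after a deploy.",
    timeline=["Deploy goes out", "Alerts fire", "Rollback completes"],
)
print(draft.missing_sections())
```

A check like this can be a gentle nudge before the review meeting rather than a hard gate. Smaller teams may reasonably decide that some sections can stay empty.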

Go deep, not wide

Most teams stop at surface-level causes: a bad deploy, a flaky test, someone clicked the wrong thing. But those aren’t root causes—they’re symptoms.

Use the 5 Whys approach to dig deeper. For example:

A bug was deployed → It wasn’t caught in review → The reviewer didn’t understand the change → There’s no documentation for this part of the system → Ownership isn’t clear.

The deeper you go, the more systemic and actionable your takeaways will be.
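To make the chain easier to paste into a write-up, here is a minimal sketch that formats a symptom and its successive "why?" answers as a numbered list. The five_whys function is a hypothetical helper written for this post, not part of any library. It only formats the answers you give it; the digging itself is still a conversation.

```python
def five_whys(symptom: str, answers: list[str]) -> str:
    """Format a symptom and its successive 'why?' answers as a numbered chain."""
    lines = [f"Symptom: {symptom}"]
    for depth, answer in enumerate(answers, start=1):
        lines.append(f"Why #{depth}: {answer}")
    return "\n".join(lines)


# The chain from the example above, one answer per "why?".
chain = [
    "It wasn't caught in review",
    "The reviewer didn't understand the change",
    "There's no documentation for this part of the system",
    "Ownership isn't clear",
]
print(five_whys("A bug was deployed", chain))
```

The last entry in the chain is usually the best candidate for a follow-up action, since it points at the system rather than at a person.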

In practice

After an incident has been closed, schedule a meeting with the key people who were involved. Go through the template above methodically and make sure everyone is aligned on what happened.

Write your postmortem so both engineers and stakeholders can understand what happened. Append technical deep-dives, but keep the core summary accessible.

Depending on the size of your org, you may not need to go this deep on documentation. A simple sense check could suffice, so that you're not adding undue burden on operations.

Working in the open

Postmortems aren't just documents; they're signals. Writing them well, sharing them openly, and reviewing them together sends a clear message: we don’t blame, we learn. Keeping communications open and accessible to everyone internally (within reason) is good practice, and it encourages people not to shy away from raising the alarm when things go wrong.

Equally, once your postmortem has been written up, consider running a synchronous meeting to present the findings if the incident was severe enough. This lets engineers learn from others' mistakes and helps the blameless culture continue within your organisation.

🧑‍🚀 Operational Readiness & Resilience III: Postmortems

Luke Curtis
