
Postmortems at a glance
Incidents are inevitable. Repeating them is a choice.
A postmortem helps engineering teams turn failure into insight. It documents what went wrong, why it happened, and what needs to change to prevent it from happening again. But more than that, it’s a cultural tool—it tells your team, “We learn, we improve, and we don’t hide from mistakes.”
Done right, postmortems create real change—not just reports that collect dust.
A postmortem template
incident.io has a solid breakdown of what a postmortem should include:
- Summary – What happened, in plain English
- Timeline – Key events in order
- Impact – Who was affected, and how
- Mitigations – What reduced the blast radius
- Contributors – Who helped, and how
- Root Cause(s) – Why it happened
- Follow-ups – What we’re doing to prevent it again
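If your team keeps postmortems as text files, one lightweight way to apply this template is to generate a blank skeleton for each incident and check that no section is left empty. The Python sketch below is a minimal illustration under stated assumptions. It assumes a Markdown output format. The helper names (`render_postmortem_skeleton`, `missing_sections`) and the sample incident title are hypothetical, not part of incident.io or any other tool.

```python
# Hypothetical helpers: section names and prompts mirror the template list above.
POSTMORTEM_SECTIONS = [
    ("Summary", "What happened, in plain English"),
    ("Timeline", "Key events in order"),
    ("Impact", "Who was affected, and how"),
    ("Mitigations", "What reduced the blast radius"),
    ("Contributors", "Who helped, and how"),
    ("Root Cause(s)", "Why it happened"),
    ("Follow-ups", "What we're doing to prevent it again"),
]


def render_postmortem_skeleton(title: str) -> str:
    """Return a blank Markdown postmortem (assumed format) with one heading per section."""
    lines = [f"# Postmortem: {title}", ""]
    for name, prompt in POSTMORTEM_SECTIONS:
        lines.append(f"## {name}")
        lines.append(f"_{prompt}_")  # the prompt reminds the author what belongs here
        lines.append("")
    return "\n".join(lines)


def missing_sections(filled: dict[str, str]) -> list[str]:
    """List template sections that are absent or contain only whitespace."""
    return [name for name, _ in POSTMORTEM_SECTIONS if not filled.get(name, "").strip()]


if __name__ == "__main__":
    # "Checkout API outage" is a made-up example title.
    print(render_postmortem_skeleton("Checkout API outage"))
```

A check like `missing_sections` could run before the review meeting so that nobody discovers a blank Root Cause(s) section halfway through the discussion.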
Go deep, not wide
Most teams stop at surface-level causes: a bad deploy, a flaky test, someone clicked the wrong thing. But those aren’t root causes—they’re symptoms.
Use the 5 Whys approach to dig deeper. For example:
A bug was deployed → It wasn’t caught in review → The reviewer didn’t understand the change → There’s no documentation for this part of the system → Ownership isn’t clear.
The deeper you go, the more systemic and actionable your takeaways will be.
In practice
Once an incident is closed, schedule a meeting with the key people involved. Work through the template above section by section and make sure everyone agrees on what happened and where things currently stand.
Write your postmortem so both engineers and stakeholders can understand what happened. Append technical deep-dives, but keep the core summary accessible.
Depending on the size of your org, you may not need documentation this detailed. A simple sense check may be enough, and it avoids adding undue burden on operations.
Working in the open
Postmortems aren't just documents; they're signals. Writing them well, sharing them openly, and reviewing them together sends a clear message: we don’t blame, we learn. Keeping communications open and accessible to everyone internally (within reason) is good practice, and it encourages people to raise the alarm when things go wrong rather than shy away from it.
Equally, once the postmortem is written up, consider presenting the findings in a dedicated synchronous meeting if the incident was severe enough. This lets engineers learn from each other's mistakes and helps keep a blameless culture alive within your organisation.
Luke Curtis
Engineering Leader with over 10 years of experience in building and leading high-performing teams. Passionate about transforming organizations through technical excellence and empowered engineering cultures.