
Why it matters
Incidents are inevitable. How your team responds when systems break is a true test of your engineering maturity, not just technically, but operationally and culturally.
Strong incident management isn't about avoiding failure; it's about building muscle memory for fast, clear, and coordinated responses when things go wrong.
How to get started
As for how to implement something like this at your company, I'm not going to sugarcoat this: incident.io has, without a shadow of a doubt, the best incident management tools in the industry. I would advise simply starting with them and following best practices.
It's clear from the get-go how to integrate incident.io into your workflows and internal tools, so that your incident response and management can go beyond panicked Slack threads.
From dedicated channels to managed on-call rotas, it has everything you need to be ready when an incident occurs.
What great looks like
Once you're set up, there are a few things I've seen that make for really good incident management practice.
Assign a lead immediately: Clarity matters in chaos. The on-call engineer (or their lead) should step in as incident commander.
Communicate methodically: Post regular summaries in the incident channel. Keep both engineers and stakeholders aligned.
Jump on a call: Default to real-time comms. It speeds up resolution, builds shared understanding, and creates a learning moment for newer engineers.
Automate detection + incident creation: If anomaly detection picks up an issue, create the incident automatically. This gets you to resolution faster and more transparently.
Dig deep on root causes: Use the 5 Whys to find the true root cause. Bad code is rarely the real issue. Ask why it made it to prod. Missing tests? Poor review practices? Inadequate staging?
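To make the "automate detection + incident creation" point above a little more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how incident.io or any particular tool works: the z-score check stands in for whatever anomaly detection you already run, and Incident, detect_anomaly, maybe_open_incident and the create_incident callback are all hypothetical names. In practice the callback would wrap your incident tool's own API client.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Callable, Optional, Sequence


@dataclass
class Incident:
    """Hypothetical incident record; real tools define their own schema."""
    title: str
    summary: str


def detect_anomaly(history: Sequence[float], latest: float,
                   z_threshold: float = 3.0) -> bool:
    """Return True if `latest` is more than `z_threshold` standard
    deviations away from the mean of `history` (a simple z-score check)."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is unusual.
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold


def maybe_open_incident(metric: str, history: Sequence[float], latest: float,
                        create_incident: Callable[[Incident], None]
                        ) -> Optional[Incident]:
    """Open an incident via the supplied callback when the latest reading
    is anomalous. `create_incident` is an assumption: a thin wrapper
    around whatever incident tooling you use."""
    if not detect_anomaly(history, latest):
        return None
    incident = Incident(
        title=f"Anomaly detected in {metric}",
        summary=f"Latest value {latest} deviates from the recent baseline.",
    )
    create_incident(incident)
    return incident


if __name__ == "__main__":
    # Stand-in for a real API call: just print the incident.
    maybe_open_incident("error_rate", [100, 102, 98, 101, 99], 250, print)
```

Passing the incident-creation step in as a callback keeps the detection logic testable on its own, and means the incident is opened by the same code path every time rather than by whoever happens to notice the graph.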
How you prepare for incidents is a testament to your engineering excellence at the time it matters most. Don't skip building this muscle.
In my next post, I'm going to go one level deeper on postmortems and how you can get the most out of them.
Luke Curtis
Engineering Leader with over 10 years of experience in building and leading high-performing teams. Passionate about transforming organizations through technical excellence and empowered engineering cultures.