---
cover: assets/img/covers/resolved.png
description: Information on what to do after a major incident. Our follow-up and after-action review procedures.
---
This page covers what to do after a major incident, including our follow-up and after-action review procedures.
In addition to any direct follow-up items generated from an incident, each of our response roles has a few standard follow-up tasks. These are generally lightweight actions that ensure we organize information and follow up with customers appropriately.

The Incident Commander (IC) should:

- Update the incident in PagerDuty:
  - Group any related incidents under the primary incident.
  - Set the final severity of the incident.
  - Resolve the incident.
- Create the post-mortem, and assign it an owner.
- Send an internal email to the relevant stakeholders explaining that we had an incident, with a link to the post-mortem.
- Periodically check on the post-mortem's progress to ensure that it is completed within the desired time frame.
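The PagerDuty update step can also be scripted. The sketch below builds the three calls against the PagerDuty REST API v2 (merge related incidents into the primary one, set the final priority, resolve) without sending them. It assumes that our "final severity" is recorded as PagerDuty's incident priority, and that incident and priority IDs come from your own account; the names `close_out_incident`, `PreparedCall`, and `send` are hypothetical helpers, and `send` is a plain `urllib` wrapper rather than part of any PagerDuty client library.

```python
import json
import urllib.request
from dataclasses import dataclass

API_BASE = "https://api.pagerduty.com"


@dataclass
class PreparedCall:
    """One PagerDuty REST API v2 call, built but not yet sent."""
    method: str
    url: str
    body: dict
    headers: dict


def _headers(api_token: str, from_email: str) -> dict:
    return {
        "Authorization": f"Token token={api_token}",
        "Accept": "application/vnd.pagerduty+json;version=2",
        "Content-Type": "application/json",
        # Email of the PagerDuty user on whose behalf the change is made.
        "From": from_email,
    }


def close_out_incident(primary_id: str, related_ids: list[str], priority_id: str,
                       api_token: str, from_email: str) -> list[PreparedCall]:
    """Build the calls for the PagerDuty close-out, in order:
    merge related incidents, set the final priority, resolve."""
    headers = _headers(api_token, from_email)
    incident_url = f"{API_BASE}/incidents/{primary_id}"
    calls = []
    if related_ids:
        # 1. Group related incidents under the primary one (merge endpoint).
        calls.append(PreparedCall("PUT", f"{incident_url}/merge", {
            "source_incidents": [{"id": i, "type": "incident_reference"} for i in related_ids],
        }, headers))
    # 2. Assumption: our "final severity" is recorded as a PagerDuty priority.
    calls.append(PreparedCall("PUT", incident_url, {
        "incident": {"type": "incident_reference",
                     "priority": {"id": priority_id, "type": "priority_reference"}},
    }, headers))
    # 3. Resolve the primary incident.
    calls.append(PreparedCall("PUT", incident_url, {
        "incident": {"type": "incident_reference", "status": "resolved"},
    }, headers))
    return calls


def send(call: PreparedCall) -> dict:
    """Send one prepared call with the standard library and return the JSON reply."""
    request = urllib.request.Request(
        call.url,
        data=json.dumps(call.body).encode("utf-8"),
        headers=call.headers,
        method=call.method,
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping "build" separate from "send" lets the IC print and review the calls (a dry run) before anything changes in PagerDuty, then pass each one to `send` in order.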
The Deputy has no additional steps after an incident is resolved. However, the IC may ask for help with the IC's own steps.
The Scribe should:

- Review the chat communications and extract any relevant items from key events.
- Collect all `TODO` items and add them to the post-mortem.
- Add any other notes relevant to the post-mortem.
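Collecting `TODO` items by hand from a long incident channel is tedious, so a short script can pull out candidates from an exported chat transcript. This is a sketch under two assumptions: that responders mark action items by writing `TODO` in a message, and that the chat tool can export messages as records with timestamp, user, and text fields. The `ChatMessage` shape and `collect_todos` name are hypothetical; adapt them to your chat tool's actual export format.

```python
import re
from typing import Iterable, TypedDict


class ChatMessage(TypedDict):
    # Hypothetical export shape; real chat exports will differ.
    ts: str
    user: str
    text: str


# "TODO" as a whole word, then an optional ":" or "-", capturing the rest of the line.
TODO_PATTERN = re.compile(r"\bTODO\b\s*[:\-]?\s*(.*)")


def collect_todos(messages: Iterable[ChatMessage]) -> list[str]:
    """Return Markdown bullets for every message containing a non-empty TODO."""
    items = []
    for msg in messages:
        match = TODO_PATTERN.search(msg["text"])
        if match and match.group(1).strip():
            items.append(f"- {match.group(1).strip()} (raised by {msg['user']} at {msg['ts']})")
    return items
```

The output is only a starting point: a bare `TODO` with no description is skipped, and every extracted item still needs a human read before it goes into the post-mortem.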
The Customer Liaison should:

- Reply to any customer enquiries we received about the incident.
- Follow the post-mortem's progress, and update our status page with the external message once it is available.
The Internal Liaison has no additional steps after an incident is resolved. However, the IC may ask for help answering questions from internal stakeholders.
It's important that we review the incident in detail to see exactly what went wrong, why it went wrong, and what we can do to make sure it doesn't happen again. These reviews go by many names: after-action review, incident review, follow-up review, and so on. We use the term post-mortem.

Our post-mortem process documentation covers this in more detail.
As well as reviewing the incident, it's important to review our process. Did we handle the incident well, or are there things we could have done better?
This review isn't very formal yet. It typically involves a few incident commanders getting together to discuss how we might have done things differently, and whether there are any tweaks we can make to our incident response process.
If you're interested in joining these meetings, just let one of the incident commanders know and we'll be sure to invite you.