Information on what to do after a major incident. Our follow-up and after action review procedures.

Follow-up Actions for Response Roles

In addition to any direct follow-up items generated from an incident, each of our response roles has a few standard follow-up tasks. These are generally lightweight actions that ensure we organize information and follow up with customers appropriately.

Steps for Incident Commander

  1. Update the incident in PagerDuty.

    • Group any related incidents under the primary incident.
    • Set the final severity of the incident.
    • Resolve the incident.
  2. Create the postmortem, and assign an owner to the postmortem for the incident.

  3. Send an internal email to the relevant stakeholders explaining that we had an incident, and include a link to the postmortem.

  4. Periodically check on the progress of the postmortem to ensure that it is completed within the desired time frame.
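Step 1 above can also be scripted rather than done by hand in the UI. The following is a minimal sketch that only builds the requests and does not send them. It assumes the PagerDuty REST API v2 endpoints for merging and updating incidents; the function names, the incident IDs, and the email address are hypothetical, and the payload shapes should be checked against the current API reference before use. Setting the final severity is left out of this sketch.

```python
import json

# Assumed base URL of the PagerDuty REST API v2.
API_BASE = "https://api.pagerduty.com"


def build_headers(api_key, from_email):
    """Headers assumed to be required for write requests (hypothetical helper)."""
    return {
        "Authorization": f"Token token={api_key}",
        "Accept": "application/vnd.pagerduty+json;version=2",
        "Content-Type": "application/json",
        "From": from_email,
    }


def merge_request(primary_id, related_ids):
    """Describe a request that groups related incidents under the primary one."""
    body = {
        "source_incidents": [
            {"id": incident_id, "type": "incident_reference"}
            for incident_id in related_ids
        ]
    }
    return ("PUT", f"{API_BASE}/incidents/{primary_id}/merge", body)


def resolve_request(primary_id):
    """Describe a request that marks the primary incident as resolved."""
    body = {"incident": {"type": "incident_reference", "status": "resolved"}}
    return ("PUT", f"{API_BASE}/incidents/{primary_id}", body)


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    for method, url, body in (
        merge_request("PRIMARY1", ["RELATED1", "RELATED2"]),
        resolve_request("PRIMARY1"),
    ):
        print(method, url)
        print(json.dumps(body, indent=2))
```

Returning the method, URL, and body as plain data keeps the sketch independent of any HTTP client; the tuples can be passed to whichever client the team already uses, together with `build_headers`.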

Steps for Deputy

There are no additional steps after an incident is resolved. However, the IC may ask for your help with their steps.

Steps for Scribe

  1. Review the chat communications and extract any relevant items from key events.

  2. Collect all TODO items and add them to the postmortem.

Steps for Subject Matter Experts

  1. Add any notes you think are relevant to the postmortem.

Steps for Customer Liaison

  1. Reply to any customer enquiries we received about the incident.

  2. Follow the postmortem progress, and update our status page with the external message once it is available.

Steps for Internal Liaison

There are no additional steps after an incident is resolved. However, the IC may ask for your help with answering questions from internal stakeholders.
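The per-role follow-ups above lend themselves to a simple checklist that can be tracked per incident. The sketch below is one possible way to model that. The `RoleChecklist` class, the `build_checklists` function, and the shortened task wording are illustrative assumptions, not part of our tooling.

```python
from dataclasses import dataclass, field

# Standard follow-up tasks per role, abbreviated from the steps above.
FOLLOW_UPS = {
    "Incident Commander": [
        "Update the incident in PagerDuty",
        "Create the postmortem and assign an owner",
        "Email stakeholders with a link to the postmortem",
        "Check on postmortem progress",
    ],
    "Deputy": [],
    "Scribe": [
        "Review chat and extract items from key events",
        "Add all TODO items to the postmortem",
    ],
    "Subject Matter Expert": ["Add relevant notes to the postmortem"],
    "Customer Liaison": [
        "Reply to customer enquiries about the incident",
        "Update the status page with the external message",
    ],
    "Internal Liaison": [],
}


@dataclass
class RoleChecklist:
    """Tracks which follow-up tasks a role has completed (hypothetical helper)."""

    role: str
    tasks: list
    done: set = field(default_factory=set)

    def complete(self, task):
        # Reject tasks that are not part of this role's checklist.
        if task not in self.tasks:
            raise ValueError(f"{task!r} is not a task for {self.role}")
        self.done.add(task)

    def remaining(self):
        # Preserve the original task order.
        return [task for task in self.tasks if task not in self.done]


def build_checklists():
    """Create a fresh checklist for every role, for a single incident."""
    return {role: RoleChecklist(role, list(tasks)) for role, tasks in FOLLOW_UPS.items()}


if __name__ == "__main__":
    checklists = build_checklists()
    checklists["Scribe"].complete("Add all TODO items to the postmortem")
    for role, checklist in checklists.items():
        print(role, "->", checklist.remaining() or "nothing outstanding")
```

Roles with an empty list, such as Deputy and Internal Liaison, start with nothing outstanding, which matches the guidance that they only help when the IC asks.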

Reviewing the Incident

It's important that we review the incident in detail to see exactly what went wrong, why it went wrong, and what we can do to make sure it doesn't happen again. These reviews go by many names: after-action review, incident review, follow-up review, and so on. We use the term postmortem.

Our postmortem process documentation covers this review in more detail.

Reviewing the Process

As well as reviewing the incident, it's important to review our process. Did we handle the incident well, or are there things we could have done better?

This review isn't very formal yet. It typically involves a few of the Incident Commanders getting together to discuss how we might have done things differently and whether there are any tweaks we can make to our incident response process.

If you're interested in joining these meetings, just let one of the Incident Commanders know and we'll be sure to invite you.