- PagerDuty webhook receives CHGM alert from Dead Man's Snitch.
- PagerDuty sends a webhook to the Tekton EventListener, which triggers the CAD Tekton pipeline.
- Logs into the cluster's AWS account and checks for stopped/terminated instances.
- If the AWS account cannot be accessed, posts the "cluster credentials are missing" limited support reason.
- If stopped/terminated instances are found, pulls AWS CloudTrail events for those instances.
- If no stopped/terminated instances are found, escalates to SRE for further investigation.
- If the user of the event is:
  - Authorized (SRE or OSD managed), runs the network verifier and escalates the alert to SRE for further investigation.
    - Note: Authorized users have the prefix RH-SRE or osdManagedAdmin, or have the ManagedOpenShift-Installer-Role.
  - Not authorized (not SRE or OSD managed), posts the appropriate limited support reason and silences the alert.
- Adds notes with investigation details to the PagerDuty alert.
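
The decision flow above can be sketched in Go as follows. This is a minimal illustration, not CAD's actual code: the `Action` values, the `StopEvent` type, and the `isAuthorized` and `decide` functions are hypothetical names. It also assumes that `osdManagedAdmin` is matched as a prefix, and that a single unauthorized event is enough to treat the stop as customer-initiated.

```go
package main

import (
	"fmt"
	"strings"
)

// Action is the outcome of the investigation. All names here are hypothetical.
type Action string

const (
	PostMissingCredentials Action = "post 'cluster credentials are missing' limited support reason"
	EscalateToSRE          Action = "escalate to SRE"
	VerifyAndEscalate      Action = "run network verifier, then escalate to SRE"
	LimitedSupportSilence  Action = "post limited support reason and silence the alert"
)

// StopEvent is a simplified stand-in for a CloudTrail stop/terminate event.
type StopEvent struct {
	Username string // user or session name recorded on the event
	RoleName string // role held by the caller, if any
}

// isAuthorized reports whether the event came from an SRE or OSD-managed
// identity. Assumption: both user names are matched as prefixes.
func isAuthorized(e StopEvent) bool {
	return strings.HasPrefix(e.Username, "RH-SRE") ||
		strings.HasPrefix(e.Username, "osdManagedAdmin") ||
		e.RoleName == "ManagedOpenShift-Installer-Role"
}

// decide walks the steps in order: AWS access, instance state, event user.
func decide(awsAccessible bool, stoppedInstances int, events []StopEvent) Action {
	if !awsAccessible {
		return PostMissingCredentials
	}
	if stoppedInstances == 0 {
		return EscalateToSRE
	}
	// Assumption: one unauthorized event marks the stop as customer-initiated.
	for _, e := range events {
		if !isAuthorized(e) {
			return LimitedSupportSilence
		}
	}
	// All events authorized (or none found): verify the network and escalate.
	return VerifyAndEscalate
}

func main() {
	events := []StopEvent{{Username: "customer-admin"}}
	fmt.Println(decide(true, 2, events))
}
```

Whichever branch is taken, the final step of adding investigation notes to the PagerDuty alert still applies. It is left out of the sketch because it does not affect the decision.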