Automate error reporting for failed deployments #278
Comments
The base-image-builder reports failed builds to the |
Ah nice. I wasn't in #ci-firehose previously. Looking at it now, I see it has posts from both failing and successful updates. My thinking is basically that it would help reduce the cycles we need to spend monitoring things if we can just get notifications when something goes wrong, and not have to actively check most things (including slack channels) to catch errors. If we already have a system for that, then this issue may be irrelevant. If not, maybe we can just split the notifications into two different channels, one being just for errors? Then we could enable notifications on the error channel? |
The #ci-firehose channel is very noisy, so I moved high-priority issues, such as spam build failures, to the #ocaml-org-deployer channel instead. IIRC @benmandrew did some work on splitting Slack notifications into different channels. |
That PR is ocurrent/ocurrent-deployer#192, but I seem to remember not being able to find a good way to test it, so it's been left as is. |
AFAIU, @mtelvers currently polls the build graph manually to monitor for failed image builds, and then triages the response.
It would save some cycles if we could automate this with a batch email or a Slack message that tags our team when builds fail and includes a log excerpt to give us a sense of the problem.
I'm not sure if this is best done via Grafana, notifications to #ocaml-org-deployer, or something else.
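To make the request concrete, here is a rough sketch of what such a notification might contain. It is not tied to how ocurrent-deployer or the base-image-builder actually send messages. The `#deploy-errors` channel name, the user-group ID in `TEAM_MENTION`, and the function names are all hypothetical, and the `channel` field is only a routing label: how it maps onto a real Slack webhook or app is left open.

```python
# Sketch only: builds the message payload, does not send anything.
# ERROR_CHANNEL and TEAM_MENTION are hypothetical placeholders.
ERROR_CHANNEL = "#deploy-errors"
FIREHOSE_CHANNEL = "#ci-firehose"
TEAM_MENTION = "<!subteam^S0000000>"  # Slack user-group mention syntax, made-up ID

FENCE = "`" * 3  # code fence for the log excerpt in the Slack message


def log_excerpt(log: str, max_lines: int = 10) -> str:
    """Return the last `max_lines` lines of a build log."""
    lines = log.rstrip("\n").splitlines()
    return "\n".join(lines[-max_lines:])


def build_notification(job: str, ok: bool, log: str) -> dict:
    """Route successes to the firehose and failures to the error channel."""
    if ok:
        return {"channel": FIREHOSE_CHANNEL, "text": f"{job}: build succeeded"}
    excerpt = log_excerpt(log)
    text = f"{TEAM_MENTION} {job}: build failed\n{FENCE}\n{excerpt}\n{FENCE}"
    return {"channel": ERROR_CHANNEL, "text": text}
```

With something along these lines, only the error channel would need notifications enabled, and the tail of the log would arrive with the alert instead of requiring a visit to the build graph.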