Automate error reporting for failed deployments #278

Open
shonfeder opened this issue May 3, 2024 · 4 comments

Comments

@shonfeder
Contributor

AFAIU, @mtelvers currently polls the build graph manually to monitor for failed image builds, and then triages the response.

It would save some cycles if we could automate this with a batch email or a Slack message that tags our team when builds fail, along with a log excerpt that gives us a sense of the problem.

I'm not sure if this is best done via Grafana, notifications to #ocaml-org-deployer, or something else.

@mtelvers
Member

mtelvers commented May 4, 2024

The base-image-builder reports failed builds to the #ci-firehose channel. However, as you say, I manually poll the graph.

@shonfeder
Contributor Author

Ah nice. I wasn't in #ci-firehose previously. Looking at it now, I see it has posts for both failing and successful updates. My thinking is that we would spend fewer cycles on monitoring if we just got notifications when something goes wrong, rather than having to actively check most things (including Slack channels) to catch errors. If we already have a system for that, then this issue may be irrelevant. If not, maybe we can split the notifications into two channels, one of them just for errors? Then we could enable notifications on the error channel.
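To make the two-channel idea concrete, here is a rough sketch of the routing logic, not how the deployer actually works. It assumes every result still goes to #ci-firehose and that failures are also posted, with a short log excerpt, to a separate errors-only channel. The `#ci-errors` name, the `BuildResult` type, and the `route` and `log_excerpt` helpers are all made up for illustration; nothing here calls a real Slack API.

```python
from dataclasses import dataclass

FIREHOSE_CHANNEL = "#ci-firehose"  # existing channel mentioned in this thread
ERRORS_CHANNEL = "#ci-errors"      # hypothetical name for an errors-only channel


@dataclass
class BuildResult:
    """Hypothetical record of one finished build."""
    job: str
    ok: bool
    log: str = ""


def log_excerpt(log: str, max_lines: int = 5) -> str:
    """Return the last few non-empty lines of a build log."""
    lines = [line for line in log.splitlines() if line.strip()]
    return "\n".join(lines[-max_lines:])


def route(result: BuildResult) -> list[tuple[str, str]]:
    """Return the (channel, message) pairs to post for one build result."""
    status = "succeeded" if result.ok else "FAILED"
    # Every result still goes to the noisy channel.
    messages = [(FIREHOSE_CHANNEL, f"{result.job}: {status}")]
    if not result.ok:
        # Failures are duplicated to the errors-only channel with a log excerpt.
        text = f"{result.job}: FAILED\n{log_excerpt(result.log)}"
        messages.append((ERRORS_CHANNEL, text))
    return messages
```

With something like this, people could mute the firehose channel and enable notifications only on the errors channel.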

@mtelvers
Member

mtelvers commented May 7, 2024

The #ci-firehose channel is very noisy, so I moved high-priority issues, such as spam build failures, to the #ocaml-org-deployer channel instead. IIRC @benmandrew did some work on splitting Slack notifications into different channels.

@benmandrew
Contributor

That PR is ocurrent/ocurrent-deployer#192. I seem to remember not finding a good way to test it, so it's been left as is.
