Deduplicate alerts #11
Ah, yeah, that makes sense. No one wants duplicated alerts.
EDIT: Actually, a better way to do this is with Event Flood Control. I've updated the below to use that method instead. I can work on re-releasing the workflow, but that would require you to re-upload the zip file and then reconfigure your Alertmanager URLs. If you'd like to add this to your existing workflow:
Option 2
I don't think those options actually stop the duplicates. Let's say you have
The event can be configured to live for up to 24 hours. So the question is: how long until it would be considered a new alert? I don't think it makes sense to deduplicate forever, so there must be a timeframe limit. I don't think the active/closed status of the event is considered in the flood control, so that is irrelevant. Let's walk through an example. Assume your

12:00 - Prometheus triggers the xMatters integration and flood control allows it.

What would you like to see happen?
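To make the timeframe question above concrete, here is a minimal single-process sketch of a flood-control window. The names (`DEDUP_WINDOW`, `should_create_event`) are illustrative assumptions, not xMatters' actual implementation; it only shows the "same group key inside the window is a duplicate" rule being discussed.

```python
import time

# Hypothetical window: the comment above notes an event can live up to 24 hours.
DEDUP_WINDOW = 24 * 60 * 60  # seconds

_last_seen = {}  # group_key -> timestamp of the last event we let through


def should_create_event(group_key, now=None):
    """Return True if no event with this group key fired inside the window."""
    now = time.time() if now is None else now
    last = _last_seen.get(group_key)
    if last is not None and now - last < DEDUP_WINDOW:
        return False  # duplicate within the window: suppress it
    _last_seen[group_key] = now
    return True
```

Note that a suppressed duplicate does not reset the window here, so after 24 hours the same group key is treated as a new alert, which matches the "timeframe limit" idea in the comment.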
So, these are the cases I'm thinking of:
Ah, that's helpful context. Is this where silences could be helpful? You could reply with a silence and we'd write the silence back to Prometheus. The duration of the silence might be something to work out, or we could provide multiple options, such as "Silence for 30 minutes", "Silence for 2 days", etc.
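For the write-back idea, Alertmanager exposes a silences endpoint (`POST /api/v2/silences`). A hedged sketch of what the integration might do when someone replies "Silence for 30 minutes" — the `alertname` matcher, helper names, and comment text are illustrative assumptions:

```python
import json
from datetime import datetime, timedelta, timezone
from urllib import request


def build_silence(alertname, minutes, created_by="xmatters-integration"):
    """Build an Alertmanager v2 silence payload for the given alert name."""
    now = datetime.now(timezone.utc)
    return {
        "matchers": [{"name": "alertname", "value": alertname, "isRegex": False}],
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
        "createdBy": created_by,
        "comment": "Silenced from xMatters response",
    }


def post_silence(alertmanager_url, payload):
    """POST the silence to Alertmanager's v2 API and return its JSON response."""
    req = request.Request(
        alertmanager_url.rstrip("/") + "/api/v2/silences",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

A richer version would match on the full label set from the alert, not just `alertname`, so the silence doesn't over-match.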
I think the Is there a way to check if there is an active alert with the same group key? If there is, can you extend the alert that already exists? The workflow above would work like this:
Yes, the Get Events step hooked up in my screenshot above will do this. You would pass the
Aside from our newly launched incidents, runtime objects in xMatters are "events" and indicate something changed, which is why we don't actually term them alerts. With all that said, how does this sound?
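The "check for an active event with the same group key, then extend it" flow above can be sketched generically. Here `get_active_events`, `create_event`, and `extend_event` are hypothetical stand-ins for the workflow's Get Events step and an event-update call, not real xMatters API names:

```python
def handle_alert(group_key, get_active_events, create_event, extend_event):
    """Extend an existing open event for this group key, or create a new one."""
    existing = [e for e in get_active_events() if e.get("group_key") == group_key]
    if existing:
        # An event for this alert group is already open: keep it alive
        # instead of notifying everyone again.
        return extend_event(existing[0]["id"])
    # First alert for this group: open a new event.
    return create_event(group_key)
```

The dependency-injected callables are just to keep the sketch self-contained; in the workflow they would be steps wired together in the flow designer.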
Is there any particular reason you want to keep something open in xMatters? Do you have additional reporting in xMatters that you can't get from Prometheus? I mentioned the new incidents, and I'm almost wondering if creating an incident might solve all of this. I haven't played with this much, but it might work something like: (This is the initial launch, and there are a couple of missing pieces we'd need in order to flesh this out fully.)
You'd end up with two incidents, as there were two alerts, which makes sense to me because you'd want to track any downtime.
It was mainly for a one-to-one mapping. The incidents workflow seems more like what I'm looking for!
And yes, a resolved notification would be nice (but it should not be something people have to acknowledge).
Ok, this is helpful info. We just launched incidents, and there are a couple of features we need to finish developing before we can build this kind of integration between Prometheus and xMatters Incidents. For now, I'd say use the Event Flood Control or the Option 2 I listed above. I'll keep this issue open, and when we revisit this integration in the coming months we'll see about adding the incidents.
Hey,

Currently, each alert (or set of alerts) creates a new notification. It would be nice if duplicates didn't create a new alert every time. Maybe a UUID hash of the groupKey?
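The UUID-hash idea above can be sketched with Python's `uuid.uuid5`, which derives a deterministic UUID from a name, so every notification for the same Alertmanager `groupKey` maps to the same ID. The function name is illustrative:

```python
import uuid


def dedup_id(group_key):
    """Derive a stable UUID from Alertmanager's groupKey.

    uuid5 is deterministic, so retries and repeat notifications for the
    same alert group always produce the same ID, which a receiving system
    could use as a deduplication key.
    """
    return str(uuid.uuid5(uuid.NAMESPACE_URL, group_key))
```

Because the mapping is stable, the receiving side only needs to check whether it has already seen the ID, rather than comparing full alert payloads.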