I propose removing this label altogether. Alertmanager already provides us with .StartsAt / .EndsAt, posting a new alert would just set the new "startsAt". This would technically be a breaking change, as existing users might rely on this label in their alert templates.
I agree that for Alertmanager removing the timestamp would be better, as it's recorded in startsAt. We can ship this breaking change in the next minor release, Flux v2.3, if you can open a PR.
Description
notification-controller sets a timestamp label on outgoing alerts, preventing Alertmanager from recognizing subsequent alerts with the same label set as a single alert (Alertmanager considers a newly-posted alert with an existing label set to be an update of the same alert). This means Alertmanager will dispatch multiple alerts for what is essentially a single outage (e.g. a failing GitRepository with a short reconciliation interval such as 1m, which is the default for a flux bootstrap-generated GitRepository).
This also requires us to add alert receivers with send_resolved: false, separate from the already-configured Prometheus receivers, as Flux alerts would be expiring at the default Alertmanager resolve_timeout of 5m, with Flux posting new ones over time.
Alertmanager (as of v0.27.0) seems to lack the ability to drop incoming alert labels.
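To illustrate why a per-event label defeats deduplication, here is a minimal Python sketch. It assumes only what is described above: an alert's identity is derived from its full label set. The `label_fingerprint` helper is a simplified stand-in, not Alertmanager's actual hashing, and the label names and values are hypothetical.

```python
import hashlib


def label_fingerprint(labels: dict) -> str:
    """Identify an alert by its complete label set.

    Simplified stand-in for illustration; Alertmanager's real
    fingerprinting differs in detail.
    """
    # Sort so that insertion order does not affect the result.
    canonical = "\x00".join(f"{k}\x01{v}" for k, v in sorted(labels.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical labels for one failing GitRepository.
base = {
    "alertname": "GitRepository/flux-system",
    "namespace": "flux-system",
    "severity": "error",
}

# Two reconciliation failures one minute apart, each with a timestamp label.
first = {**base, "timestamp": "2024-03-01T10:00:00Z"}
second = {**base, "timestamp": "2024-03-01T10:01:00Z"}

# With the timestamp label, every event looks like a brand-new alert.
distinct_with_timestamp = label_fingerprint(first) != label_fingerprint(second)

# Without it, repeated events map onto the same alert.
same_without_timestamp = label_fingerprint(base) == label_fingerprint(dict(base))

print(distinct_with_timestamp, same_without_timestamp)
```

With a 1m reconciliation interval, this would mean one new fingerprint per minute for the duration of the outage.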
Proposal
While group_by combined with a long enough group_interval might make subsequent alerts less annoying, it still means we end up with multiple outgoing Alertmanager alerts in a single message, possibly grouped with the templates.
I propose removing this label altogether. Alertmanager already provides us with .StartsAt / .EndsAt; posting a new alert would just set the new "startsAt". This would technically be a breaking change, as existing users might rely on this label in their alert templates.
I'm willing to work on a PR for this if needed.
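As a rough sketch of the proposed behaviour (in Python for brevity; notification-controller itself is written in Go, and this is not its code), the payload builder below drops the timestamp label and carries the event time in startsAt instead. The `build_alert` helper and the label values are hypothetical; the field names labels, annotations and startsAt follow the Alertmanager v2 alert object.

```python
from datetime import datetime, timezone


def build_alert(labels: dict, annotations: dict, starts_at: datetime) -> dict:
    """Build one alert object with a stable label set (hypothetical helper)."""
    # Drop the per-event label so repeated events share one identity.
    stable_labels = {k: v for k, v in labels.items() if k != "timestamp"}
    return {
        "labels": stable_labels,
        "annotations": annotations,
        # The event time is carried in startsAt rather than in a label.
        "startsAt": starts_at.astimezone(timezone.utc).isoformat(),
    }


event_labels = {
    "alertname": "GitRepository/flux-system",
    "severity": "error",
    "timestamp": "2024-03-01T10:00:00Z",
}

a1 = build_alert(event_labels, {"message": "clone failed"},
                 datetime(2024, 3, 1, 10, 0, tzinfo=timezone.utc))
a2 = build_alert({**event_labels, "timestamp": "2024-03-01T10:01:00Z"},
                 {"message": "clone failed"},
                 datetime(2024, 3, 1, 10, 1, tzinfo=timezone.utc))

# Same labels, so Alertmanager can treat a2 as an update of a1.
print(a1["labels"] == a2["labels"])
```

Templates that currently read the timestamp label would need to switch to .StartsAt, which is the breaking part of the change.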
System details
Flux v2.2.3, notification-controller v1.2.4, Alertmanager v0.27.0