Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alerts sent to Alertmanager contain timestamps as labels, preventing alert grouping #793

Closed
tdemin opened this issue Apr 12, 2024 · 2 comments · Fixed by #795
Closed

Alerts sent to Alertmanager contain timestamps as labels, preventing alert grouping #793

tdemin opened this issue Apr 12, 2024 · 2 comments · Fixed by #795

Comments

@tdemin
Copy link
Contributor

tdemin commented Apr 12, 2024

Description

notification-controller sets timestamp label on outgoing alerts, preventing Alertmanager from recognizing subsequent alerts with the same label set as a single alert (Alertmanager considers newly-posted alerts with an existing label set to be the same updated alert). This means Alertmanager will dispatch multiple alerts for what is essentially a single outage (e.g. failing GitRepository with a short reconciliation interval, e.g. 1m, which is the default for flux bootstrap-generated GitRepository)

This also requires us to add alert receivers with send_resolved: false separate from already-configured Prometheus receivers, as Flux alerts would be expiring at the default Alertmanager resolve_timeout of 5m, with Flux posting new ones over time.

Alertmanager (as of v0.27.0) seems to lack the ability to drop incoming alert labels.

Multiple alerts in Alertmanager

Proposal

While group_by combined with a long enough group_interval might make subsequent alerts less annoying, it still means we end up with multiple outgoing Alertmanager alerts in a single message, possibly grouped with the templates.

I propose removing this label altogether. Alertmanager already provides us with .StartsAt / .EndsAt, posting a new alert would just set the new "startsAt". This would technically be a breaking change, as existing users might rely on this label in their alert templates.

I'm willing to work on a PR for this if needed.

System details

Flux v2.2.3, notification-controller v1.2.4, Alertmanager v0.27.0

@tdemin
Copy link
Contributor Author

tdemin commented Apr 12, 2024

Minimal sample manifest set for testing:

apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: gitrepository-that-always-fails
spec:
  url: https://this-repo-doesnt-exist.example.com/repo.git
  interval: 1m
  ref:
    branch: master
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: gitrepository-that-always-fails
spec:
  summary: A GitRepository is failing
  eventSeverity: error
  eventSources:
    - kind: GitRepository
      name: gitrepository-that-always-fails
  providerRef:
    name: alertmanager
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: alertmanager
spec:
  type: alertmanager
  secretRef:
    name: alertmanager-config
---
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: alertmanager-config
stringData:
  address: http://alertmanager:9093/api/v2/alerts
---
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: alertmanager
---
apiVersion: v1
kind: Service
metadata:
  name: alertmanager
spec:
  type: ClusterIP
  selector:
    alertmanager: alertmanager
  ports:
    - name: http-web
      port: 9093
      protocol: TCP
      targetPort: 9093

@stefanprodan
Copy link
Member

I propose removing this label altogether. Alertmanager already provides us with .StartsAt / .EndsAt, posting a new alert would just set the new "startsAt". This would technically be a breaking change, as existing users might rely on this label in their alert templates.

I agree that for Alertmanager removing the timestamp would be better as it's recored in startsAt. We can ship this breaking change in the next minor release Flux v2.3 if you can open a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants