Currently, Alertmanager's routing tree is defined statically in the alertmanager.yml configuration file. While this approach works well for environments where configurations are managed centrally, it presents challenges in dynamic or multi-tenant environments, where multiple teams or systems need to share the same Alertmanager instance.
Problem Statement
Managing notifications through Alertmanager routing trees in dynamic or multi-tenant setups is difficult because:
Lack of Modularity: The routing tree is treated as a single, static object. It is not possible to dynamically merge or patch a subtree into the existing routing tree without regenerating the entire configuration.
Scalability: In shared Alertmanager instances, each team may have specific routing requirements. Currently, these must be handled in a centralized, monolithic configuration file, which can become unwieldy and error-prone as the number of teams or requirements grows.
Lack of Dynamic Flexibility: Dynamic environments, such as Kubernetes clusters, often require updates to routing configurations based on events (e.g., new clusters or services). This is difficult to achieve without external tooling to regenerate and reload the configuration.
Use Case
In a shared Grafana + Alertmanager setup, different teams want to manage their own notification routing policies independently, while still using a central Alertmanager instance. For example:
Team A wants to route all alerts with label team="A" to their Slack channel.
Team B wants to route alerts with label team="B" to their PagerDuty service.
Today, this requires either a central admin to maintain and update the monolithic configuration file, or each team to run its own instance of Alertmanager.
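For illustration, the monolithic setup described above might look roughly like the following `alertmanager.yml`. This is a minimal sketch: the receiver names, the Slack channel, and the placeholder credentials are assumptions, not taken from a real deployment.

```yaml
# Single shared alertmanager.yml: every team's routes and receivers live here.
route:
  receiver: default            # fallback when no child route matches
  routes:
    - matchers:
        - team="A"
      receiver: team-a-slack   # assumed receiver name
    - matchers:
        - team="B"
      receiver: team-b-pagerduty   # assumed receiver name

receivers:
  - name: default
  - name: team-a-slack
    slack_configs:
      - channel: '#team-a-alerts'            # assumed channel
        api_url: '<slack-webhook-url>'       # placeholder
  - name: team-b-pagerduty
    pagerduty_configs:
      - routing_key: '<pagerduty-routing-key>'   # placeholder
```

Any change by Team A or Team B means editing this one file and reloading the whole configuration.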
Proposed Solution
Introduce support for dynamically merging or patching subtrees into the routing tree. This could be achieved through:
Dynamic Subtree Injection: Allow teams or systems to submit their subtree configuration (e.g., via an API or separate file) to be merged into the main routing tree at a specified location.
Granular Reloading: Instead of requiring a full configuration reload, allow for partial reloads where only the modified subtree is reloaded.
Modular Configuration: Support breaking the routing tree into modular files or objects that can be updated independently and then aggregated by Alertmanager.
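As a rough sketch of what a team-owned subtree could look like under this proposal, the fragment below reuses the existing route and receiver syntax. The per-team file, its path, and the idea of a mount point are hypothetical; Alertmanager does not support any of this today.

```yaml
# HYPOTHETICAL per-team fragment, e.g. teams/team-a.yml (not a real Alertmanager feature).
# Assumed mount point: appended to the root route's `routes` list.
route:
  matchers:
    - team="A"
  receiver: team-a-slack       # assumed receiver name

receivers:
  - name: team-a-slack
    slack_configs:
      - channel: '#team-a-alerts'        # assumed channel
        api_url: '<slack-webhook-url>'   # placeholder
```

Restricting each fragment to a single subtree under one matcher would keep one team's changes from affecting another team's routes, though how conflicts and validation are handled would be up to the maintainers.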
This is something we would benefit a lot from in our own use cases too. I am curious what the maintainers think as well :)
I would be happy to help draft a PR if needed.
I hope this makes sense to you. We encountered this problem all the way downstream in crossplane-provider-grafana. There is also a related Grafana issue.