Skip to content

Commit

Permalink
Add feature flag to enable logging for UTF-8 migration (#9174)
Browse files Browse the repository at this point in the history
This commit moves the logging for UTF-8 migration behind a feature flag,
so it can be disabled once UTF-8 strict mode is enabled. The motivation
for this is that this logging produces a huge amount of log volume,
and doesn't need to be enabled all the time.
  • Loading branch information
grobinson-grafana authored Sep 4, 2024
1 parent 88cefdd commit 0f59b35
Show file tree
Hide file tree
Showing 8 changed files with 39 additions and 11 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@
* [CHANGE] Ruler: Removed `-ruler.drain-notification-queue-on-shutdown` option, which is now enabled by default. #9115
* [CHANGE] Querier: allow wrapping errors with context errors only when the former actually correspond to `context.Canceled` and `context.DeadlineExceeded`. #9175
* [FEATURE] Alertmanager: Added `-alertmanager.log-parsing-label-matchers` to control logging when parsing label matchers. This flag is intended to be used with `-alertmanager.utf8-strict-mode-enabled` to validate UTF-8 strict mode is working as intended. The default value is `false`. #9173
* [FEATURE] Alertmanager: Added `-alertmanager.utf8-migration-logging-enabled` to enable logging of tenant configurations that are incompatible with UTF-8 strict mode. The default value is `false`. #9174
* [FEATURE] Querier: add experimental streaming PromQL engine, enabled with `-querier.query-engine=mimir`. #8422 #8430 #8454 #8455 #8360 #8490 #8508 #8577 #8660 #8671 #8677 #8747 #8850 #8872 #8838 #8911 #8909 #8923 #8924 #8925 #8932 #8933 #8934 #8962 #8986 #8993 #8995 #9008 #9017 #9018 #9019 #9120 #9121 #9136 #9139 #9145 #9191 #9194
* [FEATURE] Experimental Kafka-based ingest storage. #6888 #6894 #6929 #6940 #6951 #6974 #6982 #7029 #7030 #7091 #7142 #7147 #7148 #7153 #7160 #7193 #7349 #7376 #7388 #7391 #7393 #7394 #7402 #7404 #7423 #7424 #7437 #7486 #7503 #7508 #7540 #7621 #7682 #7685 #7694 #7695 #7696 #7697 #7701 #7733 #7734 #7741 #7752 #7838 #7851 #7871 #7877 #7880 #7882 #7887 #7891 #7925 #7955 #7967 #8031 #8063 #8077 #8088 #8135 #8176 #8184 #8194 #8216 #8217 #8222 #8233 #8503 #8542 #8579 #8657 #8686 #8688 #8703 #8706 #8708 #8738 #8750 #8778 #8808 #8809 #8841 #8842 #8845 #8853 #8886 #8988
* What it is:
Expand Down
11 changes: 11 additions & 0 deletions cmd/mimir/config-descriptor.json
Original file line number Diff line number Diff line change
Expand Up @@ -14846,6 +14846,17 @@
"fieldFlag": "alertmanager.log-parsing-label-matchers",
"fieldType": "boolean",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "utf8_migration_logging",
"required": false,
"desc": "Enable logging of tenant configurations that are incompatible with UTF-8 strict mode.",
"fieldValue": null,
"fieldDefaultValue": false,
"fieldFlag": "alertmanager.utf8-migration-logging-enabled",
"fieldType": "boolean",
"fieldCategory": "experimental"
}
],
"fieldValue": null,
Expand Down
2 changes: 2 additions & 0 deletions cmd/mimir/help-all.txt.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -295,6 +295,8 @@ Usage of ./cmd/mimir/mimir:
Directory to store Alertmanager state and temporarily configuration files. The content of this directory is not required to be persisted between restarts unless Alertmanager replication has been disabled. (default "./data-alertmanager/")
-alertmanager.storage.retention duration
How long should we store stateful data (notification logs and silences). For notification log entries, refers to how long should we keep entries before they expire and are deleted. For silences, refers to how long should tenants view silences after they expire and are deleted. (default 120h0m0s)
-alertmanager.utf8-migration-logging-enabled
[experimental] Enable logging of tenant configurations that are incompatible with UTF-8 strict mode.
-alertmanager.utf8-strict-mode-enabled
[experimental] Enable UTF-8 strict mode. Allows UTF-8 characters in the matchers for routes and inhibition rules, in silences, and in the labels for alerts. It is recommended that all tenants run the `migrate-utf8` command in mimirtool before enabling this mode. Otherwise, some tenant configurations might fail to load. To identify tenants with incompatible configurations, search Mimir server logs for lines containing `Alertmanager is moving to a new parser for labels and matchers, and this input is incompatible`. To find tenant configurations that are valid but contain ambiguous matchers, search for log lines containing `Matchers input has disagreement`. Each log line includes the invalid input, a suggestion on how to fix the input (excluding ambiguous matchers, as these require manual correction), and the ID of the affected tenant. You must run Mimir with debug-level logging enabled. Otherwise, these lines aren't logged. For more information, refer to https://prometheus.io/docs/alerting/latest/configuration/#label-matchers. Enabling and then disabling UTF-8 strict mode can break existing Alertmanager configurations if tenants added UTF-8 characters to their Alertmanager configuration while it was enabled.
-alertmanager.web.external-url string
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2503,6 +2503,11 @@ alertmanager_client:
# UTF-8 strict mode is working as intended.
# CLI flag: -alertmanager.log-parsing-label-matchers
[log_parsing_label_matchers: <boolean> | default = false]

# (experimental) Enable logging of tenant configurations that are incompatible
# with UTF-8 strict mode.
# CLI flag: -alertmanager.utf8-migration-logging-enabled
[utf8_migration_logging: <boolean> | default = false]
```
### alertmanager_storage
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -183,7 +183,9 @@ Grafana Mimir provides a number of tools to help you identify whether any existi

To identify affected tenant configurations, take the following steps:

1. Make sure Mimir is running version 2.12 or later with debug-level logging enabled.
1. Make sure Mimir is running version 2.12 or later.

1. Enable [`utf8-migration-logging-enabled`]({{< relref "./../../../configure/configuration-parameters#alertmanager" >}}) and set [`log_level`]({{< relref "./../../../configure/configuration-parameters#server" >}}) to `debug`. You must restart Mimir for the changes to take effect.

1. To identify any tenant configurations that are incompatible with UTF-8 (meaning the tenant configuration fails to load and the [fallback configuration](#fallback-configuration) is used instead), search Mimir server logs for lines containing `Alertmanager is moving to a new parser for labels and matchers, and this input is incompatible`. Each log line includes the invalid matcher from the tenant configuration and the ID of the affected tenant. For example:

Expand Down Expand Up @@ -225,4 +227,4 @@ It's rare to find cases of disagreement in a tenant configuration, as most tenan

1. Any incompatible tenant configurations will fail to load. To identify if any tenant configurations are failing to load, search the Mimir server logs for lines containing `error applying config`, or query the `cortex_alertmanager_config_last_reload_successful` gauge for `0`.

1. You can disable debug-level logging.
1. You can disable [`utf8-migration-logging-enabled`]({{< relref "./../../../configure/configuration-parameters#alertmanager" >}}) and set [`log_level`]({{< relref "./../../../configure/configuration-parameters#server" >}}) back to its previous value.
8 changes: 5 additions & 3 deletions pkg/alertmanager/api.go
Original file line number Diff line number Diff line change
Expand Up @@ -147,7 +147,7 @@ func (am *MultitenantAlertmanager) SetUserConfig(w http.ResponseWriter, r *http.
}

cfgDesc := alertspb.ToProto(cfg.AlertmanagerConfig, cfg.TemplateFiles, userID)
if err := validateUserConfig(logger, cfgDesc, am.limits, userID); err != nil {
if err := validateUserConfig(logger, cfgDesc, am.limits, userID, am.cfg.UTF8MigrationLogging); err != nil {
level.Warn(logger).Log("msg", errValidatingConfig, "err", err.Error())
http.Error(w, fmt.Sprintf("%s: %s", errValidatingConfig, err.Error()), http.StatusBadRequest)
return
Expand Down Expand Up @@ -185,8 +185,10 @@ func (am *MultitenantAlertmanager) DeleteUserConfig(w http.ResponseWriter, r *ht
}

// Partially copied from: https://github.com/prometheus/alertmanager/blob/8e861c646bf67599a1704fc843c6a94d519ce312/cli/check_config.go#L65-L96
func validateUserConfig(logger log.Logger, cfg alertspb.AlertConfigDesc, limits Limits, user string) error {
validateMatchersInConfigDesc(logger, "api", cfg)
func validateUserConfig(logger log.Logger, cfg alertspb.AlertConfigDesc, limits Limits, user string, utf8MigrationLogging bool) error {
if utf8MigrationLogging {
validateMatchersInConfigDesc(logger, "api", cfg)
}

// We don't have a valid use case for empty configurations. If a tenant does not have a
// configuration set and issue a request to the Alertmanager, we'll a) upload an empty
Expand Down
1 change: 1 addition & 0 deletions pkg/alertmanager/api_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -881,6 +881,7 @@ alertmanager_config: |

limits := &mockAlertManagerLimits{}
am := &MultitenantAlertmanager{
cfg: mockAlertmanagerConfig(t),
store: prepareInMemoryAlertStore(),
logger: test.NewTestingLogger(t),
limits: limits,
Expand Down
16 changes: 10 additions & 6 deletions pkg/alertmanager/multitenant.go
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,7 @@ type MultitenantAlertmanagerConfig struct {
// then the UTF-8 parser will be logged. If it is disabled, then the old regular
// expression parser will be logged.
LogParsingLabelMatchers bool `yaml:"log_parsing_label_matchers" category:"experimental"`
UTF8MigrationLogging bool `yaml:"utf8_migration_logging" category:"experimental"`
}

const (
Expand Down Expand Up @@ -137,6 +138,7 @@ func (cfg *MultitenantAlertmanagerConfig) RegisterFlags(f *flag.FlagSet, logger

f.BoolVar(&cfg.UTF8StrictMode, "alertmanager.utf8-strict-mode-enabled", false, "Enable UTF-8 strict mode. Allows UTF-8 characters in the matchers for routes and inhibition rules, in silences, and in the labels for alerts. It is recommended that all tenants run the `migrate-utf8` command in mimirtool before enabling this mode. Otherwise, some tenant configurations might fail to load. To identify tenants with incompatible configurations, search Mimir server logs for lines containing `Alertmanager is moving to a new parser for labels and matchers, and this input is incompatible`. To find tenant configurations that are valid but contain ambiguous matchers, search for log lines containing `Matchers input has disagreement`. Each log line includes the invalid input, a suggestion on how to fix the input (excluding ambiguous matchers, as these require manual correction), and the ID of the affected tenant. You must run Mimir with debug-level logging enabled. Otherwise, these lines aren't logged. For more information, refer to https://prometheus.io/docs/alerting/latest/configuration/#label-matchers. Enabling and then disabling UTF-8 strict mode can break existing Alertmanager configurations if tenants added UTF-8 characters to their Alertmanager configuration while it was enabled.")
f.BoolVar(&cfg.LogParsingLabelMatchers, "alertmanager.log-parsing-label-matchers", false, "Enable logging when parsing label matchers. This flag is intended to be used with -alertmanager.utf8-strict-mode-enabled to validate UTF-8 strict mode is working as intended.")
f.BoolVar(&cfg.UTF8MigrationLogging, "alertmanager.utf8-migration-logging-enabled", false, "Enable logging of tenant configurations that are incompatible with UTF-8 strict mode.")
}

// Validate config and returns error on failure
Expand Down Expand Up @@ -813,12 +815,14 @@ type amConfig struct {
// setConfig applies the given configuration to the alertmanager for `userID`,
// creating an alertmanager if it doesn't already exist.
func (am *MultitenantAlertmanager) setConfig(cfg amConfig) error {
// Instead of using "config" as the origin, as in Prometheus Alertmanager, we use "tenant".
// The reason for this that the config.Load function uses the origin "config",
// which is correct, but Mimir uses config.Load to validate both API requests and tenant
// configurations. This means metrics from API requests are confused with metrics from
// tenant configurations. To avoid this confusion, we use a different origin.
validateMatchersInConfigDesc(am.logger, "tenant", cfg.AlertConfigDesc)
if am.cfg.UTF8MigrationLogging {
// Instead of using "config" as the origin, as in Prometheus Alertmanager, we use "tenant".
// The reason for this that the config.Load function uses the origin "config",
// which is correct, but Mimir uses config.Load to validate both API requests and tenant
// configurations. This means metrics from API requests are confused with metrics from
// tenant configurations. To avoid this confusion, we use a different origin.
validateMatchersInConfigDesc(am.logger, "tenant", cfg.AlertConfigDesc)
}

level.Debug(am.logger).Log("msg", "setting config", "user", cfg.User)

Expand Down

0 comments on commit 0f59b35

Please sign in to comment.