EES-5687 - removed base alert templates for CPU, memory and response times #5489

duncan-at-hiveit · 2024-12-31T17:17:03Z

This PR:

is a POC for replacing the use of base alert templates for commonly configured alerts with direct calls to dynamicMerticAlert.bicep and reusable configuration items.
is a POC for improving the usage of only valid combinations of resource type and metric name.

POC for replacing base templates

Why the POC?

When carrying out work for EES-5687, I saw that a number of resources could potentially use a shared default configuration for common alert types - notably CPU percentage, memory percentage and response times. With this in mind, I created some base templates e.g. baseCpuPercentageAlert.bicep which encapsulated common configuration and behaviour for all CPU percentage alerts. If necessary, the defaults could be overridden but in general, we'd end up with consistently-named CPU % alerts that generally behaved the same, and one place to change if we wanted to affect the behaviour of them all.

@ntsim pointed out however that having this AND a resource-specific alert module that reused this base one e.g. containerApps/cpuPercentageAlert.bicep would cause multi-level ARM deploys and hinder debugging etc. He proposed removing the resource-specific ones and just calling the base ones directly from the low-level components' Bicep files e.g. containerApp.bicep would call baseCpuPercentageAlert.bicep directly instead, allowing one less level of redirection.

The proposed POC here

When playing with the above suggestion, I also toyed with making the dynamicMetricAlert.bicep module easier to call and with less boilerplate for the various callsites that use it. To prevent this POC branch from getting too big, I called this new version of it dynamicMetricAlertNew.bicep temporarily.

Because it was easier to call, I then looked at just calling it directly from the low-level component Bicep files e.g. containerApp.bicep would just call dynamicMetricAlertNew.bicep directly and do away with the baseCpuPercentageAlert.bicep file altogether.

I still wanted to keep the consistency of the naming and behaviour of the CPU alerts though, so I created a reusable chunk of configuration that could be used easily with dynamicMetricAlertNew.bicep wherever we were setting up a CPU alert, to ensure consistent behaviour and naming by default.

The advantage of this approach is that we get to remove 2 layers of indirection (containerApps/cpuPercentageAlert.bicep AND baseCpuPercentageAlert.bicep), but still retain the consistency of setting up CPU alerts across the board, as an example.

POC for valid resource type / metric name combinations

Why the POC?

In the review of EES-5687, @ntsim suggested that we look to introduce some type checking for Cpu- and Memory-specific resource types and metric names, calling them for instance CpuResourceType and CpuMetricName when used with a CPU-specific alert template. In this PR however, I've done away with scenario-specific alert templates, instead opting for calling dynamicMetricAlertNew.bicep directly.

The proposed POC here

Based on the above change in approach, I have therefore opted for another approach, listing out valid combinations of resourceType and metricName in a new user-defined type called ResourceMetric, which is a union of valid metrics versus resourceTypes in resourceMetrics.bicep. This constrains combinations of resourceType and metricName that can be passed into dynamicMetricAlertNew.bicep.

What next if the above proposed approaches are OK?

The rest of the resourceType, metricName combinations will be added to resourceMetrics.bicep.

dynamicMetricAlert.bicep will be updated with the contents of dynamicMetricAlertNew.bicep, and dynamicMetricAlertNew.bicep will be removed.

All templates calling dynamicMetricAlert.bicep directly will be updated to use the new reduced-boilerplate syntax that was introduced in dynamicMetricAlertNew.bicep.

As a separate PR, existing resource-specific alert templates that call dynamicMetricAlert.bicep can be removed in favour of calling dynamicMetricAlert.bicep directly from the low-level component Bicep files. As an example, postrgreSqlFlexibleServers/clientConnectionsWaitingAlert.bicep can be removed in favour of calling dynamicMetricAlert.bicep directly in postgresDatabase.bicep.

As another separate PR, staticMetricAlert.bicep can undergo the same treatment. We can then remove the ResourceType and MetricType types from alerts/types.bicep as the new type laid out in resourceMetrics.bicep is much more useful.

…times in favour of reusable configuration and direct calls to rejigged dynamicMetricAlert.bicep template which reduces boilerplate from callsites.

duncan-at-hiveit force-pushed the EES-5687-remove-base-alert-templates branch from 6bdc9b5 to 7901822 Compare December 31, 2024 17:28

EES-5687 - removed base alert templates for CPU, memory and response …

9aeb559

…times in favour of reusable configuration and direct calls to rejigged dynamicMetricAlert.bicep template which reduces boilerplate from callsites.

duncan-at-hiveit force-pushed the EES-5687-remove-base-alert-templates branch from 7901822 to 9aeb559 Compare January 2, 2025 09:40

duncan-at-hiveit mentioned this pull request Jan 2, 2025

Ees 5687 add resource utilisation metric alerts #5464

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EES-5687 - removed base alert templates for CPU, memory and response times #5489

EES-5687 - removed base alert templates for CPU, memory and response times #5489

duncan-at-hiveit commented Dec 31, 2024 •

edited

Loading

EES-5687 - removed base alert templates for CPU, memory and response times #5489

Are you sure you want to change the base?

EES-5687 - removed base alert templates for CPU, memory and response times #5489

Conversation

duncan-at-hiveit commented Dec 31, 2024 • edited Loading

POC for replacing base templates

Why the POC?

The proposed POC here

POC for valid resource type / metric name combinations

Why the POC?

The proposed POC here

What next if the above proposed approaches are OK?

duncan-at-hiveit commented Dec 31, 2024 •

edited

Loading