Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EES-5687 - removed base alert templates for CPU, memory and response times #5489

Draft
wants to merge 1 commit into
base: EES-5687-add-resource-utilisation-metric-alerts
Choose a base branch
from

Conversation

duncan-at-hiveit
Copy link
Collaborator

@duncan-at-hiveit duncan-at-hiveit commented Dec 31, 2024

This PR:

  • is a POC for replacing the use of base alert templates for commonly configured alerts with direct calls to dynamicMerticAlert.bicep and reusable configuration items.
  • is a POC for improving the usage of only valid combinations of resource type and metric name.

POC for replacing base templates

Why the POC?

When carrying out work for EES-5687, I saw that a number of resources could potentially use a shared default configuration for common alert types - notably CPU percentage, memory percentage and response times. With this in mind, I created some base templates e.g. baseCpuPercentageAlert.bicep which encapsulated common configuration and behaviour for all CPU percentage alerts. If necessary, the defaults could be overridden but in general, we'd end up with consistently-named CPU % alerts that generally behaved the same, and one place to change if we wanted to affect the behaviour of them all.

@ntsim pointed out however that having this AND a resource-specific alert module that reused this base one e.g. containerApps/cpuPercentageAlert.bicep would cause multi-level ARM deploys and hinder debugging etc. He proposed removing the resource-specific ones and just calling the base ones directly from the low-level components' Bicep files e.g. containerApp.bicep would call baseCpuPercentageAlert.bicep directly instead, allowing one less level of redirection.

The proposed POC here

When playing with the above suggestion, I also toyed with making the dynamicMetricAlert.bicep module easier to call and with less boilerplate for the various callsites that use it. To prevent this POC branch from getting too big, I called this new version of it dynamicMetricAlertNew.bicep temporarily.

Because it was easier to call, I then looked at just calling it directly from the low-level component Bicep files e.g. containerApp.bicep would just call dynamicMetricAlertNew.bicep directly and do away with the baseCpuPercentageAlert.bicep file altogether.

I still wanted to keep the consistency of the naming and behaviour of the CPU alerts though, so I created a reusable chunk of configuration that could be used easily with dynamicMetricAlertNew.bicep wherever we were setting up a CPU alert, to ensure consistent behaviour and naming by default.

The advantage of this approach is that we get to remove 2 layers of indirection (containerApps/cpuPercentageAlert.bicep AND baseCpuPercentageAlert.bicep), but still retain the consistency of setting up CPU alerts across the board, as an example.

POC for valid resource type / metric name combinations

Why the POC?

In the review of EES-5687, @ntsim suggested that we look to introduce some type checking for Cpu- and Memory-specific resource types and metric names, calling them for instance CpuResourceType and CpuMetricName when used with a CPU-specific alert template. In this PR however, I've done away with scenario-specific alert templates, instead opting for calling dynamicMetricAlertNew.bicep directly.

The proposed POC here

Based on the above change in approach, I have therefore opted for another approach, listing out valid combinations of resourceType and metricName in a new user-defined type called ResourceMetric, which is a union of valid metrics versus resourceTypes in resourceMetrics.bicep. This constrains combinations of resourceType and metricName that can be passed into dynamicMetricAlertNew.bicep.

What next if the above proposed approaches are OK?

The rest of the resourceType, metricName combinations will be added to resourceMetrics.bicep.

dynamicMetricAlert.bicep will be updated with the contents of dynamicMetricAlertNew.bicep, and dynamicMetricAlertNew.bicep will be removed.

All templates calling dynamicMetricAlert.bicep directly will be updated to use the new reduced-boilerplate syntax that was introduced in dynamicMetricAlertNew.bicep.

As a separate PR, existing resource-specific alert templates that call dynamicMetricAlert.bicep can be removed in favour of calling dynamicMetricAlert.bicep directly from the low-level component Bicep files. As an example, postrgreSqlFlexibleServers/clientConnectionsWaitingAlert.bicep can be removed in favour of calling dynamicMetricAlert.bicep directly in postgresDatabase.bicep.

As another separate PR, staticMetricAlert.bicep can undergo the same treatment. We can then remove the ResourceType and MetricType types from alerts/types.bicep as the new type laid out in resourceMetrics.bicep is much more useful.

@duncan-at-hiveit duncan-at-hiveit force-pushed the EES-5687-remove-base-alert-templates branch from 6bdc9b5 to 7901822 Compare December 31, 2024 17:28
…times in favour of reusable configuration and direct calls to rejigged dynamicMetricAlert.bicep template which reduces boilerplate from callsites.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant