This repository contains tools for managing Azure Update Manager scenarios.
With a staged patching solution, OS updates are first deployed in a test environment and are later deployed in pre-production and production environments, ensuring the latter environments only get the very specific updates initially deployed in the test environment. With this approach, you significantly decrease the chances of an OS update breaking a production system.
A typical staged patching setup looks as follows:
- Dev/Test machines of a specific OS type/version are recurrently patched (e.g., every few weeks) for all update classifications (stage 0).
- Each Dev/Test patch cycle ends with a specific set of updates (e.g., specific Windows KB IDs or Linux package versions) that were successfully installed across all Dev/Test machines.
- Pre-production machines are patched a few days later (stage 1) with the very specific updates that were deployed in Dev/Test (stage 0).
- Production machines are patched one or two weeks later (stage 2) with the same updates that were deployed and tested in Dev/Test (stage 0) and Pre-Production (stage 1).
This staged patching approach can be implemented with the help of the Create-StagedMaintenanceConfiguration.ps1 PowerShell script, which runs after stage 0 and automates stages 1 and 2. This script can be, for example, deployed as an Azure Automation Runbook scheduled to run after the Dev/Test recurrent update cycle (see diagram below). It works for both Windows and Linux scenarios (Azure VMs and Azure Arc-enabled servers).
The value of a staged patching solution lies in ensuring that patches deployed in a production environment were previously tested in non-production environments. The more consistent and repeatable the patching workflow is, the more confidence you have in the patches that reach production. For this reason, it is recommended to define maintenance configurations specifically for each OS version and to ensure further stages are applied only to machines of the same OS version. For example, if your environment has a mix of Windows 2016, Windows 2019, Ubuntu 20.04, and Ubuntu 22.04 servers, you should define four different staged patching workflows, one for each OS version. With this approach, Windows 2019 production machines will only get patches that were tested on similar Windows 2019 non-production machines and, similarly, Ubuntu 20.04 production servers will only get package updates that were tested on Ubuntu 20.04 non-production servers.
Tagging is your best friend in this strategy. By tagging your servers according to their OS version and patching stage, it will be easier to dynamically define the scope of a specific patching stage. Continuing with the example above, your servers can be tagged as follows:
- An aum-stage tag for each of the patching stages (e.g., aum-stage=dev, aum-stage=preprod, aum-stage=prod, aum-stage=prod-ha-instance1, aum-stage=prod-ha-instance2, etc.).
- An os-name tag for each of the OS versions of your environment (e.g., os-name=windows2016, os-name=windows2019, os-name=ubuntu20, os-name=ubuntu22, etc.).
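To make the tagging scheme concrete, the following Python sketch groups a server inventory by its os-name and aum-stage tags, so that each group maps to one stage of one patching workflow. It is not part of this repository: the inventory, the server names, and the `group_by_workflow` function are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical inventory for illustration; real data would come from your own tooling.
SERVERS = [
    {"name": "vm-win19-dev-01", "tags": {"aum-stage": "dev", "os-name": "windows2019"}},
    {"name": "vm-win19-prod-01", "tags": {"aum-stage": "prod", "os-name": "windows2019"}},
    {"name": "vm-ubuntu22-dev-01", "tags": {"aum-stage": "dev", "os-name": "ubuntu22"}},
    {"name": "vm-legacy-01", "tags": {"os-name": "windows2016"}},  # missing aum-stage
]


def group_by_workflow(servers):
    """Group server names by (os-name, aum-stage); list servers missing either tag."""
    groups = defaultdict(list)
    untagged = []
    for server in servers:
        tags = server.get("tags") or {}
        os_name = tags.get("os-name")
        stage = tags.get("aum-stage")
        if os_name and stage:
            groups[(os_name, stage)].append(server["name"])
        else:
            untagged.append(server["name"])
    return dict(groups), untagged


groups, untagged = group_by_workflow(SERVERS)
for (os_name, stage), names in sorted(groups.items()):
    print(f"{os_name} / {stage}: {', '.join(names)}")
print("Missing tags:", ", ".join(untagged))
```

Servers reported as missing tags would fall outside every stage, so a check of this kind can help spot gaps before relying on tag-based scopes.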
You can choose any tagging strategy that meets your staged patching requirements, provided you end up with a predictable patching workflow. Once you have tagged all your servers according to their stage in the different patching workflows, you have to define a stage 0 recurring Maintenance Configuration (e.g., for servers tagged aum-stage=dev) for each of the OS versions. For the next stages, you have two options to leverage this solution:
- Schedule the Create-StagedMaintenanceConfiguration runbook to run after stage 0 (e.g., the next day) and configure it to create/update all the next stages (e.g., stages 1, 2, etc.) based on the results of stage 0.
- Chain all the stages to each other, by scheduling the Create-StagedMaintenanceConfiguration runbook for each of the stages before production:
- After stage 0, create/update stage 1 based on the results of stage 0
- After stage 1, create/update stage 2 based on the results of stage 1
- etc.
IMPORTANT: you must ensure the last stage (production) is scheduled before the next iteration of stage 0; otherwise, the production Maintenance Configuration will be overwritten with a new schedule/patch selection before it is actually deployed. Also, bear in mind that, as Azure Resource Graph keeps the patching results history for up to 30 days, your update cycle must not exceed this interval.
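The two timing constraints above can be checked before committing to a schedule. The Python sketch below is an illustration only and is not part of this repository. It assumes that "update cycle" means the offset between stage 0 and each later stage; the `check_cycle` function and the stage names are hypothetical.

```python
from datetime import timedelta

# Retention of patching results in Azure Resource Graph, as stated above.
RESULTS_RETENTION = timedelta(days=30)


def check_cycle(stage0_interval, stage_offsets):
    """Return a list of problems found in a staged patching timeline.

    stage0_interval: time between two consecutive stage 0 runs.
    stage_offsets: mapping of stage name to its offset from stage 0.
    """
    problems = []
    for name, offset in stage_offsets.items():
        if offset >= stage0_interval:
            problems.append(f"{name}: runs at or after the next stage 0 iteration")
        if offset > RESULTS_RETENTION:
            problems.append(f"{name}: exceeds the 30-day results retention")
    return problems


offsets = {"preprod": timedelta(days=7), "prod": timedelta(days=14, hours=4)}

# Stage 0 every 3 weeks: both later stages fit before the next iteration.
print(check_cycle(timedelta(weeks=3), offsets))

# Stage 0 every 2 weeks: the prod stage would land after the next stage 0 run.
print(check_cycle(timedelta(weeks=2), offsets))
```

The comparison against the stage 0 interval is deliberately conservative, since it ignores the delay between a stage 0 run and the runbook execution that follows it.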
Validating the quality of the patching stages before production is an important aspect not addressed by this solution. It is not sufficient to patch dev/test servers: we must ensure the patches pass minimum quality tests before reaching production. At this moment, you must run a parallel process that performs this validation (e.g., automated tests running in the patched servers). With the support for post-maintenance tasks in Azure Update Manager, you can integrate quality assurance in this solution.
- The machines in the scope of this solution must be supported by Azure Update Manager and fulfill its pre-requisites.
- The machines in the scope of this solution must have the Customer Managed Schedules patch orchestration mode, a pre-requisite for scheduled patching.
- At least one recurrent scheduled patching Maintenance Configuration covering a part of the machines in scope. As this maintenance configuration will serve as the reference for the following patching stages, it should be assigned to non-production machines and, ideally, recur every few weeks. See the above recommendations for an effective patching strategy.
- An Azure Automation Account with an associated Managed Identity (can be a system or user-assigned identity) and the following modules installed: Az.Accounts, Az.Resources, and Az.ResourceGraph. This solution is based on an Automation Account, but you can use other approaches, such as Azure Functions.
- The Automation Account Managed Identity must have the following minimum permissions (as a custom role) on the subscription where the reference maintenance configuration was created:
- */read
- Microsoft.Maintenance/maintenanceConfigurations/write
- Microsoft.Maintenance/configurationAssignments/write
- Microsoft.Maintenance/applyupdates/write
- Microsoft.Resources/deployments/*
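As an illustration of the permissions listed above, the Python sketch below assembles them into a custom role definition and prints it as JSON. It is not part of this repository. The role name, the description, and the `build_role_definition` function are hypothetical; the property names follow the JSON shape commonly used for Azure custom role definitions, which you should confirm against the Azure RBAC documentation before use.

```python
import json

# Minimum permissions listed in the pre-requisites above.
MINIMUM_ACTIONS = [
    "*/read",
    "Microsoft.Maintenance/maintenanceConfigurations/write",
    "Microsoft.Maintenance/configurationAssignments/write",
    "Microsoft.Maintenance/applyupdates/write",
    "Microsoft.Resources/deployments/*",
]


def build_role_definition(subscription_id, role_name="Staged Patching Operator"):
    """Build a custom role definition scoped to one subscription.

    The role name and description are placeholders chosen for this example.
    """
    return {
        "Name": role_name,
        "IsCustom": True,
        "Description": "Minimum permissions for the staged patching runbook.",
        "Actions": list(MINIMUM_ACTIONS),
        "NotActions": [],
        "AssignableScopes": [f"/subscriptions/{subscription_id}"],
    }


role = build_role_definition("00000000-0000-0000-0000-000000000000")
print(json.dumps(role, indent=2))
```

The subscription in AssignableScopes should be the one that holds the reference maintenance configuration.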
- Ensure your Azure VMs and Azure Arc-enabled servers are tagged according to the staged patching strategy you defined (see above). Use tags to group your servers according to their patching phase and OS version.
- Create a recurrent scheduled patching Maintenance Configuration for Phase 0 of each OS version in your environment. See instructions here. Assign this Maintenance Configuration to the servers that will serve as the initial testbed for your patches. You can assign servers either directly to the Maintenance Configuration or dynamically, with a Dynamic Scope.
- Create or reuse an Azure Automation Account.
- Install all required Automation Account modules (Az.Accounts and Az.Resources are usually built-in, but Az.ResourceGraph is not). See here how to install additional modules.
- Assign a Managed Identity to the Automation Account and grant it the required privileges (see pre-requisites above).
- Import the Create-StagedMaintenanceConfiguration.ps1 Runbook into the Automation Account, by following these steps. Download the runbook first to your local machine. The runbook must be configured as PowerShell 5.1. Don't forget to publish the runbook (draft runbooks cannot be scheduled).
- Create an Azure Automation schedule for each of the Phase 0 Maintenance Configurations (one per OS version). The schedule must have the same frequency as the Maintenance Configuration it refers to, with at least an 8-hour offset. For example, if the Maintenance Configuration is scheduled on Mondays, every 2 weeks at 8:00 p.m., then the respective Azure Automation schedule should run on Tuesdays, every 2 weeks, at 4:00 a.m. or later.
- Link the Create-StagedMaintenanceConfiguration runbook to each of the schedules and specify its parameters according to the instructions below. To obtain the Maintenance Configuration ID, check the "Properties" blade of the Maintenance Configuration.
- (Optional) If you prefer to adopt a more conservative chained staged approach, you need to create additional schedules (for further stages before production) and link them to the same runbook. In this case, you will have to anticipate the Maintenance Configuration IDs that will result from the previous stages' executions, which will be in the form /subscriptions/<phase 0 maintenance configuration subscription ID>/resourceGroups/<phase 0 maintenance configuration resource group>/providers/Microsoft.Maintenance/maintenanceConfigurations/<previous stage name>.
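Two of the steps above involve small calculations: the 8-hour schedule offset and, for the chained approach, the anticipated Maintenance Configuration ID of a previous stage. The Python sketch below illustrates both and is not part of this repository. The function names, the resource group name, and the configuration names are hypothetical.

```python
from datetime import datetime, timedelta

# Minimum gap between the maintenance start and the runbook schedule, as stated above.
MIN_OFFSET = timedelta(hours=8)


def earliest_runbook_start(maintenance_start):
    """Return the earliest time the Automation schedule should run."""
    return maintenance_start + MIN_OFFSET


def anticipated_configuration_id(reference_id, previous_stage_name):
    """Swap the configuration name in a reference ID for a previous stage name."""
    parts = reference_id.strip("/").split("/")
    if (
        len(parts) != 8
        or parts[0].lower() != "subscriptions"
        or parts[2].lower() != "resourcegroups"
        or parts[6].lower() != "maintenanceconfigurations"
    ):
        raise ValueError(f"Unexpected Maintenance Configuration ID: {reference_id}")
    parts[-1] = previous_stage_name
    return "/" + "/".join(parts)


# Monday, 1 January 2024 at 8:00 p.m. (example date only).
start = earliest_runbook_start(datetime(2024, 1, 1, 20, 0))
print(start.strftime("%A %H:%M"))

phase0_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/rg-patching/providers/Microsoft.Maintenance"
    "/maintenanceConfigurations/windows2019-dev"
)
print(anticipated_configuration_id(phase0_id, "windows2019-preprod"))
```

For a Monday 8:00 p.m. maintenance start, the first call yields Tuesday at 4:00 a.m., matching the example in the schedule step.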
Pre- and post-maintenance tasks in Azure Update Manager follow an event-based architecture, in which you subscribe to events coming from a system topic associated with the Maintenance Configuration, and use, for example, an Azure Automation runbook or Azure Function as the destination of the event. You can learn more about how to configure pre- and post-maintenance tasks in the Azure Update Manager pre and post events overview, and also in the how-to guide and tutorials for Azure Automation- and Azure Functions-based tasks. This repository contains two code samples for the pre- and post-maintenance scenarios:
- Turn machines on if needed with Start-StagedMaintenanceVMs.ps1 and turn them all off with Deallocate-StagedMaintenanceVMs.ps1. This requires the Virtual Machine Contributor role to be granted to the Automation Account managed identity in the scope of the VMs to be managed (management group, subscription, or resource group).
- Take an OS disk snapshot and turn machines on if needed, but cancel the maintenance if something fails, with Start-VMsWithStateAndSnapshot.ps1. Turn off only the machines that were turned on in the pre-maintenance task with Deallocate-VMsWithState.ps1. This requires the Virtual Machine Contributor, Automation Contributor, and Disk Snapshot Contributor roles to be granted to the Automation Account managed identity in the scope of the VMs to be managed (management group, subscription, or resource group).
NOTE: the staged patching solution described here does not propagate pre- and post-maintenance tasks from the reference maintenance configuration to the following stages, but you can manually configure them once the subsequent stages have been created for the first time.
The Create-StagedMaintenanceConfiguration.ps1 PowerShell script receives the following parameters:
- MaintenanceConfigurationId: Azure Resource Manager ID of the Maintenance Configuration to be used as a reference to create maintenance configurations for further stages
- NextStagePropertiesJson: JSON-formatted parameter that will define the scope of the next maintenance configurations, with the following schema:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "array",
"items": [
{
"type": "object",
"properties": {
"stageName": {
"type": "string"
},
"offsetDays": {
"type": "integer"
},
"offsetTimeSpan": {
"type": "string"
},
"scope": {
"type": "array",
"items": [
{
"type": "string"
}
]
},
"filter": {
"type": "object",
"properties": {
"resourceTypes": {
"type": "array",
"items": [
{
"type": "string"
}
]
},
"resourceGroups": {
"type": "array",
"items": [
{
"type": "string"
}
]
},
"tagSettings": {
"type": "object",
"properties": {
"tags": {
"type": "object",
"properties": {
"tagName1": {
"type": "array",
"items": [
{
"type": "string"
}
]
},
"tagNameN": {
"type": "array",
"items": [
{
"type": "string"
}
]
}
}
},
"filterOperator": {
"type": "string"
}
},
"required": [
"tags",
"filterOperator"
]
},
"locations": {
"type": "array",
"items": [
{
"type": "string"
}
]
},
"osTypes": {
"type": "array",
"items": [
{
"type": "string"
}
]
}
}
}
},
"required": [
"stageName",
"scope",
"filter"
]
}
]
}
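Before handing a value to the NextStagePropertiesJson parameter, it can help to check it against the schema above. The Python sketch below uses only the standard library and covers a subset of the schema: the required stage properties, the types of the offset properties, and the required tagSettings properties. It is an illustration, not the validation performed by the script, and the `validate_next_stages` function is hypothetical.

```python
import json

REQUIRED_STAGE_KEYS = ("stageName", "scope", "filter")
REQUIRED_TAG_SETTINGS_KEYS = ("tags", "filterOperator")


def validate_next_stages(raw_json):
    """Return a list of problems found in a NextStagePropertiesJson value."""
    stages = json.loads(raw_json)
    if not isinstance(stages, list):
        return ["top-level value must be an array"]
    errors = []
    for index, stage in enumerate(stages):
        if not isinstance(stage, dict):
            errors.append(f"stage {index}: must be an object")
            continue
        for key in REQUIRED_STAGE_KEYS:
            if key not in stage:
                errors.append(f"stage {index}: missing '{key}'")
        offset_days = stage.get("offsetDays")
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if offset_days is not None and (
            isinstance(offset_days, bool) or not isinstance(offset_days, int)
        ):
            errors.append(f"stage {index}: 'offsetDays' must be an integer")
        if "offsetTimeSpan" in stage and not isinstance(stage["offsetTimeSpan"], str):
            errors.append(f"stage {index}: 'offsetTimeSpan' must be a string")
        stage_filter = stage.get("filter")
        tag_settings = stage_filter.get("tagSettings") if isinstance(stage_filter, dict) else None
        if isinstance(tag_settings, dict):
            for key in REQUIRED_TAG_SETTINGS_KEYS:
                if key not in tag_settings:
                    errors.append(f"stage {index}: tagSettings missing '{key}'")
    return errors


sample = '[{"stageName": "windows2019-preprod", "offsetDays": "7", "filter": {"tagSettings": {"tags": {}}}}]'
for problem in validate_next_stages(sample):
    print(problem)
```

An empty list means that none of the checked rules were violated; it does not guarantee that the script will accept the value.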
The example below implements a scenario in which the Pre-Production and Production stages are deployed respectively 7 days (with offsetDays) and 14 days + 4 hours (with offsetTimeSpan in ISO 8601 format) after the reference maintenance configuration (Dev/Test). The maintenance scope is targeted at two subscriptions (00000000-0000-0000-0000-000000000000 and 00000000-0000-0000-0000-000000000001), for both Windows Azure VMs and Azure Arc-enabled servers tagged with aum-stage=preprod|prod and os-name=windows2019. The filter property follows the format defined for Maintenance Configuration Assignments (see reference).
[
{
"stageName": "windows2019-preprod",
"offsetDays": 7,
"scope": [
"/subscriptions/00000000-0000-0000-0000-000000000000",
"/subscriptions/00000000-0000-0000-0000-000000000001"
],
"filter": {
"resourceTypes": [
"microsoft.compute/virtualmachines",
"microsoft.hybridcompute/machines"
],
"resourceGroups": [
],
"tagSettings": {
"tags": {
"aum-stage": [
"preprod"
],
"os-name": [
"windows2019"
]
},
"filterOperator": "All"
},
"locations": [],
"osTypes": [
"Windows"
]
}
},
{
"stageName": "windows2019-prod",
"offsetTimeSpan": "P14DT4H",
"scope": [
"/subscriptions/00000000-0000-0000-0000-000000000000",
"/subscriptions/00000000-0000-0000-0000-000000000001"
],
"filter": {
"resourceTypes": [
"microsoft.compute/virtualmachines",
"microsoft.hybridcompute/machines"
],
"resourceGroups": [
],
"tagSettings": {
"tags": {
"aum-stage": [
"prod"
],
"os-name": [
"windows2019"
]
},
"filterOperator": "All"
},
"locations": [],
"osTypes": [
"Windows"
]
}
}
]
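To see what the two offset styles in the example amount to, the Python sketch below converts offsetDays and offsetTimeSpan into a start time for each stage. It is an illustration and not the logic of the script. It assumes the offset is added to the start time of the reference maintenance configuration, that offsetTimeSpan wins when both properties are present, and that only the day and time parts of an ISO 8601 duration are used; the reference start date and the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Supports only durations of the form PnDTnHnMnS (no years, months, or weeks).
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(value):
    """Parse a simple ISO 8601 duration such as 'P14DT4H' into a timedelta."""
    match = _DURATION.match(value)
    if not match or not any(match.groupdict().values()) or value.endswith("T"):
        raise ValueError(f"Unsupported duration: {value}")
    parts = {name: int(number) for name, number in match.groupdict().items() if number}
    return timedelta(**parts)


def stage_offset(stage):
    """Return the offset of a stage (assumption: offsetTimeSpan takes precedence)."""
    if "offsetTimeSpan" in stage:
        return parse_duration(stage["offsetTimeSpan"])
    if "offsetDays" in stage:
        return timedelta(days=stage["offsetDays"])
    raise ValueError(f"Stage '{stage.get('stageName')}' has no offset")


# Offsets taken from the example above; the reference start date is made up.
stages = [
    {"stageName": "windows2019-preprod", "offsetDays": 7},
    {"stageName": "windows2019-prod", "offsetTimeSpan": "P14DT4H"},
]
reference_start = datetime(2024, 1, 1, 20, 0)

for stage in stages:
    start = reference_start + stage_offset(stage)
    print(f"{stage['stageName']}: {start:%Y-%m-%d %H:%M}")
```

Under these assumptions, a reference start on 1 January at 8:00 p.m. gives a pre-production start on 8 January at 8:00 p.m. and a production start on 16 January at midnight.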
This solution is not supported under any Microsoft standard support program or service. The scripts are provided AS IS without warranty of any kind. The entire risk arising out of the use or performance of the scripts and documentation remains with you.