Skip to content
This repository has been archived by the owner on May 7, 2024. It is now read-only.

Commit

Merge branch 'main' into patch-2
rodrigosantosms authored Apr 5, 2024
2 parents ba23bf3 + e1d3c2f commit c624770
Showing 20 changed files with 217 additions and 288 deletions.
11 changes: 10 additions & 1 deletion README.md
@@ -1,5 +1,14 @@
# Azure Proactive Resiliency Library (APRL)

> [!CAUTION]
> The APRL repository is scheduled to be migrated to a new repository the week of April 8th.
> The current APRL repository will be placed in **READ-ONLY** mode from April 8th to April 12th.
> No new pull requests will be accepted after April 5th.
>
> **New Repository:** [https://github.com/Azure/Azure-Proactive-Resiliency-Library-v2](https://github.com/Azure/Azure-Proactive-Resiliency-Library-v2)
>
> **[aka.ms/aprl](https://aka.ms/aprl)** will redirect to the new website starting April 15th.

[![Average time to resolve an issue](http://isitmaintained.com/badge/resolution/Azure/Azure-Proactive-Resiliency-Library.svg)](http://isitmaintained.com/project/Azure/Azure-Proactive-Resiliency-Library "Average time to resolve an issue")
[![Percentage of issues still open](http://isitmaintained.com/badge/open/Azure/Azure-Proactive-Resiliency-Library.svg)](http://isitmaintained.com/project/Azure/Azure-Proactive-Resiliency-Library "Percentage of issues still open")

@@ -17,7 +26,7 @@ The library also contains supporting [Azure Resource Graph (ARG)](https://learn.

> The contribution guide can be found on the GitHub pages site here: [aka.ms/aprl/contribute](https://aka.ms/aprl/contribute)
This project currently accepts pull requests only from Microsoft FTEs. However, anyone is welcome to create issues or feature requests on the repo for the team to triage and act on. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit [https://cla.opensource.microsoft.com](https://cla.opensource.microsoft.com).

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

22 changes: 18 additions & 4 deletions docs/content/_index.md
@@ -4,6 +4,20 @@ description = "Welcome to the home of the Azure Proactive Resiliency Library (AP
weight = 1
+++

{{< alert style="danger" >}}

## WEBSITE MAINTENANCE NOTICE

The APRL repository is scheduled to be migrated to a new repository the week of April 8th.
The current APRL repository will be placed in READ-ONLY mode from April 8th to April 12th.
No new pull requests will be accepted after April 5th.

### New Repository: [https://github.com/Azure/Azure-Proactive-Resiliency-Library-v2](https://github.com/Azure/Azure-Proactive-Resiliency-Library-v2)

### [aka.ms/aprl](https://aka.ms/aprl) will redirect to the new website starting April 15th

{{< /alert >}}

Welcome to the home of the Azure Proactive Resiliency Library (APRL).

<img src="/Azure-Proactive-Resiliency-Library/media/img/aprl-white.png" width=40%>
@@ -29,10 +43,10 @@ In APRL you will see a number of terms used, like Preview & Verified. The below

{{< table style="table-striped" >}}

| Term              | Definition                                                                                                                                                                                                     |
| ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Preview Guidance  | Guidance that Microsoft FTEs have created based on customer engagements and that is being reviewed with the relevant Azure Product Group Engineering Service owners to ensure the content is valid and accurate |
| Verified Guidance | Guidance that has been signed off by Azure Product Group Engineering Service owners following their review                                                                                                     |

{{< /table >}}

70 changes: 70 additions & 0 deletions docs/content/services/ai-ml/databricks/_index.md
@@ -41,6 +41,8 @@ The presented resiliency recommendations in this guidance include Azure Databric
| [DBW-26 - Isolate each workspace in its own Vnet](#dbw-26---isolate-each-workspace-in-its-own-vnet) | System Efficiency | High | Preview | No |
| [DBW-27 - Do not Store any Production Data in Default DBFS Folders](#dbw-27---do-not-store-any-production-data-in-default-dbfs-folders) | Availability | High | Preview | No |
| [DBW-28 - Do not use Azure Spot VMs for critical Production workloads](#dbw-28---do-not-use-azure-sport-vms-for-critical-production-workloads) | Availability | High | Preview | No |
| [DBW-29 - Migrate Legacy Workspaces](#dbw-29---migrate-legacy-workspaces) | Availability | High | Preview | No |
| [DBW-30 - Define alternate VM SKUs](#dbw-30---define-alternate-vm-skus) | System Efficiency | Medium | Preview | No |
{{< /table >}}

{{< alert style="info" >}}
@@ -765,3 +767,71 @@ Azure Spot VMs are not recommended for critical production workloads that requir
{{< /collapse >}}

<br><br>

### DBW-29 - Migrate Legacy Workspaces

**Category: Availability**

**Impact: High**

**Guidance**

Azure Databricks initially launched with a shared control plane model, in which some regions used the control plane resources of another region. This model has since evolved to dedicated in-region control planes (e.g., North Europe, Central US, East US) so that a regional outage does not affect customer workspaces in other regions.

Regions that now have a dedicated control plane can have workspaces running in two configurations:

- Legacy Workspaces - these are workspaces created before the dedicated control plane was available.
- Workspaces - these are workspaces created after the dedicated control plane was available.

The path for migrating legacy workspaces to use the in-region control plane is to **redeploy**.

Review the list of network addresses used in each region in the Microsoft documentation and determine which regions are sharing a control plane. For example, we can look up Canada East in the table and see that the address for its SCC relay is "tunnel.canadacentral.azuredatabricks.net". Since the relay address is in Canada Central, we know that "Canada East" is using the control plane in another region.

Some regions list two different addresses in the Azure Databricks Control plane networking table. For example, North Europe lists both "tunnel.westeurope.azuredatabricks.net" and "tunnel.northeuropec2.azuredatabricks.net" for the SCC relay address. This is because North Europe once shared the West Europe control plane, but it now has its own independent control plane. There are still some old, legacy workspaces in North Europe tied to the old control plane, but all workspaces created since the switch-over will be using the new control plane.
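
The relay-address comparison described above can be sketched in Python. This is a minimal, hypothetical helper, not part of APRL or any Databricks tool. It assumes you have already copied the SCC relay hostname from the Microsoft networking table, that workspace regions use ARM location names such as `canadaeast`, and that the control-plane region is the label after `tunnel.` with an optional `c<N>` suffix (as in `northeuropec2`). That naming rule is inferred only from the examples above.

```python
import re


def control_plane_region(relay_host: str) -> str:
    """Return the region label encoded in an SCC relay hostname.

    Assumes the form 'tunnel.<region>[c<N>].azuredatabricks.net'; the optional
    'c<N>' suffix (as in 'northeuropec2') is stripped. This naming rule is an
    assumption inferred from the examples in the guidance.
    """
    labels = relay_host.lower().split(".")
    if len(labels) < 2 or labels[0] != "tunnel":
        raise ValueError(f"unexpected relay hostname: {relay_host!r}")
    return re.sub(r"c\d+$", "", labels[1])


def uses_remote_control_plane(workspace_region: str, relay_host: str) -> bool:
    """True when the relay's region differs from the workspace's own region."""
    return control_plane_region(relay_host) != workspace_region.lower()


# The three cases described in the guidance:
print(uses_remote_control_plane("canadaeast", "tunnel.canadacentral.azuredatabricks.net"))   # True
print(uses_remote_control_plane("northeurope", "tunnel.westeurope.azuredatabricks.net"))     # True (legacy)
print(uses_remote_control_plane("northeurope", "tunnel.northeuropec2.azuredatabricks.net"))  # False
```

A `True` result only says that the listed relay lives in another region; it is a prompt to review that region's workspaces, not proof that a given workspace is legacy.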

Once a new Azure Databricks workspace is created, it should be configured to match the original legacy workspace. Databricks, Inc. recommends that customers use the Databricks Terraform Exporter both for the initial copy and for ongoing maintenance of the workspace; however, this exporter is still experimental. Customers who prefer not to rely on experimental tooling, or who do not use Terraform, can use the "Migrate" tool that Databricks, Inc. maintains on GitHub. It is a collection of scripts that exports all of the objects (notebooks, cluster definitions, metadata, *etc.*) from one workspace and imports them into another. Customers can use the "Migrate" tool to populate the new workspace initially and then use their CI/CD deployment process to keep it in sync.

Pro Tip: To find where the control plane for a particular Databricks workspace is located, run the `nslookup` command against the workspace address on Windows or Linux. The names returned in the result indicate where the control plane is located.
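
If you want to script the lookup from the Pro Tip, Python's standard `socket.gethostbyname_ex` returns the canonical name, aliases, and IPv4 addresses for a host, which covers the name information `nslookup` prints. This is a minimal sketch; the `resolver` parameter is a hypothetical hook added so the function can be exercised without network access, and interpreting the returned names is still a manual step.

```python
import socket
from typing import Callable, List, Tuple

# Same shape as socket.gethostbyname_ex: host -> (canonical name, aliases, IPv4 addresses).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def resolve_names(host: str, resolver: Resolver = socket.gethostbyname_ex) -> List[str]:
    """Return the canonical name reported for `host`, followed by any aliases.

    Duplicates are dropped while preserving order, because the queried name
    can also appear in the alias list.
    """
    canonical, aliases, _addresses = resolver(host)
    names: List[str] = []
    for name in [canonical, *aliases]:
        if name not in names:
            names.append(name)
    return names
```

Usage: call `resolve_names("<workspace-hostname>")` with the hostname from your workspace URL and inspect the returned names.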

**Resources**

- [Azure Databricks regions - IP addresses and domains](https://learn.microsoft.com/azure/databricks/resources/supported-regions#--ip-addresses-and-domains)
- [Migrate - maintained by Databricks Inc.](https://github.com/databrickslabs/migrate)
- [Databricks Terraform Exporter - maintained by Databricks Inc. (Experimental)](https://registry.terraform.io/providers/databricks/databricks/latest/docs/guides/experimental-exporter)

<br><br>

### DBW-30 - Define alternate VM SKUs

**Category: System Efficiency**

**Impact: Medium**

**Guidance**

Azure Databricks availability planning should include plans for swapping VM SKUs based on capacity constraints.

Azure Databricks creates its VMs as regional VMs and relies on Azure to choose the best availability zone for each VM. In the past, there have been rare instances where compute could not be allocated because of zonal or regional VM capacity constraints, resulting in a "CLOUD PROVIDER" error.

In these situations, customers have two options:

- Use Databricks pools. To manage costs, customers should size their pools carefully, because they pay for the Azure VMs even while the VMs are idle in the pool. A Databricks pool can contain only one VM SKU; you cannot mix multiple SKUs in the same pool. To reduce the number of pools they need to manage, customers should settle on a few SKUs that can serve their jobs instead of using a different VM SKU for each job.
- Plan for alternative SKUs in their preferred region(s).
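
Planning for alternative SKUs usually means keeping an ordered preference list and falling back when allocation fails. The sketch below shows only that control flow. `CapacityError` and `try_create` are hypothetical names, not Databricks or Azure APIs; you would wire `try_create` to your own cluster or pool creation code and raise `CapacityError` when it reports a capacity failure such as the "CLOUD PROVIDER" error. The SKU names in the usage line are illustrative.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


class CapacityError(Exception):
    """Hypothetical stand-in for a capacity failure such as the "CLOUD PROVIDER" error."""


def create_with_fallback(skus: Iterable[str], try_create: Callable[[str], T]) -> T:
    """Try each VM SKU in order of preference until one can be allocated.

    `try_create` is a hypothetical callable that creates compute with the given
    SKU and raises CapacityError when allocation fails.
    """
    failures: List[str] = []
    for sku in skus:
        try:
            return try_create(sku)
        except CapacityError as exc:
            failures.append(f"{sku}: {exc}")  # remember why this SKU failed
    raise CapacityError("no SKU could be allocated; tried " + "; ".join(failures))
```

For example, `create_with_fallback(["Standard_D8ds_v5", "Standard_E8ds_v5"], try_create)` tries the first size and moves to the second only on a capacity failure. Keeping the list short and aligned with the SKUs already used for pools keeps cost and pool count predictable.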

**Resources**

- [Compute configuration best practices](https://learn.microsoft.com/azure/databricks/compute/cluster-config-best-practices)
- [GPU-enabled compute](https://learn.microsoft.com/azure/databricks/compute/gpu)

**Resource Graph Query**

{{< collapse title="Show/Hide Query/Script" >}}

{{< code lang="sql" file="code/dbw-30/dbw-30.kql" >}} {{< /code >}}

{{< /collapse >}}

<br><br>
12 changes: 6 additions & 6 deletions docs/content/services/compute/virtual-machines/_index.md
@@ -32,7 +32,7 @@ The presented resiliency recommendations in this guidance include Virtual Machin
| [VM-16 - Shared disks should only be enabled in Clustered servers](#vm-16---shared-disks-should-only-be-enabled-in-clustered-servers) | Storage | Medium | Verified | Yes |
| [VM-17 - The Network access to the VM disk is set to Enable Public access from all networks](#vm-17---network-access-to-the-vm-disk-should-be-set-to-disable-public-access-and-enable-private-access) | Access & Security | Low | Verified | Yes |
| [VM-18 - Virtual Machine is not compliant with Azure Policies](#vm-18---ensure-that-your-vms-are-compliant-with-azure-policies) | Governance | Low | Verified | Yes |
| [VM-19 - Enable disk encryption, Enable data at rest encryption by default](#vm-19---enable-disk-encryption-and-data-at-rest-encryption-by-default) | Access & Security | Medium | Verified | Yes |
| [VM-19 - Enable advanced encryption options for your managed disks](#vm-19---enable-advanced-encryption-options-for-your-managed-disks) | Access & Security | Medium | Verified | No |
| [VM-20 - Enable Insights to get more visibility into the health and performance of your virtual machine](#vm-20---enable-vm-insights) | Monitoring | Low | Verified | Yes |
| [VM-21 - Configure diagnostic settings for all Azure Virtual Machines](#vm-21---configure-diagnostic-settings-for-all-azure-virtual-machines) | Monitoring | Low | Preview | Yes |
| [VM-22 - Use maintenance configurations for the Virtual Machine](#vm-22---use-maintenance-configurations-for-the-vms) | Governance | High | Verified | Yes |
@@ -515,20 +515,20 @@ It's important to keep your virtual machine (VM) secure for the applications tha

<br><br>

### VM-19 - Enable disk encryption and data at rest encryption by default
### VM-19 - Enable advanced encryption options for your managed disks

**Category: Access & Security**

**Impact: Medium**

**Guidance**

There are several types of encryption available for your managed disks, including Azure Disk Encryption (ADE), Server-Side Encryption (SSE) and encryption at host.
Azure Disk Storage Server-Side Encryption (also referred to as encryption-at-rest or Azure Storage encryption) automatically encrypts data stored on Azure managed disks (OS and data disks) when persisting on the Storage Clusters. There are several types of advanced encryption options available for your managed disks, including Azure Disk Encryption (ADE), Encryption at host and Confidential disk encryption.

- Azure Disk Encryption helps protect and safeguard your data to meet your organizational security and compliance commitments.
- Azure Disk Storage Server-Side Encryption (also referred to as encryption-at-rest or Azure Storage encryption) automatically encrypts data stored on Azure managed disks (OS and data disks) when persisting on the Storage Clusters.
- ADE encrypts the disks of Azure virtual machines (VMs) inside your VMs by using the DM-Crypt feature of Linux or the BitLocker feature of Windows.
- Encryption at host ensures that data stored on the VM host hosting your VM is encrypted at rest and flows encrypted to the Storage clusters.
- Confidential disk encryption binds disk encryption keys to the virtual machine’s TPM and makes the protected disk content accessible only to the VM.
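
To see which server-side encryption key option your disks use, you can summarize exported disk records. This sketch assumes records shaped like the Resource Graph output used by the VM-19 query (`name` plus `properties.encryption.type`) and checks only the three SSE types that query lists. ADE, encryption at host, and confidential disk encryption are configured elsewhere and are not checked here. The sample disk names are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

# The three server-side encryption (SSE) types listed in the VM-19 Resource Graph query.
SSE_TYPES = {
    "EncryptionAtRestWithPlatformKey",              # platform-managed keys (default)
    "EncryptionAtRestWithCustomerKey",              # customer-managed keys
    "EncryptionAtRestWithPlatformAndCustomerKeys",  # double encryption: platform + customer keys
}


def encryption_type(disk: Dict[str, Any]) -> str:
    """Return properties.encryption.type from an ARG-style disk record, or '' if absent."""
    properties = disk.get("properties") or {}
    encryption = properties.get("encryption") or {}
    return encryption.get("type") or ""


def summarize_encryption(disks: Iterable[Dict[str, Any]]) -> Counter:
    """Count disks per SSE type; records without a type are counted as '<missing>'."""
    return Counter(encryption_type(d) or "<missing>" for d in disks)


def disks_without_sse(disks: Iterable[Dict[str, Any]]) -> List[str]:
    """Names of disks whose encryption type is not one of the known SSE types."""
    return [d.get("name", "<unnamed>") for d in disks if encryption_type(d) not in SSE_TYPES]


sample = [
    {"name": "os-disk", "properties": {"encryption": {"type": "EncryptionAtRestWithPlatformKey"}}},
    {"name": "data-disk", "properties": {"encryption": {"type": "EncryptionAtRestWithCustomerKey"}}},
    {"name": "odd-disk", "properties": {}},
]
print(summarize_encryption(sample))
print(disks_without_sse(sample))  # ['odd-disk']
```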


**Resources**

@@ -0,0 +1,8 @@
// Azure Resource Graph Query
// Find all disks that are not encrypted
resources
| where type == "microsoft.compute/disks"
| extend encryptionType = properties.encryption.type
| extend diskState = properties.diskState
| where encryptionType !in ("EncryptionAtRestWithCustomerKey", "EncryptionAtRestWithPlatformAndCustomerKeys", "EncryptionAtRestWithPlatformKey")
| project recommendationId="vm-19", name, id, tags, param1=strcat("encryptionType: ", encryptionType), param2=strcat("diskstate: ", diskState)
@@ -1,8 +1 @@
// Azure Resource Graph Query
// Find all disks that are not encrypted
resources
| where type == "microsoft.compute/disks"
| extend encryptionType = properties.encryption.type
| extend diskState = properties.diskState
| where encryptionType !in ("EncryptionAtRestWithCustomerKey", "EncryptionAtRestWithPlatformAndCustomerKeys", "EncryptionAtRestWithPlatformKey")
| project recommendationId="vm-19", name, id, tags, param1=strcat("encryptionType: " , properties.encryption.type), param2= strcat ("diskstate: ", properties.diskState)
// under-development
26 changes: 26 additions & 0 deletions docs/content/services/integration/api-management/_index.md
@@ -16,6 +16,7 @@ The presented resiliency recommendations in this guidance include Api Management
|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------:|:------:|:-------:|:-------------------:|
| [APIM-1 - Migrate API Management services to Premium SKU to support Availability Zones](#apim-1---migrate-api-management-services-to-premium-sku-to-support-availability-zones) | Availability | High | Preview | Yes |
| [APIM-2 - Enable Availability Zones on Premium API Management instances](#apim-2---enable-availability-zones-on-premium-api-management-instances) | Availability | High | Preview | Yes |
| [APIM-3 - Upgrade to platform version stv2](#apim-3---upgrade-to-platform-version-stv2) | Availability | High | Preview | Yes |
{{< /table >}}

{{< alert style="info" >}}
@@ -75,3 +76,28 @@ Enable zone redundancy for APIM instances. With zone redundancy, the gateway and
{{< /collapse >}}

<br><br>

### APIM-3 - Upgrade to platform version stv2

**Category: Availability**

**Impact: High**

**Guidance**

Upgrade to platform version stv2. The infrastructure associated with the API Management stv1 compute platform version will be retired effective 31 August 2024. A more current compute platform version (stv2) is already available and provides enhanced service capabilities.
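
The check performed by the APIM-3 query can also be applied to exported results, for example in a report script. This minimal sketch assumes records shaped like Resource Graph output (`name`, `properties.platformVersion`, `sku.name`) and applies the same case-insensitive "not stv2" rule as the query; the service names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple


def platform_version(service: Dict[str, Any]) -> str:
    """Return properties.platformVersion from an ARG-style APIM record, or '' if absent."""
    return (service.get("properties") or {}).get("platformVersion") or ""


def needs_stv2_upgrade(service: Dict[str, Any]) -> bool:
    """Same rule as the APIM-3 query: flag anything that is not 'stv2' (case-insensitive)."""
    return platform_version(service).lower() != "stv2"


def upgrade_candidates(services: Iterable[Dict[str, Any]]) -> List[Tuple[str, str, str]]:
    """(name, platform version, SKU) for every service the rule flags."""
    return [
        (s.get("name", "<unnamed>"), platform_version(s) or "<missing>", (s.get("sku") or {}).get("name", ""))
        for s in services
        if needs_stv2_upgrade(s)
    ]


services = [
    {"name": "apim-legacy", "properties": {"platformVersion": "stv1"}, "sku": {"name": "Developer"}},
    {"name": "apim-current", "properties": {"platformVersion": "STV2"}, "sku": {"name": "Premium"}},
]
print(upgrade_candidates(services))  # [('apim-legacy', 'stv1', 'Developer')]
```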

**Resources**

- [Azure API Management - stv1 platform retirement (August 2024)](https://learn.microsoft.com/en-us/azure/api-management/breaking-changes/stv1-platform-retirement-august-2024)
- [Azure API Management compute platform](https://learn.microsoft.com/en-us/azure/api-management/compute-infrastructure)

**Resource Graph Query/Scripts**

{{< collapse title="Show/Hide Query/Script" >}}

{{< code lang="sql" file="code/apim-3/apim-3.kql" >}} {{< /code >}}

{{< /collapse >}}

<br><br>
@@ -0,0 +1,8 @@
// Azure Resource Graph Query
// Find all API Management instances that aren't upgraded to platform version stv2
resources
| where type =~ 'Microsoft.ApiManagement/service'
| extend plat_version = tostring(properties.platformVersion)
| extend skuName = tostring(sku.name)
| where plat_version !~ 'stv2'
| project recommendationId = "apim-3", name, id, tags, param1=strcat("Platform Version: ", plat_version) , param2=strcat("SKU: ", skuName)