In this tutorial, you will:
- Provision a fully functional environment in your own Azure subscription
- Run a sample federated learning pipeline in Azure ML
To follow this quickstart, you will need to:
- have an active Azure subscription that you can use for development purposes,
- have permissions to create resources, set permissions, and create identities in this subscription (or at least in one resource group),
  - Note that to set permissions, you typically need the Owner role in the subscription or resource group; the Contributor role is not enough. This is key for being able to secure the setup.
- install the Azure CLI.
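Before starting, it can help to confirm the required command-line tools are on your PATH. Here is a minimal Python check (the tool names are taken from the steps in this quickstart; this helper is not part of the repo):

```python
import shutil


def check_prerequisite_clis(tools=("az", "conda", "python")) -> dict:
    """Return a mapping of tool name -> whether it is available on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


print(check_prerequisite_clis())
```

`az` is the Azure CLI; `conda` is only needed later, for the experiment submission section.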
Click the button below that matches your goal. It will open a page in the Azure Portal to deploy the resources in your subscription.
Notes:
- If someone already provisioned a demo with the same name in your subscription, change the "Demo Base Name" parameter to a unique value.
- If you need to provision GPUs instead of CPUs, just use a GPU SKU value for the "Compute SKU" parameter, for instance `Standard_NC12s_v3`. An overview of the GPU SKUs available in Azure can be found here. Beware though: SKU availability may vary depending on the region you choose, so you may have to use different Azure regions instead of the default ones.
In this section, we will use Bicep scripts to automatically provision a minimal set of resources for an FL sandbox demo.
This will help you provision a federated learning setup with internal silos, i.e. silos that are in the same Azure tenant as the orchestrator. You will be able to use this setup to run the examples in the `./examples/pipelines` directory.
In this setup, the communications between the silos and the orchestrator are secure, and the silos will not have any access to the other silos' data.
We will provision:
- 1 Azure ML workspace
- 1 CPU cluster and 1 blob storage account for the orchestrator
- 3 internal silos in 3 different regions (`westus`, `francecentral`, `brazilsouth`) with their respective compute cluster and storage account
- 4 user-assigned identities (1 for orchestration, 1 for each silo) to restrict access to the silos' storage accounts
- Using the `az` CLI, log into your Azure subscription:

  ```bash
  az login
  az account set --name <subscription name>
  ```
- Optional: create a new resource group for the demo resources. Having a dedicated group makes it easier to delete the resources afterwards (deleting this resource group will delete all resources within).

  ```bash
  # create a resource group for the resources
  az group create --name <resource group name> --location <region>
  ```

  Notes:
  - If you have the Owner role only in a given resource group (as opposed to in the whole subscription), just use that resource group instead of creating a new one.
- Run the Bicep deployment script in a resource group you own:

  ```bash
  # deploy the demo resources in your resource group
  az deployment group create --template-file ./mlops/bicep/open_sandbox_setup.bicep --resource-group <resource group name> --parameters demoBaseName="fldemo"
  ```

  Notes:
  - If someone already provisioned a demo with the same name in your subscription, change the `demoBaseName` parameter to a unique value.
  - ⚠️ IMPORTANT ⚠️ This setup is intended only for demo purposes. The data is still accessible to the users of your subscription when opening the storage accounts, and data exfiltration is possible.
  - ⚠️ EXPERIMENTAL ⚠️ Alternatively, you can try provisioning a sandbox where the silo storages are kept eyes-off behind a private service endpoint, accessible only by the silo compute through a vnet. To try it out, use the template file `mlops/bicep/vnet_publicip_sandbox_setup.bicep` instead. All the code samples below remain the same. Please check the header of that Bicep file to understand its capabilities and limitations.
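Since the demo base name must be unique within your subscription, you could generate one programmatically. A small helper sketch (hypothetical, not part of this repo), kept lowercase alphanumeric because the base name is reused in resource names with strict naming rules, such as storage accounts:

```python
import random
import string


def unique_demo_base_name(prefix: str = "fldemo", length: int = 12) -> str:
    """Append a random lowercase alphanumeric suffix to make the name unique.

    Kept lowercase alphanumeric because the demo base name is reused in
    resource names (e.g. storage accounts) with strict naming rules.
    """
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(random.choices(alphabet, k=max(0, length - len(prefix))))
    return (prefix + suffix)[:length]
```

You would then pass the result to `--parameters demoBaseName="..."` in the deployment command above.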
In this section, we'll use a sample Python script to submit a federated learning experiment to Azure ML. The script first needs to connect to your newly created Azure ML workspace.
- Create a conda environment with all the Python dependencies, then activate it.

  ```bash
  conda env create --file ./examples/pipelines/environment.yml
  conda activate fl_experiment_conda_env
  ```

  Alternatively, you can install the dependencies directly:

  ```bash
  python -m pip install -r ./examples/pipelines/requirements.txt
  ```
- To connect to your newly created Azure ML workspace, you'll need to provide the following information to the sample Python script as CLI arguments:

  ```bash
  python ./examples/pipelines/fl_cross_silo_literal/submit.py --subscription_id <subscription_id> --resource_group <resource_group> --workspace_name <workspace_name> --example MNIST --submit
  ```

  Note: you can also create a `config.json` file at the root of this repo to provide the above information. Follow the instructions on how to get this file from the Azure ML documentation.

  ```json
  {
      "subscription_id": "<subscription-id>",
      "resource_group": "<resource-group>",
      "workspace_name": "<workspace-name>"
  }
  ```

  Note: the `config.json` file is in our `.gitignore` to avoid pushing it to git.
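The two ways of passing workspace information can be sketched in plain stdlib Python. This is an illustrative sketch, not the actual logic of `submit.py`: read `config.json` if it exists, otherwise fall back to placeholders you would supply as CLI arguments:

```python
import json
from pathlib import Path


def load_workspace_info(config_path: str = "config.json") -> dict:
    """Read subscription/resource group/workspace from config.json if present.

    Mirrors the keys shown above; the fallback values are placeholders you
    would normally pass as CLI arguments instead.
    """
    path = Path(config_path)
    if path.exists():
        return json.loads(path.read_text())
    return {
        "subscription_id": "<subscription-id>",
        "resource_group": "<resource-group>",
        "workspace_name": "<workspace-name>",
    }
```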
The script will submit the experiment to Azure ML. It should open a direct link to the experiment in the Azure ML UI.
If not, the script will print the URL in plain text:

```
Submitting the pipeline job to your AzureML workspace...
Uploading preprocessing (0.01 MBs): 100%|#######################################| 7282/7282 [00:00<00:00, 23820.31it/s]
Uploading traininsilo (0.01 MBs): 100%|#########################################| 9953/9953 [00:00<00:00, 32014.81it/s]
Uploading aggregatemodelweights (0.01 MBs): 100%|###############################| 5514/5514 [00:00<00:00, 14065.83it/s]

The url to see your live job running is returned by the sdk:
https://ml.azure.com/runs/.....
```
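If you are capturing the script's output (for instance in CI), the run URL can be pulled out of the log text with a simple regex. A hypothetical helper, not part of the repo:

```python
import re
from typing import Optional


def extract_studio_url(log_text: str) -> Optional[str]:
    """Return the first Azure ML studio run URL found in the log text, if any."""
    match = re.search(r"https://ml\.azure\.com/runs/\S+", log_text)
    return match.group(0) if match else None
```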
Go to the above URL; your pipeline should look similar to this:
If you want to look at the pipeline metrics, go to the "Job overview" (top-right corner) and then click on "Metrics (preview)". The following screenshot shows what that looks like.
You can also create your own custom graph by clicking the "Create custom chart" icon. Here is a sample custom chart showing the "Training Loss" of multiple silos in one graph.