Scenario - This short example showcases how Azure Machine Learning (AML) can be used to train a credit card fraud detection model in a federated learning (FL) fashion. The example uses several model architectures to demonstrate the versatility of the proposed solution on a typical use case for the finance industry. We simulate an FL scenario by splitting the data by distinct geo-locations. The sample also provides a simple implementation of preprocessing for tabular data.
Dataset - This example trains on the Kaggle dataset "Credit Card Transactions Fraud Detection Dataset". The dataset is generated by a simulation and contains both genuine and fraudulent transactions.
You'll need Python to submit experiments to AzureML. You can install the required dependencies by running:
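The geo-location split mentioned in the scenario can be illustrated with a minimal sketch. It is not the example's actual preprocessing code: the column names (`state`, `amt`, `is_fraud`, `trans_num`), the silo names and the state-to-silo mapping are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical mapping from US state codes to silo names; the real example
# may group locations differently.
REGION_OF_STATE = {
    "CA": "silo_west",
    "WA": "silo_west",
    "NY": "silo_east",
    "FL": "silo_east",
}


def split_by_region(rows, state_column="state", default_silo="silo_other"):
    """Group transaction rows (dicts) into per-silo lists based on their state."""
    silos = defaultdict(list)
    for row in rows:
        # States missing from the mapping fall back to a catch-all silo.
        silo = REGION_OF_STATE.get(row[state_column], default_silo)
        silos[silo].append(row)
    return dict(silos)


# Tiny made-up sample; the column names are assumed, not taken from the source.
transactions = [
    {"trans_num": "t1", "state": "CA", "amt": 12.5, "is_fraud": 0},
    {"trans_num": "t2", "state": "NY", "amt": 830.0, "is_fraud": 1},
    {"trans_num": "t3", "state": "WA", "amt": 41.2, "is_fraud": 0},
    {"trans_num": "t4", "state": "TX", "amt": 7.9, "is_fraud": 0},
]

silos = split_by_region(transactions)
for name in sorted(silos):
    print(name, [row["trans_num"] for row in silos[name]])
```

Each resulting list stands in for the data one silo would hold locally; in the FL scenario the rows never leave their silo.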
conda env create --file ./examples/pipelines/environment.yml
conda activate fl_experiment_conda_env
Alternatively, you can just install the required dependencies:
python -m pip install -r ./examples/pipelines/requirements.txt
To run this example, you will need to provision an AzureML workspace ready for Federated Learning. We strongly recommend you use the setup provided in the repository quickstart. We will use the same names for the computes and datastores created by default during this quickstart.
📓 take note of your workspace name, resource group and subscription id. You will need them to submit the experiment.
In the next section, we will run a job in the AzureML workspace that will unpack the demo dataset from Kaggle into each of your silos.
Kaggle requires a username and an API key, so we will first store those credentials safely in the workspace key vault.
- Let's first obtain your AAD identifier (object id) by running the following command. We'll use it in the next step.
az ad signed-in-user show | jq ".id"
- Create a new key vault policy for yourself, and grant permissions to list, set & delete secrets.
az keyvault set-policy -n <key-vault-name> --secret-permissions list set delete --object-id <object-id>
Note: The AML workspace you created with the aforementioned script contains the name of the key vault. Default is kv-fldemo.
- With your newly created permissions, you can now create a secret to store the kaggleusername.
az keyvault secret set --name kaggleusername --vault-name <key-vault-name> --value <kaggle-username>
Make sure to provide your Kaggle Username.
- Create a secret to store the kagglekey.
az keyvault secret set --name kagglekey --vault-name <key-vault-name> --value <kaggle-api-token>
Make sure to provide the Kaggle API Token.
Alternatively, you can set up the same secrets through the Azure portal:
- In your resource group (provisioned in the previous step), open the "Access Policies" tab in the newly created key vault and click "Create".
- Select List, Set & Delete under "Secret Management Operations" and press "Next".
- Look up the currently logged-in user (by user id or email), select it and press "Next".
- Press "Next" and "Create" in the next screens.
We are now able to create a secret in the key vault.
- Open the "Secrets" tab. Create two plain text secrets:
  - kaggleusername - specifies your Kaggle user name
  - kagglekey - this is the API key that can be obtained from your account page on the Kaggle website.
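To show why the secrets are stored under these two names, here is a minimal sketch of how a job could hand them to the Kaggle client, which reads its credentials from the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables. The helper `export_kaggle_credentials` and the `get_secret` callable are hypothetical; the source does not show how the provisioning pipeline actually consumes the secrets.

```python
import os


def export_kaggle_credentials(get_secret, environ=os.environ):
    """Copy the two key vault secrets into the variables the Kaggle client reads.

    `get_secret` is any callable mapping a secret name to its value. It is a
    hypothetical stand-in for a real key vault lookup.
    """
    environ["KAGGLE_USERNAME"] = get_secret("kaggleusername")
    environ["KAGGLE_KEY"] = get_secret("kagglekey")


# A plain dict plays the role of the key vault here (made-up values).
fake_vault = {"kaggleusername": "jane-doe", "kagglekey": "0123abcd"}

# Write into a scratch dict instead of the real process environment.
env = {}
export_kaggle_credentials(fake_vault.__getitem__, environ=env)
print(sorted(env))
```

Passing the lookup as a callable keeps the sketch independent of any particular SDK; a missing secret surfaces as an error at lookup time rather than as a failed download later.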
Unpacking the dataset into each silo can be done with a data provisioning pipeline. To run it, follow these steps:
- If you are not using the quickstart setup, adjust the config file config.yaml in examples/pipelines/utils/upload_data/ to match your setup.
- Submit the experiment by running:
python ./examples/pipelines/utils/upload_data/submit.py --submit --example CCFRAUD --workspace_name "<workspace-name>" --resource_group "<resource-group-name>" --subscription_id "<subscription-id>"
⭐ you can simplify this command by entering your workspace details in the file config.yaml in this same directory.
- If you are not using the quickstart setup, adjust the config file config.yaml in examples/pipelines/ccfraud/ to match your setup.
- Submit the FL experiment by running:
python ./examples/pipelines/ccfraud/submit.py --submit --workspace_name "<workspace-name>" --resource_group "<resource-group-name>" --subscription_id "<subscription-id>"
⭐ you can simplify this command by entering your workspace details in the file config.yaml in this same directory.
This sample experiment provides multiple models you can try:
- SimpleLinear : a model composed entirely of torch.nn.Linear layers with ReLU activations; takes data as-is, sample by sample
- SimpleLSTM : a model composed of 4 LSTM layers connected to a linear head with an architecture similar to SimpleLinear; takes data ordered by time in sequences that overlap each other
- SimpleVAE : a model composed of 2 encoder LSTM layers and 2 decoder LSTM layers that tries to reconstruct the consumed sequence of transactions; the latent space created by the encoder is consumed by a linear layer to perform prediction; takes data ordered by time in sequences that overlap each other
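The "data ordered by time in sequences that overlap each other" consumed by the two LSTM-based models can be pictured as a sliding window over time-sorted transactions. The sketch below is an illustration only, not the example's data loader: the function name, the `sequence_length` and `stride` parameters, and the `unix_time` column are assumptions.

```python
def overlapping_sequences(rows, sequence_length, stride=1, time_key="unix_time"):
    """Sort rows by time, then cut them into windows that overlap each other.

    With stride < sequence_length, consecutive windows share rows.
    """
    ordered = sorted(rows, key=lambda row: row[time_key])
    last_start = len(ordered) - sequence_length
    return [
        ordered[start:start + sequence_length]
        for start in range(0, last_start + 1, stride)
    ]


# Made-up transactions, deliberately out of order.
rows = [{"unix_time": t, "amt": 10.0 * t} for t in (5, 1, 3, 2, 4)]

windows = overlapping_sequences(rows, sequence_length=3)
for window in windows:
    print([row["unix_time"] for row in window])
```

With a stride of 1, each window shares all but one transaction with its neighbour, so every transaction is seen in several temporal contexts.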
To switch between models, update the config.yaml file in examples/pipelines/ccfraud/. Look for the field model_name in the training_parameters section (use SimpleLinear, SimpleLSTM, or SimpleVAE).