This example uses ONNX Runtime Training to fine-tune the GPT2 PyTorch model maintained at https://github.com/huggingface/transformers.
You can run the training in Azure Machine Learning or in other environments.
- Clone this repo
git clone https://github.com/microsoft/onnxruntime-training-examples.git
cd onnxruntime-training-examples/huggingface-gpt2
- Clone the fine-tuning code and model from the HuggingFace transformers repo
git clone https://github.com/huggingface/transformers.git
cd transformers/
git checkout 9a0a8c1c6f4f2f0c80ff07d36713a3ada785eec5
- Update with ORT changes
git apply ../ort_addon/src_changes.patch
cp -r ../ort_addon/ort_supplement/* ./
cd ..
- Build the Docker image
Install the dependencies of the transformers examples and the modified transformers package into the base ORT Docker image:
docker build --network=host -f docker/Dockerfile . --rm --pull -t onnxruntime-gpt
The following is a minimal set of instructions to download one of the datasets used for GPT2 fine-tuning on the language modeling task.
Download the word-level WikiText-103 dataset for this sample. Refer to the README in the transformers examples for additional details.
Download the data and export its path as $DATA_DIR:
export DATA_DIR=/path/to/downloaded/data/
- TRAIN_FILE: $DATA_DIR/wiki.train.tokens
- TEST_FILE: $DATA_DIR/wiki.test.tokens
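If you need to fetch the dataset first, a minimal download sketch follows. The archive URL and the extracted directory name are assumptions based on where the word-level WikiText-103 archive has historically been hosted, so verify them before use.
# Assumed location of the word-level WikiText-103 archive; verify before use
wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip
unzip wikitext-103-v1.zip
# The archive is expected to unpack into a wikitext-103 directory containing
# wiki.train.tokens, wiki.valid.tokens, and wiki.test.tokens
export DATA_DIR=$PWD/wikitext-103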
- Data Transfer
- Transfer training data to Azure blob storage
To transfer the data to Azure blob storage using the Azure CLI, run:
az storage blob upload-batch --account-name <storage-name> -d <container-name> -s $DATA_DIR
You can also use azcopy (see the sketch after this list) or Azure Storage Explorer to copy the data. We recommend downloading the data in the training environment itself, or in an environment from which data transfer to the training environment will be fast and efficient.
- Register the blob container as a data store
- Mount the data store in the compute targets used for training
Please refer to the storage guidance for details on using an Azure storage account for training in Azure Machine Learning.
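As an alternative to the Azure CLI upload command above, a rough azcopy sketch is shown below; the container URL and SAS token are placeholders for your own storage account.
# Recursively upload the local dataset directory to the blob container;
# <storage-name>, <container-name>, and <sas-token> are placeholders for your account details
azcopy copy "$DATA_DIR" "https://<storage-name>.blob.core.windows.net/<container-name>?<sas-token>" --recursive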
- Prepare the docker image for AML
- Follow the instructions in the setup section above to build a Docker image with the required dependencies installed.
- Push the image to a container registry. You can find additional details about tagging the image and pushing it to an Azure Container Registry in the Azure Container Registry documentation.
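Assuming the target is an Azure Container Registry, the push typically looks like the sketch below; <registry-name> is a placeholder and the tag is only an example.
# Authenticate Docker against the registry
az acr login --name <registry-name>
# Re-tag the locally built image with the registry login server, then push it
docker tag onnxruntime-gpt <registry-name>.azurecr.io/onnxruntime-gpt:latest
docker push <registry-name>.azurecr.io/onnxruntime-gpt:latest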
- Execute fine-tuning
The GPT2 fine-tuning job in Azure Machine Learning can be launched using either of these environments:
- Azure Machine Learning Compute Instance to run the Jupyter notebook.
- Azure Machine Learning SDK
You will need a GPU-optimized compute target, either the NCv3 or NDv2 series, to execute this fine-tuning job.
Execute the steps in the Python notebook azureml-notebooks/run-finetuning.ipynb within your environment. If you have a local setup that can run an Azure ML notebook, you can run the steps there. Otherwise, create a compute instance in Azure Machine Learning and use it to run the steps.
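If a suitable compute target does not exist yet, one way to create it from the command line is sketched below. This assumes the legacy azure-cli-ml extension is installed and that the NCv3 size shown is available in your region; the cluster name is a placeholder.
# Create an AML compute cluster backed by NCv3 GPUs
# (add -w <workspace> -g <resource-group> if no default workspace is attached)
az ml computetarget create amlcompute \
  --name gpu-cluster \
  --vm-size Standard_NC6s_v3 \
  --min-nodes 0 \
  --max-nodes 2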
We recommend running this sample on a system with at least one NVIDIA GPU.
- Check pre-requisites
- CUDA 10.1
- Docker
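A quick way to confirm the prerequisites, assuming the NVIDIA driver and the Docker CLI are already installed:
# Shows the installed NVIDIA driver and the highest CUDA version it supports
nvidia-smi
# Confirms Docker is installed and the daemon is reachable
docker --version
docker info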
- Build the docker image
Follow the instructions in the setup section above to build a Docker image with the required dependencies installed.
The base Docker image used is mcr.microsoft.com/azureml/onnxruntime-training. The Docker image is tested in the AzureML environment. To run the examples in other environments, you may need to build a new base Docker image by following the directions in the nvidia-bert sample. To build and install the onnxruntime wheel on the host machine, follow the steps here.
- Set correct paths to training data for the docker image
Edit docker/launch.sh:
...
DATA_DIR=<replace-with-path-to-training-data>
...
The directory must contain the training and validation files.
- Set the number of GPUs
Edit transformers/scripts/run_lm_gpt2.sh:
num_gpus=4
- Modify other training parameters as needed
Edit transformers/scripts/run_lm_gpt2.sh:
--model_type=gpt2 --model_name_or_path=gpt2 --tokenizer_name=gpt2 --config_name=gpt2 --per_gpu_train_batch_size=1 --per_gpu_eval_batch_size=4 --gradient_accumulation_steps=16 --block_size=1024 --weight_decay=0.01 --logging_steps=100 --num_train_epochs=5
Consult the huggingface transformers training_args for additional details.
- Launch interactive container
bash docker/launch.sh
- Launch the fine-tuning run
bash /workspace/transformers/scripts/run_lm_gpt2.sh
If you get memory errors, try reducing the batch size. You can find the recommended batch sizes for ORT here. If the flags enabling evaluation and the evaluation data file are passed, the training is followed by evaluation and the perplexity is printed.
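For reference, the evaluation-related flags used by the transformers language-modeling example of this vintage are sketched below; confirm the exact names against the script checked out above, since they have changed across transformers versions.
# Append to the python command in transformers/scripts/run_lm_gpt2.sh to run evaluation after training
# (flag names assumed from the run_language_modeling example; verify against your checkout)
--do_eval \
--eval_data_file=$DATA_DIR/wiki.test.tokens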