Accelerate GPT2 fine-tuning with ONNX Runtime Training

This example uses ONNX Runtime Training to fine-tune the GPT2 PyTorch model maintained at https://github.com/huggingface/transformers.

You can run the training in Azure Machine Learning or in other environments.

Setup

  1. Clone this repo

    git clone https://github.com/microsoft/onnxruntime-training-examples.git
    cd onnxruntime-training-examples/huggingface-gpt2
  2. Clone the code from the HuggingFace transformers repo and check out the tested commit

    git clone https://github.com/huggingface/transformers.git
    cd transformers/
    git checkout 9a0a8c1c6f4f2f0c80ff07d36713a3ada785eec5
  3. Update with ORT changes

    git apply ../ort_addon/src_changes.patch
    cp -r ../ort_addon/ort_supplement/* ./
    cd ..
  4. Build the Docker image

    This installs the dependencies of the transformers examples and the modified transformers package into the base ORT Docker image. A quick sanity check of the resulting image is sketched after this list.

    docker build --network=host -f docker/Dockerfile . --rm --pull -t onnxruntime-gpt
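
As a quick sanity check, you can confirm that the image exists and that the key Python packages import inside it. This is a minimal sketch, assuming the Dockerfile installs torch, the onnxruntime training wheel, and the modified transformers package into the image's default Python environment.

    # Sanity check: list the freshly built image and import the key packages.
    # Package availability is an assumption based on what docker/Dockerfile installs.
    docker images onnxruntime-gpt
    docker run --rm onnxruntime-gpt python -c "import torch, onnxruntime, transformers"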

Download and prepare data

The following is a minimal set of instructions for downloading one of the datasets used for GPT2 fine-tuning on the language modeling task.

Download the word-level dataset WikiText-103 for this sample; refer to the readme at transformers for additional details. An example download sequence is sketched after the list below.

Download the data and export path as $DATA_DIR:

    export DATA_DIR=/path/to/downloaded/data/
  • TRAIN_FILE: $DATA_DIR/wiki.train.tokens
  • TEST_FILE: $DATA_DIR/wiki.test.tokens
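
For illustration, the sketch below downloads and extracts WikiText-103 and exports the expected paths. The archive URL and extracted file layout are assumptions based on the public WikiText-103 release and may have moved; substitute your own mirror if needed.

    # Sketch: fetch the word-level WikiText-103 archive and export the paths.
    # The download URL is an assumption based on the original WikiText-103 release.
    wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip
    unzip wikitext-103-v1.zip
    export DATA_DIR=$PWD/wikitext-103
    export TRAIN_FILE=$DATA_DIR/wiki.train.tokens
    export TEST_FILE=$DATA_DIR/wiki.test.tokens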

GPT2 Language Modeling fine-tuning with ONNX Runtime Training in Azure Machine Learning

  1. Data Transfer

    • Transfer training data to Azure blob storage

    To transfer the data to Azure blob storage using the Azure CLI, run:

    az storage blob upload-batch --account-name <storage-name> -d <container-name> -s $DATA_DIR

    You can also use azcopy or Azure Storage Explorer to copy the data; an azcopy sketch is shown after this list. We recommend downloading the data in the training environment itself, or in an environment from which transfer to the training environment will be fast and efficient.

    • Register the blob container as a data store
    • Mount the data store in the compute targets used for training

    Please refer to the storage guidance for details on using an Azure storage account for training in Azure Machine Learning.

  2. Prepare the docker image for AML

    Follow the instructions in setup to build a docker image with the required dependencies installed.

  3. Execute fine-tuning

    The GPT2 fine-tuning job in Azure Machine Learning can be launched using either of these environments:

    • Azure Machine Learning Compute Instance to run the Jupyter notebook.
    • Azure Machine Learning SDK

    You will need a GPU-optimized compute target (NCv3 or NDv2 series) to execute this fine-tuning job.

    Execute the steps in the Python notebook azureml-notebooks/run-finetuning.ipynb. If you have a local setup that can run Azure ML notebooks, run the notebook there; otherwise, create a compute instance in Azure Machine Learning and use it to run the steps.
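
For reference, a transfer with azcopy (v10) might look like the sketch below; the storage account name, container name, and SAS token are placeholders to substitute with your own values.

    # Sketch: recursively copy the prepared dataset to blob storage with azcopy v10.
    # <storage-name>, <container-name>, and <sas-token> are placeholders.
    azcopy copy "$DATA_DIR" \
        "https://<storage-name>.blob.core.windows.net/<container-name>?<sas-token>" \
        --recursive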

GPT2 Language Modeling fine-tuning with ONNX Runtime Training in other environments

We recommend running this sample on a system with at least one NVIDIA GPU.

  1. Check pre-requisites

    • CUDA 10.1
    • Docker
  2. Build the docker image

    Follow the instructions in setup to build a docker image with the required dependencies installed.

    The base Docker image used is mcr.microsoft.com/azureml/onnxruntime-training. This image is tested in the AzureML environment; to run the examples in other environments, you may need to build a new base Docker image by following the directions in the nvidia-bert sample.

    To build and install the onnxruntime wheel on the host machine, follow the steps here.

  3. Set the correct path to the training data for the Docker image

    Edit docker/launch.sh.

    ...
    DATA_DIR=<replace-with-path-to-training-data>
    ...

    The directory must contain the training and validation files.

  4. Set the number of GPUs

    Edit transformers/scripts/run_lm_gpt2.sh.

    num_gpus=4
  5. Modify other training parameters as needed

    Edit transformers/scripts/run_lm_gpt2.sh.

        --model_type=gpt2
        --model_name_or_path=gpt2
        --tokenizer_name=gpt2  
        --config_name=gpt2  
        --per_gpu_train_batch_size=1  
        --per_gpu_eval_batch_size=4  
        --gradient_accumulation_steps=16
        --block_size=1024  
        --weight_decay=0.01
        --logging_steps=100
        --num_train_epochs=5

    Consult the huggingface transformers training_args for additional details.

  6. Launch interactive container

    bash docker/launch.sh
  7. Launch the fine-tuning run

    bash /workspace/transformers/scripts/run_lm_gpt2.sh

    If you get memory errors, try reducing the batch size; you can find the recommended batch sizes for ORT here. If the flags enabling evaluation and the evaluation data file are passed, training is followed by evaluation and the perplexity is printed (example flags are sketched below).
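
For illustration, evaluation could be enabled by adding flags such as the ones below to transformers/scripts/run_lm_gpt2.sh. The flag names follow the stock HuggingFace language-modeling example at the pinned commit and are an assumption for the modified script; verify them against run_lm_gpt2.sh before use.

    # Sketch: arguments that enable evaluation after training (flag names assumed
    # from the stock HuggingFace language-modeling example at the pinned commit).
        --do_eval \
        --eval_data_file=$DATA_DIR/wiki.test.tokens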