From a7f28162936e2036b6223fe0739ffcae75cea9e9 Mon Sep 17 00:00:00 2001
From: Nikolay
Date: Thu, 11 May 2023 18:59:14 +0200
Subject: [PATCH 1/3] cmd to run JPQD in DDP mode

---
 examples/openvino/audio-classification/README.md | 6 +++---
 examples/openvino/image-classification/README.md | 4 ++--
 examples/openvino/question-answering/README.md   | 4 ++--
 examples/openvino/text-classification/README.md  | 3 ++-
 4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/examples/openvino/audio-classification/README.md b/examples/openvino/audio-classification/README.md
index 39896a6ac..562dfa096 100644
--- a/examples/openvino/audio-classification/README.md
+++ b/examples/openvino/audio-classification/README.md
@@ -18,7 +18,7 @@ limitations under the License.
 
 This folder contains [`run_audio_classification.py`](https://github.com/huggingface/optimum/blob/main/examples/openvino/audio-classification/run_audio_classification.py), a script to fine-tune a 🤗 Transformers model on the 🗣️ [Keyword Spotting subset](https://huggingface.co/datasets/superb#ks) of the SUPERB dataset while applying Quantization-Aware Training (QAT). QAT can be easily applied by replacing the Transformers [`Trainer`](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#trainer) with the Optimum [`OVTrainer`]. Any model from our [hub](https://huggingface.co/models) can be fine-tuned and quantized, as long as the model is supported by the [`AutoModelForAudioClassification`](https://huggingface.co/docs/transformers/main/en/model_doc/auto#transformers.AutoModelForAudioClassification) API.
 
-### Fintuning Wav2Vec2 on Keyword Spotting with QAT
+### Fine-tuning Wav2Vec2 on Keyword Spotting with QAT
 
 The following command shows how to fine-tune [Wav2Vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the 🗣️ [Keyword Spotting subset](https://huggingface.co/datasets/superb#ks) of the SUPERB dataset with Quantization-Aware Training (QAT). The `OVTrainer` uses a default quantization configuration which should work in many cases, but we can also customize the algorithm details. Here, we quantize the Wav2Vec2-base model with a custom configuration file specified by `--nncf_compression_config`. For more details on the quantization configuration, see NNCF documentation [here](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Quantization.md).
@@ -60,7 +60,7 @@ On a single V100 GPU, this script should run in ~45 minutes and yield a quantize
 `OVTrainer` also provides advanced optimization workflow via NNCF to structurally prune, quantize and distill. Following is an example of joint pruning, quantization and distillation on Wav2Vec2-base model for keyword spotting task. To enable JPQD optimization, use an alternative configuration specified with `--nncf_compression_config`. For more details on how to configure the pruning algorithm, see NNCF documentation [here](https://github.com/openvinotoolkit/nncf/blob/develop/nncf/experimental/torch/sparsity/movement/MovementSparsity.md).
 
 ```bash
-python run_audio_classification.py \
+torchrun --nproc-per-node=1 run_audio_classification.py \
     --model_name_or_path facebook/wav2vec2-base \
     --teacher_model_name_or_path anton-l/wav2vec2-base-ft-keyword-spotting \
     --nncf_compression_config configs/wav2vec2-base-jpqd.json \
@@ -92,4 +92,4 @@ python run_audio_classification.py \
     --seed 0
 ```
 
-This script should take about 3 hours on a single V100 GPU and produce a quantized Wav2Vec2-base model with ~80% structured sparsity in its linear layers. The model accuracy should converge to about 97.5%.
+This script should take about 3 hours on a single V100 GPU and produce a quantized Wav2Vec2-base model with ~80% structured sparsity in its linear layers. The model accuracy should converge to about 97.5%. For launching script on multiple GPU specify `--nproc-per-node=`.
diff --git a/examples/openvino/image-classification/README.md b/examples/openvino/image-classification/README.md
index 6b42fdbe4..37e080e45 100644
--- a/examples/openvino/image-classification/README.md
+++ b/examples/openvino/image-classification/README.md
@@ -48,7 +48,7 @@ On a single V100 GPU, this example takes about 1 minute and yields a quantized m
 `OVTrainer` also provides advanced optimization workflow via NNCF to structurally prune, quantize and distill. Following is an example of joint pruning, quantization and distillation on Swin-base model for food101 dataset. To enable JPQD optimization, use an alternative configuration specified with `--nncf_compression_config`. For more details on how to configure the pruning algorithm, see NNCF documentation [here](https://github.com/openvinotoolkit/nncf/blob/develop/nncf/experimental/torch/sparsity/movement/MovementSparsity.md).
 
 ```bash
-python run_image_classification.py \
+torchrun --nproc-per-node=1 run_image_classification.py \
     --model_name_or_path microsoft/swin-base-patch4-window7-224 \
     --teacher_model_name_or_path skylord/swin-finetuned-food101 \
     --distillation_weight 0.9 \
@@ -75,4 +75,4 @@ python run_image_classification.py \
     --nncf_compression_config configs/swin-base-jpqd.json
 ```
 
-This example results in a quantized swin-base model with ~40% sparsity in its linear layers of the transformer blocks, giving 90.7% accuracy on food101 and taking about 12.5 hours on a single V100 GPU.
+This example results in a quantized swin-base model with ~40% sparsity in its linear layers of the transformer blocks, giving 90.7% accuracy on food101 and taking about 12.5 hours on a single V100 GPU. For launching script on multiple GPU specify `--nproc-per-node=`.
diff --git a/examples/openvino/question-answering/README.md b/examples/openvino/question-answering/README.md
index 24ac373c6..35a498caa 100644
--- a/examples/openvino/question-answering/README.md
+++ b/examples/openvino/question-answering/README.md
@@ -47,7 +47,7 @@ python run_qa.py \
 ```
 
 ### Joint Pruning, Quantization and Distillation (JPQD) for BERT on SQuAD1.0
-`OVTrainer` also provides an advanced optimization workflow through the NNCF when Transformer model can be structurally pruned along with 8-bit quantization and distillation. Below is an example which demonstrates how to jointly prune, quantize BERT-base for SQuAD 1.0 using NNCF config `--nncf_compression_config` and distill from BERT-large teacher. This example closely resembles the movement sparsification work of [Lagunas et al., 2021, Block Pruning For Faster Transformers](https://arxiv.org/pdf/2109.04838.pdf). This example takes about 12 hours with a single V100 GPU and ~40% of the weights of the Transformer blocks were pruned.
+`OVTrainer` also provides an advanced optimization workflow through the NNCF when Transformer model can be structurally pruned along with 8-bit quantization and distillation. Below is an example which demonstrates how to jointly prune, quantize BERT-base for SQuAD 1.0 using NNCF config `--nncf_compression_config` and distill from BERT-large teacher. This example closely resembles the movement sparsification work of [Lagunas et al., 2021, Block Pruning For Faster Transformers](https://arxiv.org/pdf/2109.04838.pdf). This example takes about 12 hours with a single V100 GPU and ~40% of the weights of the Transformer blocks were pruned. For launching script on multiple GPU specify `--nproc-per-node=`.
 
 More on how to configure movement sparsity, see NNCF documentation [here](https://github.com/openvinotoolkit/nncf/blob/develop/nncf/experimental/torch/sparsity/movement/MovementSparsity.md).
 
@@ -57,7 +57,7 @@ To run the JPQD example, please install optimum-intel from source. This command
 ```
 
 ```bash
-python run_qa.py \
+torchrun --nproc-per-node=1 run_qa.py \
     --model_name_or_path bert-base-uncased \
     --dataset_name squad \
     --teacher_model_name_or_path bert-large-uncased-whole-word-masking-finetuned-squad \
diff --git a/examples/openvino/text-classification/README.md b/examples/openvino/text-classification/README.md
index d10d8f743..47faf87e7 100644
--- a/examples/openvino/text-classification/README.md
+++ b/examples/openvino/text-classification/README.md
@@ -58,7 +58,7 @@ To run the JPQD example, please install optimum-intel from source. This command
 
 ```bash
 TASK_NAME=sst2
-python run_glue.py \
+torchrun --nproc-per-node=1 run_glue.py \
     --model_name_or_path bert-base-uncased \
     --task_name $TASK_NAME \
     --teacher_model_name_or_path yoshitomo-matsubara/bert-large-uncased-sst2 \
@@ -83,3 +83,4 @@ python run_glue.py \
 ```
 
 On a single V100 GPU, this script should run in ~1.8 hours, and yield accuracy of **92.2%** with ~40% of the weights of the Transformer blocks pruned.
+For launching script on multiple GPU specify `--nproc-per-node=`.

From 79d92a35e30a3fb4114d412b1f2b0cc728f07400 Mon Sep 17 00:00:00 2001
From: Nikolay
Date: Fri, 19 May 2023 14:45:01 +0200
Subject: [PATCH 2/3] suggestion from Alexander and note about hyperparameters tuning

---
 examples/openvino/audio-classification/README.md | 2 +-
 examples/openvino/image-classification/README.md | 2 +-
 examples/openvino/question-answering/README.md   | 5 +++--
 examples/openvino/text-classification/README.md  | 2 +-
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/examples/openvino/audio-classification/README.md b/examples/openvino/audio-classification/README.md
index 562dfa096..1a6f1ddce 100644
--- a/examples/openvino/audio-classification/README.md
+++ b/examples/openvino/audio-classification/README.md
@@ -92,4 +92,4 @@ torchrun --nproc-per-node=1 run_audio_classification.py \
     --seed 0
 ```
 
-This script should take about 3 hours on a single V100 GPU and produce a quantized Wav2Vec2-base model with ~80% structured sparsity in its linear layers. The model accuracy should converge to about 97.5%. For launching script on multiple GPU specify `--nproc-per-node=`.
+This script should take about 3 hours on a single V100 GPU and produce a quantized Wav2Vec2-base model with ~80% structured sparsity in its linear layers. The model accuracy should converge to about 97.5%. For launching the script on multiple GPUs, specify `--nproc-per-node=`. Note that a different batch size and other hyperparameters might be required to achieve the same results as on a single GPU.
diff --git a/examples/openvino/image-classification/README.md b/examples/openvino/image-classification/README.md
index 37e080e45..25d7cbc54 100644
--- a/examples/openvino/image-classification/README.md
+++ b/examples/openvino/image-classification/README.md
@@ -75,4 +75,4 @@ torchrun --nproc-per-node=1 run_image_classification.py \
     --nncf_compression_config configs/swin-base-jpqd.json
 ```
 
-This example results in a quantized swin-base model with ~40% sparsity in its linear layers of the transformer blocks, giving 90.7% accuracy on food101 and taking about 12.5 hours on a single V100 GPU. For launching script on multiple GPU specify `--nproc-per-node=`.
+This example results in a quantized swin-base model with ~40% sparsity in its linear layers of the transformer blocks, giving 90.7% accuracy on food101 and taking about 12.5 hours on a single V100 GPU. For launching the script on multiple GPUs, specify `--nproc-per-node=`. Note that a different batch size and other hyperparameters might be required to achieve the same results as on a single GPU.
diff --git a/examples/openvino/question-answering/README.md b/examples/openvino/question-answering/README.md
index 35a498caa..ae54086be 100644
--- a/examples/openvino/question-answering/README.md
+++ b/examples/openvino/question-answering/README.md
@@ -47,13 +47,14 @@ python run_qa.py \
 ```
 
 ### Joint Pruning, Quantization and Distillation (JPQD) for BERT on SQuAD1.0
-`OVTrainer` also provides an advanced optimization workflow through the NNCF when Transformer model can be structurally pruned along with 8-bit quantization and distillation. Below is an example which demonstrates how to jointly prune, quantize BERT-base for SQuAD 1.0 using NNCF config `--nncf_compression_config` and distill from BERT-large teacher. This example closely resembles the movement sparsification work of [Lagunas et al., 2021, Block Pruning For Faster Transformers](https://arxiv.org/pdf/2109.04838.pdf). This example takes about 12 hours with a single V100 GPU and ~40% of the weights of the Transformer blocks were pruned. For launching script on multiple GPU specify `--nproc-per-node=`.
+`OVTrainer` also provides an advanced optimization workflow through the NNCF when Transformer model can be structurally pruned along with 8-bit quantization and distillation. Below is an example which demonstrates how to jointly prune, quantize BERT-base for SQuAD 1.0 using NNCF config `--nncf_compression_config` and distill from BERT-large teacher. This example closely resembles the movement sparsification work of [Lagunas et al., 2021, Block Pruning For Faster Transformers](https://arxiv.org/pdf/2109.04838.pdf). This example takes about 12 hours with a single V100 GPU and ~40% of the weights of the Transformer blocks were pruned. For launching the script on multiple GPUs, specify `--nproc-per-node=`. Note that a different batch size and other hyperparameters might be required to achieve the same results as on a single GPU.
 
 More on how to configure movement sparsity, see NNCF documentation [here](https://github.com/openvinotoolkit/nncf/blob/develop/nncf/experimental/torch/sparsity/movement/MovementSparsity.md).
 
 To run the JPQD example, please install optimum-intel from source. This command will install or upgrade optimum-intel and all necessary dependencies:
 
-```python -m pip install --upgrade "git+https://github.com/huggingface/optimum-intel.git#egg=optimum-intel[openvino, nncf]"
+```python
+python -m pip install --upgrade "git+https://github.com/huggingface/optimum-intel.git#egg=optimum-intel[openvino, nncf]"
 ```
 
 ```bash
diff --git a/examples/openvino/text-classification/README.md b/examples/openvino/text-classification/README.md
index 47faf87e7..0128220c8 100644
--- a/examples/openvino/text-classification/README.md
+++ b/examples/openvino/text-classification/README.md
@@ -83,4 +83,4 @@ torchrun --nproc-per-node=1 run_glue.py \
 ```
 
 On a single V100 GPU, this script should run in ~1.8 hours, and yield accuracy of **92.2%** with ~40% of the weights of the Transformer blocks pruned.
-For launching script on multiple GPU specify `--nproc-per-node=`.
+For launching the script on multiple GPUs, specify `--nproc-per-node=`. Note that a different batch size and other hyperparameters might be required to achieve the same results as on a single GPU.

From cfca9d4fbfc3fc6750a416e472d1abf23da0b194 Mon Sep 17 00:00:00 2001
From: Lyalyushkin Nikolay
Date: Fri, 19 May 2023 15:08:45 +0200
Subject: [PATCH 3/3] removed not needed instructions

---
 examples/openvino/question-answering/README.md | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/examples/openvino/question-answering/README.md b/examples/openvino/question-answering/README.md
index ae54086be..c57d332e6 100644
--- a/examples/openvino/question-answering/README.md
+++ b/examples/openvino/question-answering/README.md
@@ -51,12 +51,6 @@ python run_qa.py \
 
 More on how to configure movement sparsity, see NNCF documentation [here](https://github.com/openvinotoolkit/nncf/blob/develop/nncf/experimental/torch/sparsity/movement/MovementSparsity.md).
 
-To run the JPQD example, please install optimum-intel from source. This command will install or upgrade optimum-intel and all necessary dependencies:
-
-```python
-python -m pip install --upgrade "git+https://github.com/huggingface/optimum-intel.git#egg=optimum-intel[openvino, nncf]"
-```
-
 ```bash
 torchrun --nproc-per-node=1 run_qa.py \
     --model_name_or_path bert-base-uncased \
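For illustration only (this is not part of the patches above), a filled-in multi-GPU launch of the audio-classification example might look like the sketch below. The GPU count of 4 is an assumed value for the reader's machine, and the remaining training arguments (dataset, batch size, output directory, and so on) are the ones from the README command the patches modify; as the added note says, hyperparameters may need retuning to match single-GPU results.

```bash
# Sketch of a DDP launch on an assumed 4-GPU machine; torchrun starts one
# worker process per GPU and the script then runs JPQD training under DDP.
# The remaining training arguments are omitted here and follow the README
# command shown in the patch; per-device batch size may need retuning
# compared to a single-GPU run.
torchrun --nproc-per-node=4 run_audio_classification.py \
    --model_name_or_path facebook/wav2vec2-base \
    --teacher_model_name_or_path anton-l/wav2vec2-base-ft-keyword-spotting \
    --nncf_compression_config configs/wav2vec2-base-jpqd.json \
    --seed 0
```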