
Corrected docs to run JPQD in DDP mode #315

Merged: 3 commits, Jun 7, 2023
6 changes: 3 additions & 3 deletions examples/openvino/audio-classification/README.md
@@ -18,7 +18,7 @@ limitations under the License.

This folder contains [`run_audio_classification.py`](https://github.com/huggingface/optimum/blob/main/examples/openvino/audio-classification/run_audio_classification.py), a script to fine-tune a 🤗 Transformers model on the 🗣️ [Keyword Spotting subset](https://huggingface.co/datasets/superb#ks) of the SUPERB dataset while applying Quantization-Aware Training (QAT). QAT can be easily applied by replacing the Transformers [`Trainer`](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#trainer) with the Optimum [`OVTrainer`]. Any model from our [hub](https://huggingface.co/models) can be fine-tuned and quantized, as long as the model is supported by the [`AutoModelForAudioClassification`](https://huggingface.co/docs/transformers/main/en/model_doc/auto#transformers.AutoModelForAudioClassification) API.

### Fintuning Wav2Vec2 on Keyword Spotting with QAT
### Fine-tuning Wav2Vec2 on Keyword Spotting with QAT

The following command shows how to fine-tune [Wav2Vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the 🗣️ [Keyword Spotting subset](https://huggingface.co/datasets/superb#ks) of the SUPERB dataset with Quantization-Aware Training (QAT). The `OVTrainer` uses a default quantization configuration which should work in many cases, but we can also customize the algorithm details. Here, we quantize the Wav2Vec2-base model with a custom configuration file specified by `--nncf_compression_config`. For more details on the quantization configuration, see NNCF documentation [here](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Quantization.md).

@@ -60,7 +60,7 @@ On a single V100 GPU, this script should run in ~45 minutes and yield a quantize
`OVTrainer` also provides advanced optimization workflow via NNCF to structurally prune, quantize and distill. Following is an example of joint pruning, quantization and distillation on Wav2Vec2-base model for keyword spotting task. To enable JPQD optimization, use an alternative configuration specified with `--nncf_compression_config`. For more details on how to configure the pruning algorithm, see NNCF documentation [here](https://github.com/openvinotoolkit/nncf/blob/develop/nncf/experimental/torch/sparsity/movement/MovementSparsity.md).

```bash
python run_audio_classification.py \
torchrun --nproc-per-node=1 run_audio_classification.py \
--model_name_or_path facebook/wav2vec2-base \
--teacher_model_name_or_path anton-l/wav2vec2-base-ft-keyword-spotting \
--nncf_compression_config configs/wav2vec2-base-jpqd.json \
@@ -92,4 +92,4 @@ python run_audio_classification.py \
--seed 0
```

This script should take about 3 hours on a single V100 GPU and produce a quantized Wav2Vec2-base model with ~80% structured sparsity in its linear layers. The model accuracy should converge to about 97.5%.
This script should take about 3 hours on a single V100 GPU and produce a quantized Wav2Vec2-base model with ~80% structured sparsity in its linear layers. The model accuracy should converge to about 97.5%. To launch the script on multiple GPUs, specify `--nproc-per-node=<number of GPUs>`. Note that a different batch size and other hyperparameters may be required to achieve the same results as on a single GPU.
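For illustration, a multi-GPU DDP launch of the same JPQD run could look like the sketch below; the GPU count of 2 is an assumption, and the remaining dataset and training arguments from the single-GPU command above are omitted for brevity.

```bash
# Minimal 2-GPU DDP sketch (GPU count assumed); append the remaining
# dataset/training arguments from the single-GPU command above.
# The per-device batch size may need reducing to keep the effective batch size comparable.
torchrun --nproc-per-node=2 run_audio_classification.py \
    --model_name_or_path facebook/wav2vec2-base \
    --teacher_model_name_or_path anton-l/wav2vec2-base-ft-keyword-spotting \
    --nncf_compression_config configs/wav2vec2-base-jpqd.json \
    --seed 0
```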
Contributor Author


@vuiseng9 @yujiepan-work @AlexKoff88 mentioned the necessity of hyperparameter tuning in the case of multiple GPUs.

4 changes: 2 additions & 2 deletions examples/openvino/image-classification/README.md
@@ -48,7 +48,7 @@ On a single V100 GPU, this example takes about 1 minute and yields a quantized m
`OVTrainer` also provides advanced optimization workflow via NNCF to structurally prune, quantize and distill. Following is an example of joint pruning, quantization and distillation on Swin-base model for food101 dataset. To enable JPQD optimization, use an alternative configuration specified with `--nncf_compression_config`. For more details on how to configure the pruning algorithm, see NNCF documentation [here](https://github.com/openvinotoolkit/nncf/blob/develop/nncf/experimental/torch/sparsity/movement/MovementSparsity.md).

```bash
python run_image_classification.py \
torchrun --nproc-per-node=1 run_image_classification.py \
--model_name_or_path microsoft/swin-base-patch4-window7-224 \
--teacher_model_name_or_path skylord/swin-finetuned-food101 \
--distillation_weight 0.9 \
@@ -75,4 +75,4 @@ python run_image_classification.py \
--nncf_compression_config configs/swin-base-jpqd.json
```

This example results in a quantized swin-base model with ~40% sparsity in its linear layers of the transformer blocks, giving 90.7% accuracy on food101 and taking about 12.5 hours on a single V100 GPU.
This example results in a quantized swin-base model with ~40% sparsity in its linear layers of the transformer blocks, giving 90.7% accuracy on food101 and taking about 12.5 hours on a single V100 GPU. To launch the script on multiple GPUs, specify `--nproc-per-node=<number of GPUs>`. Note that a different batch size and other hyperparameters may be required to achieve the same results as on a single GPU.
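One hyperparameter that typically needs revisiting is the batch size: under DDP the effective batch size is the per-device batch size multiplied by the number of processes (and by any gradient accumulation). The sketch below is illustrative only; the process count and batch-size value are placeholders rather than the README's defaults, and the remaining dataset and training arguments are omitted.

```bash
# Sketch: keep the effective batch size roughly constant when scaling out.
# effective batch = nproc-per-node * per_device_train_batch_size * gradient_accumulation_steps
# The values below are placeholders; append the remaining arguments from the command above.
torchrun --nproc-per-node=4 run_image_classification.py \
    --model_name_or_path microsoft/swin-base-patch4-window7-224 \
    --teacher_model_name_or_path skylord/swin-finetuned-food101 \
    --distillation_weight 0.9 \
    --per_device_train_batch_size 16 \
    --nncf_compression_config configs/swin-base-jpqd.json
```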
9 changes: 2 additions & 7 deletions examples/openvino/question-answering/README.md
@@ -47,17 +47,12 @@ python run_qa.py \
```

### Joint Pruning, Quantization and Distillation (JPQD) for BERT on SQuAD1.0
`OVTrainer` also provides an advanced optimization workflow through the NNCF when Transformer model can be structurally pruned along with 8-bit quantization and distillation. Below is an example which demonstrates how to jointly prune, quantize BERT-base for SQuAD 1.0 using NNCF config `--nncf_compression_config` and distill from BERT-large teacher. This example closely resembles the movement sparsification work of [Lagunas et al., 2021, Block Pruning For Faster Transformers](https://arxiv.org/pdf/2109.04838.pdf). This example takes about 12 hours with a single V100 GPU and ~40% of the weights of the Transformer blocks were pruned.
`OVTrainer` also provides an advanced optimization workflow through the NNCF when Transformer model can be structurally pruned along with 8-bit quantization and distillation. Below is an example which demonstrates how to jointly prune, quantize BERT-base for SQuAD 1.0 using NNCF config `--nncf_compression_config` and distill from BERT-large teacher. This example closely resembles the movement sparsification work of [Lagunas et al., 2021, Block Pruning For Faster Transformers](https://arxiv.org/pdf/2109.04838.pdf). This example takes about 12 hours with a single V100 GPU and ~40% of the weights of the Transformer blocks were pruned. To launch the script on multiple GPUs, specify `--nproc-per-node=<number of GPUs>`. Note that a different batch size and other hyperparameters may be required to achieve the same results as on a single GPU.

More on how to configure movement sparsity, see NNCF documentation [here](https://github.com/openvinotoolkit/nncf/blob/develop/nncf/experimental/torch/sparsity/movement/MovementSparsity.md).

To run the JPQD example, please install optimum-intel from source. This command will install or upgrade optimum-intel and all necessary dependencies:

```
python -m pip install --upgrade "git+https://github.com/huggingface/optimum-intel.git#egg=optimum-intel[openvino, nncf]"
```

```bash
python run_qa.py \
torchrun --nproc-per-node=1 run_qa.py \
--model_name_or_path bert-base-uncased \
--dataset_name squad \
--teacher_model_name_or_path bert-large-uncased-whole-word-masking-finetuned-squad \
3 changes: 2 additions & 1 deletion examples/openvino/text-classification/README.md
@@ -58,7 +58,7 @@ To run the JPQD example, please install optimum-intel from source. This command

```bash
TASK_NAME=sst2
python run_glue.py \
torchrun --nproc-per-node=1 run_glue.py \
--model_name_or_path bert-base-uncased \
--task_name $TASK_NAME \
--teacher_model_name_or_path yoshitomo-matsubara/bert-large-uncased-sst2 \
@@ -83,3 +83,4 @@ python run_glue.py \
```

On a single V100 GPU, this script should run in ~1.8 hours, and yield accuracy of **92.2%** with ~40% of the weights of the Transformer blocks pruned.
To launch the script on multiple GPUs, specify `--nproc-per-node=<number of GPUs>`. Note that a different batch size and other hyperparameters may be required to achieve the same results as on a single GPU.
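On a machine with more GPUs than you want to use, one common option (not specific to this script) is to pin the run to particular devices with `CUDA_VISIBLE_DEVICES`. The sketch below assumes two GPUs with placeholder device IDs and omits the remaining task and training arguments from the command above.

```bash
# Sketch: restrict the DDP run to two specific GPUs (device IDs are placeholders);
# append the remaining task/training arguments from the command above.
TASK_NAME=sst2
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc-per-node=2 run_glue.py \
    --model_name_or_path bert-base-uncased \
    --task_name $TASK_NAME \
    --teacher_model_name_or_path yoshitomo-matsubara/bert-large-uncased-sst2
```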