diff --git a/examples/openvino/audio-classification/README.md b/examples/openvino/audio-classification/README.md
index 39896a6ac..1a6f1ddce 100644
--- a/examples/openvino/audio-classification/README.md
+++ b/examples/openvino/audio-classification/README.md
@@ -18,7 +18,7 @@ limitations under the License.
 
 This folder contains [`run_audio_classification.py`](https://github.com/huggingface/optimum/blob/main/examples/openvino/audio-classification/run_audio_classification.py), a script to fine-tune a 🤗 Transformers model on the 🗣️ [Keyword Spotting subset](https://huggingface.co/datasets/superb#ks) of the SUPERB dataset while applying Quantization-Aware Training (QAT). QAT can be easily applied by replacing the Transformers [`Trainer`](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#trainer) with the Optimum [`OVTrainer`]. Any model from our [hub](https://huggingface.co/models) can be fine-tuned and quantized, as long as the model is supported by the [`AutoModelForAudioClassification`](https://huggingface.co/docs/transformers/main/en/model_doc/auto#transformers.AutoModelForAudioClassification) API.
 
-### Fintuning Wav2Vec2 on Keyword Spotting with QAT
+### Fine-tuning Wav2Vec2 on Keyword Spotting with QAT
 
 The following command shows how to fine-tune [Wav2Vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the 🗣️ [Keyword Spotting subset](https://huggingface.co/datasets/superb#ks) of the SUPERB dataset with Quantization-Aware Training (QAT). The `OVTrainer` uses a default quantization configuration which should work in many cases, but we can also customize the algorithm details. Here, we quantize the Wav2Vec2-base model with a custom configuration file specified by `--nncf_compression_config`. For more details on the quantization configuration, see NNCF documentation [here](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Quantization.md).
@@ -60,7 +60,7 @@ On a single V100 GPU, this script should run in ~45 minutes and yield a quantize
 `OVTrainer` also provides advanced optimization workflow via NNCF to structurally prune, quantize and distill. Following is an example of joint pruning, quantization and distillation on Wav2Vec2-base model for keyword spotting task. To enable JPQD optimization, use an alternative configuration specified with `--nncf_compression_config`. For more details on how to configure the pruning algorithm, see NNCF documentation [here](https://github.com/openvinotoolkit/nncf/blob/develop/nncf/experimental/torch/sparsity/movement/MovementSparsity.md).
 
 ```bash
-python run_audio_classification.py \
+torchrun --nproc-per-node=1 run_audio_classification.py \
     --model_name_or_path facebook/wav2vec2-base \
     --teacher_model_name_or_path anton-l/wav2vec2-base-ft-keyword-spotting \
     --nncf_compression_config configs/wav2vec2-base-jpqd.json \
@@ -92,4 +92,4 @@ python run_audio_classification.py \
     --seed 0
 ```
 
-This script should take about 3 hours on a single V100 GPU and produce a quantized Wav2Vec2-base model with ~80% structured sparsity in its linear layers. The model accuracy should converge to about 97.5%.
+This script should take about 3 hours on a single V100 GPU and produce a quantized Wav2Vec2-base model with ~80% structured sparsity in its linear layers. The model accuracy should converge to about 97.5%. To launch the script on multiple GPUs, set `--nproc-per-node` to the number of GPUs. Note that a different batch size and other hyperparameters might be required to achieve the same results as on a single GPU.
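+
+For example, a two-GPU launch of the same JPQD run would look like the sketch below. The GPU count of 2 is an assumption for illustration; only the arguments already shown above are repeated, and the remaining ones are identical to the single-GPU command.
+
+```bash
+# Sketch: assumes a machine with 2 GPUs; batch size and other
+# hyperparameters may need retuning to match the single-GPU results.
+torchrun --nproc-per-node=2 run_audio_classification.py \
+    --model_name_or_path facebook/wav2vec2-base \
+    --teacher_model_name_or_path anton-l/wav2vec2-base-ft-keyword-spotting \
+    --nncf_compression_config configs/wav2vec2-base-jpqd.json
+    # ...plus the remaining arguments from the single-GPU command above
+```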
diff --git a/examples/openvino/image-classification/README.md b/examples/openvino/image-classification/README.md
index 6b42fdbe4..25d7cbc54 100644
--- a/examples/openvino/image-classification/README.md
+++ b/examples/openvino/image-classification/README.md
@@ -48,7 +48,7 @@ On a single V100 GPU, this example takes about 1 minute and yields a quantized m
 `OVTrainer` also provides advanced optimization workflow via NNCF to structurally prune, quantize and distill. Following is an example of joint pruning, quantization and distillation on Swin-base model for food101 dataset. To enable JPQD optimization, use an alternative configuration specified with `--nncf_compression_config`. For more details on how to configure the pruning algorithm, see NNCF documentation [here](https://github.com/openvinotoolkit/nncf/blob/develop/nncf/experimental/torch/sparsity/movement/MovementSparsity.md).
 
 ```bash
-python run_image_classification.py \
+torchrun --nproc-per-node=1 run_image_classification.py \
     --model_name_or_path microsoft/swin-base-patch4-window7-224 \
     --teacher_model_name_or_path skylord/swin-finetuned-food101 \
     --distillation_weight 0.9 \
@@ -75,4 +75,4 @@ python run_image_classification.py \
     --nncf_compression_config configs/swin-base-jpqd.json
 ```
 
-This example results in a quantized swin-base model with ~40% sparsity in its linear layers of the transformer blocks, giving 90.7% accuracy on food101 and taking about 12.5 hours on a single V100 GPU.
+This example results in a quantized swin-base model with ~40% sparsity in the linear layers of its transformer blocks, giving 90.7% accuracy on food101 and taking about 12.5 hours on a single V100 GPU. To launch the script on multiple GPUs, set `--nproc-per-node` to the number of GPUs. Note that a different batch size and other hyperparameters might be required to achieve the same results as on a single GPU.
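+
+To make the batch-size caveat concrete: with the 🤗 Trainer, the effective batch size is `per_device_train_batch_size` × number of GPUs × `gradient_accumulation_steps`. The sketch below uses hypothetical values (the actual batch size of this example is not shown in this README) to keep the effective batch size constant when moving to four GPUs.
+
+```bash
+# Hypothetical: if the single-GPU run used --per_device_train_batch_size 32,
+# then 8 per device on 4 GPUs preserves the same effective batch size of 32.
+torchrun --nproc-per-node=4 run_image_classification.py \
+    --model_name_or_path microsoft/swin-base-patch4-window7-224 \
+    --teacher_model_name_or_path skylord/swin-finetuned-food101 \
+    --distillation_weight 0.9 \
+    --per_device_train_batch_size 8
+    # ...plus the remaining arguments from the single-GPU command above
+```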
diff --git a/examples/openvino/question-answering/README.md b/examples/openvino/question-answering/README.md
index 24ac373c6..c57d332e6 100644
--- a/examples/openvino/question-answering/README.md
+++ b/examples/openvino/question-answering/README.md
@@ -47,17 +47,12 @@ python run_qa.py \
 ```
 
 ### Joint Pruning, Quantization and Distillation (JPQD) for BERT on SQuAD1.0
-`OVTrainer` also provides an advanced optimization workflow through the NNCF when Transformer model can be structurally pruned along with 8-bit quantization and distillation. Below is an example which demonstrates how to jointly prune, quantize BERT-base for SQuAD 1.0 using NNCF config `--nncf_compression_config` and distill from BERT-large teacher. This example closely resembles the movement sparsification work of [Lagunas et al., 2021, Block Pruning For Faster Transformers](https://arxiv.org/pdf/2109.04838.pdf). This example takes about 12 hours with a single V100 GPU and ~40% of the weights of the Transformer blocks were pruned.
+`OVTrainer` also provides an advanced optimization workflow through NNCF, where the Transformer model can be structurally pruned along with 8-bit quantization and distillation. Below is an example which demonstrates how to jointly prune and quantize BERT-base for SQuAD 1.0 using the NNCF config passed via `--nncf_compression_config`, while distilling from a BERT-large teacher. This example closely resembles the movement sparsification work of [Lagunas et al., 2021, Block Pruning For Faster Transformers](https://arxiv.org/pdf/2109.04838.pdf). It takes about 12 hours on a single V100 GPU and prunes ~40% of the weights of the Transformer blocks. To launch the script on multiple GPUs, set `--nproc-per-node` to the number of GPUs. Note that a different batch size and other hyperparameters might be required to achieve the same results as on a single GPU.
 More on how to configure movement sparsity, see NNCF documentation [here](https://github.com/openvinotoolkit/nncf/blob/develop/nncf/experimental/torch/sparsity/movement/MovementSparsity.md).
 
-To run the JPQD example, please install optimum-intel from source. This command will install or upgrade optimum-intel and all necessary dependencies:
-
-```
-python -m pip install --upgrade "git+https://github.com/huggingface/optimum-intel.git#egg=optimum-intel[openvino, nncf]"
-```
 
 ```bash
-python run_qa.py \
+torchrun --nproc-per-node=1 run_qa.py \
     --model_name_or_path bert-base-uncased \
     --dataset_name squad \
     --teacher_model_name_or_path bert-large-uncased-whole-word-masking-finetuned-squad \
diff --git a/examples/openvino/text-classification/README.md b/examples/openvino/text-classification/README.md
index d10d8f743..0128220c8 100644
--- a/examples/openvino/text-classification/README.md
+++ b/examples/openvino/text-classification/README.md
@@ -58,7 +58,7 @@ To run the JPQD example, please install optimum-intel from source. This command
 
 ```bash
 TASK_NAME=sst2
-python run_glue.py \
+torchrun --nproc-per-node=1 run_glue.py \
     --model_name_or_path bert-base-uncased \
     --task_name $TASK_NAME \
     --teacher_model_name_or_path yoshitomo-matsubara/bert-large-uncased-sst2 \
@@ -83,3 +83,4 @@ python run_glue.py \
 ```
 
 On a single V100 GPU, this script should run in ~1.8 hours, and yield accuracy of **92.2%** with ~40% of the weights of the Transformer blocks pruned.
+To launch the script on multiple GPUs, set `--nproc-per-node` to the number of GPUs. Note that a different batch size and other hyperparameters might be required to achieve the same results as on a single GPU.
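+
+For example, a two-GPU launch of the SST-2 run would look like the sketch below; the GPU count is an assumption, and the arguments not repeated here are the same as in the single-GPU command above.
+
+```bash
+# Sketch: assumes 2 GPUs; hyperparameters may need retuning.
+TASK_NAME=sst2
+torchrun --nproc-per-node=2 run_glue.py \
+    --model_name_or_path bert-base-uncased \
+    --task_name $TASK_NAME \
+    --teacher_model_name_or_path yoshitomo-matsubara/bert-large-uncased-sst2
+    # ...plus the remaining arguments from the single-GPU command above
+```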