
Corrected docs to run JPQD in DDP mode #315

Merged: 3 commits, Jun 7, 2023
6 changes: 3 additions & 3 deletions examples/openvino/audio-classification/README.md
@@ -18,7 +18,7 @@ limitations under the License.

This folder contains [`run_audio_classification.py`](https://github.com/huggingface/optimum/blob/main/examples/openvino/audio-classification/run_audio_classification.py), a script to fine-tune a 🤗 Transformers model on the 🗣️ [Keyword Spotting subset](https://huggingface.co/datasets/superb#ks) of the SUPERB dataset while applying Quantization-Aware Training (QAT). QAT can be easily applied by replacing the Transformers [`Trainer`](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#trainer) with the Optimum [`OVTrainer`]. Any model from our [hub](https://huggingface.co/models) can be fine-tuned and quantized, as long as the model is supported by the [`AutoModelForAudioClassification`](https://huggingface.co/docs/transformers/main/en/model_doc/auto#transformers.AutoModelForAudioClassification) API.

### Fintuning Wav2Vec2 on Keyword Spotting with QAT
### Fine-tuning Wav2Vec2 on Keyword Spotting with QAT

The following command shows how to fine-tune [Wav2Vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the 🗣️ [Keyword Spotting subset](https://huggingface.co/datasets/superb#ks) of the SUPERB dataset with Quantization-Aware Training (QAT). The `OVTrainer` uses a default quantization configuration which should work in many cases, but we can also customize the algorithm details. Here, we quantize the Wav2Vec2-base model with a custom configuration file specified by `--nncf_compression_config`. For more details on the quantization configuration, see NNCF documentation [here](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Quantization.md).

@@ -60,7 +60,7 @@ On a single V100 GPU, this script should run in ~45 minutes and yield a quantize
`OVTrainer` also provides advanced optimization workflow via NNCF to structurally prune, quantize and distill. Following is an example of joint pruning, quantization and distillation on Wav2Vec2-base model for keyword spotting task. To enable JPQD optimization, use an alternative configuration specified with `--nncf_compression_config`. For more details on how to configure the pruning algorithm, see NNCF documentation [here](https://github.com/openvinotoolkit/nncf/blob/develop/nncf/experimental/torch/sparsity/movement/MovementSparsity.md).

```bash
python run_audio_classification.py \
torchrun --nproc-per-node=1 run_audio_classification.py \
--model_name_or_path facebook/wav2vec2-base \
--teacher_model_name_or_path anton-l/wav2vec2-base-ft-keyword-spotting \
--nncf_compression_config configs/wav2vec2-base-jpqd.json \
@@ -92,4 +92,4 @@ python run_audio_classification.py \
--seed 0
```

This script should take about 3 hours on a single V100 GPU and produce a quantized Wav2Vec2-base model with ~80% structured sparsity in its linear layers. The model accuracy should converge to about 97.5%.
This script should take about 3 hours on a single V100 GPU and produce a quantized Wav2Vec2-base model with ~80% structured sparsity in its linear layers. The model accuracy should converge to about 97.5%. To launch the script on multiple GPUs, specify `--nproc-per-node=<number of GPUs>`. Note that a different batch size and other hyperparameters may be required to achieve the same results as on a single GPU.
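For illustration, a multi-GPU DDP launch of the same JPQD run could look like the sketch below; the GPU count of 2 is an assumption, and the remaining dataset and training arguments from the single-GPU command above are omitted for brevity.

```bash
# Minimal 2-GPU DDP sketch (GPU count assumed); append the remaining
# dataset/training arguments from the single-GPU command above.
# The per-device batch size may need reducing to keep the effective batch size comparable.
torchrun --nproc-per-node=2 run_audio_classification.py \
    --model_name_or_path facebook/wav2vec2-base \
    --teacher_model_name_or_path anton-l/wav2vec2-base-ft-keyword-spotting \
    --nncf_compression_config configs/wav2vec2-base-jpqd.json \
    --seed 0
```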
Contributor Author


@vuiseng9 @yujiepan-work @AlexKoff88 mentioned the necessity of hyperparameter tuning in the case of multiple GPUs.

4 changes: 2 additions & 2 deletions examples/openvino/image-classification/README.md
@@ -48,7 +48,7 @@ On a single V100 GPU, this example takes about 1 minute and yields a quantized m
`OVTrainer` also provides advanced optimization workflow via NNCF to structurally prune, quantize and distill. Following is an example of joint pruning, quantization and distillation on Swin-base model for food101 dataset. To enable JPQD optimization, use an alternative configuration specified with `--nncf_compression_config`. For more details on how to configure the pruning algorithm, see NNCF documentation [here](https://github.com/openvinotoolkit/nncf/blob/develop/nncf/experimental/torch/sparsity/movement/MovementSparsity.md).

```bash
python run_image_classification.py \
torchrun --nproc-per-node=1 run_image_classification.py \
--model_name_or_path microsoft/swin-base-patch4-window7-224 \
--teacher_model_name_or_path skylord/swin-finetuned-food101 \
--distillation_weight 0.9 \
@@ -75,4 +75,4 @@ python run_image_classification.py \
--nncf_compression_config configs/swin-base-jpqd.json
```

This example results in a quantized swin-base model with ~40% sparsity in its linear layers of the transformer blocks, giving 90.7% accuracy on food101 and taking about 12.5 hours on a single V100 GPU.
This example results in a quantized swin-base model with ~40% sparsity in its linear layers of the transformer blocks, giving 90.7% accuracy on food101 and taking about 12.5 hours on a single V100 GPU. To launch the script on multiple GPUs, specify `--nproc-per-node=<number of GPUs>`. Note that a different batch size and other hyperparameters may be required to achieve the same results as on a single GPU.
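One hyperparameter that typically needs revisiting is the batch size: under DDP the effective batch size is the per-device batch size multiplied by the number of processes (and by any gradient accumulation). The sketch below is illustrative only; the process count and batch-size value are placeholders rather than the README's defaults, and the remaining dataset and training arguments are omitted.

```bash
# Sketch: keep the effective batch size roughly constant when scaling out.
# effective batch = nproc-per-node * per_device_train_batch_size * gradient_accumulation_steps
# The values below are placeholders; append the remaining arguments from the command above.
torchrun --nproc-per-node=4 run_image_classification.py \
    --model_name_or_path microsoft/swin-base-patch4-window7-224 \
    --teacher_model_name_or_path skylord/swin-finetuned-food101 \
    --distillation_weight 0.9 \
    --per_device_train_batch_size 16 \
    --nncf_compression_config configs/swin-base-jpqd.json
```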
9 changes: 2 additions & 7 deletions examples/openvino/question-answering/README.md
@@ -47,17 +47,12 @@ python run_qa.py \
```

### Joint Pruning, Quantization and Distillation (JPQD) for BERT on SQuAD1.0
`OVTrainer` also provides an advanced optimization workflow through the NNCF when Transformer model can be structurally pruned along with 8-bit quantization and distillation. Below is an example which demonstrates how to jointly prune, quantize BERT-base for SQuAD 1.0 using NNCF config `--nncf_compression_config` and distill from BERT-large teacher. This example closely resembles the movement sparsification work of [Lagunas et al., 2021, Block Pruning For Faster Transformers](https://arxiv.org/pdf/2109.04838.pdf). This example takes about 12 hours with a single V100 GPU and ~40% of the weights of the Transformer blocks were pruned.
`OVTrainer` also provides an advanced optimization workflow through the NNCF when Transformer model can be structurally pruned along with 8-bit quantization and distillation. Below is an example which demonstrates how to jointly prune, quantize BERT-base for SQuAD 1.0 using NNCF config `--nncf_compression_config` and distill from BERT-large teacher. This example closely resembles the movement sparsification work of [Lagunas et al., 2021, Block Pruning For Faster Transformers](https://arxiv.org/pdf/2109.04838.pdf). This example takes about 12 hours with a single V100 GPU and ~40% of the weights of the Transformer blocks were pruned. To launch the script on multiple GPUs, specify `--nproc-per-node=<number of GPUs>`. Note that a different batch size and other hyperparameters may be required to achieve the same results as on a single GPU.

More on how to configure movement sparsity, see NNCF documentation [here](https://github.com/openvinotoolkit/nncf/blob/develop/nncf/experimental/torch/sparsity/movement/MovementSparsity.md).

To run the JPQD example, please install optimum-intel from source. This command will install or upgrade optimum-intel and all necessary dependencies:

```
python -m pip install --upgrade "git+https://github.com/huggingface/optimum-intel.git#egg=optimum-intel[openvino, nncf]"
```

```bash
python run_qa.py \
torchrun --nproc-per-node=1 run_qa.py \
--model_name_or_path bert-base-uncased \
--dataset_name squad \
--teacher_model_name_or_path bert-large-uncased-whole-word-masking-finetuned-squad \
3 changes: 2 additions & 1 deletion examples/openvino/text-classification/README.md
@@ -58,7 +58,7 @@ To run the JPQD example, please install optimum-intel from source. This command

```bash
TASK_NAME=sst2
python run_glue.py \
torchrun --nproc-per-node=1 run_glue.py \
--model_name_or_path bert-base-uncased \
--task_name $TASK_NAME \
--teacher_model_name_or_path yoshitomo-matsubara/bert-large-uncased-sst2 \
@@ -83,3 +83,4 @@ python run_glue.py \
```

On a single V100 GPU, this script should run in ~1.8 hours, and yield accuracy of **92.2%** with ~40% of the weights of the Transformer blocks pruned.
To launch the script on multiple GPUs, specify `--nproc-per-node=<number of GPUs>`. Note that a different batch size and other hyperparameters may be required to achieve the same results as on a single GPU.
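On a machine with more GPUs than you want to use, one common option (not specific to this script) is to pin the run to particular devices with `CUDA_VISIBLE_DEVICES`. The sketch below assumes two GPUs with placeholder device IDs and omits the remaining task and training arguments from the command above.

```bash
# Sketch: restrict the DDP run to two specific GPUs (device IDs are placeholders);
# append the remaining task/training arguments from the command above.
TASK_NAME=sst2
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc-per-node=2 run_glue.py \
    --model_name_or_path bert-base-uncased \
    --task_name $TASK_NAME \
    --teacher_model_name_or_path yoshitomo-matsubara/bert-large-uncased-sst2
```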