
[Torch FX] Post Quantize Weights Compression #2984

Conversation

@anzr299 (Contributor) commented Sep 24, 2024

### Changes

A transformation that removes fake quantize nodes and saves all weights to disk in int8 format after quantization. It works as follows:

  1. Pattern-match the quantize-dequantize (QDQ) node pairs.
  2. Filter the matches to keep only quantize-dequantize ops with a constant input.
  3. Reshape the scale if the QDQ operation is per-channel.
  4. Replace the matched subgraph with a multiplication of the int8 weight and the scale.
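The steps above can be sketched roughly as follows. This is an illustrative sketch only: NumPy stands in for the torch.fx graph machinery, and the function names are hypothetical, not the actual NNCF implementation.

```python
import numpy as np

def compress_weight(weight: np.ndarray, scale: np.ndarray, axis: int = 0):
    """Quantize a float weight to int8 and return (int8_weight, broadcastable_scale).

    In the per-channel case `scale` holds one value per output channel and is
    reshaped so it broadcasts along `axis` during decompression (step 3).
    """
    if scale.ndim > 0 and scale.size > 1:  # per-channel: reshape for broadcasting
        shape = [1] * weight.ndim
        shape[axis] = -1
        scale = scale.reshape(shape)
    q = np.clip(np.round(weight / scale), -128, 127).astype(np.int8)
    return q, scale

def decompress_weight(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # The quantize-dequantize pair is replaced by this single multiplication (step 4).
    return q.astype(np.float32) * scale

# Toy per-channel example: one scale per output channel (row).
w = np.array([[0.5, -1.0], [2.0, 4.0]], dtype=np.float32)
s = np.array([0.01, 0.05], dtype=np.float32)
q, s_r = compress_weight(w, s, axis=0)
w_dec = decompress_weight(q, s_r)
```

The weight is stored in int8; at inference time only the multiplication by the scale remains in the graph.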

### Reason for changes

To compress the model after quantization.

### Tests

Added `test_post_quantization_compression()` in `tests/torch/fx/test_model_transformer.py`, which checks the data type of all weights in the model after applying quantization and verifies the values after the decompression step (the element-wise multiplication operation).
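As a rough illustration of what such a test verifies (NumPy stand-ins for a compressed model's constants; values are made up, not taken from the real test):

```python
import numpy as np

# Stand-ins for a compressed weight and its scale after the transformation.
int8_weight = np.array([[50, -100], [40, 80]], dtype=np.int8)
scale = np.array([[0.01], [0.05]], dtype=np.float32)
original = np.array([[0.5, -1.0], [2.0, 4.0]], dtype=np.float32)

# 1) Every stored weight must be int8 after the transformation.
assert int8_weight.dtype == np.int8

# 2) Decompression (element-wise multiplication) must recover the
#    original values within quantization tolerance.
decompressed = int8_weight.astype(np.float32) * scale
assert np.allclose(decompressed, original)
```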

### Tickets

#2766

@github-actions github-actions bot added NNCF PT Pull requests that updates NNCF PyTorch experimental labels Sep 24, 2024
@daniil-lyakhov (Collaborator) left a comment:
LGTM

@daniil-lyakhov (Collaborator) left a comment:

Minor

@alexsu52 (Contributor) left a comment:

LGTM

@alexsu52 alexsu52 merged commit 7c94b23 into openvinotoolkit:develop Oct 21, 2024
14 checks passed
alexsu52 pushed a commit that referenced this pull request Oct 30, 2024
### Changes

* ~~Constant folding is applied to all TorchFX models before quantization~~
* Some torchvision models (swin_v2_s, vit_16_b) are exported by
`torch.export.export` before the OV conversion
* MOC transformations are applied to OpenVINO compressed models after
the compression

After #2984:
* Fixed `_compress_qdq_constant_transformation` for per tensor case
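The per-tensor fix comes down to the distinction sketched below; NumPy stands in for torch, and `reshape_scale` is an illustrative helper, not the actual `_compress_qdq_constant_transformation` code:

```python
import numpy as np

def reshape_scale(scale: np.ndarray, weight_ndim: int, axis: int = 0) -> np.ndarray:
    """Per-channel scales must be reshaped to broadcast over the weight;
    a per-tensor scale is a single scalar and needs no reshaping."""
    if scale.size == 1:        # per-tensor case: leave the scalar as-is
        return scale
    shape = [1] * weight_ndim  # per-channel case: align with the channel axis
    shape[axis] = -1
    return scale.reshape(shape)

per_tensor = reshape_scale(np.array(0.02, dtype=np.float32), weight_ndim=2)
per_channel = reshape_scale(np.array([0.01, 0.05], dtype=np.float32), weight_ndim=2)
```

Treating the per-tensor scalar like a per-channel array (and reshaping it) is the kind of mismatch such a fix guards against.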

### Reason for changes

* To align TorchFX/OV quantized models

### Related tickets

#2766

### Tests

The post_training_quantization/504/ build finished successfully.
Labels: Code Freeze, experimental, NNCF PT