
Computation of compression parameters via OpenVINO models #2727

Open
wants to merge 77 commits into base: develop
Conversation

@nikita-savelyevv (Collaborator) commented Jun 11, 2024

Changes

  • Implemented OpenVINO model graphs which are used for calculating compressed and decompressed weights. Since these models are compiled, calculation becomes significantly faster, especially for larger models and int4 compression.
  • This functionality is exposed via two methods in weight_lowering.py:
    • do_int_quantization() is used for computing a compressed weight. Possible signatures:
      • weight -> compressed_weight, scale, (zero_point for asymmetric compression)
      • weight, scale, (zero_point) -> compressed_weight, scale, (zero_point)
    • calculate_quantized_dequantized_weight() is used for computing a decompressed weight. Possible signatures:
      • weight -> decompressed_weight
      • weight, scale, (zero_point) -> decompressed_weight
      • weight -> decompressed_weight, compressed_weight, scale, (zero_point)
      • weight, scale, (zero_point) -> decompressed_weight, compressed_weight, scale, (zero_point)
    • Output scale and zero_point are the same as the ones given as input (if they were given at all).
    • Computation is done via OV models only if the openvino package is installed and the input tensors are not torch tensors.
  • Introduced a new NNCF Tensor backend for storing instances of openvino.Tensor. The implementation of this backend is limited to the required functionality; e.g., addition of OV Tensors is not supported because it is not needed.
    • Introduction of OV Tensors is required for seamless handling of tensors in bf16, u4 and i4 data types. For example, bf16 constants are read from an OpenVINO LLM and given as inputs to a compressing OpenVINO model. u4 and i4 compressed weights are seamlessly inserted into the resulting compressed OpenVINO model.
    • Added a tensor.to_backend() method to convert an NNCF Tensor from one backend to another. Currently only OV<->NumPy conversion is required.
  • All calculations are aligned with the reference numpy implementation. Some performance and memory sacrifices had to be made to achieve this alignment.

Data-free asymmetric compression: [image]

Data-free symmetric compression: [image]

Data-aware compression: [image]
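The tensor.to_backend() conversion mentioned in the changes above could be sketched as a thin dispatch wrapper. The class and attribute names here are illustrative assumptions rather than NNCF's actual classes; the only real API relied on is that `openvino.Tensor` can be constructed from a numpy array and exposes its contents via `.data`.

```python
import numpy as np

class TensorBackend:
    NUMPY = "numpy"
    OV = "openvino"

class Tensor:
    """Minimal sketch of a backend-aware tensor wrapper with to_backend()."""

    def __init__(self, data, backend=TensorBackend.NUMPY):
        self._data = data
        self._backend = backend

    @property
    def backend(self):
        return self._backend

    def to_backend(self, backend):
        if backend == self._backend:
            return self  # no conversion needed
        if backend == TensorBackend.OV:
            import openvino as ov  # imported lazily: OV is an optional dependency
            return Tensor(ov.Tensor(np.asarray(self._data)), TensorBackend.OV)
        if backend == TensorBackend.NUMPY:
            # openvino.Tensor exposes its contents as a numpy view via .data
            return Tensor(np.array(self._data.data), TensorBackend.NUMPY)
        raise ValueError(f"Unsupported backend: {backend}")
```

Keeping the OV import inside `to_backend()` matches the constraint above that OV-based computation is used only when the openvino package is installed.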

Reason for changes

Reducing model compression time. Only the OpenVINO model compression backend is affected.

Related tickets

139047

Tests

  • tests/openvino/native/quantization/test_ov_modeling_compression.py::test_quantization_alignment -- checks alignment with the reference numpy implementation
  • tests/openvino/native/test_openvino_modeling.py -- checks OV modeling framework hyperparameters
  • tests/openvino/native/test_tensor.py -- NNCF OV Tensor backend tests

Validation jobs:

@github-actions github-actions bot added NNCF Common Pull request that updates NNCF Common NNCF OpenVINO Pull requests that updates NNCF OpenVINO NNCF PTQ Pull requests that updates NNCF PTQ labels Jun 11, 2024
@nikita-savelyevv nikita-savelyevv force-pushed the compress-via-openvino branch 4 times, most recently from 55cafaa to a68a63d Compare July 3, 2024 18:31
@nikita-savelyevv nikita-savelyevv force-pushed the compress-via-openvino branch 4 times, most recently from 6b98ddd to 3d9faa4 Compare July 16, 2024 14:19
@nikita-savelyevv nikita-savelyevv force-pushed the compress-via-openvino branch 6 times, most recently from 1c85732 to b527cac Compare September 6, 2024 11:11
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Sep 6, 2024
@nikita-savelyevv nikita-savelyevv force-pushed the compress-via-openvino branch 2 times, most recently from ac3ea02 to 2a3a63c Compare September 11, 2024 12:59
@nikita-savelyevv nikita-savelyevv force-pushed the compress-via-openvino branch 2 times, most recently from fe30c13 to 19ea412 Compare October 21, 2024 08:52
@nikita-savelyevv nikita-savelyevv force-pushed the compress-via-openvino branch 3 times, most recently from eef34f8 to ca3447c Compare October 26, 2024 13:40
@nikita-savelyevv nikita-savelyevv changed the title Generalize weight compression via OpenVINO submodels Computation of compression parameters via OpenVINO models Dec 12, 2024
@nikita-savelyevv nikita-savelyevv marked this pull request as ready for review December 13, 2024 10:57
@nikita-savelyevv nikita-savelyevv requested a review from a team as a code owner December 13, 2024 10:57