[WeightCompression] Statistics caching #3017

kshpv · 2024-10-15T16:50:41Z

Changes

Add statistics saving and loading for the WeightCompression algorithm:

The cached statistics cover all configurations, including awq=True and scale_estimation=True with every type of sensitivity metric.
The statistics are then dumped to a directory that can be reused for any compress_weights() configuration.

The example for tinyllama was updated with this functionality.
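The collect-once, reuse-many idea can be illustrated with a minimal, self-contained sketch that does not use NNCF. The names `collect_statistics`, `load_or_collect` and `search_configurations`, the cache file name, and the two per-channel statistics are hypothetical stand-ins. The statistics that weight compression actually collects differ per algorithm.

```python
import gzip
import pickle
from itertools import product
from pathlib import Path

# Hypothetical cache file name; the real on-disk layout may differ.
CACHE_FILE = "statistics.pkl.gz"


def collect_statistics(activations):
    """Stand-in for the expensive calibration pass over the dataset subset.

    `activations` maps a node name to a list of rows (one row per token).
    A superset of per-channel statistics is collected for every node so that
    any later configuration can pick only what it needs.
    """
    stats = {}
    for node_name, rows in activations.items():
        columns = list(zip(*rows))  # transpose rows into per-channel columns
        stats[node_name] = {
            "mean_values": [sum(c) / len(c) for c in columns],
            "max_abs": [max(abs(v) for v in c) for c in columns],
        }
    return stats


def load_or_collect(cache_dir, activations):
    """Return (statistics, loaded_from_cache); collect and dump them on a cache miss."""
    path = Path(cache_dir) / CACHE_FILE
    if path.exists():
        with gzip.open(path, "rb") as f:
            return pickle.load(f), True
    stats = collect_statistics(activations)
    path.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(path, "wb") as f:
        pickle.dump(stats, f)
    return stats, False


def search_configurations(cache_dir, activations):
    """Try a small grid of hypothetical configurations.

    Only the first one pays for statistics collection; the rest reuse the cache.
    """
    cache_hits = []
    for awq, scale_estimation in product([False, True], repeat=2):
        stats, from_cache = load_or_collect(cache_dir, activations)
        cache_hits.append(from_cache)
        # Compressing the model with (awq, scale_estimation) and `stats` would go here.
    return cache_hits


if __name__ == "__main__":
    import tempfile

    acts = {"q_proj": [[1.0, -2.0], [3.0, 0.5]]}
    with tempfile.TemporaryDirectory() as tmp:
        print(search_configurations(tmp, acts))  # [False, True, True, True]
```

In this sketch the cache is keyed only by its location. A real implementation also has to detect when cached statistics no longer match the model or dataset they are reused with.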

More changes:

- Align all statistics used in WeightCompression with the TensorStatistic classes from nncf/experimental/common/tensor_statistics/statistics.py.
- Extend StatisticsAggregator with logic for loading and saving statistics.
- Statistics are dumped using pickle and gzip. Serialization methods were added for Tensor.
- Introduce statistics_serializer and statistics_validator to handle loading statistics from files and dumping them to files.
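A serializer/validator pair of this kind could work as in the following minimal, standard-library-only sketch. Each statistic is pickled and gzip-compressed into its own file inside the statistics directory. A metadata file records the parameters the statistics depend on, so a stale cache is rejected on load. The functions `dump_statistics`, `load_statistics` and `validate_metadata`, the `metadata.json` file, and the metadata fields are hypothetical. They do not reproduce the actual statistics_serializer and statistics_validator modules.

```python
import gzip
import hashlib
import json
import pickle
from pathlib import Path

METADATA_FILE = "metadata.json"  # hypothetical file name


def _file_name(key: str) -> str:
    # Statistic keys (e.g. node names) may contain '/' or ':'; hash them into safe file names.
    return hashlib.sha256(key.encode("utf-8")).hexdigest() + ".pkl.gz"


def dump_statistics(statistics: dict, directory, metadata: dict) -> None:
    """Write one gzip-compressed pickle per statistic plus a metadata file."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    mapping = {}
    for key, value in statistics.items():
        file_name = _file_name(key)
        with gzip.open(directory / file_name, "wb") as f:
            pickle.dump(value, f)
        mapping[file_name] = key
    header = {"metadata": metadata, "mapping": mapping}
    (directory / METADATA_FILE).write_text(json.dumps(header))


def validate_metadata(expected: dict, stored: dict) -> None:
    """Reject statistics that were collected under different settings."""
    mismatched = sorted(k for k in expected if stored.get(k) != expected[k])
    if mismatched:
        raise ValueError(f"Cached statistics do not match the current setup: {mismatched}")


def load_statistics(directory, expected_metadata: dict) -> dict:
    """Validate the metadata first, then load every statistic file it lists."""
    directory = Path(directory)
    header = json.loads((directory / METADATA_FILE).read_text())
    validate_metadata(expected_metadata, header["metadata"])
    statistics = {}
    for file_name, key in header["mapping"].items():
        with gzip.open(directory / file_name, "rb") as f:
            statistics[key] = pickle.load(f)
    return statistics
```

Because unpickling can execute arbitrary code, a cache written this way should only be loaded from trusted locations.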

Statistics sizes

| Model | Subset size | Statistics directory size | Statistics collection time |
|---|---|---|---|
| tinyllama | 128 | 100 MB | 61 sec |
| Phi-3-mini-4k-instruct | 128 | 258 MB | 51 sec |
| ruDialoGPT-medium | 128 | 80 MB | 9 sec |
| llama-3.1-8b | 128 | 393 MB | 95 sec |

Reason for changes

Speed up the search for a suitable compression configuration by reusing collected statistics instead of recollecting them for every run.

Related tickets

153129

Tests

Test coverage was extended with tests for statistics_serializer, statistics_validator, StatisticsAggregator, and the WeightCompression algorithm with the proposed functionality.

nncf/experimental/common/tensor_statistics/statistics.py

tests/openvino/native/quantization/test_weights_compression_statistics_caching.py

ljaljushkin · 2024-10-25T16:45:03Z

tests/openvino/native/quantization/test_weights_compression_statistics_caching.py

+        ),
+    ]
+
+    MODEL_ID = "facebook/opt-125m"


Taking into account that gptq could be True and that lora_correction would be good to test as well, do you see any opportunities to further reduce the test time? Maybe a smaller set of ratios, e.g. [0.2, 0.4]?
What is the most time-consuming part?

tests/openvino/native/quantization/test_weights_compression_statistics_caching.py

nncf/experimental/common/tensor_statistics/collectors.py

daniil-lyakhov

TensorCollector and co LGTM

alexsu52

I marked this PR as request changes because the implementation is not optimal.

nncf/quantization/quantize_model.py

nncf/quantization/algorithms/weight_compression/algorithm.py

nncf/quantization/statistics_caching.py

alexsu52

LGTM, please address my minor comments

nncf/quantization/quantize_model.py

nncf/quantization/algorithms/weight_compression/algorithm.py

nncf/quantization/algorithms/weight_compression/backend.py

nncf/quantization/algorithms/algorithm.py

nncf/quantization/algorithms/weight_compression/mixed_precision.py

nncf/quantization/quantize_model.py

kshpv added 5 commits October 15, 2024 16:57

draft

de0c068

add gzip

8f8b4cf

add description

ac3b28d

update WCstatistics

1f026eb

fix init

ebe17c3

github-actions bot added the NNCF PT, NNCF Common, experimental, NNCF OpenVINO, NNCF PTQ, and API labels Oct 15, 2024

kshpv added 6 commits October 15, 2024 19:04

Merge remote-tracking branch 'remote/develop' into statistics_cahing

9d69dac

rollback build_statistic_container

8a358b1

typo; fix mypy

3ffa566

remain only compressed mode

24fc759

add docstrings

25fa2ee

minor

afd3ca9

kshpv changed the title ~~[WC] Statistics caching~~ [WeightCompression] Statistics caching Oct 15, 2024

MaximProshin added the Code Freeze label Oct 16, 2024

rm WeightQuantizationErrorTensorStatistic

ecb82a8

ljaljushkin reviewed Oct 16, 2024

nncf/experimental/common/tensor_statistics/statistics.py (outdated)

kshpv added 5 commits October 16, 2024 15:00

add tests

19646c1

improve code

86dc4da

add __eq__ for statistics

d3903a4

polishing

1e86ddd

dump objects

43161e9

github-actions bot added the NNCF ONNX label Oct 16, 2024

kshpv added 2 commits October 16, 2024 20:14

_get_statistics_key abstarctmethod

ac6bef9

build_statistic_container -> from_kwargs

ca7317d

kshpv marked this pull request as ready for review October 16, 2024 18:29

add onnx and torch_fx tests on aggregator

931febd

ljaljushkin reviewed Oct 25, 2024

kshpv added 3 commits October 28, 2024 10:32

add get_matmul_nodes()

32462f8

add lora to test scope

05e7643

test refactor

bc73ac8

kshpv requested review from MaximProshin, ljaljushkin and daniil-lyakhov October 28, 2024 10:39

minor

d27e41f

ljaljushkin approved these changes Oct 28, 2024

daniil-lyakhov reviewed Oct 28, 2024

nncf/experimental/common/tensor_statistics/collectors.py (outdated)

daniil-lyakhov reviewed Oct 28, 2024

nncf/experimental/common/tensor_statistics/collectors.py (outdated)

nncf/experimental/common/tensor_statistics/collectors.py (outdated)

comments

c94ee95

daniil-lyakhov approved these changes Oct 28, 2024

alexsu52 requested changes Oct 29, 2024

kshpv added 3 commits October 29, 2024 15:02

rm gptq; optimize NNCFGraph creation;

4080132

introduce get_compression_nodes_info for WC

8799874

comments

787b365

kshpv requested a review from alexsu52 October 29, 2024 15:39

rollback changes in apply (torch has no support layer attributes)

dd926b5

MaximProshin approved these changes Oct 30, 2024

alexsu52 reviewed Oct 30, 2024

nncf/quantization/quantize_model.py (outdated)

nncf/quantization/algorithms/weight_compression/algorithm.py (outdated)

nncf/quantization/algorithms/weight_compression/backend.py (outdated)

kshpv added 4 commits October 30, 2024 14:02

available_backends -> get_available_backends

fd6be94

Merge remote-tracking branch 'remote/develop' into statistics_cahing

35157bf

typo

26bab53

rollback backend check

6ba8ac3

alexsu52 reviewed Oct 31, 2024

nncf/quantization/algorithms/algorithm.py (outdated)

nncf/quantization/algorithms/weight_compression/mixed_precision.py (outdated)

nncf/quantization/quantize_model.py (outdated)

comments

1cf2055

alexsu52 approved these changes Oct 31, 2024

alexsu52 merged commit 4ddb5da into openvinotoolkit:develop Oct 31, 2024
14 checks passed
