[WeightCompression] Statistics caching #3017

Merged
merged 101 commits into from
Oct 31, 2024

Collaborator

@kshpv kshpv commented Oct 15, 2024

Changes

Add statistics saving and loading for the WeightCompression algorithm:

  1. Statistics are collected and cached for all configurations, e.g. awq=True and scale_estimation=True, with all types of sensitivity metrics.
  2. The statistics are then dumped to a directory, which can be reused for any weights_compression() configuration.
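
The two steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not NNCF code: `StatisticsCache`, `collect_all`, and the statistic names are hypothetical, and the cache is kept in memory here, whereas the PR writes the statistics to a directory.

```python
from typing import Callable, Dict, Iterable, List, Optional

Statistics = Dict[str, List[float]]


class StatisticsCache:
    """Collects the full set of statistics once and serves subsets on demand."""

    def __init__(self, collect_all: Callable[[], Statistics]) -> None:
        self._collect_all = collect_all
        self._statistics: Optional[Statistics] = None
        self.collections = 0  # how many times the expensive pass actually ran

    def get(self, names: Iterable[str]) -> Statistics:
        # Run the expensive collection only on the first request.
        if self._statistics is None:
            self._statistics = self._collect_all()
            self.collections += 1
        return {name: self._statistics[name] for name in names}


def collect_all() -> Statistics:
    # Hypothetical stand-in for a pass over the calibration subset.
    return {"mean": [0.1, 0.2], "variance": [1.0, 2.0], "max": [3.0, 4.0]}


cache = StatisticsCache(collect_all)
first_config = cache.get(["mean"])               # triggers the collection
second_config = cache.get(["mean", "variance"])  # served from the cache
```

Because the superset is collected up front, trying a second configuration costs only a lookup, which is the point of caching statistics for all configurations at once.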

The example for tinyllama was updated with this functionality.

More changes:

  1. Align all statistics used in WeightCompression with TensorStatistics from nncf/experimental/common/tensor_statistics/statistics.py.
  2. Extend StatisticsAggregator with logic for loading and saving statistics.
  3. Statistics are dumped using pickle and gzip. Serialization methods were added for Tensor.
  4. Introduce statistics_serializer and statistics_validator to handle loading statistics from, and dumping them to, a file.
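
The pickle-plus-gzip dumping and the validation step from the list above can be illustrated with the standard library alone. This is a sketch, not the PR's implementation: the function names, the `metadata.pkl.gz` file name, the one-file-per-statistic layout, and the backend check are all assumptions made for the example.

```python
import gzip
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict

METADATA_FILE = "metadata.pkl.gz"  # hypothetical file name


def dump_statistics(statistics: Dict[str, Any], directory: Path, backend: str) -> None:
    """Write each statistic to its own gzip-compressed pickle file."""
    directory.mkdir(parents=True, exist_ok=True)
    for name, value in statistics.items():
        # Assumes statistic names are safe to use as file names.
        with gzip.open(directory / f"{name}.pkl.gz", "wb") as f:
            pickle.dump(value, f)
    with gzip.open(directory / METADATA_FILE, "wb") as f:
        pickle.dump({"backend": backend, "names": sorted(statistics)}, f)


def load_statistics(directory: Path, backend: str) -> Dict[str, Any]:
    """Load statistics, refusing a dump produced for a different backend."""
    with gzip.open(directory / METADATA_FILE, "rb") as f:
        metadata = pickle.load(f)
    if metadata["backend"] != backend:
        raise ValueError(
            f"Statistics were dumped for {metadata['backend']!r}, not {backend!r}"
        )
    loaded = {}
    for name in metadata["names"]:
        with gzip.open(directory / f"{name}.pkl.gz", "rb") as f:
            loaded[name] = pickle.load(f)
    return loaded


# Round trip through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    stats = {"layer_0": [0.5, 1.5], "layer_1": [2.0]}
    dump_statistics(stats, Path(tmp), backend="openvino")
    restored = load_statistics(Path(tmp), backend="openvino")
```

Note that `pickle.load` can execute arbitrary code, so a dump of this kind should only be loaded from a trusted location.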

Statistics sizes

| Model | Subset size | Statistics directory size | Statistics collection time |
|---|---|---|---|
| tinyllama | 128 | 100 MB | 61 sec |
| Phi-3-mini-4k-instruct | 128 | 258 MB | 51 sec |
| ruDialoGPT-medium | 128 | 80 MB | 9 sec |
| llama-3.1-8b | 128 | 393 MB | 95 sec |

Reason for changes

Speed up the search for a suitable compression configuration.

Related tickets

153129

Tests

Test coverage was extended with tests for statistics_serializer, statistics_validator, StatisticsAggregator, and the WeightCompression algorithm with the proposed functionality.

@github-actions github-actions bot added the NNCF PT, NNCF Common, experimental, NNCF OpenVINO, NNCF PTQ, and API labels Oct 15, 2024
@kshpv kshpv changed the title from "[WC] Statistics caching" to "[WeightCompression] Statistics caching" Oct 15, 2024
@github-actions github-actions bot added the NNCF ONNX label Oct 16, 2024
@kshpv kshpv marked this pull request as ready for review October 16, 2024 18:29
),
]

MODEL_ID = "facebook/opt-125m"
Contributor

Given that gptq could also be enabled, and that lora_correction would be good to test as well, do you see any opportunity to further reduce the time? Maybe smaller ratios, e.g. [0.2, 0.4]?
What is the most time-consuming part?

Collaborator

@daniil-lyakhov daniil-lyakhov left a comment

TensorCollector and co LGTM

Contributor

@alexsu52 alexsu52 left a comment

I marked this PR as "Request changes" because the implementation is not optimal.

nncf/quantization/quantize_model.py (outdated, resolved)
nncf/quantization/statistics_caching.py (outdated, resolved)
nncf/quantization/statistics_caching.py (outdated, resolved)
nncf/quantization/statistics_caching.py (outdated, resolved)
nncf/quantization/statistics_caching.py (outdated, resolved)
@kshpv kshpv requested a review from alexsu52 October 29, 2024 15:39
Contributor

@alexsu52 alexsu52 left a comment

LGTM, please address my minor comments

nncf/quantization/quantize_model.py (outdated, resolved)
nncf/quantization/algorithms/weight_compression/backend.py (outdated, resolved)
@alexsu52 alexsu52 merged commit 4ddb5da into openvinotoolkit:develop Oct 31, 2024
14 checks passed
Labels: API, Code Freeze, experimental, NNCF Common, NNCF ONNX, NNCF OpenVINO, NNCF PT, NNCF PTQ
7 participants