[WeightCompression] Statistics caching #3017
tests/openvino/native/quantization/test_weights_compression_statistics_caching.py
    ),
]

MODEL_ID = "facebook/opt-125m"
Given that `gptq` could also be enabled, and that `lora_correction` would be good to test as well, do you see any opportunities to further reduce the test time? Maybe use fewer ratios, e.g. [0.2, 0.4]?
What is the most time-consuming part?
TensorCollector and co LGTM
I marked this PR as "Request changes" because the implementation is not optimal.
LGTM, please address my minor comments
Changes
Add statistics saving and loading for the `WeightCompression` algorithm:
- `awq=True`, `scale_estimation=True` with all types of sensitivities.
- `weights_compression()` configuration.

The example for tinyllama was updated with this functionality.
More changes:
- `WeightCompression` aligned with `TensorStatistics` from `nncf/experimental/common/tensor_statistics/statistics.py`.
- `StatisticsAggregator` extended by the logic of loading and saving statistics.
- `Tensor`.
- `statistics_serializer` and `statistics_validator` to handle loading statistics from and dumping them to a file.

Statistics sizes
Reason for changes
Speed up compression configuration finding.
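To illustrate why caching statistics speeds up the search, here is a minimal sketch, not the code from this PR. It assumes the expensive step is a single pass over calibration data and that the statistics can be stored as JSON. All names (`collect_statistics`, `load_or_collect`, `sweep`, the `mean_abs` field) are hypothetical and only stand in for the real collection and compression steps.

```python
import json
from pathlib import Path

# Counts how often the (simulated) expensive collection runs.
CALLS = {"collect": 0}


def collect_statistics(layer_names):
    """Hypothetical stand-in for the expensive pass over calibration data."""
    CALLS["collect"] += 1
    return {name: {"mean_abs": 0.1 * (i + 1)} for i, name in enumerate(layer_names)}


def load_or_collect(cache_file, layer_names):
    """Load statistics from the cache file, or collect and dump them once."""
    cache_file = Path(cache_file)
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    stats = collect_statistics(layer_names)
    cache_file.write_text(json.dumps(stats))
    return stats


def sweep(cache_file, layer_names, ratios):
    """Try several ratios; every ratio reuses the same cached statistics."""
    results = {}
    for ratio in ratios:
        stats = load_or_collect(cache_file, layer_names)
        # Placeholder for compression: rank layers by the cached statistic
        # and select the top `ratio` fraction of them.
        ranked = sorted(stats, key=lambda n: stats[n]["mean_abs"], reverse=True)
        results[ratio] = ranked[: round(ratio * len(ranked))]
    return results


if __name__ == "__main__":
    import tempfile

    cache = Path(tempfile.mkdtemp()) / "stats.json"
    print(sweep(cache, ["l0", "l1", "l2", "l3", "l4"], [0.2, 0.4]))
    print("collection runs:", CALLS["collect"])
```

In this sketch the collection runs once regardless of how many ratios are tried, which is the effect the PR aims for when searching for a compression configuration.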
Related tickets
153129
Tests
Test coverage was extended with tests for `statistics_serializer`, `statistics_validator`, `StatisticsAggregator`, and the `WeightCompression` algorithm with the proposed functionality.
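A round-trip check of the kind such tests might perform can be sketched as follows. This is an illustration under assumptions, not the PR's serializer or validator: the file names, the JSON format, and the functions `dump_statistics`, `validate_cache`, and `load_statistics` are all hypothetical, and the only validation shown is a backend match stored in a metadata file.

```python
import json
from pathlib import Path

# Hypothetical file names; the real on-disk layout may differ.
METADATA_FILE = "metadata.json"
STATISTICS_FILE = "statistics.json"


def dump_statistics(statistics, directory, backend):
    """Write statistics plus a small metadata file describing their origin."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    (directory / STATISTICS_FILE).write_text(json.dumps(statistics))
    (directory / METADATA_FILE).write_text(json.dumps({"backend": backend}))


def validate_cache(directory, backend):
    """Reject a cache that is missing metadata or was made for another backend."""
    meta_path = Path(directory) / METADATA_FILE
    if not meta_path.exists():
        raise FileNotFoundError(f"No metadata found in {directory}")
    metadata = json.loads(meta_path.read_text())
    if metadata.get("backend") != backend:
        raise ValueError(
            f"Cache was created for backend {metadata.get('backend')!r}, "
            f"but {backend!r} was requested"
        )


def load_statistics(directory, backend):
    """Validate the cache first, then read the statistics back."""
    validate_cache(directory, backend)
    return json.loads((Path(directory) / STATISTICS_FILE).read_text())


if __name__ == "__main__":
    import tempfile

    cache_dir = tempfile.mkdtemp()
    dump_statistics({"layer.0": {"max_abs": 1.5}}, cache_dir, backend="openvino")
    print(load_statistics(cache_dir, backend="openvino"))
```

Keeping validation separate from loading means a stale or mismatched cache fails early with a clear error instead of silently feeding wrong statistics into compression.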