[Experimental][TorchFX] quantize_pt2e + X86Quantizer introduction #3121
base: develop
Conversation
    activations_range_estimator_params: Optional[RangeEstimatorParameters] = None,
    weights_range_estimator_params: Optional[RangeEstimatorParameters] = None,
    batchwise_statistics: bool = False,
    fold_quantize: bool = False,
As far as I understand, the fold_quantize argument controls whether the quantized weights are converted to int8 or not, am I right? What is the scenario for using fold_quantize=True?
fold_quantize=True is the default of convert_pt2e (https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/quantize_pt2e.py#L208). It applies a constant-folding transformation to the final model, which folds the quantize nodes and leaves only dequantize nodes (https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/quantize_pt2e.py#L247-L248).
This is not equivalent to the compress_weights parameter, since compress_weights actually replaces the QDQ pair with a mul and a sub.
The scenario is usage of quantize_pt2e with any non-OpenVINOQuantizer (all benchmarks with X86InductorQuantizer were performed with fold_quantize=True).
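For context, a minimal sketch of the upstream pt2e flow where fold_quantize appears (assuming a recent PyTorch; model and example_inputs are placeholders for an eval-mode nn.Module and a tuple of sample inputs):

```python
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.x86_inductor_quantizer import (
    X86InductorQuantizer,
    get_default_x86_inductor_quantization_config,
)

# Capture the model as an FX graph via torch.export.
exported = torch.export.export(model, example_inputs).module()

quantizer = X86InductorQuantizer()
quantizer.set_global(get_default_x86_inductor_quantization_config())

prepared = prepare_pt2e(exported, quantizer)
prepared(*example_inputs)  # calibration pass

# fold_quantize=True (the convert_pt2e default) constant-folds the quantize ops on weights,
# leaving only dequantize nodes in front of the consuming ops.
quantized = convert_pt2e(prepared, fold_quantize=True)
```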
I believe that NNCF's behavior should be aligned across all quantizers, i.e., OpenVINOQuantizer and non-OpenVINO quantizers. Could you share your opinion on how to reach this?
P.S. As far as I know, NNCF is able to convert its custom FakeQuantize layers to the upstream layers. Maybe this can be used to align the model representation.
As stated in the torch.compile OpenVINO documentation, the OpenVINO backend supports export_pt2e quantization only with fold_quantize=True. I believe we need to ask the torch.compile OpenVINO team.
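For reference, this is roughly how the quantized model would then be handed to the torch.compile OpenVINO backend (a sketch; it assumes the OpenVINO package that registers the "openvino" backend is installed, and quantized/example_inputs come from the flow above):

```python
import torch

# The "openvino" backend is registered by the OpenVINO torch.compile integration.
compiled = torch.compile(quantized, backend="openvino")
output = compiled(*example_inputs)
```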
EdgeOrNode = Union[Tuple[torch.fx.Node, torch.fx.Node]]


class NNCFFXQuantizer(NNCFQuantizer):
Take into account the OpenVINO Quantizer implementation,
nncf/experimental/common/quantization/algorithms/quantizer/openvino_quantizer.py (line 45 at 44ebb6c):
class OpenVINOQuantizer(InductorQuantizer, NNCFQuantizer):
- Do not inherit OpenVINOQuantizer from nncf.Quantizer, to simplify upstreaming it to PyTorch.
- Introduce adapters for torch.ao quantizers and the OpenVINO Quantizer, to avoid repacking the quantization setup: TorchAOQuantizerAdapter and OpenVINOQuantizerAdapter. The declarations could be the following (see the sketch after this list):
class TorchAOQuantizerAdapter(nncf.Quantizer, torch.ao.Quantizer)
class OpenVINOQuantizerAdapter(TorchAOQuantizerAdapter)
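A minimal sketch of the adapter idea (the base classes, method names, and delegation here are assumptions for illustration, not the final design):

```python
import torch.fx
from torch.ao.quantization.quantizer import Quantizer as TorchAOQuantizer


class TorchAOQuantizerAdapter:  # hypothetical: would also derive from nncf.Quantizer
    """Wraps any torch.ao Quantizer so NNCF algorithms can consume its annotations
    without repacking the quantization setup."""

    def __init__(self, quantizer: TorchAOQuantizer):
        self._quantizer = quantizer

    def transform_prior_quantization(self, model: torch.fx.GraphModule) -> torch.fx.GraphModule:
        # torch.ao quantizers may rewrite the graph before annotation.
        return self._quantizer.transform_for_annotation(model)

    def annotate(self, model: torch.fx.GraphModule) -> torch.fx.GraphModule:
        # Delegate annotation to the wrapped quantizer; NNCF would then read the
        # quantization_annotation metadata left on the graph nodes.
        annotated = self._quantizer.annotate(model)
        self._quantizer.validate(annotated)
        return annotated


class OpenVINOQuantizerAdapter(TorchAOQuantizerAdapter):
    # hypothetical: could expose the native NNCF quantizer setup directly instead
    pass
```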
OpenVINOQuantizer will be introduced in a follow-up PR; I'll apply this suggestion there.
I believe that part of this comment can be applied in this PR. I'm open to discussing it offline.
    # before the NNCFGraph creation
    quantizer.transform_for_annotation(copied_model)

    if not isinstance(quantizer, NNCFQuantizer):
I believe it is more logical to check that it is a Quantizer before wrapping it in NNCFFXQuantizer.
OpenVINOQuantizer is an instance of Quantizer as well.
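A sketch of the check being discussed, under the assumption that non-NNCF quantizers are always torch.ao Quantizer instances (names taken from the diff; the error message is illustrative):

```python
from torch.ao.quantization.quantizer import Quantizer as TorchAOQuantizer

if not isinstance(quantizer, NNCFQuantizer):
    if not isinstance(quantizer, TorchAOQuantizer):
        raise nncf.ValidationError(
            "Expected an NNCFQuantizer or a torch.ao Quantizer instance."
        )
    # OpenVINOQuantizer also passes this check, since it derives from Quantizer.
    quantizer = NNCFFXQuantizer(quantizer)
```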
    self, quantizer_setup: SingleConfigQuantizerSetup, nncf_graph: NNCFGraph
) -> Tuple[OrderedDict[TargetPoint, QuantizerConfig], List[List[TargetPoint]]]:
    """
    Initializes a cache, finds quantization target points and puts them in the cache.
find_quantization_setup and fill_quantization_target_points have the same docstring. What is the difference between them?
        raise nncf.ValidationError("Subset size must be positive.")

    batch_size = calibration_dataset.get_batch_size()
    batchwise_statistics = batchwise_statistics is None and batch_size is not None and batch_size > 1
Please check the following case:
batchwise_statistics=True
batch_size=2
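A possible way to handle the case above: only auto-derive the flag when the caller left it unset, so an explicit batchwise_statistics=True with batch_size=2 is not silently overwritten (a sketch, not the final fix):

```python
if batchwise_statistics is None:
    batchwise_statistics = batch_size is not None and batch_size > 1
```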
    # To make it easier for bias correction algorithms,
    # biases are separated by the following calls.
    fuse_conv_bn(copied_model)
I asked how you verify that your implementation of the transformation is aligned with the PyTorch transformation. This question is relevant because quantize_pt2e needs to be aligned with PyTorch.
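One way such alignment could be checked is a numerical test that the fusion does not change model outputs (a sketch; fuse_conv_bn is the function from the diff above, model/example_inputs are placeholders):

```python
import copy

import torch


def check_fuse_conv_bn_preserves_outputs(model: torch.fx.GraphModule, example_inputs: tuple) -> None:
    model.eval()
    with torch.no_grad():
        reference = model(*example_inputs)

    fused = copy.deepcopy(model)
    fuse_conv_bn(fused)  # transformation under discussion

    with torch.no_grad():
        actual = fused(*example_inputs)

    torch.testing.assert_close(actual, reference, rtol=1e-4, atol=1e-4)
```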
Changes

Introduction of the quantize_pt2e method (a usage sketch is given at the end of this description).

Reason for changes
Related tickets
#2766
Tests
graph tests:
tests/torch/fx/test_quantizer.py
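A sketch of how the new method might be called (the exact import path and signature are assumptions based on this PR, not a settled API):

```python
import nncf
import torch
from torch.ao.quantization.quantizer.x86_inductor_quantizer import X86InductorQuantizer

# model, example_inputs and calibration_loader are placeholders.
exported_model = torch.export.export(model, example_inputs).module()

quantized_model = quantize_pt2e(  # assumed to be importable from the experimental module added here
    exported_model,
    quantizer=X86InductorQuantizer(),
    calibration_dataset=nncf.Dataset(calibration_loader),
    fold_quantize=True,
)
```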