[Experimental][TorchFX] quantize_pt2e + X86Quantizer introduction #3121
base: develop
Conversation
    activations_range_estimator_params: Optional[RangeEstimatorParameters] = None,
    weights_range_estimator_params: Optional[RangeEstimatorParameters] = None,
    batchwise_statistics: bool = False,
    fold_quantize: bool = False,
As far as I understand, the fold_quantize argument controls whether the quantized weights are converted to int8 or not, am I right? What is the scenario for using fold_quantize=True?
fold_quantize=True is the default of convert_pt2e (https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/quantize_pt2e.py#L208). It applies a constant-folding transformation to the final model, which folds the quantize nodes and leaves only dequantize nodes (https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/quantize_pt2e.py#L247-L248).
This is not equivalent to the compress_weights parameter, since compress_weights actually replaces the QDQ pair with a mul and a sub.
The scenario is usage of quantize_pt2e with any non-OpenVINOQuantizer (all benchmarks with X86InductorQuantizer were performed with fold_quantize=True).
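For context, a minimal sketch of the upstream pt2e flow where fold_quantize appears (assuming a recent PyTorch; model and example_inputs are placeholders for an eval-mode nn.Module and a tuple of sample inputs):

```python
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.x86_inductor_quantizer import (
    X86InductorQuantizer,
    get_default_x86_inductor_quantization_config,
)

# Capture the model as an FX graph via torch.export.
exported = torch.export.export(model, example_inputs).module()

quantizer = X86InductorQuantizer()
quantizer.set_global(get_default_x86_inductor_quantization_config())

prepared = prepare_pt2e(exported, quantizer)
prepared(*example_inputs)  # calibration pass

# fold_quantize=True (the convert_pt2e default) constant-folds the quantize ops on weights,
# leaving only dequantize nodes in front of the consuming ops.
quantized = convert_pt2e(prepared, fold_quantize=True)
```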
I believe that NNCF's behavior should be aligned across all quantizers, i.e., OpenVINOQuantizer and non-OpenVINO quantizers. Could you share your opinion on how to reach this?
P.S. As far as I know, NNCF is able to convert its custom FakeQuantize layers to the upstream layers. Maybe this can be used to align the model representation.
As stated in the torch.compile OpenVINO documentation, the OpenVINO backend supports export_pt2e quantization only with fold_quantize=True. I believe we need to ask the torch.compile OpenVINO team.
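For reference, this is roughly how the quantized model would then be handed to the torch.compile OpenVINO backend (a sketch; it assumes the OpenVINO package that registers the "openvino" backend is installed, and quantized/example_inputs come from the flow above):

```python
import torch

# The "openvino" backend is registered by the OpenVINO torch.compile integration.
compiled = torch.compile(quantized, backend="openvino")
output = compiled(*example_inputs)
```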
EdgeOrNode = Union[Tuple[torch.fx.Node, torch.fx.Node]]


class NNCFFXQuantizer(NNCFQuantizer):
Take into account the OpenVINO Quantizer implementation,
nncf/experimental/common/quantization/algorithms/quantizer/openvino_quantizer.py (line 45 at 44ebb6c):
class OpenVINOQuantizer(InductorQuantizer, NNCFQuantizer):
- Do not inherit OpenVINOQuantizer from nncf.Quantizer, to simplify upstreaming it to PyTorch.
- Introduce adapters for torch.ao quantizers and the OpenVINO Quantizer, to avoid repacking the quantization setup: TorchAOQuantizerAdapter and OpenVINOQuantizerAdapter. The declarations could be the following (see the sketch after this list):
class TorchAOQuantizerAdapter(nncf.Quantizer, torch.ao.Quantizer)
class OpenVINOQuantizerAdapter(TorchAOQuantizerAdapter)
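A minimal sketch of the adapter idea (the base classes, method names, and delegation here are assumptions for illustration, not the final design):

```python
import torch.fx
from torch.ao.quantization.quantizer import Quantizer as TorchAOQuantizer


class TorchAOQuantizerAdapter:  # hypothetical: would also derive from nncf.Quantizer
    """Wraps any torch.ao Quantizer so NNCF algorithms can consume its annotations
    without repacking the quantization setup."""

    def __init__(self, quantizer: TorchAOQuantizer):
        self._quantizer = quantizer

    def transform_prior_quantization(self, model: torch.fx.GraphModule) -> torch.fx.GraphModule:
        # torch.ao quantizers may rewrite the graph before annotation.
        return self._quantizer.transform_for_annotation(model)

    def annotate(self, model: torch.fx.GraphModule) -> torch.fx.GraphModule:
        # Delegate annotation to the wrapped quantizer; NNCF would then read the
        # quantization_annotation metadata left on the graph nodes.
        annotated = self._quantizer.annotate(model)
        self._quantizer.validate(annotated)
        return annotated


class OpenVINOQuantizerAdapter(TorchAOQuantizerAdapter):
    # hypothetical: could expose the native NNCF quantizer setup directly instead
    pass
```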
OpenVINOQuantizer will be introduced in a follow-up PR; I'll apply this suggestion there.
I believe that part of this comment can be applied in this PR. I'm open to discussing it offline.
    # before the NNCFGraph creation
    quantizer.transform_for_annotation(copied_model)

    if not isinstance(quantizer, NNCFQuantizer):
I believe it is more logical to check that it is a Quantizer before wrapping it in NNCFFXQuantizer.
OpenVINOQuantizer is an instance of Quantizer as well.
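A sketch of the check being discussed, under the assumption that non-NNCF quantizers are always torch.ao Quantizer instances (names taken from the diff; the error message is illustrative):

```python
from torch.ao.quantization.quantizer import Quantizer as TorchAOQuantizer

if not isinstance(quantizer, NNCFQuantizer):
    if not isinstance(quantizer, TorchAOQuantizer):
        raise nncf.ValidationError(
            "Expected an NNCFQuantizer or a torch.ao Quantizer instance."
        )
    # OpenVINOQuantizer also passes this check, since it derives from Quantizer.
    quantizer = NNCFFXQuantizer(quantizer)
```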
    self, quantizer_setup: SingleConfigQuantizerSetup, nncf_graph: NNCFGraph
) -> Tuple[OrderedDict[TargetPoint, QuantizerConfig], List[List[TargetPoint]]]:
    """
    Initializes a cache, finds quantization target points and puts them in the cache.
find_quantization_setup and fill_quantization_target_points have the same docstring. What is the difference between them?
        raise nncf.ValidationError("Subset size must be positive.")

    batch_size = calibration_dataset.get_batch_size()
    batchwise_statistics = batchwise_statistics is None and batch_size is not None and batch_size > 1
Please check the following case:
batchwise_statistics=True
batch_size=2
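A possible way to handle the case above: only auto-derive the flag when the caller left it unset, so an explicit batchwise_statistics=True with batch_size=2 is not silently overwritten (a sketch, not the final fix):

```python
if batchwise_statistics is None:
    batchwise_statistics = batch_size is not None and batch_size > 1
```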
    # To make it easier for bias correction algorithms,
    # biases are separated by the following calls.
    fuse_conv_bn(copied_model)
I asked how you verify that your implementation of the transformation is aligned with the PyTorch transformation. This question is relevant because quantize_pt2e needs to be aligned with PyTorch.
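One way such alignment could be checked is a numerical test that the fusion does not change model outputs (a sketch; fuse_conv_bn is the function from the diff above, model/example_inputs are placeholders):

```python
import copy

import torch


def check_fuse_conv_bn_preserves_outputs(model: torch.fx.GraphModule, example_inputs: tuple) -> None:
    model.eval()
    with torch.no_grad():
        reference = model(*example_inputs)

    fused = copy.deepcopy(model)
    fuse_conv_bn(fused)  # transformation under discussion

    with torch.no_grad():
        actual = fused(*example_inputs)

    torch.testing.assert_close(actual, reference, rtol=1e-4, atol=1e-4)
```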
Changes

Introduction of the quantize_pt2e method (a usage sketch is given at the end of this description).

Reason for changes
Related tickets
#2766
Tests
graph tests:
tests/torch/fx/test_quantizer.py
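A sketch of how the new method might be called (the exact import path and signature are assumptions based on this PR, not a settled API):

```python
import nncf
import torch
from torch.ao.quantization.quantizer.x86_inductor_quantizer import X86InductorQuantizer

# model, example_inputs and calibration_loader are placeholders.
exported_model = torch.export.export(model, example_inputs).module()

quantized_model = quantize_pt2e(  # assumed to be importable from the experimental module added here
    exported_model,
    quantizer=X86InductorQuantizer(),
    calibration_dataset=nncf.Dataset(calibration_loader),
    fold_quantize=True,
)
```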