
[Experimental][TorchFX] quantize_pt2e + X86Quantizer introduction #3121

Open

daniil-lyakhov wants to merge 7 commits into develop from dl/fx/experimental_quantization

Conversation

daniil-lyakhov (Collaborator) commented Nov 28, 2024

Changes

Introduction of quantize_pt2e method

Reason for changes

Related tickets

#2766

Tests

graph tests: tests/torch/fx/test_quantizer.py
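For context, a minimal usage sketch of the new entry point. The import path follows the file touched in this PR (nncf/experimental/torch/fx/quantization/quantize_pt2e.py); the capture step and the exact signature are assumptions and may differ from the final API:

```python
# Hedged sketch, not the PR's exact API: parameter names are taken from the
# review snippets (calibration_dataset, fold_quantize, ...); the capture step
# assumes torch >= 2.5 with export_for_training.
import torch
import torch.nn as nn
from torch.ao.quantization.quantizer.x86_inductor_quantizer import (
    X86InductorQuantizer,
    get_default_x86_inductor_quantization_config,
)

import nncf
from nncf.experimental.torch.fx.quantization.quantize_pt2e import quantize_pt2e

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).eval()
example_inputs = (torch.randn(1, 3, 32, 32),)

# Capture the model into the FX representation expected by pt2e quantization.
captured = torch.export.export_for_training(model, example_inputs).module()

quantizer = X86InductorQuantizer()
quantizer.set_global(get_default_x86_inductor_quantization_config())

calibration_dataset = nncf.Dataset([example_inputs[0]])
quantized = quantize_pt2e(
    captured,
    quantizer,
    calibration_dataset=calibration_dataset,
    fold_quantize=True,  # matches the torch.ao convert_pt2e default
)
```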

github-actions bot added labels: NNCF PT, experimental, NNCF PTQ (Nov 28, 2024)
daniil-lyakhov changed the title from "Dl/fx/experimental quantization" to "[Experimental][TorchFX] quantize_pt2e + X86Quantizer introduction" (Nov 28, 2024)
daniil-lyakhov force-pushed the dl/fx/experimental_quantization branch 2 times, most recently from efd3367 to d1941f3 (Nov 28, 2024)
github-actions bot added label: NNCF Common (Dec 2, 2024)
daniil-lyakhov force-pushed the dl/fx/experimental_quantization branch 2 times, most recently from aea0bdf to 52e80c8 (Dec 4, 2024)
daniil-lyakhov marked this pull request as ready for review (Dec 4, 2024)
daniil-lyakhov requested a review from a team as a code owner (Dec 4, 2024)
daniil-lyakhov force-pushed the dl/fx/experimental_quantization branch 2 times, most recently from 9178921 to 43bc251 (Dec 5, 2024)
nncf/quantization/algorithms/min_max/algorithm.py (outdated)
activations_range_estimator_params: Optional[RangeEstimatorParameters] = None,
weights_range_estimator_params: Optional[RangeEstimatorParameters] = None,
batchwise_statistics: bool = False,
fold_quantize: bool = False,
Contributor

As far as I understand, the fold_quantize argument controls whether the quantized weights are converted to int8 or not, am I right? What is the scenario for using fold_quantize=True?

Collaborator (Author)

fold_quantize=True is the default of convert_pt2e (https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/quantize_pt2e.py#L208). It applies a constant-folding transformation to the final model, which folds the quantize nodes and leaves the dequantize nodes (https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/quantize_pt2e.py#L247-L248). This is not equivalent to the compress_weights parameter, since compress_weights actually replaces the q/dq pair with a mul and a sub.

The scenario: using quantize_pt2e with any quantizer other than OpenVINOQuantizer (all benchmarks with X86InductorQuantizer were performed with fold_quantize=True).
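To illustrate the folding described above, here is a minimal sketch of the underlying torch.ao flow (`captured`, `quantizer`, and `example_inputs` are assumed to come from the earlier usage example); the before/after graph shapes in the comments summarize the effect:

```python
# Minimal sketch of the torch.ao pt2e flow that quantize_pt2e builds on.
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

prepared = prepare_pt2e(captured, quantizer)
prepared(*example_inputs)  # calibration pass

# fold_quantize=True constant-folds the weight quantize nodes:
#   weight(fp32) -> quantize -> dequantize -> conv
# becomes
#   weight(int8) -> dequantize -> conv
converted = convert_pt2e(prepared, fold_quantize=True)
```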

Contributor

I believe NNCF should behave the same for all quantizers, i.e. both OpenVINOQuantizer and non-OpenVINOQuantizer. Could you propose how to achieve this?

P.S. As far as I know, NNCF is able to convert custom FakeQuantize layers to the upstream layers. Maybe that could be used to align the model representations.

Collaborator (Author)

As stated in the torch.compile OpenVINO documentation, the openvino backend supports pt2e quantization only with fold_quantize=True. I believe we need to ask the torch.compile OpenVINO team.

nncf/experimental/torch/fx/quantization/quantize_pt2e.py (three outdated review threads, resolved)
EdgeOrNode = Union[Tuple[torch.fx.Node, torch.fx.Node]]


class NNCFFXQuantizer(NNCFQuantizer):
Contributor

Taking into account the OpenVINO Quantizer implementation

class OpenVINOQuantizer(InductorQuantizer, NNCFQuantizer):

I would suggest some refactoring:

  1. Do not inherit OpenVINOQuantizer from nncf.Quantizer, to simplify upstreaming it to PyTorch.
  2. Introduce adapters for torch.ao quantizers and OpenVINOQuantizer to avoid repacking the quantization setup: TorchAOQuantizerAdapter and OpenVINOQuantizerAdapter. The declarations could be the following (see the sketch below):

class TorchAOQuantizerAdapter(nncf.Quantizer, torch.ao.Quantizer)
class OpenVINOQuantizerAdapter(TorchAOQuantizerAdapter)
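A sketch of how such an adapter might look. The structure follows the declarations above, but the method name transform_prior_quantization and the omitted nncf.Quantizer base are assumptions for illustration, not NNCF's actual interface:

```python
# Hedged sketch of the proposed adapter pattern; method names are hypothetical.
import torch.fx
from torch.ao.quantization.quantizer import Quantizer as TorchAOQuantizer


class TorchAOQuantizerAdapter:
    """Wraps any torch.ao Quantizer behind a single interface that NNCF
    algorithms can consume, so the quantization setup is built only once."""

    def __init__(self, quantizer: TorchAOQuantizer):
        self._quantizer = quantizer

    def transform_prior_quantization(self, model: torch.fx.GraphModule) -> torch.fx.GraphModule:
        # Delegate graph preprocessing to the wrapped torch.ao quantizer.
        return self._quantizer.transform_for_annotation(model)

    def annotate(self, model: torch.fx.GraphModule) -> torch.fx.GraphModule:
        return self._quantizer.annotate(model)


class OpenVINOQuantizerAdapter(TorchAOQuantizerAdapter):
    # Specialization hook for OpenVINOQuantizer-specific setup extraction.
    pass
```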

Collaborator (Author)

OpenVINOQuantizer will be introduced in a follow-up PR; I'll apply this suggestion there.

Contributor

I believe part of this comment can be applied in this PR. I'm open to discussing it offline.

# before the NNCFGraph creation
quantizer.transform_for_annotation(copied_model)

if not isinstance(quantizer, NNCFQuantizer):
Contributor

I believe it is more logical to check that it is a Quantizer before creating the NNCFFXQuantizer.

Collaborator (Author)

OpenVINOQuantizer is an instance of Quantizer as well.
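For clarity, a sketch of the dispatch being discussed, based on the snippet above; the body of the wrapping branch is an assumption about the PR's logic:

```python
# Hedged sketch of the isinstance dispatch under discussion.
quantizer.transform_for_annotation(copied_model)

if not isinstance(quantizer, NNCFQuantizer):
    # A plain torch.ao Quantizer (e.g. X86InductorQuantizer) is wrapped so
    # the rest of the pipeline sees one interface. Since OpenVINOQuantizer
    # is also a torch.ao Quantizer, checking for Quantizer alone would not
    # distinguish the two; hence the NNCFQuantizer check.
    quantizer = NNCFFXQuantizer(quantizer)
```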

daniil-lyakhov force-pushed the dl/fx/experimental_quantization branch 3 times, most recently from 7ede33d to 20147ab (Dec 23, 2024)
self, quantizer_setup: SingleConfigQuantizerSetup, nncf_graph: NNCFGraph
) -> Tuple[OrderedDict[TargetPoint, QuantizerConfig], List[List[TargetPoint]]]:
"""
Initializes a cache, finds quantization target points, and puts them in the cache.
Contributor

find_quantization_setup and fill_quantization_target_points have the same docstring. What is the difference between them?


raise nncf.ValidationError("Subset size must be positive.")

batch_size = calibration_dataset.get_batch_size()
batchwise_statistics = batchwise_statistics is None and batch_size is not None and batch_size > 1
Contributor

Please check the following case (see the sketch after this list):

  • batchwise_statistics=True
  • batch_size=2
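In that case the expression `batchwise_statistics is None and batch_size is not None and batch_size > 1` evaluates to False, silently overriding the user's explicit True. A possible fix, assuming None is meant as "auto-detect":

```python
# Sketch of a fix for the flagged case: only auto-detect when the user did
# not pass an explicit value.
if batchwise_statistics is None:
    batchwise_statistics = batch_size is not None and batch_size > 1
```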


# To make it easier for bias correction algorithms,
# biases are being separated by the following calls.
fuse_conv_bn(copied_model)
Contributor

I asked how you test that your implementation of this transformation is aligned with the PyTorch transformation. This question is relevant because quantize_pt2e needs to be aligned with PyTorch.
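One way such alignment could be tested is a numerical equivalence check around the fusion. This is a sketch, not an existing test in the PR; it assumes fuse_conv_bn mutates the captured module in place (as the call above suggests), and the import path is a guess:

```python
# Hedged sketch: verify that fuse_conv_bn preserves model semantics by
# comparing outputs before and after the transformation.
import torch
import torch.nn as nn

from nncf.experimental.torch.fx.transformations import fuse_conv_bn  # assumed path

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8)).eval()
example_inputs = (torch.randn(1, 3, 16, 16),)
captured = torch.export.export_for_training(model, example_inputs).module()

ref_out = captured(*example_inputs)
fuse_conv_bn(captured)  # transformation under review, applied in place
fused_out = captured(*example_inputs)

torch.testing.assert_close(ref_out, fused_out, rtol=1e-5, atol=1e-5)
```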
