diff --git a/.markdownlint.yaml b/.markdownlint.yaml new file mode 100644 index 00000000000..3a792ec1aa7 --- /dev/null +++ b/.markdownlint.yaml @@ -0,0 +1,9 @@ +# Default state for all rules +default: true + +MD013: false # Line length +MD033: false # Inline HTML +MD034: false # Bare URL used +MD036: false # Emphasis used instead of a heading +MD037: false # Spaces inside emphasis markers +MD041: false # First line diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 0e3fc8e08cb..8b3e4e56f63 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -14,3 +14,9 @@ repos: hooks: - id: isort name: isort (python) + + - repo: https://github.com/igorshubovych/markdownlint-cli + rev: v0.33.0 + hooks: + - id: markdownlint + args: [--config=.markdownlint.yaml] diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index b37d11a8b4a..9382d0ba0b3 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,24 +1,18 @@ # Contributing to NNCF Contributions are accepted in the form of: + * Submitting issues against the current code to report bugs or request features * Extending NNCF functionality with important features (e.g. to address community requests, improve usability, implement a recently published compression algorithm, etc.) * Adding example scripts to showcase NNCF usage in real training pipelines and provide the means to reproduce the reported compression results * Providing recipes (specific NNCF configurations and training hyperparameters) to obtain state-of-the-art compression using NNCF for existing models * Adding well-defined patches that integrate NNCF into third-party repositories -* Reducing performance overhead of NNCF compression by writing specialized CUDA kernels for compression operations or improving existing ones. +* Reducing performance overhead of NNCF compression by writing specialized CUDA kernels for compression operations or improving existing ones. The latter forms are accepted as pull requests from your own forks of the NNCF repository. Any contributions must not violate the repository's [LICENSE](./LICENSE) requirements. -## Installation -### (Experimental) ONNXRuntime-OpenVINO -Install the package and its dependencies by running the following in the repository root directory: -```bash -make install-onnx-dev -``` - ## Testing After your pull request is submitted, the maintainer will launch a scope of CI tests against it. @@ -28,42 +22,30 @@ The pre-commit scope may be run locally by executing the `pytest` command (witho Please run the pre-commit testing scope locally before submitting your PR and ensure that it passes to conserve your own time and that of the reviewing maintainer. New feature pull requests should include all the necessary testing code. -Testing is done using the `pytest` framework. +Testing is done using the `pytest` framework. The test files should be located inside the [tests](./tests) directory and start with `test_` so that the `pytest` is able to discover them. Any additional data that is required for tests (configuration files, mock datasets, etc.) must be stored within the [tests/data](./tests/data) folder. The test files themselves may be grouped in arbitrary directories according to their testing purpose and common sense. -Any additional tests in the [tests](./tests) directory will be automatically added into the pre-commit CI scope. +Any additional tests in the [tests](./tests) directory will be automatically added into the pre-commit CI scope. 
If your testing code is more extensive than unit tests (in terms of test execution time), or would be more suited to be executed on a nightly/weekly basis instead of for each future commit, please inform the maintainers in your PR discussion thread so that our internal testing pipelines could be adjusted accordingly. -### Preset command for testing -You can launch appropriate tests against the framework by running the following command: - -- (Experimental) ONNXRuntime-OpenVINO -```bash -test-onnx -``` - ## Code style + Changes to NNCF Python code should conform to [Python Style Guide](./docs/styleguide/PyGuide.md) -Pylint is used throughout the project to ensure code cleanliness and quality. +Pylint is used throughout the project to ensure code cleanliness and quality. A Pylint run is also done as part of the pre-commit scope - the pre-commit `pytest` scope will not be run if your code fails the Pylint checks. The Pylint rules and exceptions for this repository are described in the standard [.pylintrc](./.pylintrc) format - make sure your local linter uses these. -### Preset command for linting -You can launch appropriate linting against the framework by running the following command: - -- (Experimental) ONNXRuntime-OpenVINO -```bash -pylint-onnx -``` - ## Binary files -Please refrain from adding huge binary files into the repository. If binary files have to be added, mark these to use Git LFS via the [.gitattributes](./.gitattributes) file. + +Please refrain from adding huge binary files into the repository. If binary files have to be added, mark these to use Git LFS via the [.gitattributes](./.gitattributes) file. ## Model identifiers + When adding model configs and checkpoints to be showcased in NNCF's sample script, follow the format for naming these files: + 1. The base name must be the same for the NNCF config file, AC config file, checkpoint file (PT/ONNX/OV) or checkpoint folder (TF), and other associated artifacts. 2. This name should be composed with the following format: `{model_name}_{dataset_name}` for FP32 models, `{topology_name}_{dataset_name}_{compression_algorithms_applied}`. The format may be extended if there are multiple models with the same topology, dataset and compression algos applied, which only differ in something else such as exact value of achieved sparsity. Align the naming of the new checkpoints with the existing ones. -3. Additional human-readable information on the model such as expected metrics and compression algorithm specifics (e.g. level of pruning/sparsity, per-tensor/per-channel quantizer configuration etc.) should be stored in a registry file (`tests/torch/sota_checkpoints_eval.json` for PT, `tests/tensorflow/sota_checkpoints_eval.json` for TF) \ No newline at end of file +3. Additional human-readable information on the model such as expected metrics and compression algorithm specifics (e.g. level of pruning/sparsity, per-tensor/per-channel quantizer configuration etc.) 
should be stored in a registry file (`tests/torch/sota_checkpoints_eval.json` for PT, `tests/tensorflow/sota_checkpoints_eval.json` for TF) diff --git a/README.md b/README.md index c347022f0a6..d73e6dc538d 100644 --- a/README.md +++ b/README.md @@ -3,11 +3,11 @@ # Neural Network Compression Framework (NNCF) [Key Features](#key-features) • -[Installation](#Installation-guide) • +[Installation](#installation-guide) • [Documentation](#documentation) • [Usage](#usage) • -[Tutorials and Samples](#Model-compression-tutorials-and-samples) • -[Third-party integration](#Third-party-repository-integration) • +[Tutorials and Samples](#model-compression-tutorials-and-samples) • +[Third-party integration](#third-party-repository-integration) • [Model Zoo](./docs/ModelZoo.md) [![GitHub Release](https://img.shields.io/github/v/release/openvinotoolkit/nncf?color=green)](https://github.com/openvinotoolkit/nncf/releases) @@ -21,13 +21,14 @@ Neural Network Compression Framework (NNCF) provides a suite of post-training an NNCF is designed to work with models from [PyTorch](https://pytorch.org/), [TensorFlow](https://www.tensorflow.org/), [ONNX](https://onnx.ai/) and [OpenVINO™](https://docs.openvino.ai/latest/home.html). -NNCF provides [samples](#Model-compression-tutorials-and-samples) that demonstrate the usage of compression algorithms for different use cases and models. See compression results achievable with the NNCF-powered samples at [Model Zoo page](./docs/ModelZoo.md). +NNCF provides [samples](#model-compression-tutorials-and-samples) that demonstrate the usage of compression algorithms for different use cases and models. See compression results achievable with the NNCF-powered samples at [Model Zoo page](./docs/ModelZoo.md). The framework is organized as a Python\* package that can be built and used in a standalone mode. The framework architecture is unified to make it easy to add different compression algorithms for both PyTorch and TensorFlow deep learning frameworks. ## Key Features + ### Post-Training Compression Algorithms | Compression algorithm |OpenVINO|PyTorch| TensorFlow | ONNX | @@ -184,7 +185,6 @@ quantized_model = nncf.quantize(onnx_model, calibration_dataset) - [//]: # (NNCF provides full [samples](#post-training-quantization-samples), which demonstrate Post-Training Quantization usage for PyTorch, TensorFlow, ONNX, OpenVINO.) 
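For reference, a complete minimal post-training quantization flow for an OpenVINO IR model could look roughly like the sketch below (a hedged illustration, not one of the repository samples; the model path, `val_loader` and the `(image, label)` batch layout are placeholder assumptions):

```python
import nncf
import openvino.runtime as ov

core = ov.Core()
model = core.read_model("model.xml")  # placeholder path to an FP32 IR model

def transform_fn(data_item):
    # Assuming the dataloader yields (image, label) pairs; only the input is needed.
    images, _ = data_item
    return images

calibration_dataset = nncf.Dataset(val_loader, transform_fn)  # val_loader is a placeholder
quantized_model = nncf.quantize(model, calibration_dataset)
ov.serialize(quantized_model, "model_int8.xml")
```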
### Training-Time Compression @@ -272,7 +272,8 @@ For a quicker start with NNCF-powered compression, try sample notebooks and scri ### Model Compression Tutorials A collection of ready-to-run Jupyter* notebooks are available to demonstrate how to use NNCF compression algorithms to optimize models for inference with the OpenVINO Toolkit: -- [Accelerate Inference of NLP models with Post-Training Qunatization API of NNCF](https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/105-language-quantize-bert) + +- [Accelerate Inference of NLP models with Post-Training Quantization API of NNCF](https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/105-language-quantize-bert) - [Convert and Optimize YOLOv8 with OpenVINO](https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/230-yolov8-optimization) - [Convert and Optimize YOLOv7 with OpenVINO](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/226-yolov7-optimization) - [NNCF Post-Training Optimization of Segment Anything Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/237-segment-anything) @@ -287,7 +288,9 @@ A collection of ready-to-run Jupyter* notebooks are available to demonstrate how - [Accelerate Inference of Sparse Transformer Models with OpenVINO and 4th Gen Intel Xeon Scalable Processors](https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/116-sparsity-optimization) ### Post-Training Quantization Samples + Compact scripts demonstrating quantization and corresponding inference speed boost: + - [Post-Training Quantization of MobileNet v2 OpenVINO Model](examples/post_training_quantization/openvino/mobilenet_v2/README.md) - [Post-Training Quantization of YOLOv8 OpenVINO Model](examples/post_training_quantization/openvino/yolov8/README.md) - [Post-Training Quantization of Anomaly Classification OpenVINO model with control of accuracy metric](examples/post_training_quantization/openvino/quantize_with_accuracy_control/README.md) @@ -298,7 +301,9 @@ Compact scripts demonstrating quantization and corresponding inference speed boo - [Post-Training Quantization of MobileNet v2 TensorFlow Model](examples/post_training_quantization/tensorflow/mobilenet_v2/README.md) ### Training-Time Compression Samples + These examples provide full pipelines including compression, training and inference for classification, object detection and segmentation tasks. + - PyTorch samples: - [Image Classification sample](examples/torch/classification/README.md) - [Object Detection sample](examples/torch/object_detection/README.md) @@ -309,6 +314,7 @@ These examples provide full pipelines including compression, training and infere - [Instance Segmentation sample](examples/tensorflow/segmentation/README.md) ## Third-party repository integration + NNCF may be straightforwardly integrated into training/evaluation pipelines of third-party repositories. ### Used by @@ -322,30 +328,39 @@ NNCF may be straightforwardly integrated into training/evaluation pipelines of t NNCF is used as a compression backend within the renowned `transformers` repository in HuggingFace Optimum Intel. 
### Git patches for third-party repository + See [third_party_integration](./third_party_integration) for examples of code modifications (Git patches and base commit IDs are provided) that are necessary to integrate NNCF into the following repositories: - - [huggingface-transformers](third_party_integration/huggingface_transformers/README.md) + +- [huggingface-transformers](third_party_integration/huggingface_transformers/README.md) ## Installation Guide + For detailed installation instructions please refer to the [Installation](./docs/Installation.md) page. NNCF can be installed as a regular PyPI package via pip: -``` + +```bash pip install nncf ``` + If you want to install both NNCF and the supported PyTorch version in one line, you can do this by simply running: -``` + +```bash pip install nncf[torch] ``` + Other viable options besides `[torch]` are `[tf]`, `[onnx]` and `[openvino]`. NNCF is also available via [conda](https://anaconda.org/conda-forge/nncf): -``` + +```bash conda install -c conda-forge nncf ``` -You may also use one of the Dockerfiles in the [docker](./docker) directory to build an image with an environment already set up and ready for running NNCF [sample scripts](#Model-compression-tutorials-and-samples). +You may also use one of the Dockerfiles in the [docker](./docker) directory to build an image with an environment already set up and ready for running NNCF [sample scripts](#model-compression-tutorials-and-samples). ### System requirements + - Ubuntu\* 18.04 or later (64-bit) - Python\* 3.7 or later - Supported frameworks: @@ -362,7 +377,7 @@ List of models and compression results for them can be found at our [Model Zoo p ## Citing -``` +```bi @article{kozlov2020neural, title = {Neural network compression framework for fast model inference}, author = {Kozlov, Alexander and Lazarevich, Ivan and Shamporov, Vasily and Lyalyushkin, Nikolay and Gorbachev, Yury}, @@ -372,13 +387,15 @@ List of models and compression results for them can be found at our [Model Zoo p ``` ## Contributing Guide + Refer to the [CONTRIBUTING.md](./CONTRIBUTING.md) file for guidelines on contributions to the NNCF repository. ## Useful links + - [Documentation](./docs) - Example scripts (model objects available through links in respective README.md files): - - [PyTorch](./examples/torch) - - [TensorFlow](./examples/tensorflow) + - [PyTorch](./examples/torch) + - [TensorFlow](./examples/tensorflow) - [FAQ](./docs/FAQ.md) - [Notebooks](https://github.com/openvinotoolkit/openvino_notebooks#-model-training) - [HuggingFace Optimum Intel](https://huggingface.co/docs/optimum/intel/optimization_ov) diff --git a/ReleaseNotes.md b/ReleaseNotes.md index 4a2cffe75b1..dd8d13bcc61 100644 --- a/ReleaseNotes.md +++ b/ReleaseNotes.md @@ -1,12 +1,13 @@ # Release Notes ## New in Release 2.5.0 + Post-training Quantization: - Features: - Official release of OpenVINO framework support. - Ported NNCF OpenVINO backend to use the [nGraph](https://docs.openvino.ai/2021.3/openvino_docs_nGraph_DG_Introduction.html) representation of OpenVINO models. - - Changed dependecies of NNCF OpenVINO backend. It now depends on `openvino` package and not on the `openvino-dev` package. + - Changed dependencies of NNCF OpenVINO backend. It now depends on `openvino` package and not on the `openvino-dev` package. - Added GRU/LSTM quantization support. - Added quantizer scales unification. - Added support for models with 3D and 5D Depthwise convolution. @@ -61,15 +62,18 @@ Compression-aware training: - Added Windows support for NNCF. 
## New in Release 2.4.0 + Target version updates: + - Bump target framework versions to PyTorch 1.13.1, TensorFlow 2.8.x, ONNX 1.12, ONNXRuntime 1.13.1 - Increased target HuggingFace transformers version for the integration patch to 4.23.1 Features: + - Official release of the ONNX framework support. NNCF may now be used for post-training quantization (PTQ) on ONNX models. Added an [example script](examples/post_training_quantization/onnx/mobilenet_v2) demonstrating the ONNX post-training quantization on MobileNetV2. -- Preview release of OpenVINO framework support. +- Preview release of OpenVINO framework support. NNCF may now be used for post-training quantization on OpenVINO models. Added an example script demonstrating the OpenVINO post-training quantization on MobileNetV2. `pip install nncf[openvino]` will install NNCF with the required OV framework dependencies. - Common post-training quantization API across the supported framework model formats (PyTorch, TensorFlow, ONNX, OpenVINO IR) via the `nncf.quantize(...)` function. @@ -80,13 +84,14 @@ The parameter set of the function is the same for all frameworks - actual framew See [description](nncf/experimental/torch/sparsity/movement/MovementSparsity.md) of the movement pruning involved in the JPQD for details. Bugfixes: + - Fixed a division by zero if every operation is added to ignored scope - Improved logging output, cutting down on the number of messages being output to the standard `logging.INFO` log level. - Fixed FLOPS calculation for linear filters - this impacts existing models that were pruned with a FLOPS target. - "chunk" and "split" ops are correctly handled during pruning. - Linear layers may now be pruned by input and output independently. - Matmul-like operations and subsequent arithmetic operations are now treated as a fused pattern. -- (PyTorch) Fixed a rare condition with accumulator overflow in CUDA quantization kernels, which led to CUDA runtime errors and NaN values appearing in quantized tensors and +- (PyTorch) Fixed a rare condition with accumulator overflow in CUDA quantization kernels, which led to CUDA runtime errors and NaN values appearing in quantized tensors and - (PyTorch) `transformers` integration patch now allows to export to ONNX during training, and not only at the end of it. - (PyTorch) `torch.nn.utils.weight_norm` weights are now detected correctly. - (PyTorch) Exporting a model with sparsity or pruning no longer leads to weights in the original model object in-memory to be hard-set to 0. @@ -98,6 +103,7 @@ Bugfixes: - (ONNX) Improved the working time of PTQ by optimizing the calls to ONNX shape inferencing. Breaking changes: + - Fused patterns will be excluded from quantization via `ignored_scopes` only if the top-most node in data flow order matches against `ignored_scopes` - NNCF config's `"ignored_scopes"` and `"target_scopes"` are now strictly checked to be matching against at least one node in the model graph instead of silently ignoring the unmatched entries. - Calling `setup.py` directly to install NNCF is deprecated and no longer guaranteed to work. @@ -106,18 +112,21 @@ Breaking changes: - (ONNX) Removed CompressionBuilder. Excluded examples of NNCF for ONNX with CompressionBuilder API ## New in Release 2.3.0 + - (ONNX) PTQ API support for ONNX. - (ONNX) Added PTQ examples for ONNX in image classification, object detection, and semantic segmentation. - (PyTorch) Added `BootstrapNAS` to find high-performing sub-networks from the super-network optimization. 
Bugfixes: + - (PyTorch) Returned the initial quantized model when the retraining failed to find out the best checkpoint. - (Experimental) Fixed weight initialization for `ONNXGraph` and `MinMaxQuantization` ## New in Release 2.2.0 + - (TensorFlow) Added TensorFlow 2.5.x support. - (TensorFlow) The `SubclassedConverter` class was added to create `NNCFGraph` for the `tf.Graph` Keras model. -- (TensorFlow) Added `TFOpLambda ` layer support with `TFModelConverter`, `TFModelTransformer`, and `TFOpLambdaMetatype`. +- (TensorFlow) Added `TFOpLambda` layer support with `TFModelConverter`, `TFModelTransformer`, and `TFOpLambdaMetatype`. - (TensorFlow) Patterns from `MatMul` and `Conv2D` to `BiasAdd` and `Metatypes` of TensorFlow operations with weights `TFOpWithWeightsMetatype` are added. - (PyTorch, TensorFlow) Added prunings for `Reshape` and `Linear` as `ReshapePruningOp` and `LinearPruningOp`. - (PyTorch) Added mixed precision quantization config with HAWQ for `Resnet50` and `Mobilenet_v2` for the latest VPU. @@ -128,6 +137,7 @@ Bugfixes: - (Experimental) Added `ONNXPostTrainingQuantization` and `MinMaxQuantization` supports for ONNX. Bugfixes: + - (PyTorch, TensorFlow) Added exception handling of BN adaptation for zero sample values. - (PyTorch, TensorFlow) Fixed learning rate after validation step for `EarlyExitCompressionTrainingLoop`. - (PyTorch) Fixed `FakeQuantizer` to make exact zeros. @@ -136,6 +146,7 @@ Bugfixes: - (PyTorch) Fixed the statistics collection from the pruned model. ## New in Release 2.1.0 + - (PyTorch) All PyTorch operations are now NNCF-wrapped automatically. - (TensorFlow) Scales for concat-affecting quantizers are now unified - (PyTorch) The pruned filters are now set to 0 in the exported ONNX file instead of removing them from the ONNX definition. @@ -153,21 +164,27 @@ Bugfixes: - (PyTorch - Experimental) Added an algorithm to search the model's architecture for basic building blocks. Bugfixes: + - (TensorFlow) Fixed a bug where an operation with int32 inputs (following a Cast op) was attempted to be quantized. - (PyTorch, TensorFlow) LeakyReLU now properly handled during pruning - (PyTorch) Fixed errors with custom modules failing at the `determine_subtype` stage of metatype assignment. 
- (PyTorch) Fix handling modules with `torch.nn.utils.weight_norm.WeightNorm` applied ## New in Release 2.0.2 + Target version updates: + - Relax TensorFlow version requirements to 2.4.x ## New in Release 2.0.1 + Target version updates: + - Bump target framework versions to PyTorch 1.9.1 and TensorFlow 2.4.3 - Increased target HuggingFace transformers version for the integration patch to 4.9.1 Bugfixes: + - (PyTorch, TensorFlow) Fixed statistic collection for the algo mixing scenario - (PyTorch, TensorFlow) Increased pruning algorithm robustness in cases of a disconnected NNCF graph - (PyTorch, TensorFlow) Fixed the fatality of NNCF graph PNG rendering failures @@ -175,7 +192,7 @@ Bugfixes: - (PyTorch) Fixed a bug with quantizing shared weights multiple times - (PyTorch) Fixed knowledge distillation failures in CPU-only and DataParallel scenarios - (PyTorch) Fixed sparsity application for torch.nn.Embedding and EmbeddingBag modules -- (PyTorch) Added GroupNorm + ReLU as a fusable pattern +- (PyTorch) Added GroupNorm + ReLU as a fusible pattern - (TensorFlow) Fixed gamma fusion handling for pruning TF BatchNorm - (PyTorch) Fixed pruning for models where operations have multiple convolution predecessors - (PyTorch) Fixed NNCFNetwork wrapper so that `self` in the calls to the wrapped model refers to the wrapper NNCFNetwork object and not to the wrapped model @@ -185,7 +202,8 @@ Bugfixes: - (PyTorch, TensorFlow) Fixed FLOPS calculation for grouped convolutions - (PyTorch) Fixed knowledge distillation failures for tensors of unsupported shapes - will now ignore output tensors with unsupported shapes instead of crashing. -## New in Release 2.0: +## New in Release 2.0 + - Added TensorFlow 2.4.2 support - NNCF can now be used to apply the compression algorithms to models originally trained in TensorFlow. NNCF with TensorFlow backend supports the following features: - Compression algorithms: @@ -217,15 +235,16 @@ NNCF with TensorFlow backend supports the following features: - Compression results are claimed for MaskRCNN - Accuracy-aware training available for filter pruning and sparsity in order to achieve best compression results within a given accuracy drop threshold in a fully automated fashion. -- Framework-specific checkpoints produced with NNCF now have NNCF-specific compression state information included, so that the exact compressed model state can be restored/loaded without having to provide the same NNCF config file that was used during the creation of the NNCF-compressed checkpoint +- Framework-specific checkpoints produced with NNCF now have NNCF-specific compression state information included, so that the exact compressed model state can be restored/loaded without having to provide the same NNCF config file that was used during the creation of the NNCF-compressed checkpoint - Common interface for compression methods for both PyTorch and TensorFlow backends (https://github.com/openvinotoolkit/nncf/tree/develop/nncf/api). - (PyTorch) Added an option to specify an effective learning rate multiplier for the trainable parameters of the compression algorithms via NNCF config, for finer control over which should tune faster - the underlying FP32 model weights or the compression parameters. - (PyTorch) Unified scales for concat operations - the per-tensor quantizers that affect the concat operations will now have identical scales so that the resulting concatenated tensor can be represented without loss of accuracy w.r.t. the concatenated subcomponents. 
-- (TensorFlow) Algo-mixing: Added configuration files and reference checkpoints for filter-pruned + qunatized models: ResNet50@ImageNet2012(40% of filters pruned + INT8), RetinaNet@COCO2017(40% of filters pruned + INT8). -- (Experimental, PyTorch) [Learned Global Ranking]((https://arxiv.org/abs/1904.12368)) filter pruning mechanism for better pruning ratios with less accuracy drop for a broad range of models has been implemented. +- (TensorFlow) Algo-mixing: Added configuration files and reference checkpoints for filter-pruned + quantized models: ResNet50@ImageNet2012(40% of filters pruned + INT8), RetinaNet@COCO2017(40% of filters pruned + INT8). +- (Experimental, PyTorch) [Learned Global Ranking](https://arxiv.org/abs/1904.12368) filter pruning mechanism for better pruning ratios with less accuracy drop for a broad range of models has been implemented. - (Experimental, PyTorch) Knowledge distillation supported, ready to be used with any compression algorithm to produce an additional loss source of the compressed model against the uncompressed version Breaking changes: + - `CompressionLevel` has been renamed to `CompressionStage` - `"ignored_scopes"` and "target_scopes" no longer allow prefix matching - use full-fledged regular expression approach via {re} if anything more than an exact match is desired. - (PyTorch) Removed version-agnostic name mapping for ReLU operations, i.e. the NNCF configs that referenced "RELU" (all caps) as an operation name will now have to reference an exact ReLU PyTorch function name such as "relu" or "relu_" @@ -235,15 +254,19 @@ Breaking changes: - `"quantizable_subgraph_patterns"` option removed from the NNCF config Bugfixes: + - (PyTorch) Fixed a hang with batchnorm adaptation being applied in DDP mode - (PyTorch) Fixed tracing of the operations that return NotImplemented -## New in Release 1.7.1: +## New in Release 1.7.1 + Bugfixes: + - Fixed a bug with where compressed models that were supposed to return named tuples actually returned regular tuples - Fixed an issue with batch norm adaptation-enabled compression runs hanging in the DDP scenario -## New in Release 1.7: +## New in Release 1.7 + - Adjust Padding feature to support accurate execution of U4 on VPU - when setting "target_device" to "VPU", the training-time padding values for quantized convolutions will be adjusted to better reflect VPU inference process. - Weighted layers that are "frozen" (i.e. have requires_grad set to False at compressed model creation time) are no longer considered for compression, to better handle transfer learning cases. - Quantization algorithm now sets up quantizers without giving an option for requantization, which guarantees best performance, although at some cost to quantizer configuration flexibility. @@ -254,8 +277,9 @@ Bugfixes: - Bumped target PyTorch version to 1.8.1 and relaxed package requirements constraints to allow installation into environments with PyTorch >=1.5.0 Notable bugfixes: + - Fixed bias pruning in depthwise convolution -- Made per-tensor quantization available for all operations that support per-channel quantization +- Made per-tensor quantization available for all operations that support per-channel quantization - Fixed progressive training performance degradation when an output tensor of an NNCF-compressed model is reused as its input. - `pip install .` path of installing NNCF from a checked-out repository is now supported. - Nested `with no_nncf_trace()` blocks now function as expected. 
@@ -263,15 +287,16 @@ Notable bugfixes: - Now possible to load AutoQ and HAWQ-produced checkpoints to evaluate them or export to ONNX Removed features: + - Pattern-based quantizer setup mode for quantization algorithm - due to its logic, it did not guarantee that all required operation inputs are ultimately quantized. +## New in Release 1.6 -## New in Release 1.6: - Added AutoQ - an AutoML-based mixed-precision initialization mode for quantization, which utilizes the power of reinforcement learning to select the best quantizer configuration for any model in terms of quality metric for a given HW architecture type. - NNCF now supports inserting compression operations as pre-hooks to PyTorch operations, instead of abusing the post-hooking; the flexibility of quantization setups has been improved as a result of this change. - Improved the pruning algorithm to group together dependent filters from different layers in the network and prune these together - Extended the ONNX compressed model exporting interface with an option to explicitly name input and output tensors -- Changed the compression scheduler so that the correspondingepoch_step and step methods should now be called in the beginning of the epoch and before the optimizer step (previously these were called in the end of the epoch and after the optimizer step respectively) +- Changed the compression scheduler so that the corresponding epoch_step and step methods should now be called in the beginning of the epoch and before the optimizer step (previously these were called in the end of the epoch and after the optimizer step respectively) - Data-dependent compression algorithm initialization is now specified in terms of dataset samples instead of training batches, e.g. `"num_init_samples"` should be used in place of "num_init_steps" in NNCF config files. - Custom user modules to be registered for compression can now be specified to be ignored for certain compression algorithms - Batch norm adaptation now being applied by default for all compression algorithms @@ -281,12 +306,12 @@ Removed features: - Added an option to optimize logarithms of quantizer scales instead of scales themselves directly, a technique which improves convergence in certain cases - Added reference checkpoints for filter-pruned models: UNet@Mapillary (25% of filters pruned), SSD300@VOC (40% of filters pruned) +## New in Release 1.5 -## New in Release 1.5: - Switched to using the propagation-based mode for quantizer setup by default. Compared to the previous default, pattern-based mode, the propagation-based mode better ensures that all the inputs to operations that can be quantized on a given type of hardware are quantized in accordance with what this hardware allows. Default target hardware is CPU - adjustable via `"target_device"` option in the NNCF config. More details can be found in [Quantization.md](./docs/compression_algorithms/Quantization.md#quantizer-setup-and-hardware-config-files). -- HAWQ mixed-precision initialization now supports a compression ratio parameter setting - set to 1 for a fully INT8 model, > 1 to increasingly allow lower bitwidth. The level of compression for each layer is defined by a product of the layer FLOPS and the quantization bitwidth. -- HAWQ mixed-precision initialization allows specifying a more generic `criterion_fn` callable to calculate the related loss in case of complex output's post-processing or multiple losses. -- Improved algorithm of assigning bitwidth for activation quantizers in HAWQ mixed-precision initialization. 
If after taking into account the corresponding rules of hardware config there're +- HAWQ mixed-precision initialization now supports a compression ratio parameter setting - set to 1 for a fully INT8 model, > 1 to increasingly allow lower bitwidth. The level of compression for each layer is defined by a product of the layer FLOPS and the quantization bitwidth. +- HAWQ mixed-precision initialization allows specifying a more generic `criterion_fn` callable to calculate the related loss in case of complex output's post-processing or multiple losses. +- Improved algorithm of assigning bitwidth for activation quantizers in HAWQ mixed-precision initialization. If after taking into account the corresponding rules of hardware config there're multiple options for choosing bitwidth, it chooses a common bitwidth for all adjacent weight quantizers. Adjacent quantizers refer to all quantizers between inputs-quantizable layers. - Custom user modules can be registered to have their `weight` attribute considered for compression using the @nncf.register_module - Possible to perform quantizer linking in various points in graph - such quantizers will share the quantization parameters, trainable and non-trainable @@ -300,7 +325,8 @@ Removed features: - GPT2 compression enabled, configuration file added to the `transformers` integration patch - Added GoogLeNet as a filter-pruned sample model (with final checkpoints) -## New in Release 1.4: +## New in Release 1.4 + - Models with filter pruning applied are now exportable to ONNX - BatchNorm adaptation now available as a common compression algorithm initialization step - currently disabled by default, see `"batchnorm_adaptation"` config parameters in compression algorithm documentation (e.g. [Quantizer.md](docs/compression_algorithms/Quantization.md)) for instructions on how to enable it in NNCF config - Major performance improvements for per-channel quantization training - now performs almost as fast as the per-tensor quantization training @@ -313,11 +339,13 @@ Removed features: - Added an example config and model checkpoint for the ResNet50 INT8 + 50% sparsity (RB) ## New in Release 1.3.1 + - Now using PyTorch 1.5 and CUDA 10.2 by default - Support for exporting quantized models to ONNX checkpoints with standard ONNX v10 QuantizeLinear/DequantizeLinear pairs (8-bit quantization only) - Compression algorithm initialization moved to the compressed model creation stage -## New in Release 1.3: +## New in Release 1.3 + - Filter pruning algorithm added - Mixed-precision quantization with manual and automatic (HAWQ-powered) precision setup - Support for DistilBERT @@ -329,7 +357,8 @@ Removed features: - Docker images supplied for easier setup in container-based environments - Usability improvements (NNCF config .JSON file validation by schema, less boilerplate code, separate logging and others) -## New in Release 1.2: +## New in Release 1.2 + - Support for transformer-based networks quantization (tested on BERT and RoBERTa) - Added instructions and Git patches for integrating NNCF into third-party repositories ([mmdetection](https://github.com/open-mmlab/mmdetection), [transformers](https://github.com/huggingface/transformers)) - Support for GNMT quantization @@ -350,9 +379,9 @@ Removed features: - Support of symmetric quantization and two sparsity algorithms with fine-tuning - Automatic model graph transformation. The model is wrapped by the custom class and additional layers are inserted in the graph. The transformations are configurable. 
- Three training samples which demonstrate usage of compression methods from the NNCF: - - Image Classification: torchvision models for classification and custom models on ImageNet and CIFAR10/100 datasets. - - Object Detection: SSD300, SSD512, MobileNet SSD on Pascal VOC2007, Pascal VOC2012, and COCO datasets. - - Semantic Segmentation: UNet, ICNet on CamVid and Mapillary Vistas datasets. + - Image Classification: torchvision models for classification and custom models on ImageNet and CIFAR10/100 datasets. + - Object Detection: SSD300, SSD512, MobileNet SSD on Pascal VOC2007, Pascal VOC2012, and COCO datasets. + - Semantic Segmentation: UNet, ICNet on CamVid and Mapillary Vistas datasets. - Unified interface for compression methods. - GPU-accelerated *Quantization* layer for fast model fine-tuning. - Distributed training support in all samples. diff --git a/Security.md b/Security.md index c3e8bdada43..6143e2263ee 100644 --- a/Security.md +++ b/Security.md @@ -1,6 +1,7 @@ # Security Policy -Intel is committed to rapidly addressing security vulnerabilities affecting our customers and providing clear guidance on the solution, impact, severity and mitigation. + +Intel is committed to rapidly addressing security vulnerabilities affecting our customers and providing clear guidance on the solution, impact, severity and mitigation. ## Reporting a Vulnerability -Please report any security vulnerabilities in this project [utilizing the guidelines here](https://www.intel.com/content/www/us/en/security-center/vulnerability-handling-guidelines.html). +Please report any security vulnerabilities in this project [utilizing the guidelines here](https://www.intel.com/content/www/us/en/security-center/vulnerability-handling-guidelines.html). diff --git a/docker/README.md b/docker/README.md index 7d8ba1843c2..7a1f4d37465 100644 --- a/docker/README.md +++ b/docker/README.md @@ -1,3 +1,5 @@ +# Using docker + ## Step 1. Install docker Review the instructions for installation docker [here](https://docs.docker.com/engine/install/ubuntu/) and configure Docker @@ -12,15 +14,18 @@ Review the instructions for installation docker [here](https://github.com/NVIDIA ## Step 3. Build image In the project folder run in terminal: -``` + +```bash sudo docker image build --network=host ``` Use `--network` to duplicate the network settings of your localhost into context build. ## Step 4. Run container + Run in terminal: -``` + +```bash sudo docker run \ -it \ --name= \ diff --git a/docs/Algorithms.md b/docs/Algorithms.md index 2c08e6826d8..94e5ea80fbd 100644 --- a/docs/Algorithms.md +++ b/docs/Algorithms.md @@ -1,4 +1,4 @@ -## Implemented Compression Methods +# Implemented Compression Methods Each compression method receives its own hyperparameters that are organized as a dictionary and basically stored in a JSON file that is deserialized when the training starts. Compression methods can be applied separately or together producing sparse, quantized, or both sparse and quantized models. For more information about the configuration, refer to the samples. 
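Since all methods are configured through the same JSON structure, applying them together is a matter of listing several algorithms in the `"compression"` section. A minimal illustrative sketch (not one of the shipped sample configs) that stacks magnitude sparsity with quantization:

```json5
{
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": [
        {"algorithm": "magnitude_sparsity"},  // zero out non-salient weights
        {"algorithm": "quantization"}         // and quantize the same model
    ]
}
```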
@@ -18,4 +18,4 @@ Each compression method receives its own hyperparameters that are organized as a - [Sparsity](./compression_algorithms/Sparsity.md) - Magnitude sparsity - Regularization-based (RB) sparsity -- [Filter pruning](./compression_algorithms/Pruning.md) \ No newline at end of file +- [Filter pruning](./compression_algorithms/Pruning.md) diff --git a/docs/ConfigFile.md b/docs/ConfigFile.md index 86cc2a3f39b..ed131bf495b 100644 --- a/docs/ConfigFile.md +++ b/docs/ConfigFile.md @@ -1,13 +1,13 @@ # NNCF Configuration File Description -The Neural Network Compression Framework (NNCF) is designed to work with the configuration file where the parameters of compression that should be applied to the model are specified. -These parameters are organized as a dictionary and stored in a JSON file that is deserialized when the training starts. +The Neural Network Compression Framework (NNCF) is designed to work with the configuration file where the parameters of compression that should be applied to the model are specified. +These parameters are organized as a dictionary and stored in a JSON file that is deserialized when the training starts. The JSON file allows using comments that are supported by the [jstyleson](https://github.com/linjackson78/jstyleson) Python package. The NNCF config .json file is validated against a JSON schema - you can review the latest version of the schema at https://openvinotoolkit.github.io/nncf/. Below is an example of the NNCF configuration file: -``` +```json5 { "input_info": [ // Required - describe the specifics of your model inputs here. This information is used to build the internal graph representation that is leveraged for proper compression functioning, and for exporting the compressed model to ONNX. Inputs in the array without a "keyword" attribute are described in the order of the model's "forward" function argument order. { @@ -63,7 +63,6 @@ Below is an example of the NNCF configuration file: } ``` - The "compression" section is the core of the configuration file. It defines the specific compression algorithms that are to be applied to the model. You can specify either a single compression algorithm to be applied to the model, or multiple compression algorithms to be applied at once. @@ -74,8 +73,6 @@ To specify multiple compression algorithm at once, the "compression" section sho **IMPORTANT:** The `"ignored_scopes"` and `"target_scopes"` sections use a special string format (see "Operation addressing and scope" in [NNCFArchitecture.md](./NNCFArchitecture.md)) to specify the parts of the model that the compression should be applied to. For all such section, regular expression matching can be enabled by prefixing the string with `{re}`, which helps to specify the same compression pattern concisely for networks with multiple same-structured blocks such as ResNet or BERT. - - The [example scripts](../examples) use the same configuration file structure to specify compression, but extend it at the root level to specify the training pipeline hyperparameters as well. These extensions are training pipeline-specific rather than NNCF-specific and their format differs across the example scripts. diff --git a/docs/FAQ.md b/docs/FAQ.md index 7f4a647737f..39d8a6e0bc3 100644 --- a/docs/FAQ.md +++ b/docs/FAQ.md @@ -1,15 +1,16 @@ # Frequently Asked Questions Links to sections: + - [Common](#common) - [PyTorch](#pytorch) - [TensorFlow](#tensorflow) - [ONNX](#onnx) - ## Common ### What is NNCF for? 
+ NNCF takes a deep learning network model object and modifies it for faster inference. Within NNCF, the process of modification is colloquially known as compression. @@ -17,6 +18,7 @@ Sometimes this is not possible to do without the loss of accuracy for the networ NNCF provides algorithms that strive for minimal or zero loss of accuracy, which can be applied, depending on the algorithm, during training, fine-tuning or post-training. ### Does the Neural Network *Compression* Framework provide *lossless compression*? + Not in the way the term "lossless compression" usually appears in literature. Under "compression" we mean the preparation of the model for *future* efficient execution of this model in the OpenVINO Inference Engine. Under "future" we mean that the process of compression is usually an offline, one-time step before the model is being used in production, which provides a new model object that could then be used instead of the original to run faster and take up lower memory without significantly losing accuracy. @@ -24,37 +26,43 @@ Under "future" we mean that the process of compression is usually an offline, on No *compression* in the sense of archiving or entropy coding is being done during NNCF compression. ### How does your compression make inference faster? + General, well-known, literature-backed techniques of neural network inference acceleration (such as quantization, filter pruning and knowledge distillation) are applied, with Intel HW/runtime specifics in mind. An overview of some of those can be found in the [following paper](https://arxiv.org/abs/2002.08679). - ### Can I use NNCF-compressed models with runtimes other than OpenVINO Inference Engine? + While this is certainly possible in some cases, with a beneficial outcome even, we recommend NNCF as a way to get the most out of your setup based on OpenVINO Inference Engine inference. We aim for best results on OpenVINO runtime with Intel hardware, and development-wise this is not always easy to generalize to other platforms or runtimes. Some backends such as onnxruntime also support using OpenVINO Inference Engine as the actual executor for the inference, so NNCF-compressed models will also work there. ### Do I need OpenVINO or an Intel CPU to run NNCF? + Currently, this is not required in general. Most NNCF backends can run compression and produce a compressed model object without OpenVINO or an Intel CPU on board of the machine. You only need OpenVINO and Intel hardware when you actually need to run inference on the compressed model, e.g. in a production scenario. ### Do I need a GPU to run NNCF? + Currently all NNCF-supported backends allow running in a CPU-only mode, and NNCF does not disturb this. Note, however, that training-aware compression will naturally work much slower on most CPUs when compared with GPU-powered execution. Check out the [notebooks](https://github.com/openvinotoolkit/openvino_notebooks#-model-training) for examples of NNCF being applied on smaller datasets which work in a reasonable amount of time on a CPU-only setup. ### NNCF supports both training and post-training compression approaches, how do I know which I need? + The rule of thumb is - start with post-training compression, and use training compression if you are not satisfied with the results and if training compression is possible for your use case. Post-training is faster, but can degrade accuracy more than the training-enabled approach. 
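To make the trade-off concrete, the two entry points differ roughly as in the sketch below (a hedged illustration; `model`, `train_loader`, `val_loader`, `transform_fn` and the config file name are placeholders):

```python
import nncf
from nncf import NNCFConfig
from nncf.torch import create_compressed_model, register_default_init_args

# Post-training: only a calibration dataset is needed, no training loop.
calibration_dataset = nncf.Dataset(val_loader, transform_fn)
quantized_model = nncf.quantize(model, calibration_dataset)

# Training-time: wrap the model, then fine-tune it as usual.
nncf_config = NNCFConfig.from_json("nncf_quantization_config.json")  # placeholder config
nncf_config = register_default_init_args(nncf_config, train_loader)
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)
# ...regular fine-tuning loop over `compressed_model` follows...
```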
### I don't see any improvements after applying the `*_sparsity` algorithms + The sparsity algorithms introduce unstructured sparsity which can only be taken advantage of in terms of performance by using specialized hardware and/or software runtimes. Within the scope of these algorithms, NNCF provides functionally correct models with non-salient weights simply zeroed out, which does not lead to the reduction of the model checkpoint size. The models can, however, be used for benchmarking experimental/future hardware or runtimes, and for SOTA claims of applying unstructured sparsity on a given model architecture. For an opportunity to observably increase performance by omitting unnecessary computations in the model, consider using the [filter pruning](./compression_algorithms/Pruning.md) algorithm. Models compressed with this algorithm can be executed more efficiently within OpenVINO Inference Engine runtime when compared to the uncompressed counterparts. ### What is a "saturation issue" and how to avoid it? -On older generations of Intel CPUs (those not supporting [AVX-VNNI](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#AVX-VNNI)) convolutions and linear layer INT8 execution is implemented in OpenVINO's Inference Engine in such a way that mathematical overflow manifests itself _if more than 128 levels are used in the quantized domain_ (out of possible 2^8 = 256) for the weights of the corresponding operations. + +On older generations of Intel CPUs (those not supporting [AVX-VNNI](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#AVX-VNNI)) convolutions and linear layer INT8 execution is implemented in OpenVINO's Inference Engine in such a way that mathematical overflow manifests itself *if more than 128 levels are used in the quantized domain* (out of possible 2^8 = 256) for the weights of the corresponding operations. This is referred to as "saturation issue" within NNCF. On newer AVX-VNNI enabled Intel CPUs the Inference Engine uses a better set of instructions that do not exhibit this flaw. @@ -65,6 +73,7 @@ You can influence this behaviour by setting the `"overflow_fix"` parameter in th See documentation for this parameter in the [NNCF configuration file JSON schema reference](https://openvinotoolkit.github.io/nncf/#compression_oneOf_i0_oneOf_i0_overflow_fix). ### How can I exclude certain layers from compression? + Utilize the "ignored_scopes" parameter, either using an [NNCF config file](./ConfigFile.md) or by passing these as a function parameter if you are using NNCF purely by its Python API. Within this parameter you can set up one or multiple identifiers to layers in your model (regex is possible) and these will be correspondingly ignored while applying the algorithms. This can be done either globally or on a per-algorithm basis. @@ -74,24 +83,25 @@ For better understanding of how NNCF sees the layers in your network so that you These files are dumped in the NNCF's log directory at each invocation of model compression. ### Why do I need to pass a dataloader to certain NNCF algorithms? + These algorithms have to run a forward pass on the model to be compressed in order to properly initialize the compressed state of the model and/or to gather activation statistics that are indisposable in this algorithm. It is recommended, although by no means mandatory, to pass a dataloader with the same dataset that you were training the initial model for. - ### The compression process takes too long, how can I make it faster? 
+ For training approaches the majority of time is taken by the training loop, so any regular methods that improve model convergence should work here. Try the built-in [knowledge distillation](./compression_algorithms/KnowledgeDistillation.md) to potentially obtain target accuracy faster. Alternatively you may want to reduce the number of initialization samples taken from the initialization dataloader by the algorithms that require it. +### I get a "CUDA out of memory" error when running NNCF in the compression-aware training approach, although the original model to be compressed runs and trains fine without NNCF -### I get a "CUDA out of memory" error when running NNCF in the compression-aware training approach, although the original model to be compressed runs and trains fine without NNCF. As some of the compression algorithm parameters are also trainable, NNCF-compressed model objects ready for training will have a larger GPU memory footprint than the uncompressed counterparts. Try reducing batch size for the NNCF training runs if it makes sense to do so in your situation. - - ## PyTorch + ### Importing anything from `nncf.torch` hangs + NNCF utilizes the [torch C++ extensions](https://pytorch.org/tutorials/advanced/cpp_extension.html) mechanism to accelerate the quantization-aware training process. This is done by just-in-time compiling a set of C++/CUDA files using the system-local compilers and toolsets. The compilation happens at the first import of `nncf.torch` or anything under that namespace on the machine, or within the current Python environment. @@ -103,27 +113,34 @@ To resolve these, delete the `torch_extensions` directory (at `~/.cache`, or poi The compilation takes some time and happens upon import, so do not interrupt the launch of your Python script until the import has been completed. ### Importing anything from `nncf.torch` leads to an error mentioning `gcc`, `nvcc`, `ninja`, or `cl.exe` + See the answer above for the general description of the reasons why these are involved in NNCF PyTorch operation. To resolve, make sure that your CUDA installation contains the development tools (e.g. the `nvcc` compiler), and that the environmental variables are set properly so that these tools are available in `PATH` or `PYTHONPATH`. ### My model trains and runs slower in PyTorch when compressed by NNCF + NNCF does not in general accelerate training or inference when the compressed model is run in PyTorch. It only prepares the model for further inference with OpenVINO's Inference Engine, where the runtime has capabilities of processing the NNCF-compressed models so that they run faster than their uncompressed counterparts. The process of compressing in PyTorch relies on hooking regular PyTorch functions and calling extra code for purposes of compression algorithm logic, so the NNCF-processed models will inevitably run slower in PyTorch. Export your model after processing with NNCF to an OpenVINO-ingestible format (e.g. ONNX) and run it with the OpenVINO Inference Engine, to enjoy speedups when compared to the uncompressed model inference with Inference Engine. ### The .pth checkpoints for the compressed model have larger size and parameter count when compared to the uncompressed model + See the answer to the above question. Additional parameters are part of the compression algorithm internal state being saved along with the regular model weights, and any model size footprint reduction is deferred until exporting and/or running the model with OpenVINO Inference Engine. 
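As noted above, the size and speed benefits materialize only after export. With the training-time API this is typically a single call on the compression controller (a sketch, assuming `compression_ctrl` was returned by `create_compressed_model` and the output path is a placeholder):

```python
# Export the NNCF-processed PyTorch model to ONNX for OpenVINO ingestion.
compression_ctrl.export_model("compressed_model.onnx")
```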
### My RNN model is not compressed completely or fails at the compression stage + Currently NNCF PyTorch can only properly handle models with acyclic execution graphs. RNNs, which inherently have cycles, can behave oddly when processed with NNCF PyTorch, which includes loss of quality, unreproducible results and failure to compress. -### I get a `Could not deduce the forward arguments from the initializing dataloader output.` runtime error when executing `create_compressed_model`. + +### I get a `Could not deduce the forward arguments from the initializing dataloader output.` runtime error when executing `create_compressed_model` + Dataloaders can return anything, and this output may be preprocessed in the rest of the training pipeline before actually ending up in model's `forward` method. NNCF needs a dataloader already at the compressed model creation stage, e.g. before training, and doesn't in general know about the further preprocessing (turning the output of `v8_dataloader` into actual `forward` args and kwargs. You have to give NNCF this information by wrapping your dataloader object in an own subclass of a `nncf.torch.initialization.PTInitializingDataLoader` object that properly defines the `get_inputs` and `get_target` abstract methods: + ```python from nncf.torch.initialization import PTInitializingDataLoader diff --git a/docs/Installation.md b/docs/Installation.md index 5c4c2bb619a..53dc91e80ca 100644 --- a/docs/Installation.md +++ b/docs/Installation.md @@ -1,53 +1,63 @@ -## Installation +# Installation + We suggest to install or use the package in the [Python virtual environment](https://docs.python.org/3/tutorial/venv.html). If you want to optimize a model from PyTorch, install PyTorch by following [PyTorch installation guide](https://pytorch.org/get-started/locally/#start-locally). For other backend follow: [TensorFlow installation guide](https://www.tensorflow.org/install/), [ONNX installation guide](https://onnxruntime.ai/docs/install/), [OpenVINO installation guide](https://docs.openvino.ai/latest/openvino_docs_install_guides_overview.html). -#### As a PyPI package: +## As a PyPI package NNCF can be installed as a regular PyPI package via pip: -``` + +```bash pip install nncf ``` + If you want to install both NNCF and the supported PyTorch version in one line, you can do this by simply running: -``` + +```bash pip install nncf[torch] ``` -Other viable options besides `[torch]` are `[tf]`, `[onnx]` and `[openvino]`. +Other viable options besides `[torch]` are `[tf]`, `[onnx]` and `[openvino]`. -#### As a package built from a checked-out repository: +## As a package built from a checked-out repository Install the package and its dependencies by running the following command in the repository root directory: -``` + +```bash pip install . ``` Use the same `pip install` syntax as above to install NNCF along with the backend package version in one go: -``` + +```bash pip install .[] ``` + List of supported backends: `torch`, `tf`, `onnx` and `openvino`. For development purposes install extra packages by -``` + +```bash pip install .[dev,tests] ``` - _NB_: For launching example scripts in this repository, we recommend setting the `PYTHONPATH` variable to the root of the checked-out repository once the installation is completed. 
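For instance (a sketch assuming the repository was cloned to `~/nncf`; adjust the path to your checkout):

```bash
export PYTHONPATH="${PYTHONPATH}:${HOME}/nncf"
```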
- NNCF is also available via [conda](https://anaconda.org/conda-forge/nncf): -``` + +```bash conda install -c conda-forge nncf ``` -#### From a specific commit hash using pip: +## From a specific commit hash using pip + ```python pip install git+https://github.com/openvinotoolkit/nncf@bd189e2#egg=nncf ``` + Note that in order for this to work for pip versions >= 21.3, your Git version must be at least 2.22. -#### As a Docker image -Use one of the Dockerfiles in the [docker](./docker) directory to build an image with an environment already set up and ready for running NNCF [sample scripts](#model-compression-samples). +## As a Docker image + +Use one of the Dockerfiles in the [docker](../docker) directory to build an image with an environment already set up and ready for running NNCF [sample scripts](../README.md#model-compression-samples). diff --git a/docs/ModelZoo.md b/docs/ModelZoo.md index bea36a3306a..0e367d8f7fa 100644 --- a/docs/ModelZoo.md +++ b/docs/ModelZoo.md @@ -1,18 +1,18 @@ # NNCF Compressed Model Zoo -Here we present the results achieved using our sample scripts, example patches to third-party repositories and NNCF configuration files. +Here we present the results achieved using our sample scripts, example patches to third-party repositories and NNCF configuration files. -The applied quantization compression algorithms are divided into two broad categories: Quantization-Aware Training ([QAT](../README.md#training-time-compression)) and Post-Training Quantization ([PTQ](../README.md#post-training-quantization)). Here we mainly report the QAT results and the PTQ results may be found on an OpenVino Performance Benchmarks [page](https://docs.openvino.ai/latest/openvino_docs_performance_benchmarks.html). +The applied quantization compression algorithms are divided into two broad categories: Quantization-Aware Training ([QAT](../README.md#training-time-compression)) and Post-Training Quantization ([PTQ](../README.md#post-training-quantization)). Here we mainly report the QAT results and the PTQ results may be found on an OpenVino Performance Benchmarks [page](https://docs.openvino.ai/latest/openvino_docs_performance_benchmarks.html). 
- [PyTorch](#pytorch) - * [Classification](#pytorch-classification) - * [Object Detection](#pytorch-object-detection) - * [Semantic Segmentation](#pytorch-semantic-segmentation) - * [Natural Language Processing (3rd-party training pipelines)](#pytorch-nlp-huggingface-transformers-powered-models) + - [Classification](#pytorch-classification) + - [Object Detection](#pytorch-object-detection) + - [Semantic Segmentation](#pytorch-semantic-segmentation) + - [Natural Language Processing (3rd-party training pipelines)](#pytorch-nlp-huggingface-transformers-powered-models) - [TensorFlow](#tensorflow) - * [Classification](#tensorflow-classification) - * [Object Detection](#tensorflow-object-detection) - * [Instance Segmentation](#tensorflow-instance-segmentation) + - [Classification](#tensorflow-classification) + - [Object Detection](#tensorflow-object-detection) + - [Instance Segmentation](#tensorflow-instance-segmentation) - [ONNX](#onnx) ## PyTorch @@ -949,4 +949,3 @@ The applied quantization compression algorithms are divided into two broad categ - diff --git a/docs/NNCFArchitecture.md b/docs/NNCFArchitecture.md index e091d0a6a9a..820f630dff0 100644 --- a/docs/NNCFArchitecture.md +++ b/docs/NNCFArchitecture.md @@ -1,19 +1,22 @@ # NNCF Architectural Overview -### Introduction +## Introduction + Neural Networks Compression Framework is a set of compression algorithms and tools to implement compression algorithms that is designed to work atop PyTorch. In essence, all of the compression algorithms present in NNCF do certain manipulations with the data inside the control flow graph of a DNN - be it the process of quantizing the values of an input tensor for a fully connected layer, or setting certain values of a convolutional layer to zero, etc. A general way to express these manipulations is by using hooks inserted in specific points of the DNN control flow graph. -### NNCFGraph +## NNCFGraph + To abstract away the compression logic from specifics of the backend, NNCF builds an `NNCFGraph` object for each incoming model object to be compressed. -`NNCFGraph` is a wrapper over a regular directed acyclic graph that represents a control flow/execution graph of a DNN. +`NNCFGraph` is a wrapper over a regular directed acyclic graph that represents a control flow/execution graph of a DNN. Each node corresponds to a call of a backend-specific function ("operator"). It is built both for the original, unmodified model, and for the model with compression algorithms applied (which, in general, may have additional operations when compared to the original model). -### PyTorch-specific +## PyTorch-specific + +### NNCFNetwork -#### NNCFNetwork During NNCF compression, the incoming original model object is dynamically extended with NNCF-enabling functionality. This is done by replacing the model object's _class object_ with another class object that lists not only the original class object as its base, but also the `NNCFNetwork` object. The compressed model object can then be identified as passing the `isinstance(obj, NNCFNetwork)` checks, but also the `isinstance(obj, original_class)` checks. @@ -25,19 +28,21 @@ In the model object processed in such a way, the following applies: 3. 
additional trainable modules and parameters specific to the applied compression algorithms are invisibly stored along with the regular model parameters, so that when saving an instance of `NNCFNetwork` via the usual `torch.save` calls, the trainable parameters of the compression algorithm are saved into the same state dict as the rest of the model parameters. The additional attributes and methods that appear in the original model object are separated from the original attribute/method names - the accesses to the NNCF-specific attributes and methods are done via an intermediate `nncf` property: -```python3 + +```python assert isinstance(model, NNCFNetwork) model.original_method_call() model.nncf.nncf_specific_method() ``` -This allows to avoid name collisions between NNCF-specific attributes and original model attributes. + +This allows to avoid name collisions between NNCF-specific attributes and original model attributes. `model.nncf` returns a `nncf.torch.nncf_network.NNCFNetworkInterface` object - the class contains all of the methods and attributes that could be called on the compressed model object to invoke NNCF-specific functionality. During compression algorithm application, the `NNCFNetwork` serves internally as a receptacle for compression algorithm-related adjustments to the control flow graph of the model. +### Model control flow graph tracing -#### Model control flow graph tracing Unlike other frameworks such as TensorFlow, PyTorch does not have an easily accessible graph representation of a model, and thus no way to identify specific points in the control flow graph. For this reason NNCF performs tracing of the PyTorch operators, implemented via wrapping the corresponding function and module calls. Through this process of tracing, NNCF builds an internal representation of the model graph, which is then supplied as the point of reference for specification and insertion of hooks at proper places in the network. @@ -54,17 +59,15 @@ c) the shape of the input tensors to the current operator, and d) the IDs of the nodes that produced each current operator's input as their output. - This information is stored as an `OperationExecutionContext` of the operator. If an operator call does not match to the nodes already present in the internal graph representation based on its `OperationExecutionContext`, a new node is added to the graph. -This process occurs dynamically during each `forward` call of an `NNCFNetwork`. -If the control flow is data-dependent, a whole new subgraph of the model will be built for each branching in the model definition. -The graph building mechanism can cope with some branching, but it is advisable to disable NNCF tracing for the parts of the model that exhibit branching (such as the "detection output" layers of object detection networks) by using a `no_nncf_trace()` context. -It is possible to wrap third party functionality with `no_nncf_trace()` context so that this source code does not need to be changed. +This process occurs dynamically during each `forward` call of an `NNCFNetwork`. +If the control flow is data-dependent, a whole new subgraph of the model will be built for each branching in the model definition. +The graph building mechanism can cope with some branching, but it is advisable to disable NNCF tracing for the parts of the model that exhibit branching (such as the "detection output" layers of object detection networks) by using a `no_nncf_trace()` context. 
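As a rough sketch of how such an exclusion might look (this is not an excerpt from the repository, and the import path of `no_nncf_trace` is an assumption that may vary between NNCF versions), a data-dependent post-processing step can be hidden from tracing like this:

```python
import torch
from nncf.torch.dynamic_graph.context import no_nncf_trace  # assumed path - verify for your NNCF version


class DetectionPostprocessor(torch.nn.Module):
    """Hypothetical 'detection output'-style block with data-dependent control flow."""

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        probs = torch.sigmoid(scores)
        # Everything inside this context executes normally but is not traced by NNCF,
        # so the data-dependent branching does not spawn new subgraphs.
        with no_nncf_trace():
            keep = probs > 0.5
            return probs[keep]
```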
+It is possible to wrap third party functionality with `no_nncf_trace()` context so that this source code does not need to be changed. This can be done by patching, please refer to this [example](../examples/post_training_quantization/torch/ssd300_vgg16/README.md). +### Operation scope and addressing - -#### Operation scope and addressing A unique identifier of a node in the `NNCFGraph` - i.e. an operation in the DNN control flow graph - is the `OperationExecutionContext`. However, in most cases the input-agnostic part of `OperationExecutionContext` is enough to identify an operation in the model control flow graph for purposes of inserting compression-related hooks into the model. This `InputAgnosticOperationExecutionContext` is built using a) and b) from the information list gathered to build a regular `OperationExecutionContext`. Its string representation is a concatenation of a `Scope` string representation, the name of the operator (Python function), underscore `_`, and the order of the operator call in the same `Scope`. In turn, the string representation of a `Scope` is a sequence of "__module_class_name__[__module_field_name__]/" substrings, where each such substring corresponds to a __module_class_name__ type of `torch.nn.Module` being called as a __module_field_name__ member field of its parent module, and slashes `/` separate the adjacent levels of the module call hierarchy. @@ -73,17 +76,17 @@ As an example, consider a simple PyTorch module: ```python class SimpleModule(torch.nn.Module): - def __init__(): - super().__init__() - self.submodule1 = torch.nn.Conv2d(...) # params omitted - self.submodule2 = torch.nn.Sequential([torch.nn.BatchNorm2d(...), torch.nn.ReLU(...)]) - def forward(x_in): - x = self.submodule1(x_in) - x = self.submodule2(x) - x += torch.ones_like(x) - x += torch.ones_like(x) - x = torch.nn.functional.relu(x) - return x + def __init__(): + super().__init__() + self.submodule1 = torch.nn.Conv2d(...) # params omitted + self.submodule2 = torch.nn.Sequential([torch.nn.BatchNorm2d(...), torch.nn.ReLU(...)]) + def forward(x_in): + x = self.submodule1(x_in) + x = self.submodule2(x) + x += torch.ones_like(x) + x += torch.ones_like(x) + x = torch.nn.functional.relu(x) + return x ``` Each `torch.nn.Conv2d` module call internally calls a `conv2d` operator, which will then be added to an `NNCFGraph` during tracing. Therefore, the two convolution operations in the model's control flow graph will have the following `InputAgnosticOperationExecutionContext` string representations: `SimpleModule/Conv2d[submodule1]/conv2d_0` and `SimpleModule/Conv2d[submodule2]/conv2d_0`. @@ -94,9 +97,8 @@ The two consecutive addition operations will be represented by `SimpleModule/__i These string definitions are referred to as "scopes" in the NNCF configuration files (as in `"ignored_scopes"` or `"target_scopes"`), and help specify exact operations for inclusion into or exclusion from compression or for separate compression parameter specification. +### Compression algorithm API and interaction with NNCFNetwork - -#### Compression algorithm API and interaction with NNCFNetwork A compression algorithm is a modification of a regular model control flow according to some trainable or non-trainable parameters. Modification of the control flow is done via a hook, and the trainable parameters are stored inside special NNCF modules. 
Each compression algorithm therefore consists of taking an unmodified model, analyzing it and then determining a set of modifications necessary for modifying the model's execution so that it now takes specific compression into account. `NNCFNetwork` defines a common interface for compression algorithms to specify the location for hook insertion (based on a `InputAgnosticOperationExecutionContext` of an operation) and the hook itself. It also allows algorithms to register external modules within itself so that the trainable parameters of the compression algorithm could be saved as a checkpoint along with the model while also being indistinguishable from any other trainable parameter of the original model from the training pipeline optimizer's standpoint. @@ -114,4 +116,4 @@ Once all algorithms are applied to the model, the compression changes are commit A `CompressionAlgorithmController` is then used to control or modify aspects of compression during training, to gather statistics related to compression, or to provide additional loss for proper training of the trainable compression parameters. To this purpose it contains a `CompressionScheduler` and `CompressionLoss` instances, which can then be used as desired during the training pipeline. For instance, a `CompressionScheduler` may be implemented so that it enables quantization for activations only upon a certain training epoch, and a `CompressionLoss` may be implemented so that it facilitates soft filter pruning. -> **NOTE**: In general, the compression method may not have its own scheduler and loss, and the default implementations are used instead. +> __NOTE__: In general, the compression method may not have its own scheduler and loss, and the default implementations are used instead. diff --git a/docs/Usage.md b/docs/Usage.md index 46cebbbea5b..12e4689fe73 100644 --- a/docs/Usage.md +++ b/docs/Usage.md @@ -6,35 +6,41 @@ The task is to prepare this model for accelerated inference by simulating the co The instructions below use certain "helper" functions of the NNCF which abstract away most of the framework specifics and make the integration easier in most cases. As an alternative, you can always use the NNCF internal objects and methods as described in the [architectural overview](./NNCFArchitecture.md). - ## Basic usage -#### Step 1: Create an NNCF configuration file +### Step 1: Create an NNCF configuration file A JSON configuration file is used for easier setup of the parameters of compression to be applied to your model. See [configuration file description](./ConfigFile.md) or the sample configuration files packaged with the [example scripts](../examples) for reference. -#### Step 2: Modify the training pipeline +### Step 2: Modify the training pipeline + NNCF enables compression-aware training by being integrated into the regular training pipelines. The framework is designed so that the modifications to your original training code are minor. 1. **Add** the imports required for NNCF: + ```python import torch import nncf.torch # Important - must be imported before any other external package that depends on torch from nncf import NNCFConfig, create_compressed_model, load_state ``` + **NOTE (PyTorch)**: Due to the way NNCF works within the PyTorch backend, `import nncf` must be done before any other import of `torch` in your package _or_ in third-party packages that your code utilizes, otherwise the compression may be applied incompletely. 2. 
Load the NNCF JSON configuration file that you prepared during Step 1: + ```python nncf_config = NNCFConfig.from_json("nncf_config.json") # Specify a path to your own NNCF configuration file in place of "nncf_config.json" ``` + 3. (Optional) For certain algorithms such as quantization it is highly recommended to **initialize the algorithm** by passing training data via `nncf_config` prior to starting the compression fine-tuning properly: + ```python from nncf import register_default_init_args nncf_config = register_default_init_args(nncf_config, train_loader, criterion=criterion) ``` + Training data loaders should be attached to the NNCFConfig object as part of a library-defined structure. `register_default_init_args` is a helper method that registers the necessary structures for all available initializations (currently quantizer range and precision initialization) by taking data loader, criterion and criterion function (for sophisticated calculation of loss different from direct call of the @@ -45,47 +51,57 @@ The framework is designed so that the modifications to your original training co `nncf.common.initialization.dataloader.NNCFDataLoader` interface to return a tuple of (_single model input_ , _the rest of the model inputs as a kwargs dict_). 4. Right after you create an instance of the original model and load its weights, **wrap the model** by making the following call + ```python compression_ctrl, compressed_model = create_compressed_model(model, nncf_config) ``` + The `create_compressed_model` function parses the loaded configuration file and returns two objects. `compression_ctrl` is a "controller" object that can be used during compressed model training to adjust certain parameters of the compression algorithm (according to a scheduler, for instance), or to gather statistics related to your compression algorithm (such as the current level of sparsity in your model). 5. (Optional) Wrap your model with `DataParallel` or `DistributedDataParallel` classes for multi-GPU training. If you use `DistributedDataParallel`, add the following call afterwards: - ```python - compression_ctrl.distributed() - ``` - in case the compression algorithms that you use need special adjustments to function in the distributed mode. + ```python + compression_ctrl.distributed() + ``` + in case the compression algorithms that you use need special adjustments to function in the distributed mode. + + 6. In the **training loop**, make the following changes: -6. In the **training loop**, make the following changes: - After inferring the model, take a compression loss and add it (using the `+` operator) to the common loss, for example cross-entropy loss: + ```python compression_loss = compression_ctrl.loss() loss = cross_entropy_loss + compression_loss ``` + - Call the scheduler `step()` before each training iteration: + ```python compression_ctrl.scheduler.step() ``` + - Call the scheduler `epoch_step()` before each training epoch: + ```python compression_ctrl.scheduler.epoch_step() ``` > **NOTE**: For a real-world example of how these changes should be introduced, take a look at the [examples](../examples) published in the NNCF repository. -#### Step 3: Run the training pipeline +### Step 3: Run the training pipeline + At this point, the NNCF is fully integrated into your training pipeline. You can run it as usual and monitor your original model's metrics and/or compression algorithm metrics and balance model metrics quality vs. level of compression. 
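Putting the training-loop changes from step 6 together, a compression-aware epoch might look roughly like the sketch below. Names such as `num_epochs`, `optimizer`, `criterion`, `train_loader` and `compressed_model` are assumed to come from your existing pipeline and from the wrapping call in step 4:

```python
for epoch in range(num_epochs):
    compression_ctrl.scheduler.epoch_step()      # once per epoch, before it starts
    for images, targets in train_loader:
        compression_ctrl.scheduler.step()        # once per training iteration
        optimizer.zero_grad()
        outputs = compressed_model(images)
        loss = criterion(outputs, targets) + compression_ctrl.loss()  # add the compression loss
        loss.backward()
        optimizer.step()
```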
- Important points you should consider when training your networks with compression algorithms: - - Turn off the `Dropout` layers (and similar ones like `DropConnect`) when training a network with quantization or sparsity - - It is better to turn off additional regularization in the loss function (for example, L2 regularization via `weight_decay`) when training the network with RB sparsity, since it already imposes an L0 regularization term. -#### Step 4: Export the compressed model +- Turn off the `Dropout` layers (and similar ones like `DropConnect`) when training a network with quantization or sparsity +- It is better to turn off additional regularization in the loss function (for example, L2 regularization via `weight_decay`) when training the network with RB sparsity, since it already imposes an L0 regularization term. + +### Step 4: Export the compressed model + After the compressed model has been fine-tuned to acceptable accuracy and compression stages, you can export it. There are two ways to export a model: 1. Call the compression controller's `export_model` method to properly export the model with compression specifics into ONNX. @@ -93,6 +109,7 @@ After the compressed model has been fine-tuned to acceptable accuracy and compre ```python compression_ctrl.export_model("./compressed_model.onnx") ``` + The exported ONNX file may contain special, non-ONNX-standard operations and layers to leverage full compressed/low-precision potential of the OpenVINO toolkit. In some cases it is possible to export a compressed model with ONNX standard operations only (so that it can be run using `onnxruntime`, for example) - this is the case for the 8-bit symmetric quantization and sparsity/filter pruning algorithms. Refer to [compression algorithm documentation](./compression_algorithms) for details. @@ -114,6 +131,7 @@ After the compressed model has been fine-tuned to acceptable accuracy and compre ``` ## Saving and loading compressed models + The complete information about compression is defined by a compressed model and a compression state. The model characterizes the weights and topology of the network. The compression state - how to restore the setting of compression layers in the model and how to restore the compression schedule and the compression loss. @@ -136,6 +154,7 @@ sparsity algorithm has learnt masking of 30% weights out of 51% of target rate. algorithm, for example when rb-sparsity method sets final target sparsity rate for the loss. ### Saving and loading compressed models in TensorFlow + ```python # save part compression_ctrl, compress_model = create_compressed_model(model, nncf_config) @@ -172,6 +191,7 @@ string within `tf.train.Checkpoint`. There are 2 helper classes: `TFCompressionS ### Saving and loading compressed models in PyTorch Deprecated API + ```python # save part compression_ctrl, compressed_model = create_compressed_model(model, nncf_config) @@ -191,6 +211,7 @@ compression_ctrl.scheduler.load_state(resuming_checkpoint['scheduler_state']) ``` New API + ```python # save part compression_ctrl, compressed_model = create_compressed_model(model, nncf_config) @@ -237,8 +258,8 @@ have the same structure with regard to PyTorch module and parameters as it was w In practice this means that you should use the same compression algorithms (i.e. the same NNCF configuration file) when loading a compressed model checkpoint. 
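To make the last point concrete, here is a hedged sketch of resuming from a checkpoint using the helpers imported in step 1: the compressed model is recreated with the same NNCF configuration file before the weights are restored. The checkpoint path and the `"compression_state"`/`"state_dict"` key names are assumptions that should match whatever your saving code wrote, and `model` is the original uncompressed model instance:

```python
import torch

# Recreate the compressed model with the SAME NNCF config that was used when the checkpoint was saved.
nncf_config = NNCFConfig.from_json("nncf_config.json")
resuming_checkpoint = torch.load("./compressed_checkpoint.pth")          # path is an assumption

compression_ctrl, compressed_model = create_compressed_model(
    model, nncf_config, compression_state=resuming_checkpoint["compression_state"])
load_state(compressed_model, resuming_checkpoint["state_dict"], is_resume=True)
```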
- ## Exploring the compressed model + After a `create_compressed_model` call, the NNCF log directory will contain visualizations of internal representations for the original, uncompressed model (`original_graph.dot`) and for the model with the compression algorithms applied (`compressed_graph.dot`). These graphs form the basis for NNCF analyses of your model. Below is the example of a LeNet network's `original_graph.dot` visualization: @@ -259,10 +280,10 @@ For instance, below is the same LeNet INT8 model as above, but with `"ignored_sc Notice that all RELU operation outputs and the second convolution's weights are no longer quantized. - ## Advanced usage ### Compression of custom modules + With no target model code modifications, NNCF only supports native PyTorch modules with respect to trainable parameter (weight) compressed, such as `torch.nn.Conv2d` If your model contains a custom, non-PyTorch standard module with trainable weights that should be compressed, you can register it using the `@nncf.register_module` decorator: @@ -281,9 +302,11 @@ If registered module should be ignored by specific algorithms use `ignored_algor In the example above, the NNCF-compressed models that contain instances of `MyModule` will have the corresponding modules extended with functionality that will allow NNCF to quantize, sparsify or prune the `weight` parameter of `MyModule` before it takes part in `MyModule`'s `forward` calculation. ### Accuracy-Aware model training + NNCF has the capability to apply the model compression algorithms while satisfying the user-defined accuracy constraints. This is done by executing an internal custom accuracy-aware training loop, which also helps to automate away some of the manual hyperparameter search related to model training such as setting the total number of epochs, the target compression rate for the model, etc. There are two supported training loops. The first one is called [Early Exit Training](./accuracy_aware_model_training/EarlyExitTraining.md), which aims to finish fine-tuning when the accuracy drop criterion is reached. The second one is more sophisticated. It is targeted for the automated discovery of the compression rate for the model given that it satisfies the user-specified maximal tolerable accuracy drop due to compression. Its name is [Adaptive Compression Level Training](./accuracy_aware_model_training/AdaptiveCompressionTraining.md). Both training loops could be run with either PyTorch or TensorFlow backend with the same user interface(except for the TF case where the Keras API is used for training). The following function is required to create the accuracy-aware training loop. One has to pass the `NNCFConfig` object and the compression controller (that is returned upon compressed model creation, see above). 
+ ```python from nncf.common.accuracy_aware_training import create_accuracy_aware_training_loop training_loop = create_accuracy_aware_training_loop(nncf_config, compression_ctrl, uncompressed_model_accuracy) @@ -344,14 +367,17 @@ def dump_checkpoint_fn(model, compression_controller, accuracy_aware_runner, sav ``` Once the above functions are defined, you could pass them to the `run` method of the earlier created training loop : + ```python -model = training_loop.run(model, - train_epoch_fn=train_epoch_fn, - validate_fn=validate_fn, - configure_optimizers_fn=configure_optimizers_fn, - dump_checkpoint_fn=dump_checkpoint_fn) +model = training_loop.run( + model, + train_epoch_fn=train_epoch_fn, + validate_fn=validate_fn, + configure_optimizers_fn=configure_optimizers_fn, + dump_checkpoint_fn=dump_checkpoint_fn) ``` + The above call executes the accuracy-aware training loop and return the compressed model. For more details on how to use the accuracy-aware training loop functionality of NNCF, please refer to its [documentation](./accuracy_aware_model_training/AdaptiveCompressionTraining.md). See a PyTorch [example](../../examples/torch/classification/main.py) for **Quantization** + **Filter Pruning** Adaptive Compression scenario on CIFAR10 and ResNet18 [config](../../examples/torch/classification/configs/pruning/resnet18_cifar10_accuracy_aware.json). diff --git a/docs/accuracy_aware_model_training/AdaptiveCompressionLevelTraining.md b/docs/accuracy_aware_model_training/AdaptiveCompressionLevelTraining.md index cd0ee66fb6b..4ec0a2d4b07 100644 --- a/docs/accuracy_aware_model_training/AdaptiveCompressionLevelTraining.md +++ b/docs/accuracy_aware_model_training/AdaptiveCompressionLevelTraining.md @@ -5,21 +5,22 @@ The compression pipeline can consist of several compression algorithms (Algorith See a PyTorch [example](../../examples/torch/classification/main.py) for **Quantization** + **Filter Pruning** scenario on CIFAR10 and ResNet18 [config](../../examples/torch/classification/configs/pruning/resnet18_cifar10_accuracy_aware.json). -The exact compression algorithm for which the compression level search will be applied is determined in "compression" config section. The parameters to be set by the user in this config section are: -1) `maximal_relative_accuracy_degradation` or `maximal_absolute_accuracy_degradation` (Optional; default `maximal_relative_accuracy_degradation=1.0`) - the maximal allowed accuracy metric drop relative to the original model metrics (in percent) or the maximal allowed absolute accuracy metric drop (in original metrics value), -2) `initial_training_phase_epochs` (Optional; default=5) - number of epochs to train the model with the compression schedule specified in the `"params"` section of `"compression"` algorithm. -3) `patience_epochs` (Optional; default=3) - number of epochs to train the model for a compression rate level set by the search algorithm before switching to another compression rate value. -4) `minimal_compression_rate_step` (Optional; default=0.025) - the minimal compression rate change step value after which the training loop is terminated. -5) `initial_compression_rate_step` (Optional; default=0.1) - initial value for the compression rate increase/decrease training phase of the compression training loop. -6) `compression_rate_step_reduction_factor` (Optional; default=0.5) - factor used to reduce the compression rate change step in the adaptive compression training loop. 
-7) `lr_reduction_factor` (Optional; default=0.5) - factor used to reduce the base value of the learning rate scheduler after compression rate step is reduced.
-8) `maximal_total_epochs` (Optional; default=10000) - number of training epochs, if the fine-tuning epoch reaches this number, the loop finishes the fine-tuning and return the model with thi highest compression rate and the least accuracy drop.
+The exact compression algorithm for which the compression level search will be applied is determined in the "compression" config section. The parameters to be set by the user in this config section are:
+1. `maximal_relative_accuracy_degradation` or `maximal_absolute_accuracy_degradation` (Optional; default `maximal_relative_accuracy_degradation=1.0`) - the maximal allowed accuracy metric drop relative to the original model metrics (in percent) or the maximal allowed absolute accuracy metric drop (in original metrics value),
+2. `initial_training_phase_epochs` (Optional; default=5) - number of epochs to train the model with the compression schedule specified in the `"params"` section of the `"compression"` algorithm.
+3. `patience_epochs` (Optional; default=3) - number of epochs to train the model for a compression rate level set by the search algorithm before switching to another compression rate value.
+4. `minimal_compression_rate_step` (Optional; default=0.025) - the minimal compression rate change step value after which the training loop is terminated.
+5. `initial_compression_rate_step` (Optional; default=0.1) - initial value for the compression rate increase/decrease training phase of the compression training loop.
+6. `compression_rate_step_reduction_factor` (Optional; default=0.5) - factor used to reduce the compression rate change step in the adaptive compression training loop.
+7. `lr_reduction_factor` (Optional; default=0.5) - factor used to reduce the base value of the learning rate scheduler after the compression rate step is reduced.
+8. `maximal_total_epochs` (Optional; default=10000) - number of training epochs; if the fine-tuning epoch reaches this number, the loop finishes the fine-tuning and returns the model with the highest compression rate and the least accuracy drop.

To launch the adaptive compression training loop, the user should define several functions related to model training, validation and optimizer creation (see [the usage documentation](../Usage.md#accuracy-aware-model-training) for more details) and pass them to the `run` method of an `AdaptiveCompressionTrainingLoop` instance. The training loop logic inside of the `AdaptiveCompressionTrainingLoop` is framework-agnostic, while all of the framework specifics are encapsulated inside of corresponding `Runner` objects, which are created and called inside the training loop.
The adaptive compression training loop is generally aimed at automatically searching for the optimal compression rate in the model, with the parameters of the search algorithm specified in the configuration file. Below is an example of a filter pruning configuration with added `"accuracy_aware_training"` parameters.
+
```json5
{
    "input_infos": {"sample_size": [1, 2, 224, 224]},
@@ -68,5 +69,6 @@ That is, if a too big of an increase in compression rate resulted in the accurac
This sequential search is limited by the minimal granularity of the steps given by `"minimal_compression_rate_step"`.
## Example
+
An example of how a model is compressed using the Adaptive Compression Training Loop is given in the figure below.
-![Example](actl_progress_plot.png)
\ No newline at end of file
+![Example](actl_progress_plot.png)
diff --git a/docs/accuracy_aware_model_training/EarlyExitTraining.md b/docs/accuracy_aware_model_training/EarlyExitTraining.md
index 58313136a32..217f630cc3f 100644
--- a/docs/accuracy_aware_model_training/EarlyExitTraining.md
+++ b/docs/accuracy_aware_model_training/EarlyExitTraining.md
@@ -1,7 +1,7 @@
# Early Exit training loop in NNCF
The Early Exit training loop aims to get the compressed model with the desired accuracy criteria as early as possible. This is done by checking the compressed model's accuracy after each training epoch step and also after the initialization step, and exiting the fine-tuning process once the accuracy reaches the user-defined criteria.
-This pipeline is simple but effective. It reduces a fine-tuning time for many models till just an initialization step.
+This pipeline is simple but effective. For many models, it reduces the fine-tuning time to just the initialization step.
Note: since the EarlyExit training does not control any compression parameter, the specified accuracy criterion cannot be satisfied in some cases.
@@ -18,7 +18,7 @@ Example of config file needed to be provided to create_accuracy_aware_training_l
        "mode": "early_exit",
        "params": {
            "maximal_relative_accuracy_degradation": 1.0,
-            "maximal_total_expochs": 100
+            "maximal_total_epochs": 100
        }
    },
    "compression": [
@@ -36,5 +36,3 @@ Example of config file needed to be provided to create_accuracy_aware_training_l
}
```
-
-
\ No newline at end of file
diff --git a/docs/compression_algorithms/BatchnormAdaptation.md b/docs/compression_algorithms/BatchnormAdaptation.md
index 38219fbf0d8..47df400808c 100644
--- a/docs/compression_algorithms/BatchnormAdaptation.md
+++ b/docs/compression_algorithms/BatchnormAdaptation.md
@@ -1,17 +1,18 @@
-### Batch-norm statistics adaptation
+# Batch-norm statistics adaptation
-After the compression-related changes in the model have been committed, the statistics of the batchnorm layers (per-channel rolling means and variances of activation tensors) can be updated by passing several batches of data through the model before the fine-tuning starts.
-This allows to correct the compression-induced bias in the model and reduce the corresponding accuracy drop even before model training.
-This option is common for quantization, magnitude sparsity and filter pruning algorithms.
+After the compression-related changes in the model have been committed, the statistics of the batchnorm layers (per-channel rolling means and variances of activation tensors) can be updated by passing several batches of data through the model before the fine-tuning starts.
+This makes it possible to correct the compression-induced bias in the model and reduce the corresponding accuracy drop even before model training.
+This option is common for quantization, magnitude sparsity and filter pruning algorithms.
It can be enabled by setting a non-zero value of `num_bn_adaptation_samples` in the `batchnorm_adaptation` section of the `initializer` configuration - see [NNCF config schema](https://openvinotoolkit.github.io/nncf/) for reference.
Note that in order to use batchnorm adaptation for your model, you must supply a data loader to NNCF using the `register_default_init_args` helper function or by registering a `nncf.config.structures.BNAdaptationInitArgs` structure within the `NNCFConfig` object in your integration code.
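In code, attaching the data required for batch-norm adaptation can be done with the same helper shown in the basic usage documentation; a minimal sketch follows, in which the config path, `train_loader` and `model` are assumed to exist in your pipeline:

```python
from nncf import NNCFConfig, create_compressed_model, register_default_init_args

# The JSON config is assumed to request batchnorm adaptation, e.g. via an
# "initializer": {"batchnorm_adaptation": {"num_bn_adaptation_samples": 2048}} block.
nncf_config = NNCFConfig.from_json("nncf_config.json")
nncf_config = register_default_init_args(nncf_config, train_loader)  # registers the data loader for the default initializers
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)
```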
-### Example configuration files +## Example configuration files >_For the full list of the algorithm configuration parameters via config file, see the corresponding section in the [NNCF config schema](https://openvinotoolkit.github.io/nncf/)_. - Apply batchnorm adaptation for 2048 samples (rounded to nearest batch size multiple) during model quantization: + ```json5 { "input_info": {"sample_size" : [1, 3, 224, 224]}, // the input shape of your model may vary @@ -27,6 +28,7 @@ Note that in order to use batchnorm adaptation for your model, you must supply t ``` - Apply batchnorm adaptation for 32 samples (rounded to nearest batch size multiple) during model magnitude-based sparsification: + ```json5 { "input_info": {"sample_size" : [1, 3, 224, 224]}, // the input shape of your model may vary @@ -43,4 +45,4 @@ Note that in order to use batchnorm adaptation for your model, you must supply t } } } -``` \ No newline at end of file +``` diff --git a/docs/compression_algorithms/Binarization.md b/docs/compression_algorithms/Binarization.md index 33fe08ac93e..d2836b87ff3 100644 --- a/docs/compression_algorithms/Binarization.md +++ b/docs/compression_algorithms/Binarization.md @@ -1,6 +1,8 @@ +# Binarization + >_Scroll down for the examples of the JSON configuration files that can be used to apply this algorithm_. -### Binarization -NNCF supports binarizing weights and activations for 2D convolutional PyTorch\* layers (Conv2D) *only*. + +NNCF supports binarizing weights and activations for 2D convolutional PyTorch\* layers (Conv2D) _only_. Weight binarization may be done in two ways, depending on the configuration file parameters - either via [XNOR binarization](https://arxiv.org/abs/1603.05279) or via [DoReFa binarization](https://arxiv.org/abs/1606.06160). For DoReFa binarization, the scale of binarized weights for each convolution operation is calculated as the mean of absolute values of non-binarized convolutional filter weights, while for XNOR binarization, each convolutional operation has scales that are calculated in the same manner, but _per input channel_ of the convolutional filter. Refer to the original papers for details. @@ -8,20 +10,21 @@ Binarization of activations is implemented via binarizing inputs to the convolut $\text{out} = s * H(\text{in} - s*t)$ -In the formula above, - - $\text{in}$ - non-binarized activation values - - $\text{out}$ - binarized activation values - - $H(x)$ is the Heaviside step function - - $s$ and $t$ are trainable parameters corresponding to binarization scale and threshold respectively +In the formula above: -Training binarized networks requires special scheduling of the training process. For instance, binarizing a pretrained ResNet18 model on ImageNet is a four-stage process, with each stage taking a certain number of epochs. During the stage 1, the network is trained without any binarization. During the stage 2, the training continues with binarization enabled for activations only. During the stage 3, binarization is enabled both for activations and weights. Finally, during the stage 4 the optimizer learning rate, which was kept constant at previous stages, is decreased according to a polynomial law, while weight decay parameter of the optimizer is set to 0. The configuration files for the NNCF binarization algorithm allow to control certain parameters of this training schedule. 
+- $\text{in}$ - non-binarized activation values +- $\text{out}$ - binarized activation values +- $H(x)$ is the Heaviside step function +- $s$ and $t$ are trainable parameters corresponding to binarization scale and threshold respectively +Training binarized networks requires special scheduling of the training process. For instance, binarizing a pretrained ResNet18 model on ImageNet is a four-stage process, with each stage taking a certain number of epochs. During the stage 1, the network is trained without any binarization. During the stage 2, the training continues with binarization enabled for activations only. During the stage 3, binarization is enabled both for activations and weights. Finally, during the stage 4 the optimizer learning rate, which was kept constant at previous stages, is decreased according to a polynomial law, while weight decay parameter of the optimizer is set to 0. The configuration files for the NNCF binarization algorithm allow to control certain parameters of this training schedule. -### Example configuration files: +## Example configuration files >_For the full list of the algorithm configuration parameters via config file, see the corresponding section in the [NNCF config schema](https://openvinotoolkit.github.io/nncf/)_. - Binarize a ResNet using XNOR algorithm, ignoring several portions of the model, with finetuning on the scope of 60 epochs and staged binarization schedule (activations first, then weights) + ```json5 { "input_info": { "sample_size": [1, 3, 224, 224] }, @@ -36,12 +39,13 @@ Training binarized networks requires special scheduling of the training process. "lr_poly_drop_duration_epochs": 30, // Duration, in epochs, of the learning rate dropping process. "disable_wd_start_epoch": 60 // Epoch to disable weight decay in the optimizer }, - - "ignored_scopes": ["ResNet/NNCFLinear[fc]/linear_0", - "ResNet/NNCFConv2d[conv1]/conv2d_0", - "ResNet/Sequential[layer2]/BasicBlock[0]/Sequential[downsample]/NNCFConv2d[0]/conv2d_0", - "ResNet/Sequential[layer3]/BasicBlock[0]/Sequential[downsample]/NNCFConv2d[0]/conv2d_0", - "ResNet/Sequential[layer4]/BasicBlock[0]/Sequential[downsample]/NNCFConv2d[0]/conv2d_0"] + "ignored_scopes": [ + "ResNet/NNCFLinear[fc]/linear_0", + "ResNet/NNCFConv2d[conv1]/conv2d_0", + "ResNet/Sequential[layer2]/BasicBlock[0]/Sequential[downsample]/NNCFConv2d[0]/conv2d_0", + "ResNet/Sequential[layer3]/BasicBlock[0]/Sequential[downsample]/NNCFConv2d[0]/conv2d_0", + "ResNet/Sequential[layer4]/BasicBlock[0]/Sequential[downsample]/NNCFConv2d[0]/conv2d_0" + ] } } ``` diff --git a/docs/compression_algorithms/KnowledgeDistillation.md b/docs/compression_algorithms/KnowledgeDistillation.md index d1c2a75cdea..ceef7f41b18 100644 --- a/docs/compression_algorithms/KnowledgeDistillation.md +++ b/docs/compression_algorithms/KnowledgeDistillation.md @@ -1,43 +1,45 @@ -### Knowledge Distillation (experimental feature) +# Knowledge Distillation (experimental feature) + +## The algorithm description -#### The algorithm description The Knowledge Distillation [Hinton et al., 2015](https://arxiv.org/pdf/1503.02531.pdf) -implies that a small model (student) is trained to mimic a pre-trained large model (teacher) through knowledge -transfer. The goal is to improve the accuracy of the student network. +implies that a small model (student) is trained to mimic a pre-trained large model (teacher) through knowledge +transfer. The goal is to improve the accuracy of the student network. 
-The NNCF for PyTorch supports Knowledge Distillation out of the box along with all supported compression algorithm -(quantization, sparsity, filter pruning), when a student is a model being compressed and teacher - original -non-compressed one. +The NNCF for PyTorch supports Knowledge Distillation out of the box along with all supported compression algorithm +(quantization, sparsity, filter pruning), when a student is a model being compressed and teacher - original +non-compressed one. -Knowledge is transferred from the teacher model to the student one by minimizing loss function, which is calculated -based on predictions of the models. At the moment, two types of loss functions are available. +Knowledge is transferred from the teacher model to the student one by minimizing loss function, which is calculated +based on predictions of the models. At the moment, two types of loss functions are available. One of them should be explicitly specified in the config. - + MSE distillation loss: - + ${L}_{MSE}(z^{s}, z^{t}) = || z^s - z^t ||_2^2$ - + Cross-Entropy distillation loss: - + ${p}_{i} = \frac{\exp({z}\_{i})}{\sum\_{j}(\exp({z}\_{j}))}$ - + ${L}\_{CE}({p}^{s}, {p}^{t}) = -\sum_{i}{p}^{t}\_{i}*\log({p}^{s}\_{i})$ - + The Knowledge Distillation loss function is combined with a regular loss function, so overall loss function will be computed as: - + $L = {L}\_{reg}({z}^{s}, y) + {L}\_{distill}({z}^{s}, {z}^{t})$ - + ![kd_pic](../pics/knowledge_distillation.png) - - Note: the Cross-Entropy distillation loss was proposed in [Hinton et al., 2015](https://arxiv.org/pdf/1503.02531.pdf) + + Note: the Cross-Entropy distillation loss was proposed in [Hinton et al., 2015](https://arxiv.org/pdf/1503.02531.pdf) with temperature parameter, but we don't use it or assume that T=1. - -#### User guide + +## User guide To turn on the Knowledge Distillation with some compression algorithm (e.g. filter_pruning) it's necessary to specify `knowledge_distillation` algorithm and its type in the config: -``` + +```json { ... "compression": [ @@ -52,14 +54,13 @@ specify `knowledge_distillation` algorithm and its type in the config: ] } ``` + See this [config file](../../examples/torch/classification/configs/pruning/resnet34_imagenet_pruning_geometric_median_kd.json) for an example, and [NNCF config schema](https://openvinotoolkit.github.io/nncf/) for reference to the available configuration parameters for the algorithm. -##### Limitations +## Limitations - The algorithm is supported for PyTorch only. -- Training the same configuration with Knowledge Distillation requires more time and GPU memory than without it. +- Training the same configuration with Knowledge Distillation requires more time and GPU memory than without it. On average, memory (for all GPU execution modes) and time overhead is below 20% each. - Outputs of model that shouldn't be differentiated must have `requires_grad=False`. - Model should output predictions, not calculate the losses. - - diff --git a/docs/compression_algorithms/Pruning.md b/docs/compression_algorithms/Pruning.md index 3651256542f..abe84354caf 100644 --- a/docs/compression_algorithms/Pruning.md +++ b/docs/compression_algorithms/Pruning.md @@ -1,11 +1,12 @@ +# Filter pruning + >_Scroll down for the examples of the JSON configuration files that can be used to apply this algorithm_. -### Filter pruning Filter pruning algorithm zeros output filters in Convolutional layers based on some filter importance criterion (filters with smaller importance are pruned). 
The framework contains three filter importance criteria: `L1`, `L2` norm, and `Geometric Median`. Also, different schemes of pruning application are presented by different schedulers. Not all Convolution layers in the model can be pruned. Such layers are determined by the model architecture automatically as well as cross-layer dependencies that impose constraints on pruning filters. -#### Filter importance criteria **L1, L2** +## Filter importance criteria **L1, L2** `L1`, `L2` filter importance criteria are based on the following assumption: > Convolutional filters with small $l_p$ norms do not significantly contribute to output activation values, and thus have a small impact on the final predictions of CNN models. @@ -26,17 +27,18 @@ Where $L_j$ is j-th convolutional layer in model. $\{F_1, \dots F_m\} \in L_j$ - Then during pruning filters with smaller $G(F_i)$ importance function will be pruned first. -#### Schedulers +## Schedulers **Baseline Scheduler** - Firstly, during `num_init_steps` epochs the model is trained without pruning. Secondly, the pruning algorithm calculates filter importances and prunes a `pruning_target` part of the filters with the smallest importance in each prunable convolution. + +Firstly, during `num_init_steps` epochs the model is trained without pruning. Secondly, the pruning algorithm calculates filter importances and prunes a `pruning_target` part of the filters with the smallest importance in each prunable convolution. The zeroed filters are frozen afterwards and the remaining model parameters are fine-tuned. **Parameters of the scheduler:** + - `num_init_steps` - number of epochs for model pretraining **before** pruning. - `pruning_target` - pruning level target. For example, the value `0.5` means that right after pretraining, convolutions that can be pruned will have 50% of their filters set to zero. - **Exponential scheduler** Similar to the Baseline scheduler, during `num_init_steps` epochs model is pretrained without pruning. @@ -47,19 +49,21 @@ $P_i = a * e^{- k * i}$ Where $a, k$ - parameters. **Parameters of scheduler:** + - `num_init_steps` - number of epochs for model pretraining before pruning. - `pruning_steps` - the number of epochs during which the pruning level target is increased from `pruning_init` to `pruning_target` value. - `pruning_init` - initial pruning level target. For example, value `0.1` means that at the begging of training, convolutions that can be pruned will have 10% of their filters set to zero. - `pruning_target` - pruning level target at the end of the schedule. For example, the value `0.5` means that at the epoch with the number of `num_init_steps + pruning_steps`, convolutions that can be pruned will have 50% of their filters set to zero. **Exponential with bias scheduler** + Similar to the `Exponential scheduler`, but current pruning level $P_{i}$ (on i-th epoch) during training calculates by equation: $P_i = a * e^{- k * i} + b$ Where $a, k, b$ - parameters. > **NOTE**: Baseline scheduler prunes filters only ONCE and after it just fine-tunes remaining parameters while exponential (and exponential with bias) schedulers choose and prune different filters subsets at each pruning epoch. 
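To get a feel for the exponential schedule above, the following purely illustrative sketch fits $a$ and $k$ so that the level equals `pruning_init` at the first pruning epoch and `pruning_target` after `pruning_steps` epochs. It assumes a non-zero `pruning_init`; NNCF's scheduler derives its parameters internally and may do so differently:

```python
import math


def pruning_level_at_epoch(i, pruning_init=0.1, pruning_target=0.5, pruning_steps=20):
    """Illustrative P_i = a * exp(-k * i) with P_0 = pruning_init and P_pruning_steps = pruning_target."""
    a = pruning_init
    k = -math.log(pruning_target / pruning_init) / pruning_steps
    return min(a * math.exp(-k * i), pruning_target)


print([round(pruning_level_at_epoch(i), 3) for i in range(0, 25, 5)])
# [0.1, 0.15, 0.224, 0.334, 0.5] - the pruning level ramps up and saturates at the target
```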
-#### Batch-norm statistics adaptation +## Batch-norm statistics adaptation After the compression-related changes in the model have been committed, the statistics of the batchnorm layers (per-channel rolling means and variances of activation tensors) can be updated by passing several batches of data @@ -68,24 +72,26 @@ and reduce the corresponding accuracy drop even before model training. This opti sparsity and filter pruning algorithms. It can be enabled by setting a non-zero value of `num_bn_adaptation_samples` in the `batchnorm_adaptation` section of the `initializer` configuration (see example below). -#### Interlayer ranking types +## Interlayer ranking types Interlayer ranking type can be one of `unweighted_ranking` or `learned_ranking`. + - In case of `unweighted_ranking` and with `all_weights=True` all filter norms will be collected together and sorted to choose the least important ones. But this approach may not be optimal because filter norms are a good measure of filter importance inside a layer, but not across layers. - In the case of `learned_ranking` that uses re-implementation of [Learned Global Ranking method](https://arxiv.org/abs/1904.12368) (LeGR), a set of ranking coefficients will be learned for comparing filters across different layers. The $(a_i, b_i)$ pair of scalars will be learned for each ( $i$ layer and used to transform norms of $i$-th layer filters before sorting all filter norms together as $a_i * N_i + b_i$ , where $N_i$ - is vector of filter norma of $i$-th layer, $(a_i, b_i)$ is ranking coefficients for $i$-th layer. This approach allows pruning the model taking into account layer-specific sensitivity to weight perturbations and get pruned models with higher accuracy. - > **NOTE:** In all our pruning experiments we used SGD optimizer. -#### Filter pruning statistics -A model compression can be measured by two main metrics: filter pruning level and FLOPs pruning level. While -filter pruning level shows the ratio of removed filters to the total number of filters in the model, FLOPs pruning level -indicates how the removed filters affect the number of floating point operations required to run a model. +## Filter pruning statistics + +A model compression can be measured by two main metrics: filter pruning level and FLOPs pruning level. While +filter pruning level shows the ratio of removed filters to the total number of filters in the model, FLOPs pruning level +indicates how the removed filters affect the number of floating point operations required to run a model. During the algorithm execution several compression statistics are available. See the example below. -``` + +```text Statistics by pruned layers: +----------------------+------------------+--------------+---------------------+ | Layer's name | Weight's shape | Mask's shape | Filter pruning | @@ -118,155 +124,163 @@ Statistics of the filter pruning algorithm: +---------------------------------------+-------+ ``` -##### Layer statistics -`Statistics by pruned layers` section lists names of all layers that will be pruned, shapes of their weight tensors, -shapes of pruning masks applied to respective weights and percentage of zeros in those masks. +### Layer statistics + +`Statistics by pruned layers` section lists names of all layers that will be pruned, shapes of their weight tensors, +shapes of pruning masks applied to respective weights and percentage of zeros in those masks. 
-##### Model statistics -The columns `Full` and `Current` represent the values of the corresponding statistics in the original model and compressed one in the current state, respectively. +### Model statistics + +The columns `Full` and `Current` represent the values of the corresponding statistics in the original model and compressed one in the current state, respectively. The `Pruning level` column indicates the ratio between the values of the full and current statistics in the corresponding rows, defined by the formula: $Statistic\\:pruning\\:level = 1 - statistic\\:current / statistic\\:full$ - -`Filter pruning level` - percentage of filters removed from the model. -`GFLOPs pruning level` - an estimated reduction in the number of floating point operations of the model. +`Filter pruning level` - percentage of filters removed from the model. + +`GFLOPs pruning level` - an estimated reduction in the number of floating point operations of the model. The number of FLOPs for a single convolutional layer can be calculated as: $FLOPs = 2 * input\\:channels * kernel\\:size ^2 * W * H * filters$ > **NOTE**: One GFLOP is one billion (1e9) FLOPs. -Each removed filter contributes to FLOPs reduction in two convolutional layers as it affects the number -of filters in one and the number of input channels of the next layer. Thus, it is expected that this number may differ +Each removed filter contributes to FLOPs reduction in two convolutional layers as it affects the number +of filters in one and the number of input channels of the next layer. Thus, it is expected that this number may differ significantly from the filter pruning level. -In addition, the decrease in GFLOPs is estimated by calculating the number of FLOPs of convolutional and fully connected layers. +In addition, the decrease in GFLOPs is estimated by calculating the number of FLOPs of convolutional and fully connected layers. As a result, these estimates may differ slightly from the actual number of FLOPs in the compressed model. `MParams pruning level` - calculated reduction in the number of parameters in the model in millions. Typically convolutional layer weights have the shape of $(kernel\\:size,\\:kernel\\:size,\\:input\\:channels,\\:filter\\:num)$. -Thus, each removed filter affects the number of parameters in two convolutional layers as it affects the number -of filters in one and the number of input channels of the next layer. It is expected that this number may differ +Thus, each removed filter affects the number of parameters in two convolutional layers as it affects the number +of filters in one and the number of input channels of the next layer. It is expected that this number may differ significantly from the filter pruning level. -##### Algorithm statistics +### Algorithm statistics -`Filter (or FLOPs) pruning level in current epoch` - a pruning level calculated by the algorithm scheduler to be applied in the current training epoch. -> **NOTE**: In case of `Filter pruning level in current epoch` this metric does not indicate the whole model filter pruning level, as +`Filter (or FLOPs) pruning level in current epoch` - a pruning level calculated by the algorithm scheduler to be applied in the current training epoch. +> **NOTE**: In case of `Filter pruning level in current epoch` this metric does not indicate the whole model filter pruning level, as it does not take into account the number of filters in layers that cannot be pruned. 
`Target filter (or FLOPs) pruning level` - a pruning level that is expected to be achieved at the end of the algorithm execution. > **NOTE**: In case of `Target filter pruning level` this number indicates what percentage of filters will be removed from only those layers that can be pruned. It is important to note that pruning levels mentioned in the `statistics of the filter pruning algorithm` are the goals the algorithm aims to achieve. -It is not always possible to achieve these levels of pruning due to cross-layer and inference constraints. +It is not always possible to achieve these levels of pruning due to cross-layer and inference constraints. Therefore, it is expected that these numbers may differ from the calculated statistics in the `statistics of the pruned model` section. -### Example configuration files +## Example configuration files >_For the full list of the algorithm configuration parameters via config file, see the corresponding section in the [NNCF config schema](https://openvinotoolkit.github.io/nncf/)_. - Prune a model with default parameters (from 0 to 0.5 filter pruning level across 100 epochs with exponential schedule) -```json5 -{ - "input_info": { "sample_size": [1, 3, 224, 224] }, - "compression": - { - "algorithm": "filter_pruning" - } -} -``` + + ```json5 + { + "input_info": { "sample_size": [1, 3, 224, 224] }, + "compression": + { + "algorithm": "filter_pruning" + } + } + ``` - Same as above, but filter importance is considered globally across all eligible weighted operations: -```json5 -{ - "input_info": { "sample_size": [1, 3, 224, 224] }, - "compression": - { - "algorithm": "filter_pruning", - "all_weights": true - } -} -``` + + ```json5 + { + "input_info": { "sample_size": [1, 3, 224, 224] }, + "compression": + { + "algorithm": "filter_pruning", + "all_weights": true + } + } + ``` - Prune a model, immediately setting filter pruning level to 10%, applying [batchnorm adaptation](./BatchnormAdaptation.md) and reaching 60% within 20 epochs using exponential schedule, enabling pruning of first convolutional layers and downsampling convolutional layers: -```json5 -{ - "input_info": { "sample_size": [1, 3, 224, 224] }, - "compression": - { - "algorithm": "filter_pruning", - "pruning_init": 0.1, - "params": { - "pruning_target": 0.6, - "pruning_steps": 20, - "schedule": "exponential", - "prune_first_conv": true, - "prune_downsample_convs": true - } - } -} -``` + + ```json5 + { + "input_info": { "sample_size": [1, 3, 224, 224] }, + "compression": + { + "algorithm": "filter_pruning", + "pruning_init": 0.1, + "params": { + "pruning_target": 0.6, + "pruning_steps": 20, + "schedule": "exponential", + "prune_first_conv": true, + "prune_downsample_convs": true + } + } + } + ``` - Prune a model using geometric median filter importance and reaching 30% filter pruning level within 10 epochs using exponential schedule, postponing application of pruning for 10 epochs: -```json5 -{ - "input_info": { "sample_size": [1, 3, 224, 224] }, - "compression": - { - "algorithm": "filter_pruning", - "params": { - "filter_importance": "geometric_median", - "pruning_target": 0.3, - "pruning_steps": 10, - "schedule": "exponential", - "num_init_steps": 10 - } - } -} -``` + + ```json5 + { + "input_info": { "sample_size": [1, 3, 224, 224] }, + "compression": + { + "algorithm": "filter_pruning", + "params": { + "filter_importance": "geometric_median", + "pruning_target": 0.3, + "pruning_steps": 10, + "schedule": "exponential", + "num_init_steps": 10 + } + } + } + ``` - Prune and quantize a model 
at the same time using a FLOPS target for pruning and defaults for the rest of parameters: -```json5 -{ - "input_info": { "sample_size": [1, 3, 224, 224] }, - "compression": - [ - { - "algorithm": "filter_pruning", - "params": { - "pruning_flops_target": 0.6 - } - }, - { - "algorithm": "quantization" - } - ] -} -``` -- Prune a model with default parameters, estimate filter ranking by Learned Global Ranking method before finetuning. + ```json5 + { + "input_info": { "sample_size": [1, 3, 224, 224] }, + "compression": + [ + { + "algorithm": "filter_pruning", + "params": { + "pruning_flops_target": 0.6 + } + }, + { + "algorithm": "quantization" + } + ] + } + ``` + +- Prune a model with default parameters, estimate filter ranking by Learned Global Ranking method before finetuning. LEGR algorithm will be using 200 generations for the evolution algorithm, 20 train steps to estimate pruned model accuracy on each generation and target maximal filter pruning level equal to 50%: -```json5 -{ - "input_info": { "sample_size": [1, 3, 224, 224] }, - "compression": - [ - { - "algorithm": "filter_pruning", - "params": - { - "interlayer_ranking_type": "learned_ranking", - "legr_params": - { - "generations": 200, - "train_steps": 20, - "max_pruning": 0.5 - } - } - } - ] -} -``` \ No newline at end of file + + ```json5 + { + "input_info": { "sample_size": [1, 3, 224, 224] }, + "compression": + [ + { + "algorithm": "filter_pruning", + "params": + { + "interlayer_ranking_type": "learned_ranking", + "legr_params": + { + "generations": 200, + "train_steps": 20, + "max_pruning": 0.5 + } + } + } + ] + } + ``` diff --git a/docs/compression_algorithms/Quantization.md b/docs/compression_algorithms/Quantization.md index 37e3f186bb7..977d77483eb 100644 --- a/docs/compression_algorithms/Quantization.md +++ b/docs/compression_algorithms/Quantization.md @@ -1,10 +1,11 @@ +# Uniform Quantization with Fine-Tuning + >_Scroll down for the examples of the JSON configuration files that can be used to apply this algorithm_. -### Uniform Quantization with Fine-Tuning A uniform "fake" quantization method supports an arbitrary number of bits (>=2) which is used to represent weights and activations. The method performs differentiable sampling of the continuous signal (for example, activations or weights) during forward pass, simulating inference with integer arithmetic. -#### Common Quantization Formula +## Common Quantization Formula Quantization is parametrized by clamping range and number of quantization levels. The sampling formula is the following: @@ -16,11 +17,9 @@ $clamp(input; input\\_low, input\\_high)$ $s = \frac{levels - 1}{input\\_high - input\\_low}$ - $input\\_low$ and $input\\_high$ represent the quantization range and $\left\lfloor \cdot \right\rceil$ denotes rounding to the nearest integer. - -#### Symmetric Quantization +## Symmetric Quantization During the training, we optimize the **scale** parameter that represents the range `[input_low, input_range]` of the original signal using gradient descent: @@ -28,17 +27,17 @@ $input\\_low=scale*\frac{level\\_low}{level\\_high}$ $input\\_high=scale$ - In the formula above, $level\\_low$ and $level\\_high$ represent the range of the discrete signal. 
- - For weights: - + +- For weights: + $level\\_low=-2^{bits-1}+1$ - + $level\\_high=2^{bits-1}-1$ $levels=255$ - - For unsigned activations: +- For unsigned activations: $level\\_low=0$ @@ -46,7 +45,7 @@ In the formula above, $level\\_low$ and $level\\_high$ represent the range of th $levels=256$ - - For signed activations: +- For signed activations: $level\\_low=-2^{bits-1}$ @@ -60,7 +59,7 @@ $output = \left\lfloor clamp(input * \frac{level\\_high}{scale}, level\\_low, le Use the `num_init_samples` parameter from the `initializer` group to initialize the values of `scale` and determine which activation should be signed or unsigned from the collected statistics using given number of samples. -#### Asymmetric Quantization +## Asymmetric Quantization During the training we optimize the `input_low` and `input_range` parameters using gradient descent: @@ -93,19 +92,20 @@ $$ &\end{flalign} $$ - You can use the `num_init_samples` parameter from the `initializer` group to initialize the values of `input_low` and `input_range` from the collected statistics using given number of samples. -#### Quantizer setup and hardware config files +## Quantizer setup and hardware config files + NNCF allows to quantize models for best results on a given Intel hardware type when executed using OpenVINO runtime. To achieve this, the quantizer setup should be performed with following considerations in mind: -1) every operation that can accept quantized inputs on a given HW (i.e. can be executed using quantized input values) should have its inputs quantized in NNCF -2) the quantized inputs should be quantized with a configuration that is supported on a given HW for a given operation (e.g. per-tensor vs per-channel quantization, or 8 bits vs. 4 bits) -3) for operations that are agnostic to quantization, the execution should handle quantized tensors rather than full-precision tensors. -4) certain operation sequences will be runtime-optimized to execute in a single kernel call ("fused"), and additional quantizer insertion/quantization simulation within such operation sequences will be detrimental to overall performance + +1. every operation that can accept quantized inputs on a given HW (i.e. can be executed using quantized input values) should have its inputs quantized in NNCF +2. the quantized inputs should be quantized with a configuration that is supported on a given HW for a given operation (e.g. per-tensor vs per-channel quantization, or 8 bits vs. 4 bits) +3. for operations that are agnostic to quantization, the execution should handle quantized tensors rather than full-precision tensors. +4. certain operation sequences will be runtime-optimized to execute in a single kernel call ("fused"), and additional quantizer insertion/quantization simulation within such operation sequences will be detrimental to overall performance These requirements are fulfilled by the quantizer propagation algorithm. -The algorithm first searches the internal NNCF representation of the model's control flow graph for predefined patterns that are "fusable", and apply the fusing to the internal graph representation as well. +The algorithm first searches the internal NNCF representation of the model's control flow graph for predefined patterns that are "fusible", and apply the fusing to the internal graph representation as well. 
Next, the operations in the graph that can be associated to input-quantizable operations on a given target hardware are assigned a single quantizer for each its quantizable activation input, with a number of possible quantizer configurations attached to it (that are feasible on target HW). The quantizers are then "propagated" against the data flow in the model's control flow graph as far as possible, potentially merging with other quantizers. Once all quantizers have reached a standstill in their propagation process, each will have a final (possibly reduced) set of possible quantizer configurations, from which a single one is either chosen manually, or using a precision initialization algorithm (which accepts the potential quantizer locations and associated potential quantizer configuration sets). @@ -122,10 +122,9 @@ The quantization configuration in the `"target_device": "TRIAL"` case may be ove For all target HW types, parts of the model graph can be marked as non-quantizable by using the `"ignored_scopes"` field - inputs and weights of matching nodes in the NNCF internal graph representation will not be quantized, and the downstream quantizers will not propagate upwards through such nodes. +## Quantization Implementation -#### Quantization Implementation - -In our implementation, we use a slightly transformed formula. It is equivalent by order of floating-point operations to simplified symmetric formula and the assymetric one. The small difference is addition of small positive number `eps` to prevent division by zero and taking absolute value of range, since it might become negative on backward: +In our implementation, we use a slightly transformed formula. It is equivalent by order of floating-point operations to simplified symmetric formula and the asymmetric one. The small difference is addition of small positive number `eps` to prevent division by zero and taking absolute value of range, since it might become negative on backward: $output = \frac{clamp(\left\lfloor (input-input\\_low^{*}) *s - ZP \right \rceil, level\\_low, level\\_high)}{s}$ @@ -145,10 +144,11 @@ $input\\_low^{*} = 0$ $input\\_range^{*} = scale$ -The most common case of applying quantization is 8-bit uniform quantization. +The most common case of applying quantization is 8-bit uniform quantization. NNCF example scripts provide a plethora of configuration files that implement this case ([PyTorch](../../examples/torch/classification/configs/quantization/inception_v3_imagenet_int8.json), [TensorFlow](../../examples/tensorflow/classification/configs/quantization/inception_v3_imagenet_int8.json)) --- + **NOTE** There is a known issue with AVX2 and AVX512 CPU devices. The issue appears with 8-bit matrix calculations with tensors which elements are close to the maximum or saturated. @@ -160,18 +160,18 @@ This regime is used when `"target_device": "CPU"` or `"target_device": "ANY"` se To control the application of overflow fix, `"overflow_fix"` config option is introduced. The default value is `"overflow_fix": "enable"`. To apply the overflow issue fix only to the first layer, use `"overflow_fix": "first_layer_only"`. To disable the overflow issue fix for all layers, use `"overflow_fix": "disable"`. - - --- -#### Mixed-Precision Quantization + +## Mixed-Precision Quantization Quantization to lower precisions (e.g. 6, 4, 2 bits) is an efficient way to accelerate inference of neural networks. 
Although NNCF supports quantization with an arbitrary number of bits to represent weights and activations values, choosing ultra-low bitwidth could noticeably affect the model's accuracy. A good trade-off between accuracy and performance is achieved by assigning different precisions to different layers. NNCF provides two automatic precision assignment algorithms, namely **HAWQ** and **AutoQ**. -#### HAWQ +### HAWQ + NNCF utilizes the [HAWQ-v2](https://arxiv.org/pdf/1911.03852.pdf) method to automatically choose optimal mixed-precision configuration by taking into account the sensitivity of each layer, i.e. how much lower-bit quantization of each layer decreases the accuracy of model. The most sensitive layers are kept at higher precision. The sensitivity of the i-th layer is @@ -227,7 +227,8 @@ is chosen. By default, liberal mode is used as it does not reject a large number The `bitwidth_assignment_mode` parameter can override it to the strict one. For automatic mixed-precision selection it's recommended to use the following template of configuration file: -``` + +```json "optimizer": { "base_lr": 3.1e-4, "schedule_type": "plateau", @@ -271,7 +272,8 @@ file. --- -#### AutoQ +### AutoQ + NNCF provides an alternate mode, namely AutoQ, for mixed-precision automation. It is an AutoML-based technique that automatically learns the layer-wise bitwidth with explored experiences. Based on [HAQ](https://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_HAQ_Hardware-Aware_Automated_Quantization_With_Mixed_Precision_CVPR_2019_paper.pdf), AutoQ utilizes an actor-critic algorithm, Deep Deterministic Policy Gradient (DDPG) for efficient search over the bitwidth space. DDPG is trained in an episodic fashion, converging to a deterministic mixed-precision policy after a number of episodes. An episode is constituted by stepping, the DDPG transitions from quantizer to quantizer sequentially to predict a precision of a layer. Each quantizer essentially denotes a state in RL framework and it is represented by attributes of the associated layers. For example, a quantizer for 2D Convolution is represented by its quantizer Id (integer), input and output channel size, feature map dimension, stride size, if it is depthwise, number of parameters etc. It is recommended to check out ```_get_layer_attr``` in [```quantization_env.py```](https://github.com/openvinotoolkit/nncf/blob/develop/nncf/automl/environment/quantization_env.py#L333) for the featurization of different network layer types. When the agent enters a state/quantizer, it receives the state features and forward passes them through its network. The output of the forward pass is a scalar continuous action output which is subsequently mapped to the bitwidth options of the particular quantizer. The episode terminates after the prediction of the last quantizer and a complete layer-wise mixed-precision policy is obtained. To ensure a policy fits in the user-specified compression ratio, the policy is post processed by reducing the precision sequentially from the last quantizer until the compression ratio is met. @@ -315,7 +317,7 @@ As briefly mentioned earlier, user is required to register a callback function f Following is an example of wrapping ImageNet validation loop as a callback. Top5 accuracy is chosen as the scalar objective metric. ```autoq_eval_fn``` and ```val_loader``` are registered in the call of ```register_default_init_args```. 
-``` +```python def autoq_eval_fn(model, eval_loader): _, top5 = validate(eval_loader, model, criterion, config) return top5 @@ -327,11 +329,12 @@ Following is an example of wrapping ImageNet validation loop as a callback. Top5 The complete config [example](../../examples/torch/classification/configs/mixed_precision/mobilenet_v2_imagenet_mixed_int_autoq_staged.json) that applies AutoQ to MobileNetV2 is provided within the [classification sample](../../examples/torch/classification) for PyTorch. -### Example configuration files: +## Example configuration files >_For the full list of the algorithm configuration parameters via config file, see the corresponding section in the [NNCF config schema](https://openvinotoolkit.github.io/nncf/)_. - Quantize a model using default algorithm settings (8-bit, quantizers configuration chosen to be compatible with all Intel target HW types): + ```json5 { "input_info": { "sample_size": [1, 3, 224, 224] }, // the input shape of your model may vary @@ -342,6 +345,7 @@ The complete config [example](../../examples/torch/classification/configs/mixed_ ``` - Quantize a model to 8-bit precision targeted for Intel CPUs, with additional constraints of symmetric weight quantization and asymmetric activation quantization: + ```json5 { "input_info": { "sample_size": [1, 3, 32, 32] }, // the input shape of your model may vary @@ -355,6 +359,7 @@ The complete config [example](../../examples/torch/classification/configs/mixed_ ``` - Quantize a model with fully symmetric INT8 quantization and increased number of quantizer range initialization samples (make sure to supply a corresponding data loader in code via `nncf.config.structures.QuantizationRangeInitArgs` or the `register_default_init_args` helper function): + ```json5 { "input_info": { "sample_size": [1, 3, 224, 224] }, // the input shape of your model may vary @@ -369,6 +374,7 @@ The complete config [example](../../examples/torch/classification/configs/mixed_ ``` - Quantize a model using 4-bit per-channel quantization for experimentation/trial purposes (end-to-end performance and/or compatibility with OpenVINO Inference Engine not guaranteed) + ```json5 { "input_info": { "sample_size": [1, 3, 32, 32] }, // the input shape of your model may vary @@ -382,6 +388,7 @@ The complete config [example](../../examples/torch/classification/configs/mixed_ ``` - Quantize a multi-input model to 8-bit precision targeted for Intel CPUs, with a range initialization performed using percentile statistics (empirically known to be better for NLP models, for example) and excluding some parts of the model from quantization: + ```json5 { "input_info": [ @@ -418,7 +425,9 @@ The complete config [example](../../examples/torch/classification/configs/mixed_ "target_device": "TRIAL" } ``` + - Quantize a model to variable bit width using 300 iterations of the AutoQ algorithm, with a target model size (w.r.t the effective parameter storage size) set to 15% of the FP32 model and possible quantizer bitwidths limited to INT2, INT4 or INT8. 
+ ```json5 { "input_info": { "sample_size": [1, 3, 224, 224] }, // the input shape of your model may vary @@ -436,4 +445,3 @@ The complete config [example](../../examples/torch/classification/configs/mixed_ "target_device": "TRIAL" } ``` - diff --git a/docs/compression_algorithms/Sparsity.md b/docs/compression_algorithms/Sparsity.md index 6922113dc4a..b0916550eec 100644 --- a/docs/compression_algorithms/Sparsity.md +++ b/docs/compression_algorithms/Sparsity.md @@ -1,11 +1,11 @@ +# Non-Structured Sparsity >_Scroll down for the examples of the JSON configuration files that can be used to apply this algorithm_. -### Non-Structured Sparsity Sparsity algorithm zeros weights in Convolutional and Fully-Connected layers in a non-structured way, so that zero values are randomly distributed inside the tensor. Most of the sparsity algorithms set the less important weights to zero but the criteria of how they do it is different. The framework contains several implementations of sparsity methods. -#### RB-Sparsity +## RB-Sparsity This section describes the Regularization-Based Sparsity (RB-Sparsity) algorithm implemented in this framework. The method is based on $L_0$-regularization, with which parameters of the model tend to zero: @@ -33,7 +33,7 @@ The method requires a long schedule of the training process in order to minimize > **NOTE**: The known limitation of the method is that the sparsified CNN must include Batch Normalization layers which make the training process more stable. -#### Batch-norm statistics adaptation +## Batch-norm statistics adaptation After the compression-related changes in the model have been committed, the statistics of the batchnorm layers (per-channel rolling means and variances of activation tensors) can be updated by passing several batches of data @@ -44,14 +44,15 @@ sparsity and filter pruning algorithms. It can be enabled by setting a non-zero > **NOTE**: In all our sparsity experiments, we used the Adam optimizer and initial learning rate `0.001` for model weights and sparsity mask. -#### Magnitude Sparsity +## Magnitude Sparsity The magnitude sparsity method implements a naive approach that is based on the assumption that the contribution of lower weights is lower so that they can be pruned. After each training epoch the method calculates a threshold based on the current sparsity ratio and uses it to zero weights which are lower than this threshold. And here there are two options: + - Weights are used as is during the threshold calculation procedure. - Weights are normalized before the threshold calculation. +## Constant Sparsity -#### Constant Sparsity This special algorithm takes no additional parameters and is used when you want to load a checkpoint already trained with another sparsity algorithm and do other compression without changing the sparsity mask. 
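A minimal configuration sketch for this case is given below. It assumes that `const_sparsity` is the identifier under which this algorithm is registered in the NNCF config schema; verify the exact string against the schema before using it:

```json5
{
    "input_info": { "sample_size": [1, 3, 224, 224] }, // the input shape of your model may vary
    "compression":
    {
        // Keeps the sparsity mask loaded from the checkpoint frozen while
        // other compression algorithms are applied and the model is fine-tuned.
        "algorithm": "const_sparsity"
    }
}
```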
### Example configuration files @@ -70,6 +71,7 @@ This special algorithm takes no additional parameters and is used when you want ``` - Apply magnitude sparsity, increasing sparsity level step-wise from 0 to 70% in 3 steps at given training epoch indices: + ```json5 { "input_info": { "sample_size": [1, 3, 224, 224] }, // the input shape of your model may vary @@ -87,6 +89,7 @@ This special algorithm takes no additional parameters and is used when you want ``` - Apply magnitude sparsity, immediately setting sparsity level to 10%, performing [batch-norm adaptation](./BatchnormAdaptation.md) to potentially recover accuracy, and exponentially increasing sparsity to 50% over 30 epochs of training: + ```json5 { "input_info": { "sample_size": [1, 3, 224, 224] }, // the input shape of your model may vary @@ -108,6 +111,7 @@ This special algorithm takes no additional parameters and is used when you want ``` - Apply RB-sparsity to UNet, increasing sparsity level exponentially from 1% to 60% over 100 epochs, keeping the sparsity mask trainable until epoch 110 (after which the mask is frozen and the model is allowed to fine-tune with a fixed sparsity level), and excluding parts of the model from sparsification: + ```json5 { "input_info": { "sample_size": [1, 3, 224, 224] }, // the input shape of your model may vary @@ -126,4 +130,4 @@ This special algorithm takes no additional parameters and is used when you want ] } } -``` \ No newline at end of file +``` diff --git a/docs/compression_algorithms/post_training/ONNX.md b/docs/compression_algorithms/post_training/ONNX.md index 54f610cc186..58c6f6def97 100644 --- a/docs/compression_algorithms/post_training/ONNX.md +++ b/docs/compression_algorithms/post_training/ONNX.md @@ -1,9 +1,9 @@ -## Post-Training Quantization for ONNX +# Post-Training Quantization for ONNX NNCF supports [ONNX](https://onnx.ai/) backend for the Post-Training Quantization algorithm. This guide contains some notes that you should consider before working with NNCF for ONNX. -### Model Preparation +## Model Preparation The majority of the ONNX models are exported from different frameworks, such as PyTorch or TensorFlow. @@ -22,9 +22,10 @@ from onnx.version_converter import convert_version model = onnx.load_model('/path_to_model') converted_model = convert_version(model, target_version=13) ``` -# ONNX Results -Below are some results obtained using [benchmarking section](../../../tests/onnx/benchmarking/README.md) for the models from [ONNX Model Zoo](https://github.com/onnx/models). +## ONNX Results + +Below are some results obtained using [benchmarking section](../../../tests/onnx/benchmarking/README.md) for the models from [ONNX Model Zoo](https://github.com/onnx/models). ### Classification diff --git a/docs/compression_algorithms/post_training/Quantization.md b/docs/compression_algorithms/post_training/Quantization.md index fe39416a592..2f8a1b20104 100644 --- a/docs/compression_algorithms/post_training/Quantization.md +++ b/docs/compression_algorithms/post_training/Quantization.md @@ -1,4 +1,4 @@ -## Post-Training Quantization +# Post-Training Quantization Post-Training Quantization is a quantization algorithm that doesn't demand retraining of a quantized model. It utilizes a small subset of the initial dataset to calibrate quantization constants. @@ -9,7 +9,7 @@ NNCF provides an advanced Post-Training Quantization algorithm, which consists o 2) FastBiasCorrection or BiasCorrection - Reduces the bias errors between quantized layers and the corresponding original layers. 
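As a hedged illustration of how the two bias correction flavors are typically selected: the `fast_bias_correction` flag shown below is an assumption and should be checked against the `nncf.quantize` API reference for your NNCF version.

```python
import nncf

# `model` and `calibration_dataset` are prepared as shown in the Usage section below.
# Assumption: `fast_bias_correction` switches between the two flavors
# (True -> FastBiasCorrection, False -> the slower BiasCorrection).
quantized_model = nncf.quantize(model, calibration_dataset, fast_bias_correction=False)
```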
-### Usage +## Usage To start the algorithm, provide the following entities: @@ -19,29 +19,30 @@ To start the algorithm, provide the following entities: The basic workflow steps: -1) Create the [data transformation function](#data-transformation-function). +1. Create the [data transformation function](#data-transformation-function). -```python -def transform_fn(data_item): - images, _ = data_item - return images -``` + ```python + def transform_fn(data_item): + images, _ = data_item + return images + ``` -2) Create an instance of `nncf.Dataset` class by passing two parameters: -* `data_source` - Iterable python object that contains data items for model calibration. -* `transform_fn` - Data transformation function from the Step 1. +2. Create an instance of `nncf.Dataset` class by passing two parameters: -```python -calibration_dataset = nncf.Dataset(val_dataset, transform_fn) -``` + * `data_source` - Iterable python object that contains data items for model calibration. + * `transform_fn` - Data transformation function from the Step 1. -3) Run the quantization pipeline. + ```python + calibration_dataset = nncf.Dataset(val_dataset, transform_fn) + ``` -```python -quantized_model = nncf.quantize(model, calibration_dataset) -``` +3. Run the quantization pipeline. + + ```python + quantized_model = nncf.quantize(model, calibration_dataset) + ``` -### Data Transformation Function +## Data Transformation Function Model input structure differs from one pipeline to another. Thus NNCF introduces the interface to adapt the user dataset format to the NNCF format. This interface is called the data transformation function. @@ -87,4 +88,4 @@ for data_item in val_loader: NNCF provides the examples of Post-Training Quantization where you can find the implementation of data transformation -function: [PyTorch](../../../examples/post_training_quantization/torch/mobilenet_v2/README.md), [TensorFlow](../../../examples/post_training_quantization/tensorflow/mobilenet_v2/README.md), [ONNX](../../../examples/post_training_quantization/onnx/mobilenet_v2/README.md), and [OpenVINO](../../../examples/post_training_quantization/openvino/mobilenet_v2/README.md) \ No newline at end of file +function: [PyTorch](../../../examples/post_training_quantization/torch/mobilenet_v2/README.md), [TensorFlow](../../../examples/post_training_quantization/tensorflow/mobilenet_v2/README.md), [ONNX](../../../examples/post_training_quantization/onnx/mobilenet_v2/README.md), and [OpenVINO](../../../examples/post_training_quantization/openvino/mobilenet_v2/README.md) diff --git a/docs/styleguide/PyGuide.md b/docs/styleguide/PyGuide.md index 3e7cb969a2c..0c7f9169766 100644 --- a/docs/styleguide/PyGuide.md +++ b/docs/styleguide/PyGuide.md @@ -3,40 +3,42 @@
Table of Contents -- [1 Introduction](#s1-introduction) -- [2 Automating Code Formatting](#s2-auto-code-formatting) -- [3 Python Language Rules](#s3-python-language-rules) - * [3.1 PyLint](#s3.1-pylint) - * [3.2 3rd party packages](#s3.2-3rd-party-packages) - * [3.3 Global variables](#s3.3-global-variables) - * [3.4 Nested/Local/Inner Classes and Functions](#s3.4-nested) - * [3.5 Default Iterators and Operators](#s3.5-default-iterators-and-operators) - * [3.6 Type Annotated Code](#s3.6-type-annotated-code) - * [3.7 Files and Sockets](#s3.7-files-and-sockets) - * [3.8 Abstract Classes](#s3.8-abstract-classes) -- [4 Python Style Rules](#s4-python-style-rules) - * [4.1 Line length](#s4.1-line-length) - * [4.2 Comments and Docstrings](#s4.2-comments-and-docstrings) - + [4.2.1 Modules](#s4.2.1-modules) - + [4.2.2 Functions and Methods](#s4.2.2-functions-and-methods) - + [4.2.3 Classes](#s4.2.3-classes) - + [4.2.4 Block and Inline Comments](#s4.2.4-block-and-inline-comments) - * [4.3 Strings](#s4.3-strings) - * [4.4 Logging](#s4.4-logging) - * [4.5 Error Messages](#s4.5-error-messages) - * [4.6 TODO Comments](#s4.6-todo-comments) - * [4.7 Naming](#s4.7-naming) - + [4.7.1 Names to Avoid](#s4.7.1-names-to-avoid) - + [4.7.2 Naming Conventions](#s4.7.2-naming-conventions) - + [4.7.3 Framework specific class naming](#s4.7.3-framework-specific-class-naming) - + [4.7.4 File Naming](#s4.7.4-file-naming) - * [4.8 Main](#s4.8-main) -- [5 API documentation rules](#s5-api-doc-rules) +- [1 Introduction](#s1-introduction) +- [2 Automating Code Formatting](#s2-auto-code-formatting) +- [3 Python Language Rules](#s3-python-language-rules) + - [3.1 PyLint](#s3.1-pylint) + - [3.2 3rd party packages](#s3.2-3rd-party-packages) + - [3.3 Global variables](#s3.3-global-variables) + - [3.4 Nested/Local/Inner Classes and Functions](#s3.4-nested) + - [3.5 Default Iterators and Operators](#s3.5-default-iterators-and-operators) + - [3.6 Type Annotated Code](#s3.6-type-annotated-code) + - [3.7 Files and Sockets](#s3.7-files-and-sockets) + - [3.8 Abstract Classes](#s3.8-abstract-classes) +- [4 Python Style Rules](#s4-python-style-rules) + - [4.1 Line length](#s4.1-line-length) + - [4.2 Comments and Docstrings](#s4.2-comments-and-docstrings) + - [4.2.1 Modules](#s4.2.1-modules) + - [4.2.2 Functions and Methods](#s4.2.2-functions-and-methods) + - [4.2.3 Classes](#s4.2.3-classes) + - [4.2.4 Block and Inline Comments](#s4.2.4-block-and-inline-comments) + - [4.3 Strings](#s4.3-strings) + - [4.4 Logging](#s4.4-logging) + - [4.5 Error Messages](#s4.5-error-messages) + - [4.6 TODO Comments](#s4.6-todo-comments) + - [4.7 Naming](#s4.7-naming) + - [4.7.1 Names to Avoid](#s4.7.1-names-to-avoid) + - [4.7.2 Naming Conventions](#s4.7.2-naming-conventions) + - [4.7.3 Framework specific class naming](#s4.7.3-framework-specific-class-naming) + - [4.7.4 File Naming](#s4.7.4-file-naming) + - [4.8 Main](#s4.8-main) +- [5 API documentation rules](#s5-api-doc-rules) +
+ ## 1 Introduction This document gives coding conventions for the Python code comprising [Neural Network Compression Framework (NNCF)](../../README.md). @@ -48,6 +50,7 @@ the [PEP 8 -- Style Guide for Python Code](https://www.python.org/dev/peps/pep-0 + ## 2 Automating Code Formatting To maintain consistency and readability throughout the codebase, we use the [black](https://github.com/psf/black) @@ -62,10 +65,11 @@ make pre-commit Also recommend configuring your IDE to run Black and isort tools automatically when saving files. Automatic code formatting is mandatory for all Python files, but you can disable it for specific cases if required: - - if you need a specialized order of importing modules; - - for large data structures for which autoformatting unnecessarily breaks into lines, - e.g. reference data in tests, class lists or arguments for subprocess; - - for structures for which formatting helps understanding, such as matrix. + +- if you need a specialized order of importing modules; +- for large data structures for which autoformatting unnecessarily breaks into lines, + e.g. reference data in tests, class lists or arguments for subprocess; +- for structures for which formatting helps understanding, such as matrix. Example for 'isort': @@ -95,11 +99,13 @@ arr2 = [ + ## 3 Python Language Rules + ### 3.1 PyLint Run [pylint](https://github.com/PyCQA/pylint) over your code using this [pylintrc](../../.pylintrc). @@ -108,16 +114,18 @@ Run [pylint](https://github.com/PyCQA/pylint) over your code using this [pylintr - *Preferred solution*: Change the code to fix the warning. - *Exception*: Suppress the warning if they are inappropriate so that other issues are not hidden. To suppress warnings you can set a line-level comment + ```python dict = "something awful" # Bad Idea... pylint: disable=redefined-builtin ``` + or update [pylintrc](../../.pylintrc) if applicable for the whole project. If the reason for the suppression is not clear from the symbolic name, add an explanation. - + ### 3.2 3rd party packages Do not add new third-party dependencies unless absolutely necessary. All things being equal, give preference to built-in packages. @@ -125,6 +133,7 @@ Do not add new third-party dependencies unless absolutely necessary. All things + ### 3.3 Global variables Avoid global variables. @@ -137,6 +146,7 @@ Avoid global variables. + ### 3.4 Nested/Local/Inner Classes and Functions No need to overuse nested local functions or classes and inner classes. @@ -144,6 +154,7 @@ No need to overuse nested local functions or classes and inner classes. - Nested local functions or classes are fine if it satisfy the following conditions: - The code becomes more readable and simpler. - Closing over a local variables. + ```python # Correct: def make_scaling_fn(scale): @@ -156,6 +167,7 @@ No need to overuse nested local functions or classes and inner classes. - Do not nest a function just to hide it from users of a module. Instead, prefix its name with an `_` at the module level so that it can still be accessed by tests. + ```Python # Wrong: def avg(a, b, c): @@ -167,6 +179,7 @@ No need to overuse nested local functions or classes and inner classes. m = sum(m,c) return m/3 ``` + ```Python # Correct: def _sum(x, y): @@ -181,6 +194,7 @@ No need to overuse nested local functions or classes and inner classes. + ### 3.5 Default Iterators and Operators Use default iterators and operators for types that support them, like lists, @@ -196,6 +210,7 @@ if obj in alist: ... for line in afile: ... 
for k, v in adict.items(): ... ``` + ```python # Wrong: for key in adict.keys(): ... @@ -207,6 +222,7 @@ for k, v in dict.iteritems(): ... + ### 3.6 Type Annotated Code Code should be annotated with type hints according to @@ -220,6 +236,7 @@ def func(a: int) -> List[int]: + ### 3.7 Files and Sockets Explicitly close files and sockets when done with them. @@ -230,10 +247,10 @@ with open("hello.txt") as hello_file: print(line) ``` - + ### 3.8 Abstract Classes When defining abstract classes, the following template should be used: @@ -275,31 +292,32 @@ class C(ABC): pass ``` - + ## 4 Python Style Rules + ### 4.1 Line length Maximum line length is *120 characters*. Explicit exceptions to the 120 character limit: -- Long import statements. -- URLs, pathnames, or long flags in comments. -- Long string module level constants not containing whitespace that would be - inconvenient to split across lines such as URLs or pathnames. - - Pylint disable comments. (e.g.: `# pylint: disable=invalid-name`) - +- Long import statements. +- URLs, pathnames, or long flags in comments. +- Long string module level constants not containing whitespace that would be + inconvenient to split across lines such as URLs or pathnames. + - Pylint disable comments. (e.g.: `# pylint: disable=invalid-name`) + ### 4.2 Comments and Docstrings Be sure to use the right style for module, function, method docstrings and @@ -308,6 +326,7 @@ inline comments. + #### 4.2.1 Modules Every file should contain a license boilerplate. @@ -328,14 +347,16 @@ Every file should contain a license boilerplate. + #### 4.2.2 Functions and Methods In this section, "function" means a method, function, or generator. A function must have a docstring, unless it meets all of the following criteria: -- not externally visible -- very short -- obvious + +- not externally visible +- very short +- obvious ```python def load_state(model: torch.nn.Module, state_dict_to_load: dict, is_resume: bool = False) -> int: @@ -364,6 +385,7 @@ def load_state(model: torch.nn.Module, state_dict_to_load: dict, is_resume: bool + #### 4.2.3 Classes Classes should have a docstring below the class definition describing the class. If your class @@ -403,6 +425,7 @@ if there is nothing special about this exact implementation of the magic method hashing all fields as a tuple in `__hash__` or concatenating string-like objects in `__add__` etc.) For instance, this simple `__init__` method may omit the method description in the docstring (the parameter description is, however, still required): + ```python class Klass: # ... @@ -414,11 +437,13 @@ class Klass: self.param1 = param1 self.param2 = param2 ``` + while this `__init__` requires a description of external dependencies and potential side effects of creating objects of the class: + ```python class ComplexKlass(BaseClass): # ... - def __init__(self, param1: ParamType, param2: AnotherParamType): + def __init__(self, param1: ParamType, param2: AnotherParamType): """ *Add a brief explanation of what happens during this particular __init__, such as :* The construction of this object is dependent on the value of GLOBAL_VARIABLE... @@ -448,6 +473,7 @@ class ComplexKlass(BaseClass): + #### 4.2.4 Block and Inline Comments The final place to have comments is in tricky parts of the code. If you're going to have to explain it @@ -478,8 +504,8 @@ knows Python (though not what you're trying to do) better than you do. 
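A short illustrative sketch (not taken from the NNCF codebase) of a comment that explains the non-obvious reasoning in a tricky spot rather than restating the code:

```python
from typing import List

# Correct: the comment explains the non-obvious "why", not what the code does.
def normalize(scores: List[float]) -> List[float]:
    if not scores:
        return []
    total = sum(scores)
    # An all-zero score vector would cause a division by zero below,
    # so fall back to a uniform distribution in that corner case.
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]
```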
-### 4.3 Strings +### 4.3 Strings ```python # Correct: @@ -506,7 +532,9 @@ long_string = textwrap.dedent( + ### 4.4 Logging + Use the logger object built into NNCF for all purposes of logging within the NNCF package code. Do not use `print(...)` or other ways of output. @@ -519,6 +547,7 @@ nncf_logger.info("This is an info-level log message") ``` Wrong: + ```python print("This is an info-level log message") ``` @@ -568,10 +597,12 @@ This ensures that the deprecation warning is seen to the user at all NNCF log le + ### 4.5 Error Messages Error messages (such as: message strings on exceptions like `ValueError`, or messages shown to the user) should follow guidelines: + - The message needs to precisely match the actual error condition. - Interpolated pieces need to always be clearly identifiable as such. - The message should start with a capital letter. @@ -579,6 +610,7 @@ messages shown to the user) should follow guidelines: + ### 4.6 TODO Comments Use `TODO` comments for code that is temporary, a short-term solution, or @@ -607,13 +639,13 @@ event ("Remove this code when all clients can handle XML responses."). + ### 4.7 Naming `module_name`, `package_name`, `ClassName`, `method_name`, `ExceptionName`, `function_name`, `GLOBAL_CONSTANT_NAME`, `global_var_name`, `instance_var_name`, `function_parameter_name`, `local_var_name`. - Function names, variable names, and filenames should be descriptive; eschew abbreviation. In particular, do not use abbreviations that are ambiguous or unfamiliar to readers outside your project, and do not abbreviate by deleting @@ -621,7 +653,6 @@ letters within a word. Always use a `.py` filename extension. Never use dashes. - @@ -702,44 +733,47 @@ Always use a `.py` filename extension. Never use dashes. + #### 4.7.1 Names to Avoid -- single character names, except for specifically allowed cases: - - counters or iterators (e.g. `i`, `j`, `k`, `v`, et al.) - - `e` as an exception identifier in `try/except` statements. - - `f` as a file handle in `with` statements - Please be mindful not to abuse single-character naming. Generally speaking, - descriptiveness should be proportional to the name's scope of visibility. - For example, `i` might be a fine name for 5-line code block but within - multiple nested scopes, it is likely too vague. -- dashes (`-`) in any package/module name -- `__double_leading_and_trailing_underscore__` names (reserved by Python) -- offensive terms -- names that needlessly include the type of the variable (for example: +- single character names, except for specifically allowed cases: + - counters or iterators (e.g. `i`, `j`, `k`, `v`, et al.) + - `e` as an exception identifier in `try/except` statements. + - `f` as a file handle in `with` statements + Please be mindful not to abuse single-character naming. Generally speaking, + descriptiveness should be proportional to the name's scope of visibility. + For example, `i` might be a fine name for 5-line code block but within + multiple nested scopes, it is likely too vague. +- dashes (`-`) in any package/module name +- `__double_leading_and_trailing_underscore__` names (reserved by Python) +- offensive terms +- names that needlessly include the type of the variable (for example: `id_to_name_dict`) + #### 4.7.2 Naming Conventions -- "Internal" means internal to a module, or protected or private within a - class. -- Prepending a single underscore (`_`) has some support for protecting module - variables and functions (linters will flag protected member access). 
While - prepending a double underscore (`__` aka "dunder") to an instance variable - or method effectively makes the variable or method private to its class - (using name mangling); we discourage its use as it impacts readability and - testability, and isn't *really* private. -- Place related classes and top-level functions together in a - module. -- Use CapWords for class names, but lower\_with\_under.py for module names. -- Use the word "layer" (instead of "module") in the `nncf.common` module to - refer to the building block of neural networks. +- "Internal" means internal to a module, or protected or private within a + class. +- Prepending a single underscore (`_`) has some support for protecting module + variables and functions (linters will flag protected member access). While + prepending a double underscore (`__` aka "dunder") to an instance variable + or method effectively makes the variable or method private to its class + (using name mangling); we discourage its use as it impacts readability and + testability, and isn't *really* private. +- Place related classes and top-level functions together in a + module. +- Use CapWords for class names, but lower\_with\_under.py for module names. +- Use the word "layer" (instead of "module") in the `nncf.common` module to + refer to the building block of neural networks. + #### 4.7.3 Framework specific class naming - `PTClassName` for Torch @@ -748,6 +782,7 @@ Always use a `.py` filename extension. Never use dashes. + #### 4.7.4 File Naming Python filenames must have a `.py` extension and must not contain dashes (`-`). @@ -756,6 +791,7 @@ This allows them to be imported and unit tested. + ### 4.8 Main ```python @@ -766,16 +802,17 @@ if __name__ == "__main__": main() ``` - + ## 5 API documentation rules -All functions and classes that belong to NNCF API should be documented. + +All functions and classes that belong to NNCF API should be documented. The documentation should utilize the reStructuredText (.rst) format for specifying parameters, return types and otherwise formatting the docstring, since the docstring is used as a source for generating the HTML API documentation with Sphinx. Argument descriptions for `__init__(...)` methods of API classes should be located in the docstring of the class itself, not the docstring of the `__init__(...)` method. This is required so that the autogenerated API documentation is rendered properly. -If the autogenerated API documentation does not show type hints for certain arguments despite the fact that the type hints are present in the object's implementation code, +If the autogenerated API documentation does not show type hints for certain arguments despite the fact that the type hints are present in the object's implementation code, or if the type hints do not refer to the API symbol's canonical alias, then the type hint should be explicitly declared in the docstring using the `:type *param_name*:` directive (or `:rtype:` for return types). diff --git a/examples/common/README.md b/examples/common/README.md index 8d472e3fb16..a41e21da71d 100644 --- a/examples/common/README.md +++ b/examples/common/README.md @@ -1,2 +1,2 @@ This directory contains common code for example scripts. -See [other directories at the same level](./..) for actual example scripts that you can launch to evaluate NNCF for various backend frameworks and use cases. \ No newline at end of file +See [other directories at the same level](./..) 
for actual example scripts that you can launch to evaluate NNCF for various backend frameworks and use cases. diff --git a/examples/experimental/torch/classification/Quickstart.md b/examples/experimental/torch/classification/Quickstart.md index 9dcb3a8d32d..823abaa0a4d 100644 --- a/examples/experimental/torch/classification/Quickstart.md +++ b/examples/experimental/torch/classification/Quickstart.md @@ -1,47 +1,54 @@ # Setup -### PyTorch +## PyTorch + Install PyTorch and Torchvision using the [PyTorch installation guide](https://pytorch.org/get-started/locally/#start-locally). NNCF currently supports PyTorch 1.12.1. For this quickstart, PyTorch 1.12.1 and Torchvision 0.13.1 with CUDA 11.3 was installed using: + ```bash pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113 ``` +## NNCF -### NNCF There are two options for installing [***NNCF***](https://github.com/openvinotoolkit/nncf#installation): -- Package built from NNCF repository or -- PyPI package. + +- Package built from NNCF repository or +- PyPI package. To install NNCF and dependencies from the NNCF repository, install by running the following in the repository root directory and also set `PYTHONPATH` variable to include the root directory: + ```bash python setup.py develop export PYTHONPATH="${PYTHONPATH}:/nncf" ``` To install NNCF and dependencies as a PyPI package, use the following: + ```bash pip install nncf ``` The ```examples``` folder from the NNCF repository ***is not*** included when you install NNCF using a package manager. To run the BootstrapNAS examples, you will need to obtain this folder from the repository and add it to your path. +## Additional Dependencies -### Additional Dependencies -The examples in the NNCF repo have additional requirements, such as EfficientNet, MLFlow, Tensorboard, etc., which are not installed with NNCF. You will need to install them using: -``` +The examples in the NNCF repo have additional requirements, such as EfficientNet, MLFlow, Tensorboard, etc., which are not installed with NNCF. You will need to install them using: + +```bash pip install efficientnet_pytorch tensorboard mlflow returns ``` +## Example -# Example -To run an example of super-network generation and sub-network search, use the ```bootstrap_nas.py``` script located [here](https://github.com/openvinotoolkit/nncf/blob/develop/examples/experimental/torch/classification/bootstrap_nas.py) and the sample ```config.json``` from [here](https://github.com/jpablomch/bootstrapnas/blob/main/bootstrapnas_examples/config.json). +To run an example of super-network generation and sub-network search, use the ```bootstrap_nas.py``` script located [here](https://github.com/openvinotoolkit/nncf/blob/develop/examples/experimental/torch/classification/bootstrap_nas.py) and the sample ```config.json``` from [here](https://github.com/jpablomch/bootstrapnas/blob/main/bootstrapnas_examples/config.json). -The file ```config.json``` contains a sample configuration for generating a super-network from a trained model. The sample file is configured to generate a super-network from ResNet-50 trained with CIFAR-10. The file should be modified depending on the model to be used as input for BootstrapNAS. +The file ```config.json``` contains a sample configuration for generating a super-network from a trained model. The sample file is configured to generate a super-network from ResNet-50 trained with CIFAR-10. 
The file should be modified depending on the model to be used as input for BootstrapNAS. -Weights for CIFAR10-based models can be found at: https://github.com/huyvnphan/PyTorch_CIFAR10 +Weights for CIFAR10-based models can be found at: https://github.com/huyvnphan/PyTorch_CIFAR10 -Use the following to test training a super-network: -``` +Use the following to test training a super-network: + +```bash cd /examples/experimental/torch/classification python bootstrap_nas.py -m train \ -c /bootstrapnas_examples/config.json \ @@ -49,22 +56,22 @@ python bootstrap_nas.py -m train \ --weights ``` - ### Expected Output Files after executing BootstrapNAS -The output of running ```bootstrap_nas.py``` will be a sub-network configuration that has an accuracy similar to the input model (by default a $\pm$1% absolute difference in accuracy is allowed), but with improvements in MACs. Format: ([MACs_subnet, ACC_subnet]). -Several files are saved to your `log_dir` after the training has ended: +The output of running ```bootstrap_nas.py``` will be a sub-network configuration that has an accuracy similar to the input model (by default a $\pm$1% absolute difference in accuracy is allowed), but with improvements in MACs. Format: ([MACs_subnet, ACC_subnet]). + +Several files are saved to your `log_dir` after the training has ended: -- `compressed_graph.{dot, png}`- Dot and PNG files that describe the wrapped NNCF model. -- `original_graph.dot` - Dot file that describes the original model. -- `config.json`- A copy of your original config file. +- `compressed_graph.{dot, png}`- Dot and PNG files that describe the wrapped NNCF model. +- `original_graph.dot` - Dot file that describes the original model. +- `config.json`- A copy of your original config file. - `events.*`- Tensorboard logs. - `last_elasticity.pth`- Super-network's elasticity information. This file can be used when loading super-networks for searching or inspection. -- `last_model_weights.pth`- Super-network's weights after training. -- `snapshot.tar.gz` - Copy of the code used for this run. +- `last_model_weights.pth`- Super-network's weights after training. +- `snapshot.tar.gz` - Copy of the code used for this run. - `subnetwork_best.pth` - Dictionary with the configuration of the best sub-network. Best defined as a sub-network that performs in the Pareto front, and that deviates a maximum `acc_delta` from original model. -- `supernet_{best, last}.pth` - Super-network weights at its best and last state. +- `supernet_{best, last}.pth` - Super-network weights at its best and last state. If the user wants to have a CSV output file of the search progression, ```search_algo.search_progression_to_csv()``` can be called after running the search step. -For a visualization of the search progression please use ```search_algo.visualize_search_progression()``` after the search has concluded. A PNG file will be generated. +For a visualization of the search progression please use ```search_algo.visualize_search_progression()``` after the search has concluded. A PNG file will be generated. 
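A minimal sketch of these two calls, assuming `search_algo` is the search-algorithm object produced by the BootstrapNAS search step (how it is obtained depends on the script version):

```python
# Export the search progression as a CSV file and render a PNG plot of it
# once the sub-network search has finished.
search_algo.search_progression_to_csv()
search_algo.visualize_search_progression()
```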
diff --git a/examples/post_training_quantization/onnx/mobilenet_v2/README.md b/examples/post_training_quantization/onnx/mobilenet_v2/README.md index e9d528ab35f..8aaaad8bf46 100644 --- a/examples/post_training_quantization/onnx/mobilenet_v2/README.md +++ b/examples/post_training_quantization/onnx/mobilenet_v2/README.md @@ -1,24 +1,29 @@ # Post-Training Quantization of MobileNet v2 ONNX Model -This example demonstrates how to use Post-Training Quantization API from Neural Network Compression Framework (NNCF) to quantize ONNX models on the example of [MobileNet v2](https://huggingface.co/alexsu52/mobilenet_v2_imagenette) quantization, pretrained on [Imagenette](https://github.com/fastai/imagenette) dataset. +This example demonstrates how to use Post-Training Quantization API from Neural Network Compression Framework (NNCF) to quantize ONNX models on the example of [MobileNet v2](https://huggingface.co/alexsu52/mobilenet_v2_imagenette) quantization, pretrained on [Imagenette](https://github.com/fastai/imagenette) dataset. The example includes the following steps: + - Loading the [Imagenette](https://github.com/fastai/imagenette) dataset (~340 Mb) and the [MobileNet v2 ONNX model](https://huggingface.co/alexsu52/mobilenet_v2_imagenette) pretrained on this dataset. - Quantizing the model using NNCF Post-Training Quantization algorithm. - Output of the following characteristics of the quantized model: - - Accuracy drop of the quantized model (INT8) over the pre-trained model (FP32) - - Performance speed up of the quantized model (INT8) + - Accuracy drop of the quantized model (INT8) over the pre-trained model (FP32) + - Performance speed up of the quantized model (INT8) + +## Install requirements -# Install requirements At this point it is assumed that you have already installed NNCF. You can find information on installation NNCF [here](https://github.com/openvinotoolkit/nncf#user-content-installation). To work with the example you should install the corresponding Python package dependencies: -``` + +```bash pip install -r requirements.txt ``` -# Run Example +## Run Example + It's pretty simple. The example does not require additional preparation. It will do the preparation itself, such as loading the dataset and model, etc. -``` + +```bash python main.py -``` \ No newline at end of file +``` diff --git a/examples/post_training_quantization/openvino/anomaly_stfpm_quantize_with_accuracy_control/README.md b/examples/post_training_quantization/openvino/anomaly_stfpm_quantize_with_accuracy_control/README.md index e2ff0691c41..e37071a366e 100644 --- a/examples/post_training_quantization/openvino/anomaly_stfpm_quantize_with_accuracy_control/README.md +++ b/examples/post_training_quantization/openvino/anomaly_stfpm_quantize_with_accuracy_control/README.md @@ -7,27 +7,30 @@ This example demonstrates how to quantize [Student-Teacher Feature Pyramid Match The `nncf.quantize_with_accuracy_control()` method quantizes a model with a specified accuracy drop and the `max_drop` parameter is passed to specify the maximum absolute difference between the quantized and pre-trained model. The example includes the following steps: -- + - Loading the [MVTec (capsule category)](https://www.mvtec.com/company/research/datasets/mvtec-ad) dataset (~385 Mb) and the [STFPM OpenVINO model](https://huggingface.co/alexsu52/stfpm_mvtec_capsule) pretrained on this dataset. - Quantizing the model using NNCF Post-Training Quantization algorithm with accuracy control. 
- Output of the following characteristics of the quantized model:
- - Accuracy drop between the quantized model (INT8) and the pre-trained model (FP32)
- - Compression rate of the quantized model file size relative to the pre-trained model file size
- - Performance speed up of the quantized model (INT8)
+ - Accuracy drop between the quantized model (INT8) and the pre-trained model (FP32)
+ - Compression rate of the quantized model file size relative to the pre-trained model file size
+ - Performance speed up of the quantized model (INT8)
+
+## Install requirements
-# Install requirements
At this point it is assumed that you have already installed NNCF. You can find information on installing NNCF [here](https://github.com/openvinotoolkit/nncf#user-content-installation).
To work with the example you should install the corresponding Python package dependencies:
-```
+
+```bash
pip install -r requirements.txt
```
-# Run Example
+## Run Example
+
It's pretty simple. The example does not require additional preparation. It will do the preparation itself, such as loading the dataset and model, etc.
You can pass the maximum accuracy drop as a command-line argument. The F1 score is calculated in the range [0,1] for STFPM. Thus, if you want to specify a maximum accuracy drop of 0.5% between the quantized and the pre-trained model, you must pass 0.005 as the command-line argument:
-```
+```bash
python main.py 0.005
-```
\ No newline at end of file
+```
diff --git a/examples/post_training_quantization/openvino/mobilenet_v2/README.md b/examples/post_training_quantization/openvino/mobilenet_v2/README.md
index 180012993de..65b6797d156 100644
--- a/examples/post_training_quantization/openvino/mobilenet_v2/README.md
+++ b/examples/post_training_quantization/openvino/mobilenet_v2/README.md
@@ -1,25 +1,30 @@
# Post-Training Quantization of MobileNet v2 OpenVINO Model
-This example demonstrates how to use Post-Training Quantization API from Neural Network Compression Framework (NNCF) to quantize OpenVINO models on the example of [MobileNet v2](https://huggingface.co/alexsu52/mobilenet_v2_imagenette) quantization, pretrained on [Imagenette](https://github.com/fastai/imagenette) dataset.
+This example demonstrates how to use Post-Training Quantization API from Neural Network Compression Framework (NNCF) to quantize OpenVINO models on the example of [MobileNet v2](https://huggingface.co/alexsu52/mobilenet_v2_imagenette) quantization, pretrained on [Imagenette](https://github.com/fastai/imagenette) dataset.
The example includes the following steps:
+
- Loading the [Imagenette](https://github.com/fastai/imagenette) dataset (~340 Mb) and the [MobileNet v2 OpenVINO model](https://huggingface.co/alexsu52/mobilenet_v2_imagenette) pretrained on this dataset.
- Quantizing the model using NNCF Post-Training Quantization algorithm.
- Output of the following characteristics of the quantized model:
- - Accuracy drop of the quantized model (INT8) over the pre-trained model (FP32)
- - Compression rate of the quantized model file size relative to the pre-trained model file size
- - Performance speed up of the quantized model (INT8)
+ - Accuracy drop of the quantized model (INT8) over the pre-trained model (FP32)
+ - Compression rate of the quantized model file size relative to the pre-trained model file size
+ - Performance speed up of the quantized model (INT8)
+
+## Install requirements
-# Install requirements
At this point it is assumed that you have already installed NNCF.
You can find information on installing NNCF [here](https://github.com/openvinotoolkit/nncf#user-content-installation).
To work with the example you should install the corresponding Python package dependencies:
-```
+
+```bash
pip install -r requirements.txt
```
-# Run Example
+## Run Example
+
It's pretty simple. The example does not require additional preparation. It will do the preparation itself, such as loading the dataset and model, etc.
-```
+
+```bash
python main.py
-```
\ No newline at end of file
+```
diff --git a/examples/post_training_quantization/openvino/yolov8/README.md b/examples/post_training_quantization/openvino/yolov8/README.md
index 4b21899c46f..f861b3c5b15 100644
--- a/examples/post_training_quantization/openvino/yolov8/README.md
+++ b/examples/post_training_quantization/openvino/yolov8/README.md
@@ -1,28 +1,37 @@
# Post-Training Quantization of YOLOv8 OpenVINO Model
-This example demonstrates how to use Post-Training Quantization API from Neural Network Compression Framework (NNCF) to quantize YOLOv8n model.
+This example demonstrates how to use Post-Training Quantization API from Neural Network Compression Framework (NNCF) to quantize YOLOv8n model.
The example includes the following steps:
+
- Download and prepare COCO-128 dataset.
- Quantize the model with NNCF Post-Training Quantization algorithm.
- Measure accuracy and performance of the floating-point and quantized models.
-# Install requirements
+## Install requirements
+
To run the example you should install the corresponding Python dependencies:
+
- Install NNCF from source:
-```
+
+```bash
pip install ../../../../
```
+
- Install 3rd party dependencies:
-```
+
+```bash
pip install -r requirements.txt
```
-# Run Example
+## Run Example
+
The example is fully automated. Just run the following command in the prepared Python environment:
-```
+
+```bash
python main.py
```
## See also
+
- [YOLOv8 Jupyter notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/230-yolov8-optimization)
diff --git a/examples/post_training_quantization/openvino/yolov8_quantize_with_accuracy_control/README.md b/examples/post_training_quantization/openvino/yolov8_quantize_with_accuracy_control/README.md
index 2458f52f52d..f5649efbdf8 100644
--- a/examples/post_training_quantization/openvino/yolov8_quantize_with_accuracy_control/README.md
+++ b/examples/post_training_quantization/openvino/yolov8_quantize_with_accuracy_control/README.md
@@ -8,25 +8,29 @@ The example includes the following steps:
- Quantize the model with "AccuracyAwareQuantization" algorithm instead of "DefaultQuantization".
- Measure accuracy and performance of the floating-point and quantized models.
-# Install requirements
+## Install requirements
To run the example you should install the corresponding Python dependencies:
+
- Install NNCF from source:
-```
-git clone https://github.com/openvinotoolkit/nncf.git
-cd nncf
-pip install .
-```
+
+ ```bash
+ git clone https://github.com/openvinotoolkit/nncf.git
+ cd nncf
+ pip install .
+ ```
+
- Install 3rd party dependencies of this example:
-```
-pip install -r requirements.txt
-```
-# Run Example
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+## Run Example
The example is fully automated.
Just run the following command in the prepared Python environment: -``` +```bash python main.py ``` diff --git a/examples/post_training_quantization/tensorflow/mobilenet_v2/README.md b/examples/post_training_quantization/tensorflow/mobilenet_v2/README.md index 6a599e5d8e1..567b1503c8a 100644 --- a/examples/post_training_quantization/tensorflow/mobilenet_v2/README.md +++ b/examples/post_training_quantization/tensorflow/mobilenet_v2/README.md @@ -1,25 +1,30 @@ # Post-Training Quantization of MobileNet v2 TensorFlow Model -This example demonstrates how to use Post-Training Quantization API from Neural Network Compression Framework (NNCF) to quantize TensorFlow models on the example of [MobileNet v2](https://huggingface.co/alexsu52/mobilenet_v2_imagenette) quantization, pretrained on [Imagenette](https://github.com/fastai/imagenette) dataset. +This example demonstrates how to use Post-Training Quantization API from Neural Network Compression Framework (NNCF) to quantize TensorFlow models on the example of [MobileNet v2](https://huggingface.co/alexsu52/mobilenet_v2_imagenette) quantization, pretrained on [Imagenette](https://github.com/fastai/imagenette) dataset. The example includes the following steps: + - Loading the [Imagenette](https://github.com/fastai/imagenette) dataset (~340 Mb) and the [MobileNet v2 TensorFlow model](https://huggingface.co/alexsu52/mobilenet_v2_imagenette) pretrained on this dataset. - Quantizing the model using NNCF Post-Training Quantization algorithm. - Output of the following characteristics of the quantized model: - - Accuracy drop of the quantized model (INT8) over the pre-trained model (FP32) - - Compression rate of the quantized model file size relative to the pre-trained model file size - - Performance speed up of the quantized model (INT8) + - Accuracy drop of the quantized model (INT8) over the pre-trained model (FP32) + - Compression rate of the quantized model file size relative to the pre-trained model file size + - Performance speed up of the quantized model (INT8) + +## Install requirements -# Install requirements At this point it is assumed that you have already installed NNCF. You can find information on installation NNCF [here](https://github.com/openvinotoolkit/nncf#user-content-installation). To work with the example you should install the corresponding Python package dependencies: -``` + +```bash pip install -r requirements.txt ``` -# Run Example +## Run Example + It's pretty simple. The example does not require additional preparation. It will do the preparation itself, such as loading the dataset and model, etc. -``` + +```bash python main.py -``` \ No newline at end of file +``` diff --git a/examples/post_training_quantization/torch/mobilenet_v2/README.md b/examples/post_training_quantization/torch/mobilenet_v2/README.md index 7e9ecf43de1..fbf2ecbe8ea 100644 --- a/examples/post_training_quantization/torch/mobilenet_v2/README.md +++ b/examples/post_training_quantization/torch/mobilenet_v2/README.md @@ -1,25 +1,30 @@ # Post-Training Quantization of MobileNet v2 PyTorch Model -This example demonstrates how to use Post-Training Quantization API from Neural Network Compression Framework (NNCF) to quantize PyTorch models on the example of [MobileNet v2](https://huggingface.co/alexsu52/mobilenet_v2_imagenette) quantization, pretrained on [Imagenette](https://github.com/fastai/imagenette) dataset. 
+This example demonstrates how to use Post-Training Quantization API from Neural Network Compression Framework (NNCF) to quantize PyTorch models on the example of [MobileNet v2](https://huggingface.co/alexsu52/mobilenet_v2_imagenette) quantization, pretrained on [Imagenette](https://github.com/fastai/imagenette) dataset. The example includes the following steps: + - Loading the [Imagenette](https://github.com/fastai/imagenette) dataset (~340 Mb) and the [MobileNet v2 PyTorch model](https://huggingface.co/alexsu52/mobilenet_v2_imagenette) pretrained on this dataset. - Quantizing the model using NNCF Post-Training Quantization algorithm. - Output of the following characteristics of the quantized model: - - Accuracy drop of the quantized model (INT8) over the pre-trained model (FP32) - - Compression rate of the quantized model file size relative to the pre-trained model file size - - Performance speed up of the quantized model (INT8) + - Accuracy drop of the quantized model (INT8) over the pre-trained model (FP32) + - Compression rate of the quantized model file size relative to the pre-trained model file size + - Performance speed up of the quantized model (INT8) + +## Install requirements -# Install requirements At this point it is assumed that you have already installed NNCF. You can find information on installation NNCF [here](https://github.com/openvinotoolkit/nncf#user-content-installation). To work with the example you should install the corresponding Python package dependencies: -``` + +```bash pip install -r requirements.txt ``` -# Run Example +## Run Example + It's pretty simple. The example does not require additional preparation. It will do the preparation itself, such as loading the dataset and model, etc. -``` + +```bash python main.py -``` \ No newline at end of file +``` diff --git a/examples/post_training_quantization/torch/ssd300_vgg16/README.md b/examples/post_training_quantization/torch/ssd300_vgg16/README.md index d50cd265d21..a4e57c3c469 100644 --- a/examples/post_training_quantization/torch/ssd300_vgg16/README.md +++ b/examples/post_training_quantization/torch/ssd300_vgg16/README.md @@ -1,27 +1,32 @@ # Post-Training Quantization of SSD PyTorch Model -This example demonstrates how to use Post-Training Quantization API from Neural Network Compression Framework (NNCF) to quantize PyTorch models on the example of [SSD300_VGG16](https://pytorch.org/vision/main/models/generated/torchvision.models.detection.ssd300_vgg16.html) from torchvision library. +This example demonstrates how to use Post-Training Quantization API from Neural Network Compression Framework (NNCF) to quantize PyTorch models on the example of [SSD300_VGG16](https://pytorch.org/vision/main/models/generated/torchvision.models.detection.ssd300_vgg16.html) from torchvision library. The example includes the following steps: + - Loading the [COCO128](https://www.kaggle.com/datasets/ultralytics/coco128) dataset (~7 Mb). - Loading [SSD300_VGG16](https://pytorch.org/vision/main/models/generated/torchvision.models.detection.ssd300_vgg16.html) from torchvision pretrained on the full COCO dataset. - Patching some internal methods with `no_nncf_trace` context so that the model graph is traced properly by NNCF. - Quantizing the model using NNCF Post-Training Quantization algorithm. - Output of the following characteristics of the quantized model: - - Accuracy drop of the quantized model (INT8) over the pre-trained model (FP32). 
- - Compression rate of the quantized model file size relative to the pre-trained model file size. - - Performance speed up of the quantized model (INT8). + - Accuracy drop of the quantized model (INT8) over the pre-trained model (FP32). + - Compression rate of the quantized model file size relative to the pre-trained model file size. + - Performance speed up of the quantized model (INT8). + +## Install requirements -# Install requirements At this point it is assumed that you have already installed NNCF. You can find information on installation NNCF [here](https://github.com/openvinotoolkit/nncf#user-content-installation). To work with the example you should install the corresponding Python package dependencies: -``` + +```bash pip install -r requirements.txt ``` -# Run Example +## Run Example + The example does not require any additional preparation, just run: -``` + +```bash python main.py -``` \ No newline at end of file +``` diff --git a/examples/tensorflow/classification/README.md b/examples/tensorflow/classification/README.md index 036698c156e..64f5c30cf88 100644 --- a/examples/tensorflow/classification/README.md +++ b/examples/tensorflow/classification/README.md @@ -15,7 +15,7 @@ At this point it is assumed that you have already installed nncf. You can find i To work with the sample you should install the corresponding Python package dependencies: -``` +```bash pip install -r examples/tensorflow/requirements.txt ``` @@ -34,6 +34,7 @@ Please read the following [guide](https://www.tensorflow.org/datasets/overview) For the [ImageNet](http://www.image-net.org/challenges/LSVRC/2012/) dataset, TFDS requires a manual download. Please refer to the [TFDS ImageNet Readme](https://www.tensorflow.org/datasets/catalog/imagenet2012) for download instructions. The TFDS ImageNet dataset should be specified in the configuration file as follows: + ```json "dataset": "imagenet2012", "dataset_type": "tfds" @@ -43,6 +44,7 @@ The TFDS ImageNet dataset should be specified in the configuration file as follo To download the [ImageNet](http://www.image-net.org/challenges/LSVRC/2012/) dataset and convert it to [TFRecord](https://www.tensorflow.org/tutorials/load_data/tfrecord) format, refer to the following [tutorial](https://github.com/tensorflow/models/tree/master/research/slim#Data). The ImageNet dataset in TFRecords format should be specified in the configuration file as follows: + ```json "dataset": "imagenet2012", "dataset_type": "tfrecords" @@ -58,6 +60,7 @@ The ImageNet dataset in TFRecords format should be specified in the configuratio Before compressing a model, it is highly recommended checking the accuracy of the pretrained model. All models which are supported in the sample has pretrained weights for ImageNet. 
To load pretrained weights into a model and then evaluate the accuracy of that model, make sure that the pretrained=True option is set in the configuration file and use the following command: + ```bash python main.py \ --mode=test \ @@ -69,13 +72,15 @@ python main.py \ #### Compress Pretrained Model Run the following command to start compression with fine-tuning on all available GPUs on the machine: - ```bash - python main.py \ - --mode=train \ - --config=configs/quantization/mobilenet_v2_imagenet_int8.json \ - --data= \ - --log-dir=../../results/quantization/mobilenet_v2_int8 - ``` + +```bash +python main.py \ +--mode=train \ +--config=configs/quantization/mobilenet_v2_imagenet_int8.json \ +--data= \ +--log-dir=../../results/quantization/mobilenet_v2_int8 +``` + It may take a few epochs to get the baseline accuracy results. Use the `--resume` flag with the path to the checkpoint to resume training from the defined checkpoint or folder with checkpoints to resume training from the last checkpoint. @@ -83,6 +88,7 @@ Use the `--resume` flag with the path to the checkpoint to resume training from ### Validate Your Model Checkpoint To estimate the test scores of your trained model checkpoint, use the following command: + ```bash python main.py \ --mode=test \ @@ -94,6 +100,7 @@ python main.py \ ### Export Compressed Model To export trained model to the **Frozen Graph**, use the following command: + ```bash python main.py \ --mode=export \ @@ -103,6 +110,7 @@ python main.py \ ``` To export trained model to the **SavedModel**, use the following command: + ```bash python main.py \ --mode=export \ @@ -112,6 +120,7 @@ python main.py \ ``` To export trained model to the **Keras H5**, use the following command: + ```bash python main.py \ --mode=export \ @@ -124,6 +133,6 @@ python main.py \ To export a model to the OpenVINO IR and run it using the Intel® Deep Learning Deployment Toolkit, refer to this [tutorial](https://software.intel.com/en-us/openvino-toolkit). -### Results - -Please see compression results for Tensorflow classification at our [Model Zoo page](../../../docs/ModelZoo.md#tensorflow-classification). \ No newline at end of file +## Results + +Please see compression results for Tensorflow classification at our [Model Zoo page](../../../docs/ModelZoo.md#tensorflow-classification). diff --git a/examples/tensorflow/object_detection/README.md b/examples/tensorflow/object_detection/README.md index 714b777f885..9c27d9aa636 100644 --- a/examples/tensorflow/object_detection/README.md +++ b/examples/tensorflow/object_detection/README.md @@ -6,7 +6,7 @@ The sample receives a configuration file where the training schedule, hyper-para ## Features -- RetinaNet from the official [TF repository](https://github.com/tensorflow/models/tree/master/official/vision/detection) with minor modifications (custom implementation of upsamling is replaced with equivalent tf.keras.layers.UpSampling2D). YOLOv4 from the [keras-YOLOv3-model-set](https://github.com/david8862/keras-YOLOv3-model-set) repository. +- RetinaNet from the official [TF repository](https://github.com/tensorflow/models/tree/master/official/vision/detection) with minor modifications (custom implementation of upsampling is replaced with equivalent tf.keras.layers.UpSampling2D). YOLOv4 from the [keras-YOLOv3-model-set](https://github.com/david8862/keras-YOLOv3-model-set) repository. - Support [TensorFlow Datasets (TFDS)](https://www.tensorflow.org/datasets) and TFRecords for COCO2017 dataset. 
- Configuration file examples for sparsity, quantization, filter pruning and quantization with sparsity. - Export to Frozen Graph or TensorFlow SavedModel that is supported by the OpenVINO™ toolkit. @@ -18,7 +18,7 @@ At this point it is assumed that you have already installed nncf. You can find i To work with the sample you should install the corresponding Python package dependencies: -``` +```bash pip install -r examples/tensorflow/requirements.txt ``` @@ -68,28 +68,33 @@ The [COCO2017](https://cocodataset.org/) dataset in TFRecords format should be s - Go to the `examples/tensorflow/object_detection` folder. - Download the pre-trained weights in H5 format for either [RetinaNet](https://storage.openvinotoolkit.org/repositories/nncf/models/develop/tensorflow/retinanet_coco.tar.gz) or [YOLOv4](https://storage.openvinotoolkit.org/repositories/nncf/models/develop/tensorflow/yolo_v4_coco.tar.gz) and provide the path to them using `--weights` flag. - (Optional) Before compressing a model, it is highly recommended checking the accuracy of the pretrained model, use the following command: - ```bash - python main.py \ - --mode=test \ - --config=configs/quantization/retinanet_coco_int8.json \ - --weights= - --data= \ - --disable-compression - ``` + + ```bash + python main.py \ + --mode=test \ + --config=configs/quantization/retinanet_coco_int8.json \ + --weights= \ + --data= \ + --disable-compression + ``` + - Run the following command to start compression with fine-tuning on all available GPUs on the machine: + ```bash python main.py \ --mode=train \ --config=configs/quantization/retinanet_coco_int8.json \ - --weights= + --weights= \ --data= \ --log-dir=../../results/quantization/retinanet_coco_int8 ``` + - Use the `--resume` flag with the path to the checkpoint to resume training from the defined checkpoint or folder with checkpoints to resume training from the last checkpoint. ### Validate Your Model Checkpoint To estimate the test scores of your trained model checkpoint, use the following command: + ```bash python main.py \ --mode=test \ @@ -101,6 +106,7 @@ python main.py \ ### Export Compressed Model To export trained model to the **Frozen Graph**, use the following command: + ```bash python main.py \ --mode=export \ @@ -110,6 +116,7 @@ python main.py \ ``` To export trained model to the **SavedModel**, use the following command: + ```bash python main.py \ --mode=export \ @@ -119,6 +126,7 @@ python main.py \ ``` To export trained model to the **Keras H5**, use the following command: + ```bash python main.py \ --mode=export \ @@ -128,7 +136,9 @@ python main.py \ ``` ### Save Checkpoint without Optimizer + To reduce memory footprint (if no further training is scheduled) it is useful to save the checkpoint without optimizer. Use the following command: + ```bash python ../common/prepare_checkpoint.py \ --config=configs/quantization/retinanet_coco_int8.json \ @@ -141,10 +151,12 @@ python ../common/prepare_checkpoint.py \ To export a model to the OpenVINO IR and run it using the Intel® Deep Learning Deployment Toolkit, refer to this [tutorial](https://software.intel.com/en-us/openvino-toolkit). ## Train RetinaNet from scratch + - Download pre-trained ResNet-50 checkpoint from [here](https://storage.cloud.google.com/cloud-tpu-checkpoints/model-garden-vision/detection/resnet50-2018-02-07.tar.gz). - If you did not install the package, add the repository root folder to the `PYTHONPATH` environment variable. - Go to the `examples/tensorflow/object_detection` folder. 
- Run the following command to start training RetinaNet from scratch on all available GPUs on the machine: + ```bash python main.py \ --mode=train \ @@ -152,9 +164,10 @@ To export a model to the OpenVINO IR and run it using the Intel® Deep Learning --data= \ --log-dir=../../results/quantization/retinanet_coco_baseline \ --backbone-checkpoint= + ``` + - Export trained model to the Keras H5 format. ## Results - -Please see compression results for Tensorflow object detection at our [Model Zoo page](../../../docs/ModelZoo.md#tensorflow-object-detection). \ No newline at end of file +Please see compression results for Tensorflow object detection at our [Model Zoo page](../../../docs/ModelZoo.md#tensorflow-object-detection). diff --git a/examples/tensorflow/segmentation/README.md b/examples/tensorflow/segmentation/README.md index 215cbdbb592..7dfb531ffcf 100644 --- a/examples/tensorflow/segmentation/README.md +++ b/examples/tensorflow/segmentation/README.md @@ -6,7 +6,7 @@ The sample receives a configuration file where the training schedule, hyper-para ## Features -- Mask R-CNN from the official [TF repository](https://github.com/tensorflow/models/tree/master/official/vision/detection) with minor modifications (custom implementation of upsamling is replaced with equivalent tf.keras.layers.UpSampling2D). +- Mask R-CNN from the official [TF repository](https://github.com/tensorflow/models/tree/master/official/vision/detection) with minor modifications (custom implementation of upsampling is replaced with equivalent tf.keras.layers.UpSampling2D). - Support TFRecords for COCO2017 dataset. - Configuration file examples for sparsity, quantization, and quantization with sparsity. - Export to Frozen Graph or TensorFlow SavedModel that is supported by the OpenVINO™ toolkit. @@ -18,7 +18,7 @@ At this point it is assumed that you have already installed nncf. You can find i To work with the sample you should install the corresponding Python package dependencies: -``` +```bash pip install -r examples/tensorflow/requirements.txt ``` @@ -49,11 +49,13 @@ The [COCO2017](https://cocodataset.org/) dataset should be specified in the conf ### Run Instance Segmentation Sample We can run the sample after data preparation. For this follow these steps: + - If you did not install the package, add the repository root folder to the `PYTHONPATH` environment variable. - Go to the `examples/tensorflow/segmentation` folder. -- Download the pre-trained Mask-R-CNN [weights](https://storage.openvinotoolkit.org/repositories/nncf/models/develop/tensorflow/mask_rcnn_coco.tar.gz) in checkpoint format and provide the path to them using `--weights` flag. +- Download the pre-trained Mask-R-CNN [weights](https://storage.openvinotoolkit.org/repositories/nncf/models/develop/tensorflow/mask_rcnn_coco.tar.gz) in checkpoint format and provide the path to them using `--weights` flag. - Specify the GPUs to be used for training by setting the environment variable [`CUDA_VISIBLE_DEVICES`](https://developer.nvidia.com/blog/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/). This is necessary because training and validation during training must be performed on different GPU devices. Please note that usually only one GPU is required for validation during training. - (Optional) Before compressing a model, it is highly recommended checking the accuracy of the pretrained model, use the following command: + ```bash python evaluation.py \ --mode=test \ @@ -63,33 +65,39 @@ We can run the sample after data preparation. 
For this follow these steps: --batch-size=1 \ --disable-compression ``` + - Run the following command to start compression with fine-tuning on all available GPUs on the machine: - ```bash - python train.py \ - --config=configs/quantization/mask_rcnn_coco_int8.json \ - --weights= \ - --data= \ - --log-dir=../../results/quantization/maskrcnn_coco_int8 - ``` + + ```bash + python train.py \ + --config=configs/quantization/mask_rcnn_coco_int8.json \ + --weights= \ + --data= \ + --log-dir=../../results/quantization/maskrcnn_coco_int8 + ``` + - Use the `--resume` flag with the path to the checkpoint to resume training from the defined checkpoint or folder with checkpoints to resume training from the last checkpoint. To start checkpoints validation during training follow these steps: + - If you did not install the package, add the repository root folder to the `PYTHONPATH` environment variable. - Go to the `examples/tensorflow/segmentation` folder. - Specify the GPUs to be used for validation during training by setting the environment variable [`CUDA_VISIBLE_DEVICES`](https://developer.nvidia.com/blog/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/). - Run the following command to start checkpoints validation during training: - ```bash - python evaluation.py \ - --mode=train \ - --config=configs/quantization/mask_rcnn_coco_int8.json \ - --data= \ - --batch-size=1 \ - --checkpoint-save-dir= - ``` + + ```bash + python evaluation.py \ + --mode=train \ + --config=configs/quantization/mask_rcnn_coco_int8.json \ + --data= \ + --batch-size=1 \ + --checkpoint-save-dir= + ``` ### Validate Your Model Checkpoint -To estimate the test scores of your trained model checkpoint, use the following command +To estimate the test scores of your trained model checkpoint, use the following command: + ```bash python evaluation.py \ --mode=test \ @@ -102,6 +110,7 @@ python evaluation.py \ ### Export Compressed Model To export trained model to the **Frozen Graph**, use the following command: + ```bash python evaluation.py \ --mode=export \ @@ -112,6 +121,7 @@ python evaluation.py \ ``` To export trained model to the **SavedModel**, use the following command: + ```bash python evaluation.py \ --mode=export \ @@ -126,18 +136,20 @@ python evaluation.py \ To export a model to the OpenVINO IR and run it using the Intel® Deep Learning Deployment Toolkit, refer to this [tutorial](https://software.intel.com/en-us/openvino-toolkit). ## Train MaskRCNN from scratch + - Download pre-trained ResNet-50 checkpoint from [here](https://storage.cloud.google.com/cloud-tpu-checkpoints/model-garden-vision/detection/resnet50-2018-02-07.tar.gz). - If you did not install the package, add the repository root folder to the `PYTHONPATH` environment variable. - Go to the `examples/tensorflow/segmentation` folder. - Run the following command to start training MaskRCNN from scratch on all available GPUs on the machine: - ```bash - python train.py \ - --config=configs/mask_rcnn_coco.json \ - --backbone-checkpoint= \ - --data= \ - --log-dir=../../results/quantization/maskrcnn_coco_baseline + ```bash + python train.py \ + --config=configs/mask_rcnn_coco.json \ + --backbone-checkpoint= \ + --data= \ + --log-dir=../../results/quantization/maskrcnn_coco_baseline + ``` ## Results - -Please see compression results for Tensorflow instance segmentation at our [Model Zoo page](../../../docs/ModelZoo.md#tensorflow-instance-segmentation). 
\ No newline at end of file + +Please see compression results for Tensorflow instance segmentation at our [Model Zoo page](../../../docs/ModelZoo.md#tensorflow-instance-segmentation). diff --git a/examples/torch/README.md b/examples/torch/README.md index 71e2cc5158f..be3c1321a12 100644 --- a/examples/torch/README.md +++ b/examples/torch/README.md @@ -1,26 +1,27 @@ -### Installation +# Installation Install the packages needed for samples by running the following in the current directory: -``` +```bash pip install -r requirements.txt ``` One of the needed package - torchvision. The version of torchvision should always match the version of installed torch package. Please refer to the [table](https://github.com/pytorch/pytorch/wiki/PyTorch-Versions#domain-version-compatibility-matrix-for-pytorch) to find compatible versions of torchvision and torch. -By default, if there is no torchvision in your Python environment it installs the package that is compatible with -the best known torch version (`BKC_TORCH_VERSION` in the code). In that case if your environment has the torch version, +By default, if there is no torchvision in your Python environment it installs the package that is compatible with +the best known torch version (`BKC_TORCH_VERSION` in the code). In that case if your environment has the torch version, which is different from best known one, you should install the corresponding torchvision package by yourself. -For example, if you need torch 1.9.1 (not best known version) with CUDA11 support, we recommend specifying the -corresponding torchvision version as follows in the root nncf directory: +For example, if you need torch 1.9.1 (not best known version) with CUDA11 support, we recommend specifying the +corresponding torchvision version as follows in the root nncf directory: -``` +```bash pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html pip install .[torch] pip install -r examples/torch/requirements.txt ``` -### Results -Please see compression results for PyTorch models at our [Model Zoo page](../../../docs/ModelZoo.md#pytorch). \ No newline at end of file +## Results + +Please see compression results for PyTorch models at our [Model Zoo page](../../../docs/ModelZoo.md#pytorch). diff --git a/examples/torch/classification/README.md b/examples/torch/classification/README.md index f2f4dfaab6f..36c85eecd49 100644 --- a/examples/torch/classification/README.md +++ b/examples/torch/classification/README.md @@ -17,7 +17,7 @@ At this point it is assumed that you have already installed nncf. You can find i To work with the sample you should install the corresponding Python package dependencies: -``` +```bash pip install -r examples/torch/requirements.txt ``` @@ -25,16 +25,16 @@ pip install -r examples/torch/requirements.txt This scenario demonstrates quantization with fine-tuning of MobileNet v2 on the ImageNet dataset. -#### Dataset Preparation +### Dataset Preparation To prepare the ImageNet dataset, refer to the following [tutorial](https://github.com/pytorch/examples/tree/master/imagenet). -#### Run Classification Sample +### Run Classification Sample - If you did not install the package, add the repository root folder to the `PYTHONPATH` environment variable. - Go to the `examples/torch/classification` folder. -#### Test Pretrained Model +### Test Pretrained Model Before compressing a model, it is highly recommended checking the accuracy of the pretrained model. 
All models which are supported in the sample has pretrained weights for ImageNet. @@ -48,11 +48,15 @@ python main.py \ --disable-compression ``` -#### Compress Pretrained Model +### Compress Pretrained Model - Run the following command to start compression with fine-tuning on GPUs: - ``` - python main.py -m train --config configs/quantization/mobilenet_v2_imagenet_int8.json --data /data/imagenet/ --log-dir=../../results/quantization/mobilenet_v2_int8/ + + ```bash + python main.py -m train \ + --config configs/quantization/mobilenet_v2_imagenet_int8.json \ + --data /data/imagenet/ \ + --log-dir=../../results/quantization/mobilenet_v2_int8/ ``` It may take a few epochs to get the baseline accuracy results. @@ -62,27 +66,33 @@ python main.py \ - Use the `--weights` flag with the path to a compatible PyTorch checkpoint in order to load all matching weights from the checkpoint into the model - useful if you need to start compression-aware training from a previously trained uncompressed (FP32) checkpoint instead of performing compression-aware training from scratch. - Use the `--no_strip_on_export` to export not stripped model. -#### Validate Your Model Checkpoint +### Validate Your Model Checkpoint To estimate the test scores of your trained model checkpoint, use the following command: -``` -python main.py -m test --config=configs/quantization/mobilenet_v2_imagenet_int8.json --resume +```bash +python main.py -m test \ +--config=configs/quantization/mobilenet_v2_imagenet_int8.json \ +--resume ``` **WARNING**: The samples use `torch.load` functionality for checkpoint loading which, in turn, uses pickle facilities by default which are known to be vulnerable to arbitrary code execution attacks. **Only load the data you trust** -#### Export Compressed Model +### Export Compressed Model To export trained model to the ONNX format, use the following command: -``` -python main.py -m export --config=configs/quantization/mobilenet_v2_imagenet_int8.json --resume=../../results/quantization/mobilenet_v2_int8/6/checkpoints/epoch_1.pth --to-onnx=../../results/mobilenet_v2_int8.onnx +```bash +python main.py -m export \ +--config=configs/quantization/mobilenet_v2_imagenet_int8.json \ +--resume=../../results/quantization/mobilenet_v2_int8/6/checkpoints/epoch_1.pth \ +--to-onnx=../../results/mobilenet_v2_int8.onnx ``` -#### Export to OpenVINO™ Intermediate Representation (IR) +### Export to OpenVINO™ Intermediate Representation (IR) To export a model to the OpenVINO IR and run it using the Intel® Deep Learning Deployment Toolkit, refer to this [tutorial](https://software.intel.com/en-us/openvino-toolkit). -### Results -Please see compression results for PyTorch classification at our [Model Zoo page](../../../docs/ModelZoo.md#pytorch-classification). \ No newline at end of file +## Results + +Please see compression results for PyTorch classification at our [Model Zoo page](../../../docs/ModelZoo.md#pytorch-classification). diff --git a/examples/torch/object_detection/README.md b/examples/torch/object_detection/README.md index 384e535aa2a..88eb11380fc 100644 --- a/examples/torch/object_detection/README.md +++ b/examples/torch/object_detection/README.md @@ -2,7 +2,7 @@ This sample demonstrates DL model compression capabilities for object detection task. -## Features: +## Features - Vanilla SSD300 / SSD512 (+ Batch Normalization), MobileNetSSD-300 - VOC2007 / VOC2012, COCO datasets @@ -17,7 +17,7 @@ At this point it is assumed that you have already installed nncf. 
You can find i To work with the sample you should install the corresponding Python package dependencies: -``` +```bash pip install -r examples/torch/requirements.txt ``` @@ -25,16 +25,17 @@ pip install -r examples/torch/requirements.txt This scenario demonstrates quantization with fine-tuning of SSD300 on VOC dataset. -#### Dataset preparation +### Dataset preparation - Download and extract in one folder train/val+test VOC2007 and train/val VOC2012 data from [here](https://pjreddie.com/projects/pascal-voc-dataset-mirror/) - In the future, `` means the path to this folder. -#### Run object detection sample +### Run object detection sample - If you did not install the package then add the repository root folder to the `PYTHONPATH` environment variable - Navigate to the `examples/torch/object_detection` folder - (Optional) Before compressing a model, it is highly recommended checking the accuracy of the pretrained model, use the following command: + ```bash python main.py \ --mode=test \ @@ -42,6 +43,7 @@ This scenario demonstrates quantization with fine-tuning of SSD300 on VOC datase --data= \ --disable-compression ``` + - Run the following command to start compression with fine-tuning on GPUs: `python main.py -m train --config configs/ssd300_vgg_voc_int8.json --data --log-dir=../../results/quantization/ssd300_int8 --weights=`It may take a few epochs to get the baseline accuracy results. - Use `--weights` flag with the path to a compatible PyTorch checkpoint in order to load all matching weights from the checkpoint into the model - useful if you need to start compression-aware training from a previously trained uncompressed (FP32) checkpoint instead of performing compression-aware training from scratch. This flag is optional, but highly recommended to use. @@ -49,7 +51,7 @@ This scenario demonstrates quantization with fine-tuning of SSD300 on VOC datase - Use `--resume` flag with the path to a previously saved model to resume training. - Use the `--no_strip_on_export` to export not stripped model. -#### Validate your model checkpoint +### Validate your model checkpoint To estimate the test scores of your trained model checkpoint use the following command: `python main.py -m test --config=configs/ssd300_vgg_voc_int8.json --data --resume ` @@ -57,14 +59,15 @@ If you want to validate an FP32 model checkpoint, make sure the compression algo **WARNING**: The samples use `torch.load` functionality for checkpoint loading which, in turn, uses pickle facilities by default which are known to be vulnerable to arbitrary code execution attacks. **Only load the data you trust** -#### Export compressed model +### Export compressed model To export trained model to ONNX format use the following command: `python main.py -m export --config configs/ssd300_vgg_voc_int8.json --data --resume --to-onnx=../../results/ssd300_int8.onnx` -#### Export to OpenVINO Intermediate Representation (IR) +### Export to OpenVINO Intermediate Representation (IR) To export a model to OpenVINO IR and run it using Intel Deep Learning Deployment Toolkit please refer to this [tutorial](https://software.intel.com/en-us/openvino-toolkit). -### Results -Please see compression results for PyTorch object detection at our [Model Zoo page](../../../docs/ModelZoo.md#pytorch-object-detection). \ No newline at end of file +## Results + +Please see compression results for PyTorch object detection at our [Model Zoo page](../../../docs/ModelZoo.md#pytorch-object-detection). 
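As a quick reference, the SSD300 VOC INT8 steps described in the object detection sample above can be chained together as in the sketch below. The dataset root, FP32 weights, and checkpoint paths are illustrative placeholders to substitute for your own environment, not values fixed by the sample.

```bash
# Sketch of the SSD300-VGG VOC INT8 flow from the object detection sample above.
# DATA, FP32_WEIGHTS and INT8_CKPT are placeholders; substitute your own paths.
cd examples/torch/object_detection
DATA=/path/to/voc_root                     # folder with extracted VOC2007 and VOC2012
FP32_WEIGHTS=/path/to/ssd300_fp32.pth
INT8_CKPT=/path/to/compressed_checkpoint.pth

# (Optional) check the accuracy of the pretrained FP32 model first
python main.py -m test --config configs/ssd300_vgg_voc_int8.json \
    --data "$DATA" --disable-compression

# Compression with fine-tuning, initialized from the FP32 checkpoint
python main.py -m train --config configs/ssd300_vgg_voc_int8.json \
    --data "$DATA" --weights "$FP32_WEIGHTS" \
    --log-dir ../../results/quantization/ssd300_int8

# Validate a saved compressed checkpoint
python main.py -m test --config configs/ssd300_vgg_voc_int8.json \
    --data "$DATA" --resume "$INT8_CKPT"

# Export the compressed model to ONNX
python main.py -m export --config configs/ssd300_vgg_voc_int8.json \
    --data "$DATA" --resume "$INT8_CKPT" \
    --to-onnx=../../results/ssd300_int8.onnx
```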
diff --git a/examples/torch/semantic_segmentation/README.md b/examples/torch/semantic_segmentation/README.md index 1af5433076f..17f54beb7e6 100644 --- a/examples/torch/semantic_segmentation/README.md +++ b/examples/torch/semantic_segmentation/README.md @@ -2,7 +2,7 @@ This sample demonstrates DL model compression capabilities for semantic segmentation problem -## Features: +## Features - UNet and ICNet with implementations as close as possible to the original papers - Loaders for CamVid, Cityscapes (20-class), Mapillary Vistas(20-class), Pascal VOC (reuses the loader integrated into torchvision) @@ -17,7 +17,7 @@ At this point it is assumed that you have already installed nncf. You can find i To work with the sample you should install the corresponding Python package dependencies: -``` +```bash pip install -r examples/torch/requirements.txt ``` @@ -25,15 +25,16 @@ pip install -r examples/torch/requirements.txt This scenario demonstrates quantization with fine-tuning of UNet on Mapillary Vistas dataset. -#### Dataset preparation +### Dataset preparation - Obtain a copy of Mapillary Vistas train/val data [here](https://www.mapillary.com/dataset/vistas/) -#### Run semantic segmentation sample +### Run semantic segmentation sample - If you did not install the package then add the repository root folder to the `PYTHONPATH` environment variable - Navigate to the `examples/torch/segmentation` folder - (Optional) Before compressing a model, it is highly recommended checking the accuracy of the pretrained model, use the following command: + ```bash python main.py \ --mode=test \ @@ -43,6 +44,7 @@ This scenario demonstrates quantization with fine-tuning of UNet on Mapillary Vi --batch-size=1 \ --disable-compression ``` + - Run the following command to start compression with fine-tuning on GPUs: `python main.py -m train --config configs/unet_mapillary_int8.json --data --weights ` @@ -56,7 +58,7 @@ It may take a few epochs to get the baseline accuracy results. om scratch. - Use the `--no_strip_on_export` to export not stripped model. -#### Validate your model checkpoint +### Validate your model checkpoint To estimate the test scores of your trained model checkpoint use the following command: `python main.py -m test --config=configs/unet_mapillary_int8.json --resume ` @@ -64,14 +66,15 @@ If you want to validate an FP32 model checkpoint, make sure the compression algo **WARNING**: The samples use `torch.load` functionality for checkpoint loading which, in turn, uses pickle facilities by default which are known to be vulnerable to arbitrary code execution attacks. **Only load the data you trust** -#### Export compressed model +### Export compressed model To export trained model to ONNX format use the following command: `python main.py --mode export --config configs/unet_mapillary_int8.json --data --resume --to-onnx unet_int8.onnx` -#### Export to OpenVINO Intermediate Representation (IR) +### Export to OpenVINO Intermediate Representation (IR) To export a model to OpenVINO IR and run it using Intel Deep Learning Deployment Toolkit please refer to this [tutorial](https://software.intel.com/en-us/openvino-toolkit). -### Results -Please see compression results for PyTorch semantic segmentation at our [Model Zoo page](../../../docs/ModelZoo.md#pytorch-semantic-segmentation). \ No newline at end of file +## Results + +Please see compression results for PyTorch semantic segmentation at our [Model Zoo page](../../../docs/ModelZoo.md#pytorch-semantic-segmentation). 
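Similarly, the UNet Mapillary INT8 steps from the semantic segmentation sample above can be chained as in the sketch below; again, the dataset root and checkpoint paths are placeholders rather than values prescribed by the sample.

```bash
# Sketch of the UNet Mapillary INT8 flow from the semantic segmentation sample above.
# DATA, FP32_WEIGHTS and INT8_CKPT are placeholders; substitute your own paths.
DATA=/path/to/mapillary_vistas
FP32_WEIGHTS=/path/to/unet_fp32.pth
INT8_CKPT=/path/to/compressed_checkpoint.pth

# Compression with fine-tuning, initialized from the FP32 checkpoint
python main.py -m train --config configs/unet_mapillary_int8.json \
    --data "$DATA" --weights "$FP32_WEIGHTS"

# Validate a saved compressed checkpoint
python main.py -m test --config=configs/unet_mapillary_int8.json --resume "$INT8_CKPT"

# Export the compressed model to ONNX
python main.py --mode export --config configs/unet_mapillary_int8.json \
    --data "$DATA" --resume "$INT8_CKPT" --to-onnx unet_int8.onnx
```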
diff --git a/nncf/experimental/torch/nas/bootstrapNAS/BootstrapNAS.md b/nncf/experimental/torch/nas/bootstrapNAS/BootstrapNAS.md index ecc6be129bf..f60322dbda8 100644 --- a/nncf/experimental/torch/nas/bootstrapNAS/BootstrapNAS.md +++ b/nncf/experimental/torch/nas/bootstrapNAS/BootstrapNAS.md @@ -1,96 +1,98 @@ -### BootstrapNAS +# BootstrapNAS -Automated generation of weight-sharing super-networks (Cai, Gan, et al., 2020) for Neural Architecture Search (NAS) (Elsken et al., 2019). A weight-sharing super-network is a data structure from which smaller and more efficient sub-networks can be extracted. +Automated generation of weight-sharing super-networks (Cai, Gan, et al., 2020) for Neural Architecture Search (NAS) (Elsken et al., 2019). A weight-sharing super-network is a data structure from which smaller and more efficient sub-networks can be extracted.

*Figure: BootstrapNAS Architecture*

-BootstrapNAS (1) takes as input a pre-trained model. (2) It uses this model to generate a weight-sharing super-network. (3) BootstrapNAS then applies a training strategy, and once the super-network has been trained, (4) it searches for efficient subnetworks that satisfy the user's requirements. (5) The configuration of the discovered sub-network(s) is returned to the user. +BootstrapNAS (1) takes as input a pre-trained model. (2) It uses this model to generate a weight-sharing super-network. (3) BootstrapNAS then applies a training strategy, and once the super-network has been trained, (4) it searches for efficient subnetworks that satisfy the user's requirements. (5) The configuration of the discovered sub-network(s) is returned to the user. -The parameters for generating, training and searching on the super-network are defined in a configuration file within two exclusive subsets of parameters for training and search: -```json - "bootstrapNAS": { - "training": { - ... - }, - "search": { - ... - } +The parameters for generating, training and searching on the super-network are defined in a configuration file within two exclusive subsets of parameters for training and search: + +```json5 +"bootstrapNAS": { + "training": { + ... + }, + "search": { + ... } +} ``` -In the `training` section, you specify the training algorithm, e.g., `progressive_shrinking`, schedule and elasticity parameters: +In the `training` section, you specify the training algorithm, e.g., `progressive_shrinking`, schedule and elasticity parameters: ```json "training": { - "algorithm": "progressive_shrinking", - "progressivity_of_elasticity": ["depth", "width"], + "algorithm": "progressive_shrinking", + "progressivity_of_elasticity": ["depth", "width"], "batchnorm_adaptation": { "num_bn_adaptation_samples": 1500 }, - "schedule": { + "schedule": { "list_stage_descriptions": [ - {"train_dims": ["depth"], "epochs": 25, "depth_indicator": 1, "init_lr": 2.5e-6, "epochs_lr": 25}, - {"train_dims": ["depth"], "epochs": 40, "depth_indicator": 2, "init_lr": 2.5e-6, "epochs_lr": 40}, - {"train_dims": ["depth", "width"], "epochs": 50, "depth_indicator": 2, "reorg_weights": true, "width_indicator": 2, "bn_adapt": true, "init_lr": 2.5e-6, "epochs_lr": 50}, - {"train_dims": ["depth", "width"], "epochs": 50, "depth_indicator": 2, "reorg_weights": true, "width_indicator": 3, "bn_adapt": true, "init_lr": 2.5e-6, "epochs_lr": 50} - ] - }, + {"train_dims": ["depth"], "epochs": 25, "depth_indicator": 1, "init_lr": 2.5e-6, "epochs_lr": 25}, + {"train_dims": ["depth"], "epochs": 40, "depth_indicator": 2, "init_lr": 2.5e-6, "epochs_lr": 40}, + {"train_dims": ["depth", "width"], "epochs": 50, "depth_indicator": 2, "reorg_weights": true, "width_indicator": 2, "bn_adapt": true, "init_lr": 2.5e-6, "epochs_lr": 50}, + {"train_dims": ["depth", "width"], "epochs": 50, "depth_indicator": 2, "reorg_weights": true, "width_indicator": 3, "bn_adapt": true, "init_lr": 2.5e-6, "epochs_lr": 50} + ] + }, "elasticity": { "available_elasticity_dims": ["width", "depth"], "width": { "max_num_widths": 3, "min_width": 32, - "width_step": 32, + "width_step": 32, "width_multipliers": [1, 0.80, 0.60] }, - ... + ... } - ``` -In the search section, you specify the search algorithm, e.g., `NSGA-II` and its parameters. For example: + +In the search section, you specify the search algorithm, e.g., `NSGA-II` and its parameters. 
For example: + ```json "search": { "algorithm": "NSGA2", - "num_evals": 3000, - "population": 50, - "ref_acc": 93.65 + "num_evals": 3000, + "population": 50, + "ref_acc": 93.65 } ``` -By default, BootstrapNAS uses `NSGA-II` (Dev et al., 2002), an genetic algorithm that constructs a pareto front of efficient sub-networks. +By default, BootstrapNAS uses `NSGA-II` (Dev et al., 2002), an genetic algorithm that constructs a pareto front of efficient sub-networks. -List of parameters that can be used in the configuration file: +List of parameters that can be used in the configuration file: **Training:** `algorithm`: Defines training strategy for tuning supernet. By default, `progressive_shrinking`. -`progressivity_of_elasticity`: Defines the order of adding a new elasticity dimension from stage to stage. +`progressivity_of_elasticity`: Defines the order of adding a new elasticity dimension from stage to stage. examples=["width", "depth", "kernel"]. -`batchnorm_adaptation`: Specifies the number of samples from the training dataset to use for model inference during the +`batchnorm_adaptation`: Specifies the number of samples from the training dataset to use for model inference during the BatchNorm statistics adaptation procedure for the compressed model. -`schedule`: The schedule section includes a list of stage descriptors (`list_stage_descriptions`) that specify the -elasticity dimensions enabled for a particular stage (`train_dims`), the number of `epochs` for the stage, the -`depth_indicator` which in the case of elastic depth, restricts the maximum number of blocks in each independent group -that can be skipped, the `width_indicator`, which restricts the maximum number of width values in each elastic layer. -The user can also specify whether weights should be reorganized (`reorg_weights`), whether batch norm adaptation should -be triggered at the beginning of the stage (`bn_adapt`), the initial learning rate for the stage (`init_lr`), and -the epochs to use for adjusting the learning rate (`epochs_lr`). +`schedule`: The schedule section includes a list of stage descriptors (`list_stage_descriptions`) that specify the +elasticity dimensions enabled for a particular stage (`train_dims`), the number of `epochs` for the stage, the +`depth_indicator` which in the case of elastic depth, restricts the maximum number of blocks in each independent group +that can be skipped, the `width_indicator`, which restricts the maximum number of width values in each elastic layer. +The user can also specify whether weights should be reorganized (`reorg_weights`), whether batch norm adaptation should +be triggered at the beginning of the stage (`bn_adapt`), the initial learning rate for the stage (`init_lr`), and +the epochs to use for adjusting the learning rate (`epochs_lr`). -`elasticity`: Currently, BootstrapNAS supports three elastic dimensions (`kernel`, `width` and `depth`). -Elastic depth automatically finds blocks to skip, by default. The user can specify the `min_block_size`, i.e., minimal +`elasticity`: Currently, BootstrapNAS supports three elastic dimensions (`kernel`, `width` and `depth`). +Elastic depth automatically finds blocks to skip, by default. The user can specify the `min_block_size`, i.e., minimal number of operations in the skipping block, and the `max_block_size`, i.e., maximal number of operations in the block. -Alternatively, one can specify list of blocks to skip manually via `skipped_blocks`. 
-In the case of elastic width, the user can specify the `min_width`, i.e., the minimal number of output channels that -can be activated for each layers with elastic width. Default value is 32, the `max_num_widths`, which restricts total -number of different elastic width values for each layer, a `width_step`, which defines a step size for a generation of -the elastic width search space, or a `width_multiplier` to define the elastic width search space via a list of multipliers. -Finally, the user can determine the type of filter importance metric: L1, L2 or geometric mean. L2 is selected by default. -For elastic kernel, the user can specify the `max_num_kernels`, which restricts the total number of different elastic +Alternatively, one can specify list of blocks to skip manually via `skipped_blocks`. +In the case of elastic width, the user can specify the `min_width`, i.e., the minimal number of output channels that +can be activated for each layers with elastic width. Default value is 32, the `max_num_widths`, which restricts total +number of different elastic width values for each layer, a `width_step`, which defines a step size for a generation of +the elastic width search space, or a `width_multiplier` to define the elastic width search space via a list of multipliers. +Finally, the user can determine the type of filter importance metric: L1, L2 or geometric mean. L2 is selected by default. +For elastic kernel, the user can specify the `max_num_kernels`, which restricts the total number of different elastic kernel values for each layer. `train_steps`: Defines the number of samples used for each training epoch. @@ -107,8 +109,7 @@ kernel values for each layer. `ref_acc`: Defines the reference accuracy from the pre-trained model used to generate the super-network. -For more information about BootstrapNAS and to cite this work, please refer to the following publications: - +For more information about BootstrapNAS and to cite this work, please refer to the following publications: [Automated Super-Network Generation for Scalable Neural Architecture Search](https://openreview.net/attachment?id=HK-zmbTB8gq&name=main_paper_and_supplementary_material). @@ -122,9 +123,10 @@ For more information about BootstrapNAS and to cite this work, please refer to t url={https://openreview.net/forum?id=HK-zmbTB8gq} } ``` + [Enabling NAS with Automated Super-Network Generation](https://arxiv.org/abs/2112.10878) -```BibTex +```bibtex @article{ bootstrapNAS, author = {Mu{\~{n}}oz, J. Pablo and Lyalyushkin, Nikolay and Akhauri, Yash and Senina, Anastasia and Kozlov, Alexander and Jain, Nilesh}, @@ -140,10 +142,10 @@ For more information about BootstrapNAS and to cite this work, please refer to t } ``` -#### References +## References - Cai, H., C. Gan, et al. (2020). “Once for All: Train One Network and Specialize it for Efficient Deployment”. In: International Conference on Learning Representations. - Deb, K., A. Pratap, et al. (2002). “A fast and elitist multiobjective genetic algorithm: NSGA-II”. In: 303 IEEE Transactions on Evolutionary Computation 6.2, pp. 182–197. -- Elsken, T., J. H. Metzen, and F. Hutter (2019). “Neural Architecture Search: A Survey”. In: Journal of Machine Learning Research 20.55, pp. 1–21. \ No newline at end of file +- Elsken, T., J. H. Metzen, and F. Hutter (2019). “Neural Architecture Search: A Survey”. In: Journal of Machine Learning Research 20.55, pp. 1–21. 
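Putting the pieces together, the `training` and `search` fragments quoted in the BootstrapNAS documentation above can live in a single `bootstrapNAS` section. The sketch below merely reassembles the values shown earlier and trims the elasticity block to the documented `width` fields; it is illustrative only, and the authoritative list of supported fields is the NNCF configuration schema.

```json
{
    "bootstrapNAS": {
        "training": {
            "algorithm": "progressive_shrinking",
            "progressivity_of_elasticity": ["depth", "width"],
            "batchnorm_adaptation": {"num_bn_adaptation_samples": 1500},
            "schedule": {
                "list_stage_descriptions": [
                    {"train_dims": ["depth"], "epochs": 25, "depth_indicator": 1, "init_lr": 2.5e-6, "epochs_lr": 25},
                    {"train_dims": ["depth", "width"], "epochs": 50, "depth_indicator": 2, "reorg_weights": true, "width_indicator": 2, "bn_adapt": true, "init_lr": 2.5e-6, "epochs_lr": 50}
                ]
            },
            "elasticity": {
                "available_elasticity_dims": ["width", "depth"],
                "width": {"max_num_widths": 3, "min_width": 32, "width_step": 32}
            }
        },
        "search": {
            "algorithm": "NSGA2",
            "num_evals": 3000,
            "population": 50,
            "ref_acc": 93.65
        }
    }
}
```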
diff --git a/nncf/experimental/torch/replace_custom_modules/__init__.py b/nncf/experimental/torch/replace_custom_modules/__init__.py new file mode 100644 index 00000000000..9b29b47534a --- /dev/null +++ b/nncf/experimental/torch/replace_custom_modules/__init__.py @@ -0,0 +1,10 @@ +# Copyright (c) 2023 Intel Corporation +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# http://www.apache.org/licenses/LICENSE-2.0 +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. diff --git a/nncf/experimental/torch/sparsity/movement/MovementSparsity.md b/nncf/experimental/torch/sparsity/movement/MovementSparsity.md index 986922c4219..39dae627c3f 100644 --- a/nncf/experimental/torch/sparsity/movement/MovementSparsity.md +++ b/nncf/experimental/torch/sparsity/movement/MovementSparsity.md @@ -1,4 +1,4 @@ -### Movement Sparsity +# Movement Sparsity [Movement Pruning (Sanh et al., 2020)](https://arxiv.org/pdf/2005.07683.pdf) is an effective learning-based unstructured sparsification algorithm, especially for Transformer-based models in transfer learning setup. [Lagunas et al., 2021](https://arxiv.org/pdf/2109.04838.pdf) extends the algorithm to sparsify by block grain size, enabling structured sparsity which can achieve device-agnostic inference acceleration. @@ -6,7 +6,7 @@ NNCF implements both unstructured and structured movement sparsification. The im For usage explanation of the algorithm, let's start with an example configuration below which is targeted for BERT models. -**Example configuration of Movement Sparsity for BERT models** +## Example configuration of Movement Sparsity for BERT models ```json { @@ -39,20 +39,20 @@ This diagram is the sparsity level of BERT-base model over the optimization life 2. **Structured masking and fine-tuning**: At the end of first stage, i.e. `warmup_end_epoch`, the sparsified model cannot be accelerated without tailored HW/SW but some sparse structures can be totally discarded from the model to save compute and memory footprint. NNCF provides mechanism to achieve structured masking by `"enable_structured_masking": true`, where it automatically resolves the structured masking between dependent layers and rewinds the sparsified parameters that does not participate in acceleration for task modeling. In the example above, the sparsity level has dropped after `warmup_end_epoch` due to structured masking and the model will continue to fine-tune thereafter. Currently, the automatic structured masking feature was tested on **_BERT, DistilBERT, RoBERTa, MobileBERT, Wav2Vec2, Swin, ViT, CLIPVisual_** architectures defined by [Hugging Face's transformers](https://huggingface.co/docs/transformers/index). Support for other architectures is not guaranteed. Users can disable this feature by setting `"enable_structured_masking": false`, where the sparse structures at the end of first stage will be frozen and training/fine-tuning will continue on unmasked parameters. Please refer next section to realize model inference acceleration with [OpenVINO](https://docs.openvino.ai/latest/index.html) toolchain. 
-#### Inference Acceleration via [OpenVINO](https://docs.openvino.ai/latest/index.html) +## Inference Acceleration via [OpenVINO](https://docs.openvino.ai/latest/index.html) Optimized models are compatible with OpenVINO toolchain. Use `compression_controller.export_model("movement_sparsified_model.onnx")` to export model in onnx format. Sparsified parameters in the onnx are in value of zero. Structured sparse structures can be discarded during ONNX translation to OpenVINO IR using [Model Optimizer](https://docs.openvino.ai/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html) with additional option `--transform=Pruning`. Corresponding IR is compressed and deployable with [OpenVINO Runtime](https://docs.openvino.ai/latest/openvino_docs_OV_UG_OV_Runtime_User_Guide.html). To quantify inference performance improvement, both ONNX and IR can be profiled using [Benchmark Tool](https://docs.openvino.ai/latest/openvino_inference_engine_tools_benchmark_tool_README.html). -#### Getting Started +## Getting Started Please refer [optimum-intel](https://github.com/huggingface/optimum-intel/tree/main/examples/openvino) for example pipelines on image classification, question answering, etc. The repository also provides examples of joint pruning, quantization and distillation, end-to-end from NNCF optimization to compressed OpenVINO IR. -#### Known Limitation +## Known Limitation 1. Movement sparsification only supports `torch.nn.Linear` layers. 2. Automatic structured masking feature supports **BERT, DistilBERT, RoBERTa, MobileBERT, Wav2Vec2, Swin, ViT, CLIPVisual** architectures defined by [Hugging Face's transformers](https://huggingface.co/docs/transformers/index). Other similar architectures may work, but support is not guaranteed. -#### Detailed description of Movement Sparsity configuration +## Detailed description of Movement Sparsity configuration - `algorithm`: The algorithm name is "movement_sparsity". - `warmup_start_epoch` & `warmup_end_epoch`: The algorithm will conduct model weight sparsification gradually from epoch >= `warmup_start_epoch` to epoch < `warmup_end_epoch`, with epoch is zero-indexed. This span is known as sparsification warm-up (stage 1). @@ -68,7 +68,7 @@ Please refer [optimum-intel](https://github.com/huggingface/optimum-intel/tree/m - `ignored_scopes`: A string or a list of strings representing the layers to be ignored by Movement Sparsity algorithm. -#### Extra configuration in `params` section +## Extra configuration in `params` section Following arguments have been defaulted to work well out of the box. However, you can specify them for a more controlled sparsification strategy. @@ -76,7 +76,7 @@ Following arguments have been defaulted to work well out of the box. However, yo - `power`: Optional. The importance threshold and regularization factor follow a concave polynomial warm-up schedule where its decay factor is parameterized by `power`. Default is 3. - `steps_per_epoch`: Optional. Number of steps per epoch is needed for threshold and regularization factor scheduling. It varies by dataset size and training hyperparameters. By default, this can be automatically derived during the first epoch without any side effect, as long as `warmup_start_epoch` >= 1. Specification of `steps_per_epoch` is only required when warm-up sparsification is intended to start at the first epoch. -#### References +## References 1. Victor Sanh, Thomas Wolf, and Alexander M. Rush. 2020. 
[Movement Pruning: Adaptive Sparsity by Fine-Tuning]((https://arxiv.org/pdf/2005.07683.pdf)). In Advances in Neural Information Processing Systems, 33, pp. 20378-20389. 2. François Lagunas, Ella Charlaix, Victor Sanh, and Alexander M. Rush. 2021. [Block Pruning For Faster Transformers]((https://arxiv.org/pdf/2109.04838.pdf)). In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 10619–10629. diff --git a/tests/onnx/README.md b/tests/onnx/README.md index 274fa2b179c..b094b810571 100644 --- a/tests/onnx/README.md +++ b/tests/onnx/README.md @@ -7,9 +7,9 @@ We provide two types of tests. This is a test that the CI server runs for every PR. It consists of unit tests of ONNX features of NNCF. To run the pre-commit test, please execute the following command. ```bash - $ pytest tests/onnx --junitxml nncf-tests.xml + pytest tests/onnx --junitxml nncf-tests.xml # (alias) - $ make test-onnx + make test-onnx ``` 2. E2E test (pytest markers: `e2e_ptq` and `e2e_eval_reference_model`) @@ -17,7 +17,7 @@ We provide two types of tests. This is a test to validate ONNX PTQ API functionality for the models in ONNX Model ZOO. It compares the quantized model accuracy with the references. To run the E2E test, please execute the following command. ```bash - $ pytest tests/onnx -m e2e_ptq --model-dir (model_dir) --data-dir (data_dir) --output-dir (output_dir) --ckpt-dir (ckpt_dir) --anno-dir (anno_dir) --eval-size (eval_size) --ptq-size (ptq_size) + pytest tests/onnx -m e2e_ptq --model-dir (model_dir) --data-dir (data_dir) --output-dir (output_dir) --ckpt-dir (ckpt_dir) --anno-dir (anno_dir) --eval-size (eval_size) --ptq-size (ptq_size) ``` You should give three arguments to run this test. @@ -30,13 +30,11 @@ We provide two types of tests. 6. (Optional) `--anno-dir`: Directory path for dataset annotations. Please refer to [OpenVINO accuracy checker](https://github.com/openvinotoolkit/open_model_zoo/tree/master/tools/accuracy_checker). 7. (Optional) `--eval-size`: The number of samples for evaluation. 8. (Optional) `--ptq-size`: The number of samples for calibrating quantization parameters. - 9. (Optional) `--enable-ov-ep`: If the parameter is set then the accuracy validation of the quantized models - will be enabled for OpenVINOExecutionProvider. - 10. (Optional) `--disable-cpu-ep`: If the parameter is set then the accuracy validation of the quantized models - will be disabled for CPUExecutionProvider. + 9. (Optional) `--enable-ov-ep`: If the parameter is set then the accuracy validation of the quantized models will be enabled for OpenVINOExecutionProvider. + 10. (Optional) `--disable-cpu-ep`: If the parameter is set then the accuracy validation of the quantized models will be disabled for CPUExecutionProvider. If you want to test the reference (not quantized) model accuracy - try the following command. 
```bash - $ pytest tests/onnx -m e2e_eval_reference_model --model-dir (model_dir) --data-dir (data_dir) --output-dir (output_dir) --ckpt-dir (ckpt_dir) --anno-dir (anno_dir) --eval-size (eval_size) --ptq-size (ptq_size) + pytest tests/onnx -m e2e_eval_reference_model \--model-dir (model_dir) --data-dir (data_dir) --output-dir (output_dir) --ckpt-dir (ckpt_dir) --anno-dir (anno_dir) --eval-size (eval_size) --ptq-size (ptq_size) ``` diff --git a/tests/onnx/benchmarking/README.md b/tests/onnx/benchmarking/README.md index e83c61f5cf4..014caa8e5c0 100644 --- a/tests/onnx/benchmarking/README.md +++ b/tests/onnx/benchmarking/README.md @@ -1,4 +1,4 @@ -## Benchmark for ONNX Model Zoo +# Benchmark for ONNX Model Zoo ## Installation @@ -9,7 +9,7 @@ NNCF [here](https://github.com/openvinotoolkit/nncf#user-content-installation). To work with the example you should install the corresponding Python package dependencies: -``` +```bash pip install -r requirements.txt ``` @@ -20,7 +20,7 @@ uses [OpenVINO™ Accuracy Checker](https://github.com/openvinotoolkit/open_mode tool to preprocess data for quantization parameters calibration and for final accuracy validation. The benchmarking supports the following models: -* Classification +- Classification 1. [bvlcalexnet-12](https://github.com/onnx/models/blob/main/vision/classification/alexnet/model/bvlcalexnet-12.onnx) 2. [caffenet-12](https://github.com/onnx/models/blob/main/vision/classification/caffenet/model/caffenet-12.onnx) @@ -37,7 +37,7 @@ The benchmarking supports the following models: 13. [vgg16-12](https://github.com/onnx/models/blob/main/vision/classification/vgg/model/vgg16-12.onnx) 14. [zfnet512-12](https://github.com/onnx/models/blob/main/vision/classification/zfnet-512/model/zfnet512-12.onnx) -* Object detection and segmentation models +- Object detection and segmentation models 1. [FasterRCNN-12](https://github.com/onnx/models/blob/main/vision/object_detection_segmentation/faster-rcnn/model/FasterRCNN-12.onnx) 2. [MaskRCNN-12](https://github.com/onnx/models/blob/main/vision/object_detection_segmentation/mask-rcnn/model/MaskRCNN-12.onnx) @@ -60,7 +60,7 @@ and [object_detection_segmentation](./object_detection_segmentation/onnx_models_ ### 1. Prepare dataset -* Classification models +- Classification models Because we use [OpenVINO™ Accuracy Checker](https://github.com/openvinotoolkit/open_model_zoo/tree/master/tools/accuracy_checker) @@ -68,7 +68,7 @@ tool, you should prepare ILSVRC2012 validation dataset by following the [dataset preparation guide](https://github.com/openvinotoolkit/open_model_zoo/blob/2022.1.0/data/datasets.md#imagenet) . After preparation, your dataset directory will be: -``` +```text DATASET_DIR/ +-- ILSVRC2012_img_val/ | +-- ILSVRC2012_val_00000001.JPEG @@ -78,7 +78,7 @@ DATASET_DIR/ +-- val.txt ``` -* Object detection and segmentation models +- Object detection and segmentation models We use [COCO](https://github.com/openvinotoolkit/open_model_zoo/blob/2022.1.0/data/datasets.md#common-objects-in-context-coco) @@ -86,9 +86,9 @@ use [COCO](https://github.com/openvinotoolkit/open_model_zoo/blob/2022.1.0/data/ and [CityScapes](https://github.com/openvinotoolkit/open_model_zoo/blob/cf9003a95ddb742aabea341aa1573c3fa25ebbe1/data/dataset_definitions.yml#L1300-L1307) datasets. Please follow the link to prepare datasets. After preparation, your dataset directory will be: -``` +```text DATASET_DIR/ -+-- annotations/ (COCO annotatios) ++-- annotations/ (COCO annotations) | +-- instances_val2017.json | +-- ... 
+-- val2017/ (COCO images) @@ -114,26 +114,26 @@ You can run the benchmarking for particular model with the following command: ### Results -1. Classification models +#### 1. Classification models | Model Name | Dataset | FP32 Accuracy (%) | INT8 accuracy (%) | Accuracy Drop (%) | |-------------------------|----------|-------------------|-------------------|-------------------| -| bvlcalexnet-12 | ImageNet | 52.02 | 51.96 | 0.06 | -| caffenet-12 | ImageNet | 54.26 | 54.22 | 0.04 | -| densenet-12 | ImageNet | 60.96 | 60.16 | 0.8 | -| efficientnet-lite4-11 | ImageNet | 77.58 | 77.43 | 0.15 | -| googlenet-12 | ImageNet | 66.67 | 66.36 | 0.31 | -| inception-v1-12 | ImageNet | 65.21 | 64.87 | 0.34 | -| mobilenetv2-12 | ImageNet | 71.87 | 71.38 | 0.49 | -| resnet50-v1-12 | ImageNet | 74.11 | 73.92 | 0.19 | -| resnet50-v2-7 | ImageNet | 74.84 | 74.63 | 0.21 | -| shufflenet-9 | ImageNet | 47.43 | 47.25 | 0.18 | -| shufflenet-v2-12 | ImageNet | 69.36 | 68.93 | 0.43 | -| squeezenet1.0-12 | ImageNet | 54.84 | 54.3 | 0.54 | -| vgg16-12 | ImageNet | 72.02 | 72.02 | 0.0 | -| zfnet512-12 | ImageNet | 58.57 | 58.53 | 0.04 | - -2. Object detection and segmentation models +| bvlcalexnet-12 | ImageNet | 52.02 | 51.96 | 0.06 | +| caffenet-12 | ImageNet | 54.26 | 54.22 | 0.04 | +| densenet-12 | ImageNet | 60.96 | 60.16 | 0.8 | +| efficientnet-lite4-11 | ImageNet | 77.58 | 77.43 | 0.15 | +| googlenet-12 | ImageNet | 66.67 | 66.36 | 0.31 | +| inception-v1-12 | ImageNet | 65.21 | 64.87 | 0.34 | +| mobilenetv2-12 | ImageNet | 71.87 | 71.38 | 0.49 | +| resnet50-v1-12 | ImageNet | 74.11 | 73.92 | 0.19 | +| resnet50-v2-7 | ImageNet | 74.84 | 74.63 | 0.21 | +| shufflenet-9 | ImageNet | 47.43 | 47.25 | 0.18 | +| shufflenet-v2-12 | ImageNet | 69.36 | 68.93 | 0.43 | +| squeezenet1.0-12 | ImageNet | 54.84 | 54.3 | 0.54 | +| vgg16-12 | ImageNet | 72.02 | 72.02 | 0.0 | +| zfnet512-12 | ImageNet | 58.57 | 58.53 | 0.04 | + +#### 2. Object detection and segmentation models | Model Name | Dataset | FP32 mAP (%) | INT8 mAP (%) | mAP diff. (%) | |----------------------|-----------|--------------|---------------|---------------| diff --git a/tests/openvino/tools/README.md b/tests/openvino/tools/README.md index dccb6c4149e..c0178738e5b 100644 --- a/tests/openvino/tools/README.md +++ b/tests/openvino/tools/README.md @@ -1,9 +1,12 @@ -## Calibration tool for testing OpenVINO backend using POT config -### How to run +# Calibration tool for testing OpenVINO backend using POT config + +## How to run + The `calibrate.py` supports `pot` and `native` implementation of the OpenVINO backend. The implementation should be specified using `--impl` command line argument. -``` + +```bash python calibrate.py \ ---config \ ---output-dir \ ---impl pot -``` \ No newline at end of file + --config \ + --output-dir \ + --impl pot +``` diff --git a/tests/post_training/README.md b/tests/post_training/README.md index abaa6f0d16d..f42fc9271d8 100644 --- a/tests/post_training/README.md +++ b/tests/post_training/README.md @@ -1,5 +1,7 @@ # Post-training Quantization Conformance Suite + This is the test suite that takes PyTorch Timm models and runs post-training quantization on ImageNet dataset for the following three representations: + - PyTorch - ONNX - OpenVINO @@ -7,17 +9,19 @@ This is the test suite that takes PyTorch Timm models and runs post-training qua The outcome of each quantization step is accuracy and performance with OpenVINO. The source representation is converted to OpenVINO IR at this step. 
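For orientation, the sketch below shows roughly what a single quantization step of this suite does for the PyTorch representation: load a Timm model, wrap a slice of the ImageNet validation set as a calibration dataset, and run NNCF post-training quantization. This is an illustrative sketch assuming the `nncf.Dataset`/`nncf.quantize` entry points and the `/imagenet/val` layout described below; it is not the suite's actual code.

```python
# Illustrative sketch of one PTQ step on a Timm model; not the suite's actual code.
# Assumes torch, torchvision, timm and nncf are installed; names and paths are placeholders.
import timm
import torch
import nncf
from torchvision import datasets, transforms

model = timm.create_model("resnet50", pretrained=True).eval()

preprocess = transforms.Compose(
    [transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor()]
)
val_data = datasets.ImageFolder("/imagenet/val", preprocess)
loader = torch.utils.data.DataLoader(val_data, batch_size=32, shuffle=False)

# nncf.Dataset wraps an iterable and a transform that extracts the model input from each batch.
calibration_dataset = nncf.Dataset(loader, lambda batch: batch[0])

# Post-training quantization with default settings; accuracy is then measured on the same data.
quantized_model = nncf.quantize(model, calibration_dataset)
```

The ONNX and OpenVINO runs follow the same pattern on the converted representations of the source model.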
## Installation -``` + +```bash pip install -r requirements.txt ``` ## Data preparation -### Imagenet +## Imagenet /imagenet/val - name of path Since Torchvision `ImageFolder` class is used to work with data the ImageNet validation dataset should be structured accordingly. Below is an example of the `val` folder: -``` + +```text n01440764 n01695060 n01843383 @@ -25,11 +29,11 @@ n01843383 ``` ## Usage + Once the environment is installed use the following command to run the test: -``` + +```bash NUM_VAL_THREADS=8 pytest --data= --output=./tmp tests/post_training/test_quantize_conformance.py ``` `NUM_VAL_THREADS` environment variable controls the number of parallel streams when validating the model. - - diff --git a/tests/tensorflow/README.md b/tests/tensorflow/README.md index b713e20e677..7f5c13bc942 100644 --- a/tests/tensorflow/README.md +++ b/tests/tensorflow/README.md @@ -1,6 +1,7 @@ -# Tesing NNCF in Tensorflow +# Testing NNCF in Tensorflow ## Introduction + In this folder, there are test files available to test if the nncf module is installed and works properly in your local or server environment. It will test NNCF module with mock datasets(`cifar10` for classification, or `coco2017` for detection & segmentation) and mock models. Before testing make sure that symlinks from `tests/tensorflow/data` are correct. They may be corrupted if the repo was downloaded to Windows machine via git without `core.symlinks` parameter enabled. @@ -10,26 +11,31 @@ Before testing make sure that symlinks from `tests/tensorflow/data` are correct. --- ## pre-commit test + A generic way to run TF pre-commit tests is via `make`: -``` + +```bash make install-tensorflow-test make test-tensorflow ``` Another way is to run `pytest` explicitly: -``` + +```bash pytest tests/common tests/tensorflow \ - --junitxml nncf-tests.xml + --junitxml nncf-tests.xml ``` + The tests results will be saved in `nncf-tests.xml`. ## nightly-test - Below is a description of the parameters to be used when building. -``` + +```text --ignore-unknown-dependency - ignore dependencies whose outcome is not known ---data=DATA-DIR Path to test datasets + ignore dependencies whose outcome is not known +--data=DATA-DIR Path to test datasets --sota-checkpoints-dir=SOTA_CHECKPOINTS_DIR Path to checkpoints directory for sota accuracy test --sota-data-dir=SOTA_DATA_DIR @@ -46,15 +52,17 @@ The tests results will be saved in `nncf-tests.xml`. ``` ### test_sanity_sample.py + In this file, you will **test the basic training and evalutation loop in NNCF**. The `generate_config_params` function will generate some test configs that will be tested, and it will be saved into `CONFIG_PARAMS`. One example in `CONFIG_PARAMS` is like: `('classification', '{nncf-dir}/tests/tensorflow/data/configs/resnet50_cifar10_magnitude_sparsity_int8.json', 'cifar10', 'tfrecord')`. The functions `test_model_eval`, `test_model_train`, `test_trained_model_eval`, or other similar functions are the key functions in this file. It receives parameters from config which is generated as sample, and the variable `main` in this function will get main function which is defined in each task(e.g. for classification: `examples/tensorflow/classification/main.py`). Each function will test the model from checkpoint, or train the model with 1~2 epochs, or test the onnx exporting of the tf model. 
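To make the flow above concrete, a config-driven sanity test of this kind can be pictured as the sketch below. This is a simplified, hypothetical shape rather than the contents of `test_sanity_sample.py`; the helper function and the sample command-line flags are illustrative assumptions.

```python
# Hypothetical shape of a config-driven sanity test; flag names and helpers are illustrative.
import importlib

import pytest

# (sample_type, nncf_config_path, dataset_name, dataset_type), as in the CONFIG_PARAMS example above.
CONFIG_PARAMS = [
    (
        "classification",
        "tests/tensorflow/data/configs/resnet50_cifar10_magnitude_sparsity_int8.json",
        "cifar10",
        "tfrecord",
    ),
]

SAMPLE_MAIN_MODULES = {"classification": "examples.tensorflow.classification.main"}


def get_sample_main(sample_type):
    # Resolve the per-task main() the way the description above explains it.
    return importlib.import_module(SAMPLE_MAIN_MODULES[sample_type]).main


@pytest.mark.parametrize("sample_type, config_path, dataset, dataset_type", CONFIG_PARAMS)
def test_model_train(sample_type, config_path, dataset, dataset_type, tmp_path):
    main = get_sample_main(sample_type)
    # A short (1-epoch) training run on the mock dataset is the whole sanity check.
    main([
        "--mode=train",
        f"--config={config_path}",
        f"--data={dataset}",
        f"--log-dir={tmp_path}",
        "--epochs=1",
    ])
```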
- ### test_weekly.py + In this file, you will **optimize and train the pre-trained models in `GLOBAL_CONFIG` with each dataset, and test the trained model's metrics within the `tolerance` value and `expected_accuracy`**. The `tolerance` term is the term on how much error to allow for relative accuracy, with the default value of 0.5. For example, if the expected accuracy is 75 and the tolerance value is 0.5, then an accuracy between 74.5 and 75.5 is allowed for test. You should give `--run-weekly-tests` parameter to run the whole process. It will take a long time because it will train the certain models. Example of the tfds dataset structure is like below: -``` + +```text tfds ├── cifar10 │ └── cifar10 @@ -73,28 +81,30 @@ tfds And example of the command of the weekly test is like below: -``` +```bash pytest --junitxml nncf-tests.xml tests/tensorflow/test_weekly.py -s \ ---run-weekly-tests \ ---data {PATH_TO_TFDS_OR_TFRECORDS_DATA_PATH} \ ---models-dir {PATH_TO_PRETRAINED_MODELS_CKPT_PATH} \ ---metrics-dump-path ./weekly_test_dump + --run-weekly-tests \ + --data {PATH_TO_TFDS_OR_TFRECORDS_DATA_PATH} \ + --models-dir {PATH_TO_PRETRAINED_MODELS_CKPT_PATH} \ + --metrics-dump-path ./weekly_test_dump ``` - ### test_sota_checkpoints.py + In this file, you can **test whether the trained models from weekly test match the expected performance**. You can see the configurations are written in `sota_checkpoints_eval.json`, which contains the tasks / datasets / topologies. In topologies, it contains model name as a key and various datas such as config file path, ckpt path, target performance based on metric_type, compression method or etc. OV test will extract the `IR` or `frozen graph` from each model and test the extraced graph's accuracy. You can run the test from OV extracted model or eval from tensorflow model as follow: -``` + +```bash pytest test_sota_checkpoints.py -s \ --m oveval \ ---sota-checkpoints-dir={SOTA_CKPT_DIR} \ ---run-openvino-eval \ ---ov-data-dir={OV_DATA_DIR} ---metrics-dump-path ./ov_test_dump -``` + -m oveval \ + --sota-checkpoints-dir={SOTA_CKPT_DIR} \ + --run-openvino-eval \ + --ov-data-dir={OV_DATA_DIR} \ + --metrics-dump-path ./ov_test_dump ``` + +```bash pytest test_sota_checkpoints.py -s \ ---sota-checkpoints-dir={SOTA_CKPT_DIR}, ---sota-data-dir={SOTA_DATA_DIR} ---metrics-dump-path ./eval_test_dump -``` \ No newline at end of file + --sota-checkpoints-dir={SOTA_CKPT_DIR} \ + --sota-data-dir={SOTA_DATA_DIR} \ + --metrics-dump-path ./eval_test_dump \ +``` diff --git a/third_party_integration/huggingface_transformers/README.md b/third_party_integration/huggingface_transformers/README.md index f7dfcf691e4..80ce12b370b 100644 --- a/third_party_integration/huggingface_transformers/README.md +++ b/third_party_integration/huggingface_transformers/README.md @@ -1,9 +1,11 @@ # Integrating NNCF into Transformers + https://github.com/huggingface/transformers -This folder contains a git patch to enable NNCF-based quantization for XNLI, SQuAD and GLUE training pipelines of the huggingface transformers repository. +This folder contains a git patch to enable NNCF-based quantization for XNLI, SQuAD and GLUE training pipelines of the huggingface transformers repository. Instructions: + 1. Apply the `0001-Modifications-for-NNCF-usage.patch` file to the huggingface transformers repository checked out at commit id: `bd469c40659ce76c81f69c7726759d249b4aef49` 2. 
Install the `transformers` library and the example scripts from the patched repository as described in the documentation for the huggingface transformers repository.

@@ -12,13 +14,11 @@ Instructions:

The NNCF configs to be used in this way are also provided in the same patch on a per-model, per-compression algorithm basis. Distributed multiprocessing is also supported; simply use the corresponding version of the command line in the huggingface transformers repository with the same additional `--nncf_config` parameter.
-
-
4. While running with the `--nncf_config` option, the training scripts will output NNCF-wrapped model checkpoints instead of the regular ones. You may evaluate these checkpoints using the same command lines as for training above, but with the `--do_train` key omitted. In order to export these checkpoints into ONNX format, further add `--to_onnx ` to your evaluation command line parameters. See the exact command lines for each case in the model notes below.

-Note that in all cases the training hyperparameters might have to be adjusted to accomodate the hardware you have available.
+Note that in all cases the training hyperparameters might have to be adjusted to accommodate the hardware you have available.

-## Current best results:
+## Current best results

All models use as their baselines the checkpoints obtained with the scripts and command line parameters from the corresponding sections in the original repository documentation. While fine-tuning the quantized model, the hyperparameters were left unchanged, i.e. the difference in the training script invocation was limited to adding the `--nncf_config` option and specifying the pre-trained baseline model as the starting point for quantization fine-tuning. For RoBERTa-MNLI, no baseline model fine-tuning was necessary since the `roberta-large-mnli` model pretrained on MNLI was already available for download.
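For context on the per-model command lines below: inside the patched scripts, the `--nncf_config` option boils down to a wrapping and export step roughly like the following sketch. It assumes the standard `NNCFConfig.from_json`/`create_compressed_model` PyTorch entry points and reuses file names from the XNLI example below; it is an approximation, not the literal patch code.

```python
# Rough approximation of what --nncf_config enables inside the patched training scripts.
from transformers import AutoModelForSequenceClassification

from nncf import NNCFConfig
from nncf.torch import create_compressed_model

# Baseline checkpoint; the real pipelines start from their fine-tuned FP32 baselines.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased")

# One of the per-model configs shipped with the patch.
nncf_config = NNCFConfig.from_json("nncf_bert_config_xnli.json")

# Wrap the model; training then runs as usual and saves NNCF-wrapped checkpoints.
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)

# After fine-tuning, --to_onnx corresponds to an export along these lines.
compression_ctrl.export_model("bert_xnli_int8.onnx")
```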
@@ -38,7 +38,6 @@ _INT8 model (symmetric weights, asymmetric activations quantization)_ - 77.22% a `python examples/pytorch/text-classification/run_xnli.py --model_name_or_path bert_xnli_int8 --language zh --train_language zh --do_eval --per_gpu_eval_batch_size 1 --max_seq_length 128 --output_dir bert_xnli_int8 --nncf_config nncf_bert_config_xnli.json --to_onnx bert_xnli_int8.onnx` - ### BERT-SQuAD v1.1 _Full-precision FP32 baseline model_ - bert-large-uncased-whole-word-masking model, trained on SQuAD v1.1 - 93.21% F1, 87.2% EM on the dev set, @@ -59,7 +58,6 @@ _INT8 model (symmetric quantization) + Knowledge Distillation_ - 92.89% F1, 86.6 `python examples/pytorch/question-answering/run_qa.py --model_name_or_path bert_squad_int8 --do_eval --dataset_name squad --max_seq_length 384 --doc_stride 128 --output_dir bert_squad_int8 --per_gpu_eval_batch_size=1 --nncf_config nncf_bert_config_squad.json --to_onnx bert_squad_int8.onnx` - ### BERT-CoNLL2003 _Full-precision FP32 baseline model_ - bert-base-cased model, trained on CoNLL2003 - 99.17% acc, 95.03% F1 @@ -70,12 +68,10 @@ _INT8 model (symmetric quantization)_ - 99.18% acc, 95.31% F1 `python examples/pytorch/token-classification/run_ner.py --model_name_or_path *path_to_fp32_finetuned_model* --dataset_name conll2003 --output_dir bert_base_cased_conll_int8 --do_train --do_eval --save_strategy epoch --evaluation_strategy epoch --nncf_config nncf_bert_config_conll.json` - **Fine-tuned INT8 model evaluation and ONNX export command line:** `python examples/pytorch/token-classification/run_ner.py --model_name_or_path bert_base_cased_conll_int8 --dataset_name conll2003 --output_dir bert_base_cased_conll_int8 --do_eval --nncf_config nncf_bert_config_squad.json --to_onnx bert_base_cased_conll_int8.onnx` - ### BERT-MRPC _Full-precision FP32 baseline model_ - bert-base-cased-finetuned-mrpc, 84.56% acc @@ -100,12 +96,10 @@ _INT8 model (asymmetrically quantized)_ - 89.25% accuracy (matched), 88.9% accur `python examples/pytorch/text-classification/run_glue.py --model_name_or_path roberta-large-mnli --task_name mnli --do_train --do_eval --per_gpu_train_batch_size 24 --per_gpu_eval_batch_size 1 --learning_rate 2e-5 --num_train_epochs 3.0 --max_seq_length 128 --output_dir roberta_mnli_int8 --save_steps 400 --nncf_config nncf_roberta_config_mnli.json` - **Fine-tuned INT8 model evaluation and ONNX export command line:** `python examples/pytorch/text-classification/run_glue.py --model_name_or_path roberta_mnli_int8 --task_name mnli --do_eval --learning_rate 2e-5 --num_train_epochs 3.0 --max_seq_length 128 --per_gpu_eval_batch_size 1 --output_dir roberta_mnli_int8 --save_steps 400 --nncf_config nncf_roberta_config_mnli.json --to_onnx roberta_mnli_int8.onnx` - ### DistilBERT-SST-2 _Full-precision FP32 baseline model_ - distilbert-base-uncased-finetuned-sst-2-english, pre-trained on SST-2 - 91.1% accuracy @@ -116,12 +110,10 @@ _INT8 model (symmetrically quantized)_ - 90.94% accuracy `python examples/pytorch/text-classification/run_glue.py --model_name_or_path distilbert-base-uncased-finetuned-sst-2-english --task_name sst2 --do_train --do_eval --per_gpu_train_batch_size 16 --per_gpu_eval_batch_size 1 --learning_rate 5e-5 --num_train_epochs 3.0 --max_seq_length 128 --output_dir distilbert_sst2_int8 --save_steps 100000 --nncf_config nncf_distilbert_config_sst2.json` - **Fine-tuned INT8 model evaluation and ONNX export command line:** `python examples/pytorch/text-classification/run_glue.py --model_name_or_path distilbert_sst2_int8 --task_name sst2 --do_eval 
--per_gpu_eval_batch_size 1 --max_seq_length 128 --output_dir distilbert_sst2_int8 --save_steps 100000 --nncf_config nncf_distilbert_config_sst2.json --to_onnx distilbert_sst2_int8.onnx` - ### MobileBERT-SQuAD v1.1 _Full-precision FP32 baseline model_ - google/mobilebert-uncased, trained on SQuAD v1.1 - 89.98% F1, 82.61% EM on the dev set, @@ -142,7 +134,6 @@ _Full-precision FP32 baseline model_ - 19.73 perplexity on the test set _INT8 model (symmetric quantization)_ - 20.9 perplexity on the test set - **INT8 model quantization-aware training command line (trained on 1x Tesla V100):** `python examples/pytorch/language-modeling/run_clm.py --model_name_or_path --do_train --do_eval --dataset_name wikitext --num_train_epochs 3 --output_dir gpt2_wikitext2_int8 --per_gpu_eval_batch_size=1 --per_gpu_train_batch_size=4 --save_steps=591 --nncf_config nncf_gpt2_config_wikitext_hw_config.json` @@ -150,4 +141,3 @@ _INT8 model (symmetric quantization)_ - 20.9 perplexity on the test set **Fine-tuned INT8 model evaluation and ONNX export command line:** `python examples/pytorch/language-modeling/run_clm.py --model_name_or_path gpt2_wikitext2_int8 --do_eval --dataset_name wikitext --output_dir gpt2_wikitext2_int8 --per_gpu_eval_batch_size=1 --nncf_config nncf_gpt2_config_wikitext_hw_config.json --to_onnx gpt2_wikitext2_int8.onnx` -
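After any of the exports above, a quick smoke test that the resulting INT8 ONNX file loads and runs can look like the sketch below. The file name matches the GPT-2 export above; input names, shapes and dtypes are read from the model itself rather than assumed.

```python
# Illustrative smoke test for an exported INT8 ONNX checkpoint; requires numpy and onnxruntime.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("gpt2_wikitext2_int8.onnx", providers=["CPUExecutionProvider"])

feed = {}
for inp in session.get_inputs():
    # Replace dynamic (symbolic or unknown) dimensions with a small concrete size.
    shape = [dim if isinstance(dim, int) else 2 for dim in inp.shape]
    dtype = np.int64 if "int" in inp.type else np.float32
    feed[inp.name] = np.zeros(shape, dtype=dtype)

outputs = session.run(None, feed)
for out, value in zip(session.get_outputs(), outputs):
    print(out.name, value.shape)
```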