v2.11.0
Post-training Quantization:
Features:
- (OpenVINO) Added Scale Estimation algorithm for 4-bit data-aware weights compression. The optional scale_estimation parameter was introduced to nncf.compress_weights() and can be used to minimize accuracy degradation of compressed models (note that this algorithm increases the compression time).
- (OpenVINO) Added GPTQ algorithm for 8/4-bit data-aware weights compression, supporting INT8, INT4, and NF4 data types. The optional gptq parameter was introduced to nncf.compress_weights() to enable the GPTQ algorithm.
- (OpenVINO) Added support for models with BF16 weights in the weights compression method, nncf.compress_weights().
- (PyTorch) Added support for quantization and weight compression of custom modules.
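As a rough illustration of the new scale_estimation and gptq options, the sketch below builds the keyword arguments for nncf.compress_weights() with one of the two algorithms enabled. The helper names (weight_compression_kwargs, compress), the INT4_SYM mode, and the ratio and group_size values are illustrative assumptions, not recommendations from these notes. NNCF is imported lazily so the argument-building part runs without it installed.

```python
from typing import Any, Dict, Iterable

# Data-aware algorithms named in these release notes.
SUPPORTED_ALGORITHMS = ("scale_estimation", "gptq")


def weight_compression_kwargs(algorithm: str) -> Dict[str, Any]:
    """Build keyword arguments for nncf.compress_weights() (hypothetical helper).

    Exactly one data-aware algorithm is switched on; the remaining values are
    illustrative assumptions for a 4-bit setup.
    """
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"Unknown algorithm: {algorithm!r}")
    return {
        "mode": "INT4_SYM",   # name of a nncf.CompressWeightsMode member, resolved later
        "ratio": 0.8,         # assumed share of layers compressed to 4 bits
        "group_size": 128,    # assumed group size for group-wise quantization
        algorithm: True,      # scale_estimation=True or gptq=True
    }


def compress(model: Any, calibration_items: Iterable[Any], algorithm: str = "scale_estimation") -> Any:
    """Compress an OpenVINO model; both algorithms are data-aware and need a dataset."""
    import nncf  # imported lazily so the helper above works without NNCF installed

    kwargs = weight_compression_kwargs(algorithm)
    kwargs["mode"] = nncf.CompressWeightsMode[kwargs["mode"]]
    dataset = nncf.Dataset(calibration_items)
    return nncf.compress_weights(model, dataset=dataset, **kwargs)
```

Calling compress(ov_model, samples, algorithm="gptq") would enable GPTQ instead of Scale Estimation. Each sample must already be in the form the model accepts as input.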
Fixes:
- (OpenVINO) Fixed incorrect detection of nodes with bias in the Fast-/BiasCorrection and ChannelAlignment algorithms.
- (OpenVINO, PyTorch) Fixed incorrect behaviour of nncf.compress_weights() when an already compressed model is passed as input.
- (OpenVINO, PyTorch) Fixed SmoothQuant algorithm to work with Split ports correctly.
Improvements:
- (OpenVINO) Aligned the compression subgraphs produced by nncf.compress_weights() across different FP precisions.
- Aligned the 8-bit scheme for the NPU target device with the one used for the CPU.
Examples:
- (OpenVINO, ONNX) Updated the ignored scope in the YOLOv8 examples to use the subgraphs approach.
Tutorials:
- Post-Training Optimization of Stable Video Diffusion Model
- Post-Training Optimization of YOLOv10 Model
- Post-Training Optimization of LLaVA Next Model
- Post-Training Optimization of S3D MIL-NCE Model
- Post-Training Optimization of Stable Cascade Model
Compression-aware training:
Features:
- (PyTorch) The nncf.quantize() method is now the recommended way to initialize quantization for Quantization-Aware Training.
- (PyTorch) The placement of compression modules in the model can now be serialized and restored with new API functions: compressed_model.nncf.get_config() and nncf.torch.load_from_config. Documentation for saving and loading a quantized model is available, and the ResNet18 example was updated to use the new API.
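The flow described above might look roughly like the following sketch: initialize quantization with nncf.quantize(), save the weights together with the result of get_config(), and later rebuild the quantized model with nncf.torch.load_from_config. The helper names and the checkpoint keys ("state_dict", "nncf_config") are assumptions chosen for illustration. PyTorch and NNCF are imported lazily so the plain-dictionary helpers run on their own.

```python
from typing import Any, Dict, Tuple


def pack_checkpoint(state_dict: Any, nncf_config: Any) -> Dict[str, Any]:
    """Bundle model weights with the NNCF compression config (hypothetical layout)."""
    return {"state_dict": state_dict, "nncf_config": nncf_config}


def unpack_checkpoint(checkpoint: Dict[str, Any]) -> Tuple[Any, Any]:
    """Return (state_dict, nncf_config) from a checkpoint made by pack_checkpoint()."""
    return checkpoint["state_dict"], checkpoint["nncf_config"]


def init_qat(model: Any, calibration_loader: Any, transform_fn: Any) -> Any:
    """Insert quantizers with nncf.quantize(); the result is then fine-tuned as usual."""
    import nncf

    return nncf.quantize(model, nncf.Dataset(calibration_loader, transform_fn))


def save_quantized(quantized_model: Any, path: str) -> None:
    """Save weights and the placement of compression modules in one file."""
    import torch

    checkpoint = pack_checkpoint(
        quantized_model.state_dict(),
        quantized_model.nncf.get_config(),
    )
    torch.save(checkpoint, path)


def load_quantized(fp32_model: Any, path: str, example_input: Any) -> Any:
    """Rebuild the quantized model from the original model and a saved checkpoint."""
    import torch
    import nncf.torch

    state_dict, nncf_config = unpack_checkpoint(torch.load(path))
    quantized_model = nncf.torch.load_from_config(fp32_model, nncf_config, example_input)
    quantized_model.load_state_dict(state_dict)
    return quantized_model
```

Keeping the config next to the weights means a single file is enough to resume training or run inference, provided the original uncompressed model can be constructed first.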
Fixes:
- (PyTorch) Fixed compatibility with torch.compile.
Improvements:
- (PyTorch) Base parameters were extended for the EvolutionOptimizer (LeGR algorithm part).
- (PyTorch) Improved wrapping for parameters which are not tensors.
Examples:
- (PyTorch) Added an example for STFPM model from Anomalib.
Tutorials:
Deprecations/Removals:
- Removed the extra dependencies for installing backends from setup.py (such as [torch], [tf], [onnx] and [openvino]).
- Removed openvino-dev dependency.
Requirements:
- Updated PyTorch (2.3.0) and Torchvision (0.18.0) versions.
Acknowledgements
Thanks for contributions from the OpenVINO developer community:
@DaniAffCH
@UsingtcNower
@anzr299
@AdiKsOnDev
@Viditagarwal7479
@truhinnm