# Introduction to Intel® Neural Compressor

Intel® Neural Compressor (formerly known as Intel® Low Precision Optimization Tool) is an open-source Python library running on Intel CPUs and GPUs that delivers unified interfaces across multiple deep learning frameworks for popular network compression technologies such as quantization, pruning, and knowledge distillation. The tool supports automatic, accuracy-driven tuning strategies to help users quickly find the best quantized model. It also implements several weight pruning algorithms to generate pruned models that meet a predefined sparsity goal, and it supports knowledge distillation to distill knowledge from a teacher model to a student model.

Note: GPU support is under development.
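
As a rough sketch of how the accuracy-driven tuning is typically driven from Python (assuming the 1.x `neural_compressor.experimental` API; the YAML file, model path, and random data below are hypothetical placeholders, not part of this document):

```python
import numpy as np
from neural_compressor.experimental import Quantization, common

# conf.yaml (hypothetical path) holds the framework name, tuning strategy,
# and accuracy criterion that drive the accuracy-driven search.
quantizer = Quantization("./conf.yaml")

# Wrap a framework model; the frozen TensorFlow graph path is a placeholder.
quantizer.model = common.Model("./model.pb")

# Random calibration/evaluation data standing in for a real dataset;
# each item is an (input, label) pair.
dataset = [(np.random.rand(224, 224, 3).astype(np.float32), 0) for _ in range(10)]
quantizer.calib_dataloader = common.DataLoader(dataset)
quantizer.eval_dataloader = common.DataLoader(dataset)

# fit() runs the tuning loop and returns the first quantized model
# that satisfies the configured accuracy criterion.
q_model = quantizer.fit()
q_model.save("./quantized_model")
```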

*Figure: Infrastructure Workflow.*

Supported deep learning frameworks are:

- TensorFlow
- PyTorch
- Apache MXNet
- ONNX Runtime

Note: Intel Optimized TensorFlow 2.5.0 requires setting the environment variable TF_ENABLE_MKL_NATIVE_FORMAT=0 before running the quantization process or deploying the quantized model.

Note: As of official TensorFlow 2.6.0, oneDNN support has been upstreamed. Download the official TensorFlow 2.6.0 binary for the CPU device and set the environment variable TF_ENABLE_ONEDNN_OPTS=1 before running the quantization process or deploying the quantized model.
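
Because these variables must be visible to TensorFlow when it is imported, one way to set them from Python is sketched below (setting them in the shell before launching the process works equally well):

```python
import os

# Must be set before TensorFlow is imported, or the flags have no effect.
# Intel Optimized TensorFlow 2.5.0:
os.environ["TF_ENABLE_MKL_NATIVE_FORMAT"] = "0"
# Official TensorFlow 2.6.0 (oneDNN upstreamed) -- use this instead:
# os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"

import tensorflow as tf  # imported after the environment variables on purpose
```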

Get started with installation, tutorials, examples, and more!

View the Intel® Neural Compressor repo at: https://github.com/intel/neural-compressor.