Commit 3dd38d3: Update NNCF WC documentation
l-bat committed Oct 17, 2024
Showing 1 changed file with 34 additions and 4 deletions.
`Larger Group Size`: Results in faster inference and a smaller model, but might
compromise accuracy.
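The group-size trade-off can be sketched with a toy group-wise INT4 quantizer. This is an illustration of the idea only, not NNCF's implementation; all names are made up:

```python
import numpy as np

# Toy group-wise symmetric INT4 quantization (illustration only, not NNCF's
# implementation). Each group of ``group_size`` weights shares one scale:
# larger groups store fewer scales but round more coarsely.
def int4_groupwise_roundtrip(weights: np.ndarray, group_size: int) -> np.ndarray:
    groups = weights.reshape(-1, group_size)
    scales = np.abs(groups).max(axis=1, keepdims=True) / 7.0  # INT4 range [-8, 7]
    scales = np.where(scales == 0, 1.0, scales)
    q = np.clip(np.round(groups / scales), -8, 7)
    return (q * scales).reshape(weights.shape)  # dequantized weights

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 128)).astype(np.float32)
err_g32 = float(np.abs(w - int4_groupwise_roundtrip(w, 32)).mean())
err_g128 = float(np.abs(w - int4_groupwise_roundtrip(w, 128)).mean())
```

With the smaller group size, each scale tracks a narrower range of weights, so the round-trip error is typically lower, at the cost of storing more scales.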

* ``ratio`` controls the ratio between INT4 and layers compressed to ``backup_mode`` in the model.
  Ratio is a decimal between 0 and 1. For example, 0.8 means that 80% of layers will be
  compressed to INT4, while the rest will be compressed to ``backup_mode`` precision. The default
  value for ratio is 1.
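The arithmetic behind ``ratio`` can be sketched as follows. This is a toy illustration of the documented semantics only; the real algorithm chooses *which* layers stay in backup precision based on sensitivity, not by position:

```python
# Toy illustration of the documented ``ratio`` semantics (NNCF selects which
# layers stay in backup precision by sensitivity; this only shows the split).
def split_by_ratio(num_layers: int, ratio: float) -> tuple[int, int]:
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    int4_layers = round(num_layers * ratio)
    return int4_layers, num_layers - int4_layers

print(split_by_ratio(10, 0.8))  # (8, 2): 8 layers in INT4, 2 in backup precision
```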

`Higher Ratio (more INT4)`: Reduces the model size and increases inference speed but
might lead to higher accuracy degradation.

`Lower Ratio (more layers in backup_mode)`: Maintains better accuracy but results in a larger model size
and potentially slower inference.

In this example, 90% of the model's layers are quantized to INT4 asymmetrically with

4 bits. The method can sometimes result in reduced accuracy when used with
Dynamic Quantization of activations. Requires dataset.

* ``gptq`` - boolean parameter that enables the GPTQ method for more accurate weight
quantization. Requires dataset.

* ``dataset`` - calibration dataset for data-aware weight compression. It is required
  for some compression options, for example, ``scale_estimation``, ``gptq`` or ``awq``. Some types
of ``sensitivity_metric`` can use data for precision selection.
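Conceptually, a calibration dataset is just an iterable of raw samples plus a transform that turns each sample into model inputs at calibration time. The sketch below is illustrative only and does not use the real ``nncf.Dataset`` API:

```python
# Toy sketch of what a calibration dataset provides (illustration only, not
# the nncf.Dataset API): an iterable of raw samples plus a transform applied
# lazily to each sample when the compression algorithm iterates over it.
class CalibrationData:
    def __init__(self, samples, transform):
        self.samples = samples
        self.transform = transform

    def __iter__(self):
        for sample in self.samples:
            yield self.transform(sample)

texts = ["hello world", "weight compression"]
data = CalibrationData(texts, lambda t: t.split())
batches = list(data)
print(batches)  # [['hello', 'world'], ['weight', 'compression']]
```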

* ``sensitivity_metric`` - controls the metric to estimate the sensitivity of compressing

* ``all_layers`` - boolean parameter that enables INT4 weight quantization of all
Fully-Connected and Embedding layers, including the first and last layers in the model.

* ``lora_correction`` - boolean parameter that enables the LoRA Correction Algorithm
  to further improve the accuracy of INT4-compressed models on top of other
  algorithms, such as AWQ and Scale Estimation.
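The idea behind a low-rank correction can be sketched with plain linear algebra. This is an illustration of the general concept only, not NNCF's algorithm: the quantization error is approximated by a low-rank product and added back on top of the quantized weights.

```python
import numpy as np

# Toy illustration of the idea behind LoRA correction (not NNCF's algorithm):
# approximate the quantization error W - W_q with a low-rank product A @ B and
# add it back, recovering part of the lost accuracy at small extra cost.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)).astype(np.float32)
scale = np.abs(W).max() / 7.0
W_q = np.clip(np.round(W / scale), -8, 7) * scale  # coarse INT4-style round-trip

U, S, Vt = np.linalg.svd(W - W_q, full_matrices=False)
rank = 8  # a low rank keeps the correction cheap to store and apply
A = U[:, :rank] * S[:rank]  # shape (64, rank)
B = Vt[:rank, :]            # shape (rank, 64)
W_corrected = W_q + A @ B

err_plain = float(np.linalg.norm(W - W_q))
err_corrected = float(np.linalg.norm(W - W_corrected))
```

Because the truncated SVD is the best low-rank approximation of the error matrix, the corrected weights are strictly closer to the original ones in Frobenius norm.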

* ``backup_mode`` - defines a backup precision for mixed-precision weight compression.
  There are three modes: ``INT8_ASYM``, ``INT8_SYM``, and ``NONE``, where ``NONE`` retains
  the original floating-point precision of the model weights. The default value is ``INT8_ASYM``.
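An asymmetric 8-bit round-trip, as in the default ``INT8_ASYM`` backup mode, can be sketched as follows. This is an illustration of the scheme only, not NNCF's exact implementation: an asymmetric scheme uses a zero point, so the full [min, max] range of the weights maps onto the 256 unsigned levels.

```python
import numpy as np

# Toy asymmetric INT8 quantization round-trip (illustration only, not NNCF's
# exact implementation): scale and zero point map the [min, max] weight range
# onto the 256 unsigned integer levels.
def int8_asym_roundtrip(w: np.ndarray) -> np.ndarray:
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = round(-lo / scale)
    q = np.clip(np.round(w / scale) + zero_point, 0, 255)
    return ((q - zero_point) * scale).astype(w.dtype)

rng = np.random.default_rng(0)
w = rng.normal(size=(256,)).astype(np.float32)
step = (float(w.max()) - float(w.min())) / 255.0
max_err = float(np.abs(w - int8_asym_roundtrip(w)).max())
```

The round-trip error per weight is bounded by the quantization step, which is why an 8-bit backup preserves accuracy much better than pushing every layer to INT4.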

It is possible to generate a synthetic dataset using the ``nncf.data.generate_text_data`` method for
data-aware weight compression. The method takes a language model (e.g. from ``optimum.intel.openvino``)
and a tokenizer (e.g. from ``transformers``) as input and returns a list of strings generated by the model.
Note that dataset generation takes time and depends on various conditions, such as the model size,
the requested dataset length, and the environment setup. Also, since the dataset is generated from the
model output, it does not guarantee a significant accuracy improvement after compression. This method is
recommended only when a better dataset is not available. Refer to the
`example <https://github.com/openvinotoolkit/nncf/tree/develop/examples/llm_compression/openvino/tiny_llama_synthetic_data>`__
for usage details.

.. code-block:: python

   from nncf import compress_weights, CompressWeightsMode, Dataset
   from nncf.data import generate_text_data

   # Example: generating a synthetic dataset
   synthetic_data = generate_text_data(model, tokenizer)
   nncf_dataset = Dataset(synthetic_data, transform_fn)

For data-aware weight compression, refer to the following
`example <https://github.com/openvinotoolkit/nncf/tree/develop/examples/llm_compression/openvino/tiny_llama>`__.
