Commit 3dd38d3: Update NNCF WC documentation
l-bat committed Oct 17, 2024
Showing 1 changed file with 34 additions and 4 deletions.
`Larger Group Size`: Results in faster inference and a smaller model, but might
compromise accuracy.
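The group-size trade-off can be sketched with a toy group-wise INT4 quantizer. This is an illustration of the idea only, not NNCF's implementation; all names are made up:

```python
import numpy as np

# Toy group-wise symmetric INT4 quantization (illustration only, not NNCF's
# implementation). Each group of ``group_size`` weights shares one scale:
# larger groups store fewer scales but round more coarsely.
def int4_groupwise_roundtrip(weights: np.ndarray, group_size: int) -> np.ndarray:
    groups = weights.reshape(-1, group_size)
    scales = np.abs(groups).max(axis=1, keepdims=True) / 7.0  # INT4 range [-8, 7]
    scales = np.where(scales == 0, 1.0, scales)
    q = np.clip(np.round(groups / scales), -8, 7)
    return (q * scales).reshape(weights.shape)  # dequantized weights

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 128)).astype(np.float32)
err_g32 = float(np.abs(w - int4_groupwise_roundtrip(w, 32)).mean())
err_g128 = float(np.abs(w - int4_groupwise_roundtrip(w, 128)).mean())
```

With the smaller group size, each scale tracks a narrower range of weights, so the round-trip error is typically lower, at the cost of storing more scales.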

* ``ratio`` controls the ratio between INT4 and layers compressed to ``backup_mode`` in the model.
  Ratio is a decimal between 0 and 1. For example, 0.8 means that 80% of layers will be
  compressed to INT4, while the rest will be compressed to ``backup_mode`` precision. The default
  value for ratio is 1.
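The arithmetic behind ``ratio`` can be sketched as follows. This is a toy illustration of the documented semantics only; the real algorithm chooses *which* layers stay in backup precision based on sensitivity, not by position:

```python
# Toy illustration of the documented ``ratio`` semantics (NNCF selects which
# layers stay in backup precision by sensitivity; this only shows the split).
def split_by_ratio(num_layers: int, ratio: float) -> tuple[int, int]:
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    int4_layers = round(num_layers * ratio)
    return int4_layers, num_layers - int4_layers

print(split_by_ratio(10, 0.8))  # (8, 2): 8 layers in INT4, 2 in backup precision
```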

`Higher Ratio (more INT4)`: Reduces the model size and increases inference speed but
might lead to higher accuracy degradation.

`Lower Ratio (more layers in backup_mode)`: Maintains better accuracy but results in a larger model size
and potentially slower inference.

In this example, 90% of the model's layers are quantized to INT4 asymmetrically with

4 bits. The method can sometimes result in reduced accuracy when used with
Dynamic Quantization of activations. Requires dataset.

* ``gptq`` - boolean parameter that enables the GPTQ method for more accurate weight
quantization. Requires dataset.

* ``dataset`` - calibration dataset for data-aware weight compression. It is required
  for some compression options, for example, ``scale_estimation``, ``gptq`` or ``awq``. Some types
of ``sensitivity_metric`` can use data for precision selection.
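Conceptually, a calibration dataset is just an iterable of raw samples plus a transform that turns each sample into model inputs at calibration time. The sketch below is illustrative only and does not use the real ``nncf.Dataset`` API:

```python
# Toy sketch of what a calibration dataset provides (illustration only, not
# the nncf.Dataset API): an iterable of raw samples plus a transform applied
# lazily to each sample when the compression algorithm iterates over it.
class CalibrationData:
    def __init__(self, samples, transform):
        self.samples = samples
        self.transform = transform

    def __iter__(self):
        for sample in self.samples:
            yield self.transform(sample)

texts = ["hello world", "weight compression"]
data = CalibrationData(texts, lambda t: t.split())
batches = list(data)
print(batches)  # [['hello', 'world'], ['weight', 'compression']]
```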

* ``sensitivity_metric`` - controls the metric to estimate the sensitivity of compressing

* ``all_layers`` - boolean parameter that enables INT4 weight quantization of all
Fully-Connected and Embedding layers, including the first and last layers in the model.

* ``lora_correction`` - boolean parameter that enables the LoRA Correction Algorithm
  to further improve the accuracy of INT4-compressed models on top of other
  algorithms, such as AWQ and Scale Estimation.
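The idea behind a low-rank correction can be sketched with plain linear algebra. This is an illustration of the general concept only, not NNCF's algorithm: the quantization error is approximated by a low-rank product and added back on top of the quantized weights.

```python
import numpy as np

# Toy illustration of the idea behind LoRA correction (not NNCF's algorithm):
# approximate the quantization error W - W_q with a low-rank product A @ B and
# add it back, recovering part of the lost accuracy at small extra cost.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)).astype(np.float32)
scale = np.abs(W).max() / 7.0
W_q = np.clip(np.round(W / scale), -8, 7) * scale  # coarse INT4-style round-trip

U, S, Vt = np.linalg.svd(W - W_q, full_matrices=False)
rank = 8  # a low rank keeps the correction cheap to store and apply
A = U[:, :rank] * S[:rank]  # shape (64, rank)
B = Vt[:rank, :]            # shape (rank, 64)
W_corrected = W_q + A @ B

err_plain = float(np.linalg.norm(W - W_q))
err_corrected = float(np.linalg.norm(W - W_corrected))
```

Because the truncated SVD is the best low-rank approximation of the error matrix, the corrected weights are strictly closer to the original ones in Frobenius norm.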

* ``backup_mode`` - defines a backup precision for mixed-precision weight compression.
  There are three modes: ``INT8_ASYM``, ``INT8_SYM``, and ``NONE``, where ``NONE`` retains
  the original floating-point precision of the model weights. The default value is ``INT8_ASYM``.
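An asymmetric 8-bit round-trip, as in the default ``INT8_ASYM`` backup mode, can be sketched as follows. This is an illustration of the scheme only, not NNCF's exact implementation: an asymmetric scheme uses a zero point, so the full [min, max] range of the weights maps onto the 256 unsigned levels.

```python
import numpy as np

# Toy asymmetric INT8 quantization round-trip (illustration only, not NNCF's
# exact implementation): scale and zero point map the [min, max] weight range
# onto the 256 unsigned integer levels.
def int8_asym_roundtrip(w: np.ndarray) -> np.ndarray:
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = round(-lo / scale)
    q = np.clip(np.round(w / scale) + zero_point, 0, 255)
    return ((q - zero_point) * scale).astype(w.dtype)

rng = np.random.default_rng(0)
w = rng.normal(size=(256,)).astype(np.float32)
step = (float(w.max()) - float(w.min())) / 255.0
max_err = float(np.abs(w - int8_asym_roundtrip(w)).max())
```

The round-trip error per weight is bounded by the quantization step, which is why an 8-bit backup preserves accuracy much better than pushing every layer to INT4.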

It is possible to generate a synthetic dataset using the ``nncf.data.generate_text_data`` method for
data-aware weight compression. The method takes a language model (e.g. from ``optimum.intel.openvino``)
and a tokenizer (e.g. from ``transformers``) as input and returns a list of strings generated by the model.
Note that dataset generation takes time and depends on various conditions, such as the model size,
the requested dataset length, and the environment setup. Also, since the dataset is generated from the
model output, it does not guarantee a significant accuracy improvement after compression. This method is
recommended only when a better dataset is not available. Refer to the
`example <https://github.com/openvinotoolkit/nncf/tree/develop/examples/llm_compression/openvino/tiny_llama_synthetic_data>`__
for usage details.

.. code-block:: python

   from nncf import compress_weights, CompressWeightsMode, Dataset
   from nncf.data import generate_text_data

   # Example: generating a synthetic dataset
   synthetic_data = generate_text_data(model, tokenizer)
   nncf_dataset = Dataset(synthetic_data, transform_fn)

For data-aware weight compression, refer to the following
`example <https://github.com/openvinotoolkit/nncf/tree/develop/examples/llm_compression/openvino/tiny_llama>`__.
