Find the appropriate hyperparameters to compress the TinyLlama model

This example demonstrates how to find appropriate awq, ratio and group_size parameters for compressing the weights of the TinyLlama model from Hugging Face Transformers. The OpenVINO backend supports inference of mixed-precision models whose weights are compressed to a 4-bit data type as the primary precision. The fastest mixed-precision mode is INT4_SYM, but it may cause significant accuracy degradation, especially for models of moderate size. In this example, the maximum allowed deviation from the original model is 0.2 points of the similarity metric.

If the similarity of the compressed model is not satisfactory, there are three hyperparameters to tune: awq, group_size and ratio. A smaller group_size and a lower ratio of 4-bit layers usually improve accuracy at the cost of model size and inference latency. The accuracy of 4-bit compressed models can generally also be improved by using the AWQ algorithm over the data-based mixed-precision algorithm.

To evaluate the accuracy of the compressed model, we measure the similarity between texts generated by the baseline and compressed models using the WhoWhatBench library.
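To make the effect of group_size concrete, here is a minimal pure-Python sketch of group-wise symmetric 4-bit quantization. It is an illustration under simplifying assumptions (one scale per group, integer levels clamped to [-8, 7], scale chosen as max|w| / 7) and is not NNCF's actual implementation; the function names and the sample weights are hypothetical.

```python
def quantize_dequantize_sym4(weights, group_size):
    """Round-trip weights through a simplified symmetric 4-bit grid.

    Each consecutive group of `group_size` weights shares one scale.
    """
    restored = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Assumption: the scale maps the largest magnitude to level 7.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        for w in group:
            level = max(-8, min(7, round(w / scale)))
            restored.append(level * scale)
    return restored


def total_abs_error(weights, group_size):
    """Sum of absolute differences between original and restored weights."""
    restored = quantize_dequantize_sym4(weights, group_size)
    return sum(abs(w - r) for w, r in zip(weights, restored))


# Hypothetical weights: four small values followed by four large ones.
weights = [0.01, 0.02, -0.03, 0.04, 5.0, -4.0, 3.0, 2.0]

error_one_group = total_abs_error(weights, group_size=8)
error_two_groups = total_abs_error(weights, group_size=4)
```

With a single group of 8, the large values dictate the scale and the small weights all collapse to zero. With two groups of 4, the small weights get their own scale, so the total error drops, at the price of storing one more scale.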

The example includes the following steps:

Prepare the wikitext dataset.
Prepare the TinyLlama/TinyLlama-1.1B-step-50K-105b text-generation model in OpenVINO representation using Optimum-Intel.
Compress the weights of the model with the NNCF weight compression algorithm.
Find appropriate awq, ratio and group_size if acceptable similarity is not achieved.
Measure the similarity and footprint of the final model.
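The search step in the list above can be sketched as a simple ordered sweep. This is a hedged illustration, not necessarily the strategy that main.py implements: the candidate grids, the `evaluate` callback and the toy scoring function are all hypothetical, and the threshold assumes the baseline compared with itself scores a similarity of 1.0, so a maximum drop of 0.2 means a minimum similarity of 0.8.

```python
from itertools import product


def find_parameters(
    evaluate,
    max_drop=0.2,
    ratios=(1.0, 0.8, 0.6, 0.4, 0.2),
    group_sizes=(128, 64, 32),
    awq_options=(False, True),
):
    """Return the first (awq, ratio, group_size) that meets the threshold.

    `evaluate(awq, ratio, group_size)` is a hypothetical callback that
    compresses the model with those settings and returns its similarity
    to the baseline. Candidates run from most to least aggressive.
    """
    min_similarity = 1.0 - max_drop
    for ratio, group_size, awq in product(ratios, group_sizes, awq_options):
        if evaluate(awq, ratio, group_size) >= min_similarity:
            return awq, ratio, group_size
    return None  # no candidate was accurate enough


def toy_evaluate(awq, ratio, group_size):
    """Toy stand-in for a real evaluation; the numbers are made up."""
    bonus = 0.02 if awq else 0.0
    return 0.9 - 0.2 * ratio - 0.0005 * group_size + bonus


best = find_parameters(toy_evaluate)
```

In a real run, the callback would wrap the compression call and the WhoWhatBench similarity measurement, which makes each evaluation expensive. Ordering the candidates from most to least compressed lets the sweep stop at the smallest model that is still accurate enough.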

Install requirements

To run the example you should install the corresponding Python dependencies:

Create a separate Python* environment and activate it: python3 -m venv nncf_env && source nncf_env/bin/activate
Install dependencies:

pip install -U pip
pip install ../../../../
pip install -r requirements.txt

Run Example

The example is fully automated. Just run the following command in the prepared Python environment:

python main.py

To find the appropriate awq, ratio and group_size parameters for your HF model, change the model_id, dataset and transform_func. Please refer to the <YOUR ...> comments in main.py to find the exact places that should be modified.
