Enable phi3v tuning (#197)
* enable phi3-vision quantization

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* refine example

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update Phi3V, enable autoround format inference

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* add multimodal model loading test

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typos

Signed-off-by: Zhang, Weiwei1 <[email protected]>

---------

Signed-off-by: Zhang, Weiwei1 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
WeiweiZhang1 and pre-commit-ci[bot] authored Aug 13, 2024
1 parent 948f228 commit f2fef13
Showing 18 changed files with 4,021 additions and 3 deletions.
5 changes: 4 additions & 1 deletion auto_round/auto_quantizer.py
@@ -325,7 +325,6 @@ def convert_model(self, model: nn.Module):
         """
         from auto_round.utils import get_layer_names_in_block
 
-        layer_names = get_layer_names_in_block(model)
         quantization_config = model.config.quantization_config
         if hasattr(quantization_config, "backend"):  # pragma: no cover
             backend = quantization_config.backend
@@ -337,6 +336,9 @@ def convert_model(self, model: nn.Module):
         data_type = quantization_config.data_type if hasattr(quantization_config, "data_type") \
             else "int"  # pragma: no cover
         sym = quantization_config.sym
+        quant_block_list = quantization_config.quant_block_list \
+            if hasattr(quantization_config, "quant_block_list") else None
+        layer_names = get_layer_names_in_block(model, quant_block_list=quant_block_list)
         extra_config = {}
         if hasattr(quantization_config, "extra_config"):
             extra_config = quantization_config.extra_config
@@ -482,3 +484,4 @@ def is_serializable(self):
 
 transformers.quantizers.auto.AutoHfQuantizer = AutoHfQuantizer
 transformers.modeling_utils.AutoHfQuantizer = AutoHfQuantizer
+
2 changes: 2 additions & 0 deletions auto_round/autoround.py
@@ -1124,6 +1124,7 @@ def save_quantized(self, output_dir=None, format="auto_round", inplace=True, **kwargs):
             "amp",
             "nsamples",
             "low_gpu_mem_usage",
+            "quant_block_list",
             "enable_norm_bias_tuning"
         ]
         if isinstance(self.dataset, str):
@@ -1544,3 +1545,4 @@ def __init__(
             optimizer,
             **kwargs,
         )
+
74 changes: 74 additions & 0 deletions examples/multimodal-modeling/Phi-3-vision/README.md
@@ -0,0 +1,74 @@
Step-by-Step
============

This document presents step-by-step instructions for auto-round. It requires `transformers>=4.41.0`.

# Run Quantization on Phi-3-vision Models

In this example, we introduce a straightforward way to quantize some popular multimodal models, such as Phi-3-vision.

## 1. Download the calibration data

Our calibration process resembles the official visual instruction tuning process.

Please download the annotation file for the final mixture of our instruction-tuning data, [llava_v1_5_mix665k.json](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/llava_v1_5_mix665k.json), and download the images from its constituent datasets:

COCO: [train2017](http://images.cocodataset.org/zips/train2017.zip); unzip the image folder to any directory you like.
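
A minimal sketch of fetching both files (the target directory is illustrative; the Hugging Face `/resolve/` path is assumed to serve the raw file behind the `/blob/` page linked above):

```bash
# Illustrative layout; adjust paths to your own setup.
mkdir -p data && cd data
# Annotation file for the instruction-tuning mixture
wget https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/resolve/main/llava_v1_5_mix665k.json
# COCO train2017 images
wget http://images.cocodataset.org/zips/train2017.zip
unzip train2017.zip
```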


## 2. Run Examples
Enter the examples folder and install the requirements (including lm-eval, which is needed to run the evaluation):
```bash
pip install -r requirements.txt
```

- **Default Settings:**
```bash
CUDA_VISIBLE_DEVICES=0 python3 main.py --model_name microsoft/Phi-3-vision-128k-instruct --bits 4 --group_size 128
```

- **Speed up the tuning:**

Disable `low_gpu_mem_usage` (at the cost of more GPU memory), reduce the `seqlen` to 512 (risking a potentially large accuracy drop), or combine both, as in the sketch below.
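
A sketch of the combined speedups (flag spellings follow this example's `main.py`; run `python3 main.py --help` to confirm them on your version):

```bash
# Reduced sequence length; additionally turn off the low-GPU-memory mode
# if your main.py exposes a flag for it.
CUDA_VISIBLE_DEVICES=0 python3 main.py --model_name microsoft/Phi-3-vision-128k-instruct \
    --bits 4 --group_size 128 --seqlen 512
```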

- **Enable quantized lm-head:**

This is currently supported only on Intel XPU with the AutoRound format; however, we found that fake tuning can improve accuracy in some scenarios. Disabling `--low_gpu_mem_usage` is strongly recommended if the whole model can be loaded on the device; otherwise, caching the inputs of the lm-head will be quite slow. Another way to alleviate the issue is to reduce `nsamples`, e.g. to 128.
```bash
CUDA_VISIBLE_DEVICES=0 python3 main.py --model_name microsoft/Phi-3-vision-128k-instruct --bits 4 --group_size 128 --quant_lm_head
```

- **Utilizing the AdamW Optimizer:**

Include the flag `--adam`. Note that AdamW is less effective than sign gradient descent in many of the scenarios we tested; an example invocation follows.
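
For example, the default-settings command from above with the optimizer swapped:

```bash
# Same settings as the default run, tuned with AdamW instead of SignSGD.
CUDA_VISIBLE_DEVICES=0 python3 main.py --model_name microsoft/Phi-3-vision-128k-instruct \
    --bits 4 --group_size 128 --adam
```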

- **Running on Intel Gaudi2**
```bash
bash run_autoround_on_gaudi.sh
```


## 3. Environment

PyTorch 1.8 or a higher version is needed.


## Reference
If you find SignRound useful for your research, please cite our paper:
```bibtex
@article{cheng2023optimize,
title={Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs},
author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao},
journal={arXiv preprint arXiv:2309.05516},
year={2023}
}
```






