Enable phi3v tuning (#197)
* enable phi3-vision quantization

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* refine example

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update Phi3V, enable autoround format inference

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* add multimodal model loading test

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typos

Signed-off-by: Zhang, Weiwei1 <[email protected]>

---------

Signed-off-by: Zhang, Weiwei1 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
WeiweiZhang1 and pre-commit-ci[bot] authored Aug 13, 2024
1 parent 948f228 commit f2fef13
Showing 18 changed files with 4,021 additions and 3 deletions.
5 changes: 4 additions & 1 deletion auto_round/auto_quantizer.py
@@ -325,7 +325,6 @@ def convert_model(self, model: nn.Module):
         """
         from auto_round.utils import get_layer_names_in_block
 
-        layer_names = get_layer_names_in_block(model)
         quantization_config = model.config.quantization_config
         if hasattr(quantization_config, "backend"):  # pragma: no cover
             backend = quantization_config.backend
@@ -337,6 +336,9 @@ def convert_model(self, model: nn.Module):
         data_type = quantization_config.data_type if hasattr(quantization_config, "data_type") \
             else "int"  # pragma: no cover
         sym = quantization_config.sym
+        quant_block_list = quantization_config.quant_block_list \
+            if hasattr(quantization_config, "quant_block_list") else None
+        layer_names = get_layer_names_in_block(model, quant_block_list=quant_block_list)
         extra_config = {}
         if hasattr(quantization_config, "extra_config"):
             extra_config = quantization_config.extra_config
@@ -482,3 +484,4 @@ def is_serializable(self):
 
 transformers.quantizers.auto.AutoHfQuantizer = AutoHfQuantizer
 transformers.modeling_utils.AutoHfQuantizer = AutoHfQuantizer
+
2 changes: 2 additions & 0 deletions auto_round/autoround.py
@@ -1124,6 +1124,7 @@ def save_quantized(self, output_dir=None, format="auto_round", inplace=True, **kwargs):
             "amp",
             "nsamples",
             "low_gpu_mem_usage",
+            "quant_block_list",
             "enable_norm_bias_tuning"
         ]
         if isinstance(self.dataset, str):
@@ -1544,3 +1545,4 @@ def __init__(
             optimizer,
             **kwargs,
         )
+
74 changes: 74 additions & 0 deletions examples/multimodal-modeling/Phi-3-vision/README.md
@@ -0,0 +1,74 @@
Step-by-Step
============

This document presents step-by-step instructions for auto-round. It requires `transformers>=4.41.0`.

# Run Quantization on Phi-3-vision Models

In this example, we introduce a straightforward way to quantize some popular multimodal models, such as Phi-3-vision.

## 1. Download the calibration data

Our calibration process resembles the official visual instruction tuning process.

Please download the annotation file for the final mixture of our instruction-tuning data, [llava_v1_5_mix665k.json](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/llava_v1_5_mix665k.json), and download the images from its constituent datasets:

COCO: [train2017](http://images.cocodataset.org/zips/train2017.zip); unzip the image folder to any directory you like.
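
A minimal sketch of fetching both files (the target directory is illustrative; the Hugging Face `/resolve/` path is assumed to serve the raw file behind the `/blob/` page linked above):

```bash
# Illustrative layout; adjust paths to your own setup.
mkdir -p data && cd data
# Annotation file for the instruction-tuning mixture
wget https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/resolve/main/llava_v1_5_mix665k.json
# COCO train2017 images
wget http://images.cocodataset.org/zips/train2017.zip
unzip train2017.zip
```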


## 2. Run Examples
Enter the examples folder and install the requirements (including lm-eval, which is needed to run the evaluation):
```bash
pip install -r requirements.txt
```

- **Default Settings:**
```bash
CUDA_VISIBLE_DEVICES=0 python3 main.py --model_name microsoft/Phi-3-vision-128k-instruct --bits 4 --group_size 128
```

- **Speed up the tuning:**

Disable `low_gpu_mem_usage` (at the cost of more GPU memory), reduce the `seqlen` to 512 (risking a potentially large accuracy drop), or combine both, as in the sketch below.
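
A sketch of the combined speedups (flag spellings follow this example's `main.py`; run `python3 main.py --help` to confirm them on your version):

```bash
# Reduced sequence length; additionally turn off the low-GPU-memory mode
# if your main.py exposes a flag for it.
CUDA_VISIBLE_DEVICES=0 python3 main.py --model_name microsoft/Phi-3-vision-128k-instruct \
    --bits 4 --group_size 128 --seqlen 512
```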

- **Enable quantized lm-head:**

This is currently supported only on Intel XPU with the AutoRound format; however, we found that fake tuning can improve accuracy in some scenarios. Disabling `--low_gpu_mem_usage` is strongly recommended if the whole model can be loaded on the device; otherwise, caching the inputs of the lm-head will be quite slow. Another way to alleviate the issue is to reduce `nsamples`, e.g. to 128.
```bash
CUDA_VISIBLE_DEVICES=0 python3 main.py --model_name microsoft/Phi-3-vision-128k-instruct --bits 4 --group_size 128 --quant_lm_head
```

- **Utilizing the AdamW Optimizer:**

Include the flag `--adam`. Note that AdamW is less effective than sign gradient descent in many of the scenarios we tested; an example invocation follows.
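
For example, the default-settings command from above with the optimizer swapped:

```bash
# Same settings as the default run, tuned with AdamW instead of SignSGD.
CUDA_VISIBLE_DEVICES=0 python3 main.py --model_name microsoft/Phi-3-vision-128k-instruct \
    --bits 4 --group_size 128 --adam
```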

- **Running on Intel Gaudi2**
```bash
bash run_autoround_on_gaudi.sh
```


## 3. Environment

PyTorch 1.8 or a higher version is needed.


## Reference
If you find SignRound useful for your research, please cite our paper:
```bibtex
@article{cheng2023optimize,
title={Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs},
author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao},
journal={arXiv preprint arXiv:2309.05516},
year={2023}
}
```






