SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

[Website] [Paper] [Nunchaku Inference System]

Diffusion models have been proven highly effective at generating high-quality images. However, as these models grow larger, they require significantly more memory and suffer from higher latency, posing substantial challenges for deployment. In this work, we aim to accelerate diffusion models by quantizing their weights and activations to 4 bits. At such an aggressive level, both weights and activations are highly sensitive to quantization, and conventional post-training quantization methods for large language models, such as smoothing, become insufficient. To overcome this limitation, we propose SVDQuant, a new 4-bit quantization paradigm. Unlike smoothing, which redistributes outliers between weights and activations, our approach absorbs these outliers using a low-rank branch. We first shift the outliers from the activations into the weights, then employ a high-precision low-rank branch to absorb the outliers in the weights via SVD. This process eases quantization on both sides. However, naively running the low-rank branch independently incurs significant overhead due to extra data movement of activations, negating the quantization speedup. To address this, we co-design an inference engine, Nunchaku, that fuses the kernels of the low-rank branch into those of the low-bit branch to cut off redundant memory access. It can also seamlessly support off-the-shelf low-rank adapters (LoRAs) without requantization. Extensive experiments on SDXL, PixArt-Sigma, and FLUX.1 validate the effectiveness of SVDQuant in preserving image quality. We reduce the memory usage of the 12B FLUX.1 models by 3.6×, achieving a 3.5× speedup over the 4-bit weight-only quantized baseline on a 16GB RTX 4090 GPU, paving the way for more interactive applications on PCs.
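In equation form (a sketch of the idea above; the notation here is illustrative rather than the paper's exact formulation), let $W$ be a weight matrix after smoothing has shifted the activation outliers into it, and let $Q(\cdot)$ denote 4-bit quantization. SVDQuant splits $W$ into a rank-$r$ term $L_1 L_2$ obtained from a truncated SVD and a residual $R = W - L_1 L_2$:

$$X W = X L_1 L_2 + X R \approx X L_1 L_2 + Q(X)\, Q(R)$$

The low-rank branch $X L_1 L_2$ stays in 16-bit precision and carries the dominant singular values, i.e., the outliers, so the residual $R$ and the activations become far easier to quantize to 4 bits.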

(Teaser figure: SVDQuant overview)

Usage

We use FLUX.1-schnell as an example.

Step 1: Evaluation Baseline Preparation

To evaluate similarity metrics, we first generate the reference images with the unquantized model by running the following command:

python -m deepcompressor.app.diffusion.ptq configs/model/flux.1-schnell.yaml --output-dirname reference

In this command,

  • configs/model/flux.1-schnell.yaml specifies the model configuration, including the evaluation setup.
  • Setting the flag --output-dirname to reference automatically redirects the output directory to the ref_root in the evaluation configuration.

Step 2: Calibration Dataset Preparation

Before quantizing the diffusion model, we randomly sample 128 prompts from COCO Captions 2024 to generate the calibration dataset by running the following command:

python -m deepcompressor.app.diffusion.dataset.collect.calib \
    configs/model/flux.1-schnell.yaml configs/collect/qdiff.yaml

In this command,

  • configs/collect/qdiff.yaml specifies the calibration dataset configuration, including the path to the prompt YAML (i.e., --collect-prompt-path prompts/qdiff.yaml), the number of prompts to sample (i.e., --collect-num-samples 128), and the root directory of the calibration dataset (which should match the quantization configuration); an equivalent flag-based invocation is sketched below.
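Since every option can also be set on the command line (see Step 3), the same collection run can be written with the flags named above made explicit. This is an illustrative equivalent of the defaults, not an additional required step:

python -m deepcompressor.app.diffusion.dataset.collect.calib \
    configs/model/flux.1-schnell.yaml configs/collect/qdiff.yaml \
    --collect-prompt-path prompts/qdiff.yaml \
    --collect-num-samples 128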

Step 3: Model Quantization

The following command will perform INT4 SVDQuant and evaluate the quantized model on 1024 samples from MJHQ-30K:

python -m deepcompressor.app.diffusion.ptq \
    configs/model/flux.1-schnell.yaml configs/svdquant/int4.yaml \
    --eval-benchmarks MJHQ --eval-num-samples 1024

In this command,

  • The positional arguments are configuration files, loaded in order. configs/svdquant/int4.yaml contains the quantization configuration specific to INT4 SVDQuant. Please make sure all configuration files are under a subfolder of the working directory where you run the command.
  • Every configuration option can be set either in a YAML file or on the command line. Please refer to configs/__default__.yaml and python -m deepcompressor.app.diffusion.ptq -h.
  • The default evaluation datasets are 1024 samples from MJHQ and DCI.
  • If you would like to save the quantized model checkpoint, add --save-model true or --save-model /PATH/TO/CHECKPOINT/DIR to the command, as in the example below.
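For example, a full quantization run that evaluates on 1024 MJHQ samples and also saves the checkpoint could look like this (the checkpoint directory is a placeholder; all flags are the ones introduced above):

python -m deepcompressor.app.diffusion.ptq \
    configs/model/flux.1-schnell.yaml configs/svdquant/int4.yaml \
    --eval-benchmarks MJHQ --eval-num-samples 1024 \
    --save-model /PATH/TO/CHECKPOINT/DIR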

Deployment

We provide SVDQuant-quantized model checkpoints in Nunchaku for your reference. Please refer to Nunchaku for deployment on GPU systems.

Evaluation Results

Quality Evaluation

Below are the quality and similarity metrics evaluated on 5,000 samples from the MJHQ-30K dataset. IR means ImageReward. Our 4-bit results outperform other 4-bit baselines, effectively preserving the visual quality of the 16-bit models.

| Model | Precision | Method | FID ($\downarrow$) | IR ($\uparrow$) | LPIPS ($\downarrow$) | PSNR ($\uparrow$) |
|-------|-----------|--------|--------------------|-----------------|----------------------|-------------------|
| FLUX.1-dev (50 Steps) | BF16 | -- | 20.3 | 0.953 | -- | -- |
| | INT W8A8 | Ours | 20.4 | 0.948 | 0.089 | 27.0 |
| | W4A16 | NF4 | 20.6 | 0.910 | 0.272 | 19.5 |
| | INT W4A4 | Ours | 19.86 | 0.932 | 0.254 | 20.1 |
| | FP W4A4 | Ours | 21.0 | 0.933 | 0.247 | 20.2 |
| FLUX.1-schnell (4 Steps) | BF16 | -- | 19.2 | 0.938 | -- | -- |
| | INT W8A8 | Ours | 19.2 | 0.966 | 0.120 | 22.9 |
| | W4A16 | NF4 | 18.9 | 0.943 | 0.257 | 18.2 |
| | INT W4A4 | Ours | 18.4 | 0.969 | 0.292 | 17.5 |
| | FP W4A4 | Ours | 19.9 | 0.956 | 0.279 | 17.5 |
| PixArt-Sigma (20 Steps) | FP16 | -- | 16.6 | 0.944 | -- | -- |
| | INT W8A8 | ViDiT-Q | 15.7 | 0.944 | 0.137 | 22.5 |
| | INT W8A8 | Ours | 16.3 | 0.955 | 0.109 | 23.7 |
| | INT W4A8 | ViDiT-Q | 37.3 | 0.573 | 0.611 | 12.0 |
| | INT W4A4 | Ours | 20.1 | 0.898 | 0.394 | 16.2 |
| | FP W4A4 | Ours | 18.3 | 0.946 | 0.326 | 17.4 |

Reference

If you find deepcompressor useful or relevant to your research, please cite our paper:

@article{li2024svdquant,
  title={SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models},
  author={Li*, Muyang and Lin*, Yujun and Zhang, Zhekai and Cai, Tianle and Li, Xiuyu and Guo, Junxian and Xie, Enze and Meng, Chenlin and Zhu, Jun-Yan and Han, Song},
  journal={arXiv preprint arXiv:2411.05007},
  year={2024}
}