[Website][Paper][Nunchaku Inference System]
Diffusion models have proven highly effective at generating high-quality images. However, as these models grow larger, they require significantly more memory and suffer from higher latency, posing substantial challenges for deployment. In this work, we aim to accelerate diffusion models by quantizing their weights and activations to 4 bits. At such an aggressive level, both weights and activations are highly sensitive to quantization, and conventional post-training quantization methods for large language models, such as smoothing, become insufficient. To overcome this limitation, we propose SVDQuant, a new 4-bit quantization paradigm. Unlike smoothing, which redistributes outliers between weights and activations, our approach absorbs these outliers using a low-rank branch. We first migrate the outliers from the activations to the weights, then employ a high-precision low-rank branch to absorb the weight outliers via Singular Value Decomposition (SVD). This process eases quantization on both sides. However, naively running the low-rank branch independently incurs significant overhead due to the extra data movement of activations, negating the quantization speedup. To address this, we co-design an inference engine, Nunchaku, that fuses the kernels of the low-rank branch into those of the low-bit branch to cut off redundant memory access. It can also seamlessly support off-the-shelf low-rank adapters (LoRAs) without requantization. Extensive experiments on SDXL, PixArt-Sigma, and FLUX.1 validate the effectiveness of SVDQuant in preserving image quality. We reduce the memory usage of the 12B FLUX.1 models by 3.6×, achieving a 3.5× speedup over the 4-bit weight-only quantized baseline on a 16GB laptop RTX 4090 GPU, paving the way for more interactive applications on PCs.
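To make the decomposition concrete, below is a minimal PyTorch sketch of the core idea. It is a simplification, not the library's implementation: it skips the activation-to-weight outlier migration step, and `quantize_int4_sym` is an illustrative per-tensor symmetric quantizer stand-in (deepcompressor's actual quantizers use finer-grained scales).

```python
import torch

def quantize_int4_sym(x: torch.Tensor) -> torch.Tensor:
    """Naive per-tensor symmetric 4-bit fake-quantization (illustrative only)."""
    scale = x.abs().max() / 7.0
    return torch.clamp(torch.round(x / scale), min=-8, max=7) * scale

def svdquant_linear(W: torch.Tensor, rank: int = 32):
    """Split W into a 16-bit low-rank branch plus a 4-bit residual."""
    # The top-`rank` singular components absorb the weight outliers...
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    L1 = U[:, :rank] * S[:rank]            # (out_features, rank), high precision
    L2 = Vh[:rank, :]                      # (rank, in_features), high precision
    # ...leaving a residual with a much smaller dynamic range to quantize.
    R_q = quantize_int4_sym(W - L1 @ L2)
    return L1, L2, R_q

# Forward pass: the low-rank branch runs in high precision, the residual in 4 bits.
W = torch.randn(512, 512)
X = torch.randn(4, 512)
L1, L2, R_q = svdquant_linear(W)
Y = X @ L2.T @ L1.T + X @ R_q.T
print((Y - X @ W.T).abs().max())  # approximation error from the 4-bit residual
```

Because the top singular components carry most of the weight magnitude, the residual has a much narrower distribution than the original weights and loses far less accuracy under 4-bit quantization, while the rank-32 branch adds only a small amount of high-precision compute.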
We use FLUX.1-schnell as an example. To evaluate the similarity metrics, we first need to prepare the reference images generated by the unquantized model by running the following command:
```bash
python -m deepcompressor.app.diffusion.ptq configs/model/flux.1-schnell.yaml --output-dirname reference
```
In this command,
- `configs/model/flux.1-schnell.yaml` specifies the model configurations, including the evaluation setups.
- By setting the flag `--output-dirname` to `reference`, the output directory is automatically redirected to the `ref_root` specified in the evaluation configuration.
Before quantizing diffusion models, we randomly sample 128 prompts from COCO Captions 2014 to generate the calibration dataset by running the following command:
```bash
python -m deepcompressor.app.diffusion.dataset.collect.calib \
    configs/model/flux.1-schnell.yaml configs/collect/qdiff.yaml
```
In this command, `configs/collect/qdiff.yaml` specifies the calibration dataset configurations, including:
- the path to the prompt YAML file (i.e., `--collect-prompt-path prompts/qdiff.yaml`),
- the number of prompts to be sampled (i.e., `--collect-num-samples 128`), and
- the root directory of the calibration datasets (which should be consistent with the quantization configuration).
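These options can also be overridden directly on the command line. For example, to sample only 64 prompts instead of 128 (using the `--collect-num-samples` flag listed above):

```bash
python -m deepcompressor.app.diffusion.dataset.collect.calib \
    configs/model/flux.1-schnell.yaml configs/collect/qdiff.yaml \
    --collect-num-samples 64
```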
The following command will perform INT4 SVDQuant and evaluate the quantized model on 1024 samples from MJHQ-30K:
```bash
python -m deepcompressor.app.diffusion.ptq \
    configs/model/flux.1-schnell.yaml configs/svdquant/int4.yaml \
    --eval-benchmarks MJHQ --eval-num-samples 1024
```
In this command,
- The positional arguments are configuration files, which are loaded in order. `configs/svdquant/int4.yaml` contains the quantization configurations specialized for INT4 SVDQuant. Please make sure all configuration files are under a subfolder of the working directory where you run the command.
- All configurations can be set directly in either the YAML files or on the command line. Please refer to `configs/__default__.yaml` and `python -m deepcompressor.app.diffusion.ptq -h` for the full list of options.
- The default evaluation datasets are 1024 samples from MJHQ and DCI.
- If you would like to save the quantized model checkpoint, add `--save-model true` or `--save-model /PATH/TO/CHECKPOINT/DIR` to the command, as shown in the example below.
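For example, the following command runs the same INT4 SVDQuant job as above and additionally saves the quantized checkpoint (the path is a placeholder for a directory of your choice):

```bash
python -m deepcompressor.app.diffusion.ptq \
    configs/model/flux.1-schnell.yaml configs/svdquant/int4.yaml \
    --eval-benchmarks MJHQ --eval-num-samples 1024 \
    --save-model /PATH/TO/CHECKPOINT/DIR
```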
We provide SVDQuant quantized model checkpoints in Nunchaku for your reference. Please refer to Nunchaku for further deployment on GPU systems.
Below are the quality and similarity metrics evaluated on 5,000 samples from the MJHQ-30K dataset (IR denotes ImageReward). Our 4-bit results outperform other 4-bit baselines, effectively preserving the visual quality of the 16-bit models.
| Model | Precision | Method | FID (↓) | IR (↑) | LPIPS (↓) | PSNR (↑) |
|---|---|---|---|---|---|---|
| FLUX.1-dev (50 Steps) | BF16 | -- | 20.3 | 0.953 | -- | -- |
| | INT W8A8 | Ours | 20.4 | 0.948 | 0.089 | 27.0 |
| | W4A16 | NF4 | 20.6 | 0.910 | 0.272 | 19.5 |
| | INT W4A4 | Ours | 19.86 | 0.932 | 0.254 | 20.1 |
| | FP W4A4 | Ours | 21.0 | 0.933 | 0.247 | 20.2 |
| FLUX.1-schnell (4 Steps) | BF16 | -- | 19.2 | 0.938 | -- | -- |
| | INT W8A8 | Ours | 19.2 | 0.966 | 0.120 | 22.9 |
| | W4A16 | NF4 | 18.9 | 0.943 | 0.257 | 18.2 |
| | INT W4A4 | Ours | 18.4 | 0.969 | 0.292 | 17.5 |
| | FP W4A4 | Ours | 19.9 | 0.956 | 0.279 | 17.5 |
| PixArt-Sigma (20 Steps) | FP16 | -- | 16.6 | 0.944 | -- | -- |
| | INT W8A8 | ViDiT-Q | 15.7 | 0.944 | 0.137 | 22.5 |
| | INT W8A8 | Ours | 16.3 | 0.955 | 0.109 | 23.7 |
| | INT W4A8 | ViDiT-Q | 37.3 | 0.573 | 0.611 | 12.0 |
| | INT W4A4 | Ours | 20.1 | 0.898 | 0.394 | 16.2 |
| | FP W4A4 | Ours | 18.3 | 0.946 | 0.326 | 17.4 |
If you find `deepcompressor` useful or relevant to your research, please kindly cite our paper:
```bibtex
@article{li2024svdquant,
  title={SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models},
  author={Li*, Muyang and Lin*, Yujun and Zhang, Zhekai and Cai, Tianle and Li, Xiuyu and Guo, Junxian and Xie, Enze and Meng, Chenlin and Zhu, Jun-Yan and Han, Song},
  journal={arXiv preprint arXiv:2411.05007},
  year={2024}
}
```