Frequency Augmented VAE (FA-VAE)

This is the original implementation of the paper "Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder", published at CVPR 2023.

FA-VAE reconstructs images by improving the alignment between the frequency spectra of the original and reconstructed images.
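To make the idea of frequency-spectrum alignment concrete, here is a minimal NumPy sketch that measures how far apart the amplitude spectra of two images are. It is an illustration only: the function name `spectrum_distance`, the grayscale inputs, and the plain mean squared difference of FFT amplitudes are all assumptions of this sketch, not the losses defined in the paper or in this repository.

```python
import numpy as np


def spectrum_distance(original, reconstructed):
    """Mean squared difference between the 2D FFT amplitude spectra of two images.

    Hypothetical helper for illustration; both inputs are assumed to be
    2D grayscale arrays of the same shape.
    """
    amp_orig = np.abs(np.fft.fft2(original))
    amp_recon = np.abs(np.fft.fft2(reconstructed))
    return float(np.mean((amp_orig - amp_recon) ** 2))


rng = np.random.default_rng(0)
image = rng.random((32, 32))

# A crude stand-in for a blurry reconstruction: average each pixel with its
# four neighbours (with wrap-around), which attenuates high frequencies.
blurred = (
    image
    + np.roll(image, 1, axis=0)
    + np.roll(image, -1, axis=0)
    + np.roll(image, 1, axis=1)
    + np.roll(image, -1, axis=1)
) / 5.0

print(spectrum_distance(image, image))          # identical images: 0.0
print(spectrum_distance(image, blurred) > 0.0)  # blurring changes the spectrum: True
```

A reconstruction that loses fine detail shows up as a nonzero distance here, which is the kind of mismatch a frequency-domain objective is meant to penalize.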

To-Do

~~We will be releasing the checkpoints shortly.~~

Requirements

The required packages are listed in environment.yaml.

Checkpoints

| Model | Link |
| --- | --- |
| FA-VAE on CelebA-HQ (Table 2, row 8: FCM (Res) + non pair-wise DSL) | expe_5.pt |
| FA-VAE on FFHQ (Table 1, row 3) | favae-ffhq.pt |
| FA-VAE on ImageNet (f=16) (Table 1, last row) | favae-imagenet-f16.pt |
| FA-VAE on ImageNet (f=4) (Table 1, row 6) | favae-imagenet-f4.pt |
| CAT on CelebA-HQ | cat_celeba.pt |

Data Preparation

CelebA-HQ

Download the dataset:
- The CelebA-HQ dataset can be downloaded from CelebA-Mask-HQ.
- The train/test split is given in the CelebA file list_eval_partition.txt, where "0" is train, "1" is eval, and "2" is test.
- Download the captions from the MM-CelebA-HQ dataset for training T2I generation.

Preprocess the data files into pkl format:
```
cd datasets
python preprocess_celeba.py
```
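The split codes in list_eval_partition.txt can be read with a few lines of Python. The sketch below assumes each line has the form `<file name> <code>`; the helper name `parse_partition` and the sample lines are made up for illustration, and it does not describe how preprocess_celeba.py itself uses the file.

```python
from collections import defaultdict

# Split codes as described in the list above.
SPLIT_NAMES = {"0": "train", "1": "eval", "2": "test"}


def parse_partition(lines):
    """Group image file names by split from lines of the form '<file name> <code>'."""
    splits = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        file_name, code = parts
        splits[SPLIT_NAMES[code]].append(file_name)
    return dict(splits)


# Illustrative sample lines, not copied from the real file.
sample = ["000001.jpg 0", "000002.jpg 0", "162771.jpg 1", "182638.jpg 2"]
splits = parse_partition(sample)
print({name: len(files) for name, files in splits.items()})

# With the real file, pass the open file object instead of the sample list:
# with open("list_eval_partition.txt") as f:
#     splits = parse_partition(f)
```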

FFHQ can be downloaded from FFHQ, and ImageNet can be downloaded from Kaggle.

Train FA-VAE

FA-VAE comes with different architectures for the Frequency Complement Module (FCM) and different settings for the two losses, Spectrum Loss (SL) and Dynamic Spectrum Loss (DSL).

The training commands for FA-VAE on CelebA-HQ, with different settings of FCM and SL/DSL, are in the script train_favae_celeba.sh. These settings correspond to Table 2.
```
cd favae_scripts
bash train_favae_celeba.sh
```
The training commands for FA-VAE on FFHQ and ImageNet are in the script train_favae_other_datasets.sh:
```
cd favae_scripts
bash train_favae_other_datasets.sh
```

To resume training, pass the --resume flag and give the path to resume from via --resume_path. For instance, to resume FA-VAE codebook training on ImageNet:

```
torchrun --nnodes=1 --nproc_per_node=2 train_vqgan_ddp.py \
  --ds $OUTPUT --batch_size 2 --print_steps 5000 --img_steps 20000 \
  --codebook_size 16384 --disc_start_epochs 1 --embed_dim 256 \
  --use_lucid_quantizer --use_cosine_sim --with_fcm \
  --ffl_weight 1.0 --use_same_conv_gauss --ffl_weight_features 0.01 --gaussian_kernel 9 \
  --codebook_weight 1.0 --perceptual_weight 1.0 --disc_weight 0.75 --base_lr 2.0e-6 \
  --train_file ../datasets/pkl_files/imagenet_train_wo_cap.pkl \
  --val_file ../datasets/pkl_files/imagenet_test_wo_cap.pkl \
  --resume --resume_path $RESUME_PATH
```
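The command above passes --resume without a value and --resume_path with one. As a rough illustration of how such a pair of options can be handled, here is a small argparse sketch. It is an assumption about the flag semantics, not the parsing code in train_vqgan_ddp.py; the helper names and the path `checkpoints/last.pt` are hypothetical.

```python
import argparse


def build_parser():
    # Hypothetical parser covering only the two resume-related options.
    parser = argparse.ArgumentParser(description="Resume-flag handling (illustrative)")
    parser.add_argument("--resume", action="store_true",
                        help="continue training from a saved state")
    parser.add_argument("--resume_path", type=str, default=None,
                        help="path to resume from")
    return parser


def parse_resume_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Reject --resume without a path early instead of failing later in training.
    if args.resume and args.resume_path is None:
        parser.error("--resume requires --resume_path")
    return args


args = parse_resume_args(["--resume", "--resume_path", "checkpoints/last.pt"])
print(args.resume, args.resume_path)
```

Validating the pair up front means a missing path is reported before any data loading or model construction starts.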

Train CAT Models

CAT for T2I generation on CelebA:
```
cd cat_scripts
bash script_gpt_CA_celeba.sh
```

BibTeX

@inproceedings{favae2023cvpr,
  title={Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder},
  author={Lin, Xinmiao and Li, Yikang and Hsiao, Jenhao and Ho, Chiuman and Kong, Yu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2023}
}

License

See the LICENSE file for license rights and limitations (MIT).

Acknowledgements

The implementation of FA-VAE relies on resources from Clip-Gen, taming-transformers, CLIP, vector-quantize-pytorch, PerceptualSimilarity, and pytorch-fid. We thank the original authors for open-sourcing their work.
