Bridging the Gap: Enhancing the Utility of Synthetic Data via Post-Processing Techniques (BMVC 2023)
The official TensorFlow implementation of the BMVC 2023 paper: Bridging the Gap: Enhancing the Utility of Synthetic Data via Post-Processing Techniques.
Classification Accuracy Score (CAS) of classifiers trained only on generated data. The GaFi pipeline is compared with the previous state of the art, with the Synthetic Baseline, and with the accuracy of classifiers trained on real data.
- Clone the GitHub repository:
git clone https://github.com/sup3rgiu/GaFi-Pipeline.git
- Move inside the docker directory:
cd GaFi-Pipeline/docker
- Build the Docker image:
docker build --rm -t gafi_pipeline .
- Train the classifier on real data:
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 --name GaFiPipeline -it --rm -v /path_to_GaFi-Pipeline:/exp -t gafi_pipeline python train_classifier.py --cfg_file ./configs/CIFAR10/ResNet20.yaml
- Train GAN:
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 --name GaFiPipeline -it --rm -v /path_to_GaFi-Pipeline:/exp -t gafi_pipeline python train_gain.py --cfg_file ./configs/CIFAR10/BigGAN_deep.yaml
- Run the full pipeline:
N.B.: adjust the GAN name if needed, either inside the .yaml file or with the --gan_name command-line argument.
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 --name GaFiPipeline -it --rm -v /path_to_GaFi-Pipeline:/exp -t gafi_pipeline python run_pipeline.py --cfg_file ./configs/CIFAR10/Pipeline.yaml --gan_name GAN_NAME
- Repeat the GAN training and full-pipeline steps (steps 2 and 3) N times, changing the seed each time to obtain N different generators:
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 --name GaFiPipeline -it --rm -v /path_to_GaFi-Pipeline:/exp -t gafi_pipeline python train_gain.py --cfg_file ./configs/CIFAR10/BigGAN_deep.yaml --seed NEW_SEED
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 --name GaFiPipeline -it --rm -v /path_to_GaFi-Pipeline:/exp -t gafi_pipeline python run_pipeline.py --cfg_file ./configs/CIFAR10/Pipeline.yaml --gan_name NEW_GAN_NAME
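The repetition over seeds can be sketched as a small shell loop. This is only an illustration, not a script shipped with the repository: REPO_DIR, the seed list and the NEW_GAN_NAME_<seed> names are placeholders (assumptions), and the loop only prints the two commands for each seed rather than running them.

```shell
# Placeholders (assumptions): adjust the repository path and the seed list.
REPO_DIR="/path_to_GaFi-Pipeline"
SEEDS="1 2 3"

# Print (do not execute) the training and pipeline commands for one seed.
# $1 = seed, $2 = name of the GAN trained with that seed.
print_seed_commands() {
  prefix="docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 --name GaFiPipeline -it --rm -v ${REPO_DIR}:/exp -t gafi_pipeline"
  echo "${prefix} python train_gain.py --cfg_file ./configs/CIFAR10/BigGAN_deep.yaml --seed $1"
  echo "${prefix} python run_pipeline.py --cfg_file ./configs/CIFAR10/Pipeline.yaml --gan_name $2"
}

for seed in ${SEEDS}; do
  # NEW_GAN_NAME_<seed> is a placeholder, not a naming scheme used by the repository.
  print_seed_commands "${seed}" "NEW_GAN_NAME_${seed}"
done
```

Review the printed lines, replace each placeholder GAN name with the name under which the corresponding generator was actually saved, and run the commands one at a time.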
- Run the MultiGAN script to obtain a classifier trained on a synthetic dataset sampled from the N different generators:
N.B.: adjust all the GAN names, either inside the .yaml file or as command-line arguments.
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 --name GaFiPipeline -it --rm -v /path_to_GaFi-Pipeline:/exp -t gafi_pipeline python run_multigan.py --cfg_file ./configs/CIFAR10/MultiGAN.yaml
All default parameters defined in the .yaml configuration files can be overridden by specifying the corresponding command-line arguments.
For example, if we want to use the default ./configs/CIFAR10/ResNet20.yaml but train in mixed precision, we can do the following:
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 --name GaFiPipeline -it --rm -v /path_to_GaFi-Pipeline:/exp -t gafi_pipeline python train_classifier.py --cfg_file ./configs/CIFAR10/ResNet20.yaml --mixed_precision
Or if we want to train the classifier without using random erasing augmentation:
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 --name GaFiPipeline -it --rm -v /path_to_GaFi-Pipeline:/exp -t gafi_pipeline python train_classifier.py --cfg_file ./configs/CIFAR10/ResNet20.yaml --random_erasing False
All possible arguments are defined in parser.py and can be seen by running the scripts with the -h flag.
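Since every command shares the same docker prefix, a small wrapper function can make overrides easier to type. The sketch below is a convenience under stated assumptions, not part of the repository: the function name gafi and the variables REPO_DIR, GPU_ID and GAFI_DRY_RUN are hypothetical. By default it only prints the command it would run.

```shell
# Hypothetical settings (assumptions): repository path, GPU index, dry-run switch.
REPO_DIR="${REPO_DIR:-/path_to_GaFi-Pipeline}"
GPU_ID="${GPU_ID:-0}"
GAFI_DRY_RUN="${GAFI_DRY_RUN:-1}"   # set to 0 to actually invoke docker

# Usage: gafi SCRIPT CFG_FILE [OVERRIDES...]
# Any extra arguments are appended after --cfg_file, so they act as overrides.
gafi() {
  script="$1"; cfg="$2"; shift 2
  # Rebuild the positional parameters as the full docker command line.
  set -- docker run --runtime=nvidia -e "NVIDIA_VISIBLE_DEVICES=${GPU_ID}" \
    --name GaFiPipeline -it --rm -v "${REPO_DIR}:/exp" -t gafi_pipeline \
    python "${script}" --cfg_file "${cfg}" "$@"
  if [ "${GAFI_DRY_RUN}" = "1" ]; then
    echo "$*"      # dry run: show the command only
  else
    "$@"           # run the command
  fi
}

# The two override examples above, expressed through the wrapper:
gafi train_classifier.py ./configs/CIFAR10/ResNet20.yaml --mixed_precision
gafi train_classifier.py ./configs/CIFAR10/ResNet20.yaml --random_erasing False
```

Keeping the dry run as the default makes it easy to check the final command line before a long training job starts.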
Should you find this repository useful, please consider citing:
@misc{lampis2023bridging,
title={Bridging the Gap: Enhancing the Utility of Synthetic Data via Post-Processing Techniques},
author={Andrea Lampis and Eugenio Lomurno and Matteo Matteucci},
year={2023},
eprint={2305.10118},
archivePrefix={arXiv},
primaryClass={cs.CV}
}