This is a repository for the final project of the Machine Learning course at Skoltech, Term 3 '22.

Team: Anton Zubekhin, Polina Karpikova, Nikita Fedyashin

The study and code are based on the paper Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs by Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry Vetrov and Andrew Gordon Wilson (NIPS 2018, Spotlight) (github repo).
Neural networks are used for many tasks (image and video classification, natural language processing, etc.). They are typically trained by minimizing a loss function of the network parameters, so the loss can be viewed as a surface in a high-dimensional space. Well-trained networks correspond to local minima on this surface. The original paper studies connections between such local minima for convolutional neural networks trained to classify images. It shows that there are low-loss paths between minima on the loss surface, which can be used to construct networks satisfying a low-loss requirement.
We study another type of neural network, autoencoders, which are designed to construct low-dimensional representations of high-dimensional data. We claim that for the task of embedding high-dimensional data into a low-dimensional space and reconstructing the original data from its embedding, there are also low-loss paths connecting different optimal networks (i.e., local minima on the corresponding loss surface).
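As in the original paper, the connecting curve can be parametrized, for example, as a quadratic Bezier curve with a single trainable bend. A minimal numpy sketch (the flattened weight vectors `w0`, `w1` and the bend `theta` are toy placeholders, not real network weights):

```python
import numpy as np

def bezier_point(t, w0, theta, w1):
    """Quadratic Bezier curve in weight space:
    phi(t) = (1-t)^2 * w0 + 2t(1-t) * theta + t^2 * w1,
    so phi(0) = w0 and phi(1) = w1."""
    return (1 - t) ** 2 * w0 + 2 * t * (1 - t) * theta + t ** 2 * w1

# Toy flattened weight vectors of two trained networks and one bend.
w0 = np.zeros(4)
w1 = np.ones(4)
theta = np.full(4, 0.5)

# The curve starts at the first minimum and ends at the second one;
# only the bend theta is trained to keep the loss low along the path.
print(bezier_point(0.0, w0, theta, w1))  # -> w0
print(bezier_point(1.0, w0, theta, w1))  # -> w1
```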
We use a publicly available autoencoder architecture with slight adjustments: instead of transposed convolutions in the decoder, we use a combination of upsampling and a standard convolution.
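One decoder stage of this kind can be sketched in PyTorch as upsampling followed by a regular convolution (the channel widths and activation here are illustrative, not the actual architecture):

```python
import torch
import torch.nn as nn

def up_block(in_ch, out_ch):
    """Nearest-neighbour 2x upsampling followed by a standard 3x3
    convolution, used in place of a transposed convolution."""
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )

block = up_block(64, 32)
x = torch.randn(1, 64, 8, 8)
y = block(x)
print(y.shape)  # torch.Size([1, 32, 16, 16])
```

One common motivation for this swap is avoiding the checkerboard artifacts that transposed convolutions can produce in reconstructions.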
To train the autoencoder we use two losses: mean absolute reconstruction error (MAE) and a reconstruction loss based on the Laplacian pyramid. For the latter we use this implementation.
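One common formulation of the Laplacian pyramid loss sums L1 distances between corresponding pyramid levels of the two images. The sketch below uses simple average-pool downsampling, nearest-neighbour upsampling, and a 2^level weighting; the implementation we actually link to may differ in its filters and weights:

```python
import numpy as np

def downsample(img):
    """2x average pooling (H and W must be even)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Nearest-neighbour 2x upsampling."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels=3):
    """Band-pass (Laplacian) images plus the final low-pass residual."""
    bands = []
    current = img
    for _ in range(levels):
        low = downsample(current)
        bands.append(current - upsample(low))
        current = low
    bands.append(current)
    return bands

def lap_loss(x, y, levels=3):
    """L1 distance between corresponding pyramid levels,
    with coarser levels weighted more heavily."""
    return sum(
        (2 ** i) * np.abs(bx - by).mean()
        for i, (bx, by) in enumerate(zip(laplacian_pyramid(x, levels),
                                         laplacian_pyramid(y, levels)))
    )

x = np.random.rand(64, 64)
print(lap_loss(x, x))  # identical images give zero loss: 0.0
```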
We use the CelebA 64x64 dataset to train and evaluate the models. The data is split 4:1 into train and test sets.
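The 4:1 split can be sketched as a reproducible shuffle of indices (the actual seed and splitting mechanism in dataset.py may differ):

```python
import numpy as np

def split_indices(n, train_frac=0.8, seed=0):
    """Shuffle indices reproducibly and split them 4:1 (train:test)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(n * train_frac)
    return idx[:n_train], idx[n_train:]

train_idx, test_idx = split_indices(100)
print(len(train_idx), len(test_idx))  # 80 20
```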
The list of required packages is presented in requirements.txt. Be sure to install gdown==4.4.0 to be able to download the CelebA dataset.
To estimate the quality of reconstructions we use LPIPS metric (implementation is here).
Simply run
pip install -r requirements.txt
to install all required packages. We highly recommend using a GPU for both training and inference of the autoencoder.
You need two trained autoencoders to build a curve connecting them. Run
python train.py --dir=<DIR> \
--data_path=<PATH> \
--device=<DEVICE> \
--loss_function=<LOSS> [mae|laplacian] \
--conv_init=<WEIGHT_INIT> [normal|kaiming_normal|kaiming_uniform] \
--latent_dim=<DIM> \
--epochs=<EPOCHS>
Main parameters:

DIR — path to the directory where training checkpoints are stored
PATH — path to the dataset (if not pre-downloaded, it will be downloaded automatically on the first run)
DEVICE — device to train and run models on (we suggest cuda)
LOSS — type of reconstruction loss (MAE or Laplacian pyramid loss; default: MAE)
WEIGHT_INIT — type of weight initialization in convolutional layers (default: normal)
DIM — dimensionality of the embeddings (default: 128)
EPOCHS — number of training epochs (default: 100)
You can also choose an optimizer (Adam or SGD) and its parameters. Run
python train.py -h
to see all available options.
Once you have two checkpoints, you can connect them with a curve of your choice (Bezier or PolyChain). Run
python train.py --dir=<DIR> \
--data_path=<PATH> \
--device=<DEVICE> \
--loss_function=<LOSS> [mae|laplacian] \
--conv_init=<WEIGHT_INIT> [normal|kaiming_normal|kaiming_uniform] \
--curve=<CURVE> [Bezier|PolyChain] \
--num_bends=<NBENDS> \
--init_start=<START> \
--init_end=<END> \
--latent_dim=<DIM> \
--epochs=<EPOCHS> \
[--fix_start] \
[--fix_end]
Main parameters:

CURVE — type of curve parametrization (Bezier or PolyChain)
NBENDS — number of bends in the curve (default: 3)
START, END — paths to the checkpoints of the curve endpoints

You may also use --fix_start and --fix_end to fix the endpoints of the curve (otherwise they will also be trained).
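For reference, the PolyChain parametrization is a piecewise-linear path through the control points. A numpy sketch, assuming (as in the original implementation) that num_bends counts all control points, including the two endpoints:

```python
import numpy as np

def polychain_point(t, points):
    """Piecewise-linear curve through `points` (endpoints plus bends).
    t=0 gives the first point, t=1 the last."""
    points = np.asarray(points, dtype=float)
    n_seg = len(points) - 1
    s = t * n_seg                    # position along the whole chain
    i = min(int(s), n_seg - 1)       # segment index
    alpha = s - i                    # position inside the segment
    return (1 - alpha) * points[i] + alpha * points[i + 1]

# num_bends=3: two fixed endpoints and one trainable middle bend.
w0, bend, w1 = np.zeros(2), np.array([1.0, 2.0]), np.ones(2)
print(polychain_point(0.5, [w0, bend, w1]))  # the middle bend: [1. 2.]
```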
If you have a checkpoint of the curve, you can start the evaluation procedure. You can track the value of the loss along the trained curve (and, optionally, LPIPS). By default you will also get the dynamics of the images reconstructed by networks initialized with weights from points on the trained curve; we use 4 images from the training set to track these dynamics. We also provide the option of connecting the endpoints with a straight line, to compare it with the trained low-loss curve. Run
python eval_curve.py --dir=<DIR> \
--ckpt=<CKPT> \
--device=<DEVICE> \
--connect=<CONNECT> [CURVE|TRIVIAL] \
--curve=<CURVE> \
--num_bends=<NBENDS> \
--num_points=<NPOINTS> \
[--lpips]
Main parameters:

CKPT — path to the checkpoint of the curve
CONNECT — type of connection: the trained low-loss curve or a straight line in weight space
NPOINTS — number of points at which to evaluate the curve (default: 10)

Use the --lpips flag to track the LPIPS score of the reconstructions along the curve.
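Conceptually, the evaluation samples weights at NPOINTS uniformly spaced values of t in [0, 1] and computes the loss for each resulting network. A toy one-dimensional sketch (the quadratic loss and the straight-line connection are purely illustrative):

```python
import numpy as np

def evaluate_along(path_fn, loss_fn, num_points=10):
    """Evaluate a loss at `num_points` uniformly spaced t in [0, 1]."""
    ts = np.linspace(0.0, 1.0, num_points)
    return ts, np.array([loss_fn(path_fn(t)) for t in ts])

# Toy loss with two minima at w = -1 and w = +1.
loss = lambda w: float(np.minimum((w + 1) ** 2, (w - 1) ** 2).sum())

# Straight-line ("trivial") connection between the two minima.
line = lambda t: np.array([-1.0 + 2.0 * t])

ts, losses = evaluate_along(line, loss, num_points=5)
print(losses[0], losses[-1])  # zero loss at both endpoints: 0.0 0.0
print(losses[2])              # loss barrier in the middle: 1.0
```

On a trained low-loss curve the middle values would stay close to the endpoint losses, while the straight-line connection typically shows a barrier like the one above.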
models — folder with the architectures of the autoencoder and the special network class that embodies the low-loss curve;

notebooks — folder with additional .ipynb notebooks:
colab_training.ipynb — notebook for training models on Google Colab;
check_outputs.ipynb — notebook to check the outputs of the autoencoders, verify that the endpoints are at different positions in the loss space, and check the size of the models;

check_ends.py — sanity check comparing the endpoints;

dataset.py — dataset class for the CelebA dataset; by default, the data is downloaded upon initialization of the dataset in the main code;

train.py — trains the base autoencoder or the curve network (detailed description above);

eval_curve.py — evaluates the curve (description above);

trainer.py — basic functions for training and testing the models;

utils.py — additional utility functions inherited from the original repo.
Typical loss behaviour:
Image dynamics along L1 trained low-loss curve:
Along Laplacian pyramid trained curve:
Along segment connection:
And one more:
More examples can be found in the /media/gifs folder.
- DNN Loss Connectivity: original implementation of the low-loss curve-finding algorithm
- Autoencoder architecture for the CelebA dataset
- Laplacian pyramid loss for model training
- LPIPS score evaluation
- CelebA aligned 64x64 dataset