PyTorch & Wandb: hyperparameter search and model comparison with MNIST

This repository illustrates how to perform hyperparameter search and multi-trial training with PyTorch and wandb on the MNIST database.

It is structured around the following steps:

A hyperparameter grid search is performed with wandb to find the optimal configurations of two architectures (MLP and CNN).
Once the best configuration is identified for each model, each of them is trained 5 times in its optimal configuration.
Plotting scripts are finally used to extract the relevant plots and a performance summary of the multi-trial executions.

Notes:

this repository was designed to lay the foundation for similar work involving LLMs, where each architecture is handled with its own configuration file,
in order to visualize results with wandb, you will need to sign up on their website.

Requirements

The experiments were performed on a local laptop without a GPU, with Python 3.10.12 and the dependencies listed in requirements.txt. These can be installed in a virtual environment with the following commands:

python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install --upgrade pip
pip install -r requirements.txt

Note: our experiments were conducted with the exact package versions specified in requirements.txt. Future updates may alter reproducibility.

After a successful installation, the program arguments can be displayed with:

python3 main.py --help
usage: main.py [-h] [--batch-size BATCH_SIZE] [--config CONFIG] [--epochs EPOCHS] [--json] [--lr LR] [--model MODEL]
               [--test] [--train] [--out-dir OUT_DIR] [--seed SEED] [--wandb]

options:
  -h, --help            show this help message and exit
  --batch-size BATCH_SIZE
                        batch size for train loop. (default: 512)
  --config CONFIG       name of the experiment config file. (default: config.json)
  --epochs EPOCHS       number of epochs to train on. (default: 10)
  --json                use a json config file. (default: False)
  --lr LR               learning rate. (default: 0.001)
  --model MODEL         model to train/execute: CNN or MLP. (default: CNN)
  --test                to test. (default: False)
  --train               to train. (default: False)
  --out-dir OUT_DIR     where output data are written. (default: experiment-YYYYMMDDTHHMMSS)
  --seed SEED           torch manual seed. (default: 123)
  --wandb               use a wandb config file. (default: False)
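The help output above is consistent with a standard argparse parser. The following is a minimal sketch of how a subset of these options could be declared; it is not the code of main.py, and the function name build_parser is hypothetical. Option names, defaults and help strings are copied from the help output.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser exposing a subset of the options listed above."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        # Appends "(default: ...)" to each help string, as in the help output.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("--batch-size", type=int, default=512,
                        help="batch size for train loop.")
    parser.add_argument("--epochs", type=int, default=10,
                        help="number of epochs to train on.")
    parser.add_argument("--lr", type=float, default=1e-3, help="learning rate.")
    parser.add_argument("--model", default="CNN",
                        help="model to train/execute: CNN or MLP.")
    parser.add_argument("--seed", type=int, default=123, help="torch manual seed.")
    # Boolean flags default to False and become True when present.
    parser.add_argument("--train", action="store_true", help="to train.")
    parser.add_argument("--test", action="store_true", help="to test.")
    return parser


# Parse an explicit argument list instead of sys.argv, for illustration.
args = build_parser().parse_args(["--train", "--test", "--lr", "0.01"])
print(args.model, args.batch_size, args.lr, args.train, args.test)
```

Note that argparse maps --batch-size to the attribute args.batch_size, so dashes in option names become underscores in the code.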

A simple training and test run can then be executed with the default options:

python3 main.py --train --test

Experiments

Hyperparameter search

The hyperparameter search leading to the results below can be reproduced as follows:

python3 main.py --wandb --config=wandb_config

Note: hints on how to customize the configuration file are available in the wandb documentation.

The results of this search are illustrated in the figures and table below (left CNN, right MLP):

Model	lr	batch-size	epochs	validation loss [Accuracy]
CNN	1e-3	512	5	0.06561 [0.9813]
CNN	1e-3	1024	5	0.09156 [0.9733]
CNN	1e-1	512	5	0.1579 [0.95]
CNN	1e-1	1024	5	0.2435 [0.9235]
MLP	1e-3	512	5	0.2116 [0.9367]
MLP	1e-3	1024	5	0.2645 [0.9202]
MLP	1e-1	512	5	0.9584 [0.6551]
MLP	1e-1	1024	5	0.8177 [0.7299]

Note: for both architectures, the hyperparameter search was performed with the same parameters. However, this is not a requirement, and architectural parameter changes (e.g. in the layer depth or size) could easily be handled since each model has its own set of parameters (see wandb_config.json). Be aware though that introducing new parameters would require adding the corresponding parameter management code to main.py.
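To make the size of the grid explicit, here is a minimal sketch that enumerates the combinations reported in the table above using only the standard library. The values come from the table; the dictionary layout and the helper expand_grid are hypothetical and are not claimed to match wandb_config.json or the way wandb schedules runs.

```python
from itertools import product

# Values taken from the results table; the layout itself is an assumption.
search_space = {
    "CNN": {"lr": [1e-3, 1e-1], "batch_size": [512, 1024], "epochs": [5]},
    "MLP": {"lr": [1e-3, 1e-1], "batch_size": [512, 1024], "epochs": [5]},
}


def expand_grid(space):
    """Yield one configuration dict per combination of parameter values."""
    for model, params in space.items():
        names = list(params)
        for values in product(*(params[name] for name in names)):
            yield {"model": model, **dict(zip(names, values))}


configs = list(expand_grid(search_space))
# 2 models x 2 learning rates x 2 batch sizes x 1 epoch setting.
print(len(configs))
for config in configs:
    print(config)
```

Because each model owns its own parameter dictionary, one architecture could be given extra entries (e.g. a layer size) without affecting the other.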

Multi-trial training

The previous results suggest that for each model, the best configuration is:

CNN with lr=1e-3, batch_size=512,
MLP with lr=1e-3, batch_size=512.

Each of them was therefore trained 5 additional times by running:

python3 main.py --json --config=config

The results can be viewed in TensorBoard with the command:

tensorboard --logdir multi-trial

Note: as with the hyperparameter sweep, the multi-trial configuration can easily be modified according to the user's needs (see config.json). Be aware though that introducing new parameters would require adding the corresponding parameter management code to main.py.

Results export

The point of running training multiple times is to extract performance statistics such as the mean and standard deviation. The loss histories of the multi-trial runs are given in the figures below:

These can be obtained by executing the post-processing script:

python3 tb_to_csv.py --path=multi-trial

Note: the csv files can be saved with the --save option.

The final performances are summarized in the following table, where the means of the test loss and accuracy are given ± one standard deviation:

Model	test loss	test accuracy
CNN	0.0571 ± 0.0015	0.982 ± 0.0009
MLP	0.2126 ± 0.0024	0.938 ± 0.0017
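As an illustration of how a "mean ± one standard deviation" entry can be computed from several trials, here is a minimal sketch using the standard library. The per-trial values are made up for the example and are not the measurements behind the table; the use of the sample standard deviation (statistics.stdev) is also an assumption, since the post-processing script may use a different estimator.

```python
from statistics import mean, stdev

# Hypothetical per-trial test accuracies, made up for illustration only;
# they are not the values measured in this repository.
trial_accuracies = [0.981, 0.983, 0.982, 0.984, 0.980]


def summarize(values):
    """Return the mean and the sample standard deviation of the trials."""
    return mean(values), stdev(values)


avg, std = summarize(trial_accuracies)
print(f"{avg:.4f} +/- {std:.4f}")
```

With only 5 trials, the sample and population standard deviations differ noticeably, so it is worth stating which one a table reports.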

About reproducibility

The code was built to foster reproducibility by manually setting torch's seed. However, as explained in PyTorch's documentation:

Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results may not be reproducible between CPU and GPU executions, even when using identical seeds.

In other words, users may not be able to reproduce our results exactly. However, provided that they run their experiments in the same environment (i.e. with the same dependencies and hardware), their own results should be reproducible from one run to the next.
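The idea of seeding can be sketched as follows. This is not the code of main.py: the helper names set_seed and draw are hypothetical, and seeding Python's random module is an addition made so that the snippet runs even without PyTorch installed. The only PyTorch call used is torch.manual_seed, with the default seed 123 taken from the --seed option.

```python
import random


def set_seed(seed: int = 123) -> None:
    """Seed Python's RNG and, if PyTorch is installed, torch's RNG too."""
    random.seed(seed)
    try:
        import torch
    except ImportError:
        return  # PyTorch not available: only Python's RNG is seeded.
    torch.manual_seed(seed)


def draw(seed: int, count: int = 3) -> list:
    """Reseed, then draw a few pseudo-random numbers."""
    set_seed(seed)
    return [random.random() for _ in range(count)]


# Same seed, same environment: the two sequences are compared for equality.
print(draw(123) == draw(123))
# Different seeds: the sequences are expected to differ.
print(draw(123) == draw(124))
```

Reseeding before each experiment makes repeated runs comparable within one environment, which is the kind of reproducibility described above.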
