The wilds_examples folder was taken directly from https://github.com/p-lambda/wilds/tree/main/examples. We implemented our methods on top of the existing code to speed up development.
You can create a conda environment using the `requirements.txt` file.
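For example, a minimal setup (the environment name and Python version below are illustrative choices, not mandated by the repo):
```
conda create -n wilds python=3.8
conda activate wilds
pip install -r requirements.txt
```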
To train the baseline model:
- Go to the `wilds` folder.
- Execute:
```
python run_expt.py -d fmow --algorithm ERM --root_dir ./data --download
```
If you have not downloaded the dataset previously, this command will download it first. Note: the dataset is 50 GB in size.
To add data-loading workers, add the following argument to `run_expt.py`: `--loader_kwargs "num_workers=8"` (e.g., 8 for an 8-core CPU).
To speed up data transfer between host and GPU, add: `--loader_kwargs pin_memory=True`
The `wilds/configs/datasets.py` file contains the default training configuration for each dataset.
Models are created/initialized in `wilds/models/initializer.py` (see `def initialize_model(config, d_out, is_featurizer=False)`).
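As an illustration of the dispatch pattern in that function, here is a minimal sketch of how a new model could be hooked in; `my_convnet` and the layer sizes are hypothetical names for illustration, not part of the codebase:
```python
import torch.nn as nn

def initialize_model(config, d_out, is_featurizer=False):
    # Minimal sketch only: dispatch on config.model and return a module
    # whose final layer has d_out outputs. 'my_convnet' is a hypothetical name.
    if config.model == 'my_convnet':
        return nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, d_out),
        )
    raise ValueError(f'Model: {config.model} not recognized.')
```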
The log files will be shared on Google Drive (contact the team for permission) in the `IFT6759 - The WILD Guess Team\Logs` folder.
Naming convention: `<first_name>_<model>_<method>_<partial/full data>_exp<#>`, e.g. `Nathan_ERM_Baseline_full_exp1`
Link: https://drive.google.com/drive/folders/1-GfVHWnTdhvYA4-LM7mLIWidzdqZBQUE?usp=sharing
The one-line result in `kpi_extract.txt`, extracted with the `run_analyse.py` script described below, needs to be copied into the Excel tracker file (contact the team for permission) along with the model training command line.
Link: https://docs.google.com/spreadsheets/d/1Z0vVkII57D0G3OWWFtW8muDY7glydptM/edit?usp=sharing&ouid=109019793128097247425&rtpof=true&sd=true
The `run_analyse.py` script can be used to extract results and make useful plots from the logs produced by the `run_expt.py` script. Specifically, the script will:
- Plot the data split distributions.
- Plot the loss & accuracy curves.
- Extract an Excel-preformatted one-line result summary for the model's "best epoch" (based on the validation loss minimum).
How to use the script:
- Run `python run_analyse.py --log_dir <logs> --show --eval_only`, where `<logs>` is the path to the directory containing the logs, `--show` is a boolean flag that makes the figure pop-ups appear sequentially (omit it to suppress the pop-ups), and `--eval_only` performs only the KPI extraction based on the evaluation logs (without the training logs).
- All the figures and the text file will be saved in the `<logs>` directory.
- Copy the `kpi_extract.txt` content into the shared Excel tracker file above, also adding the command line used for model training with `run_expt.py`, for tracking/reproducibility purposes.
To track experiments with Weights & Biases:
- Install and log in:
```
pip install wandb
wandb login
```
- Add the following arguments to `run_expt.py`: `--use_wandb=True --wandb_kwargs project="wilds" entity="the-wild-guess"`
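Putting these together, a full command might look like (a sketch mirroring the flags above):
```
python run_expt.py -d fmow --algorithm ERM --root_dir ./data \
    --use_wandb=True --wandb_kwargs project="wilds" entity="the-wild-guess"
```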
You can then view experiment results here: https://wandb.ai/the-wild-guess (note: access is required; contact the team for permission).
The Bootstrap process is part of the WILDS package. The following parameters can be used to configure how Bootstrap is performed (a combined example follows the list):
- `--train_loader` -> selects the loader, either per-group (`group`) or `standard`.
- `--groupby_fields region` -> selects the grouping fields for Bootstrap sampling & results reporting.
- `--uniform_over_groups` -> boolean to activate Bootstrap sampling uniformly over groups.
- `--n_groups_per_batch` -> the number of groups per batch; the batch size needs to be a multiple of this value.
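For example, a sketch of a command combining these parameters (the values shown are illustrative):
```
python run_expt.py -d fmow --algorithm ERM --root_dir ./data \
    --train_loader group --groupby_fields region \
    --uniform_over_groups --n_groups_per_batch 8 --batch_size 32
```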
To preserve the coherence of the WILDS package code, the Bagging method is split into two parts: training and evaluation. In the training part, each Bagging predictor is trained sequentially using the specified parameters. During training, each predictor is evaluated individually using the default script from the WILDS package; however, this evaluation does not take into account the joint predictions of the predictors. To perform this combined evaluation, a separate script has been developed specifically for Bagging evaluation.
The training part of the Bagging process is fairly simple. The main script `run_expt.py` from the WILDS package has been slightly tweaked to introduce a training loop that outputs a predictor for each of the defined Bagging seeds through the regular training process. This generic approach is flexible: any kind of model can be trained with the Bagging algorithm.
To enable Bagging training, the `--bagging` parameter must be set to TRUE, the `--bagging_size` parameter must be set to the desired number of predictors, and `--bagging_seeds` must be a list of unique seeds so that each individual predictor is trained on a different subset of the data. Note: the `--frac` parameter must be below 1 in order to have a different subset of data for each predictor; otherwise, the Bagging process will have no effect.
Here is an example command line to run the training part of Bagging:
```
python ./examples/run_expt.py -d fmow --algorithm ERM --root_dir ./data \
    --frac 0.5 \
    --batch_size 30 \
    --seed 0 \
    --n_epochs 4 \
    --train_loader standard \
    --uniform_over_groups \
    --groupby_fields y \
    --bagging \
    --bagging_seeds 0 1 2 3 4 5 6 7 \
    --bagging_size 8 \
    --save_step 1
```
The Bagging evaluation is performed when both the `--bagging` and `--eval_only` parameters are TRUE. The `--eval_epoch` parameter selects the specific training epoch at which the predictors are evaluated; if omitted, the last epoch is used by default. Furthermore, the training log folder must be passed in the `--log_dir` parameter. During the Bagging evaluation process, the predictions from each predictor are aggregated and the most frequently occurring category is selected as the final prediction.
Here is an example command line for Bagging evaluation:
```
python ./examples/run_expt.py -d fmow --algorithm ERM --root_dir ./data \
    --log_dir "./logs" \
    --frac 1 \
    --bagging \
    --eval_only \
    --eval_epoch 4
```
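For illustration, the aggregation step described above (majority vote over the predictors' outputs) can be sketched as follows; this is an illustrative reimplementation, not the team's actual evaluation code:
```python
import torch

def majority_vote(all_logits: torch.Tensor) -> torch.Tensor:
    # all_logits: (n_predictors, n_examples, n_classes) stacked predictor outputs.
    votes = all_logits.argmax(dim=-1)  # each predictor votes for its top class
    final_pred, _ = votes.mode(dim=0)  # most frequently occurring class wins
    return final_pred                  # (n_examples,) final predictions
```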
Since this method requires the predictions to be probability distributions, an additional softmax is applied if any prediction does not sum to 1.
To activate the correction during both training and evaluation, add the argument `--correct_label_shift`, specifying which split to use for label distribution estimation: `train`, `id_val`, or `val`.
Example command to evaluate the best `convnet` model from the `logs` folder:
```
python wilds_examples/run_expt.py -d fmow --algorithm ERM --root_dir ./data --download --model convnet \
    --frac 0.01 --loader_kwargs "num_workers=8" --loader_kwargs pin_memory=True \
    --correct_label_shift id_val --log_dir ./logs --eval_only
```
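Conceptually, the correction reweights the predicted class probabilities by the ratio of the estimated target to source label distributions. A minimal sketch, assuming those per-class distributions have already been estimated (e.g. from the `id_val` split); this is illustrative, not the repo's implementation:
```python
import torch
import torch.nn.functional as F

def correct_label_shift(logits, p_source, p_target):
    # logits: (n_examples, n_classes); p_source/p_target: (n_classes,) estimates.
    # Ensure each prediction is a probability distribution (sums to 1).
    probs = F.softmax(logits, dim=-1)
    # Reweight by the per-class target/source ratio and renormalize.
    corrected = probs * (p_target / p_source)
    return corrected / corrected.sum(dim=-1, keepdim=True)
```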
To estimate label shift per grouping in the test sets, add the argument:
```
--label_shift_estimation_grouping region year
```
You can group by region, year, or both, depending on which fields are present in the argument.
Note: the following method requires training two different models. The first model (the baseline) can be trained using the standard ERM approach (a helper script is available at `wilds_examples/run_bbse_exp.sh`). After the baseline model is trained, we need to estimate the target label distribution on the test set. This can be done by running `wilds_examples/bbse/run_estimate_target_distribution.sh`, making sure to update the following arguments:
- `yval`: the true labels for the in-domain validation set
- `ytest`: the true labels for the OOD test set
- `ypred_source`: the predictions for the in-domain validation set
- `ypred_target`: the predictions for the OOD test set
This will output a class-weights file, which should be used to train a second model (the `run_bbse_exp.sh` helper script can be used) with the additional argument `--erm_weights`.
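For reference, the class-weight estimation at the heart of BBSE (Lipton et al., 2018) solves a linear system: with `C` the confusion matrix of the baseline model on the source (in-domain validation) set and `mu` the distribution of its predictions on the target (OOD test) set, the weights `w` satisfy `C w = mu`. A minimal sketch with hard-label predictions, not the team's script:
```python
import numpy as np

def bbse_weights(yval, ypred_source, ypred_target, n_classes):
    # yval, ypred_source, ypred_target: 1-D integer label arrays.
    # Confusion matrix on the source set: C[i, j] = P(predicted i, true j).
    C = np.zeros((n_classes, n_classes))
    for true, pred in zip(yval, ypred_source):
        C[pred, true] += 1
    C /= len(yval)
    # Distribution of the model's predictions on the target set.
    mu = np.bincount(ypred_target, minlength=n_classes) / len(ypred_target)
    # Solve C w = mu (pseudo-inverse for numerical robustness) and clip
    # negative entries, since class weights must be non-negative.
    w = np.linalg.pinv(C) @ mu
    return np.clip(w, 0.0, None)
```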
The DORO experiment can be run using the helper script at `wilds_examples/run_doro_exp.sh`.
To run the Vision Transformer (ViT):
```
python run_expt.py -d fmow --model vit --algorithm ERM --root_dir ./data \
    --loader_kwargs pin_memory=True \
    --loader_kwargs "num_workers=26" \
    --model_kwargs="model_size=B_16" \
    --model_kwargs="pretrained=True" \
    --device=0
```