The wilds_examples folder was taken directly from We implemented our methods on top of what was already present in order to accelerate development iteration.


You can create a conda environment by utilizing the requirements.txt file.

Relevant Preliminary Information

To train the baseline model:

  • Go to the wilds folder
  • Execute: python -d fmow --algorithm ERM --root_dir ./data --download

If the dataset has not been downloaded yet, this command downloads it. Note: the dataset is 50 GB in size.

To add data loading workers, add the following argument to the command: --loader_kwargs "num_workers=8" (e.g. 8 for an 8-core CPU)

To speed up the data transfer between host and GPU: --loader_kwargs pin_memory=True

The wilds/configs/ file contains default training config for each dataset.

The wilds/models/initializer is the place where models are created/initialized (def initialize_model(config, d_out, is_featurizer=False))

Result sharing convention & analytics

Logs & results sharing

Log files are shared on Google Drive (contact the team for permission) in the IFT6759 - The WILD Guess Team\Logs folder.
Naming convention: <first_name>_<model>_<method>_<partial/full data>_exp<#>, e.g. Nathan_ERM_Baseline_full_exp1
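
The naming convention above can be generated and checked programmatically. This is a minimal sketch; the helper name `make_run_name` and the regular expression are illustrative assumptions and are not part of the repository:

```python
import re

# Hypothetical pattern for <first_name>_<model>_<method>_<partial/full data>_exp<#>.
RUN_NAME_PATTERN = re.compile(r"^[A-Za-z]+_[^_]+_[^_]+_(partial|full)_exp\d+$")


def make_run_name(first_name, model, method, data, exp_number):
    """Build a log-folder name and check it against the convention."""
    if data not in ("partial", "full"):
        raise ValueError("data must be 'partial' or 'full'")
    name = f"{first_name}_{model}_{method}_{data}_exp{exp_number}"
    if not RUN_NAME_PATTERN.match(name):
        raise ValueError(f"name does not follow the convention: {name}")
    return name


print(make_run_name("Nathan", "ERM", "Baseline", "full", 1))
# prints: Nathan_ERM_Baseline_full_exp1
```

Rejecting underscores inside a field is a design choice of this sketch: it keeps the name unambiguous when it is split on `_` later.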

The one-line result in kpi_extract.txt, extracted with the script described below, must be copied into the Excel tracker file (contact the team for permission) along with the command line used to train the model.

Log visualisation & results extract with homemade script

The script can be used to extract results & make useful plots from the logs coming from the script. The script will namely:

  1. Plot the data split distributions.
  2. Plot the Loss & Accuracy curves.
  3. Extract an Excel pre-formatted one line result summary for the model's "Best Epoch" (based on validation loss minima).
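
The "Best Epoch" selection in step 3 amounts to finding the epoch with the lowest validation loss. A minimal sketch, assuming the per-epoch validation losses have already been read from the logs into a list (the values and the function name `best_epoch` are hypothetical, and the script's own tie-breaking rule is not documented here):

```python
# Hypothetical per-epoch validation losses, in epoch order (epoch 0 first).
val_losses = [1.92, 1.41, 1.18, 1.23, 1.30]


def best_epoch(losses):
    """Return the index of the epoch with the lowest validation loss.

    Ties resolve to the earliest epoch (an assumption of this sketch).
    """
    if not losses:
        raise ValueError("no epochs logged")
    return min(range(len(losses)), key=losses.__getitem__)


print(best_epoch(val_losses))  # prints: 2
```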

How to use the script:

  1. Run python --log_dir <logs> --show --eval_only, where <logs> is the path to the directory containing the logs, --show is a boolean argument that makes the figure pop-ups appear sequentially (omit it to suppress them), and --eval_only restricts the KPI extract to the evaluation logs (without training logs).
  2. All the figures & text file will be saved in the <logs> directory.
  3. Copy the kpi_extract.txt content in the above shared Excel tracker file, adding also the command line used for the model training with for tracking/reproducibility purposes.

Log visualisation with Weights & Biases (wandb) package

  1. pip install wandb
  2. wandb login
  3. Add the following arguments to the training command: --use_wandb=True --wandb_kwargs project="wilds" entity="the-wild-guess"

You can then view experimental results here:

(Note: Access is required. Contact team for permission.)

Dataset split exploration and Methodology

Bootstrapping for evenly distributed splits

The Bootstrap process is part of the WILDS package. The following parameters can be used to configure how Bootstrap is performed:

--train_load -> Select the loader to be either per group or standard.
--groupby_fields region -> Select the grouping parameters for Bootstrap sampling & results reporting.
--uniform_over_groups -> Boolean to activate Bootstrap sampling uniformly over groups.
--n_groups_per_batch -> Number of groups per batch. The batch size must be a multiple of this value.
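
The idea behind sampling uniformly over groups can be illustrated in a few lines. This is a sketch of the principle only, not the WILDS implementation; the group labels and the function name `uniform_group_weights` are hypothetical:

```python
import random
from collections import Counter

# Hypothetical group label (e.g. region) for each training example.
groups = ["Asia"] * 6 + ["Europe"] * 3 + ["Africa"] * 1


def uniform_group_weights(group_ids):
    """Weight each example by 1 / (size of its group).

    Every group then carries the same total sampling mass, so small
    groups are drawn as often as large ones.
    """
    counts = Counter(group_ids)
    return [1.0 / counts[g] for g in group_ids]


weights = uniform_group_weights(groups)
rng = random.Random(0)
# Draw with replacement, as bootstrap sampling does.
batch = rng.choices(range(len(groups)), weights=weights, k=3000)
drawn = Counter(groups[i] for i in batch)
print(drawn)  # each region is drawn roughly 1000 times
```

Without the weights, "Asia" would make up about 60% of the draws; with them, the single "Africa" example is sampled about as often as all six "Asia" examples combined.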


To keep the WILDS package code coherent, the Bagging method is split into two parts: training and evaluation. During training, the Bagging predictors are trained sequentially with the specified parameters, and each predictor is evaluated individually using the default script from the WILDS package. This per-predictor evaluation does not account for the joint predictions of the predictors, so a separate script was developed specifically for Bagging evaluation.


The training part of the Bagging process is fairly simple. The main script from the WILDS package has been slightly modified to add a training loop that outputs one predictor per Bagging seed through the regular training process. This generic approach is flexible, since any kind of model can be trained with Bagging.
To enable Bagging training, set --bagging to TRUE, set --bagging_size to the desired number of predictors, and set --bagging_seeds to a list of unique seeds so that each predictor is trained on a different subset of the data. Note: --frac must be below 1 so that each predictor sees a different subset; otherwise Bagging has no effect.

Here is a command line example to run the training part of Bagging:

python ./examples/ -d fmow --algorithm ERM  --root_dir ./data
--frac 0.5
--batch_size 30
--seed 0
--n_epoch 4
--train_load standard
--groupby_fields y
--bagging_seeds 0 1 2 3 4 5 6 7
--bagging_size 8
--save_step 1
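
The interaction between --frac and --bagging_seeds can be illustrated with a small sketch. It assumes each predictor keeps a seeded random fraction of the training set drawn without replacement; the function name `bagging_subset` is hypothetical and the repository's subsampling code may differ:

```python
import random


def bagging_subset(n_examples, frac, seed):
    """Return the sorted indices kept for one predictor.

    Assumption: a seeded random fraction of the training set,
    drawn without replacement.
    """
    k = int(n_examples * frac)
    return sorted(random.Random(seed).sample(range(n_examples), k))


seeds = [0, 1, 2, 3]
half = [bagging_subset(20, 0.5, s) for s in seeds]
full = [bagging_subset(20, 1.0, s) for s in seeds]

# With frac < 1 the seeds select different subsets; with frac = 1 every
# predictor sees the whole training set, so the subsets are identical.
print(len({tuple(s) for s in half}))  # number of distinct half-size subsets
print(len({tuple(s) for s in full}))  # prints: 1
```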


Bagging evaluation is performed when both --bagging and --eval_only are TRUE. The --eval_epoch parameter selects the training epoch whose predictors are evaluated; if omitted, the last epoch is used. The training log folder must be passed in the --log_dir parameter. During Bagging evaluation, the predictions of all predictors are aggregated and the most frequently occurring category is selected as the final prediction.
Here is an example command line for Bagging evaluation:

python ./examples/ -d fmow --algorithm ERM  --root_dir ./data
--log_dir "./logs"
--frac 1
--eval_epoch 4

Model evaluation with label shift correction

Expectation Maximization + Bias Corrected Temperature Scaling

Since this method requires the predictions to be probability distributions, an additional softmax is applied if any prediction doesn't sum to 1. To activate both while training & evaluating, add the argument --correct_label_shift. You can also specify the split to use for label distribution estimation out of train, id_val & val.
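
The probability check described above can be sketched as follows. The tolerance `tol` and the function names are assumptions of this sketch; the repository's own check may use a different threshold:

```python
import math


def softmax(row):
    """Numerically stable softmax of one row of scores."""
    peak = max(row)  # subtract the max before exponentiating
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def as_probabilities(rows, tol=1e-6):
    """Apply softmax to every row if any row does not sum to 1."""
    if all(abs(sum(r) - 1.0) <= tol for r in rows):
        return [list(r) for r in rows]
    return [softmax(r) for r in rows]


print(as_probabilities([[0.2, 0.8], [0.5, 0.5]]))  # already probabilities, unchanged
print(as_probabilities([[2.0, 0.0], [0.0, 0.0]]))  # raw scores, softmax applied
```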

Example command to evaluate best convnet model from logs folder:

python wilds_examples/ -d fmow --algorithm ERM --root_dir ./data --download --model convnet
--frac 0.01 --loader_kwargs "num_workers=8" --loader_kwargs pin_memory=True 
--correct_label_shift id_val --log_dir ./logs --eval_only


In order to estimate label shift per grouping in the test sets, add the argument:

--label_shift_estimation_grouping region year

You can group by either region, year or both depending on which are present in the argument.
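
For reference, the expectation-maximization step that re-estimates label priors on a target split can be sketched as below. This is a generic EM prior re-estimation under stated assumptions (calibrated probabilities, strictly positive source priors); it omits the bias-corrected temperature scaling step and is not the repository's code. The function name `em_target_priors` and the example data are hypothetical. Estimating per grouping would amount to running it separately on each group's predictions.

```python
def em_target_priors(probs, source_priors, n_iter=100, tol=1e-8):
    """Estimate target label priors from source-calibrated probabilities.

    probs[i][k] is the model's probability of class k for target example i.
    """
    n_classes = len(source_priors)
    target_priors = list(source_priors)
    for _ in range(n_iter):
        # E-step: reweight each prediction by the current prior ratio.
        ratios = [t / s for t, s in zip(target_priors, source_priors)]
        posteriors = []
        for row in probs:
            scaled = [p * r for p, r in zip(row, ratios)]
            total = sum(scaled)
            posteriors.append([v / total for v in scaled])
        # M-step: new priors are the mean adjusted posterior per class.
        new_priors = [
            sum(post[k] for post in posteriors) / len(posteriors)
            for k in range(n_classes)
        ]
        shift = max(abs(a - b) for a, b in zip(new_priors, target_priors))
        target_priors = new_priors
        if shift < tol:
            break
    return target_priors


# Hypothetical target predictions: balanced source, target skewed to class 1.
probs = [[0.9, 0.1]] * 2 + [[0.1, 0.9]] * 8
priors = em_target_priors(probs, source_priors=[0.5, 0.5])
print([round(p, 3) for p in priors])  # close to [0.125, 0.875]
```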

Label Shift Correction w/ Black Box Predictors

Note: The following method requires training two different models. The first model (baseline) can be trained using the standard ERM approach (a helper script is available at wilds_examples/). After the baseline model is trained, the target label distribution on the test set must be estimated. This can be done by running the following:


Make sure the following arguments point to the right files:

yval: The true labels for the in-domain validation set

ytest: The true labels for the OOD test set

ypred_source: The predictions for the in-domain validation set

ypred_target: The predictions for the OOD test set

This will output a class weights file, which should be used to train a second model (can use the helper script) with the additional argument --erm_weights.
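
The class weights can be understood through black box shift estimation: solve C w = mu, where C is the joint confusion matrix of the baseline on the in-domain validation set and mu is the distribution of its predictions on the OOD test set. The sketch below is written under that assumption and is not the repository's script; the function names and toy data are hypothetical. It uses only yval, ypred_source and ypred_target; true test labels are not needed to compute the weights.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("confusion matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def bbse_weights(yval, ypred_source, ypred_target, n_classes):
    """Estimate w[k] = p_target(y=k) / p_source(y=k) for each class k."""
    n_src, n_tgt = len(yval), len(ypred_target)
    # confusion[i][j] = P_source(prediction = i, true label = j)
    confusion = [[0.0] * n_classes for _ in range(n_classes)]
    for true, pred in zip(yval, ypred_source):
        confusion[pred][true] += 1.0 / n_src
    # mu[i] = P_target(prediction = i)
    mu = [0.0] * n_classes
    for pred in ypred_target:
        mu[pred] += 1.0 / n_tgt
    # Negative estimates are clipped to zero.
    return [max(w, 0.0) for w in solve(confusion, mu)]


# Toy data: balanced source, 80%-accurate baseline, target is 20% / 80%.
yval = [0] * 50 + [1] * 50
ypred_source = [0] * 40 + [1] * 10 + [1] * 40 + [0] * 10
ypred_target = [0] * 32 + [1] * 68
print(bbse_weights(yval, ypred_source, ypred_target, n_classes=2))
# approximately [0.4, 1.6]
```

In this toy case the weights match the true prior ratios 0.2 / 0.5 and 0.8 / 0.5, which is what the second model's weighted loss would rely on.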

Distributionally and Outlier Robust Optimization (DORO)

The DORO experiment can be run by utilizing the helper script found here:


Vision Transformer

To train the Vision Transformer, run:

python -d fmow --model vit --algorithm ERM --root_dir ./data
--loader_kwargs pin_memory=True
--loader_kwargs "num_workers=26" 