Author: Dimitri Korsch
This repository contains the official source code to reproduce the results reported in the paper
End-to-end Learning of Fisher Vector Encodings for Part Features in Fine-grained Recognition.
Dimitri Korsch, Paul Bodesheim and Joachim Denzler.
DAGM German Conference on Pattern Recognition (DAGM-GCPR), 2021.
1. Install miniconda.
2. Create an environment:
conda create -n DeepFVE python~=3.9.0 cython~=0.28 mpi4py
conda activate DeepFVE
Note: If you create the environment under a different name, you need to prepend CONDA_ENV=<another_name>
when executing the experiments. Example: CONDA_ENV=my_env ./train.sh
3. Install CUDA / cuDNN and required libraries:
conda install -c conda-forge -c nvidia cudnn~=8.0.0 nccl cutensor \
cudatoolkit~=11.0.3 cudatoolkit-dev~=11.0.3 numpy~=1.23.0
pip install -r requirements.txt
4. Install the FVE-Layer implementation:
git submodule init
git submodule update
cd fve_layer
make
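Afterwards, a quick sanity check can help to verify the setup. This is only a sketch; it assumes the DeepFVE environment is active and that the submodule builds a package importable as fve_layer:
# sanity check (sketch): verify GPU support in chainer and the FVE-Layer import
# (assumes the DeepFVE conda environment is active)
python -c "import chainer; print('CUDA available:', chainer.backends.cuda.available)"
python -c "import fve_layer; print('fve_layer imported')"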
Start sacred to log your experiment results
docker and docker-compose are required.
Note: Sacred is optional, but if it is not running, you need to append --no_sacred
when executing the experiments. Example: ./train.sh --no_sacred
- Go to fgvc/sacred.
- Copy the config template config.sh.template to config.sh and edit the missing values.
- Start the containers:
./run.sh
Recreating sacred_mongodb ... done
Recreating sacred_omniboard ... done
- Check the containers:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
xxxxxxxxxxxx vivekratnavel/omniboard:latest "/sbin/tini -- yarn …" 38 seconds ago Up 35 seconds 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp sacred_omniboard
xxxxxxxxxxxx mongo "docker-entrypoint.s…" 42 seconds ago Up 39 seconds 27018/tcp, 0.0.0.0:27018->27017/tcp, :::27018->27017/tcp sacred_mongodb
- In your web browser, go to localhost:9000 to open the Omniboard.
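If the dashboard does not load, the following sketch can help to narrow down the problem (the container name is taken from the docker ps output above):
# sketch: check that the Omniboard port answers and inspect the container logs
curl -sI http://localhost:9000 | head -n 1
docker logs --tail 20 sacred_omniboard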
- Download the weights for the InceptionV3 model and put them in your data directory (e.g. <data_dir>/models/inception/model.imagenet.npz).
- Download the dataset annotations and the images from the links mentioned in the corresponding README file.
- Copy example.yml to fgvc/info.yml and update BASE_DIR to point to your data directory.
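A minimal sketch for this last step; the exact location of the BASE_DIR key inside info.yml is an assumption, and editing the file in a text editor works just as well:
# sketch: copy the template and point BASE_DIR to your data directory
# (assumes BASE_DIR is a top-level key in the YAML file; adjust the path)
cp example.yml fgvc/info.yml
sed -i 's|^BASE_DIR:.*|BASE_DIR: /home/<user>/data|' fgvc/info.yml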
Note: An example data directory may look like the following:
data
├── datasets
│ ├── birds
│ │ ├── BIRDSNAP
│ │ │ ├── CS_parts
│ │ │ └── ORIGINAL
│ │ ├── CUB200
│ │ │ ├── CS_parts
│ │ │ │ ├── images -> ../ORIGINAL/images
│ │ │ │ ├── images.txt -> ../ORIGINAL/images.txt
│ │ │ │ ├── labels.txt -> ../ORIGINAL/labels.txt
│ │ │ │ ├── parts
│ │ │ │ │ ├── part_locs.txt
│ │ │ │ │ └── parts.txt
│ │ │ │ └── tr_ID.txt -> ../ORIGINAL/tr_ID.txt
│ │ │ └── ORIGINAL
│ │ │ ├── images
│ │ │ ├── images.txt
│ │ │ ├── labels.txt
│ │ │ ├── parts
│ │ │ │ ├── part_locs.txt
│ │ │ │ └── parts.txt
│ │ │ └── tr_ID.txt
│ │ └── NAB
│ │ ├── CS_parts
│ │ └── ORIGINAL
│ ├── dogs
│ │ ├── CS_parts
│ │ └── ORIGINAL
│ └── moths
│ ├── CS_parts
│ └── ORIGINAL
└── models
└── inception
├── model.imagenet.ckpt.npz
└── model.inat.ckpt.npz
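As the listing shows, the CS_parts folder for CUB200 mainly links back to the ORIGINAL annotations; only the parts folder holds the CS part locations. A minimal sketch for recreating those symlinks (<data_dir> is a placeholder for your data directory):
# sketch: recreate the CS_parts symlinks for CUB200 as shown in the listing above
# (the parts folder itself comes from the downloaded annotations)
cd <data_dir>/datasets/birds/CUB200/CS_parts
ln -s ../ORIGINAL/images images
ln -s ../ORIGINAL/images.txt images.txt
ln -s ../ORIGINAL/labels.txt labels.txt
ln -s ../ORIGINAL/tr_ID.txt tr_ID.txt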
Change to the scripts directory and start the training script with default parameters:
cd fgvc/scripts
./train.sh
Setting DRY_RUN=1 prints the command instead of executing it, which is useful for checking the settings:
DRY_RUN=1 ./train.sh
python ../main.py train ../info.yml CUB200 GLOBAL --no_snapshot --n_jobs 3 --label_shift 1 --gpu 0 --model_type cvmodelz.InceptionV3 --prepare_type model --pre_training inat --input_size 299 --parts_input_size 299 --feature_aggregation concat --load_strict --fve_type no --n_components 1 --comp_size -1 --post_fve_size 0 --aux_lambda 0.5 --aux_lambda_rate 0.5 --aux_lambda_step 20 --ema_alpha 0.99 --init_mu 1 --init_sig 1 --mask_features --augmentations random_crop random_flip color_jitter --center_crop_on_val --batch_size 24 --update_size 64 --label_smoothing 0.1 --optimizer adam --epochs 60 --output .results/results/CUB200/adam/2021-08-27-10.45.53.421472597 --logfile .results/results/CUB200/adam/2021-08-27-10.45.53.421472597/output.log -lr 1e-3 -lrd 1e-1 -lrs 1000 -lrt 1e-8 --no_sacred
The most important parameters can be set by prepending variables to the training script (check the config files under fgvc/scripts/configs
for more information). The following example starts NA-Birds training with CS-Parts on the second GPU in your system, using a ResNet50 with a batch size of 16, without writing the results to sacred.
DATASET=NAB PARTS=CS_PARTS BATCH_SIZE=16 MODEL_TYPE=chainercv2.resnet50 GPU=1 ./train.sh --no_sacred
The selection of the FVE implementation can be controlled by the FVE_TYPE variable:
# training of CS-parts with GAP
PARTS=CS_parts FVE_TYPE=no ./train.sh
# ... with em-based FVE (our proposed method)
PARTS=CS_parts FVE_TYPE=em ./train.sh
# ... with gradient-based FVE (our implementation of Wieschollek et al.)
PARTS=CS_parts FVE_TYPE=grad ./train.sh
See the config script fgvc/scripts/config/21_fve.sh for more configuration options.
Further command line options of the training script can be seen here:
./train.sh -h
usage: main.py train [-h] --model_type
{chainercv2.resnet50,chainercv2.inceptionv3,cvmodelz.VGG16,cvmodelz.VGG19,cvmodelz.ResNet35,cvmodelz.ResNet50,cvmodelz.ResNet101,cvmodelz.ResNet152,cvmodelz.InceptionV3}
[--pre_training {imagenet,inat}]
[--input_size INPUT_SIZE [INPUT_SIZE ...]]
[--parts_input_size PARTS_INPUT_SIZE [PARTS_INPUT_SIZE ...]]
[--prepare_type {model,custom,tf,chainercv2}]
[--pooling {max,avg,tf_avg,g_avg,cbil,alpha}]
[--load LOAD] [--weights WEIGHTS] [--headless]
[--load_strict] [--load_path LOAD_PATH]
[--feature_aggregation {mean,concat}]
[--pred_comb {no,sum,linear}]
[--copy_mode {copy,share,init}]
[--label_shift LABEL_SHIFT] [--swap_channels]
[--n_jobs N_JOBS] [--shuffle_parts] [--logfile LOGFILE]
[--loglevel LOGLEVEL] [--gpu GPU [GPU ...]] [--profile]
[--only_klass ONLY_KLASS] [--fve_type {no,grad,em}]
[--n_components N_COMPONENTS] [--comp_size COMP_SIZE]
[--init_mu INIT_MU] [--init_sig INIT_SIG]
[--post_fve_size POST_FVE_SIZE] [--ema_alpha EMA_ALPHA]
[--aux_lambda AUX_LAMBDA]
[--aux_lambda_rate AUX_LAMBDA_RATE]
[--aux_lambda_step AUX_LAMBDA_STEP] [--mask_features]
[--no_gmm_update] [--only_mu_part] [--normalize]
[--no_sacred] [--augment_features] [--warm_up WARM_UP]
[--optimizer {sgd,rmsprop,adam}]
[--cosine_schedule COSINE_SCHEDULE] [--l1_loss]
[--from_scratch] [--label_smoothing LABEL_SMOOTHING]
[--only_head] [--seed SEED] [--batch_size BATCH_SIZE]
[--epochs EPOCHS] [--debug]
[--learning_rate LEARNING_RATE] [--lr_shift LR_SHIFT]
[--lr_decrease_rate LR_DECREASE_RATE]
[--lr_target LR_TARGET] [--decay DECAY]
[--augmentations [{random_crop,random_flip,random_rotation,center_crop,color_jitter} [{random_crop,random_flip,random_rotation,center_crop,color_jitter} ...]]]
[--center_crop_on_val]
[--brightness_jitter BRIGHTNESS_JITTER]
[--contrast_jitter CONTRAST_JITTER]
[--saturation_jitter SATURATION_JITTER] [--only_eval]
[--init_eval] [--no_progress] [--no_snapshot]
[--output OUTPUT] [--update_size UPDATE_SIZE]
[--test_fold_id TEST_FOLD_ID] [--analyze_features]
[--mpi] [--only_analyze] [--loss_scaling LOSS_SCALING]
[--opt_epsilon OPT_EPSILON]
data {CUB200,NAB,BIRDSNAP,DOGS,EU_MOTHS}
{GLOBAL,GT,GT2,CS_PARTS}
[...]
This work is licensed under a GNU Affero General Public License.