# Model Zoo

If you only want to use our trained checkpoints for inference or fine-tuning, this page collects the available models.

Models with 3 experts are the standard models and provide a good trade-off between computation/model size and accuracy. Note that models with 2 experts can have even lower computational cost than the baseline models. We also release some higher-accuracy models, such as those with 6 experts, which can be used as teacher models for distilling other models. Some models were trained with an old config format, so their configs may not match the current code; if you cannot load a checkpoint, please let us know.

## Imbalanced CIFAR 100/CIFAR-LT 100 (100 epochs)

  1. CE and Decouple: baseline results for cross-entropy and Decouple (cRT/tau-norm/LWS)
  2. RIDE: ResNet32 backbone, without distillation, with EA
  3. RIDE + Distill: ResNet32 backbone, with distillation, with EA
  4. Teacher Model: ResNet32 backbone, 6 experts, without EA. Serves as the teacher model when training RIDE with knowledge distillation.

| Model | #Experts | Overall Accuracy (%) | Many Accuracy (%) | Medium Accuracy (%) | Few Accuracy (%) | Download |
| --- | --- | --- | --- | --- | --- | --- |
| CE | - | 39.1 | 66.1 | 37.3 | 10.6 | - |
| Decouple | - | 43.3 | 64.0 | 44.8 | 18.1 | - |
| RIDE | 3 | 48.6 | 67.0 | 49.9 | 25.7 | Link |
| RIDE + Distill | 3 | 49.0 | 67.6 | 50.9 | 25.2 | Link |
| RIDE + Distill | 4 | 49.4 | 67.7 | 51.3 | 25.7 | Link |
| Teacher Model | 6 | 50.2 | 69.3 | 52.1 | 25.8 | Link |
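The Many/Medium/Few columns report accuracy on classes grouped by their number of training images. The thresholds below (many-shot: more than 100 training images per class, medium-shot: 20 to 100, few-shot: fewer than 20) follow the common long-tailed evaluation protocol and, like the toy data, are illustrative rather than taken from the training code:

```python
# Sketch: how per-split (many/medium/few) accuracy can be computed.
# Thresholds follow the common long-tailed protocol and are an
# assumption, not lifted from this repo's evaluation code.

def split_accuracies(preds, labels, train_counts):
    """Return overall and per-split accuracy for a list of predictions."""
    groups = {"many": [], "medium": [], "few": []}
    correct = 0
    for p, y in zip(preds, labels):
        hit = int(p == y)
        correct += hit
        n = train_counts[y]          # training images for this class
        if n > 100:
            groups["many"].append(hit)
        elif n >= 20:
            groups["medium"].append(hit)
        else:
            groups["few"].append(hit)
    acc = {k: sum(v) / len(v) if v else 0.0 for k, v in groups.items()}
    acc["overall"] = correct / len(labels)
    return acc

# Toy example: 4 classes with very different training frequencies
train_counts = {0: 500, 1: 150, 2: 50, 3: 5}   # class -> #train images
labels = [0, 0, 1, 2, 2, 3]
preds  = [0, 1, 1, 2, 0, 3]
print(split_accuracies(preds, labels, train_counts))
```

A strong long-tailed method narrows the gap between the many- and few-shot columns without sacrificing overall accuracy, which is the pattern visible in the tables above.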

## ImageNet-LT (100 epochs)

  1. CE and Decouple: baseline results for cross-entropy and Decouple (cRT/tau-norm/LWS)
  2. RIDE: ResNeXt50 backbone, 3 experts, without distillation, with EA
  3. RIDE + Distill: ResNeXt50 backbone, with distillation, with EA
  4. Teacher Model: ResNeXt50 backbone, 6 experts, without EA. Serves as the teacher model when training RIDE with knowledge distillation.

| Model | #Experts | Overall Accuracy (%) | Many Accuracy (%) | Medium Accuracy (%) | Few Accuracy (%) | Download |
| --- | --- | --- | --- | --- | --- | --- |
| CE | - | 44.4 | 65.9 | 37.5 | 7.7 | - |
| Decouple | - | 49.9 | 60.2 | 47.2 | 30.3 | - |
| RIDE | 3 | 55.7 | 67.0 | 52.2 | 36.0 | Link |
| RIDE + Distill | 4 | 56.8 | 68.3 | 53.5 | 35.9 | Link |
| Teacher Model | 6 | 57.5 | 68.9 | 54.3 | 36.5 | Link |

## iNaturalist (100 epochs)

  1. CE and Decouple: baseline results for cross-entropy and Decouple (cRT/tau-norm/LWS)
  2. RIDE: ResNet50 backbone, without distillation, with EA
  3. RIDE + Distill: ResNet50 backbone, with distillation, with EA (trained in FP16)
  4. Teacher Model: ResNet50 backbone, 6 experts, without EA. Serves as the teacher model when training RIDE with knowledge distillation.

| Model | #Experts | Overall Accuracy (%) | Many Accuracy (%) | Medium Accuracy (%) | Few Accuracy (%) | Download |
| --- | --- | --- | --- | --- | --- | --- |
| CE | - | 61.7 | 72.2 | 63.0 | 57.2 | - |
| Decouple | - | 65.9 | 65.0 | 66.3 | 65.5 | - |
| RIDE | 3 | 71.2 | 70.2 | 71.2 | 71.6 | Link |
| RIDE + Distill | 4 | 72.6 | 70.9 | 72.5 | 73.1 | Link |
| Teacher Model | 6 | 72.9 | 71.1 | 72.9 | 73.3 | Link |
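The "+ Distill" models above are trained with a knowledge-distillation term from the 6-expert teacher. The exact loss lives in the training code; purely as a generic illustration, the classic soft-target distillation term (temperature-softened KL between teacher and student outputs) can be sketched as:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 -- the generic soft-target KD term, NOT the exact
    loss used in the RIDE training code."""
    p = softmax(teacher_logits, T)   # teacher's softened distribution
    q = softmax(student_logits, T)   # student's softened distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl

# When the student matches the teacher exactly, the loss is zero.
print(distill_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0]))  # -> 0.0
```

In practice such a term is added to the classification loss, so the student both fits the labels and mimics the teacher's softened predictions.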

## iNaturalist (Longer Training)

  1. RIDE + Distill: ResNet50 backbone, 4 experts, with EA, 200 epochs (distilled from a 6-expert teacher trained for 200 epochs).
  2. RIDE: ResNet50 backbone, 6 experts, without EA, 300 epochs.

| Model | #Experts | Overall Accuracy (%) | Many Accuracy (%) | Medium Accuracy (%) | Few Accuracy (%) | Download |
| --- | --- | --- | --- | --- | --- | --- |
| RIDE + Distill | 4 | 73.2 | 70.5 | 73.7 | 73.3 | Link |
| RIDE | 6 | 74.6 | 71.0 | 75.7 | 74.3 | Link |

After downloading the checkpoints, you can run evaluation by following the instructions in the test section.