# Model Zoo

If you only want to use our trained checkpoints for inference or fine-tuning, this page collects the available models.

Models with 3 experts are the standard models and provide a good trade-off between computation/model size and accuracy. Note that models with 2 experts can have even lower computational cost than the baseline models. We also release some higher-accuracy models, such as those with 6 experts, which can be used as teacher models for distilling other models. Some models were trained with an old config format, so their configs may not match the current code; if you cannot load a checkpoint, please let us know.

## Imbalanced CIFAR 100/CIFAR-LT 100 (100 epochs)

  1. CE and Decouple: baseline results for cross-entropy and Decouple (cRT/tau-norm/LWS)
  2. RIDE: ResNet32 backbone, without distillation, with EA
  3. RIDE + Distill: ResNet32 backbone, with distillation, with EA
  4. Teacher Model: ResNet32 backbone, 6 experts, without EA. Serves as the teacher model when training RIDE with knowledge distillation.

| Model | #Experts | Overall Accuracy (%) | Many Accuracy (%) | Medium Accuracy (%) | Few Accuracy (%) | Download |
| --- | --- | --- | --- | --- | --- | --- |
| CE | - | 39.1 | 66.1 | 37.3 | 10.6 | - |
| Decouple | - | 43.3 | 64.0 | 44.8 | 18.1 | - |
| RIDE | 3 | 48.6 | 67.0 | 49.9 | 25.7 | Link |
| RIDE + Distill | 3 | 49.0 | 67.6 | 50.9 | 25.2 | Link |
| RIDE + Distill | 4 | 49.4 | 67.7 | 51.3 | 25.7 | Link |
| Teacher Model | 6 | 50.2 | 69.3 | 52.1 | 25.8 | Link |
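The Many/Medium/Few columns report accuracy on classes grouped by their number of training images. The thresholds below (many-shot: more than 100 training images per class, medium-shot: 20 to 100, few-shot: fewer than 20) follow the common long-tailed evaluation protocol and, like the toy data, are illustrative rather than taken from the training code:

```python
# Sketch: how per-split (many/medium/few) accuracy can be computed.
# Thresholds follow the common long-tailed protocol and are an
# assumption, not lifted from this repo's evaluation code.

def split_accuracies(preds, labels, train_counts):
    """Return overall and per-split accuracy for a list of predictions."""
    groups = {"many": [], "medium": [], "few": []}
    correct = 0
    for p, y in zip(preds, labels):
        hit = int(p == y)
        correct += hit
        n = train_counts[y]          # training images for this class
        if n > 100:
            groups["many"].append(hit)
        elif n >= 20:
            groups["medium"].append(hit)
        else:
            groups["few"].append(hit)
    acc = {k: sum(v) / len(v) if v else 0.0 for k, v in groups.items()}
    acc["overall"] = correct / len(labels)
    return acc

# Toy example: 4 classes with very different training frequencies
train_counts = {0: 500, 1: 150, 2: 50, 3: 5}   # class -> #train images
labels = [0, 0, 1, 2, 2, 3]
preds  = [0, 1, 1, 2, 0, 3]
print(split_accuracies(preds, labels, train_counts))
```

A strong long-tailed method narrows the gap between the many- and few-shot columns without sacrificing overall accuracy, which is the pattern visible in the tables above.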

## ImageNet-LT (100 epochs)

  1. CE and Decouple: baseline results for cross-entropy and Decouple (cRT/tau-norm/LWS)
  2. RIDE: ResNeXt50 backbone, 3 experts, without distillation, with EA
  3. RIDE + Distill: ResNeXt50 backbone, with distillation, with EA
  4. Teacher Model: ResNeXt50 backbone, 6 experts, without EA. Serves as the teacher model when training RIDE with knowledge distillation.

| Model | #Experts | Overall Accuracy (%) | Many Accuracy (%) | Medium Accuracy (%) | Few Accuracy (%) | Download |
| --- | --- | --- | --- | --- | --- | --- |
| CE | - | 44.4 | 65.9 | 37.5 | 7.7 | - |
| Decouple | - | 49.9 | 60.2 | 47.2 | 30.3 | - |
| RIDE | 3 | 55.7 | 67.0 | 52.2 | 36.0 | Link |
| RIDE + Distill | 4 | 56.8 | 68.3 | 53.5 | 35.9 | Link |
| Teacher Model | 6 | 57.5 | 68.9 | 54.3 | 36.5 | Link |

## iNaturalist (100 epochs)

  1. CE and Decouple: baseline results for cross-entropy and Decouple (cRT/tau-norm/LWS)
  2. RIDE: ResNet50 backbone, without distillation, with EA
  3. RIDE + Distill: ResNet50 backbone, with distillation, with EA (trained in FP16)
  4. Teacher Model: ResNet50 backbone, 6 experts, without EA. Serves as the teacher model when training RIDE with knowledge distillation.

| Model | #Experts | Overall Accuracy (%) | Many Accuracy (%) | Medium Accuracy (%) | Few Accuracy (%) | Download |
| --- | --- | --- | --- | --- | --- | --- |
| CE | - | 61.7 | 72.2 | 63.0 | 57.2 | - |
| Decouple | - | 65.9 | 65.0 | 66.3 | 65.5 | - |
| RIDE | 3 | 71.2 | 70.2 | 71.2 | 71.6 | Link |
| RIDE + Distill | 4 | 72.6 | 70.9 | 72.5 | 73.1 | Link |
| Teacher Model | 6 | 72.9 | 71.1 | 72.9 | 73.3 | Link |
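The "+ Distill" models above are trained with a knowledge-distillation term from the 6-expert teacher. The exact loss lives in the training code; purely as a generic illustration, the classic soft-target distillation term (temperature-softened KL between teacher and student outputs) can be sketched as:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 -- the generic soft-target KD term, NOT the exact
    loss used in the RIDE training code."""
    p = softmax(teacher_logits, T)   # teacher's softened distribution
    q = softmax(student_logits, T)   # student's softened distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl

# When the student matches the teacher exactly, the loss is zero.
print(distill_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0]))  # -> 0.0
```

In practice such a term is added to the classification loss, so the student both fits the labels and mimics the teacher's softened predictions.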

## iNaturalist (Longer Training)

  1. RIDE + Distill: ResNet50 backbone, 4 experts, with EA, 200 epochs (distilled from a 6-expert teacher trained for 200 epochs).
  2. RIDE: ResNet50 backbone, 6 experts, without EA, 300 epochs.

| Model | #Experts | Overall Accuracy (%) | Many Accuracy (%) | Medium Accuracy (%) | Few Accuracy (%) | Download |
| --- | --- | --- | --- | --- | --- | --- |
| RIDE + Distill | 4 | 73.2 | 70.5 | 73.7 | 73.3 | Link |
| RIDE | 6 | 74.6 | 71.0 | 75.7 | 74.3 | Link |

After downloading the checkpoints, you can run evaluation by following the instructions in the test section.