This codebase implements image classification with EfficientViT.
Model | Data | Input | Acc@1 | Acc@5 | #FLOPs (M) | #Params | Throughput (images/s) | Link |
---|---|---|---|---|---|---|---|---|
EfficientViT-M0 | ImageNet-1k | 224x224 | 63.2 | 85.2 | 79 | 2.3M | 27644 | model/log/onnx |
EfficientViT-M1 | ImageNet-1k | 224x224 | 68.4 | 88.7 | 167 | 3.0M | 20093 | model/log/onnx |
EfficientViT-M2 | ImageNet-1k | 224x224 | 70.8 | 90.2 | 201 | 4.2M | 18218 | model/log/onnx |
EfficientViT-M3 | ImageNet-1k | 224x224 | 73.4 | 91.4 | 263 | 6.9M | 16644 | model/log/onnx |
EfficientViT-M4 | ImageNet-1k | 224x224 | 74.3 | 91.8 | 299 | 8.8M | 15914 | model/log/onnx |
EfficientViT-M5 | ImageNet-1k | 224x224 | 77.1 | 93.4 | 522 | 12.4M | 10621 | model/log/onnx |
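If you want to use a model directly in your own script, a minimal sketch along the following lines should work. It assumes the repo registers the EfficientViT_* variants with timm under the same names accepted by main.py's --model flag; the `model` package import is a hypothetical placeholder for whichever module performs that registration.

```python
import torch
import timm

# Hypothetical: importing the repo's model package is assumed to trigger the
# timm @register_model hooks for the EfficientViT_* variants (adjust the name
# to the actual package in this codebase).
import model  # noqa: F401

# Same name as passed to main.py via --model.
net = timm.create_model('EfficientViT_M4', num_classes=1000)
net.eval()

# Count parameters; the table above lists ~8.8M for EfficientViT-M4.
n_params = sum(p.numel() for p in net.parameters())
print(f'params: {n_params / 1e6:.1f}M')

# Dummy forward pass at the 224x224 input resolution from the table.
with torch.no_grad():
    logits = net(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```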
Run the following command to install the dependencies:
pip install -r requirements.txt
We need to prepare the ImageNet-1k dataset from http://www.image-net.org/.
- ImageNet-1k
ImageNet-1k contains 1.28M training images and 50K validation images. The images should be stored as individual files:
ImageNet/
├── train
│ ├── n01440764
│ │ ├── n01440764_10026.JPEG
│ │ ├── n01440764_10027.JPEG
...
├── val
│ ├── n01440764
│ │ ├── ILSVRC2012_val_00000293.JPEG
...
Our code also supports storing the training and validation sets as *.tar archives:
ImageNet/
├── train.tar
│ ├── n01440764
│ │ ├── n01440764_10026.JPEG
...
└── val.tar
    ├── n01440764
    │   ├── ILSVRC2012_val_00000293.JPEG
...
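The plain-folder layout above follows the standard torchvision ImageFolder convention, so an extracted copy can be sanity-checked with a short sketch like the one below (the data path is a placeholder; the *.tar layout is handled by the repo's own loader and is not covered here):

```python
from torchvision import datasets, transforms

data_path = '/path/to/ImageNet'  # replace with your $PATH_TO_IMAGENET

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),   # matches the 224x224 input size in the table
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder(f'{data_path}/train', transform=transform)
val_set = datasets.ImageFolder(f'{data_path}/val', transform=transform)

print(len(train_set), 'training images')    # expect ~1.28M
print(len(val_set), 'validation images')    # expect 50,000
print(len(train_set.classes), 'classes')    # expect 1000
```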
Before evaluation, we need to download the pre-trained models from the model zoo above.
Run the following command to evaluate a pre-trained EfficientViT-M4 on ImageNet val with a single GPU:
python main.py --eval --model EfficientViT_M4 --resume ./efficientvit_m4.pth --data-path $PATH_TO_IMAGENET
This should give
* Acc@1 74.266 Acc@5 91.788 loss 1.242
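For quick ad-hoc inference outside main.py, a sketch along these lines should work. It assumes the checkpoint stores its weights under a 'model' key (a common convention in DeiT-style training code), with a raw state_dict as fallback, and that the models are registered with timm as above; the `model` import and 'example.jpg' are placeholders.

```python
import torch
import timm
from PIL import Image
from torchvision import transforms

import model  # hypothetical: repo package whose import registers EfficientViT_* with timm

net = timm.create_model('EfficientViT_M4', num_classes=1000)
ckpt = torch.load('./efficientvit_m4.pth', map_location='cpu')
state_dict = ckpt.get('model', ckpt)  # assumed 'model' key; fall back to a raw state_dict
net.load_state_dict(state_dict)
net.eval()

# Standard ImageNet preprocessing at the 224x224 evaluation resolution.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open('example.jpg').convert('RGB')).unsqueeze(0)
with torch.no_grad():
    pred = net(img).argmax(dim=1)
print('predicted class index:', pred.item())
```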
Here are the command lines for evaluating other pre-trained models:
EfficientViT-M0
python main.py --eval --model EfficientViT_M0 --resume ./efficientvit_m0.pth --data-path $PATH_TO_IMAGENET
giving
* Acc@1 63.296 Acc@5 85.150 loss 1.741
EfficientViT-M1
python main.py --eval --model EfficientViT_M1 --resume ./efficientvit_m1.pth --data-path $PATH_TO_IMAGENET
giving
* Acc@1 68.356 Acc@5 88.672 loss 1.513
EfficientViT-M2
python main.py --eval --model EfficientViT_M2 --resume ./efficientvit_m2.pth --data-path $PATH_TO_IMAGENET
giving
* Acc@1 70.786 Acc@5 90.150 loss 1.442
EfficientViT-M3
python main.py --eval --model EfficientViT_M3 --resume ./efficientvit_m3.pth --data-path $PATH_TO_IMAGENET
giving
* Acc@1 73.390 Acc@5 91.350 loss 1.285
EfficientViT-M5
python main.py --eval --model EfficientViT_M5 --resume ./efficientvit_m5.pth --data-path $PATH_TO_IMAGENET
giving
* Acc@1 77.124 Acc@5 93.360 loss 1.127
To train an EfficientViT-M4 model on a single node with 8 GPUs for 300 epochs with distributed evaluation, run:
python -m torch.distributed.launch --nproc_per_node=8 --master_port 12345 --use_env main.py --model EfficientViT_M4 --data-path $PATH_TO_IMAGENET --dist-eval
EfficientViT-M0
To train an EfficientViT-M0 model on a single node with 8 GPUs for 300 epochs with distributed evaluation, run:
python -m torch.distributed.launch --nproc_per_node=8 --master_port 12345 --use_env main.py --model EfficientViT_M0 --data-path $PATH_TO_IMAGENET --dist-eval
EfficientViT-M1
To train an EfficientViT-M1 model on a single node with 8 GPUs for 300 epochs with distributed evaluation, run:
python -m torch.distributed.launch --nproc_per_node=8 --master_port 12345 --use_env main.py --model EfficientViT_M1 --data-path $PATH_TO_IMAGENET --dist-eval
EfficientViT-M2
To train an EfficientViT-M2 model on a single node with 8 GPUs for 300 epochs with distributed evaluation, run:
python -m torch.distributed.launch --nproc_per_node=8 --master_port 12345 --use_env main.py --model EfficientViT_M2 --data-path $PATH_TO_IMAGENET --dist-eval
EfficientViT-M3
To train an EfficientViT-M3 model on a single node with 8 GPUs for 300 epochs with distributed evaluation, run:
python -m torch.distributed.launch --nproc_per_node=8 --master_port 12345 --use_env main.py --model EfficientViT_M3 --data-path $PATH_TO_IMAGENET --dist-eval
EfficientViT-M5
To train an EfficientViT-M5 model on a single node with 8 GPUs for 300 epochs with distributed evaluation, run:
python -m torch.distributed.launch --nproc_per_node=8 --master_port 12345 --use_env main.py --model EfficientViT_M5 --data-path $PATH_TO_IMAGENET --dist-eval
Run the following command to compare the throughputs on GPU/CPU:
python speed_test.py
which should give
EfficientViT_M0 cuda:0 27643.941865437002 images/s @ batch size 2048
EfficientViT_M1 cuda:0 20093.286204638334 images/s @ batch size 2048
EfficientViT_M2 cuda:0 18218.347390415714 images/s @ batch size 2048
EfficientViT_M3 cuda:0 16643.905520424512 images/s @ batch size 2048
EfficientViT_M4 cuda:0 15914.449955135608 images/s @ batch size 2048
EfficientViT_M5 cuda:0 10620.868156518267 images/s @ batch size 2048
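For reference, a throughput measurement in the same spirit as speed_test.py (not necessarily its exact procedure) can be sketched as follows; the warm-up and iteration counts are illustrative, and the `model` package import is again a hypothetical placeholder:

```python
import time
import torch
import timm

import model  # hypothetical: repo package registering EfficientViT_* with timm

device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
batch_size = 2048 if device.startswith('cuda') else 64

net = timm.create_model('EfficientViT_M0', num_classes=1000).to(device).eval()
x = torch.randn(batch_size, 3, 224, 224, device=device)

with torch.no_grad():
    for _ in range(10):              # warm-up iterations
        net(x)
    if device.startswith('cuda'):
        torch.cuda.synchronize()     # wait for queued kernels before timing
    start = time.time()
    runs = 30
    for _ in range(runs):
        net(x)
    if device.startswith('cuda'):
        torch.cuda.synchronize()
    elapsed = time.time() - start

print(f'EfficientViT_M0 {device} {runs * batch_size / elapsed:.1f} images/s '
      f'@ batch size {batch_size}')
```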
We sincerely appreciate Swin Transformer, LeViT, pytorch-image-models, and PyTorch for their awesome codebases.