EfficientViT for Object Detection and Instance Segmentation

This codebase implements object detection and instance segmentation with MMDetection, using EfficientViT as the backbone.

Model Zoo

RetinaNet Object Detection

Model	Pretrain	Lr Schd	Box AP	AP@50	AP@75	Config	Link
EfficientViT-M4	ImageNet-1k	1x	32.7	52.2	34.1	config	model/log
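
The AP@50 and AP@75 columns follow the usual COCO convention: a prediction counts as correct when its intersection-over-union (IoU) with a ground-truth object is at least 0.50 or 0.75, while the headline AP averages over IoU thresholds from 0.50 to 0.95. The sketch below illustrates the IoU test for axis-aligned boxes. It is a minimal illustration rather than the evaluation code used here; the function names and example boxes are made up, and for Mask AP the IoU is computed on masks instead of boxes.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped to zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, iou_threshold):
    """True if the prediction overlaps the ground truth enough to count."""
    return box_iou(pred_box, gt_box) >= iou_threshold


# Hypothetical boxes: the prediction is shifted 2 px to the left of the target.
pred = (0.0, 0.0, 10.0, 10.0)
gt = (2.0, 0.0, 12.0, 10.0)
print(round(box_iou(pred, gt), 3))   # intersection 80, union 120
print(is_match(pred, gt, 0.50))      # counts toward AP@50
print(is_match(pred, gt, 0.75))      # does not count toward AP@75
```

A detector can therefore score well on AP@50 and noticeably lower on AP@75 when its boxes are roughly right but not tightly localized.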

Mask R-CNN Instance Segmentation

Model	Pretrain	Lr Schd	Mask AP	AP@50	AP@75	Config	Link
EfficientViT-M4	ImageNet-1k	1x	31.0	51.2	32.2	config	model/log

Get Started

Follow the steps below to set up EfficientViT for downstream tasks.

Install requirements

Install mmcv-full and MMDetection via MIM:

pip install -U openmim
mim install mmcv-full
mim install mmdet
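
After installation it can help to confirm which versions were actually resolved. The snippet below is a small sketch that only uses the Python standard library; the helper name `installed_versions` is made up for illustration. As general background rather than something this README states, `mmcv-full` is the package name of the MMCV 1.x series, which pairs with MMDetection 2.x, so a mismatch between the two is worth checking for.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names as installed by the commands above.
    for name, version in installed_versions(["mmcv-full", "mmdet"]).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```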

Data preparation

Prepare the COCO 2017 dataset according to the instructions in MMDetection. The dataset should be organized as follows:

downstream
├── data
│   ├── coco
│   │   ├── annotations
│   │   ├── train2017
│   │   ├── val2017
│   │   ├── test2017
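
A quick way to catch a misplaced folder before launching a job is to check the layout programmatically. The following is a minimal sketch, assuming it is run from the `downstream` directory so that the dataset lives under `data/coco`; the helper name `missing_coco_dirs` is hypothetical, and the check only looks at directory names, not at their contents.

```python
from pathlib import Path

# Sub-directories expected under data/coco, taken from the tree above.
EXPECTED_SUBDIRS = ("annotations", "train2017", "val2017", "test2017")


def missing_coco_dirs(root="data/coco"):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_coco_dirs()
    if missing:
        print("Missing under data/coco:", ", ".join(missing))
    else:
        print("COCO layout looks complete.")
```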

Evaluation

First, download the MSCOCO-pretrained models from the Model Zoo.

The instructions below evaluate the models on the MSCOCO 2017 val set:

Object Detection

To evaluate the RetinaNet model with EfficientViT_M4 as the backbone, run:

bash ./dist_test.sh configs/retinanet_efficientvit_m4_fpn_1x_coco.py ./retinanet_efficientvit_m4_fpn_1x_coco.pth 8 --eval bbox

where 8 is the number of GPUs. For additional arguments, refer to the MMDetection documentation.

Instance Segmentation

To evaluate the Mask R-CNN model with EfficientViT_M4 as the backbone, run:

bash ./dist_test.sh configs/mask_rcnn_efficientvit_m4_fpn_1x_coco.py ./mask_rcnn_efficientvit_m4_fpn_1x_coco.pth 8 --eval bbox segm

where 8 is the number of GPUs. For additional arguments, refer to the MMDetection documentation.
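
The two evaluation commands above share the same argument order: config file, checkpoint, GPU count, then the metrics after `--eval`. When evaluating on a different number of GPUs or scripting several runs, that order can be captured in a small helper. This is only a sketch; `build_test_command` is a hypothetical name, and it does nothing more than assemble the command line shown above.

```python
import shlex


def build_test_command(config, checkpoint, num_gpus, metrics):
    """Assemble the dist_test.sh invocation as an argument list."""
    return [
        "bash", "./dist_test.sh",
        config, checkpoint, str(num_gpus),
        "--eval", *metrics,
    ]


# Example: evaluate Mask R-CNN on 4 GPUs, reporting box and mask metrics.
cmd = build_test_command(
    "configs/mask_rcnn_efficientvit_m4_fpn_1x_coco.py",
    "./mask_rcnn_efficientvit_m4_fpn_1x_coco.pth",
    4,
    ["bbox", "segm"],
)
print(shlex.join(cmd))
# To actually launch it: subprocess.run(cmd, check=True)
```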

Training

First, download the ImageNet-1k pretrained EfficientViT-M4 model from the Model Zoo.

The instructions below train the models on the MSCOCO 2017 train set:

Object Detection

To train the RetinaNet model with EfficientViT_M4 as the backbone on a single machine with multiple GPUs, run:

bash ./dist_train.sh configs/retinanet_efficientvit_m4_fpn_1x_coco.py 8 --cfg-options model.backbone.pretrained=$PATH_TO_IMGNET_PRETRAIN_MODEL

where 8 is the number of GPUs. For additional arguments, refer to the MMDetection documentation.
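
The `--cfg-options model.backbone.pretrained=...` argument overrides a single nested field of the config without editing the file: the dotted key is read as a path into the config dictionary. The sketch below mimics that idea in plain Python so it runs without MMDetection installed. It is not the library's implementation, and the config contents, the checkpoint path, and the helper name `apply_override` are illustrative assumptions.

```python
def apply_override(cfg, dotted_key, value):
    """Set cfg[a][b][c] = value for dotted_key 'a.b.c', creating dicts as needed."""
    keys = dotted_key.split(".")
    node = cfg
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return cfg


# Illustrative config fragment; the real config file defines many more fields.
cfg = {
    "model": {
        "type": "RetinaNet",
        "backbone": {"type": "EfficientViT_M4", "pretrained": None},
    }
}

# Hypothetical checkpoint path standing in for $PATH_TO_IMGNET_PRETRAIN_MODEL.
apply_override(cfg, "model.backbone.pretrained", "/path/to/efficientvit_m4.pth")
print(cfg["model"]["backbone"])
```

Only the addressed field changes; sibling fields such as the backbone type are left as the config file defines them.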

Instance Segmentation

To train the Mask R-CNN model with EfficientViT_M4 as the backbone on a single machine with multiple GPUs, run:

bash ./dist_train.sh configs/mask_rcnn_efficientvit_m4_fpn_1x_coco.py 8 --cfg-options model.backbone.pretrained=$PATH_TO_IMGNET_PRETRAIN_MODEL

where 8 is the number of GPUs. For additional arguments, refer to the MMDetection documentation.

Acknowledgements

The downstream task implementation is mainly based on the following codebases. We sincerely thank the authors for their wonderful work.

MMDetection, Swin-Transformer-Object-Detection, PoolFormer.
