Setup

```shell
pip install -r requirements.txt
```

The table below lists the corresponding `torch` and `torchvision` versions.
| rtdetr | torch | torchvision |
| :---: | :---: | :---: |
| - | 2.4 | 0.19 |
| - | 2.2 | 0.17 |
| - | 2.1 | 0.16 |
| - | 2.0 | 0.15 |
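A quick way to confirm your environment matches one of the pairs above (a minimal sketch, nothing repo-specific):

```python
# Print the installed torch / torchvision versions and compare them
# against the compatibility table above.
import torch
import torchvision

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
```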
| Model | Dataset | Input Size | AP<sup>val</sup> | AP<sub>50</sub><sup>val</sup> | #Params (M) | FPS | config | checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RT-DETRv2-S | COCO | 640 | 48.1 (+1.6) | 65.1 | 20 | 217 | config | url |
| RT-DETRv2-M* | COCO | 640 | 49.9 (+1.0) | 67.5 | 31 | 161 | config | url |
| RT-DETRv2-M | COCO | 640 | 51.9 (+0.6) | 69.9 | 36 | 145 | config | url |
| RT-DETRv2-L | COCO | 640 | 53.4 (+0.3) | 71.6 | 42 | 108 | config | url |
| RT-DETRv2-X | COCO | 640 | 54.3 | 72.8 (+0.1) | 76 | 74 | config | url |
Notes:
- `AP` is evaluated on the MSCOCO val2017 dataset.
- `FPS` is evaluated on a single T4 GPU with $batch\_size = 1$, $fp16$, and $TensorRT \geq 8.5.1$.
- `COCO + Objects365` in the table means the model is fine-tuned on `COCO` using pretrained weights trained on `Objects365`.
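For a rough check of per-image throughput on your own hardware, one could time the exported ONNX model as sketched below. This is only an illustration: it uses ONNX Runtime rather than the TensorRT fp16 / T4 setup behind the FPS column, and it assumes the exported graph has inputs named `images` and `orig_target_sizes` (verify against your own export).

```python
# Rough latency sketch (assumptions: exported "model.onnx" with inputs named
# "images" and "orig_target_sizes"; this measures ONNX Runtime on your own
# hardware, NOT the TensorRT fp16 / T4 setting used for the FPS column).
import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
feed = {
    "images": np.random.rand(1, 3, 640, 640).astype(np.float32),
    "orig_target_sizes": np.array([[640, 640]], dtype=np.int64),
}

for _ in range(10):          # warmup runs
    sess.run(None, feed)

n = 50
t0 = time.perf_counter()
for _ in range(n):
    sess.run(None, feed)
print(f"~{n / (time.perf_counter() - t0):.1f} FPS (batch size 1)")
```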
| Model | Sampling Method | AP<sup>val</sup> | AP<sub>50</sub><sup>val</sup> | config | checkpoint |
| --- | --- | --- | --- | --- | --- |
| RT-DETRv2-S_dsp | discrete_sampling | 47.4 | 64.8 (-0.1) | config | url |
| RT-DETRv2-M*_dsp | discrete_sampling | 49.2 | 67.1 (-0.4) | config | url |
| RT-DETRv2-M_dsp | discrete_sampling | 51.4 | 69.7 (-0.2) | config | url |
| RT-DETRv2-L_dsp | discrete_sampling | 52.9 | 71.3 (-0.3) | config | url |
Notes:
- The impact on inference speed depends on the specific device and software.
- `*_dsp*` models inherit the knowledge of the corresponding `*_sp*` models and adapt to the `discrete_sampling` strategy. You can use TensorRT 8.4 (or even older versions) for inference with these models.
| Model | Sampling Method | #Points | AP<sup>val</sup> | AP<sub>50</sub><sup>val</sup> | checkpoint |
| --- | --- | --- | --- | --- | --- |
| rtdetrv2_r18vd_sp1 | grid_sampling | 21,600 | 47.3 | 64.3 (-0.6) | url |
| rtdetrv2_r18vd_sp2 | grid_sampling | 43,200 | 47.7 | 64.7 (-0.2) | url |
| rtdetrv2_r18vd_sp3 | grid_sampling | 64,800 | 47.8 | 64.8 (-0.1) | url |
| rtdetrv2_r18vd(_sp4) | grid_sampling | 86,400 | 47.9 | 64.9 | url |
Notes:
- The impact on inference speed depends on the specific device and software.
- `#Points` is the total number of sampling points in the decoder for a single-image inference.
Details
- Training

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=9909 --nproc_per_node=4 tools/train.py -c path/to/config --use-amp --seed=0 &> log.txt 2>&1 &
```
- Testing

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=9909 --nproc_per_node=4 tools/train.py -c path/to/config -r path/to/checkpoint --test-only
```
- Tuning

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=9909 --nproc_per_node=4 tools/train.py -c path/to/config -t path/to/checkpoint --use-amp --seed=0 &> log.txt 2>&1 &
```
- Export ONNX

```shell
python tools/export_onnx.py -c path/to/config -r path/to/checkpoint --check
```
- Inference

  Supports torch, onnxruntime, tensorrt, and openvino; see details in `references/deploy`. A minimal onnxruntime sketch follows this list.

```shell
python references/deploy/rtdetrv2_onnx.py --onnx-file=model.onnx --im-file=xxxx
python references/deploy/rtdetrv2_tensorrt.py --trt-file=model.trt --im-file=xxxx
python references/deploy/rtdetrv2_torch.py -c path/to/config -r path/to/checkpoint --im-file=xxx --device=cuda:0
```
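As a rough illustration of what `references/deploy/rtdetrv2_onnx.py` does, the sketch below runs the exported model directly with `onnxruntime`. The input names `images` and `orig_target_sizes`, the output order `labels, boxes, scores`, and the image path are assumptions for illustration; the sketch prints the graph's actual input/output names so you can verify them against your own export.

```python
# Minimal onnxruntime sketch (assumptions: 640x640 RGB input tensor named
# "images", original image size tensor named "orig_target_sizes", and three
# outputs labels/boxes/scores -- verify against the printed names below).
import numpy as np
import onnxruntime as ort
from PIL import Image

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
print([i.name for i in sess.get_inputs()], [o.name for o in sess.get_outputs()])

im = Image.open("your_image.jpg").convert("RGB")   # hypothetical image path
w, h = im.size
blob = np.asarray(im.resize((640, 640)), dtype=np.float32).transpose(2, 0, 1)[None] / 255.0
orig_size = np.array([[w, h]], dtype=np.int64)

labels, boxes, scores = sess.run(None, {"images": blob, "orig_target_sizes": orig_size})
keep = scores[0] > 0.5          # simple confidence threshold
print(labels[0][keep], boxes[0][keep], scores[0][keep])
```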
If you use `RTDETR` or `RTDETRv2` in your work, please use the following BibTeX entries:

```bibtex
@misc{lv2023detrs,
title={DETRs Beat YOLOs on Real-time Object Detection},
author={Wenyu Lv and Shangliang Xu and Yian Zhao and Guanzhong Wang and Jinman Wei and Cheng Cui and Yuning Du and Qingqing Dang and Yi Liu},
year={2023},
eprint={2304.08069},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{lv2024rtdetrv2improvedbaselinebagoffreebies,
title={RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer},
author={Wenyu Lv and Yian Zhao and Qinyao Chang and Kui Huang and Guanzhong Wang and Yi Liu},
year={2024},
eprint={2407.17140},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2407.17140},
}
```