The PyTorch implementation is WongKinYiu/yolov9.
- YOLOv9-c:
  - FP32
  - FP16
  - INT8
- YOLOv9-e:
  - FP32
  - FP16
  - INT8
- GELAN-c:
  - FP32
  - FP16
  - INT8
- GELAN-e:
  - FP32
  - FP16
  - INT8
- TensorRT 8.0+
- OpenCV 3.4.0+
The speed test was run on a desktop with an R7-5700G CPU and an RTX 4060Ti GPU. The input size is 640x640. The FP32, FP16 and INT8 models were tested. The reported time covers inference only and excludes pre-processing and post-processing. Each value is the average over 1000 inference runs.
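The measurement described above can be sketched roughly as follows. This is a minimal illustration, not the script used for the reported numbers: `average_latency_ms`, `infer` and the `warmup` parameter are hypothetical names, the warm-up phase is an assumption that the text does not mention, and `infer` is assumed to block until the result is ready (for GPU inference that means synchronizing inside the callable).

```python
import time


def average_latency_ms(infer, n_runs=1000, warmup=10):
    """Return the mean wall-clock time of infer() in milliseconds.

    infer  -- hypothetical zero-argument callable that runs inference only
              (no pre-processing or post-processing) and blocks until done.
    warmup -- untimed calls made first; an assumption, not from the README.
    """
    # Untimed warm-up calls so one-off initialization does not skew the mean.
    for _ in range(warmup):
        infer()

    # Time n_runs consecutive calls and average over them.
    start = time.perf_counter()
    for _ in range(n_runs):
        infer()
    elapsed_s = time.perf_counter() - start
    return elapsed_s * 1000.0 / n_runs
```

For example, `average_latency_ms(lambda: model(batch))` would time a hypothetical `model` on an already prepared `batch`, keeping data preparation outside the timed region.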
| Framework | Model | FP32 | FP16 | INT8 |
| --- | --- | --- | --- | --- |
| PyTorch | YOLOv9-c | - | 15.5ms | - |
| PyTorch | YOLOv9-e | - | 19.7ms | - |
| TensorRT | YOLOv9-c | 13.5ms | 4.6ms | 3.0ms |
| TensorRT | YOLOv9-e | 8.3ms | 3.2ms | 2.15ms |
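To read the table as relative speedups, a small calculation like the one below may help. The latencies are copied from the table; the `speedup` helper and the choice of comparisons are illustrative assumptions only.

```python
# Latencies in milliseconds, copied from the table above.
latency_ms = {
    ("pytorch", "YOLOv9-c"): {"FP16": 15.5},
    ("pytorch", "YOLOv9-e"): {"FP16": 19.7},
    ("tensorrt", "YOLOv9-c"): {"FP32": 13.5, "FP16": 4.6, "INT8": 3.0},
    ("tensorrt", "YOLOv9-e"): {"FP32": 8.3, "FP16": 3.2, "INT8": 2.15},
}


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


for model in ("YOLOv9-c", "YOLOv9-e"):
    pt = latency_ms[("pytorch", model)]
    trt = latency_ms[("tensorrt", model)]
    # Same precision, different framework.
    print(f"{model}: TensorRT FP16 vs PyTorch FP16: {speedup(pt['FP16'], trt['FP16']):.2f}x")
    # Same framework, lower precision.
    print(f"{model}: TensorRT INT8 vs TensorRT FP32: {speedup(trt['FP32'], trt['INT8']):.2f}x")
```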
GELAN will be updated later.
YOLOv9-e is faster than YOLOv9-c in TensorRT because YOLOv9-e needs fewer layers for inference.
YOLOv9-c:
[[31, 34, 37, 16, 19, 22], 1, DualDDetect, [nc]] # [A3, A4, A5, P3, P4, P5]
YOLOv9-e:
[[35, 32, 29, 42, 45, 48], 1, DualDDetect, [nc]]
In DualDDetect, A3, A4, A5, P3, P4 and P5 are the outputs of the backbone. Only the first three are used to compute the final result.
YOLOv9-c therefore needs 37 layers for inference, while YOLOv9-e needs only 35.
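The layer counts above can be reproduced from the two DualDDetect lines with a short calculation. This sketch takes the statement that only the first three inputs are needed at face value, and it assumes the first config line belongs to YOLOv9-c and the second to YOLOv9-e, which is consistent with the counts of 37 and 35. `last_required_layer` is a hypothetical helper name.

```python
# DualDDetect input layer indices, in the order [A3, A4, A5, P3, P4, P5].
detect_inputs = {
    "YOLOv9-c": [31, 34, 37, 16, 19, 22],
    "YOLOv9-e": [35, 32, 29, 42, 45, 48],
}


def last_required_layer(indices, n_used=3):
    """Highest layer index among the first n_used detect inputs.

    Layers are evaluated in index order, so every layer up to this index
    has to run before the detect head can produce its result.
    """
    return max(indices[:n_used])


for model, indices in detect_inputs.items():
    print(model, "needs layers up to index", last_required_layer(indices))
```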
- generate .wts from the PyTorch .pt weights, or download .wts from the model zoo
// download
cp {tensorrtx}/yolov9/ {yolov9}/yolov9
cd {yolov9}/yolov9
// a file 'yolov9.wts' will be generated.
- build tensorrtx/yolov9 and run
cd {tensorrtx}/yolov9/
// update kNumClass in config.h if your model is trained on a custom dataset
mkdir build
cd build
cp {yolov9}/yolov9/yolov9.wts {tensorrtx}/yolov9/build
cmake ..
make
sudo ./yolov9 -s [.wts] [.engine] [c/e] // serialize model to plan file
sudo ./yolov9 -d [.engine] [image folder] // deserialize and run inference, the images in [image folder] will be processed.
// For example yolov9
sudo ./yolov9 -s yolov9-c.wts yolov9-c.engine c
sudo ./yolov9 -d yolov9-c.engine ../images
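If the two steps need to be scripted, the argument order shown above can be captured in a small helper. This is only a sketch: `serialize_cmd` and `infer_cmd` are hypothetical names, the helpers just build argument lists without running anything, and `sudo` is left out on the assumption that it is not needed in your environment.

```python
import shlex


def serialize_cmd(wts, engine, variant, exe="./yolov9"):
    """Build the '-s' command: [exe, -s, .wts, .engine, c/e]."""
    if variant not in ("c", "e"):
        raise ValueError("variant must be 'c' or 'e'")
    return [exe, "-s", wts, engine, variant]


def infer_cmd(engine, image_dir, exe="./yolov9"):
    """Build the '-d' command: [exe, -d, .engine, image folder]."""
    return [exe, "-d", engine, image_dir]


# Print the same two commands as the example above.
print(shlex.join(serialize_cmd("yolov9-c.wts", "yolov9-c.engine", "c")))
print(shlex.join(infer_cmd("yolov9-c.engine", "../images")))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from inside the build directory.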
Check the generated images, such as _zidane.jpg and _bus.jpg.
Optional: load and run the TensorRT model in Python.
// install python-tensorrt, pycuda, etc.
// ensure the yolov9.engine and have been built
- Prepare calibration images. You can randomly select 1000s of images from your training set. For COCO, you can also download my calibration images from GoogleDrive or BaiduPan (pwd: a9wh).
- Unzip it in yolov9/build.
- Set the macro in config.h and change the path of the calibration images in config.h, such as 'gCalibTablePath="./coco_calib/";'.
- Serialize the model and test.
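For a custom dataset, the random selection in the first step could be done with a short script along these lines. It is a sketch under stated assumptions: `sample_calibration_images` is a hypothetical helper, the training images are assumed to sit directly in one flat directory, and the default of 1000 images and the list of file suffixes are illustrative choices.

```python
import random
import shutil
from pathlib import Path

# File extensions treated as images (an assumption; adjust to your dataset).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def sample_calibration_images(train_dir, calib_dir, n_images=1000, seed=0):
    """Copy a random subset of images from train_dir into calib_dir.

    Returns the number of images copied, which is smaller than n_images
    when train_dir holds fewer images than requested.
    """
    # Sort first so a fixed seed gives the same subset on every run.
    images = sorted(
        p for p in Path(train_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    rng = random.Random(seed)
    chosen = rng.sample(images, min(n_images, len(images)))

    out = Path(calib_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in chosen:
        shutil.copy2(src, out / src.name)
    return len(chosen)
```

Calling `sample_calibration_images("path/to/train/images", "./coco_calib")` from the build directory would produce a folder matching the example calibration path above; the source path here is a placeholder.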
See the README on the home page.