Real-time object detection with YOLOv3 has become one of the favorite techniques of researchers and developers working in computer vision, reportedly because it is very fast and accurate compared to other object detection systems such as SSD513, R-FCN, and RetinaNet (Fig. 1).
Figure 1: Comparison of inference time between YOLOv3 and other systems on the COCO dataset (Source)
A dataset of ≈5,800 images was prepared to train the model. Most images were scraped from Google and Bing, and some frames were extracted from video footage, using the following tools:
- Google: Download All Images Chrome extension
- Bing: bing image downloader
- Video footage: Video Frame Extractor using OpenCV
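The frame extractor itself is not shown here, so the following is only a minimal sketch of what such a script might look like, assuming the `opencv-python` package. The function names, the output naming scheme, and the one-frame-every-N-seconds sampling are all hypothetical choices, not the script actually used for this dataset.

```python
import os


def sampling_step(fps, every_n_seconds=1.0):
    """Stride between saved frames, in frames (at least 1)."""
    return max(1, round(fps * every_n_seconds))


def extract_frames(video_path, out_dir, every_n_seconds=1.0):
    """Save one JPEG every `every_n_seconds` from `video_path` into `out_dir`.

    Needs the opencv-python package; cv2 is imported here so that
    sampling_step() can be used (and tested) without it.
    """
    import cv2

    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"Cannot open video: {video_path}")
    # CAP_PROP_FPS returns 0.0 when the file has no frame-rate metadata.
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = sampling_step(fps, every_n_seconds)
    stem = os.path.splitext(os.path.basename(video_path))[0]
    index = saved = 0
    while True:
        ok, frame = cap.read()  # read sequentially instead of seeking
        if not ok:  # end of stream (or a decode error)
            break
        if index % step == 0:
            name = f"{stem}_{index:06d}.jpg"
            cv2.imwrite(os.path.join(out_dir, name), frame)
            saved += 1
        index += 1
    cap.release()
    return saved
```

For example, `extract_frames("footage.mp4", "frames/", every_n_seconds=2.0)` (hypothetical paths) would save roughly one frame every two seconds and return the number of images written. Sampling with a stride rather than keeping every frame is a deliberate choice: consecutive frames are nearly identical, so keeping all of them mostly adds labeling work without adding much variety.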
I ended up labeling about 60% of the dataset (it took me a considerable amount of time to get that far).
- The OpenLabeling tool was used.
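Darknet expects one `.txt` label file per image, where each line is `<class_id> <x_center> <y_center> <width> <height>` and the four box values are normalized by the image width and height. OpenLabeling can write labels in this YOLO format directly; the sketch below only illustrates the conversion such a tool performs, assuming boxes given as pixel corners `(xmin, ymin, xmax, ymax)`. The helper names are hypothetical.

```python
def to_yolo_line(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel-space box to a Darknet/YOLO label line (values in [0, 1])."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def from_yolo_line(line, img_w, img_h):
    """Parse a YOLO label line back into (class_id, xmin, ymin, xmax, ymax) in pixels."""
    class_id, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w  # undo normalization by width
    yc, h = float(yc) * img_h, float(h) * img_h  # undo normalization by height
    return int(class_id), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2
```

For a 400×400 image with a box from (100, 50) to (300, 250), `to_yolo_line(0, 100, 50, 300, 250, 400, 400)` returns `"0 0.500000 0.375000 0.500000 0.500000"`. Because the values are relative, the same label stays valid when Darknet resizes the image to the network input size.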
Due to a lack of computing resources, the model was trained on Google Colab using a modified version of Darknet by AlexeyAB. The hardware specifications provided in the Google Colab environment are as follows (source):
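Training a custom detector with AlexeyAB's Darknet mostly comes down to a few edits in the YOLOv3 `.cfg` file plus an `obj.data` file pointing at the dataset. The sketch below computes those values from the rules of thumb in the AlexeyAB/darknet README; the number of classes and all file paths are hypothetical, since they are not stated here.

```python
def darknet_cfg_values(num_classes, num_train_images):
    """Suggested yolov3.cfg values for a custom dataset.

    Rules of thumb from the AlexeyAB/darknet README:
    - classes = num_classes in each of the three [yolo] layers
    - max_batches = classes * 2000, but not less than the number of
      training images and not less than 6000
    - steps = 80% and 90% of max_batches
    - filters = (classes + 5) * 3 in the [convolutional] layer
      right before each of the three [yolo] layers
    """
    max_batches = max(num_classes * 2000, num_train_images, 6000)
    steps = (max_batches * 8 // 10, max_batches * 9 // 10)  # integer math, no float rounding
    filters = (num_classes + 5) * 3
    return {"classes": num_classes, "max_batches": max_batches,
            "steps": steps, "filters": filters}


def obj_data(num_classes, train="data/train.txt", valid="data/valid.txt",
             names="data/obj.names", backup="backup/"):
    """Render a Darknet obj.data file (the default paths are hypothetical)."""
    return (
        f"classes = {num_classes}\n"
        f"train = {train}\n"
        f"valid = {valid}\n"
        f"names = {names}\n"
        f"backup = {backup}\n"
    )
```

For instance, with a hypothetical 2 classes and the roughly 3,480 labeled images (about 60% of ≈5,800), `darknet_cfg_values(2, 3480)` gives `max_batches` 6000, `steps` (4800, 5400) and `filters` 21. Training is then typically started with something like `./darknet detector train data/obj.data cfg/yolov3-obj.cfg darknet53.conv.74 -dont_show`, where the cfg file name is again a placeholder and `-dont_show` avoids opening a display window on Colab.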
Figure 2: YOLOv3 configuration files for Darknet in the Colab notebook.
For the sake of comparison, below are the results of my own trained weights vs. the pre-trained model: