We propose CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolutional neural network. By detecting objects as paired keypoints, we eliminate the need for designing a set of anchor boxes commonly used in prior single-stage detectors. In addition to our novel formulation, we introduce corner pooling, a new type of pooling layer that helps the network better localize corners. Experiments show that CornerNet achieves 42.2% AP on MS COCO, outperforming all existing one-stage detectors.
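To make the corner pooling idea concrete, here is a minimal pure-Python sketch of top-left corner pooling as described in the CornerNet paper: one feature map is max-pooled downward along each column, a second is max-pooled rightward along each row, and the two results are summed. The function names (`top_pool`, `left_pool`, `top_left_corner_pool`) and the list-of-lists representation are assumptions chosen for illustration; this is not the layer implementation used by the configs in this directory.

```python
from typing import List

Grid = List[List[float]]


def top_pool(feat: Grid) -> Grid:
    """Each cell becomes the max of itself and all cells below it in its column."""
    h, w = len(feat), len(feat[0])
    out = [row[:] for row in feat]
    # Sweep from the second-to-last row upward, carrying the running max.
    for i in range(h - 2, -1, -1):
        for j in range(w):
            out[i][j] = max(out[i][j], out[i + 1][j])
    return out


def left_pool(feat: Grid) -> Grid:
    """Each cell becomes the max of itself and all cells to its right in its row."""
    h, w = len(feat), len(feat[0])
    out = [row[:] for row in feat]
    # Sweep from the second-to-last column leftward, carrying the running max.
    for i in range(h):
        for j in range(w - 2, -1, -1):
            out[i][j] = max(out[i][j], out[i][j + 1])
    return out


def top_left_corner_pool(feat_top: Grid, feat_left: Grid) -> Grid:
    """Sum the two directional poolings element-wise (hypothetical helper)."""
    pooled_top = top_pool(feat_top)
    pooled_left = left_pool(feat_left)
    return [
        [a + b for a, b in zip(row_t, row_l)]
        for row_t, row_l in zip(pooled_top, pooled_left)
    ]
```

Because a top-left corner usually lies outside the object itself, scanning down and to the right lets a location collect evidence from the object's left and top boundaries. Bottom-right corner pooling is the mirror image, scanning up and to the left.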
Backbone | Batch Size | Step/Total Epochs | Mem (GB) | Inf time (fps) | box AP | Config | Download |
---|---|---|---|---|---|---|---|
HourglassNet-104 | 10 x 5 | 180/210 | 13.9 | 4.2 | 41.2 | config | model / log |
HourglassNet-104 | 8 x 6 | 180/210 | 15.9 | 4.2 | 41.2 | config | model / log |
HourglassNet-104 | 32 x 3 | 180/210 | 9.5 | 3.9 | 40.4 | config | model / log |
Note:
- The TTA setting is single-scale with `flip=True`.
- Experiments with `images_per_gpu=6` were conducted on Tesla V100-SXM2-32GB GPUs; experiments with `images_per_gpu=3` were conducted on GeForce GTX 1080 Ti GPUs.
- Each experiment setting is described below:
  - 10 x 5: 10 GPUs with 5 images per GPU. This is the same setting as the one reported in the original paper.
  - 8 x 6: 8 GPUs with 6 images per GPU. The total batch size is close to the paper's, and training needs only 1 node.
  - 32 x 3: 32 GPUs with 3 images per GPU. This is the default setting for the GTX 1080 Ti, and training needs 4 nodes.
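The batch-size arithmetic behind these settings can be checked with a short sketch. The helper names and the figure of 8 GPUs per node are assumptions for illustration; the latter is consistent with the node counts stated for the 8 x 6 and 32 x 3 settings, but the node count for 10 x 5 is not given here.

```python
import math

# (number of GPUs, images per GPU) for each setting in the table
SETTINGS = {
    "10 x 5": (10, 5),
    "8 x 6": (8, 6),
    "32 x 3": (32, 3),
}


def total_batch_size(num_gpus: int, images_per_gpu: int) -> int:
    """Effective batch size per optimizer step across all GPUs."""
    return num_gpus * images_per_gpu


def nodes_needed(num_gpus: int, gpus_per_node: int = 8) -> int:
    """Nodes required, assuming 8 GPUs per node (an assumption, not stated above)."""
    return math.ceil(num_gpus / gpus_per_node)


for name, (num_gpus, images_per_gpu) in SETTINGS.items():
    print(name, total_batch_size(num_gpus, images_per_gpu), nodes_needed(num_gpus))
```

The 8 x 6 setting gives a total batch size of 48, close to the paper's 50, while 32 x 3 gives 96, almost twice as large.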
@inproceedings{law2018cornernet,
title={Cornernet: Detecting objects as paired keypoints},
author={Law, Hei and Deng, Jia},
booktitle={15th European Conference on Computer Vision, ECCV 2018},
pages={765--781},
year={2018},
organization={Springer Verlag}
}