RetinaNet is a single-stage object detection model.
Model | Download | Download (with sample test data) | ONNX version | Opset version | Accuracy |
---|---|---|---|---|---|
RetinaNet (ResNet101 backbone) | 228.4 MB | 153.3 MB | 1.6.0 | 9 | mAP 0.376 |
A sample script for ONNX model conversion and ONNX Runtime inference can be found here.
The model expects mini-batches of 3-channel input images of shape (N x 3 x H x W), where N is the batch size.
The images have to be loaded into a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. This transformation should be done as a preprocessing step.
The following code shows how to preprocess an input image into an NCHW tensor:
```python
from torchvision import transforms

# Convert the image to a tensor in [0, 1] and normalize with ImageNet statistics.
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

input_tensor = preprocess(input_image)
# Create a mini-batch as expected by the model.
input_batch = input_tensor.unsqueeze(0)
```
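With the batch prepared, inference can be run through ONNX Runtime. The snippet below is a minimal sketch rather than the sample script referenced above; the model file name `retinanet-9.onnx` is an assumption (use the path of the downloaded model), and `input_batch` is the tensor produced by the preprocessing code.

```python
import numpy as np
import onnxruntime

# The model file name is an assumption; point this at the downloaded .onnx file.
session = onnxruntime.InferenceSession("retinanet-9.onnx")
input_name = session.get_inputs()[0].name

# ONNX Runtime expects a float32 NumPy array of shape (N, 3, H, W).
outputs = session.run(None, {input_name: input_batch.numpy().astype(np.float32)})

# The model returns 10 tensors: 5 classification heads and 5 box regression heads.
print(len(outputs), [o.shape for o in outputs])
```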
The model has two groups of outputs:
- Classification heads: 5 tensors of rank 4, each corresponding to the anchor box classification head of one feature level in the feature pyramid network.
- Bounding box regression heads: 5 tensors of rank 4, each corresponding to the regressions (from anchor boxes to object boxes) of one feature level in the feature pyramid network.
Output sizes depend on the input image size and on the parameters of the model's convolutional subnetworks.
Output dimensions for an image mini-batch of size [1, 3, 480, 640]:
- Classification head sizes: [1, 720, 60, 80], [1, 720, 30, 40], [1, 720, 15, 20], [1, 720, 8, 10], [1, 720, 4, 5]
- Box regression head sizes: [1, 36, 60, 80], [1, 36, 30, 40], [1, 36, 15, 20], [1, 36, 8, 10], [1, 36, 4, 5]

The channel dimensions correspond to 9 anchors × 80 COCO classes = 720 for classification and 9 anchors × 4 box coordinates = 36 for regression.
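The postprocessing script below consumes these heads as two lists of PyTorch tensors. Assuming the exported model emits the five classification heads followed by the five box regression heads (check the output names of the exported graph if unsure), the ONNX Runtime outputs can be split as follows:

```python
import torch

# Assumed ordering: outputs[0:5] are classification heads, outputs[5:10] are box regression heads.
cls_heads = [torch.from_numpy(o) for o in outputs[:5]]
box_heads = [torch.from_numpy(o) for o in outputs[5:]]
```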
The following script, from NVIDIA/retinanet-examples, shows how to:
- Generate anchor boxes.
- Decode and filter box predictions, keeping at most the 1000 top-scoring predictions per level with a confidence threshold of 0.05.
- Apply non-maximum suppression to obtain the final detection boxes, scores, and labels.
```python
import torch
from retinanet.box import generate_anchors, decode, nms

def detection_postprocess(image, cls_heads, box_heads):
    # Inference post-processing
    anchors = {}
    decoded = []
    for cls_head, box_head in zip(cls_heads, box_heads):
        # Generate this level's anchors
        stride = image.shape[-1] // cls_head.shape[-1]
        if stride not in anchors:
            anchors[stride] = generate_anchors(stride, ratio_vals=[1.0, 2.0, 0.5],
                                               scales_vals=[4 * 2 ** (i / 3) for i in range(3)])
        # Decode and filter boxes
        decoded.append(decode(cls_head, box_head, stride,
                              threshold=0.05, top_n=1000, anchors=anchors[stride]))

    # Perform non-maximum suppression
    decoded = [torch.cat(tensors, 1) for tensors in zip(*decoded)]
    # NMS threshold = 0.5
    scores, boxes, labels = nms(*decoded, nms=0.5, ndetections=100)
    return scores, boxes, labels

# Use the preprocessed batch so that stride = image width / head width is computed correctly.
scores, boxes, labels = detection_postprocess(input_batch, cls_heads, box_heads)
```
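The returned tensors are batched (the first dimension is the batch index), with up to `ndetections` entries per image. A simple way to extract final detections for the first image is to threshold on the score; the 0.3 threshold below is an arbitrary example value, and the (x1, y1, x2, y2) box layout is an assumption about the decode step rather than something stated above.

```python
# Keep only detections whose score exceeds an example threshold of 0.3.
keep = scores[0] > 0.3
final_boxes = boxes[0][keep]    # assumed layout: (x1, y1, x2, y2) in input-image pixels
final_scores = scores[0][keep]
final_labels = labels[0][keep]  # class indices into the COCO label set
```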
The model backbone is initialized from the pre-trained pytorch/vision ResNet101 model, which is itself pre-trained on the ImageNet dataset.
The accuracy reported above is provided by NVIDIA/retinanet-examples and measured on the validation set. The metric is COCO mAP (averaged over IoU thresholds 0.5:0.95), computed over the COCO 2017 val data.
Focal Loss for Dense Object Detection
Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár. arXiv, 2017.
This model is converted directly from NVIDIA/retinanet-examples.
BSD 3-Clause "New" or "Revised" License