Support Point2RBox #971

Open: wants to merge 10 commits into base: dev-1.x (showing changes from 4 commits)
50 changes: 50 additions & 0 deletions configs/point2rbox/README.md
@@ -0,0 +1,50 @@
# Point2RBox

> [Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision](https://arxiv.org/pdf/2311.14758)

<!-- [ALGORITHM] -->

## Abstract

<div align=center>
<img src="https://raw.githubusercontent.com/zytx121/image-host/main/imgs/point2rbox.png" width="800"/>
</div>

With the rapidly increasing demand for oriented object detection (OOD), recent research involving weakly-supervised detectors for learning rotated box (RBox) from the horizontal box (HBox) has attracted more and more attention. In this paper, we explore a more challenging yet label-efficient setting, namely single point-supervised OOD, and present our approach called Point2RBox. Specifically, we propose to leverage two principles: 1) Synthetic pattern knowledge combination: By sampling around each labelled point on the image, we transfer the object feature to synthetic visual patterns with the known bounding box to provide the knowledge for box regression. 2) Transform self-supervision: With a transformed input image (e.g. scaled/rotated), the output RBoxes are trained to follow the same transformation so that the network can perceive the relative size/rotation between objects. The detector is further enhanced by a few devised techniques to cope with peripheral issues, e.g. the anchor/layer assignment as the size of the object is not available in our point supervision setting. To our best knowledge, Point2RBox is the first end-to-end solution for point-supervised OOD. In particular, our method uses a lightweight paradigm, yet it achieves a competitive performance among point-supervised alternatives, 41.05%/27.62%/80.01% on DOTA/DIOR/HRSC datasets.
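
To make the transform self-supervision principle more concrete, here is a minimal sketch (not the authors' implementation) of how a rotated box should move when the input image is rotated or scaled about a fixed point. The `(cx, cy, w, h, theta)` box layout, the function name `transform_rbox`, and the angle sign convention are assumptions made for illustration only.

```python
import math


def transform_rbox(rbox, center, angle=0.0, scale=1.0):
    """Apply an image-level rotation and scaling about `center` to one rotated box.

    rbox: (cx, cy, w, h, theta) with theta in radians (assumed layout).
    """
    cx, cy, w, h, theta = rbox
    ox, oy = center
    dx, dy = cx - ox, cy - oy
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # The box centre follows the same rotation and scaling as the image.
    new_cx = ox + scale * (cos_a * dx - sin_a * dy)
    new_cy = oy + scale * (sin_a * dx + cos_a * dy)
    # Width and height scale with the image; the box angle shifts by the rotation.
    return (new_cx, new_cy, w * scale, h * scale, theta + angle)


box = (600.0, 400.0, 80.0, 20.0, 0.0)
rotated = transform_rbox(box, center=(400.0, 400.0), angle=math.pi / 2)
scaled = transform_rbox(box, center=(400.0, 400.0), scale=0.5)
```

Under these assumptions, a consistency loss would compare the prediction on the transformed image with `transform_rbox` applied to the prediction on the original image, which is what lets the network learn relative size and rotation without box labels.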

## Basic patterns

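Extract [basic_patterns.zip](https://github.com/open-mmlab/mmrotate/files/13816461/basic_patterns.zip) into the `data` folder. The configs read the patterns from the path given by `basic_pattern` (for example, `data/basic_patterns/dota`), so update that field if you extract the archive elsewhere.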
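
The extraction step can also be scripted. This is a small sketch using only the Python standard library; the helper name `extract_patterns` and the assumption that the archive was downloaded to the current directory as `basic_patterns.zip` are illustrative.

```python
import zipfile
from pathlib import Path


def extract_patterns(zip_path, data_dir="data"):
    """Extract the archive into `data_dir` and return its top-level entries."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(data_dir)
    return sorted(p.name for p in data_dir.iterdir())


# Assumed download location; nothing happens if the file is not there.
archive_path = Path("basic_patterns.zip")
if archive_path.exists():
    print(extract_patterns(archive_path))
```

After extraction, check that the directory referenced by `basic_pattern` in the config you plan to run actually exists.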

## Results and models

DOTA1.0

| Backbone | AP50 | lr schd | Mem (GB) | Inf Time (fps) | Aug | Batch Size | Configs | Download |
| :----------------------: | :---: | :-----: | :------: | :------------: | :-: | :--------: | :-------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| ResNet50 (1024,1024,200) | 41.87 | 1x | 16.12 | 111.7 | - | 2 | [point2rbox-yolof-dota](./point2rbox-yolof-dota.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/point2rbox/point2rbox-yolof-dota/point2rbox-yolof-dota-c94da82d.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/point2rbox/point2rbox-yolof-dota/point2rbox-yolof-dota.json) |

DIOR

| Backbone | AP50 | lr schd | Mem (GB) | Inf Time (fps) | Aug | Batch Size | Configs | Download |
| :----------------: | :---: | :-----: | :------: | :------------: | :-: | :--------: | :-------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| ResNet50 (800,800) | 27.34 | 1x | 10.38 | 127.3 | - | 2 | [point2rbox-yolof-dior](./point2rbox-yolof-dior.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/point2rbox/point2rbox-yolof-dior/point2rbox-yolof-dior-f4f724df.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/point2rbox/point2rbox-yolof-dior/point2rbox-yolof-dior.json) |

HRSC

| Backbone | AP50 | lr schd | Mem (GB) | Inf Time (fps) | Aug | Batch Size | Configs | Download |
| :----------------: | :---: | :-----: | :------: | :------------: | :-: | :--------: | :-------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| ResNet50 (800,800) | 79.40 | 6x | 9.60 | 136.9 | - | 2 | [point2rbox-yolof-hrsc](./point2rbox-yolof-hrsc.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/point2rbox/point2rbox-yolof-hrsc/point2rbox-yolof-hrsc-9d096323.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/point2rbox/point2rbox-yolof-hrsc/point2rbox-yolof-hrsc.json) |

## Citation

```
@misc{yu2023point2rbox,
  title={Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision},
  author={Yi Yu and Xue Yang and Qingyun Li and Feipeng Da and Junchi Yan and Jifeng Dai and Yu Qiao},
  year={2023},
  eprint={2311.14758},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
50 changes: 50 additions & 0 deletions configs/point2rbox/metafile.yml
@@ -0,0 +1,50 @@
Collections:
  - Name: point2rbox
    Metadata:
      Training Data: DOTAv1.0
      Training Techniques:
        - AdamW
      Training Resources: 1x GeForce RTX 4090
      Architecture:
        - ResNet
    Paper:
      URL: https://arxiv.org/pdf/2311.14758.pdf
      Title: 'Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision'
    README: configs/point2rbox/README.md

Models:
  - Name: point2rbox-yolof-dota
    In Collection: point2rbox
    Config: configs/point2rbox/point2rbox-yolof-dota.py
    Metadata:
      Training Data: DOTAv1.0
    Results:
      - Task: Oriented Object Detection
        Dataset: DOTAv1.0
        Metrics:
          mAP: 41.87
    Weights: https://download.openmmlab.com/mmrotate/v1.0/point2rbox/point2rbox-yolof-dota/point2rbox-yolof-dota-c94da82d.pth

  - Name: point2rbox-yolof-dior
    In Collection: point2rbox
    Config: configs/point2rbox/point2rbox-yolof-dior.py
    Metadata:
      Training Data: DIOR
    Results:
      - Task: Oriented Object Detection
        Dataset: DIOR
        Metrics:
          mAP: 27.34
    Weights: https://download.openmmlab.com/mmrotate/v1.0/point2rbox/point2rbox-yolof-dior/point2rbox-yolof-dior-f4f724df.pth

  - Name: point2rbox-yolof-hrsc
    In Collection: point2rbox
    Config: configs/point2rbox/point2rbox-yolof-hrsc.py
    Metadata:
      Training Data: HRSC
    Results:
      - Task: Oriented Object Detection
        Dataset: HRSC
        Metrics:
          mAP: 79.40
    Weights: https://download.openmmlab.com/mmrotate/v1.0/point2rbox/point2rbox-yolof-hrsc/point2rbox-yolof-hrsc-9d096323.pth
156 changes: 156 additions & 0 deletions configs/point2rbox/point2rbox-yolof-dior.py
@@ -0,0 +1,156 @@
_base_ = [
    '../_base_/datasets/dior.py', '../_base_/schedules/schedule_1x.py',
    '../_base_/default_runtime.py'
]
model = dict(
    type='Point2RBoxYOLOF',
    crop_size=(800, 800),
    prob_rot=0.95 * 0.7,
    prob_flp=0.05 * 0.7,
    sca_fact=1.0,
    sca_range=(0.5, 1.5),
    basic_pattern='data/basic_patterns/dior',
    dense_cls=[],
    use_setrc=False,
    use_setsk=True,
    data_preprocessor=dict(
        type='mmdet.DetDataPreprocessor',
        mean=[103.530, 116.280, 123.675],
        std=[1.0, 1.0, 1.0],
        bgr_to_rgb=False,
        pad_size_divisor=32),
    backbone=dict(
        type='mmdet.ResNet',
        depth=50,
        num_stages=4,
        strides=(1, 2, 2, 1),  # DC5
        dilations=(1, 1, 1, 2),
        out_indices=(3, ),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='caffe',
        init_cfg=dict(
            type='Pretrained',
            checkpoint='open-mmlab://detectron/resnet50_caffe')),
    neck=dict(
        type='mmdet.DilatedEncoder',
        in_channels=2048,
        out_channels=512,
        block_mid_channels=128,
        num_residual_blocks=4,
        block_dilations=[2, 4, 6, 8]),
    bbox_head=dict(
        type='Point2RBoxYOLOFHead',
        num_classes=20,
        in_channels=512,
        reg_decoded_bbox=True,
        num_cls_convs=4,
        num_reg_convs=8,
        use_objectness=False,
        agnostic_cls=[2, 5, 9, 14, 15],
        square_cls=[],
        anchor_generator=dict(
            type='mmdet.AnchorGenerator',
            ratios=[1.0],
            scales=[8, 8, 8, 8, 8, 8, 8],
            strides=[16]),
        bbox_coder=dict(
            type='mmdet.DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[1., 1., 1., 1.],
            add_ctr_clamp=True,
            ctr_clamp=16),
        loss_cls=dict(
            type='mmdet.FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='mmdet.GIoULoss', loss_weight=1.0),
        loss_angle=dict(type='mmdet.L1Loss', loss_weight=0.3),
        loss_scale_ss=dict(type='mmdet.GIoULoss', loss_weight=0.02)),
    # training and testing settings
    train_cfg=dict(
        assigner=dict(
            type='Point2RBoxAssigner',
            pos_ignore_thr=0.15,
            neg_ignore_thr=0.7,
            match_times=4),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        nms_pre=2000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(type='nms_rotated', iou_threshold=0.1),
        max_per_img=2000))

# optimizer
optim_wrapper = dict(
    optimizer=dict(
        _delete_=True,
        type='AdamW',
        lr=0.00005,
        betas=(0.9, 0.999),
        weight_decay=0.05),
    paramwise_cfg=dict(
        norm_decay_mult=0., custom_keys={'backbone': dict(lr_mult=1. / 3)}))

train_pipeline = [
    dict(type='mmdet.LoadImageFromFile', backend_args={{_base_.backend_args}}),
    dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
    dict(type='mmdet.FixShapeResize', width=800, height=800, keep_ratio=True),
    dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
    dict(type='RBox2Point'),
    dict(
        type='mmdet.RandomFlip',
        prob=0.75,
        direction=['horizontal', 'vertical', 'diagonal']),
    dict(type='RandomRotate', prob=1, angle_range=180),
    dict(type='mmdet.RandomShift', prob=0.5, max_shift_px=16),
    dict(type='mmdet.PackDetInputs')
]

dataset_type = 'DIORDataset'
data_root = 'data/dior/'
train_dataloader = dict(
    batch_size=4,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    batch_sampler=None,
    dataset=dict(
        type='ConcatDataset',
        ignore_keys=['DATASET_TYPE'],
        datasets=[
            dict(
                type=dataset_type,
                data_root=data_root,
                ann_file='ImageSets/Main/train.txt',
                data_prefix=dict(img_path='JPEGImages-trainval'),
                filter_cfg=dict(filter_empty_gt=True),
                pipeline=train_pipeline),
            dict(
                type=dataset_type,
                data_root=data_root,
                ann_file='ImageSets/Main/val.txt',
                data_prefix=dict(img_path='JPEGImages-trainval'),
                filter_cfg=dict(filter_empty_gt=True),
                pipeline=train_pipeline,
                backend_args=_base_.backend_args)
        ]))

train_cfg = dict(type='EpochBasedTrainLoop', val_interval=12)

val_dataloader = dict(batch_size=4, num_workers=4)

val_evaluator = dict(type='DOTAMetric', metric='mAP', iou_thrs=[0.25, 0.5])

# default_hooks = dict(logger=dict(type='LoggerHook', interval=30))

# NOTE: `auto_scale_lr` is for automatically scaling LR,
# USER SHOULD NOT CHANGE ITS VALUES.
# base_batch_size = (8 GPUs) x (8 samples per GPU)
auto_scale_lr = dict(base_batch_size=64)
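
For reference, `auto_scale_lr` follows the linear scaling rule: when automatic scaling is enabled, the configured learning rate is multiplied by the ratio of the actual total batch size to `base_batch_size`. The sketch below reproduces that arithmetic with the values from this config. The helper name `scaled_lr` is hypothetical, and the single-GPU setting is an assumption based on the training resources listed in the metafile.

```python
def scaled_lr(base_lr, samples_per_gpu, num_gpus, base_batch_size):
    """Linear scaling rule: the LR changes in proportion to the total batch size."""
    total_batch_size = samples_per_gpu * num_gpus
    return base_lr * total_batch_size / base_batch_size


# lr, batch_size and base_batch_size come from the config above;
# num_gpus=1 is an assumption.
lr = scaled_lr(base_lr=0.00005, samples_per_gpu=4, num_gpus=1, base_batch_size=64)
print(lr)
```

With scaling left disabled, the optimizer uses the configured `lr=0.00005` unchanged.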
129 changes: 129 additions & 0 deletions configs/point2rbox/point2rbox-yolof-dota.py
@@ -0,0 +1,129 @@
_base_ = [
    '../_base_/datasets/dota.py', '../_base_/schedules/schedule_1x.py',
    '../_base_/default_runtime.py'
]
model = dict(
    type='Point2RBoxYOLOF',
    crop_size=(1024, 1024),
    prob_rot=0.95 * 0.7,
    prob_flp=0.05 * 0.7,
    sca_fact=0.4,
    sca_range=(0.5, 1.5),
    basic_pattern='data/basic_patterns/dota',
    dense_cls=[4, 5, 6, 9],
    use_setrc=False,
    use_setsk=True,
    data_preprocessor=dict(
        type='mmdet.DetDataPreprocessor',
        mean=[103.530, 116.280, 123.675],
        std=[1.0, 1.0, 1.0],
        bgr_to_rgb=False,
        pad_size_divisor=32),
    backbone=dict(
        type='mmdet.ResNet',
        depth=50,
        num_stages=4,
        strides=(1, 2, 2, 1),  # DC5
        dilations=(1, 1, 1, 2),
        out_indices=(3, ),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='caffe',
        init_cfg=dict(
            type='Pretrained',
            checkpoint='open-mmlab://detectron/resnet50_caffe')),
    neck=dict(
        type='mmdet.DilatedEncoder',
        in_channels=2048,
        out_channels=512,
        block_mid_channels=128,
        num_residual_blocks=4,
        block_dilations=[2, 4, 6, 8]),
    bbox_head=dict(
        type='Point2RBoxYOLOFHead',
        num_classes=15,
        in_channels=512,
        reg_decoded_bbox=True,
        num_cls_convs=4,
        num_reg_convs=8,
        use_objectness=False,
        agnostic_cls=[1, 9, 11],
        square_cls=[0],
        anchor_generator=dict(
            type='mmdet.AnchorGenerator',
            ratios=[1.0],
            scales=[4, 4, 4, 4, 4],
            strides=[16]),
        bbox_coder=dict(
            type='mmdet.DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[1., 1., 1., 1.],
            add_ctr_clamp=True,
            ctr_clamp=16),
        loss_cls=dict(
            type='mmdet.FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='mmdet.GIoULoss', loss_weight=2.0),
        loss_angle=dict(type='mmdet.L1Loss', loss_weight=0.6),
        loss_scale_ss=dict(type='mmdet.GIoULoss', loss_weight=0.04)),
    # training and testing settings
    train_cfg=dict(
        assigner=dict(
            type='Point2RBoxAssigner',
            pos_ignore_thr=0.15,
            neg_ignore_thr=0.7,
            match_times=2),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        nms_pre=2000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(type='nms_rotated', iou_threshold=0.1),
        max_per_img=2000))

# optimizer
optim_wrapper = dict(
    optimizer=dict(
        _delete_=True,
        type='AdamW',
        lr=0.00005,
        betas=(0.9, 0.999),
        weight_decay=0.05),
    paramwise_cfg=dict(
        norm_decay_mult=0., custom_keys={'backbone': dict(lr_mult=1. / 3)}))

train_pipeline = [
    dict(type='mmdet.LoadImageFromFile', backend_args={{_base_.backend_args}}),
    dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
    dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
    dict(type='mmdet.Resize', scale=(1024, 1024), keep_ratio=True),
    dict(type='RBox2Point'),
    dict(
        type='mmdet.RandomFlip',
        prob=0.75,
        direction=['horizontal', 'vertical', 'diagonal']),
    dict(type='mmdet.RandomShift', prob=0.5, max_shift_px=16),
    dict(type='mmdet.PackDetInputs')
]

train_cfg = dict(type='EpochBasedTrainLoop', val_interval=12)

train_dataloader = dict(
    batch_size=4, num_workers=4, dataset=dict(pipeline=train_pipeline))

val_dataloader = dict(batch_size=4, num_workers=4)

val_evaluator = dict(type='DOTAMetric', metric='mAP', iou_thrs=[0.25, 0.5])

# default_hooks = dict(logger=dict(type='LoggerHook', interval=30))

# NOTE: `auto_scale_lr` is for automatically scaling LR,
# USER SHOULD NOT CHANGE ITS VALUES.
# base_batch_size = (8 GPUs) x (8 samples per GPU)
auto_scale_lr = dict(base_batch_size=64)
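
As a reading aid for the `anchor_generator` settings in these configs, the sketch below works out the base anchor sizes they imply. It assumes the base size equals the stride (the default when no base sizes are given to MMDetection's `AnchorGenerator`) and mirrors the usual ratio/scale arithmetic. The helper name `base_anchor_sizes` is hypothetical and this is not the library code itself.

```python
import math


def base_anchor_sizes(stride, scales, ratios):
    """Return (width, height) for every base anchor at one feature level.

    ratio is height / width; each scale multiplies the stride-sized base box.
    """
    sizes = []
    for ratio in ratios:
        h_ratio = math.sqrt(ratio)
        w_ratio = 1.0 / h_ratio
        for scale in scales:
            sizes.append((stride * scale * w_ratio, stride * scale * h_ratio))
    return sizes


# Settings copied from the DOTA and DIOR configs in this PR.
dota_anchors = base_anchor_sizes(stride=16, scales=[4, 4, 4, 4, 4], ratios=[1.0])
dior_anchors = base_anchor_sizes(stride=16, scales=[8, 8, 8, 8, 8, 8, 8], ratios=[1.0])
```

Under these assumptions, the DOTA config places five identical 64x64 anchors at each location and the DIOR config places seven identical 128x128 anchors.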