Implementation of BasicTAD #2638

Open
wants to merge 10 commits into
base: dev-1.x
8 changes: 5 additions & 3 deletions configs/localization/bsn/metafile.yml
@@ -8,16 +8,18 @@ Collections:
Models:
- Name: bsn_400x100_1xb16_20e_activitynet_feature (cuhk_mean_100)
Config:
- configs/localization/bsn/bsn_tem_1xb16-400x100-20e_activitynet-feature.py
- configs/localization/bsn/bsn_pgm_400x100_activitynet-feature.py
- configs/localization/bsn/bsn_pem_1xb16-400x100-20e_activitynet-feature.py
configs/localization/bsn/bsn_pem_1xb16-400x100-20e_activitynet-feature.py
In Collection: BSN
Metadata:
Batch Size: 16
Epochs: 20
Training Data: ActivityNet v1.3
Training Resources: 1 GPU
feature: cuhk_mean_100
configs:
- configs/localization/bsn/bsn_tem_1xb16-400x100-20e_activitynet-feature.py
- configs/localization/bsn/bsn_pgm_400x100_activitynet-feature.py
- configs/localization/bsn/bsn_pem_1xb16-400x100-20e_activitynet-feature.py
Modality: RGB
Results:
- Dataset: ActivityNet v1.3
2 changes: 1 addition & 1 deletion docs/en/user_guides/finetune.md
@@ -45,7 +45,7 @@ model = dict(
MMAction2 supports UCF101, Kinetics-400, Moments in Time, Multi-Moments in Time, THUMOS14,
Something-Something V1&V2, ActivityNet Dataset.
The users may need to adapt one of the above datasets to fit their special datasets.
You could refer to [Prepare Dataset](prepare_dataset.md) and [Customize Datast](../advanced_guides/customize_dataset.md) for more details.
You could refer to [Prepare Dataset](prepare_dataset.md) and [Customize Dataset](../advanced_guides/customize_dataset.md) for more details.
In our case, UCF101 is already supported by various dataset types, like `VideoDataset`,
so we change the config as follows.

Empty file added projects/__init__.py
Empty file.
133 changes: 133 additions & 0 deletions projects/basic_tad/README.md
@@ -0,0 +1,133 @@
# BasicTAD

This project implements the BasicTAD model in MMAction2. Please refer to the [official repo](https://github.com/MCG-NJU/BasicTAD) and [paper](https://arxiv.org/abs/2205.02717) for details.


## Usage

### Setup Environment

Please refer to [Get Started](https://mmaction2.readthedocs.io/en/latest/get_started/installation.html) to install MMAction2 and MMDetection.

First, add the current folder to `PYTHONPATH` so that Python can find the project code. Run the following command from the current directory:

> Run it again each time you open a new shell.

```shell
export PYTHONPATH=`pwd`:$PYTHONPATH
```
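
To confirm that the export took effect, one option is a small check from Python. This is a minimal sketch and not part of the project; the helper name `is_on_python_path` is hypothetical.

```python
import os
import sys


def is_on_python_path(directory, search_path=None):
    """Return True if `directory` is one of the import search path entries."""
    search_path = sys.path if search_path is None else search_path
    target = os.path.realpath(directory)
    # An empty entry in sys.path stands for the current working directory.
    return any(
        os.path.realpath(entry or os.getcwd()) == target
        for entry in search_path
    )


if __name__ == "__main__":
    # Run from the project folder after exporting PYTHONPATH.
    print(is_on_python_path(os.getcwd()))
```

Entries from `PYTHONPATH` are added to `sys.path` at interpreter start-up, so the check only reflects exports made before Python was launched.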

### Data Preparation

Prepare the THUMOS14 dataset according to the [instructions](https://github.com/open-mmlab/mmaction2/blob/main/tools/data/thumos14/README.md).

### Training commands

**To train with single GPU:**

```bash
mim train mmaction configs/basicTAD_slowonly_96x10_1200e_thumos14_rgb.py
```

**To train with multiple GPUs:**

```bash
mim train mmaction configs/basicTAD_slowonly_96x10_1200e_thumos14_rgb.py --launcher pytorch --gpus 8
```

**To train with multiple GPUs by slurm:**

```bash
mim train mmaction configs/basicTAD_slowonly_96x10_1200e_thumos14_rgb.py --launcher slurm \
--gpus 8 --gpus-per-node 8 --partition $PARTITION
```

### Testing commands

**To test with single GPU:**

```bash
mim test mmaction configs/basicTAD_slowonly_96x10_1200e_thumos14_rgb.py --checkpoint $CHECKPOINT
```

**To test with multiple GPUs:**

```bash
mim test mmaction configs/basicTAD_slowonly_96x10_1200e_thumos14_rgb.py --checkpoint $CHECKPOINT --launcher pytorch --gpus 8
```

**To test with multiple GPUs by slurm:**

```bash
mim test mmaction configs/basicTAD_slowonly_96x10_1200e_thumos14_rgb.py --checkpoint $CHECKPOINT --launcher slurm \
--gpus 8 --gpus-per-node 8 --partition $PARTITION
```

> Replace `$CHECKPOINT` with the path to the trained model, e.g., `work_dirs/basicTAD_slowonly_96x10_1200e_thumos14_rgb/latest.pth`.

## Results

### THUMOS14

| frame sampling strategy | resolution | gpus | backbone | pretrain | [email protected] | avg. mAP | testing protocol | config | ckpt | log |
| :---------------------: | :--------: | :--: | :------: | :------: |:-------:|:--------:| :----------------: | :-------------------------------------------: | -------------------------------------: | -----------------------------: |
| 1x96x10 | 112x112 | 2 | SlowOnly | Kinetics | 50.4 | 47.9 | 1 clip x 1 crop | [config](./configs/basicTAD_slowonly_96x10_1200e_thumos14_rgb.py) | todo | todo |

> Due to limited computing resources, we only trained the model in a simple setting (in terms of spatial-temporal resolution, testing augmentation, etc.). To reproduce the results in the paper, please refer to the [setting](https://github.com/MCG-NJU/BasicTAD/blob/main/configs/trainval/basictad/thumos14/basictad_slowonly_e700_thumos14_rgb_192win_anchor_based.py) used in the official repo.

> The main idea of [BasicTAD](https://arxiv.org/abs/2205.02717) lies in its modular design rather than in any sophisticated new architecture or module.

> Currently, only the anchor-based BasicTAD model on THUMOS14 is supported. An anchor-free version is planned.

> `avg. mAP` refers to the mAP averaged over the IoU thresholds (0.3, 0.4, 0.5, 0.6, 0.7).
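
The averaging described in the note can be sketched as follows. This is an illustration only: the function name `average_map` is hypothetical, and the per-threshold values are made-up numbers, not results of this project.

```python
from statistics import mean

# IoU thresholds listed in the note above.
IOU_THRESHOLDS = (0.3, 0.4, 0.5, 0.6, 0.7)


def average_map(map_per_iou):
    """Average the per-threshold mAP values over IOU_THRESHOLDS."""
    missing = [t for t in IOU_THRESHOLDS if t not in map_per_iou]
    if missing:
        raise ValueError(f"missing mAP for IoU thresholds: {missing}")
    return mean(map_per_iou[t] for t in IOU_THRESHOLDS)


# Made-up per-threshold mAP values (in percent), for illustration only.
example = {0.3: 80.0, 0.4: 70.0, 0.5: 60.0, 0.6: 50.0, 0.7: 40.0}
print(average_map(example))  # 60.0
```
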
## Citation

<!-- Replace to the citation of the paper your project refers to. -->

```bibtex
@article{yang2023basictad,
title={Basictad: an astounding rgb-only baseline for temporal action detection},
author={Yang, Min and Chen, Guo and Zheng, Yin-Dong and Lu, Tong and Wang, Limin},
journal={Computer Vision and Image Understanding},
volume={232},
pages={103692},
year={2023},
publisher={Elsevier}
}
```

## Checklist

Here is a checklist of this project's progress, and you can ignore this part if you don't plan to contribute to MMAction2 projects.

- [x] Milestone 1: PR-ready, and acceptable to be one of the `projects/`.

- [x] Finish the code

<!-- The code's design shall follow existing interfaces and convention. For example, each model component should be registered into `mmaction.registry.MODELS` and configurable via a config file. -->

- [x] Basic docstrings & proper citation

<!-- Each major class should contain a docstring describing its functionality and arguments. If your code is copied or modified from other open-source projects, don't forget to cite the source project in the docstring and make sure your usage does not violate its license. Typically, we do not accept any code snippet under a GPL license. [A Short Guide to Open Source Licenses](https://medium.com/nationwide-technology/a-short-guide-to-open-source-licenses-cf5b1c329edd) -->

- [ ] Converted checkpoint and results (Only for reproduction)

<!-- If you are reproducing the result from a paper, make sure the model in the project can match that results. Also please provide checkpoint links or a checkpoint conversion script for others to get the pre-trained model. -->

- [x] Milestone 2: Indicates a successful model implementation.

- [x] Training results

<!-- If you are reproducing the result from a paper, train your model from scratch and verify that the final result matches the original result. Usually, ±0.1% is acceptable for the action recognition task on Kinetics400. -->

- [ ] Milestone 3: Good to be a part of our core package!

- [ ] Unit tests

<!-- Unit tests for the major module are required. [Example](https://github.com/open-mmlab/mmaction2/blob/main/tests/models/backbones/test_resnet.py) -->

- [ ] Code style

<!-- Refactor your code according to reviewer's comment. -->

- [ ] `metafile.yml` and `README.md`

<!-- It will be used by MMAction2 to acquire your models. [Example](https://github.com/open-mmlab/mmaction2/blob/main/configs/recognition/swin/metafile.yml). In particular, you may have to refactor this README into a standard one. [Example](https://github.com/open-mmlab/mmaction2/blob/main/configs/recognition/swin/README.md) -->
@@ -0,0 +1,68 @@
_base_ = ['./basicTAD_slowonly_96x10_1200e_thumos14_rgb.py']
# model settings
model = dict(
    neck=[
        dict(type='MaxPool3d', kernel_size=(2, 1, 1), stride=(2, 1, 1)),
        dict(type='VDM',
             in_channels=2048,
             out_channels=512,
             conv_cfg=dict(type='Conv3d'),
             norm_cfg=dict(type='SyncBN'),
             kernel_sizes=(3, 1, 1),
             strides=(2, 1, 1),
             paddings=(1, 0, 0),
             stage_layers=(1, 1, 1, 1),
             out_indices=(0, 1, 2, 3, 4),
             out_pooling=True),
        dict(type='mmdet.FPN',
             in_channels=[2048, 512, 512, 512, 512],
             out_channels=256,
             num_outs=5,
             conv_cfg=dict(type='Conv1d'),
             norm_cfg=dict(type='SyncBN'))],
    bbox_head=dict(anchor_generator=dict(strides=[2, 4, 8, 16, 32])))

clip_len = 192
frame_interval = 5
img_shape = (112, 112)
img_shape_test = (128, 128)

train_pipeline = [
    dict(type='Time2Frame'),
    dict(type='TemporalRandomCrop',
         clip_len=clip_len,
         frame_interval=frame_interval,
         iof_th=0.75),
    dict(type='RawFrameDecode'),
    # scale images' short-side to 128, keep aspect ratio
    dict(type='Resize', scale=(128, -1), keep_ratio=True),
    dict(type='SpatialRandomCrop', crop_size=img_shape),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='PhotoMetricDistortion',
         brightness_delta=32,
         contrast_range=(0.5, 1.5),
         saturation_range=(0.5, 1.5),
         hue_delta=18,
         p=0.5),
    dict(type='Rotate',
         limit=(-45, 45),
         border_mode='reflect_101',
         p=0.5),
    dict(type='Pad', size=(clip_len, *img_shape)),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='PackTadInputs',
         meta_keys=('img_id', 'img_shape', 'pad_shape', 'scale_factor',))
]

val_pipeline = [
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(128, -1), keep_ratio=True),
    dict(type='SpatialCenterCrop', crop_size=img_shape_test),
    dict(type='Pad', size=(clip_len, *img_shape_test)),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='PackTadInputs',
         meta_keys=('img_id', 'img_shape', 'scale_factor', 'offset_sec'))
]

train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
val_dataloader = dict(
    dataset=dict(clip_len=clip_len,
                 frame_interval=frame_interval,
                 pipeline=val_pipeline))
test_dataloader = val_dataloader
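
The anchor strides `[2, 4, 8, 16, 32]` in the config above appear to follow from the temporal downsampling in the neck. The sketch below works through that arithmetic using the standard output-length formula for strided convolution and pooling. It assumes the backbone keeps the temporal length at `clip_len` and that VDM output index 0 is its (already max-pooled) input; neither is confirmed by this diff, and the helper names are hypothetical.

```python
def out_len(length, kernel, stride, padding):
    """Output length of a strided conv/pool along one axis."""
    return (length + 2 * padding - kernel) // stride + 1


def pyramid_lengths(clip_len, num_stages=4):
    """Temporal length of each pyramid level for the neck in the config."""
    # MaxPool3d: temporal kernel 2, stride 2, no padding.
    length = out_len(clip_len, kernel=2, stride=2, padding=0)
    lengths = [length]  # assumed to be VDM output index 0
    # Each VDM stage: temporal kernel 3, stride 2, padding 1.
    for _ in range(num_stages):
        length = out_len(length, kernel=3, stride=2, padding=1)
        lengths.append(length)
    return lengths


clip_len = 192
lengths = pyramid_lengths(clip_len)
strides = [clip_len // n for n in lengths]
print(lengths)  # [96, 48, 24, 12, 6]
print(strides)  # [2, 4, 8, 16, 32]
```

Under these assumptions, each of the five levels halves the temporal length of the previous one, which matches the five entries in `in_channels` and `strides`.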