The pretrained weights are placed in the folder `pretrained_models`.
- **Visual Backbones**
  - R-50: please download from Detectron2 or OneDrive.
  - Swin-L: please download from OneDrive, which is converted from Swin-Transformer.
- **Text Encoders**
  - BERT-base: please download from Hugging Face.
- **SAM**
  - SAM-H: please download from SAM.
After preparation, the folder structure should be like:

```
|- datasets/
|- detectron2/
|- projects/
|  |- Uniref/
|- pretrained_models/
|  |- R-50.pkl
|  |- swin_large_patch4_window12_384_22k.pkl
|  |- sam_vit_h_4b8939.pth
|  |- bert-base-uncased/
...
```
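As a quick sanity check of the layout above, here is a minimal sketch that reports which expected weight files are missing. The helper name `check_weights` is illustrative and not part of the repo; the filenames are the ones listed in the structure above.

```python
import os

def check_weights(root="pretrained_models"):
    """Return the expected weight files/dirs that are missing under root."""
    expected = [
        "R-50.pkl",
        "swin_large_patch4_window12_384_22k.pkl",
        "sam_vit_h_4b8939.pth",
        "bert-base-uncased",  # directory holding the BERT weights/tokenizer
    ]
    return [p for p in expected if not os.path.exists(os.path.join(root, p))]
```

Running it before training makes a missing download fail fast instead of surfacing mid-build.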
We list the data for training and inference as follows. The datasets in parentheses are only used for inference.
- Pretraining:
  - Objects365
- Image-level Training:
  - DET: COCO2017
  - RIS: RefCOCO/+/g
  - FSS: FSS-1000
- Video-level Training:
  - RVOS: RefCOCO/+/g, Ref-Youtube-VOS, (Ref-DAVIS17)
  - VOS: COCO2017, Youtube-VOS-19, LVOS, OVIS, (Youtube-VOS-18, DAVIS17, MOSE)
We mainly follow UNINEXT to prepare our data. We provide the preprocessed annotation files in OneDrive. If you are interested in the preprocessing, please see our conversion files.
The datasets are placed in the folder `datasets`.
- Objects365

We provide the conversion file for downloading Objects365v2 images:

```
python3 conversion/download_objects365_v2.py
```
We use the same preprocessed json file as UNINEXT in OneDrive. The data structure should be like:

```
|- datasets/
| |- Objects365V2/
| | |- annotations/
| | | |- zhiyuan_objv2_train_new.json
| | | |- zhiyuan_objv2_val_new.json
| | |- images/
```
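To verify a downloaded annotation file before training, a small sketch that reports basic counts, assuming the json follows the usual COCO layout with `images`/`annotations`/`categories` keys (the helper name is illustrative):

```python
import json

def summarize_coco_json(path):
    """Return counts for the top-level lists of a COCO-style annotation file."""
    with open(path) as f:
        data = json.load(f)
    return {k: len(data.get(k, [])) for k in ("images", "annotations", "categories")}
```

A truncated or partially downloaded json will fail to parse here, which is cheaper to discover than during dataloader construction.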
- COCO
Please download COCO2017 from the official website. The annotation file for video-level training is provided in OneDrive. The data structure should be like:

```
|- datasets/
| |- coco/
| | |- annotations/
| | | |- instances_train2017_video.json
| | | |- instances_train2017.json
| | | |- instances_val2017.json
| | |- train2017/
| | |- val2017/
```
- RefCOCO/+/g
Please download COCO2014 images from the official website. The original annotation files are from SeqTR. We further convert the files and provide the preprocessed annotation files in OneDrive. The data structure should be like:

```
|- datasets/
| |- coco2014/
| | |- annotations/
| | | |- refcoco-mixed/
| | | |- refcoco-unc/
| | | |- refcocoplus-unc/
| | | |- refcocog-umd/
| | |- train2014/
```
- FSS-1000
Please download FSS-1000 from the official repo. We provide the preprocessed annotation files in OneDrive. The data structure should be like:

```
|- datasets/
| |- fss-1000/
| | |- annotations/
| | | |- train.json
| | | |- val.json
| | | |- test.json
| | |- images/
```
- Ref-Youtube-VOS
Please download Ref-Youtube-VOS from the official website. We provide the preprocessed annotation files in OneDrive. The data structure should be like:

```
|- datasets/
| |- ref-youtube-vos/
| | |- annotations/
| | | |- train.json
| | | |- val.json
| | |- train/
| | | |- JPEGImages/
| | |- valid/
| | | |- JPEGImages/
```
- Ref-DAVIS17
Please download Ref-DAVIS17 from the official website. You only need to download DAVIS-2017-Unsupervised-trainval-480p.zip and unzip it. You can also download the original text annotations from the website. We provide the preprocessed annotation files in OneDrive. The data structure should be like:

```
|- datasets/
| |- ref-davis/
| | |- annotations/
| | | |- valid_0.json
| | | |- valid_1.json
| | | |- valid_2.json
| | | |- valid_3.json
| | |- DAVIS/
| | | |- JPEGImages/
```
- Youtube-VOS-18
Please download Youtube-VOS-18 from the official website. We provide the preprocessed annotation files in OneDrive. The data structure should be like:

```
|- datasets/
| |- ytbvos18/
| | |- annotations/
| | | |- train.json
| | | |- val.json
| | |- train/
| | | |- JPEGImages/
| | |- valid/
| | | |- JPEGImages/
```
- Youtube-VOS-19
Please download Youtube-VOS-19 from the official website. We provide the preprocessed annotation files in OneDrive. The data structure should be like:

```
|- datasets/
| |- ytbvos19/
| | |- annotations/
| | | |- train.json
| | | |- val.json
| | |- train/
| | | |- JPEGImages/
| | |- valid/
| | | |- JPEGImages/
```
- DAVIS17
Please download DAVIS17 from the official website. You only need to download DAVIS-2017-trainval-480p.zip and unzip it. We provide the preprocessed annotation files in OneDrive. The data structure should be like:

```
|- datasets/
| |- davis17/
| | |- annotations/
| | | |- davis2017_train.json
| | | |- davis2017_val.json
| | |- DAVIS/
| | | |- JPEGImages/
```
- OVIS
Please download OVIS from the official website. This is a video instance segmentation dataset; we convert the annotation file to a class-agnostic format for our training. The preprocessed annotation file is provided in OneDrive. The data structure should be like:

```
|- datasets/
| |- ovis/
| | |- annotations/
| | | |- train.json
| | |- train/
```
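The class-agnostic conversion mentioned above can be sketched as follows. This is a minimal version assuming COCO-style json annotations; the function name `to_class_agnostic` and the single `object` category are illustrative, not the repo's actual conversion script.

```python
import json

def to_class_agnostic(src_path, dst_path):
    """Collapse all categories of a COCO-style annotation file into one class."""
    with open(src_path) as f:
        data = json.load(f)
    # Replace the category list with a single generic class
    data["categories"] = [{"id": 1, "name": "object"}]
    # Remap every annotation onto that class
    for ann in data.get("annotations", []):
        ann["category_id"] = 1
    with open(dst_path, "w") as f:
        json.dump(data, f)
```

Dropping class labels this way lets an instance segmentation dataset serve as extra mask supervision for class-agnostic video training.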
- LVOS
Please download LVOS from the official website. We provide the preprocessed annotation files in OneDrive. The data structure should be like:

```
|- datasets/
| |- lvos/
| | |- annotations_vos/
| | | |- train.json
| | | |- val.json
| | |- train/
| | | |- JPEGImages/
| | |- valid/
| | | |- JPEGImages/
```
- MOSE
Please download MOSE from the official website. We provide the preprocessed annotation files in OneDrive. The data structure should be like:

```
|- datasets/
| |- mose/
| | |- annotations/
| | | |- train.json
| | | |- val.json
| | |- train/
| | | |- JPEGImages/
| | |- valid/
| | | |- JPEGImages/
```