The pretrained weights are placed in the folder `pretrained_models`.
- **Visual Backbones**
  - R-50: please download from Detectron2 or OneDrive.
  - Swin-L: please download from OneDrive, which is converted from Swin-Transformer.
- **Text Encoders**
  - BERT-base: please download from Hugging Face.
- **SAM**
  - SAM-H: please download from SAM.
After preparation, the folder structure should be like:

```
|- datasets/
|- detectron2/
|- projects/
|  |- Uniref/
|- pretrained_models/
|  |- R-50.pkl
|  |- swin_large_patch4_window12_384_22k.pkl
|  |- sam_vit_h_4b8939.pth
|  |- bert-base-uncased/
...
```
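As a quick sanity check of the layout above, here is a minimal sketch that reports which expected weight files are missing. The helper name `check_weights` is illustrative and not part of the repo; the filenames are the ones listed in the structure above.

```python
import os

def check_weights(root="pretrained_models"):
    """Return the expected weight files/dirs that are missing under root."""
    expected = [
        "R-50.pkl",
        "swin_large_patch4_window12_384_22k.pkl",
        "sam_vit_h_4b8939.pth",
        "bert-base-uncased",  # directory holding the BERT weights/tokenizer
    ]
    return [p for p in expected if not os.path.exists(os.path.join(root, p))]
```

Running it before training makes a missing download fail fast instead of surfacing mid-build.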
We list the data for training and inference as follows. The datasets in parentheses are only used for inference.
- Pretraining:
  - Objects365
- Image-level Training:
  - DET: COCO2017
  - RIS: RefCOCO/+/g
  - FSS: FSS-1000
- Video-level Training:
  - RVOS: RefCOCO/+/g, Ref-Youtube-VOS, (Ref-DAVIS17)
  - VOS: COCO2017, Youtube-VOS-19, LVOS, OVIS, (Youtube-VOS-18, DAVIS17, MOSE)
We mainly follow UNINEXT to prepare our data. We provide the preprocessed annotation files in OneDrive. If you are interested in the preprocessing, please see our conversion files.
The datasets are placed in the folder `datasets`.
- Objects365

We provide the conversion file for downloading Objects365v2 images:

```
python3 conversion/download_objects365_v2.py
```
We use the same preprocessed json file as UNINEXT in OneDrive. The data structure should be like:

```
|- datasets/
| |- Objects365V2/
| | |- annotations/
| | | |- zhiyuan_objv2_train_new.json
| | | |- zhiyuan_objv2_val_new.json
| | |- images/
```
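To verify a downloaded annotation file before training, a small sketch that reports basic counts, assuming the json follows the usual COCO layout with `images`/`annotations`/`categories` keys (the helper name is illustrative):

```python
import json

def summarize_coco_json(path):
    """Return counts for the top-level lists of a COCO-style annotation file."""
    with open(path) as f:
        data = json.load(f)
    return {k: len(data.get(k, [])) for k in ("images", "annotations", "categories")}
```

A truncated or partially downloaded json will fail to parse here, which is cheaper to discover than during dataloader construction.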
- COCO
Please download COCO2017 from the official website. The annotation file for video-level training is provided in OneDrive. The data structure should be like:

```
|- datasets/
| |- coco/
| | |- annotations/
| | | |- instances_train2017_video.json
| | | |- instances_train2017.json
| | | |- instances_val2017.json
| | |- train2017/
| | |- val2017/
```
- RefCOCO/+/g
Please download COCO2014 images from the official website. The original annotation files are from SeqTR. We further convert the files and provide the preprocessed annotation files in OneDrive. The data structure should be like:

```
|- datasets/
| |- coco2014/
| | |- annotations/
| | | |- refcoco-mixed/
| | | |- refcoco-unc/
| | | |- refcocoplus-unc/
| | | |- refcocog-umd/
| | |- train2014/
```
- FSS-1000
Please download FSS-1000 from the official repo. We provide the preprocessed annotation files in OneDrive. The data structure should be like:

```
|- datasets/
| |- fss-1000/
| | |- annotations/
| | | |- train.json
| | | |- val.json
| | | |- test.json
| | |- images/
```
- Ref-Youtube-VOS
Please download Ref-Youtube-VOS from the official website. We provide the preprocessed annotation files in OneDrive. The data structure should be like:

```
|- datasets/
| |- ref-youtube-vos/
| | |- annotations/
| | | |- train.json
| | | |- val.json
| | |- train/
| | | |- JPEGImages/
| | |- valid/
| | | |- JPEGImages/
```
- Ref-DAVIS17
Please download Ref-DAVIS17 from the official website. You only need to download DAVIS-2017-Unsupervised-trainval-480p.zip and unzip it. You can also download the original text annotations from the website. We provide the preprocessed annotation files in OneDrive. The data structure should be like:

```
|- datasets/
| |- ref-davis/
| | |- annotations/
| | | |- valid_0.json
| | | |- valid_1.json
| | | |- valid_2.json
| | | |- valid_3.json
| | |- DAVIS/
| | | |- JPEGImages/
```
- Youtube-VOS-18
Please download Youtube-VOS-18 from the official website. We provide the preprocessed annotation files in OneDrive. The data structure should be like:

```
|- datasets/
| |- ytbvos18/
| | |- annotations/
| | | |- train.json
| | | |- val.json
| | |- train/
| | | |- JPEGImages/
| | |- valid/
| | | |- JPEGImages/
```
- Youtube-VOS-19
Please download Youtube-VOS-19 from the official website. We provide the preprocessed annotation files in OneDrive. The data structure should be like:

```
|- datasets/
| |- ytbvos19/
| | |- annotations/
| | | |- train.json
| | | |- val.json
| | |- train/
| | | |- JPEGImages/
| | |- valid/
| | | |- JPEGImages/
```
- DAVIS17
Please download DAVIS17 from the official website. You only need to download DAVIS-2017-trainval-480p.zip and unzip it. We provide the preprocessed annotation files in OneDrive. The data structure should be like:

```
|- datasets/
| |- davis17/
| | |- annotations/
| | | |- davis2017_train.json
| | | |- davis2017_val.json
| | |- DAVIS/
| | | |- JPEGImages/
```
- OVIS
Please download OVIS from the official website. This is a video instance segmentation dataset; we convert the annotation file to a class-agnostic format for our training. The preprocessed annotation file is provided in OneDrive. The data structure should be like:

```
|- datasets/
| |- ovis/
| | |- annotations/
| | | |- train.json
| | |- train/
```
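The class-agnostic conversion mentioned above can be sketched as follows. This is a minimal version assuming COCO-style json annotations; the function name `to_class_agnostic` and the single `object` category are illustrative, not the repo's actual conversion script.

```python
import json

def to_class_agnostic(src_path, dst_path):
    """Collapse all categories of a COCO-style annotation file into one class."""
    with open(src_path) as f:
        data = json.load(f)
    # Replace the category list with a single generic class
    data["categories"] = [{"id": 1, "name": "object"}]
    # Remap every annotation onto that class
    for ann in data.get("annotations", []):
        ann["category_id"] = 1
    with open(dst_path, "w") as f:
        json.dump(data, f)
```

Dropping class labels this way lets an instance segmentation dataset serve as extra mask supervision for class-agnostic video training.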
- LVOS
Please download LVOS from the official website. We provide the preprocessed annotation files in OneDrive. The data structure should be like:

```
|- datasets/
| |- lvos/
| | |- annotations_vos/
| | | |- train.json
| | | |- val.json
| | |- train/
| | | |- JPEGImages/
| | |- valid/
| | | |- JPEGImages/
```
- MOSE
Please download MOSE from the official website. We provide the preprocessed annotation files in OneDrive. The data structure should be like:

```
|- datasets/
| |- mose/
| | |- annotations/
| | | |- train.json
| | | |- val.json
| | |- train/
| | | |- JPEGImages/
| | |- valid/
| | | |- JPEGImages/
```