Authors: Bowen Yin, Xuying Zhang, Zhongyu Li, Li Liu, Ming-Ming Cheng, Qibin Hou*
This repository provides the RGBD pretraining code for '[ICLR 2024] DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation'. Our implementation is modified from the timm repository. If you have any questions, please let me know by raising an issue or by e-mail ([email protected]).
1.1. Install
Environment requirements: PyTorch and timm
If you have already set up the DFormer environment from our main repository, you only need to install timm in addition.
conda create -n RGBD_Pretrain python=3.10 -y
conda activate RGBD_Pretrain
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install timm fvcore
If the above steps do not work, you can also follow the timm installation instructions.
1.2. Prepare Datasets
First, prepare the ImageNet-1K dataset. We share the depth maps for ImageNet-1K (20.4 GB) at the following links:
| Baidu Netdisk | OneDrive |
|---|---|
If there are any problems with the shared links, please let me know ([email protected]). Then, create the soft links:
ln -s path_to_imagenet datasets/ImageNet
ln -s path_to_imagenet_depth_maps datasets/Depth_ImageNet
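The two links can be sanity-checked before training starts. The following is a minimal sketch, assuming the `datasets/ImageNet` and `datasets/Depth_ImageNet` link names used above; `check_dataset_links` is a hypothetical helper and is not part of this repository.

```shell
# Hypothetical helper (not part of the repository): check that both dataset
# links resolve to existing directories. A dangling link counts as missing.
check_dataset_links() {
    root="${1:-datasets}"   # directory that holds the two links
    status=0
    for name in ImageNet Depth_ImageNet; do
        if [ -d "$root/$name" ]; then
            echo "ok: $root/$name"
        else
            echo "missing: $root/$name" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running `check_dataset_links datasets` from the repository root prints one line per link and returns a non-zero status if either link is absent or broken.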
Next, start the pretraining:
bash train.sh
After training, the checkpoints are saved under `outputs/XXX`, where XXX depends on the training config.
The pretrained checkpoint can then encode RGBD representations and can be applied to various RGBD tasks.
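Because the output directory name depends on the config, it can help to search `outputs` for checkpoint files instead of guessing the path. This is only a sketch: `list_checkpoints` is a hypothetical helper, and the `*.pth*` filename pattern is an assumption based on the `.pth.tar` naming that timm's training script commonly uses. Adjust the pattern if your files are named differently.

```shell
# Hypothetical helper (not part of the repository): print checkpoint files
# found under an output directory, newest first.
# Assumption: checkpoint filenames contain ".pth" (e.g. "*.pth.tar").
list_checkpoints() {
    out_dir="${1:-outputs}"
    # ls -t sorts the files handed over by find by modification time.
    find "$out_dir" -type f -name '*.pth*' -exec ls -t {} +
}
```

For example, `list_checkpoints outputs | head -n 1` prints the most recently written checkpoint, if any exist.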
We invite everyone to contribute to making this project and RGBD representation learning more accessible and useful. If you have any questions or suggestions about our work, feel free to contact me by e-mail ([email protected]) or raise an issue.
@article{yin2023dformer,
title={DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation},
author={Yin, Bowen and Zhang, Xuying and Li, Zhongyu and Liu, Li and Cheng, Ming-Ming and Hou, Qibin},
journal={arXiv preprint arXiv:2309.09668},
year={2023}
}
Our implementation is mainly based on timm. The depth maps are generated by Omnidata. Thanks to their authors.
Code in this repo is for non-commercial use only.