# Depth Anything for Semantic Segmentation

We use our Depth Anything pre-trained ViT-L encoder to fine-tune downstream semantic segmentation models.
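As a rough sketch of what this fine-tuning setup involves (not the exact code used in this repo, which is driven by MMSegmentation configs), one would load a Depth Anything checkpoint, keep only its ViT-L encoder weights, and use them to initialize a segmentation backbone before attaching a decode head. The checkpoint filename and the `pretrained.` key prefix below are assumptions about the released checkpoint layout, and the public DINOv2 ViT-L/14 is used here only as a stand-in backbone definition.

```python
import torch

# The Depth Anything encoder follows a DINOv2-style ViT-L/14, so we use the public
# DINOv2 definition as a stand-in backbone (torch.hub needs internet access).
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14", pretrained=False)

# Filename and key prefix are assumptions about the released checkpoint layout.
ckpt = torch.load("depth_anything_vitl14.pth", map_location="cpu")
encoder_state = {
    k.replace("pretrained.", ""): v
    for k, v in ckpt.items()
    if k.startswith("pretrained.")
}

# Initialize the backbone with the depth-pre-trained encoder weights.
missing, unexpected = backbone.load_state_dict(encoder_state, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")

# A segmentation head (e.g. Mask2Former via MMSegmentation) is then attached on top
# of this backbone and the whole model is fine-tuned on the target dataset.
```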

## Performance

### Cityscapes

Note that our results are obtained without Mapillary pre-training.

| Method | Encoder | mIoU (s.s.) | mIoU (m.s.) |
|:---|:---|:---:|:---:|
| SegFormer | MiT-B5 | 82.4 | 84.0 |
| Mask2Former | Swin-L | 83.3 | 84.3 |
| OneFormer | Swin-L | 83.0 | 84.4 |
| OneFormer | ConvNeXt-XL | 83.6 | 84.6 |
| DDP | ConvNeXt-L | 83.2 | 83.9 |
| **Ours** | ViT-L | **84.8** | **86.2** |

*s.s.: single-scale inference; m.s.: multi-scale inference.*

### ADE20K

| Method | Encoder | mIoU |
|:---|:---|:---:|
| SegFormer | MiT-B5 | 51.0 |
| Mask2Former | Swin-L | 56.4 |
| UperNet | BEiT-L | 56.3 |
| ViT-Adapter | BEiT-L | 58.3 |
| OneFormer | Swin-L | 57.4 |
| OneFormer | ConvNeXt-XL | 57.4 |
| **Ours** | ViT-L | **59.4** |

## Pre-trained models

## Installation

Please refer to MMSegmentation for installation instructions. Do not forget to install `mmdet` to support Mask2Former:

```bash
pip install "mmdet>=3.0.0rc4"
```
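Optionally, a quick sanity check that both toolboxes are importable:

```python
# Verify that MMSegmentation and MMDetection (required for Mask2Former) are installed
import mmseg
import mmdet

print("mmseg:", mmseg.__version__)
print("mmdet:", mmdet.__version__)
```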

After installation, please follow the MMSegmentation instructions for training or inference with our pre-trained models.
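For reference, single-image inference with the MMSegmentation 1.x Python API looks roughly like the sketch below; the config and checkpoint filenames are hypothetical placeholders for the files released with this repo.

```python
from mmseg.apis import init_model, inference_model, show_result_pyplot

# Hypothetical filenames -- substitute the actual config and checkpoint from this repo
config_file = "configs/depth_anything_cityscapes_config.py"
checkpoint_file = "depth_anything_cityscapes.pth"

# Build the model, run inference on one image, and save the colored prediction
model = init_model(config_file, checkpoint_file, device="cuda:0")
result = inference_model(model, "demo.png")
show_result_pyplot(model, "demo.png", result, out_file="prediction.png", show=False)
```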