
# WeakCLIP

**Adapting CLIP for Weakly-supervised Semantic Segmentation**

Lianghui Zhu<sup>1</sup> \*, Xinggang Wang<sup>1</sup> 📧, Jiapei Feng<sup>1</sup> \*, Tianheng Cheng<sup>1</sup> \*, Yingyue Li<sup>1</sup> \*, Dingwen Zhang<sup>2</sup>, Junwei Han<sup>2</sup>

<sup>1</sup> Huazhong University of Science and Technology, Wuhan, China; <sup>2</sup> Northwestern Polytechnical University, Xi'an, China

(\*) equal contribution, (📧) corresponding author.

Accepted by IJCV (Paper)

## Introduction

Within the realm of weakly-supervised semantic segmentation (WSSS), it is always challenging to obtain sufficient and reliable pixel-level supervision from image-level annotations. Previous WSSS methods typically first generate a coarse class activation map (CAM) from classification networks, followed by the refinement of this CAM to produce high-quality pseudo masks, a step aided by hand-crafted priors.
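As background, the classic CAM formulation projects the classification head's weights onto the final feature maps of the network. Below is a minimal PyTorch sketch of this step; the tensor names and shapes are illustrative assumptions, not WeakCLIP's actual code:

```python
import torch
import torch.nn.functional as F

def compute_cam(features, fc_weight, class_idx):
    """Classic CAM: weight the final feature maps by the classifier
    weights of the target class, then rectify and normalize.

    features:  (C, H, W) feature maps from the last stage
    fc_weight: (num_classes, C) weights of the classification head
    class_idx: index of the target class
    """
    # Weighted sum over channels -> (H, W) activation map
    cam = torch.einsum('c,chw->hw', fc_weight[class_idx], features)
    cam = F.relu(cam)               # keep positive evidence only
    cam = cam / (cam.max() + 1e-8)  # normalize to [0, 1]
    return cam

# Example: 20 foreground classes (PASCAL VOC), 2048-d features
features = torch.randn(2048, 14, 14)
fc_weight = torch.randn(20, 2048)
cam = compute_cam(features, fc_weight, class_idx=5)  # coarse (14, 14) map
```

The resulting map is coarse and class-discriminative rather than mask-like, which is why a refinement step with hand-crafted priors is usually needed.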

Recent advances in large-scale Contrastive Language-Image Pre-training (CLIP) present an opportune avenue for enhancing weakly-supervised image understanding, in particular the creation of high-quality pseudo masks. However, directly applying CLIP to WSSS cannot effectively refine the CAM, owing to three significant challenges: 1) the task gap between contrastive pre-training and WSSS CAM refinement, 2) the lack of text-to-pixel modeling needed to fully exploit the pre-trained knowledge, and 3) the insufficient spatial detail caused by the $\frac{1}{16}$ down-sampled resolution of ViT features. We therefore propose WeakCLIP to address these challenges and transfer the pre-trained knowledge of CLIP to WSSS.
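To make the text-to-pixel modeling challenge concrete, the sketch below scores every spatial position of a CLIP-like visual feature map against a class text embedding via cosine similarity. The function name, shapes, and embedding dimension are illustrative assumptions, not the WeakCLIP implementation:

```python
import torch
import torch.nn.functional as F

def text_to_pixel_similarity(pixel_feats, text_emb):
    """Cosine similarity between one text embedding and every
    spatial position of a CLIP-like visual feature map.

    pixel_feats: (D, H, W) per-position visual embeddings
    text_emb:    (D,) text embedding of one class prompt
    returns:     (H, W) similarity map in [-1, 1]
    """
    d, h, w = pixel_feats.shape
    feats = F.normalize(pixel_feats.reshape(d, -1), dim=0)  # (D, H*W)
    text = F.normalize(text_emb, dim=0)                     # (D,)
    return (text @ feats).reshape(h, w)

# A 224x224 image through a ViT with 16x16 patches yields a 14x14 token
# grid -- the 1/16-resolution bottleneck mentioned above.
pixel_feats = torch.randn(512, 14, 14)
text_emb = torch.randn(512)
sim_map = text_to_pixel_similarity(pixel_feats, text_emb)
```

CLIP is pre-trained to align text with whole images, not individual positions, so a similarity map like this is coarse and noisy by itself; bridging that gap is exactly challenge 2).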

Comprehensive experiments demonstrate that WeakCLIP provides an effective way to transfer CLIP knowledge to CAM refinement, achieving state-of-the-art WSSS performance on standard benchmarks: 74.0% mIoU on the PASCAL VOC 2012 $val$ set and 46.1% mIoU on the COCO 2014 $val$ set.
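For reference, mIoU is the per-class intersection-over-union averaged over all classes (21 for PASCAL VOC 2012 and 81 for COCO 2014, counting background). A minimal NumPy sketch of the metric, not the evaluation code used in the paper:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean IoU from a confusion matrix.
    pred, gt: integer label maps of the same shape."""
    conf = np.bincount(
        num_classes * gt.flatten() + pred.flatten(),
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)      # conf[gt_class, pred_class]
    inter = np.diag(conf)
    union = conf.sum(0) + conf.sum(1) - inter
    iou = inter / np.maximum(union, 1)       # avoid division by zero
    return iou.mean()
```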

## Getting Started

### Model Zoo

| Dataset | Checkpoint | Pseudo Mask | Train mIoU | Retrain Checkpoint | Val mIoU |
| --- | --- | --- | --- | --- | --- |
| PASCAL VOC 2012 | Google Drive | Google Drive | 77.2% | Google Drive | 74.0% |
| COCO 2014 | Google Drive | Google Drive | 48.4% | Google Drive | 46.1% |

## License

MIT License

## Citation

If you find our work useful in your research, please consider citing:

```bibtex
@article{zhu2024weakclip,
  title={WeakCLIP: Adapting CLIP for Weakly-Supervised Semantic Segmentation},
  author={Zhu, Lianghui and Wang, Xinggang and Feng, Jiapei and Cheng, Tianheng and Li, Yingyue and Jiang, Bo and Zhang, Dingwen and Han, Junwei},
  journal={International Journal of Computer Vision},
  pages={1--21},
  year={2024},
  publisher={Springer}
}
```