
# WeakCLIP

**Adapting CLIP for Weakly-supervised Semantic Segmentation**

Lianghui Zhu<sup>1</sup> \*, Xinggang Wang<sup>1</sup> 📧, Jiapei Feng<sup>1</sup> \*, Tianheng Cheng<sup>1</sup> \*, Yingyue Li<sup>1</sup> \*, Dingwen Zhang<sup>2</sup>, Junwei Han<sup>2</sup>

<sup>1</sup> Huazhong University of Science and Technology, Wuhan, China; <sup>2</sup> Northwestern Polytechnical University, Xi'an, China

(\*) equal contribution, (📧) corresponding author.

Accepted by IJCV (Paper)

## Introduction

Within the realm of weakly-supervised semantic segmentation (WSSS), it is always challenging to obtain sufficient and reliable pixel-level supervision from image-level annotations. Previous WSSS methods typically first generate a coarse class activation map (CAM) from classification networks, followed by the refinement of this CAM to produce high-quality pseudo masks, a step aided by hand-crafted priors.
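As background, the classic CAM formulation projects the classification head's weights onto the final feature maps of the network. Below is a minimal PyTorch sketch of this step; the tensor names and shapes are illustrative assumptions, not WeakCLIP's actual code:

```python
import torch
import torch.nn.functional as F

def compute_cam(features, fc_weight, class_idx):
    """Classic CAM: weight the final feature maps by the classifier
    weights of the target class, then rectify and normalize.

    features:  (C, H, W) feature maps from the last stage
    fc_weight: (num_classes, C) weights of the classification head
    class_idx: index of the target class
    """
    # Weighted sum over channels -> (H, W) activation map
    cam = torch.einsum('c,chw->hw', fc_weight[class_idx], features)
    cam = F.relu(cam)               # keep positive evidence only
    cam = cam / (cam.max() + 1e-8)  # normalize to [0, 1]
    return cam

# Example: 20 foreground classes (PASCAL VOC), 2048-d features
features = torch.randn(2048, 14, 14)
fc_weight = torch.randn(20, 2048)
cam = compute_cam(features, fc_weight, class_idx=5)  # coarse (14, 14) map
```

The resulting map is coarse and class-discriminative rather than mask-like, which is why a refinement step with hand-crafted priors is usually needed.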

Recent advances in large-scale Contrastive Language-Image Pre-training (CLIP) present an opportune avenue for enhancing weakly-supervised image understanding, in particular the creation of high-quality pseudo masks. However, directly applying CLIP to WSSS cannot effectively refine the CAM, owing to three significant challenges: 1) the task gap between contrastive pre-training and WSSS CAM refinement, 2) the lack of text-to-pixel modeling needed to fully exploit the pre-trained knowledge, and 3) the insufficient spatial detail caused by the $\frac{1}{16}$ down-sampled resolution of ViT features. We therefore propose WeakCLIP to address these challenges and transfer the pre-trained knowledge of CLIP to WSSS.
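To make the text-to-pixel modeling challenge concrete, the sketch below scores every spatial position of a CLIP-like visual feature map against a class text embedding via cosine similarity. The function name, shapes, and embedding dimension are illustrative assumptions, not the WeakCLIP implementation:

```python
import torch
import torch.nn.functional as F

def text_to_pixel_similarity(pixel_feats, text_emb):
    """Cosine similarity between one text embedding and every
    spatial position of a CLIP-like visual feature map.

    pixel_feats: (D, H, W) per-position visual embeddings
    text_emb:    (D,) text embedding of one class prompt
    returns:     (H, W) similarity map in [-1, 1]
    """
    d, h, w = pixel_feats.shape
    feats = F.normalize(pixel_feats.reshape(d, -1), dim=0)  # (D, H*W)
    text = F.normalize(text_emb, dim=0)                     # (D,)
    return (text @ feats).reshape(h, w)

# A 224x224 image through a ViT with 16x16 patches yields a 14x14 token
# grid -- the 1/16-resolution bottleneck mentioned above.
pixel_feats = torch.randn(512, 14, 14)
text_emb = torch.randn(512)
sim_map = text_to_pixel_similarity(pixel_feats, text_emb)
```

CLIP is pre-trained to align text with whole images, not individual positions, so a similarity map like this is coarse and noisy by itself; bridging that gap is exactly challenge 2).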

Comprehensive experiments demonstrate that WeakCLIP provides an effective way to transfer CLIP knowledge to CAM refinement, achieving state-of-the-art WSSS performance on standard benchmarks: 74.0% mIoU on the PASCAL VOC 2012 $val$ set and 46.1% mIoU on the COCO 2014 $val$ set.
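For reference, mIoU is the per-class intersection-over-union averaged over all classes (21 for PASCAL VOC 2012 and 81 for COCO 2014, counting background). A minimal NumPy sketch of the metric, not the evaluation code used in the paper:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean IoU from a confusion matrix.
    pred, gt: integer label maps of the same shape."""
    conf = np.bincount(
        num_classes * gt.flatten() + pred.flatten(),
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)      # conf[gt_class, pred_class]
    inter = np.diag(conf)
    union = conf.sum(0) + conf.sum(1) - inter
    iou = inter / np.maximum(union, 1)       # avoid division by zero
    return iou.mean()
```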

## Getting Started

### Model Zoo

| Dataset | Checkpoint | Pseudo Mask | Train mIoU | Retrain Checkpoint | Val mIoU |
| --- | --- | --- | --- | --- | --- |
| PASCAL VOC 2012 | Google Drive | Google Drive | 77.2% | Google Drive | 74.0% |
| COCO 2014 | Google Drive | Google Drive | 48.4% | Google Drive | 46.1% |

## License

MIT License

## Citation

If you find our work useful in your research, please consider citing:

```bibtex
@article{zhu2024weakclip,
  title={WeakCLIP: Adapting CLIP for Weakly-Supervised Semantic Segmentation},
  author={Zhu, Lianghui and Wang, Xinggang and Feng, Jiapei and Cheng, Tianheng and Li, Yingyue and Jiang, Bo and Zhang, Dingwen and Han, Junwei},
  journal={International Journal of Computer Vision},
  pages={1--21},
  year={2024},
  publisher={Springer}
}
```