Lianghui Zhu1 *, Xinggang Wang1 📧, Jiapei Feng1 *, Tianheng Cheng1 *, Yingyue Li1 *, Dingwen Zhang2, Junwei Han2
1 Huazhong University of Science and Technology, Wuhan, China, 2 Northwestern Polytechnical University, Xi’an, China
(*) equal contribution, (📧) corresponding author.
Accepted by IJCV (Paper)
In weakly-supervised semantic segmentation (WSSS), obtaining sufficient and reliable pixel-level supervision from image-level annotations is a long-standing challenge. Previous WSSS methods typically first generate a coarse class activation map (CAM) from a classification network, then refine this CAM into high-quality pseudo masks with the help of hand-crafted priors.
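In its classic formulation, the coarse CAM mentioned above is just a class-weighted sum of the backbone's final feature map. A minimal NumPy sketch of that step (all shapes and names here are illustrative, not this repository's implementation):

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Weight each feature channel by the classifier weight of the
    target class, sum over channels, then min-max normalize to [0, 1]."""
    # features: (C, H, W) backbone feature map; fc_weights: (num_classes, C)
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))  # (H, W)
    cam = np.maximum(cam, 0)  # keep positive class evidence only (ReLU)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam

# Toy inputs standing in for a real backbone and classifier head.
rng = np.random.default_rng(0)
feats = rng.standard_normal((512, 14, 14))
weights = rng.standard_normal((20, 512))  # e.g. 20 foreground classes
cam = class_activation_map(feats, weights, class_idx=3)
```

Such a map is typically coarse (here 14×14) and must be upsampled and refined before it can serve as a pseudo mask.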
Recent advances in large-scale Contrastive Language-Image Pre-training (CLIP) offer a promising avenue for enhancing weakly-supervised image understanding, in particular for producing high-quality pseudo masks. However, directly applying CLIP to refine the CAM in WSSS faces three significant challenges: 1) the task gap between contrastive pre-training and WSSS CAM refinement, 2) the lack of text-to-pixel modeling needed to fully exploit the pre-trained knowledge, and 3) the insufficient details brought by
Comprehensive experiments demonstrate that WeakCLIP provides an effective way to transfer CLIP knowledge for CAM refinement and achieves state-of-the-art WSSS performance on standard benchmarks: 74.0% mIoU on the Pascal VOC 2012 val set and 46.1% mIoU on the COCO 2014 val set.
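Challenge 2 above, text-to-pixel modeling, amounts to scoring every pixel embedding against a class-name text embedding to obtain a dense relevance map. A hedged NumPy sketch of that dense matching (illustrative shapes and names, not WeakCLIP's actual module):

```python
import numpy as np

def text_to_pixel_map(pixel_embed, text_embed):
    """Cosine similarity between one text embedding and every pixel
    embedding, yielding a dense (H, W) relevance map in [-1, 1]."""
    # pixel_embed: (H, W, D) per-pixel features; text_embed: (D,)
    p = pixel_embed / (np.linalg.norm(pixel_embed, axis=-1, keepdims=True) + 1e-8)
    t = text_embed / (np.linalg.norm(text_embed) + 1e-8)
    return p @ t  # (H, W)

# Toy stand-ins for CLIP visual features and a class-name text embedding.
rng = np.random.default_rng(0)
sim = text_to_pixel_map(rng.standard_normal((14, 14, 512)),
                        rng.standard_normal(512))
```

Thresholding or softmax-normalizing such maps across classes is one common way to turn text-image alignment into per-pixel supervision.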
Dataset | Checkpoint | Pseudo Mask | Pseudo Mask mIoU (train) | Retrain Checkpoint | Retrain mIoU (val) |
---|---|---|---|---|---|
Pascal VOC 2012 | Google Drive | Google Drive | 77.2% | Google Drive | 74.0% |
COCO 2014 | Google Drive | Google Drive | 48.4% | Google Drive | 46.1% |
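The mIoU numbers in the table are the standard per-class intersection-over-union averaged over classes. A self-contained sketch of the metric (simplified: real evaluators accumulate a confusion matrix over the whole dataset and handle an ignore label):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Per-class IoU averaged over classes that appear in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

gt   = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
# class 0: inter=1, union=2 -> 0.5; class 1: inter=2, union=3 -> 2/3
miou = mean_iou(pred, gt, num_classes=2)  # (0.5 + 2/3) / 2
```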
MIT License
If you find our work useful in your research, please consider citing:
@article{zhu2024weakclip,
  title={WeakCLIP: Adapting CLIP for Weakly-Supervised Semantic Segmentation},
  author={Zhu, Lianghui and Wang, Xinggang and Feng, Jiapei and Cheng, Tianheng and Li, Yingyue and Jiang, Bo and Zhang, Dingwen and Han, Junwei},
  journal={International Journal of Computer Vision},
  pages={1--21},
  year={2024},
  publisher={Springer}
}