[MM'23] QA-CLIMS

This is the official PyTorch implementation of our paper:

QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation
Songhe Deng, Wei Zhuo, Jinheng Xie, Linlin Shen
Computer Vision Institute, Shenzhen University
ACM International Conference on Multimedia, 2023
[Paper] [arXiv]

Environment

  • Python 3.7
  • PyTorch 1.7.1
  • torchvision 0.8.2
pip install -r requirements.txt
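To confirm that your setup matches the versions listed above, you can run a quick check like the one below (an illustrative snippet, not a script shipped with this repo):

import torch
import torchvision

print('torch:', torch.__version__)               # expected: 1.7.1
print('torchvision:', torchvision.__version__)   # expected: 0.8.2
print('CUDA available:', torch.cuda.is_available())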

PASCAL VOC2012

You can find the following files here.

File                          Filename
FG & BG VQA results           voc_vqa_fg_blip.npy, voc_vqa_bg_blip.npy
FG & BG VQA text features     voc_vqa_fg_blip_ViT-L-14_cache.npy, voc_vqa_bg_blip_ViT-L-14_cache.npy
Pre-trained baseline model    res50_cam.pth
QA-CLIMS model                res50_qa_clims.pth

1. Prepare VQA result features

You can download the VQA text features voc_vqa_fg_blip_ViT-L-14_cache.npy and voc_vqa_bg_blip_ViT-L-14_cache.npy listed above and put them in vqa/.

Or, you can generate them yourself:

To generate VQA results, please follow third_party/README.

After that, run the following command to generate the VQA text features:

python gen_text_feats_cache.py voc \
    --vqa_fg_file vqa/voc_vqa_fg_blip.npy \
    --vqa_fg_cache_file vqa/voc_vqa_fg_blip_ViT-L-14_cache.npy \
    --vqa_bg_file vqa/voc_vqa_bg_blip.npy \
    --vqa_bg_cache_file vqa/voc_vqa_bg_blip_ViT-L-14_cache.npy \
    --clip ViT-L/14
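
If you want to verify the generated (or downloaded) cache files, a quick inspection like the one below can help. This is only an illustrative sketch: it assumes the .npy cache stores a pickled Python object (e.g., a dict mapping keys to CLIP ViT-L/14 text embeddings), which may differ from the actual layout written by gen_text_feats_cache.py.

import numpy as np

# Load the cache; allow_pickle is needed if the file stores a Python dict.
cache = np.load('vqa/voc_vqa_fg_blip_ViT-L-14_cache.npy', allow_pickle=True)
obj = cache.item() if cache.dtype == object and cache.shape == () else cache
if isinstance(obj, dict):
    k = next(iter(obj))
    print(len(obj), 'entries; first key:', k, 'feature shape:', np.asarray(obj[k]).shape)
else:
    print('array shape:', obj.shape)   # ViT-L/14 text features have dimension 768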

2. Train QA-CLIMS and generate initial CAMs

Please download the pre-trained baseline model res50_cam.pth above and put it at cam-baseline-voc12/res50_cam.pth.

bash run_voc12_qa_clims.sh
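
For readers who want a feel for what this script optimizes before running it: QA-CLIMS matches CAM-activated image regions against the FG and BG VQA text features through CLIP. The snippet below is a heavily simplified, illustrative sketch of that idea; the function and variable names are made up here and do not mirror the repo's actual code or loss definitions.

import torch
import torch.nn.functional as F

def qa_matching_loss(clip_image_encoder, image, cam, fg_text_feats, bg_text_feats):
    # Upsample the CAM to image resolution and use it as a soft foreground mask.
    mask = F.interpolate(cam, size=image.shape[-2:], mode='bilinear', align_corners=False)
    fg_image = image * mask
    # Encode the masked foreground with CLIP and L2-normalize all features.
    v = F.normalize(clip_image_encoder(fg_image), dim=-1)
    t_fg = F.normalize(fg_text_feats, dim=-1)
    t_bg = F.normalize(bg_text_feats, dim=-1)
    # Pull foreground regions toward the FG answers, push them away from the BG answers.
    loss_fg = -torch.log(torch.sigmoid(v @ t_fg.t())).mean()
    loss_bg = -torch.log(1.0 - torch.sigmoid(v @ t_bg.t())).mean()
    return loss_fg + loss_bg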

3. Train IRNet and generate pseudo semantic masks

bash run_voc12_sem_seg.sh
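
Once the pseudo masks are generated, it can be useful to spot-check them against the VOC ground truth. The snippet below is only an illustrative check; the output directory and file names are placeholders, not the actual paths produced by the script.

import numpy as np
from PIL import Image

# Placeholder paths: point these at a generated pseudo mask and the matching VOC annotation.
pred = np.array(Image.open('sem_seg_out/2007_000032.png'))
gt = np.array(Image.open('VOCdevkit/VOC2012/SegmentationClass/2007_000032.png'))
valid = gt != 255                      # 255 is the VOC "ignore" boundary label
print('pixel accuracy:', float((pred[valid] == gt[valid]).mean()))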

4. Train DeepLab using pseudo semantic masks

Please follow deeplab-pytorch or CLIMS.

MS COCO2014

You can find the following files here.

File                          Filename
FG & BG VQA results           coco_vqa_fg_blip.npy, coco_vqa_bg_blip.npy
FG & BG VQA text features     coco_vqa_fg_blip_ViT-L-14_cache.npy, coco_vqa_bg_blip_ViT-L-14_cache.npy
Pre-trained baseline model    res50_cam.pth
QA-CLIMS model                res50_qa_clims.pth

Please place the downloaded coco_vqa_fg_blip_ViT-L-14_cache.npy and coco_vqa_bg_blip_ViT-L-14_cache.npy in vqa/, and res50_cam.pth in cam-baseline-coco14/.

Then, run the following commands:

bash run_coco14_qa_clims.sh
bash run_coco14_sem_seg.sh

Citation

If you find this code useful for your research, please consider citing our paper:

@inproceedings{deng2023qa-clims,
  title={QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation},
  author={Deng, Songhe and Zhuo, Wei and Xie, Jinheng and Shen, Linlin},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  pages={5572--5583},
  year={2023}
}

This repository is heavily based on CLIMS and IRNet; thanks for their great work!