AAAI 2024 Accepted Paper: Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
First, clone the repository and set up the conda environment:
git clone https://github.com/Artanic30/MacCap
cd MacCap
conda env create -f environment.yml
conda activate MacCap
Download `coco_train` to the `data/` directory.
Download `cc3m_train` to the `data/` directory.
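Before launching training, it can help to confirm the files landed where the scripts expect them. The sketch below is only a hypothetical sanity check: the `data/` location and the `coco_train*` / `cc3m_train*` name patterns are assumptions taken from the download steps above, not paths confirmed by the repository.

```python
from pathlib import Path

# Hypothetical sanity check: the data/ location and the coco_train*/cc3m_train*
# name patterns are assumptions based on the download steps above.
data_dir = Path("data")
for stem in ("coco_train", "cc3m_train"):
    matches = sorted(data_dir.glob(f"{stem}*"))
    if matches:
        print(f"{stem}: found {', '.join(m.name for m in matches)}")
    else:
        print(f"{stem}: missing under {data_dir}/ -- download it before training")
```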
Then launch training on COCO captions or Conceptual Captions 3M with `./train_coco.sh` or `./train_cc3m.sh`, respectively.
Follow the instructions here to evaluate the generated captions.
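The linked evaluation instructions are not reproduced in this section. As a rough, non-authoritative sketch, captions exported in the standard COCO results format can be scored with the `pycocotools` + `pycocoevalcap` toolkit; the annotation and results paths below are placeholders, and this is not necessarily the exact evaluation pipeline used by this repository.

```python
# Minimal scoring sketch with the standard COCO caption evaluation toolkit.
# Placeholder paths: substitute your own ground-truth annotations and the
# JSON file of generated captions in COCO results format.
from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

coco = COCO("annotations/captions_val2014.json")    # ground-truth captions (placeholder)
coco_res = coco.loadRes("generated_captions.json")  # model outputs in COCO results format
evaluator = COCOEvalCap(coco, coco_res)
evaluator.params["image_id"] = coco_res.getImgIds() # score only images that have results
evaluator.evaluate()

for metric, score in evaluator.eval.items():        # e.g. BLEU-4, METEOR, CIDEr, SPICE
    print(f"{metric}: {score:.3f}")
```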
@article{qiu2024mining,
title={Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training},
author={Qiu, Longtian and Ning, Shan and He, Xuming},
journal={arXiv preprint arXiv:2401.02347},
year={2024}
}
This repository is heavily based on ClipCap and DeCap. For training, we used data from the COCO dataset and Conceptual Captions.
- Initial Code release
- Detailed Documentation
- Data Preparation
- Training and Evaluation Scripts
- Checkpoints