[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training

SuperClass: Classification Done Right for Vision-Language Pre-Training

Zilong Huang · Qinghao Ye · Bingyi Kang · Jiashi Feng · Haoqi Fan

Bytedance Research

Paper PDF

This work presents SuperClass, a super simple classification method that performs vision-language pre-training. Our method does not require a text encoder to be pre-trained on image-text data. Instead, it utilizes tokenized raw text as supervised classification labels, without the need for additional text filtering or selection.
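The idea of using tokenized raw text as classification labels can be sketched as follows. This is a minimal, dependency-free illustration rather than the repository's implementation: the token IDs, the vocabulary size, and the helper names `bag_of_tokens_target` and `softmax_cross_entropy` are hypothetical, and the choice of a softmax cross-entropy over a normalized multi-hot target is an assumption of this sketch (the actual loss, including any per-token weighting, is defined in the paper and code).

```python
import math

def bag_of_tokens_target(token_ids, vocab_size):
    """Turn a caption's token IDs into a normalized multi-hot label vector.

    Every distinct token in the caption is treated as a positive class;
    word order and repetition are ignored.
    """
    target = [0.0] * vocab_size
    for tok in set(token_ids):
        target[tok] = 1.0
    total = sum(target)
    if total == 0:  # empty caption: no positive classes
        return target
    return [t / total for t in target]

def softmax_cross_entropy(logits, target):
    """Cross-entropy between softmax(logits) and a soft target distribution."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_z) for x, t in zip(logits, target) if t > 0)

# Hypothetical example: a caption tokenized to IDs over a toy vocabulary of 8 subwords.
token_ids = [3, 1, 3, 5]
vocab_size = 8
target = bag_of_tokens_target(token_ids, vocab_size)

# In practice the logits would come from a vision encoder followed by a
# linear classification head of size vocab_size; here they are placeholders.
logits = [0.0] * vocab_size
loss = softmax_cross_entropy(logits, target)
```

In this sketch the image encoder only needs a classification head over the tokenizer vocabulary, which is why no text encoder appears anywhere in the training signal.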

News

  • 2024-11-06: Paper & code are all released.
  • 2024-10-02: SuperClass is accepted by NeurIPS 2024.

Usage

Preparation

git clone https://github.com/x-cls/superclass
cd superclass
pip install -r requirements.txt

Download the datasets Datacomp-1B and ImageNet-1K. You can also use other image-text pair datasets for training.

Set DATA_PATH and VAL_DATA_PATH in the training scripts train.sh and train_combo.sh to the local paths of your Datacomp-1B and ImageNet-1K copies, respectively.

CLIP Training & SuperClass Training

To start CLIP training or SuperClass training, use the following command:

bash train.sh <config_path> opencls

This script will navigate to the opencls directory and execute the training.

If you want to include the LiT training phase, use the following command:

bash train_combo.sh <cls_config_path> <lit_config_path> opencls

The CLS training configs are located in opencls/configs/cls_schedule.

For example:

bash train.sh configs/cls_schedule/cls_vit_b16_s1.28B_bs16k.yaml opencls

Please note that the default precision during training is set to amp_bfloat16. If your GPU (e.g., V100) does not support bf16, please change it to fp16 or amp.
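As a rough aid for the precision note above, the following sketch picks a precision value from the GPU's CUDA compute capability. The helper name `pick_precision` is hypothetical and not part of this repository; the threshold relies on the fact that native bf16 support starts with compute capability 8.0 (Ampere), while V100 is 7.0.

```python
def pick_precision(compute_capability_major: int) -> str:
    """Return a precision setting suited to the GPU generation.

    Native bf16 requires CUDA compute capability 8.0 or higher
    (e.g., A100); older GPUs such as V100 (7.0) fall back to fp16.
    """
    return "amp_bfloat16" if compute_capability_major >= 8 else "fp16"

# With PyTorch installed, the major version can be read via
# torch.cuda.get_device_capability()[0]; fixed values are used here
# so the sketch runs without a GPU.
v100_precision = pick_precision(7)
a100_precision = pick_precision(8)
```

Where exactly the precision value is set (script or config file) depends on the training setup, so check train.sh and the chosen config before changing it.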

Acknowledgement

Our codebase is built upon OpenCLIP and ViTamin.

We thank the authors of OpenCLIP and ViTamin for contributing such impressive code and models to the community.

LICENSE

The models & code of SuperClass are released under the Apache-2.0 license.

Citation

If you find this project useful, please consider citing:

@inproceedings{superclass_huang,
  title={Classification Done Right for Vision-Language Pre-Training}, 
  author={Huang, Zilong and Ye, Qinghao and Kang, Bingyi and Feng, Jiashi and Fan, Haoqi},
  booktitle={NeurIPS},
  year={2024}
}
