- The LLaVA-PT data comes from LLaVA.
- The Hybrid-FT data comes from SViT, LVIS, LRV, and MIMIC-IT.
- The LLaVA-FT data comes from LLaVA.
- Download the training annotations from Baidu Disk, Google Disk, Peking University Disk, or Hugging Face.

We also provide the processed data as follows; the links below point to Baidu Disk.
| Data group | Usage | Link |
|---|---|---|
| LLaVA-PT | Stage 1 | LLaVA 1.5-558k |
| Hybrid-FT | Stage 2 | SViT-157k, LVIS-220k, LRV-331k, MIMIC-IT-256k |
| LLaVA-FT | Stage 3 | LLaVA 1.5-mix-665k |
For those who cannot easily access Baidu Disk, the same data can be downloaded from Hugging Face.
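If you take the Hugging Face route, a download sketch like the following may help; the repo id below is a placeholder, so substitute the dataset actually linked above.

```bash
# Sketch only: <org>/<dataset-repo> is a placeholder for the Hugging Face
# dataset linked above; --repo-type dataset targets a dataset repo.
huggingface-cli download <org>/<dataset-repo> \
  --repo-type dataset \
  --local-dir ./train_data
```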
After downloading all of them, organize the data as follows in `IMAGE_FOLDER` (a sketch of building this layout follows the tree):

```
IMAGE_FOLDER
├── llava_image
├── llava_image_tune
├── lvis_tune
├── lrv_tune
├── svit_tune
└── mimicit_tune
    └── LA
```
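A minimal sketch of producing that layout, assuming the downloads are zip archives named after the folders (the archive names are assumptions; use whatever filenames your download actually produced):

```bash
# Archive names are assumptions; substitute the filenames you actually downloaded.
mkdir -p IMAGE_FOLDER
for f in llava_image llava_image_tune lvis_tune lrv_tune svit_tune mimicit_tune; do
  unzip "${f}.zip" -d IMAGE_FOLDER/
done
# mimicit_tune should end up containing the LA subfolder shown above.
```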
Specify your `IMAGE_FOLDER` and `JSON_FOLDER` according to the data preparation.
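For example, near the top of each training script you would point these variables at your local paths (the values below are placeholders):

```bash
# Placeholder paths; point these at the folders from the data preparation step.
IMAGE_FOLDER="/path/to/IMAGE_FOLDER"
JSON_FOLDER="/path/to/json_annotations"
```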
For training at 384 resolution, we use `google/siglip-so400m-patch14-384` as the `image_tower`. Notably, if you pass `--image_tower google/siglip-so400m-patch14-384`, you should upgrade `transformers` to 4.37.0.
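For example:

```bash
pip install "transformers==4.37.0"
```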
- Stage 1 pretraining script: `pretrain.sh`.
- Stage 2 tuning script: `finetune.sh`.
- Stage 3 MoE-tuning script: `finetune_moe.sh` (a launch sketch follows this list).
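The stages are meant to run in order. A minimal launch sketch; the script locations are assumptions, so adjust them to where the scripts live in this repo:

```bash
# Script paths are assumptions; use the actual locations in this repo.
bash pretrain.sh       # Stage 1: pretraining
bash finetune.sh       # Stage 2: tuning
bash finetune_moe.sh   # Stage 3: MoE-tuning
```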