Model Zoo

Note

  • For all pretraining and finetuning, we adopt sparse/uniform sampling.
  • #Frame $=$ #input_frame $\times$ #crop $\times$ #clip
  • #input_frame is the number of frames fed to the model per inference.
  • #crop is the number of spatial crops (e.g., 3 for left/right/center).
  • #clip is the number of temporal clips (e.g., 4 means repeatedly sampling four clips with different start indices).
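
As a rough illustration of the notation above, the following sketch expands a `#Frame` entry such as `8x3x4` into its total frame count and shows one possible way to draw several sparse clips with different start offsets. The function names (`parse_frame_spec`, `total_frames`, `sparse_clip_indices`) and the offset scheme are assumptions made for this example; they are not taken from the repository's data loader, which may sample differently.

```python
from typing import List, Tuple


def parse_frame_spec(spec: str) -> Tuple[int, int, int]:
    """Split a '#Frame' entry such as '8x3x4' into (input_frames, crops, clips)."""
    input_frames, crops, clips = (int(part) for part in spec.lower().split("x"))
    return input_frames, crops, clips


def total_frames(spec: str) -> int:
    """#Frame = #input_frame x #crop x #clip."""
    input_frames, crops, clips = parse_frame_spec(spec)
    return input_frames * crops * clips


def sparse_clip_indices(video_len: int, input_frames: int, clips: int) -> List[List[int]]:
    """Hypothetical sparse sampler (an assumption, not the repository's code).

    The video is split into `input_frames` equal segments and one frame is
    taken per segment. Each clip uses a different fractional offset inside
    the segment, so the clips start at different indices.
    """
    segment = video_len / input_frames
    all_clips = []
    for clip in range(clips):
        offset = (clip + 0.5) / clips  # position inside each segment, in (0, 1)
        indices = [
            min(video_len - 1, int(segment * (i + offset)))
            for i in range(input_frames)
        ]
        all_clips.append(indices)
    return all_clips


if __name__ == "__main__":
    print(total_frames("8x3x4"))  # 96
    for clip_indices in sparse_clip_indices(video_len=64, input_frames=8, clips=4):
        print(clip_indices)
```

Under this reading, `8x3x4` means 8 frames per forward pass, evaluated over 3 spatial crops and 4 temporal clips, for 96 frames per video in total.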

Pretraining

| Model | Setting | Model | Shell |
| --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash-1.1M 300e | 🤗 HF link | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash-2M 300e | TBD | run.sh |

Distillation

| Model | Setting | Teacher | Model | Shell |
| --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{dist}$-S/14 | K-Mash-1.1M 100e | $\text{InternVideo2}_{s2}$-1B | 🤗 HF link | run.sh |
| $\text{InternVideo2}_{dist}$-B/14 | K-Mash-1.1M 100e | $\text{InternVideo2}_{s2}$-1B | 🤗 HF link | run.sh |
| $\text{InternVideo2}_{dist}$-L/14 | K-Mash-1.1M 100e | $\text{InternVideo2}_{s2}$-1B | 🤗 HF link | run.sh |

Finetuning

K710

| Model | Setting | #Frame | Top-1 | Model | Shell |
| --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT | 8x3x4 | 87.6 | 🤗 HF link | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT | 8x3x4 | 88.1 | TBD | run.sh |
| $\text{InternVideo2}_{dist}$-S/14 | K-Mash PT | 8x3x4 | 79.6 | 🤗 HF link | run.sh |
| $\text{InternVideo2}_{dist}$-B/14 | K-Mash PT | 8x3x4 | 83.5 | 🤗 HF link | run.sh |
| $\text{InternVideo2}_{dist}$-L/14 | K-Mash PT | 8x3x4 | 86.2 | 🤗 HF link | run.sh |

K400

| Model | Setting | #Frame | Top-1 | Model | Shell |
| --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 8x3x4 | 91.3 | 🤗 HF link | run.sh |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 16x3x4 | 91.6 | 🤗 HF link | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 8x3x4 | 91.9 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 16x3x4 | 92.1 | TBD | run.sh |
| $\text{InternVideo2}_{dist}$-S/14 | K-Mash PT + K710 FT | 8x3x4 | 85.4 | 🤗 HF link | run.sh |
| $\text{InternVideo2}_{dist}$-B/14 | K-Mash PT + K710 FT | 8x3x4 | 88.4 | 🤗 HF link | run.sh |
| $\text{InternVideo2}_{dist}$-L/14 | K-Mash PT + K710 FT | 8x3x4 | 90.4 | 🤗 HF link | run.sh |

K600

| Model | Setting | #Frame | Top-1 | Model | Shell |
| --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 8x3x4 | 91.4 | 🤗 HF link | run.sh |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 16x3x4 | 91.6 | 🤗 HF link | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 8x3x4 | 91.7 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 16x3x4 | 91.9 | TBD | run.sh |
| $\text{InternVideo2}_{dist}$-S/14 | K-Mash PT + K710 FT | 8x3x4 | 86.0 | 🤗 HF link | run.sh |
| $\text{InternVideo2}_{dist}$-B/14 | K-Mash PT + K710 FT | 8x3x4 | 88.9 | 🤗 HF link | run.sh |
| $\text{InternVideo2}_{dist}$-L/14 | K-Mash PT + K710 FT | 8x3x4 | 90.6 | 🤗 HF link | run.sh |

K700

| Model | Setting | #Frame | Top-1 | Model | Shell |
| --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 8x3x4 | 85.0 | 🤗 HF link | run.sh |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT | 16x3x4 | 85.4 | 🤗 HF link | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 8x3x4 | 85.7 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT | 16x3x4 | 85.9 | TBD | run.sh |
| $\text{InternVideo2}_{dist}$-S/14 | K-Mash PT + K710 FT | 8x3x4 | 75.7 | 🤗 HF link | run.sh |
| $\text{InternVideo2}_{dist}$-B/14 | K-Mash PT + K710 FT | 8x3x4 | 80.5 | 🤗 HF link | run.sh |
| $\text{InternVideo2}_{dist}$-L/14 | K-Mash PT + K710 FT | 8x3x4 | 83.5 | 🤗 HF link | run.sh |

MiT V1

| Model | Setting | #Frame | Top-1 | Model | Shell |
| --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT + K710 FT + K400 FT | 8x3x4 | 50.8 | 🤗 HF link | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT + K400 FT | 8x3x4 | 51.0 | TBD | run.sh |
| $\text{InternVideo2}_{s1}$-6B 336↑ | K-Mash PT + K710 FT + K400 FT | 8x3x4 | 51.2 | TBD | run.sh |

SthSth V1

| Model | Setting | #Frame | Top-1 | Model | Shell |
| --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT | 8x3x4 | 68.5 | 🤗 HF link | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT | 8x3x4 | 69.7 | TBD | run.sh |

SthSth V2

| Model | Setting | #Frame | Top-1 | Model | Shell |
| --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-1B | K-Mash PT | 8x3x4 | 77.1 | 🤗 HF link | run.sh |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT | 8x3x4 | 77.5 | TBD | run.sh |
| $\text{InternVideo2}_{dist}$-S/14 | K-Mash PT | 8x3x4 | 71.6 | 🤗 HF link | run.sh |
| $\text{InternVideo2}_{dist}$-B/14 | K-Mash PT | 8x3x4 | 73.5 | 🤗 HF link | run.sh |
| $\text{InternVideo2}_{dist}$-L/14 | K-Mash PT | 8x3x4 | 76.4 | 🤗 HF link | run.sh |

ANet

| Model | Setting | #Frame | Top-1 | mAP | Model | Shell |
| --- | --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT + K400 FT | 8x3x4 | 95.9 | 98.2 | TBD | run.sh |

HACS

| Model | Setting | #Frame | Top-1 | mAP | Model | Shell |
| --- | --- | --- | --- | --- | --- | --- |
| $\text{InternVideo2}_{s1}$-6B | K-Mash PT + K710 FT + K400 FT | 8x3x4 | 97.0 | 98.8 | TBD | run.sh |