As a simple baseline, we train a video encoder with a video-text contrastive loss on ego-only, exo-only, and joint ego-exo data, and evaluate the resulting models' cross-view association abilities.
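The training objective can be sketched as a symmetric InfoNCE loss over paired video and text embeddings. The snippet below is a minimal plain-Python illustration on pre-computed, L2-normalized embeddings; the temperature value, function name, and input format are assumptions, not the repository's actual implementation:

```python
import math

def contrastive_loss(video_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE sketch: video_embs[i] and text_embs[i] are
    assumed to be an L2-normalized matching pair. Illustration only."""
    n = len(video_embs)
    # Cosine-similarity matrix, scaled by temperature.
    sims = [[sum(v * t for v, t in zip(video_embs[i], text_embs[j])) / temperature
             for j in range(n)] for i in range(n)]

    def cross_entropy(row, target):
        # Numerically stable log-softmax cross-entropy for one row.
        m = max(row)
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        return log_sum - row[target]

    # Video-to-text and text-to-video directions, averaged.
    v2t = sum(cross_entropy(sims[i], i) for i in range(n)) / n
    t2v = sum(cross_entropy([sims[j][i] for j in range(n)], i) for i in range(n)) / n
    return 0.5 * (v2t + t2v)
```

With correctly paired embeddings the loss is near zero; mismatched pairs raise it.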
Install PyTorch and the other dependencies. We use torch==1.13.1+cu117; other PyTorch versions may also work.
pip install -r requirement.txt
Open ./configs/our_default.yml and replace the following paths with your own:
ego_root: /path/to/your/egodata/
exo_root: /path/to/your/exodata/
metapath: /path/to/your/annotations/
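To illustrate what these entries look like once parsed, here is a minimal reader for flat `key: value` lines; the real config is YAML and is handled by the codebase's own loader, so this parser and its name are hypothetical:

```python
def load_paths(config_path):
    """Hypothetical minimal parser for flat `key: value` config lines.
    The actual codebase parses the full YAML file; this is a sketch."""
    paths = {}
    with open(config_path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and ":" in line:
                key, _, value = line.partition(":")
                paths[key.strip()] = value.strip()
    return paths
```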
The annotation files include:
├── annotations/
│ ├── ego_train.csv
│ ├── exo_train.csv
│ ├── association_val_hard.json
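As a sketch of how these annotation files might be consumed: the CSVs pair clips with narrations and the JSON holds the association ground truth. The column names (`clip_id`, `narration`) and JSON structure below are assumptions for illustration, not the released schema:

```python
import csv
import json

def load_narrations(csv_path, id_col="clip_id", text_col="narration"):
    """Map clip id -> narration text from a training CSV.
    Column names are assumptions; check the released files."""
    with open(csv_path, newline="") as f:
        return {row[id_col]: row[text_col] for row in csv.DictReader(f)}

def load_association_pairs(json_path):
    """Load the ego-exo association ground truth (structure assumed)."""
    with open(json_path) as f:
        return json.load(f)
```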
- (Optional) Download the Ego4D-pretrained checkpoint from LaViLA, then point the config file (e.g. ./configs/train_egoonly.yml) to it:
resume: /path/to/your/pretrained_checkpoint/
- Train the model with ego-only data
python main.py --config ./configs/train_egoonly.yml
- Train the model with exo-only data
python main.py --config ./configs/train_exoonly.yml
- Co-train the model with ego-exo data
python main.py --config ./configs/train_egoexo.yml
By default, checkpoints are saved in the './exps/' folder; you can change this in the config file (e.g. ./configs/train_egoonly.yml) by setting:
output: /path/to/your/output_folder/
- Modify the resumed checkpoint path in ./configs/test.yml:
resume: /path/to/your/trained_checkpoint/
- Test the model
python main.py --config ./configs/test.yml
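Cross-view association can be scored, for example, as recall@1 over the similarity matrix between ego and exo clip embeddings. The helper below is an illustrative sketch assuming index-aligned ground-truth pairs, not the repository's evaluation code:

```python
def association_recall_at_1(ego_embs, exo_embs):
    """Fraction of ego clips whose most similar exo clip (by dot
    product) is the ground-truth pair at the same index. Sketch only:
    assumes index-aligned pairs and pre-computed embeddings."""
    n = len(ego_embs)
    correct = 0
    for i in range(n):
        sims = [sum(a * b for a, b in zip(ego_embs[i], exo_embs[j]))
                for j in range(n)]
        # Nearest exo neighbor of ego clip i.
        if max(range(n), key=sims.__getitem__) == i:
            correct += 1
    return correct / n
```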
The codebase is based on LaViLA. We thank the authors for their efforts.
If you have any questions, feel free to contact Jilan Xu (18210240039@fudan.edu.cn).