Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning

Pytorch code for Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning, CVPR2022.

Requirements

# create conda env and install packages
conda create -y --name carl python=3.7.9

conda activate carl
# The code is tested on cuda10.1-cudnn7 and pytorch 1.6.0
conda install -y pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch
conda install -y conda-build ipython pandas scipy pip av -c conda-forge

# install pip packages
pip install --upgrade pip
pip install -r requirements.txt

Preparing Data

Create a directory to store datasets:

mkdir /home/username/datasets

Download pre-processed datasets

Download the Pouring dataset at pouring
Download the PennAction dataset at penn_action
Download the FineGym dataset at finegym

BaiduCloud: https://pan.baidu.com/s/1Vu9Qkiei-O10tcdCJAwaHA password: 7rbo

(Due to my limited storage, the link for finegym on google drive is expired. Only BaiduCloud link is avaliable now.)

(Optionally) Pre-process datasets by yourself

Download Pouring using the script

sh dataset_preparation/download_pouring_data.sh
python dataset_preparation/tfrecords_to_videos.py

Download the original Penn Action dataset and label files. Run the preparation script:

python dataset_preparation/penn_action_to_tfrecords.py
python dataset_preparation/tfrecords_to_videos.py

Download the FineGym dataset from the official web FineGym. Contact that author to get raw videos or using the youtube-dl script in download_finegym_videos.py.

Run the preparation script:

python dataset_preparation/finegym_process.py

We trim the raw video based on the event time-stamps in finegym_annotation_info_v1.0.json. Each event video is standardized to 640x360 resolution and 25 fps. We train the model on the event videos containing at least one sub-action. For further research, we also provide the event videos without sub-action labeled in additional_processed_videos.

Download ResNet50 pretrained with BYOL

Our ResNet50 beckbone is initialized with the weights trained by BYOL.

Download the pretrained weight at pretrained_models, and place it at /home/username/datasets/pretrained_models.

Training

Check ./configs directory to see all config settings.

Training on Pouring

Start training, assuming your machine only have one GPUs (if you have 4 GPUs, set --nproc_per_node 4):

python -m torch.distributed.launch --nproc_per_node 1 train.py --workdir ~/datasets --cfg_file ./configs/scl_transformer_config.yml --logdir ~/tmp/scl_transformer_logs

The config can be changed by adding --opt TRAIN.BATCH_SIZE 1 TRAIN.MAX_EPOCHS 500

Check the file utils/config.py to see all config options.

We use “automatic mixed precision training” by default, but it sometimes causes the 'nan' gradient error. If you encounter this error, set --opt USE_AMP false.

Training on PennAction

python -m torch.distributed.launch --nproc_per_node 1 train.py --workdir ~/datasets --cfg_file ./configs/scl_transformer_action_config.yml --logdir ~/tmp/scl_transformer_action_logs

Training on FineGym

python -m torch.distributed.launch --nproc_per_node 1 train.py --workdir ~/datasets --cfg_file ./configs/scl_transformer_finegym_config.yml --logdir ~/tmp/scl_transformer_finegym_logs

Tips: The default number of data worker is 4, which might causes CPU overloaded for some Machines. In this case, you can set --opt DATA.NUM_WORKERS 1.

Pretraining on Kinetics400

Download K400 dataset from https://github.com/cvdfoundation/kinetics-dataset

python -m torch.distributed.launch --nproc_per_node 1 train.py --workdir ~/datasets --cfg_file ./configs/scl_transformer_k400_pretrain_config.yml --logdir ~/tmp/scl_transformer_k400_pretrain_logs

Checkpoints

We provide the checkpoints trained by our CARL method at

scl_transformer_logs (for Pouring)
scl_transformer_action_logs (for PennAction)
scl_transformer_finegym_logs (for FineGym). In this checkpoint, we also provide the extracted frame-wise representations of videos in FineGym.
scl_transformer_k400_pretrain_logs (the model pretrained on K400 by our CARL)

Place these checkpoints at /home/username/tmp to evaluate them.

Evaluation and Visualization

Start evaluation.

python -m torch.distributed.launch --nproc_per_node 1 evaluate.py --workdir ~/datasets --cfg_file ./configs/scl_transformer_config.yml --logdir ~/tmp/scl_transformer_logs

Tensorboard.

tensorboard --logdir=~/tmp/scl_transformer_logs

The video file of video alignment have already generated at /home/username/tmp/scl_transformer_logs

Citation

@inproceedings{chen2022framewise,
      title={Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning}, 
      author={Minghao Chen and Fangyun Wei and Chong Li and Deng Cai},
      booktitle={CVPR},
      year={2022}
}

Acknowledgment

The training setup code was modified from https://github.com/google-research/google-research/tree/master/tcc

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
algos		algos
configs		configs
dataset_preparation		dataset_preparation
datasets		datasets
evaluation		evaluation
models		models
utils		utils
.gitignore		.gitignore
Framework.png		Framework.png
LICENSE		LICENSE
README.md		README.md
evaluate.py		evaluate.py
evaluate_finegym.py		evaluate_finegym.py
pennaction_alignment.gif		pennaction_alignment.gif
requirements.txt		requirements.txt
train.py		train.py
visualize_alignment.py		visualize_alignment.py
visualize_retrieval.py		visualize_retrieval.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning

Requirements

Preparing Data

Download pre-processed datasets

(Optionally) Pre-process datasets by yourself

Download ResNet50 pretrained with BYOL

Training

Training on Pouring

Training on PennAction

Training on FineGym

Pretraining on Kinetics400

Checkpoints

Evaluation and Visualization

Citation

Acknowledgment

About

Releases

Packages

Languages

License

minghchen/CARL_code

Folders and files

Latest commit

History

Repository files navigation

Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning

Requirements

Preparing Data

Download pre-processed datasets

(Optionally) Pre-process datasets by yourself

Download ResNet50 pretrained with BYOL

Training

Training on Pouring

Training on PennAction

Training on FineGym

Pretraining on Kinetics400

Checkpoints

Evaluation and Visualization

Citation

Acknowledgment

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages