Progressive Video Summarization via Multimodal Self-supervised Learning (SSPVS)

Haopeng Li, Qiuhong Ke, Mingming Gong, Tom Drummond

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023

Introduction

We propose a multimodal self-supervised learning framework to obtain semantic representations of videos, which benefits the video summarization task.

Specifically, the self-supervised learning is conducted by exploring the semantic consistency between the videos and text in both coarse-grained and fine-grained fashions, as well as recovering masked frames in the videos.

The multimodal framework is trained on a newly-collected dataset that consists of video-text pairs.

Additionally, we introduce a progressive video summarization method, where the important content in a video is pinpointed progressively to generate better summaries.

Requirements and Dependencies

python=3.8.13
pytorch=1.12
ortools=9.3.10497
pytorch-lightning=1.6.5
pytorch-transformers=1.2.0
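
As a quick sanity check before training, the pinned versions above can be compared against the current environment. The sketch below is only illustrative: the helper name `check_versions` is hypothetical, and the distribution names are assumptions (for example, PyTorch is looked up under its PyPI name `torch`).

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the list above. The keys are assumed distribution names
# (PyTorch is published on PyPI as "torch").
PINNED = {
    "torch": "1.12",
    "ortools": "9.3.10497",
    "pytorch-lightning": "1.6.5",
    "pytorch-transformers": "1.2.0",
}


def check_versions(pinned):
    """Return {name: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Accept an exact match, a patch release ("1.12" -> "1.12.1"),
        # or a local build tag ("1.12+cu113").
        ok = installed is not None and (
            installed == wanted
            or installed.startswith((wanted + ".", wanted + "+"))
        )
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions(PINNED).items():
        print(f"{name}: installed={installed} ok={ok}")
```

A package that is not installed is reported as `(None, False)` rather than raising, so the whole list can be checked in one pass.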

Self-supervised Pretraining

Download the pretrained model to the root directory.

OR

Follow the steps below to train the self-supervised model.

Data Preparation

Download the visual features and text information embeddings of the YTVT dataset and uncompress them to ssl/features/ and ssl/info_embed/, respectively.
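
To confirm that both archives were uncompressed to the expected locations, a small check such as the following may help. It is a sketch only: `count_files` is a hypothetical helper, and nothing is assumed about the file names or formats inside the two directories.

```python
from pathlib import Path

# Directory names taken from the step above, relative to the repository root.
EXPECTED_DIRS = ("ssl/features", "ssl/info_embed")


def count_files(root, subdirs=EXPECTED_DIRS):
    """Map each expected subdirectory to its file count, or None if missing."""
    counts = {}
    for sub in subdirs:
        path = Path(root) / sub
        if path.is_dir():
            # Count regular files at any depth below the directory.
            counts[sub] = sum(1 for p in path.rglob("*") if p.is_file())
        else:
            counts[sub] = None
    return counts


if __name__ == "__main__":
    for sub, n in count_files(".").items():
        print(f"{sub}: {'MISSING' if n is None else str(n) + ' files'}")
```

Run it from the repository root; a `MISSING` entry or a count of zero suggests the archive was extracted to the wrong place.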

Self-supervised Pretraining

Run the following command in ssl/ to train the self-supervised model:

$ CUDA_VISIBLE_DEVICES=0,1 python main_ssl.py --config ssl.yaml

The trained model is saved in ssl/results/SSL/checkpoints/.
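
If several checkpoints accumulate in that directory, the most recent one can be located programmatically. The sketch below assumes checkpoints carry the `.ckpt` extension that PyTorch Lightning uses by default; the helper name `latest_checkpoint` is hypothetical and not part of this repository.

```python
from pathlib import Path


def latest_checkpoint(ckpt_dir="ssl/results/SSL/checkpoints", pattern="*.ckpt"):
    """Return the most recently modified file matching pattern, or None."""
    directory = Path(ckpt_dir)
    if not directory.is_dir():
        return None
    candidates = [p for p in directory.glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # Modification time is used as a proxy for "latest"; it does not
    # say anything about which checkpoint performs best.
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(latest_checkpoint())
```

The default path is relative to the repository root; pass `ckpt_dir="results/SSL/checkpoints"` when running from inside ssl/.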

Progressive Video Summarization

Data Preparation

Download the data and uncompress it to data/.

Training and Evaluation of Video Summarization

Run the following command in the root directory to train the video summarization model:

$ sh main.sh CFG_FILE

where CFG_FILE is a configuration file (*.yaml) for different settings. We provide several configuration files in cfgs/. Here is an example for training the model on SumMe in the augmented setting:

$ sh main.sh cfgs/sm_a.yaml

If you pretrain the model yourself, set resume in CFG_FILE to the path of the model saved in ssl/results/SSL/checkpoints/. The results of video summarization are recorded in records.csv.
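
Pointing resume at a self-trained checkpoint can also be scripted. The following is a minimal sketch that assumes resume appears as an unindented `resume: ...` line in the configuration file; the actual layout of the files in cfgs/ may differ, and the helper `set_resume` and the checkpoint file name in the usage example are hypothetical. It edits the text directly so that no YAML library is needed, and any trailing comment on that line is dropped.

```python
import re


def set_resume(cfg_text, ckpt_path):
    """Return cfg_text with the top-level `resume:` value replaced.

    Assumes `resume` is an unindented key on its own line.
    Raises KeyError if no such line exists.
    """
    pattern = re.compile(r"^resume\s*:.*$", flags=re.MULTILINE)
    # A function replacement avoids backslash-escape surprises in the path.
    new_text, n = pattern.subn(lambda m: f"resume: {ckpt_path}", cfg_text, count=1)
    if n == 0:
        raise KeyError("no top-level 'resume' key found")
    return new_text


if __name__ == "__main__":
    example = "lr: 0.001\nresume: pretrained.ckpt\n"  # made-up config text
    print(set_resume(example, "ssl/results/SSL/checkpoints/my_model.ckpt"))
```

In practice, read the chosen CFG_FILE, pass its contents through `set_resume`, and write the result to a new file so the provided configuration stays untouched.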

Source Data

We provide the original videos and text information of YTVT here. We also provide the re-collected text information of SumMe and TVSum here.

License and Citation

The use of this code is RESTRICTED to non-commercial research and educational purposes.

If you use this code or reference our paper in your work, please cite this publication as:

@inproceedings{li2023progressive,
  title={Progressive Video Summarization via Multimodal Self-supervised Learning},
  author={Li, Haopeng and Ke, Qiuhong and Gong, Mingming and Drummond, Tom},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={5584--5593},
  year={2023}
}

Acknowledgement

The code is developed based on VASNet.

