Haopeng Li, Qiuhong Ke, Mingming Gong, Rui Zhang
IEEE Transactions on Pattern Analysis and Machine Intelligence
We propose Video Joint Modelling based on Hierarchical Transformer (VJMHT) for co-summarization, which takes into account the semantic dependencies across videos.
VJMHT consists of two layers of Transformer: the first layer extracts the semantic representation of individual shots of similar videos, while the second layer performs shot-level video joint modelling to aggregate cross-video semantic information. In this way, complete cross-video high-level patterns are explicitly modelled and learned for the summarization of individual videos.
Moreover, Transformer-based video representation reconstruction is introduced to maximize the high-level similarity between the summary and the original video.
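As a rough illustration of this two-layer design, here is a minimal PyTorch sketch. It is not the authors' implementation: the module names, mean-pooling of shot features, and feature dimension are assumptions.

```python
import torch
import torch.nn as nn


def _encoder(d_model, n_heads, n_layers):
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                       batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=n_layers)


class HierarchicalTransformerSketch(nn.Module):
    """Illustrative two-layer hierarchy: intra-shot, then cross-video."""

    def __init__(self, feat_dim=1024, n_heads=8, n_layers=2):
        super().__init__()
        # First layer: encodes the frames of one shot into a shot vector.
        self.shot_encoder = _encoder(feat_dim, n_heads, n_layers)
        # Second layer: attends across the shot vectors gathered from several
        # similar videos (the "video joint modelling" step).
        self.joint_encoder = _encoder(feat_dim, n_heads, n_layers)
        self.scorer = nn.Linear(feat_dim, 1)  # per-shot importance score

    def forward(self, videos):
        # videos: list of videos; each video is a list of shots,
        # each shot a (n_frames, feat_dim) tensor of frame features.
        shot_vecs = []
        for shots in videos:
            for frames in shots:
                h = self.shot_encoder(frames.unsqueeze(0))  # (1, n, d)
                shot_vecs.append(h.mean(dim=1))             # mean-pool -> (1, d)
        joint = torch.cat(shot_vecs).unsqueeze(0)           # (1, n_shots, d)
        joint = self.joint_encoder(joint)                   # cross-video attention
        return self.scorer(joint).squeeze(-1)               # (1, n_shots)


# Toy usage: two "similar" videos with random frame features.
model = HierarchicalTransformerSketch()
video_a = [torch.randn(32, 1024) for _ in range(5)]  # 5 shots of 32 frames
video_b = [torch.randn(32, 1024) for _ in range(4)]
scores = model([video_a, video_b])  # shot scores across both videos
```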
- Python=3.8.5
- PyTorch=1.9
- ortools=8.1.8487
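One possible way to install the dependencies (the exact command is an assumption; adjust the PyTorch build for your CUDA version):

```
$ pip install torch==1.9.0 ortools==8.1.8487
```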
Download the datasets to `datasets/`.
Download our models to `results/`.
Run the following command to test our models.
```
$ python main.py -c configs/dataset_setting.py --eval
```
where `dataset_setting.py` is the configuration file that can be found in `configs/`. The results are saved in `results/DATASET_SETTING/`.
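For reference, a configuration file in `configs/` is a Python module read by `main.py`. The sketch below is purely hypothetical; the field names and values are illustrative assumptions, not the actual contents of the repository's configs.

```python
# Hypothetical config sketch -- field names are assumptions, not the
# repository's actual configuration schema.
dataset = 'TVSum'                # dataset name
setting = 'canonical'            # evaluation setting
data_dir = 'datasets/'           # where the downloaded datasets live
save_dir = 'results/TVSUM_CAN/'  # where checkpoints and results are written
```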
Example for testing the model trained on TVSum in the canonical setting:
```
$ python main.py -c configs/tvsum_can.py --eval
```
The results are saved in `results/TVSUM_CAN`.
Run the following command to train the model:
```
$ python main.py -c configs/dataset_setting.py
```
Example for training the model on TVSum in the canonical setting:
```
$ python main.py -c configs/tvsum_can.py
```
The trained models and results are saved in `results/TVSUM_CAN`.
The use of this code is RESTRICTED to non-commercial research and educational purposes.
If you use this code or reference our paper in your work, please cite this publication as:
```
@article{li2022video,
  title={Video Joint Modelling Based on Hierarchical Transformer for Co-summarization},
  author={Li, Haopeng and Ke, Qiuhong and Gong, Mingming and Zhang, Rui},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2022},
  publisher={IEEE}
}
```
The code is developed based on VASNet.