SceneSeg
Movie scene segmentation based on the CVPR 2020 paper *A Local-to-Global Approach to Multi-modal Movie Scene Segmentation*. Related links:
- A Local-to-Global Approach to Multi-modal Movie Scene Segmentation
- Code of the original SceneSeg (on which this repo is heavily based)
- MovieNet
- MovieNet-tools
- MovieNet-tools-doc
We use two common metrics:
- mAP -- the mean Average Precision of scene-transition predictions, computed per movie.
- Miou -- for each ground-truth scene, take the maximum intersection-over-union with the detected scenes and average over the whole video; then do the same for detected scenes against ground-truth scenes, and average the two quantities. Intersection and union are measured by counting frames.
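A minimal sketch of the Miou computation described above, with scenes as inclusive `(start_frame, end_frame)` tuples (names here are illustrative, not from this repo):

```python
def frame_iou(a, b):
    """IoU of two scenes, counting frames; scenes are inclusive (start, end)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union

def miou(gt_scenes, det_scenes):
    # max IoU of each ground-truth scene against the detections, averaged...
    gt2det = sum(max(frame_iou(g, d) for d in det_scenes) for g in gt_scenes) / len(gt_scenes)
    # ...then the symmetric direction, detections against ground truth
    det2gt = sum(max(frame_iou(d, g) for g in gt_scenes) for d in det_scenes) / len(det_scenes)
    return (gt2det + det2gt) / 2
```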
| Setting | AP | mAP | Miou | Recall |
| --- | --- | --- | --- | --- |
| Pre-trained model, without audio | 0.444 | 0.450 | 0.476 | 0.744 |
| Trained on 56 videos, Place feature only | 0.401 | 0.405 | 0.465 | 0.567 |
Please refer to the install guide in the original SceneSeg repo.
├── config # config files location
├── data # data root for experiments
├── pre # extract features from video
├── run # place to store experiments
├── src # models and feature loading
├── utilis # useful functions
├── QuickExec.ipynb # run the scripts below on Google Colab
├── test.py # evaluate the pretrained model, optionally per feature
├── train.py # training entry point
├── EDA.py # preview *.pkl and *.npy feature files
├── Extract_Features.py # unpack pre-packed features into the original LGSS format
- I use 64 videos in total, with train:val:test = 48:8:8; the data split is located in `./data/meta/split.json`.
- Features can be loaded through `./src/data/all.py`.
- If you want free GPU resources, use `QuickExec.ipynb` on Google Colab to execute the Python scripts:
  - `train.py` for training (nothing special).
  - `test.py` lets you choose which features of the pretrained model to use (useful if you are only curious about specific features).
  - `EDA.py` provides a preview of `*.pkl` and `*.npy` files.
  - `Extract_Features.py` extracts the pre-packed features into the original LGSS repo's format.
Unfortunately, this last part is not covered by the config file.
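For a quick look at a feature file without going through `EDA.py`, something like this works (the paths are placeholders, not files shipped with this repo):

```python
import pickle

import numpy as np

with open("data/features/some_movie.pkl", "rb") as f:
    feat = pickle.load(f)       # typically a dict or array of per-shot features
print(type(feat))

arr = np.load("data/features/some_movie.npy")
print(arr.shape, arr.dtype)     # e.g. (num_shots, feature_dim)
```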
The model is located in `./src/models/lgss.py`. Modify `LGSSone` to specify the model structure for different features. For example, `audio` and `cast` contribute very little to overall performance, especially when the dataset is small; in that case, change `BNet` to `BNet_lite` or `BNet_aud_lite` to speed up training. I actually found that `BNet` performs terribly on a small dataset for `cast` and `audio`; you can further reduce those features' contribution via `cfg.model.ratio` in the config file.
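A hedged sketch of the relevant config piece (the key follows the `cfg.model.ratio` mentioned above, but the exact keys and values in this repo may differ):

```python
# illustrative values only -- shrink the cast/audio weights on small datasets
model = dict(
    name="LGSS",
    mode=["place", "cast", "act", "aud"],  # feature order assumed here
    ratio=[0.8, 0.1, 0.05, 0.05],          # per-feature weight in the final sum
)
```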
I believe there is a better way to aggregate the different features than simply summing them up, something similar to an attention layer? I will update the model once I have an idea lol.
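Not this repo's method -- just a sketch of what that attention-style fusion could look like (all names here are made up):

```python
import torch
import torch.nn as nn

class AttnFusion(nn.Module):
    """Learn per-boundary weights over the per-feature embeddings
    instead of summing them with fixed ratios."""

    def __init__(self, dim=512):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):                       # feats: (batch, num_feats, dim)
        w = torch.softmax(self.score(feats), dim=1) # (batch, num_feats, 1)
        return (w * feats).sum(dim=1)               # (batch, dim)
```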
Feature Extraction
Feature extraction has already been aggregated into MovieNet-tools (yehhhh); the source code is worth reading, though it's not the focus of this repo. Here is how the features are extracted:
- Place
  - ResNet50
- Cast
  - Faster-RCNN on the CIM dataset to detect cast instances
  - ResNet50 on PIPA to extract features
- Action
  - TSN on the AVA dataset
- Audio
  - NaverNet on the AVA-ActiveSpeaker dataset to separate speech from background
  - STFT to get the features of each stream within a shot
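A rough sketch of the STFT step, assuming librosa and 16 kHz mono audio per shot (this is not the exact MovieNet-tools pipeline):

```python
import librosa
import numpy as np

# load one shot's audio track; the filename is a placeholder
y, sr = librosa.load("shot_0001.wav", sr=16000, mono=True)

# short-time Fourier transform -> magnitude spectrogram (freq_bins, frames)
spec = np.abs(librosa.stft(y, n_fft=512, hop_length=256))

# log-magnitude spectrogram as the shot's audio feature
log_spec = np.log1p(spec)
```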
BNet (Boundary Net)
Input: 2 * w_b shots, i.e., the w_b shots before and the w_b shots after a boundary. BNet has two parts (see the sketch after this list):
- B_d: d stands for 'difference'
  - two convolution layers (one over the shots before, one over the shots after) + an inner product operation to calculate their difference
- B_r: r stands for 'relationship'
  - one convolution layer + max pooling
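A minimal PyTorch sketch of that structure; layer sizes are illustrative, and the real definitions live in `./src/models/lgss.py`:

```python
import torch
import torch.nn as nn

class BNetSketch(nn.Module):
    def __init__(self, dim=2048, channel=512, wb=4):
        super().__init__()
        # B_d: one conv over the shots before the boundary, one over those after
        self.conv_before = nn.Conv1d(dim, channel, kernel_size=wb)
        self.conv_after = nn.Conv1d(dim, channel, kernel_size=wb)
        # B_r: a single conv + max pooling over all 2 * wb shots
        self.conv_rel = nn.Conv1d(dim, channel, kernel_size=1)

    def forward(self, shots):                  # shots: (batch, 2 * wb, dim)
        wb = shots.shape[1] // 2
        x = shots.transpose(1, 2)              # (batch, dim, 2 * wb)
        before = self.conv_before(x[:, :, :wb]).squeeze(-1)
        after = self.conv_after(x[:, :, wb:]).squeeze(-1)
        d = before * after                     # inner-product-style difference term
        r = self.conv_rel(x).max(dim=-1).values  # max pool over the shot axis
        return torch.cat([d, r], dim=1)        # boundary embedding
```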
After BNet: Coarse Prediction at the Segment Level
The next step is to predict a binary sequence. Use w_t shots at a time to keep memory usage bounded (see the sketch after this list):
- seq-to-seq model: Bi-LSTM
  - stride: w_t / 2 shots
- returns a coarse score: the probability that each shot boundary is a scene boundary
- coarse prediction: binarize the coarse scores (a list) with a threshold t
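A sketch of that coarse-scoring stage under the same assumptions (boundary embeddings come from the BNet sketch above; sizes are illustrative):

```python
import torch
import torch.nn as nn

class CoarseScorer(nn.Module):
    def __init__(self, in_dim=1024, hidden=512):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2)   # per-boundary binary logits

    def forward(self, x):                      # x: (batch, w_t, in_dim)
        out, _ = self.lstm(x)
        return self.head(out)                  # (batch, w_t, 2)

scorer = CoarseScorer()
window = torch.randn(1, 10, 1024)              # one window of w_t = 10 boundaries
probs = scorer(window).softmax(dim=-1)[..., 1] # coarse score: P(scene boundary)
coarse_pred = probs > 0.5                      # binarize with threshold t
```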
LGSS (Local-to-Global Scene Segmentation)
Get coarse predictions separately from the different features, then simply sum them up (so disappointing).
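That fusion in miniature: a weighted sum of per-feature coarse scores, with weights corresponding to `cfg.model.ratio` (all numbers below are toy values):

```python
ratio = [0.5, 0.2, 0.2, 0.1]   # place, cast, act, aud -- illustrative weights
scores = [0.9, 0.4, 0.5, 0.3]  # per-feature coarse scores for one boundary
final = sum(w * s for w, s in zip(ratio, scores))
is_scene_boundary = final > 0.5  # threshold t
```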