Temporal Pooling for Video Classification

This is experimental code for temporal pooling networks for video classification on the YouTube-8M dataset. The code follows the format of the starter code provided by YouTube for this dataset. I have added my temporal pooling network models to the frame-level models file (frame_level_models.py).

Temporal Pooling Networks

The concept behind temporal pooling networks is simple: instead of stacking RNN layers directly on top of one another, pool the outputs of each layer before feeding them to the next. This has two effects: it shortens the sequence that the next RNN layer has to process, which reduces computation time, and it aggregates temporal information, which is critical for video summarization and classification tasks. Below are two diagrams describing two network structures, one where a pooling operation is inserted between layers, and one where outputs are skipped between layers.
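To make the pooling step concrete, here is a minimal NumPy sketch of pooling a sequence of RNN outputs along the time axis. It is an illustration, not the code in this repository: the function name `temporal_pool`, the "valid" windowing (trailing frames that do not fill a window are dropped), the `"MAX"` option, and the 300-frame input length are all assumptions.

```python
import numpy as np


def temporal_pool(x, pool_size=3, pool_stride=3, pool_type="AVG"):
    """Pool a (batch, time, features) array along the time axis.

    Hypothetical helper for illustration. Windows are 'valid': frames at
    the end that do not fill a complete window are dropped.
    """
    _, time, _ = x.shape
    if time < pool_size:
        raise ValueError("sequence is shorter than the pooling window")

    # Number of complete windows that fit into the sequence.
    num_windows = (time - pool_size) // pool_stride + 1

    # idx[i, j] is the time index of the j-th frame in the i-th window.
    starts = np.arange(num_windows) * pool_stride
    idx = starts[:, None] + np.arange(pool_size)[None, :]

    # Shape: (batch, num_windows, pool_size, features).
    windows = x[:, idx, :]

    if pool_type == "AVG":
        return windows.mean(axis=2)
    if pool_type == "MAX":  # assumed option, not confirmed by this README
        return windows.max(axis=2)
    raise ValueError(f"unknown pool_type: {pool_type}")


# Stand-in for first-layer outputs: 2 videos, 300 frames, 1024 + 128 features.
outputs = np.zeros((2, 300, 1152))
pooled = temporal_pool(outputs, pool_size=3, pool_stride=3)
print(pooled.shape)  # (2, 100, 1152)
```

With a kernel width and stride of 3, the second layer in this sketch sees a sequence one third as long as the first, which is where the reduction in computation comes from.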

Overview of Models

TODO: Add paper reference when completed

TemporalPoolingNetworkModel: Processes the features of each frame using a GRU layer. The outputs of that GRU are then locally pooled and fed into a second GRU layer. The internal states of the GRUs are then fed into a video-level model for classification.
TemporalSkippingNetworkModel: Processes the features of each frame using a GRU layer. The outputs of that GRU are then subsampled with a fixed stride (the outputs in between are skipped) and fed into a second GRU layer. The internal states of the GRUs are then fed into a video-level model for classification.
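The two-layer structure described above can be sketched in NumPy as follows, shown here for the skipping variant. This is a toy under stated assumptions rather than the repository's implementation: `simple_rnn` is a plain tanh RNN standing in for a GRU, `temporal_skip` is a hypothetical helper that keeps every `stride`-th output, and concatenating the final states of both layers is only one plausible way of passing them to a video-level model.

```python
import numpy as np


def simple_rnn(x, w_in, w_rec):
    """Toy tanh RNN standing in for a GRU layer (assumption for illustration).

    x has shape (batch, time, in_dim). Returns the per-step outputs with
    shape (batch, time, hidden) and the final state with shape (batch, hidden).
    """
    batch, time, _ = x.shape
    h = np.zeros((batch, w_rec.shape[0]))
    outputs = []
    for t in range(time):
        h = np.tanh(x[:, t, :] @ w_in + h @ w_rec)
        outputs.append(h)
    return np.stack(outputs, axis=1), h


def temporal_skip(x, stride=3):
    """Keep every `stride`-th time step of a (batch, time, features) array."""
    return x[:, ::stride, :]


rng = np.random.default_rng(0)
frames = rng.normal(size=(2, 12, 8))  # 2 videos, 12 frames, 8 features
w_in1, w_rec1 = 0.1 * rng.normal(size=(8, 4)), 0.1 * rng.normal(size=(4, 4))
w_in2, w_rec2 = 0.1 * rng.normal(size=(4, 4)), 0.1 * rng.normal(size=(4, 4))

# First recurrent layer runs over every frame.
out1, state1 = simple_rnn(frames, w_in1, w_rec1)

# Skip between layers: 12 time steps become 4.
skipped = temporal_skip(out1, stride=3)

# Second recurrent layer runs over the shortened sequence.
out2, state2 = simple_rnn(skipped, w_in2, w_rec2)

# Assumed combination: concatenate both final states as video-level features.
video_features = np.concatenate([state1, state2], axis=1)
print(skipped.shape, video_features.shape)
```

Replacing `temporal_skip` with a windowed average or max over the same stride would give the pooling variant. The difference is that skipping discards the intermediate outputs, whereas pooling summarizes them.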

Example Usage

Below is an example of how to run a temporal pooling network model with average pooling, kernel width of 3, and stride of 3:

python train.py --train_data_pattern='/home/mwoodson/data/train/train*.tfrecord' \
--frame_features=True --model=TemporalPoolingNetworkModel \
--feature_names="rgb, audio" --feature_sizes="1024, 128" \
--lstm_cells=1024 --pool_size=3 --pool_stride=3 --pool_type=AVG \
--learned_pooling=False --train_dir=$MODEL_DIR/test_pool \
--base_learning_rate=0.001 --batch_size=128 --num_epochs=5 \
--start_new_model=True

And the same example but using temporal skipping instead of pooling:

python train.py --train_data_pattern='/home/mwoodson/data/train/train*.tfrecord' \
--frame_features=True --model=TemporalSkippingNetworkModel \
--feature_names="rgb, audio" --feature_sizes="1024, 128" \
--lstm_cells=1024 --pool_size=3 --pool_stride=3 --pool_type=AVG \
--learned_pooling=False --train_dir=$MODEL_DIR/test_pool \
--base_learning_rate=0.001 --batch_size=128 --num_epochs=5 \
--start_new_model=True
