This repository contains the source code for the paper "AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos" by Amlan Kar* (IIT Kanpur), Nishant Rai* (IIT Kanpur), Karan Sikka (UCSD and SRI), and Gaurav Sharma (IIT Kanpur). The code supports multi-GPU training and testing.
- TensorFlow (this repository uses version 0.11)
- Numpy
- skimage
- skvideo
Note: skimage and skvideo are required for the preprocessing step
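
As a quick sanity check before running anything, the sketch below reports which of the packages listed above cannot be imported. It assumes a Python 3 interpreter (for `importlib.util.find_spec`) and that the import names are `tensorflow`, `numpy`, `skimage`, and `skvideo`. The helper `missing_packages` is a hypothetical name and is not part of this repository.

```python
import importlib.util

# Import names of the dependencies listed above (assumed, not taken from the code).
REQUIRED = ["tensorflow", "numpy", "skimage", "skvideo"]


def missing_packages(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only confirms that a package can be located. It does not verify the version, so the TensorFlow 0.11 requirement still needs to be checked separately.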
- Download the UCF-101 dataset from here and the UCF-101 flow files from here
- Download the UCF-101 action recognition splits from here (pass the directory using -split_dir)
- Run the preprocessing script to create the npz files required for training/testing (pass the resulting directory using -data_dir)
- [RGB training] Download the VGG numpy files from here (pass the path using -vgg_npy_path)
- [Optical Flow training] Download the pre-trained Caffe models for flow from here and convert them to numpy files using this tool
- Edit sample_train.sh and run it
- Download the pre-trained models from the links given below
- Download the VGG numpy file for RGB and any one of the flow files, and pass them with -npy_path for testing (this is an extra step that does not change anything; we will remove this unnecessary step soon)
- Edit sample_test.sh and run it
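
Before launching training or testing, it can help to confirm that the preprocessing step wrote readable npz files into the directory passed with -data_dir. The sketch below lists the arrays stored in one such file using only `numpy.load`. The helper `summarize_npz` is a hypothetical name. The array names `frames` and `label` in the demo are placeholders, since the actual layout of the files is defined by the preprocessing script and is not documented here.

```python
import os
import tempfile

import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype_string)} for every array in an .npz file."""
    summary = {}
    with np.load(path) as data:
        for name in data.files:
            array = data[name]
            summary[name] = (array.shape, str(array.dtype))
    return summary


if __name__ == "__main__":
    # Stand-in file with made-up contents; replace demo_path with a file from -data_dir.
    demo_path = os.path.join(tempfile.mkdtemp(), "example.npz")
    np.savez(demo_path, frames=np.zeros((4, 8, 8, 3), dtype=np.uint8), label=np.array(7))
    for name, (shape, dtype) in summarize_npz(demo_path).items():
        print(name, shape, dtype)
```

An empty result or an exception from `np.load` usually indicates that preprocessing did not finish for that file.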
python demo.py -ckpt_file path/to/ckpt/file -vid_file vis/vid_file
This should save a visualization image in vis/.
These models have been trained on UCF-101. We will be releasing the updated models soon.
Sample, self-explanatory train and test scripts (sample_train.sh and sample_test.sh) are provided with the code.
After fixing a bug post-submission, we achieved higher results with the same configuration as in the original paper. We request that authors cite these updated numbers.
Model | UCF-101 | HMDB-51 |
---|---|---|
AdaScan | 91.6 | 62.4 |
AdaScan + iDT | 93.1 | 67.6 |
AdaScan + iDT + C3D | 94.0 | 69.4 |
If you use this code as part of any published research, please acknowledge the following paper:
AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos
Amlan Kar*, Nishant Rai*, Karan Sikka, Gaurav Sharma (*denotes equal contribution)
@inproceedings{kar2016adascan,
title={AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos},
author={Kar, Amlan and Rai, Nishant and Sikka, Karan and Sharma, Gaurav},
booktitle={CVPR},
year={2017}
}