-
Set up a conda environment
conda create -n tgifqa -y
source activate tgifqa
conda install -c conda-forge python=2.7.15 backports.weakref=1.0.post1 -y
conda install -c conda-forge mkl mkl-include mkl-dnn enum34 -y
conda install -c free cudatoolkit=8.0 cudnn=6.0.21 -y
conda install -c anaconda -c conda-forge -c free tensorflow-gpu=1.4.1 tensorflow-tensorboard=0.4.0 backports.weakref=1.0.post1 cudatoolkit=8.0 cudnn=6.0.21 -y
-
Install Python modules
pip install -r requirements.txt
python -m spacy download en_core_web_sm
-
Set up the TGIF-QA dataset and related files in this (HOME/code) folder.
mkdir dataset
mkdir dataset/tgif
cp -r ../dataset dataset/tgif/DataFrame
mkdir dataset/tgif/features dataset/tgif/Vocabulary ../dataset/word_vectors
-
Download the GIF files from the dataset page and extract the zip file into dataset/tgif/gifs.
-
Download crawl-300d-2M.vec from fastText (https://fasttext.cc/docs/en/english-vectors.html) and move it to the HOME/dataset/word_vectors folder.
Note: Since the codebase was implemented and used in 2018, some packages may have received major updates since then, which could cause slight differences in performance.
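A quick way to confirm the layout before moving on is a small check script. The sketch below is not part of the repository; the function name `missing_paths` is hypothetical, and the paths are simply the ones listed in the steps above, taken relative to the HOME/code folder.

```python
import os

# Directories named in the setup steps above, relative to the HOME/code folder.
EXPECTED_DIRS = [
    "dataset/tgif/DataFrame",
    "dataset/tgif/features",
    "dataset/tgif/Vocabulary",
    "dataset/tgif/gifs",
    "../dataset/word_vectors",
]
WORD_VECTOR_FILE = "../dataset/word_vectors/crawl-300d-2M.vec"


def missing_paths(code_dir="."):
    """Return the expected paths that do not exist under code_dir."""
    missing = []
    for rel in EXPECTED_DIRS:
        if not os.path.isdir(os.path.join(code_dir, rel)):
            missing.append(rel)
    if not os.path.isfile(os.path.join(code_dir, WORD_VECTOR_FILE)):
        missing.append(WORD_VECTOR_FILE)
    return missing


if __name__ == "__main__":
    # Print one line per missing path; no output means the layout looks complete.
    for rel in missing_paths():
        print("missing: " + rel)
```

Run it from the HOME/code folder; it only checks that the paths exist, not that their contents are valid.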
-
Download GIF files into your directory.
-
Install ffmpeg.
-
Extract all GIF frames into a separate folder.
./save-frames.sh dataset/tgif/{gifs,frames}
-
If using optical flow, perform this step; otherwise, skip it. Use Farneback's dense optical flow to extract the flow for each GIF, and store it as input to the ResNet in the next step.
If not, extract ResNet-152 and C3D features using the respective pretrained models.
- Extract 'res5c' and 'pool5' for ResNet-152, and 'conv5b' and 'fc6' for C3D.
- If a GIF file contains fewer than 16 frames, append copies of the last frame until it has at least 16 frames.
- When extracting the C3D features, use stride 1, padding with the first frame eight times at the start and with the last frame seven times at the end (SAME padding).
- Wrap the extracted features into one hdf5 file per layer, named 'TGIF_[MODEL]_[layer_name].hdf5' (e.g., TGIF_C3D_fc6.hdf5, TGIF_RESNET_pool5.hdf5, TGIF_ResOF_pool5.hdf5), and save them into 'code/dataset/tgif/features'. For example, the pool5 and res5c features need to be stored in different hdf5 files. Each feature file should be a dictionary that uses the 'key' field of each dataset file as its key and a numpy array of extracted features with shape (#frames, feature dimension) as its value.
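The frame padding rules above can be sketched in plain Python. This is an illustration only, not the code used to produce the released features: the function names are hypothetical, frames are treated as opaque list items, and the clip length of 16 is an assumption inferred from the 8 + 7 padding.

```python
MIN_FRAMES = 16  # minimum number of frames per GIF, as stated above
CLIP_LEN = 16    # assumed C3D clip length (8 padded + 1 centre + 7 padded)


def pad_to_min_length(frames, min_len=MIN_FRAMES):
    """Repeat the last frame until the sequence has at least min_len items."""
    frames = list(frames)
    if not frames:
        raise ValueError("a GIF needs at least one frame")
    return frames + [frames[-1]] * max(0, min_len - len(frames))


def c3d_clips(frames, pad_front=8, pad_back=7, clip_len=CLIP_LEN):
    """Return stride-1 clips over the sequence padded with its end frames."""
    frames = pad_to_min_length(frames)
    padded = [frames[0]] * pad_front + frames + [frames[-1]] * pad_back
    # With 8 + 7 padding and 16-frame clips, this yields one clip per frame.
    return [padded[i:i + clip_len] for i in range(len(padded) - clip_len + 1)]
```

Under these assumptions a GIF with N >= 16 frames produces N clips, so the C3D feature array has the same number of rows as the GIF has frames.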
Note: We uploaded three hdf5 files (Resnet_pool5, C3D_fc6, ResOF_pool5), but we failed to upload the other two files because of their size.
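For the feature files that are not provided, the following is a minimal sketch of how the hdf5 files might be written. It assumes the third-party h5py package and one top-level dataset per GIF key; both helper names are hypothetical, and the exact layout expected by the loader should be checked against the uploaded files.

```python
def feature_filename(model, layer):
    """Build a name following the 'TGIF_[MODEL]_[layer_name].hdf5' pattern."""
    return "TGIF_{}_{}.hdf5".format(model, layer)


def write_features(path, features):
    """Write {key: array of shape (#frames, feature_dim)} into one hdf5 file.

    Assumption: each GIF key becomes one top-level dataset in the file.
    """
    import h5py  # third-party; imported lazily so feature_filename works alone
    with h5py.File(path, "w") as f:
        for key, array in features.items():
            f.create_dataset(str(key), data=array)
```

For example, `write_features("dataset/tgif/features/" + feature_filename("C3D", "fc6"), features)` would write the fc6 features, where `features` maps each 'key' value to its numpy array.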
- Choose task [Count, Action, FrameQA, Trans] and model name [C3D, Resnet, Concat, Tp, Sp, SpTp]
- Run python script
cd gifqa
python main.py --task=Count --name=Tp
- Choose task [Count, Action, FrameQA, Trans], model name [C3D, Resnet, Concat, Tp, Sp, SpTp] and set checkpoint path
- Run python script
cd gifqa
python main.py --task=Count --name=Tp --checkpoint_path=YOUR_CHECKPOINT_PATH --test_phase=True --evaluate_only=True
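When running several task and model combinations, it can help to build the command lines programmatically. The helper below is a hypothetical convenience wrapper, not part of the repository; it only uses the flags and the task and model names shown above.

```python
TASKS = ("Count", "Action", "FrameQA", "Trans")
MODELS = ("C3D", "Resnet", "Concat", "Tp", "Sp", "SpTp")


def build_command(task, name, checkpoint_path=None):
    """Return the main.py argument list for training or, with a checkpoint, evaluation."""
    if task not in TASKS:
        raise ValueError("unknown task: " + task)
    if name not in MODELS:
        raise ValueError("unknown model name: " + name)
    cmd = ["python", "main.py", "--task=" + task, "--name=" + name]
    if checkpoint_path is not None:
        # Evaluation flags exactly as shown in the evaluation command above.
        cmd += [
            "--checkpoint_path=" + checkpoint_path,
            "--test_phase=True",
            "--evaluate_only=True",
        ]
    return cmd
```

The returned list can be passed to `subprocess.call(..., cwd="gifqa")`, which mirrors the `cd gifqa` step.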
-
Download checkpoints for the concat and temporal models from this link and place the checkpoint folders in gifqa/pretrained_models. Additionally, copy the unzipped fasttext folder to the HOME/dataset directory.
-
Run test script
cd gifqa
./test_scripts/{task}_{model}.sh
Last Edit: Apr 25, 2021