Skip to content

texastony/adapting-c3d-keras

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This project is still a work in progress. Please follow this repo for updates

What is this project?

C3D was released in 2015, and it is not difficult to recreate the network in Tensorflow or Keras. I was able to use C3D for a activity recognition project with moderate success. During that process, I encountered many challenges, several of which I am still addressing. I wish to share my solutions to these challenges.

C3D

C3D is a deep neural network developed by CS deparment at Dartmouth and Facebook’s AI lab. Its main website is here, the repo for is here, and the paper is available here. It was developed and released in Caffe, but others have recreated it in Tensorflow and Keras. Caffe looks brilliant, but I am not about to try and learn every ML/NN/Analytic library/framework out there. (Frankly that would be an awesome Job).

C3D is a very popular model for video classification and activity recognition. It has appeared in many papers responding to the ActivityNet Challenge for Action proposal regions, held by CVPR. There are certantily other models for video classifcation. If you are looking for a list of papers with open source code, may I direct you to: This Repo of Papers, which I have found very useful.

What I used C3D for:

I had 9 hours of 1080 by 1920 video in which I needed to detect the activity of 21 different tools. At first, I used C3D as a feature extracture. I took the output of layer 13, the last pooling layer, and fit my own relu dense net to it with categorical cross entropy and dropout. My results were terrible, but I think that I will see better results with this method if I better pre-process the videos or re-scale C3D to match the ratio of my videos. More on this later.

I then attempted to fine tune the entire network. This network, if my understanding is correct, has 80 million trainable parameters. Fine tuning the whole network is somewhat of memory and multi-processing challenge as compared to a straight forward modeling challenge. However, I had decent results, with an accuracy of %71 on my top 4 classes. And that was only with a few epochs of training. Granted, an epoch took 4 hours. But I think that this is the better route for the problem I was working on.

What of this porject is currently useful?

None, but I will release a stable product in the comming weeks. I am in the process of generalizing my code so that it can work on novel datasets. I am also working on optimizations that take advantage of gpus every step of the process instead of just in Tensorflow. For that, I may have to switch to mxnet instead of numpy…