The following set of scripts should be a good way to get started with Convolutional Neural Networks. They use GraphLab Create's deep learning toolkit, which is based on CXXNet.
- Setup time: ~2 mins
- Train time: ~20 mins on a GPU (it could take much longer on a CPU)
- Validation score: 0.98
- Leaderboard score: 0.97
Update: I haven't had much time to improve this score, but francoisluus has brought it down to 0.77 using some interesting ideas! (The score is a log loss, so lower is better.)
Here is a quick summary of the submission:
- Load images into an SFrame (scalable dataframe).
- Use Pillow to augment the data with rotations of 90, 180, and 270 degrees.
- Set up a simple deep learning architecture (based on antinucleon's).
- Create a "fair" train/validation split to make sure the classes are balanced.
- Train a deep learning model.
- Evaluate the multi-class log loss score.
- Save the predictions in Kaggle's format into a submission file called "submission.csv".
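Two of the steps above, the class-balanced split and the multi-class log loss, can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in make_submission.py: the function names `stratified_split` and `multiclass_log_loss`, the 10% validation fraction, and the clipping constant are all hypothetical choices made for this example.

```python
import math
import random
from collections import defaultdict


def stratified_split(labels, validation_fraction=0.1, seed=0):
    """Return (train_indices, validation_indices), splitting each class separately
    so both parts keep roughly the same class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, validation = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_val = int(round(len(indices) * validation_fraction))
        # Hold out at least one example for any class with two or more images.
        if len(indices) > 1:
            n_val = max(1, n_val)
        validation.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(validation)


def multiclass_log_loss(true_labels, predicted_probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    predicted_probs is a list of dicts mapping class name -> probability.
    Probabilities are clipped to [eps, 1 - eps] (an assumption of this sketch)
    so that a confident wrong answer gives a large but finite penalty.
    """
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        p = min(max(probs.get(label, 0.0), eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(true_labels)
```

A model that assigns probability 0.5 to the correct class on every image would score about 0.693 (that is, ln 2), which gives a rough sense of scale for the scores listed above.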
CPU instructions
pip install -r requirements.pip
GPU instructions
pip install -r requirements-gpu.pip
Let us assume that you have the data downloaded into two folders called train and test. You can do that as follows:
wget https://www.kaggle.com/c/datasciencebowl/download/train.zip
wget https://www.kaggle.com/c/datasciencebowl/download/test.zip
unzip train.zip
unzip test.zip
Now run the following script, which will create a submission file. It could take around 1 hour, depending on how many iterations you perform. The network can train at around 5,000 images per second.
python make_submission.py
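Before uploading, it may be worth checking the generated file. The sketch below assumes that submission.csv has a header row, that the first column is the image name, and that every remaining column is a class probability; that layout is an assumption here, so adjust it if your file differs. The helper name `check_submission` and the tolerance are hypothetical.

```python
import csv


def check_submission(lines, tolerance=1e-3):
    """Return (row_count, bad_rows) for an iterable of CSV lines.

    A row is reported as bad when any probability falls outside [0, 1]
    or when the row's probabilities do not sum to 1 within `tolerance`.
    Assumes column 0 is the image name and the rest are probabilities.
    """
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    row_count = 0
    bad_rows = []
    for row in reader:
        row_count += 1
        probs = [float(value) for value in row[1:]]
        out_of_range = any(p < 0.0 or p > 1.0 for p in probs)
        if out_of_range or abs(sum(probs) - 1.0) > tolerance:
            bad_rows.append(row[0])
    return row_count, bad_rows
```

To use it on the real file, open submission.csv with `newline=""` and pass the file object to `check_submission`; an empty `bad_rows` list means every row passed both checks.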