The overall algorithm is written down in the file `algorithm.m` (in semi-pseudocode)
- We use the 32-bit version of Python 2.7
- Requirements:
  - Pillow (image processing library), NumPy and SciPy:

    ```
    sudo apt-get install python-pil python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
    ```

  - Theano:

    ```
    sudo pip install Theano
    ```

    More information at deeplearning.net
  - SDL libraries (needed by ALE):

    ```
    sudo apt-get install libsdl-gfx1.2-dev libsdl-image1.2-dev libsdl1.2-dev
    ```

  - ImageMagick (might be needed for the `.show()` function):

    ```
    sudo apt-get install imagemagick
    ```
-
- Installing the Arcade Learning Environment (ALE)
  - Run `./install.sh` from the root directory of the repository; ALE will be compiled in `./libraries/ale`
  - The ALE executable is `./libraries/ale/ale`
  - ROMs are stored under `./libraries/ale/roms`
- Running the convnet branch
The convnet branch makes use of the cuda-convnet2 code written by Alex Krizhevsky. To run this code you MUST have an NVIDIA CUDA-capable GPU of compute capability 3.5. The convnet page says it has to be at least 3.5, but we have found that 5.0 (Maxwell architecture) does not work (the bug is being fixed by NVIDIA as we speak).

To begin with, you need to set up your computer to use the graphics card. Download and install the CUDA Toolkit and the driver for the GPU. We have had success with Toolkit 6.0 (https://developer.nvidia.com/cuda-toolkit-60). We strongly advise following the installation instructions (as well as the pre-installation and post-installation steps) in the Getting Started Guide (http://developer.download.nvidia.com/compute/cuda/6_0/rel/docs/CUDA_Getting_Started_Linux.pdf).

Once you have succeeded in running a "hello world" in CUDA, you can move on to installing convnet2. Download and install the dependencies and git-checkout the code as described in https://code.google.com/p/cuda-convnet2/wiki/Compiling

Change the environment variables in the build.sh file in the main directory; normally you just have to change the location of your CUDA installation. Then run `./build.sh` and convnet2 should compile in a few minutes. Nevertheless, this is not enough: to run our code, you will need to tweak the convnet2 code a bit. This is described in the following section.
It seems that certain aspects of the cuda-convnet2 code are designed to work with images that have either 1 input channel (grayscale) or 3 channels (RGB). We, however, have 4 input channels (the 4 frames). To accommodate our case without getting assertion errors, one needs to change the file at:

`cuda-convnet2/cudaconv3/src/weight_acts.cu`
In line 2023 replace `numImgColors <= 3` with `numImgColors <= 4`, and in line 2059 replace `if (numFilterColors > 3)` with `if (numFilterColors > 4)`.
This should help the system deal with 4 input channels. After making the modifications, recompile the program by running `./build.sh` in the `cuda-convnet2/` folder.
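The reason for the 4 input channels is that the network's input is a stack of the 4 most recent preprocessed frames. A minimal NumPy sketch of assembling such an input (the 84x84 frame size and the channels-first, cases-last layout are assumptions about how the data is fed to cuda-convnet2, not verified against this repository's code):

```python
import numpy as np

# Four most recent grayscale frames; 84x84 is the size used in the
# DeepMind paper and is an assumption here.
frames = [np.random.rand(84, 84).astype(np.float32) for _ in range(4)]

# Stack the frames along a new leading channel axis.
state = np.stack(frames, axis=0)   # shape (4, 84, 84)

# cuda-convnet-style layout: one column per image in the batch,
# rows are channels*height*width (an assumption for illustration).
batch = state.reshape(-1, 1)       # shape (4*84*84, 1) = (28224, 1)

print(state.shape)   # (4, 84, 84)
print(batch.shape)   # (28224, 1)
```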
- We convert the Atari NTSC 128-color palette to RGB using this table. (might be different in the original paper)
- We DO NOT use the standard formula `0.21*R + 0.71*G + 0.07*B` to convert RGB to grayscale, but `0.5*R + 0.5*G + 0.5*B`. (link) (might be different in the original paper)
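The difference between the two conversions can be sketched in NumPy (the `0.5` weights are the ones used here; the `0.21/0.71/0.07` weights are the standard luminance formula we do not use):

```python
import numpy as np

def to_gray_ours(rgb):
    # Our conversion: equal 0.5 weights per channel (note: weights sum to 1.5,
    # so the result is not a normalized average).
    return 0.5 * rgb[..., 0] + 0.5 * rgb[..., 1] + 0.5 * rgb[..., 2]

def to_gray_luminance(rgb):
    # Standard luminance formula -- NOT what we use.
    return 0.21 * rgb[..., 0] + 0.71 * rgb[..., 1] + 0.07 * rgb[..., 2]

rgb = np.array([[[100.0, 50.0, 200.0]]])    # a single pixel: R=100, G=50, B=200
print(to_gray_ours(rgb)[0, 0])              # 175.0  (0.5*100 + 0.5*50 + 0.5*200)
print(to_gray_luminance(rgb)[0, 0])         # ~70.5
```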
- Issue #7 (make sure that learning changes weight values at all): make sure that `learning_rate > 0` and uncomment the last line in `NeuralNet.train()`, which prints some parameters after every training event
- Issue #10 (make sure that for different inputs we get different outputs): the easiest way to check this (without writing a specific function) is to set `learning_rate = 0.0`, so that the network weights do not change (you can verify that as in Issue #7), and to uncomment the line `print "estimated q", estimated_Q` in `NeuralNet.train()`. Starting from the second frame, when we already have two different images to compare, the Q-values estimated during minibatch training should have more than one distinct value (at the second frame they have 2 possible values, at the 3rd step 3 possible values)
- Preprocessing: color to grayscale
- Gradient descent learning rate
- Gradient descent regularisation: none, L1, L2 or both
- Momentum in RMSProp
- Initialization of the network: weights (mean, std) + biases
- What to do with initial frames which do not have previous memory?
- Death has no penalty?
- Implementation of the error: we have zeros at all non-taken actions
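The last point can be illustrated directly: for each transition, only the output unit of the taken action gets a nonzero error, and all other action units get zero. A NumPy sketch (4 actions and a batch of 2 transitions are assumptions; variable names are illustrative):

```python
import numpy as np

predicted_Q = np.array([[1.0, 2.0, 3.0, 4.0],     # network output for 2 states,
                        [0.5, 0.5, 0.5, 0.5]])    # 4 actions each
taken = np.array([2, 0])       # index of the action taken in each state
target = np.array([3.5, 1.0])  # e.g. r + gamma * max_a' Q(s', a')

# Error is zero everywhere except at the taken action's output unit.
error = np.zeros_like(predicted_Q)
rows = np.arange(len(taken))
error[rows, taken] = predicted_Q[rows, taken] - target

print(error)   # nonzero only at columns 2 and 0, respectively
```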