# Experiment 3
This experiment trains multiple networks in parallel, each on a stuck point, then assesses the fitness of those networks after they are merged together.

In Experiment 1, progress was linear from the start of the game level to the right as networks developed. In this experiment, each network starts directly before an area that Experiment 1 showed was difficult to pass. Each network ends its individual training once the mean of the entire population passes its stuck point.
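The stopping rule above can be sketched as follows; `stuck_point_passed` and the sample distances are illustrative, not the project's actual code.

```python
# Sketch of the stopping rule described above: a stuck point counts as
# passed once the MEAN distance reached by the whole population exceeds
# it, not just the distance reached by the single best genome.
def stuck_point_passed(distances, stuck_point):
    mean_distance = sum(distances) / len(distances)
    return mean_distance > stuck_point

print(stuck_point_passed([500, 560, 600], 530))  # True  (mean ~553)
print(stuck_point_passed([480, 510, 560], 530))  # False (mean ~517)
```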
Mario usually starts at position 40; in this training, each network starts at either 530, 1320, 1600, 2204, or 2800.
```
usage: agent.py [-h] [--config-path CONFIG_PATH] [--num-cores NUM_CORES]
                [--state-path STATE_PATH] [--session-path SESSION_PATH]
                [--input-distance INPUT_DISTANCE]
                [--target-distance TARGET_DISTANCE]

Mario NEAT Agent Trainer

optional arguments:
  -h, --help            show this help message and exit
  --config-path CONFIG_PATH
                        The path to the NEAT parameter config file to use
  --num-cores NUM_CORES
                        The number of cores on your computer for parallel
                        execution
  --state-path STATE_PATH
                        The directory to pull and store states from
  --session-path SESSION_PATH
                        The directory to store output files within
  --input-distance INPUT_DISTANCE
                        The target distance Mario should start training from
  --target-distance TARGET_DISTANCE
                        The target distance Mario should achieve before
                        closing
```
By default we use the following arguments:

- `--config-path`: /opt/train/NEAT/config-feedforward
- `--num-cores`: 1
- `--state-path`: /opt/train/Experiment_3/states/
- `--session-path`: /opt/train/Experiment_3/session/
- `--input-distance`: 40
- `--target-distance`: 1000
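For reference, the parser behind these flags can be reconstructed from the usage text above. This is a sketch only; the real `agent.py` may define its arguments differently.

```python
import argparse

# Reconstruction of agent.py's CLI from its usage text; defaults match
# the list above. The actual script may differ in details.
parser = argparse.ArgumentParser(description="Mario NEAT Agent Trainer")
parser.add_argument("--config-path", default="/opt/train/NEAT/config-feedforward",
                    help="The path to the NEAT parameter config file to use")
parser.add_argument("--num-cores", type=int, default=1,
                    help="The number of cores on your computer for parallel execution")
parser.add_argument("--state-path", default="/opt/train/Experiment_3/states/",
                    help="The directory to pull and store states from")
parser.add_argument("--session-path", default="/opt/train/Experiment_3/session/",
                    help="The directory to store output files within")
parser.add_argument("--input-distance", type=int, default=40,
                    help="The target distance Mario should start training from")
parser.add_argument("--target-distance", type=int, default=1000,
                    help="The target distance Mario should achieve before closing")

# Example: start from the second stuck point using four cores.
args = parser.parse_args(["--input-distance", "1320", "--num-cores", "4"])
print(args.input_distance, args.num_cores)  # 1320 4
```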
Note: To speed up training it is recommended to increase the core count to the maximum your computer can handle.
Assuming you have followed the setup guide you should have a running Docker container ready to execute the training.
```shell
$ docker exec -it openai_smb /bin/bash
$ cd /opt/train/Experiment_3/
$ export DISPLAY=:1
$ python3 agent.py
```
By default all output is stored to the session-path variable, which defaults to `/opt/train/Experiment_3/session/`. Within this directory you will find the following:
- `{point}-stats.csv` - Information about each generation, saved in a single file; can be used to model and drill into the development of your network over time
- `avg_fitness.svg` - Diagram showing the average fitness over time (updated after each generation); view in your browser
- `speciation.svg` - The size of each species per generation; view in your browser
- `Best-X.pkl` - The best genome of each generation saved as an individual file; these can be used with the PlayBest utility program
- `checkpoints/neat-checkpoint-{point}-x` - The checkpoint file; training always resumes from the latest checkpoint available
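The `Best-X.pkl` files can be inspected with a short helper like the one below. It assumes each file is a plain pickle of the genome (as neat-python examples typically save them), and the demo uses a stand-in object rather than a real NEAT genome.

```python
import os
import pickle
import tempfile

def load_best_genome(path):
    """Load a genome saved by the trainer (assumed to be a plain pickle)."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Self-contained demo with a stand-in dict instead of a real genome:
with tempfile.TemporaryDirectory() as session_dir:
    best_path = os.path.join(session_dir, "Best-1.pkl")
    with open(best_path, "wb") as f:
        pickle.dump({"fitness": 1234.0}, f)
    genome = load_best_genome(best_path)
    print(genome["fitness"])  # 1234.0
```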
In our first run of this experiment we trained a network on each stuck point until a single genome was able to pass it. This initially looked quite promising, as the training process completed in hours rather than days compared to our baseline experiment. But when we tried to merge the networks together, the resulting genome was sub-par: after 2 runs of the experiment, both NEAT players either would not move at all or performed only a single action, such as jumping.
This led us to think we needed more complex NEAT networks before trying to merge their best genomes. So we modified the training script to consider a stuck point finished once the mean of the population had passed it. We underestimated the disk space needed to run this new version of the experiment, which led to our training environment crashing on the second stuck point; nevertheless, it still gave us the data we needed to conclude that this experiment was a bust: training took significantly longer without any improvement over the baseline experiment.

## Data Gathered
This led us to Experiment 4. Our thinking was that the problem lay in training a new NEAT network for each stuck point, which forced each network to learn the basics that its sibling networks had already learnt. So why not train one NEAT network on each of the stuck points, one at a time?