Experiment 3

This experiment trains multiple networks in parallel, each on a single stuck point, and then assesses the fitness of the network that results from merging them together.

In Experiment 1, progress was linear: networks developed from the start of the game level and pushed to the right. In this experiment, each network starts just before an area that Experiment 1 showed was difficult to pass. A network ends its individual training once the mean of the entire population passes that stuck point.

Mario usually starts at position 40; in this training, each network starts at one of 530, 1320, 1600, 2204, or 2800.
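
The stopping rule for each run (the population mean passing the stuck point) can be pictured as a simple check performed after every generation. The snippet below is a minimal sketch only, assuming each genome's fitness reflects the distance Mario reached; the real agent.py may score genomes differently:

    import statistics

    # Minimal sketch of the per-run stopping rule, assuming each genome's fitness
    # is the distance Mario reached (agent.py may score genomes differently).
    def stuck_point_cleared(genomes, stuck_point):
        fitnesses = [genome.fitness for genome in genomes if genome.fitness is not None]
        return statistics.mean(fitnesses) >= stuck_point

neat-python can also express a mean-based stopping criterion directly in the config file (fitness_criterion = mean together with fitness_threshold), which avoids custom stopping code.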

Arguments

usage: agent.py [-h] [--config-path CONFIG_PATH] [--num-cores NUM_CORES]
                [--state-path STATE_PATH] [--session-path SESSION_PATH]
                [--input-distance INPUT_DISTANCE]
                [--target-distance TARGET_DISTANCE]

Mario NEAT Agent Trainer

optional arguments:
  -h, --help            show this help message and exit
  --config-path CONFIG_PATH
                        The path to the NEAT parameter config file to use
  --num-cores NUM_CORES
                        The number of cores on your computer for parallel
                        execution
  --state-path STATE_PATH
                        The directory to pull and store states from
  --session-path SESSION_PATH
                        The directory to store output files within
  --input-distance INPUT_DISTANCE
                        The distance Mario should start training from
  --target-distance TARGET_DISTANCE
                        The target distance Mario should achieve before
                        closing

By default we use the following arguments:

  • --config-path: /opt/train/NEAT/config-feedforward
  • --num-cores: 1
  • --state-path: /opt/train/Experiment_3/states/
  • --session-path: /opt/train/Experiment_3/session/
  • --input-distance: 40
  • --target-distance: 1000

Note: To speed up training, it is recommended to increase the core count (--num-cores) to the maximum your computer can handle.
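
For reference, the argument handling that produces the help text above can be reproduced with a standard argparse setup along these lines. This is a sketch based only on the usage output and defaults listed here; the actual agent.py may differ in detail:

    import argparse

    # Sketch of the argument parsing behind the help text above.
    parser = argparse.ArgumentParser(description="Mario NEAT Agent Trainer")
    parser.add_argument("--config-path", default="/opt/train/NEAT/config-feedforward",
                        help="The path to the NEAT parameter config file to use")
    parser.add_argument("--num-cores", type=int, default=1,
                        help="The number of cores on your computer for parallel execution")
    parser.add_argument("--state-path", default="/opt/train/Experiment_3/states/",
                        help="The directory to pull and store states from")
    parser.add_argument("--session-path", default="/opt/train/Experiment_3/session/",
                        help="The directory to store output files within")
    parser.add_argument("--input-distance", type=int, default=40,
                        help="The distance Mario should start training from")
    parser.add_argument("--target-distance", type=int, default=1000,
                        help="The target distance Mario should achieve before closing")
    args = parser.parse_args()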

How to launch?

Assuming you have followed the setup guide, you should have a running Docker container ready to execute the training.

$ docker exec -it openai_smb /bin/bash
$ cd /opt/train/Experiment_3/
$ export DISPLAY=:1
$ python3 agent.py
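
To run this experiment's five stuck-point trainings side by side, a small wrapper could launch one agent.py process per start position. This is a hedged sketch: the target distances are placeholders rather than the values we used, and running several emulator instances at once requires enough cores and memory:

    import subprocess

    # Hypothetical launcher: one training run per stuck point, varied via
    # --input-distance. The target distances below are placeholders only.
    STUCK_POINTS = [530, 1320, 1600, 2204, 2800]

    processes = [
        subprocess.Popen([
            "python3", "agent.py",
            "--input-distance", str(start),
            "--target-distance", str(start + 300),  # placeholder target
        ])
        for start in STUCK_POINTS
    ]
    for process in processes:
        process.wait()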

Outputs?

By default, all output is stored under the session-path directory, which defaults to /opt/train/Experiment_3/session/. Within this directory you will find the following:

  • {point}-stats.csv - Per-generation statistics for the run starting at {point}, saved in a single file; can be used to analyse the development of your network over time
  • avg_fitness.svg - Diagram showing the average fitness over time (updated after each generation); view in your browser
  • speciation.svg - The size of each species per generation; view in your browser
  • Best-X.pkl - The best genome of each generation, saved as an individual file; these can be used with the PlayBest utility program
  • checkpoints/neat-checkpoint-{point}-x - Checkpoint files; training always resumes from the latest available checkpoint
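
Beyond the PlayBest utility, the saved genomes can also be inspected directly. A small sketch, assuming the Best-X.pkl files are plain pickled neat-python genomes (the filename and path shown are examples):

    import pickle

    # Example only: load a saved genome and print its summary.
    # Assumes Best-X.pkl files are plain pickled neat-python genomes.
    with open("/opt/train/Experiment_3/session/Best-1.pkl", "rb") as fh:
        best_genome = pickle.load(fh)

    print(best_genome)  # fitness plus node and connection genes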

Our results?

In our first run of this experiment we trained a network on each stuck point until a single genome was able to pass it. This initially looked quite promising, as the training process completed in hours rather than days compared to our baseline experiment. But when we tried to merge the networks together, the resulting genome was sub-par: after two runs of the experiment, both merged NEAT players either would not move at all or performed only a single action, such as jumping.

This led us to think that we needed more complex NEAT networks before trying to merge their best genomes. So we modified the training script to consider a stuck point finished once the mean of the population had passed it. We underestimated the disk space needed to run this new version of the experiment, which led to our training environment crashing on the second stuck point; nevertheless, it still gave us the data we needed to conclude that this experiment was a bust: it took significantly longer without any improvement over the baseline experiment.

Data Gathered

This led us to Experiment 4. Our thinking was that the problem lay in training a new NEAT network for each stuck point, which forced each network to learn the basics that its sibling networks had already learnt. So why not train one NEAT network on each of the stuck points, one at a time?
