Reinforcement learning: Value Iteration, SARSA, and Q-learning on the racetrack problem
To train a Q-learning policy:
python exp_Q.py
To train a SARSA policy:
python exp_SARSA.py
To train a value iteration policy:
python exp_VI.py
This results in an arrays directory, which contains the policy at various points during training as well as a
Once a policy is trained, it can be used in a race.
python race.py
This produces a race like so:
Plot of learning curves for Q-Learning on different tracks and with different crash behavior. A "normal crash" means that if the car crashes into a wall, it returns to the last valid track square, whereas a "bad crash" means that it returns to the starting line upon crashing.
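The two crash behaviors can be illustrated with a minimal sketch. This is not the repository's code: the function name `resolve_crash`, its parameters, and the choice to reset the velocity to zero after a crash are all assumptions made for illustration.

```python
from typing import Tuple

Position = Tuple[int, int]
Velocity = Tuple[int, int]


def resolve_crash(
    last_valid: Position,
    start: Position,
    bad_crash: bool,
) -> Tuple[Position, Velocity]:
    """Return the car's (position, velocity) after it hits a wall.

    last_valid: the last valid track square the car occupied before the wall.
    start:      a square on the starting line.
    bad_crash:  if True, the car is sent back to the starting line;
                otherwise it stays at the last valid track square.
    """
    # Hypothetical helper: only the reset position is described in the text.
    new_position = start if bad_crash else last_valid
    # Assumption: the car comes to rest after any crash.
    return new_position, (0, 0)


# Normal crash: the car resumes from the last valid square.
print(resolve_crash(last_valid=(3, 4), start=(0, 1), bad_crash=False))
# Bad crash: the car resumes from the starting line.
print(resolve_crash(last_valid=(3, 4), start=(0, 1), bad_crash=True))
```

Under this sketch a bad crash costs the agent all progress made in the episode, which is one plausible reason the learning curves differ between the two settings.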
Example race: