Using reinforcement learning to find shortest paths in a graph.
- numpy
- networkx
- matplotlib
- imageio (optional; used to generate the video/GIF visualization)
- imageio-ffmpeg (optional; used to generate the video/GIF visualization)
To install them, try:
pip3 install numpy networkx matplotlib imageio imageio-ffmpeg
- Define the adjacency matrix for your problem. Note that the first row and the first column must correspond to the target state/node.
- You may need to modify the parameters of the reinforcement learning algorithms in order to solve your problem more effectively.
Here, we define the adjacency matrix as follows:
D = [[0,  4, 0,  0,  0,  0, 0,  8, 0],
     [4,  0, 8,  0,  0,  0, 0, 11, 0],
     [0,  8, 0,  7,  0,  4, 0,  0, 3],
     [0,  0, 7,  0,  9, 14, 0,  0, 0],
     [0,  0, 0,  9,  0, 10, 0,  0, 0],
     [0,  0, 4, 14, 10,  0, 3,  0, 0],
     [0,  0, 0,  0,  0,  3, 0,  3, 4],
     [8, 11, 0,  0,  0,  0, 3,  0, 5],
     [0,  0, 3,  0,  0,  0, 4,  5, 0]]
So the graph is as follows:
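As a cross-check, the graph can be rebuilt from `D` with networkx. The sketch below is not taken from this repository. It assumes that a zero entry means "no edge" and that every non-zero entry is an undirected edge weight, and it uses Dijkstra's algorithm (not reinforcement learning) purely as a baseline to compare the learned paths against. The names `G`, `reference_path` and `reference_cost` are illustrative.

```python
import networkx as nx

D = [[0, 4, 0, 0, 0, 0, 0, 8, 0],
     [4, 0, 8, 0, 0, 0, 0, 11, 0],
     [0, 8, 0, 7, 0, 4, 0, 0, 3],
     [0, 0, 7, 0, 9, 14, 0, 0, 0],
     [0, 0, 0, 9, 0, 10, 0, 0, 0],
     [0, 0, 4, 14, 10, 0, 3, 0, 0],
     [0, 0, 0, 0, 0, 3, 0, 3, 4],
     [8, 11, 0, 0, 0, 0, 3, 0, 5],
     [0, 0, 3, 0, 0, 0, 4, 5, 0]]

# Build an undirected weighted graph; assumption: 0 means "no edge".
G = nx.Graph()
G.add_nodes_from(range(len(D)))
for i, row in enumerate(D):
    for j, w in enumerate(row):
        if j > i and w > 0:  # upper triangle only, D is symmetric
            G.add_edge(i, j, weight=w)

# Baseline answer from node 3 (the start node) to node 0 (the target).
reference_path = nx.dijkstra_path(G, source=3, target=0, weight="weight")
reference_cost = nx.dijkstra_path_length(G, source=3, target=0, weight="weight")

print(G.number_of_edges())  # 14
print(reference_path)       # [3, 2, 1, 0]
print(reference_cost)       # 19
```

Any of the reinforcement learning algorithms below should recover a path of the same cost once it has converged.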
To run the value iteration algorithm, run either of the following:
python -s vi
python -s value_iteration
The result is as follows:
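For orientation, here is a minimal sketch of value iteration on this graph. It is not the repository's implementation. It assumes that moving along an edge gives a reward equal to the negative edge weight, that node 0 is an absorbing target with value 0, and that no discounting is used (`gamma = 1`). The function names and default parameters are illustrative.

```python
import numpy as np

D = np.array([[0, 4, 0, 0, 0, 0, 0, 8, 0],
              [4, 0, 8, 0, 0, 0, 0, 11, 0],
              [0, 8, 0, 7, 0, 4, 0, 0, 3],
              [0, 0, 7, 0, 9, 14, 0, 0, 0],
              [0, 0, 0, 9, 0, 10, 0, 0, 0],
              [0, 0, 4, 14, 10, 0, 3, 0, 0],
              [0, 0, 0, 0, 0, 3, 0, 3, 4],
              [8, 11, 0, 0, 0, 0, 3, 0, 5],
              [0, 0, 3, 0, 0, 0, 4, 5, 0]], dtype=float)


def value_iteration(D, target=0, gamma=1.0, tol=1e-9, max_sweeps=1000):
    """Synchronous Bellman backups; an action is 'move to neighbour a'."""
    n = len(D)
    V = np.zeros(n)
    for _ in range(max_sweeps):
        V_new = V.copy()
        for s in range(n):
            if s == target:
                continue  # the target is absorbing and keeps value 0
            neighbours = np.nonzero(D[s])[0]
            V_new[s] = max(-D[s, a] + gamma * V[a] for a in neighbours)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the converged values.
    policy = {
        s: int(max(np.nonzero(D[s])[0], key=lambda a: -D[s, a] + gamma * V[a]))
        for s in range(n) if s != target
    }
    return V, policy


def follow(policy, start, target=0):
    """Follow the policy from start; stop at the target or after too many steps."""
    path = [start]
    while path[-1] != target and len(path) <= len(policy) + 1:
        path.append(policy[path[-1]])
    return path


V, policy = value_iteration(D)
path = follow(policy, start=3)
print(path)    # [3, 2, 1, 0]
print(-V[3])   # 19.0
```

With these assumptions, `-V[s]` is the length of the shortest path from node `s` to the target.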
To run the policy iteration algorithm, run either of the following:
python -s pi
python -s policy_iteration
The result is as follows:
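Policy iteration alternates between evaluating a fixed policy and improving it greedily. The sketch below is one possible version, not the repository's code. It assumes rewards equal to negative edge weights, no discounting, and a deterministic policy that maps each node to one neighbour. The initial policy (lowest-index neighbour) and the rollout-based evaluation, which assigns minus infinity to any node whose policy never reaches the target, are illustrative choices.

```python
import numpy as np

D = np.array([[0, 4, 0, 0, 0, 0, 0, 8, 0],
              [4, 0, 8, 0, 0, 0, 0, 11, 0],
              [0, 8, 0, 7, 0, 4, 0, 0, 3],
              [0, 0, 7, 0, 9, 14, 0, 0, 0],
              [0, 0, 0, 9, 0, 10, 0, 0, 0],
              [0, 0, 4, 14, 10, 0, 3, 0, 0],
              [0, 0, 0, 0, 0, 3, 0, 3, 4],
              [8, 11, 0, 0, 0, 0, 3, 0, 5],
              [0, 0, 3, 0, 0, 0, 4, 5, 0]], dtype=float)


def evaluate(policy, D, target=0):
    """Value of a deterministic policy, computed by following it from every node."""
    n = len(D)
    V = np.zeros(n)
    for s in range(n):
        cost, cur = 0.0, s
        for _ in range(n):  # a loop-free path visits at most n nodes
            if cur == target:
                break
            cost += D[cur, policy[cur]]
            cur = policy[cur]
        V[s] = -cost if cur == target else -np.inf  # -inf: policy cycles forever
    return V


def improve(policy, V, D):
    """Greedy improvement; the current action is kept unless another is strictly better."""
    new_policy = dict(policy)
    for s, current in policy.items():
        best, best_q = current, -D[s, current] + V[current]
        for a in np.nonzero(D[s])[0]:
            q = -D[s, a] + V[a]
            if q > best_q:
                best, best_q = int(a), q
        new_policy[s] = best
    return new_policy


def policy_iteration(D, target=0, max_iters=100):
    n = len(D)
    # Illustrative initial policy: move to the lowest-index neighbour.
    policy = {s: int(np.nonzero(D[s])[0][0]) for s in range(n) if s != target}
    for _ in range(max_iters):
        V = evaluate(policy, D, target)
        new_policy = improve(policy, V, D)
        if new_policy == policy:
            break  # policy is stable, so V is its value function
        policy = new_policy
    return V, policy


V, policy = policy_iteration(D)
print(policy[3], -V[3])  # 2 19.0
```

On this graph the loop stabilises after a few improvement rounds and agrees with the value iteration result.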
The start node is set to node 3 in the code. To run the Sarsa algorithm, run:
python -s sarsa
The result is as follows:
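The following is a minimal tabular Sarsa sketch for the same graph, written for illustration rather than copied from the repository. It assumes rewards equal to negative edge weights, an epsilon-greedy behaviour policy, and episodes that start at node 3 and end at node 0. The hyperparameters (`alpha`, `gamma`, `epsilon`, `episodes`, `max_steps`, `seed`) are placeholder values and will likely need tuning.

```python
import numpy as np

D = np.array([[0, 4, 0, 0, 0, 0, 0, 8, 0],
              [4, 0, 8, 0, 0, 0, 0, 11, 0],
              [0, 8, 0, 7, 0, 4, 0, 0, 3],
              [0, 0, 7, 0, 9, 14, 0, 0, 0],
              [0, 0, 0, 9, 0, 10, 0, 0, 0],
              [0, 0, 4, 14, 10, 0, 3, 0, 0],
              [0, 0, 0, 0, 0, 3, 0, 3, 4],
              [8, 11, 0, 0, 0, 0, 3, 0, 5],
              [0, 0, 3, 0, 0, 0, 4, 5, 0]], dtype=float)


def epsilon_greedy(Q, s, epsilon, rng):
    """Pick a random valid neighbour with probability epsilon, else the greedy one."""
    actions = np.nonzero(np.isfinite(Q[s]))[0]
    if rng.random() < epsilon:
        return int(rng.choice(actions))
    return int(actions[np.argmax(Q[s, actions])])


def sarsa(D, start=3, target=0, episodes=2000, alpha=0.1, gamma=1.0,
          epsilon=0.1, max_steps=200, seed=0):
    rng = np.random.default_rng(seed)
    # Q[s, a] is the value of moving from s to neighbour a; non-edges stay at -inf.
    Q = np.where(D > 0, 0.0, -np.inf)
    for _episode in range(episodes):
        s = start
        a = epsilon_greedy(Q, s, epsilon, rng)
        for _step in range(max_steps):
            if s == target:
                break
            r, s2 = -D[s, a], a          # taking action a moves the agent to node a
            a2 = epsilon_greedy(Q, s2, epsilon, rng)
            # On-policy update: bootstrap from the action actually chosen next.
            Q[s, a] += alpha * (r + gamma * Q[s2, a2] - Q[s, a])
            s, a = s2, a2
    return Q


def greedy_path(Q, start=3, target=0):
    """Follow argmax actions from start; give up after len(Q) steps."""
    path = [start]
    while path[-1] != target and len(path) <= len(Q):
        path.append(int(np.argmax(Q[path[-1]])))
    return path


Q = sarsa(D)
sarsa_path = greedy_path(Q)
print(sarsa_path)
```

Because Sarsa learns the value of the exploring policy itself, the greedy path it reports can depend on `epsilon` and on the number of episodes.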
The start node is set to node 3 in the code. To run the Sarsa(λ) algorithm, run either of the following:
python -s sarsa(lambda)
python -s sarsa_lambda
The result is as follows:
The start node is set to node 3 in the code. To run the Q-learning algorithm, run:
python -s q-learning
The result is as follows:
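A tabular Q-learning sketch for the same problem is given below. It is an illustration under stated assumptions, not the repository's implementation: rewards are negative edge weights, episodes run from node 3 to node 0, and the behaviour policy is epsilon-greedy. A learning rate of `alpha = 1.0` is used here only because transitions and rewards in this graph are deterministic; all hyperparameter values and function names are hypothetical.

```python
import numpy as np

D = np.array([[0, 4, 0, 0, 0, 0, 0, 8, 0],
              [4, 0, 8, 0, 0, 0, 0, 11, 0],
              [0, 8, 0, 7, 0, 4, 0, 0, 3],
              [0, 0, 7, 0, 9, 14, 0, 0, 0],
              [0, 0, 0, 9, 0, 10, 0, 0, 0],
              [0, 0, 4, 14, 10, 0, 3, 0, 0],
              [0, 0, 0, 0, 0, 3, 0, 3, 4],
              [8, 11, 0, 0, 0, 0, 3, 0, 5],
              [0, 0, 3, 0, 0, 0, 4, 5, 0]], dtype=float)


def q_learning(D, start=3, target=0, episodes=2000, alpha=1.0, gamma=1.0,
               epsilon=0.1, max_steps=200, seed=0):
    rng = np.random.default_rng(seed)
    # Q[s, a] is the value of moving from s to neighbour a; non-edges stay at -inf.
    Q = np.where(D > 0, 0.0, -np.inf)
    for _episode in range(episodes):
        s = start
        for _step in range(max_steps):
            if s == target:
                break
            actions = np.nonzero(D[s])[0]
            if rng.random() < epsilon:
                a = int(rng.choice(actions))                 # explore
            else:
                a = int(actions[np.argmax(Q[s, actions])])   # exploit
            r, s2 = -D[s, a], a  # taking action a moves the agent to node a
            # Off-policy update: bootstrap from the best action in the next state.
            Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
            s = s2
    return Q


def greedy_path(Q, start=3, target=0):
    """Follow argmax actions from start; give up after len(Q) steps."""
    path = [start]
    while path[-1] != target and len(path) <= len(Q):
        path.append(int(np.argmax(Q[path[-1]])))
    return path


Q = q_learning(D)
q_path = greedy_path(Q)
print(q_path)
```

Initialising the valid entries of `Q` to 0 is optimistic for this problem, since all true values are negative, which encourages the agent to try every edge at least once. To use a different start node, pass another value for `start` to both functions.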
More details can be found in the code. You can also change the start node for the Sarsa, Sarsa(λ) and Q-learning algorithms.