Python implementation of Sutton & Barto's Reinforcement Learning: An Introduction (2nd Edition)
Note: Most of the code is adapted from ShangtongZhang's implementation and rewritten to make it easier to understand. Besides the code for the figures, I have also completed some of the exercises in the book.
- Figure 2.1: An example bandit problem from the 10-armed testbed.
- Figure 2.2: Average performance of epsilon-greedy action-value methods on the 10-armed testbed.
- Figure 2.3: The effect of optimistic initial action-value estimates on the 10-armed testbed.
- Figure 2.4: Average performance of UCB action selection on the 10-armed testbed.
- Figure 2.5: Average performance of the gradient bandit algorithm.
- Figure 2.6: A parameter study of the various bandit algorithms.
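- Exercise 2.5: Sample-average vs. constant step-size (α = 0.1) action-value methods on a nonstationary 10-armed testbed.
- Exercise 2.11: A parameter study analogous to Figure 2.6 for the nonstationary case of Exercise 2.5.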
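The epsilon-greedy experiments listed above (Figures 2.2 and 2.3, and the nonstationary testbed of Exercise 2.5, in which every q*(a) starts at 0 and takes an independent random walk) can be sketched with the Python standard library alone. This is a minimal illustration, not the code in this repository: the function names `make_testbed` and `run_epsilon_greedy` and all their parameters are hypothetical. It assumes q*(a) ~ N(0, 1), rewards drawn from N(q*(A), 1), and incremental estimate updates using either sample averages or a constant step size α.

```python
import random


def make_testbed(k=10, mean=0.0, rng=random):
    """Hypothetical helper: true action values q*(a) drawn from N(mean, 1)."""
    return [rng.gauss(mean, 1.0) for _ in range(k)]


def run_epsilon_greedy(q_true, steps=1000, epsilon=0.1, initial=0.0,
                       alpha=None, walk_std=0.0, rng=random):
    """Hypothetical sketch: one run of an epsilon-greedy agent.

    alpha=None   -> sample-average estimates (step size 1/N(a))
    alpha=float  -> constant step size, e.g. 0.1
    initial      -> initial estimate Q_1(a); e.g. 5.0 for optimistic starts
    walk_std     -> std of the random-walk increment added to every q*(a)
                    after each step (0.0 keeps the problem stationary)
    Returns (rewards, optimal_flags).
    """
    q_true = list(q_true)  # copy so the random walk never mutates the caller's list
    k = len(q_true)
    q_est = [float(initial)] * k
    counts = [0] * k
    rewards, optimal = [], []
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(k)  # explore: uniform random action
        else:
            top = max(q_est)
            # exploit, breaking ties between greedy actions at random
            action = rng.choice([a for a in range(k) if q_est[a] == top])
        reward = rng.gauss(q_true[action], 1.0)  # R ~ N(q*(A), 1)
        optimal.append(q_true[action] == max(q_true))
        counts[action] += 1
        step = alpha if alpha is not None else 1.0 / counts[action]
        q_est[action] += step * (reward - q_est[action])  # incremental update
        rewards.append(reward)
        if walk_std > 0.0:
            # nonstationary case: every q*(a) takes an independent random-walk step
            for a in range(k):
                q_true[a] += rng.gauss(0.0, walk_std)
    return rewards, optimal
```

To get curves like Figure 2.2, average the returned flags step by step over many runs (the book uses 2000). For the optimistic setting of Figure 2.3, one would call `run_epsilon_greedy(q, epsilon=0.0, initial=5.0, alpha=0.1)`; for Exercise 2.5, `run_epsilon_greedy([0.0] * 10, steps=10000, alpha=0.1, walk_std=0.01)` compared against the same call with `alpha=None`.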
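UCB action selection (Figure 2.4) replaces random exploration with the rule A_t = argmax_a [Q_t(a) + c·sqrt(ln t / N_t(a))], where an action with N_t(a) = 0 is treated as maximizing. The following is a minimal standard-library sketch under the same testbed assumptions (rewards drawn from N(q*(A), 1), sample-average estimates); the function `run_ucb` is hypothetical, and the default c = 2 matches the setting shown in the figure.

```python
import math
import random


def run_ucb(q_true, steps=1000, c=2.0, rng=random):
    """Hypothetical sketch: one run of UCB action selection.

    Returns (rewards, counts), where counts[a] is N(a) at the end of the run.
    """
    k = len(q_true)
    q_est = [0.0] * k
    counts = [0] * k
    rewards = []
    for t in range(1, steps + 1):
        untried = [a for a in range(k) if counts[a] == 0]
        if untried:
            # an action with N_t(a) = 0 counts as maximizing, so try it first
            action = untried[0]
        else:
            # upper confidence bound: estimate plus an exploration bonus
            action = max(range(k),
                         key=lambda a: q_est[a] + c * math.sqrt(math.log(t) / counts[a]))
        reward = rng.gauss(q_true[action], 1.0)
        counts[action] += 1
        q_est[action] += (reward - q_est[action]) / counts[action]  # sample average
        rewards.append(reward)
    return rewards, counts
```

The bonus term shrinks as an action is selected more often and grows slowly with ln t for actions that are neglected, so exploration is directed at uncertain actions instead of being spread uniformly as in epsilon-greedy.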
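The gradient bandit algorithm of Figure 2.5 learns a numerical preference H(a) for each action, selects actions with probability π(a) = exp(H(a)) / Σ_b exp(H(b)), and updates H(a) ← H(a) + α(R_t − R̄_t)(1{a = A_t} − π(a)). The sketch below is an illustration under stated assumptions, not the repository's code: `run_gradient_bandit` and `softmax` are hypothetical names, the baseline R̄_t is assumed to be the running average of the rewards received before step t (taken as the current reward on the first step), and `use_baseline=False` sets the baseline to 0.

```python
import math
import random


def softmax(prefs):
    """Action probabilities from preferences H(a)."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(h - m) for h in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_gradient_bandit(q_true, steps=1000, alpha=0.1, use_baseline=True, rng=random):
    """Hypothetical sketch: one run of a gradient bandit agent.

    Returns (optimal_flags, preferences).
    """
    k = len(q_true)
    prefs = [0.0] * k
    avg_reward = 0.0  # running average of previous rewards (assumed baseline)
    best = max(range(k), key=lambda a: q_true[a])
    optimal = []
    for t in range(1, steps + 1):
        pi = softmax(prefs)
        action = rng.choices(range(k), weights=pi)[0]  # sample A_t ~ pi
        reward = rng.gauss(q_true[action], 1.0)
        if use_baseline:
            baseline = reward if t == 1 else avg_reward
        else:
            baseline = 0.0
        for a in range(k):
            indicator = 1.0 if a == action else 0.0
            # raise the chosen action's preference when the reward beats the
            # baseline, and lower the others in proportion to pi(a)
            prefs[a] += alpha * (reward - baseline) * (indicator - pi[a])
        avg_reward += (reward - avg_reward) / t
        optimal.append(action == best)
    return optimal, prefs
```

The figure compares α = 0.1 and α = 0.4, with and without the baseline, on true values drawn with mean +4 instead of 0; with all rewards positive, dropping the baseline pushes up whichever action was chosen, which is why the baseline matters there.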
Feel free to contact me if you have any questions! (Homepage: http://guohai.tech)