reinforcement_learning_to_play_cartpole_game

A DQN model and an Actor-Critic model to play the CartPole game

Introduction

The objective of reinforcement learning is to train an agent that can interact with an environment intelligently. For instance, in the game of Go, a Go-playing AI learns to excel at the game (the environment) by playing tens of thousands of games over and over (usually against itself!) and discovers subtle strategies through trial and error. At present, the two most popular classes of reinforcement learning algorithms are Q-Learning and Policy Gradients. Q-learning is a value iteration method that aims to approximate the Q function, while Policy Gradients directly optimizes the policy over the action space. Please read the referenced articles for more details about Deep Q-Learning and Policy Gradients.
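
To make the contrast concrete, here is a minimal sketch (not from this repository) of the two update rules the paragraph compares: the bootstrapped Q-learning target used by value-based methods such as DQN, and a REINFORCE-style policy-gradient loss. The discount factor is an illustrative value.

```python
import numpy as np

gamma = 0.95  # discount factor (illustrative value)

def q_learning_target(reward, next_q_values, done):
    """Value-based: regress Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    return reward if done else reward + gamma * np.max(next_q_values)

def policy_gradient_loss(log_prob_action, discounted_return):
    """Policy-based: push log pi(a|s) in the direction of the return."""
    return -log_prob_action * discounted_return
```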

Methodology

DQN

  1. Build a neural network model to represent the agent and initialize it with random weights
  2. Get the initial state from gym
  3. The agent picks an action (random, or the best action it currently knows) and gets the reward and next state from gym
  4. Experience replay: save (state, action, reward, next_state, done) tuples to a replay memory, sample batches from the memory, update the Q targets and train the agent (see the sketch below)
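
The following is a minimal sketch of that loop, assuming the Keras-style DQN setup from the referenced keon.io article; the layer sizes, replay memory size and hyperparameters are illustrative and not necessarily those used in this repository.

```python
import random
from collections import deque

import numpy as np
import gym
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

env = gym.make('CartPole-v1')
state_size = env.observation_space.shape[0]   # 4 state variables
action_size = env.action_space.n              # 2 actions: push left / push right

# 1. Neural network mapping a state to the Q-value of each action
model = Sequential([
    Dense(24, input_dim=state_size, activation='relu'),
    Dense(24, activation='relu'),
    Dense(action_size, activation='linear'),
])
model.compile(loss='mse', optimizer=Adam(learning_rate=0.001))

memory = deque(maxlen=2000)   # replay memory of (s, a, r, s', done) tuples
gamma, epsilon = 0.95, 1.0    # discount factor, exploration rate

# States below are assumed to be numpy arrays of shape (1, state_size).

def act(state):
    # 3. Epsilon-greedy: random action, or the best action the model knows
    if np.random.rand() <= epsilon:
        return env.action_space.sample()
    return int(np.argmax(model.predict(state, verbose=0)[0]))

def replay(batch_size=32):
    # 4. Experience replay: sample from memory, update the Q targets, train
    for state, action, reward, next_state, done in random.sample(memory, batch_size):
        target = reward
        if not done:
            target = reward + gamma * np.amax(model.predict(next_state, verbose=0)[0])
        target_f = model.predict(state, verbose=0)
        target_f[0][action] = target
        model.fit(state, target_f, epochs=1, verbose=0)
```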

Actor-Critic

  1. Build an actor model to predict the action
  2. Build a critic model to return a score (value) for the action
  3. Play the game, get the reward, next_state and done signal from gym, and remember (state, action, reward, next_state, done)
  4. Experience replay: sample data from memory, update the actor with the difference between the new value and the original value (the advantage), update the critic with the discounted reward, and train both actor and critic (see the sketch below)
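
A minimal sketch of the actor/critic pair and the advantage-based update described above, again assuming Keras and CartPole-v1; the layer sizes and learning rates are illustrative guesses, not necessarily the repository's.

```python
import numpy as np
import gym
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

env = gym.make('CartPole-v1')
state_size = env.observation_space.shape[0]
action_size = env.action_space.n

# 1. Actor: maps a state to a probability distribution over actions
actor = Sequential([
    Dense(24, input_dim=state_size, activation='relu'),
    Dense(action_size, activation='softmax'),
])
actor.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.001))

# 2. Critic: maps a state to a scalar value estimate (the "score")
critic = Sequential([
    Dense(24, input_dim=state_size, activation='relu'),
    Dense(1, activation='linear'),
])
critic.compile(loss='mse', optimizer=Adam(learning_rate=0.005))

gamma = 0.95  # discount factor

def train_step(state, action, reward, next_state, done):
    # 4. Advantage = discounted target value minus the critic's current estimate
    value = critic.predict(state, verbose=0)[0][0]
    next_value = 0.0 if done else critic.predict(next_state, verbose=0)[0][0]
    target = reward + gamma * next_value

    # Actor is trained toward the taken action, weighted by the advantage
    advantages = np.zeros((1, action_size))
    advantages[0][action] = target - value
    actor.fit(state, advantages, epochs=1, verbose=0)

    # Critic is trained toward the discounted target
    critic.fit(state, np.array([[target]]), epochs=1, verbose=0)
```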

Result

DQN

episode: 473/800, score: 499, e: 0.094
episode: 474/800, score: 499, e: 0.094
episode: 475/800, score: 499, e: 0.093
episode: 476/800, score: 499, e: 0.093
episode: 477/800, score: 337, e: 0.092
episode: 478/800, score: 499, e: 0.092
episode: 479/800, score: 499, e: 0.092
episode: 480/800, score: 499, e: 0.091
episode: 481/800, score: 499, e: 0.091
episode: 482/800, score: 499, e: 0.09
episode: 483/800, score: 499, e: 0.09
episode: 484/800, score: 499, e: 0.089
episode: 485/800, score: 499, e: 0.089
episode: 486/800, score: 499, e: 0.088
episode: 487/800, score: 499, e: 0.088
episode: 488/800, score: 249, e: 0.088
episode: 489/800, score: 499, e: 0.087
episode: 490/800, score: 499, e: 0.087
episode: 491/800, score: 499, e: 0.086
episode: 492/800, score: 499, e: 0.086

Actor-Critic

episode: 576/800, score: 499, e: 0.056
episode: 577/800, score: 499, e: 0.056
episode: 578/800, score: 499, e: 0.056
episode: 579/800, score: 499, e: 0.055
episode: 580/800, score: 499, e: 0.055
episode: 581/800, score: 499, e: 0.055
episode: 582/800, score: 499, e: 0.055
episode: 583/800, score: 499, e: 0.054
episode: 584/800, score: 499, e: 0.054
episode: 585/800, score: 499, e: 0.054
episode: 586/800, score: 499, e: 0.054
episode: 587/800, score: 499, e: 0.053
episode: 588/800, score: 499, e: 0.053
episode: 589/800, score: 499, e: 0.053
episode: 590/800, score: 499, e: 0.052
episode: 591/800, score: 499, e: 0.052
episode: 592/800, score: 297, e: 0.052
episode: 593/800, score: 287, e: 0.052
episode: 594/800, score: 499, e: 0.051
episode: 595/800, score: 499, e: 0.051
episode: 596/800, score: 499, e: 0.051

References

https://flyyufelix.github.io/2017/10/12/dqn-vs-pg.html
https://github.com/gregretkowski/notebooks/blob/master/ActorCritic-with-OpenAI-Gym.ipynb
https://keon.io/deep-q-learning/
