
Two estimators do not improve the tabular algorithms for our environment. #20

Open
2start opened this issue Jun 26, 2020 · 0 comments
Labels
documentation Improvements or additions to documentation

Collaborator

2start commented Jun 26, 2020

Two estimators are proposed to counteract overestimation in certain environment setups.
Let's assume there is one state with transitions to n states, where each of the n states yields a large reward with low probability. For large n, it is very likely that the large reward is hit at least once in one of these states, so the q_value of the transition to that state is overestimated. Because Q-learning and SARSA act epsilon-greedily on these estimates, the agent picks the transition to that state over and over again, and it takes a long time for the q_value to converge back to its expected value.
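Assuming "two estimators" refers to the double estimator idea behind Double Q-learning, the effect described above can be sketched without a full agent. The snippet below is a minimal illustration, not the project's code: every successor state has the same expected reward `p * big`, but taking the maximum over sample means (what `max_a Q(s, a)` does) overestimates it, while selecting the best state with one half of the data and evaluating it with the other half does not. The function names and parameters (`rare_reward`, `single_and_double_estimates`, `average_estimates`, `p`, `big`) are hypothetical.

```python
import random


def rare_reward(p: float, big: float, rng: random.Random) -> float:
    """Hypothetical reward of one successor state: `big` with probability p, else 0."""
    return big if rng.random() < p else 0.0


def single_and_double_estimates(n: int, samples: int, p: float, big: float,
                                rng: random.Random) -> tuple[float, float]:
    """Estimate max_i E[reward_i] over n identical states in two ways."""
    # Two independent sets of samples (A and B) per successor state.
    mean_a, mean_b = [], []
    for _ in range(n):
        mean_a.append(sum(rare_reward(p, big, rng) for _ in range(samples)) / samples)
        mean_b.append(sum(rare_reward(p, big, rng) for _ in range(samples)) / samples)

    # Single estimator: max over the means of all data (biased upwards).
    mean_all = [(a + b) / 2 for a, b in zip(mean_a, mean_b)]
    single = max(mean_all)

    # Double estimator: choose the argmax with one set, evaluate it with the other,
    # then average both directions.
    i_a = max(range(n), key=lambda i: mean_a[i])
    i_b = max(range(n), key=lambda i: mean_b[i])
    double = (mean_b[i_a] + mean_a[i_b]) / 2
    return single, double


def average_estimates(trials: int, n: int, samples: int, p: float, big: float,
                      seed: int = 0) -> tuple[float, float]:
    """Average both estimators over many independent trials."""
    rng = random.Random(seed)
    total_single = total_double = 0.0
    for _ in range(trials):
        s, d = single_and_double_estimates(n, samples, p, big, rng)
        total_single += s
        total_double += d
    return total_single / trials, total_double / trials


if __name__ == "__main__":
    # True value of every successor state is p * big = 5.0 here.
    single, double = average_estimates(trials=200, n=50, samples=10, p=0.05, big=100.0)
    print(f"single estimator: {single:.2f}, double estimator: {double:.2f}")
```

With many successor states and a rare large reward, the single estimator typically lands far above the true value of 5.0, while the double estimator stays close to it. This only illustrates the estimator bias; it says nothing about whether two estimators help in the project's own environment.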

@2start 2start added the documentation Improvements or additions to documentation label Jun 26, 2020