Two estimators do not improve the tabular algorithms for our environment. #20

2start · 2020-06-26T15:07:28Z

Two estimators are proposed to counteract overestimation in certain environment setups.
Let's assume one state has transitions to n states, each of which yields a large reward with low probability. For large n, it is very likely that the large reward is hit at least once. Because Q-learning and SARSA act epsilon-greedily, the agent then picks the transition to that state over and over again, and it takes a long time for the Q-value to converge back to its expected value.
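
The setup above can be illustrated with a small simulation. This is only a sketch under assumed numbers, not our environment: `estimate_bias`, `sample_reward`, and all parameter values (20 successor states, reward 100 with probability 0.01) are hypothetical. It compares the maximum over sample means (one estimator) with a two-estimator scheme in the style of double Q-learning, where one set of samples selects the best-looking transition and an independent set evaluates it.

```python
import random


def sample_reward(p, big_reward, rng):
    """Return the large reward with probability p, otherwise 0."""
    return big_reward if rng.random() < p else 0.0


def estimate_bias(n_actions=20, p=0.01, big_reward=100.0,
                  samples_per_set=20, trials=2000, seed=0):
    """Average value estimate of the best transition over many trials.

    All parameter values are illustrative assumptions.
    Returns (single_estimator_average, double_estimator_average).
    """
    rng = random.Random(seed)
    single_total = 0.0
    double_total = 0.0
    for _ in range(trials):
        means_a = []
        means_b = []
        for _ in range(n_actions):
            # Two independent sample sets per transition.
            set_a = [sample_reward(p, big_reward, rng)
                     for _ in range(samples_per_set)]
            set_b = [sample_reward(p, big_reward, rng)
                     for _ in range(samples_per_set)]
            means_a.append(sum(set_a) / samples_per_set)
            means_b.append(sum(set_b) / samples_per_set)

        # Single estimator: pool all samples, then take the max of the means.
        pooled = [(a + b) / 2 for a, b in zip(means_a, means_b)]
        single_total += max(pooled)

        # Two estimators: select with set A, evaluate with set B.
        best = max(range(n_actions), key=means_a.__getitem__)
        double_total += means_b[best]

    return single_total / trials, double_total / trials


if __name__ == "__main__":
    single, double = estimate_bias()
    print("true value:", 0.01 * 100.0)
    print("single estimator:", single)
    print("two estimators:", double)
```

With these assumed numbers every transition has a true expected reward of `p * big_reward = 1.0`. A single hit among the 40 pooled samples of a transition already lifts its mean to 2.5, so the max over pooled means overshoots whenever any transition was lucky. The two-estimator value stays close to 1.0 on average because the evaluating samples are independent of the selection.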

2start added the documentation Improvements or additions to documentation label Jun 26, 2020
