Request help on Double DQN #5

Remember2018 · 2019-11-04T23:33:36Z

Hi, thanks a lot for your great work!

I have a question about the Double DQN code: does the following line need a stop_gradient?

target_q = rewards + (gamma*double_q * (1-terminal_flags))

The double_q comes from the target DQN. When the main DQN is updated, the error will be backpropagated into the target DQN if we don't stop the gradient flow, right? So do we need to stop the gradient as follows?

target_q = tf.stop_gradient(target_q)

Could you please give some advice? Thanks.
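
To make the question concrete, here is a minimal sketch of what stopping the gradient changes. It is not the repository's code: it assumes a toy scalar model in which the main network is Q_main(s) = w * s and the target network is Q_target(s') = v * s_next, and it uses plain Python with finite differences instead of TensorFlow so that it runs on its own. The names td_loss, numeric_grad, w, v and v_const are hypothetical.

```python
# Toy scalar model (an assumption for illustration, not the repo's networks):
#   main network:   Q_main(s)    = w * s
#   target network: Q_target(s') = v * s_next
GAMMA = 0.99


def td_loss(w, v, s, s_next, reward, terminal, v_const=None):
    """Squared TD error for one transition.

    If v_const is given, the target is built from that fixed value instead
    of v, so the loss no longer depends on v. This mimics the effect of
    tf.stop_gradient(target_q).
    """
    v_used = v if v_const is None else v_const
    double_q = v_used * s_next  # value taken from the target network
    target_q = reward + GAMMA * double_q * (1 - terminal)
    return (target_q - w * s) ** 2


def numeric_grad(f, x, eps=1e-6):
    """Central finite-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


w, v = 0.5, 0.8
s, s_next, reward, terminal = 1.0, 2.0, 1.0, 0.0

# Gradient w.r.t. the target parameter v when the target is NOT frozen.
grad_v_open = numeric_grad(
    lambda vv: td_loss(w, vv, s, s_next, reward, terminal), v)

# Gradient w.r.t. v when the target is frozen at its current value.
grad_v_stopped = numeric_grad(
    lambda vv: td_loss(w, vv, s, s_next, reward, terminal, v_const=v), v)

# Gradient w.r.t. the main parameter w, with and without the frozen target.
grad_w_open = numeric_grad(
    lambda ww: td_loss(ww, v, s, s_next, reward, terminal), w)
grad_w_stopped = numeric_grad(
    lambda ww: td_loss(ww, v, s, s_next, reward, terminal, v_const=v), w)

print("d loss / d v, target not frozen:", grad_v_open)
print("d loss / d v, target frozen:    ", grad_v_stopped)
print("d loss / d w, target not frozen:", grad_w_open)
print("d loss / d w, target frozen:    ", grad_w_stopped)
```

In this sketch the gradient with respect to w is the same either way, and only the gradient with respect to v disappears when the target is frozen. Whether the same path exists in the actual graph depends on how double_q is fed in and on which variables the optimizer is allowed to update, neither of which is shown in this post.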
