torch.gather in relation to policy gradient #31

migom6 · 2020-05-25T09:25:31Z

As I understand it, the policy network outputs a mean and a variance for a single action. After that, `torch.gather` is used to calculate the `log_prob`. Can someone help me understand this process?
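For reference, here is a minimal sketch of the `torch.gather` pattern as it is commonly used for log-probabilities. It assumes a discrete-action policy that outputs logits, which may not match this repository's network; the tensor names and shapes (`logits`, `actions`, 3 states, 4 actions) are hypothetical and not taken from the code in question.

```python
import torch
import torch.nn.functional as F

# Hypothetical batch: 3 states, 4 discrete actions (shapes are assumptions).
logits = torch.tensor([[1.0, 2.0, 0.5, 0.1],
                       [0.3, 0.2, 2.5, 0.0],
                       [0.0, 0.0, 0.0, 0.0]])
actions = torch.tensor([1, 2, 0])  # index of the action taken in each state

# Log-probabilities over all actions, shape (3, 4).
log_probs_all = F.log_softmax(logits, dim=1)

# gather picks, for each row i, the entry at column actions[i].
# The index tensor must have the same number of dimensions as the input,
# hence the unsqueeze; squeeze then returns a flat vector of shape (3,).
log_prob_taken = log_probs_all.gather(1, actions.unsqueeze(1)).squeeze(1)

# Cross-check against the distribution API, which should agree.
reference = torch.distributions.Categorical(logits=logits).log_prob(actions)
```

In this sketch `gather` only selects the log-probability of the action that was actually taken from a per-action table. With a mean and variance for a continuous action there is no such table to index, so the log-probability would more typically come from something like `torch.distributions.Normal(mean, std).log_prob(action)`.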
Thanks for the help. 😃
