@SeungyounShin
PPO is an on-policy algorithm. Updating the agent with highly correlated trajectories can make it worse. I fixed that in #45 for you.
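One common way to reduce that correlation, without breaking the on-policy requirement, is to shuffle the current rollout's timesteps into random minibatches before each update epoch, then discard the rollout once the update is done. The sketch below shows only that index-shuffling step. It is an illustration under that assumption, not necessarily what #45 changes, and the function name `shuffled_minibatches` is hypothetical.

```python
import random
from typing import Iterator, List, Optional


def shuffled_minibatches(
    num_samples: int, batch_size: int, seed: Optional[int] = None
) -> Iterator[List[int]]:
    """Yield index minibatches that cover one rollout buffer in random order.

    Hypothetical helper: consecutive timesteps from the same episode are
    highly correlated, so shuffling spreads each minibatch across the whole
    rollout. Every index appears exactly once per call (one epoch).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)  # break up runs of consecutive timesteps
    for start in range(0, num_samples, batch_size):
        # The last minibatch may be smaller if num_samples % batch_size != 0.
        yield indices[start:start + batch_size]


if __name__ == "__main__":
    # Example: 10 stored timesteps, minibatches of 4 -> sizes 4, 4, 2.
    for batch in shuffled_minibatches(10, 4, seed=0):
        print(batch)
```

In a typical PPO loop you would call this once per update epoch on the freshly collected rollout and index your stored states, actions, log-probs, and advantages with each batch. Only the order of samples changes, so the data still comes from the current policy.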
If you train PPO long enough, say 3000 episodes or more, the reward drops sharply (e.g., from 500 to 30).