lstm+ppo cannot converge in Pendulum-v0 environment #2
Comments
Hi @1900360! Did the parameters mentioned in this issue help with that task?
Hi @nslyubaykin
Training is quite slow for this simple gym environment; is there room for improvement?
I'm waiting for your answer; I'm very interested in this :D
Hi @1900360! I am not sure I understand what you mean by slow training. Is convergence itself slower, or is it computationally slower? And the second question: what do you mean by room for improvement?
Hi @nslyubaykin!
Do you have any idea? I haven't figured anything out yet, since I'm a freshman in DRL :)
Hi @1900360! The reason for the slower computation is that your policy now deals with larger observations (obs_dim * n_lags) and, other things equal, has more parameters (the new architecture may also contribute). There is also some minor computational overhead from creating and processing the lags. Regarding training divergence, one option is that you simply need to find a new, correct set of hyper-parameters for this architecture (which can be found only by trial and error). The other option is that performance in this environment is simply harmed by introducing lags. In my experience, when observations are already fully observable, using lags may hurt performance by adding redundant information to the observation. Also, using out_activation=torch.nn.Tanh() with acs_scale=2 may help, since Pendulum's actions are bounded in [-2, 2].
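To make the two points above concrete, here is a minimal NumPy-only sketch of (a) stacking the last n_lags observations into one flat obs_dim * n_lags vector, which is why the lagged policy grows and slows down, and (b) squashing an unbounded network output into Pendulum's [-2, 2] action range with tanh. Note that `LagStacker` and `squash_action` are hypothetical illustration names, not part of the library discussed in this thread.

```python
from collections import deque
import numpy as np

class LagStacker:
    """Stack the last n_lags observations into one flat vector
    of shape (obs_dim * n_lags,)."""
    def __init__(self, obs_dim, n_lags):
        self.obs_dim = obs_dim
        self.n_lags = n_lags
        self.buffer = deque(maxlen=n_lags)

    def reset(self, obs):
        # Pad the history with copies of the first observation.
        self.buffer.clear()
        for _ in range(self.n_lags):
            self.buffer.append(np.asarray(obs, dtype=np.float32))
        return self.get()

    def step(self, obs):
        self.buffer.append(np.asarray(obs, dtype=np.float32))
        return self.get()

    def get(self):
        # Concatenation is where the obs_dim * n_lags blow-up happens.
        return np.concatenate(self.buffer)

def squash_action(raw_action, acs_scale=2.0):
    """Map an unbounded policy output into Pendulum's action range
    [-2, 2], mirroring out_activation=Tanh combined with acs_scale=2."""
    return acs_scale * np.tanh(raw_action)

# Pendulum-v0 observations are 3-dimensional (cos, sin, angular velocity).
stacker = LagStacker(obs_dim=3, n_lags=4)
stacked = stacker.reset(np.zeros(3))
print(stacked.shape)        # (12,)
print(squash_action(10.0))  # saturates near 2.0
```

This is only a sketch of the mechanism: in the actual agent the stacked vector would be fed to the policy network, and the tanh squashing would be the network's output activation rather than a post-processing step.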
Hi @nslyubaykin,
LSTM+PPO cannot converge in the Pendulum-v0 environment. I don't know whether there is a setting error in my code; could you take a look at it for a moment?
The reward curve is shown below:
lstm_parallel_ppo.txt