
Buffer for on-policy algorithms like PPO, TRPO #271

Open
sriyash421 opened this issue May 2, 2022 · 1 comment · Fixed by #272
Comments

@sriyash421

Implementing on-policy algorithms would require an online buffer that holds the transitions, computes the generalized advantage estimates and returns, and then samples batches during training.

Alternatively, should we leave this part on the agent side and have the agent compute the returns?

This is needed because the current simple buffer samples random batches, so additional features are required for on-policy learning.
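For reference, a minimal sketch of what such a buffer could look like, assuming a numpy-based training loop. The class and method names (`OnPolicyBuffer`, `add`, `compute_advantages`, `sample_batches`) are hypothetical illustrations, not the API proposed in the linked PR:

```python
# Hypothetical sketch of an on-policy rollout buffer with GAE; names and
# signatures are illustrative, not the actual interface from this repo.
import numpy as np


class OnPolicyBuffer:
    def __init__(self, gamma=0.99, gae_lambda=0.95):
        self.gamma = gamma
        self.gae_lambda = gae_lambda
        self.reset()

    def reset(self):
        # On-policy data is discarded after each update, so plain lists suffice.
        self.observations, self.actions = [], []
        self.rewards, self.values, self.log_probs, self.dones = [], [], [], []

    def add(self, obs, action, reward, value, log_prob, done):
        # Store one transition in insertion order (episode order matters for GAE).
        self.observations.append(obs)
        self.actions.append(action)
        self.rewards.append(reward)
        self.values.append(value)
        self.log_probs.append(log_prob)
        self.dones.append(done)

    def compute_advantages(self, last_value):
        # Generalized advantage estimation (Schulman et al., 2016), computed
        # backwards over the rollout:
        #   delta_t = r_t + gamma * (1 - done_t) * V(s_{t+1}) - V(s_t)
        #   A_t     = delta_t + gamma * lambda * (1 - done_t) * A_{t+1}
        rewards = np.asarray(self.rewards, dtype=np.float32)
        values = np.asarray(self.values + [last_value], dtype=np.float32)
        dones = np.asarray(self.dones, dtype=np.float32)
        advantages = np.zeros_like(rewards)
        gae = 0.0
        for t in reversed(range(len(rewards))):
            delta = rewards[t] + self.gamma * (1 - dones[t]) * values[t + 1] - values[t]
            gae = delta + self.gamma * self.gae_lambda * (1 - dones[t]) * gae
            advantages[t] = gae
        self.advantages = advantages
        self.returns = advantages + values[:-1]  # targets for the value loss

    def sample_batches(self, batch_size):
        # Shuffle once, then iterate minibatches; the whole buffer is consumed
        # each epoch, unlike a replay buffer's i.i.d. random sampling.
        indices = np.random.permutation(len(self.rewards))
        for start in range(0, len(indices), batch_size):
            idx = indices[start:start + batch_size]
            yield dict(
                observations=np.asarray(self.observations)[idx],
                actions=np.asarray(self.actions)[idx],
                log_probs=np.asarray(self.log_probs)[idx],
                advantages=self.advantages[idx],
                returns=self.returns[idx],
            )
```

The key difference from the existing simple buffer is that this one must see transitions in order (to propagate GAE backwards through time) and is reset after every policy update, whereas a replay buffer persists across updates and samples uniformly at random.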

@dapatil211
Collaborator

@kshitijkg is currently working on a basic version of a buffer for this.

@kshitijkg linked a pull request on May 3, 2022 that will close this issue (merged).