Buffer for on-policy algorithms like PPO, TRPO #271

sriyash421 · 2022-05-02T21:16:45Z

Implementing on-policy algorithms requires an online buffer that holds the transitions, computes the generalized advantage estimates (GAE) and returns, and then samples batches during training.

Or should we leave this part on the agent side and have the agent calculate the returns?

This is needed because the current simple buffer only samples random batches, so on-policy learning requires additional features.
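
To make the request concrete, here is a rough sketch of what such a buffer could look like. It is only an illustration under my own assumptions: `RolloutBuffer`, `add`, `compute_advantages`, `sample_batches`, and `reset` are hypothetical names, not the library's existing API, and it uses plain Python lists to stay self-contained.

```python
import random
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class RolloutBuffer:
    """Hypothetical on-policy buffer; names are illustrative only."""

    gamma: float = 0.99
    gae_lambda: float = 0.95
    transitions: List[dict] = field(default_factory=list)
    rewards: List[float] = field(default_factory=list)
    values: List[float] = field(default_factory=list)
    dones: List[bool] = field(default_factory=list)
    advantages: List[float] = field(default_factory=list)
    returns: List[float] = field(default_factory=list)

    def add(self, observation, action, reward: float, done: bool, value: float) -> None:
        """Store one transition together with the critic's value estimate."""
        self.transitions.append({"observation": observation, "action": action})
        self.rewards.append(reward)
        self.dones.append(done)
        self.values.append(value)

    def compute_advantages(self, last_value: float) -> None:
        """Backward GAE pass; last_value is the value of the state after the final step."""
        n = len(self.rewards)
        self.advantages = [0.0] * n
        gae = 0.0
        next_value = last_value
        for t in reversed(range(n)):
            # Do not bootstrap across an episode boundary.
            not_done = 0.0 if self.dones[t] else 1.0
            delta = self.rewards[t] + self.gamma * next_value * not_done - self.values[t]
            gae = delta + self.gamma * self.gae_lambda * not_done * gae
            self.advantages[t] = gae
            next_value = self.values[t]
        # Return targets for the critic are advantage plus value estimate.
        self.returns = [a + v for a, v in zip(self.advantages, self.values)]

    def sample_batches(self, batch_size: int, rng: random.Random) -> Iterator[List[dict]]:
        """Yield shuffled minibatches that together cover the rollout exactly once."""
        indices = list(range(len(self.rewards)))
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            yield [
                {**self.transitions[i],
                 "advantage": self.advantages[i],
                 "return": self.returns[i]}
                for i in indices[start:start + batch_size]
            ]

    def reset(self) -> None:
        """Discard the rollout once the policy has been updated."""
        for name in ("transitions", "rewards", "values", "dones", "advantages", "returns"):
            getattr(self, name).clear()
```

The agent would call `add` on every step, `compute_advantages` once the rollout is full, iterate over `sample_batches` for a few epochs, and then `reset`. Whether the advantage computation belongs in the buffer or in the agent is exactly the open question here; the sketch only shows the buffer-side option.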

dapatil211 · 2022-05-02T21:32:20Z

@kshitijkg is currently working on a basic version of a buffer for this.

dapatil211 assigned kshitijkg May 2, 2022

kshitijkg linked a pull request May 3, 2022 that will close this issue

PPO #272

Merged
