Has anyone compared this training framework to TRL? #54

StarrySeas1 · 2024-03-25T12:58:47Z

TRL's PPO implementation is simpler than this one and uses less memory. This framework has an additional value function network. I don't know which framework is more stable and effective.

refrain-wbh · 2024-04-28T06:01:20Z

TRL does remove one value function network, but it may be harder to train as a result, because the policy and the value function share parameters. On the other hand, the TRL library's code encapsulates a lot of optimizations, whereas our code has no additional optimization methods, which makes it easier to understand and modify.
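To make the memory side of this trade-off concrete, here is a rough back-of-the-envelope sketch of the trainable parameter count under each design. It is not taken from either codebase: the names (`ModelSize`, `value_head_params`, and so on), the 7B backbone, and the hidden size of 4096 are all hypothetical, the value head is assumed to be a single linear layer, and the reference and reward models are ignored.

```python
from dataclasses import dataclass


@dataclass
class ModelSize:
    backbone_params: int  # parameters in the language model body (hypothetical figure)
    hidden_size: int      # width of the final hidden state


def value_head_params(hidden_size: int) -> int:
    # Assumed scalar value head: one linear layer, so hidden_size weights plus one bias.
    return hidden_size + 1


def trainable_params_shared(m: ModelSize) -> int:
    # Shared design: one backbone feeds both the policy output and the value head.
    return m.backbone_params + value_head_params(m.hidden_size)


def trainable_params_separate(m: ModelSize) -> int:
    # Separate design: the policy keeps its backbone, the critic gets a second full one.
    policy = m.backbone_params
    critic = m.backbone_params + value_head_params(m.hidden_size)
    return policy + critic


size = ModelSize(backbone_params=7_000_000_000, hidden_size=4096)
shared = trainable_params_shared(size)
separate = trainable_params_separate(size)

print(f"shared:   {shared:,}")
print(f"separate: {separate:,}")
print(f"ratio:    {separate / shared:.2f}x")
```

Under these assumptions the separate critic roughly doubles the trainable parameters. In exchange, the value loss no longer sends gradients through the policy's weights, which is the stability concern raised above.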
