
Has anyone compared this training framework to TRL? #54

Open
StarrySeas1 opened this issue Mar 25, 2024 · 1 comment

Comments

@StarrySeas1

TRL's PPO implementation is simpler than this one and uses less memory, since this framework trains an additional value network. I don't know which framework is more stable and effective.

@refrain-wbh
Contributor

While TRL indeed saves one value network, it may be relatively harder to train, because the policy and the value function share parameters. On the other hand, TRL's code encapsulates many optimizations, whereas our code adds no extra optimization tricks, making it easier to understand and modify.
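
To make the distinction concrete, here is a minimal sketch of the two designs (a GPT-2 backbone is assumed purely for illustration, and the `Critic` class below is a hypothetical helper, not code from either library):

```python
import torch.nn as nn
from transformers import AutoModel, AutoModelForCausalLM
from trl import AutoModelForCausalLMWithValueHead

# Shared-parameter design (TRL): one backbone with a scalar value head.
# Policy logits and value estimates come from a single forward pass, so
# gradients from the policy loss and the value loss update the same weights.
shared = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")

# Separate-critic design (as described above): the policy is an ordinary
# causal LM, and the value function is an independent network with its
# own backbone, trained only on the value loss.
policy = AutoModelForCausalLM.from_pretrained("gpt2")

class Critic(nn.Module):
    # Hypothetical standalone value network, shown for illustration only.
    def __init__(self, name="gpt2"):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(name)
        self.value_head = nn.Linear(self.backbone.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask=None):
        hidden = self.backbone(input_ids, attention_mask=attention_mask).last_hidden_state
        return self.value_head(hidden).squeeze(-1)  # per-token value estimates

critic = Critic()  # roughly doubles parameter memory vs. the shared design
```

The trade-off follows directly: the shared design saves memory but couples the policy and value objectives through the same weights, while the separate critic costs more memory but keeps the two training signals independent.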
