A question about rlootrainer #2472

Open
macheng6 opened this issue Dec 13, 2024 · 1 comment
Labels
🙋 help from community wanted, ❓ question, 🏋 RLOO

Comments

@macheng6

Method description

RLOOTrainer does not seem to use `self.policy` in its `train()` function. I don't understand the purpose of `self.policy` in the `__init__` function.


qgallouedec added the 🙋 help from community wanted, ❓ question, and 🏋 RLOO labels on Dec 13, 2024
@asparius
Contributor

It uses `self.model`, which is assigned from the `policy` argument in this line: `self.model = policy`. So `train()` does work with the policy you pass in; it just accesses it through `self.model` rather than `self.policy`. PPOTrainer follows the same approach. I believe this naming is deliberate: it keeps the trainers consistent with one another across preference learning methods, instead of adopting the differing terminology used in each paper.
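To show why the policy still gets trained even though `train()` never mentions `self.policy`, here is a minimal sketch in plain Python. `ToyPolicy` and `ToyRLOOTrainer` are hypothetical stand-ins, not TRL classes, and the sketch omits everything about generation, rewards and optimization. The only detail carried over from the thread is the aliasing line `self.model = policy`.

```python
class ToyPolicy:
    """Hypothetical stand-in for a policy network; it only counts updates."""

    def __init__(self) -> None:
        self.updates = 0

    def step(self) -> None:
        self.updates += 1


class ToyRLOOTrainer:
    """Hypothetical, heavily simplified trainer (not TRL's code).

    It only mimics the naming discussed in this issue: the `policy`
    argument is kept as `self.policy` and also assigned to `self.model`.
    """

    def __init__(self, policy: ToyPolicy, ref_policy: ToyPolicy) -> None:
        self.policy = policy
        self.ref_policy = ref_policy  # reference model, never stepped here
        # Generic "model being optimized" name, mirroring `self.model = policy`.
        self.model = policy

    def train(self, num_steps: int) -> None:
        model = self.model  # train() never mentions self.policy ...
        for _ in range(num_steps):
            model.step()  # ... yet every update lands on the policy object


policy, ref = ToyPolicy(), ToyPolicy()
trainer = ToyRLOOTrainer(policy, ref)
trainer.train(3)
print(trainer.model is trainer.policy)  # True
print(policy.updates, ref.updates)  # 3 0
```

Because both attributes refer to the same object in this sketch, every update made through `self.model` is an update to the policy. In a real trainer that later wraps `self.model` for distributed training, the two attributes may no longer be the identical Python object, even though the wrapper still holds the same underlying model.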
