A question about rlootrainer #2472

macheng6 · 2024-12-13T12:16:10Z

Method description

`RLOOTrainer` does not seem to use `self.policy` anywhere in its `train()` function, so I don't understand what `self.policy` in `__init__` is for.

Open source status

The method implementation is available
The model weights are available
The training datasets are available

Provide useful links for the implementation

No response

asparius · 2024-12-14T00:28:52Z

It uses `self.model`, which is defined at line 162 of trl/trl/trainer/rloo_trainer.py (commit 6d4ed07) as `self.model = policy`. This approach is also adopted in PPOTrainer. I believe this is a deliberate naming choice, meant to stay consistent across the various preference learning frameworks without the complexity of matching the differing terminology used in academic papers.
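To illustrate why `train()` can reference only `self.model` and still train the policy, here is a minimal sketch of the aliasing. `ToyTrainer` and the dict standing in for a model are hypothetical and are not the TRL implementation; the only thing taken from the thread is the assignment `self.model = policy`.

```python
class ToyTrainer:
    """Hypothetical stand-in for a trainer; not the TRL class."""

    def __init__(self, policy):
        # Keep the constructor argument under its RL-specific name ...
        self.policy = policy
        # ... and bind the very same object to the generic attribute name.
        self.model = policy

    def train(self):
        # Only self.model is referenced here, yet it is the policy object.
        self.model["steps"] += 1
        return self.model


policy = {"steps": 0}  # a plain dict stands in for a real model (assumption)
trainer = ToyTrainer(policy)
trained = trainer.train()

# Both attribute names refer to one object, so the update is visible via either.
print(trainer.model is trainer.policy)
print(policy["steps"])
```

Because assignment in Python binds a second name to the same object rather than copying it, updating through `self.model` also updates what `self.policy` refers to.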

qgallouedec added the 🙋 help from community wanted, ❓ question, and 🏋 RLOO labels on Dec 13, 2024.
