
[RewardTrainer] Change print_rich_table parameters during Reward Model training #2121

Open
rmovva opened this issue Sep 25, 2024 · 2 comments
Labels
🏋 Reward Related to Reward modelling

Comments

rmovva commented Sep 25, 2024

Feature request

By default, the RewardTrainer prints four chosen and rejected responses, along with their logits, at every validation iteration. This is implemented at the following line:

reward_trainer.py, line 359

AFAIK there is no parameter to turn off this printing, change num_print_samples, etc. I tried passing it to the RewardConfig, but it's not a recognized parameter. I was wondering if the following functionality could be added (rough sketch after the list):

  • Include num_print_samples as a RewardConfig parameter
  • If num_print_samples > 0, call self.visualize_samples(num_print_samples) as currently implemented
  • If num_print_samples == 0, then skip self.visualize_samples() entirely.
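
To illustrate, here is roughly what I mean as an untested, user-side sketch (the subclass names are just placeholders; the actual change would go in RewardConfig / RewardTrainer directly):

```python
# Untested sketch: the same gating logic could live directly in RewardConfig/RewardTrainer.
from dataclasses import dataclass, field

from trl import RewardConfig, RewardTrainer


@dataclass
class RewardConfigWithPrinting(RewardConfig):
    # Proposed knob: how many chosen/rejected pairs to print at each evaluation.
    num_print_samples: int = field(
        default=4,
        metadata={"help": "Number of samples to print at evaluation; 0 disables the rich table."},
    )


class QuietRewardTrainer(RewardTrainer):
    def evaluate(self, *args, **kwargs):
        num_print_samples = getattr(self.args, "num_print_samples", 0)
        if num_print_samples > 0:
            # Reuse the existing rich-table printing only when explicitly requested.
            self.visualize_samples(num_print_samples)
        # Skip RewardTrainer.evaluate()'s unconditional printing and go straight
        # to the base transformers.Trainer implementation.
        return super(RewardTrainer, self).evaluate(*args, **kwargs)
```

Implemented inside TRL itself, this would just be the new config field plus an `if num_print_samples > 0` guard around the existing call.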

Motivation

When training a large number of reward models and not actively debugging their training, I do not want or need verbose printing of a rich table.

Your contribution

Not currently, but I will update with a PR if I get a chance.


lewtun commented Sep 27, 2024

Thanks for raising this issue @rmovva - I agree it would be good to disable this and keep the training logs lean. I think a better approach would be the following:

  • Remove visualize_samples() altogether
  • Implement a RewardSamplesCallback (or some similar name) that provides this functionality plus table creation on WandB, so users can decide whether to include it in their training script or not (see the sketch below).

I'm not sure if the callback could be entirely general (i.e. easy to switch between reward modelling and PPO/RLOO), but if it can, then all the better!
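
Roughly what I'm imagining, as an untested sketch (the class name, constructor arguments, and wiring are placeholders):

```python
# Untested sketch: an opt-in callback instead of unconditional printing in evaluate().
from transformers import TrainerCallback


class RewardSamplesCallback(TrainerCallback):
    def __init__(self, trainer, num_print_samples=4):
        self.trainer = trainer
        self.num_print_samples = num_print_samples

    def on_evaluate(self, args, state, control, **kwargs):
        # Reuse the trainer's existing rich-table printing; a wandb.Table could
        # also be logged here so the samples show up in the run dashboard.
        self.trainer.visualize_samples(self.num_print_samples)


# Users opt in explicitly in their training script:
# trainer = RewardTrainer(...)
# trainer.add_callback(RewardSamplesCallback(trainer, num_print_samples=4))
```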


saum7800 commented Oct 4, 2024

Just ran into the same issue myself. I think that approach makes sense @lewtun. I should be able to make a PR for this over the weekend. Just putting this here in case you know of someone already working on it, thanks!

qgallouedec added the 🏋 Reward (Related to Reward modelling) label Oct 7, 2024