Feature request
The RewardTrainer has a default behavior of printing four chosen & rejected responses along with their logits at every validation iteration. This is implemented in the following line:
L359 in reward_trainer.py
AFAIK there is no parameter to turn off this printing, change num_print_samples, etc. I tried passing it to the RewardConfig, but it's not a recognized parameter. I was wondering if the following functionality could be added:
Include num_print_samples as a RewardConfig parameter
If num_print_samples > 0, call self.visualize_samples(num_print_samples) as currently implemented
If num_print_samples == 0, then skip self.visualize_samples() entirely.
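The proposal above could look roughly like this. This is only a sketch of the suggested behavior, not TRL's actual implementation: the `RewardConfig` shown here is a minimal stand-in for the real class (which subclasses `transformers.TrainingArguments`), and `maybe_visualize` is a hypothetical helper illustrating the guard.

```python
from dataclasses import dataclass, field


@dataclass
class RewardConfig:
    # Hypothetical sketch of the proposed field; the real RewardConfig
    # has many more options inherited from TrainingArguments.
    num_print_samples: int = field(
        default=4,
        metadata={
            "help": "Number of chosen/rejected pairs to print during "
                    "evaluation; set to 0 to disable printing entirely."
        },
    )


def maybe_visualize(trainer, config):
    # The guard proposed in the issue: only call visualize_samples
    # when num_print_samples > 0, otherwise skip it entirely.
    if config.num_print_samples > 0:
        trainer.visualize_samples(config.num_print_samples)
```

With this in place, `RewardConfig(num_print_samples=0)` would give lean logs without touching the trainer code itself.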
Motivation
When training a large number of reward models and not actively debugging their training, I do not want or need verbose printing of a rich table.
Your contribution
Not currently, but I will update with a PR if I get a chance.
Thanks for raising this issue @rmovva - I agree it would be good to disable this and keep the training logs lean. I think a better approach would be the following:
Remove visualize_samples() altogether
Implement a RewardSamplesCallback (or some similar name) that provides this functionality plus table creation on WandB; users can then decide whether to include it in their training script.
I'm not sure if the callback could be entirely general (i.e. easy to switch between reward modelling or PPO/RLOO), but if it can, then all the better!
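A standalone sketch of what the core of such a callback might look like. In practice it would subclass `transformers.TrainerCallback` and run from `on_evaluate`; the class name `RewardSamplesCallback` and the `sample_pairs` hook on the trainer are both hypothetical, used here only to illustrate the opt-in design:

```python
class RewardSamplesCallback:
    """Sketch of an opt-in sample-printing callback. In TRL this would
    subclass transformers.TrainerCallback and hook into on_evaluate."""

    def __init__(self, num_print_samples=4, log_to_wandb=False):
        self.num_print_samples = num_print_samples
        self.log_to_wandb = log_to_wandb

    def on_evaluate(self, trainer):
        # Users who don't add this callback get lean logs by default;
        # those who do can still disable printing with num_print_samples=0.
        if self.num_print_samples <= 0:
            return
        # `sample_pairs` is a hypothetical trainer hook returning
        # (chosen, rejected, logits) tuples for a few eval examples.
        rows = trainer.sample_pairs(self.num_print_samples)
        for chosen, rejected, logits in rows:
            print(f"chosen: {chosen!r} | rejected: {rejected!r} | logits: {logits}")
        if self.log_to_wandb:
            # Here one would build a wandb.Table from `rows` and log it.
            pass
```

Because the callback only depends on a "give me N sample rows" hook rather than on reward-model internals, the same shape could plausibly be reused for PPO/RLOO trainers, as discussed above.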
Just ran into the same issue myself. I think that makes sense @lewtun. I should be able to make a PR for this over the weekend. Just putting this here in case you know if someone is already working on this, thanks!