
Utilization of negative samples #2

Open
HillZhang1999 opened this issue Jun 25, 2024 · 1 comment
Labels: about dataset (datasets of PRM and policy model)

Comments

@HillZhang1999

Dear authors:
First of all, I appreciate your engaging and informative work! I have a question regarding your research: I noticed that you utilize only positive samples for SFT when enhancing the policy models. Have you considered also incorporating negative samples, for example through methods such as DPO?
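For context, this is the setup the question refers to: SFT trains the policy with next-token cross-entropy over positive (correct) traces only, so negative samples never contribute a gradient. A minimal sketch under the usual causal-LM conventions (the function name and the -100 ignore-index are illustrative, not the repository's actual training code):

```python
import torch
import torch.nn.functional as F

def sft_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Plain SFT objective: next-token cross-entropy over positive
    (correct) solution traces only; negative samples are simply unused.

    logits: (batch, seq, vocab); labels: (batch, seq), with -100 marking
    positions to ignore (e.g., prompt tokens).
    """
    # Shift by one so the model predicts token t+1 from tokens up to t.
    shift_logits = logits[:, :-1, :].reshape(-1, logits.size(-1))
    shift_labels = labels[:, 1:].reshape(-1)
    return F.cross_entropy(shift_logits, shift_labels, ignore_index=-100)
```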

@zhangdan0602
Collaborator

Thank you for your question! We did in fact reproduce the Self-Rewarding baseline, which runs DPO on negative samples drawn from the LLM's own judgments.
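For readers unfamiliar with how DPO brings negative samples into training: it contrasts a chosen (positive) and a rejected (negative) response against a frozen reference model. A minimal sketch of the standard DPO objective (Rafailov et al., 2023), assuming the summed per-token log-probabilities are already computed; the function name and the beta value are illustrative, not taken from this repository:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss. Each input is a 1-D tensor of summed per-token
    log-probs of a response under the policy / frozen reference model;
    "chosen" is the positive sample, "rejected" the negative one."""
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # The negative sample enters through the rejected term: the objective
    # widens the implicit reward margin between chosen and rejected.
    return -F.logsigmoid(beta * (chosen_logratios - rejected_logratios)).mean()
```

Unlike the SFT loss above, the rejected response contributes a gradient that pushes its likelihood down relative to the reference model, which is how negative samples are utilized.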

zhangdan0602 added the about dataset label on Dec 25, 2024