DPO #1008
-
Hi @winglian, is it possible to train a model with DPO using axolotl?
-
I don't have the answer to this (sorry), but I just wanted to say that I'd also like to know. I've seen people using DPO to train their models, and they say they get significant quality gains, but I've never seen anyone explain the software or setup they used to do it.
-
Yes, it is possible, but you currently need to check out the rl-trainer branch. You can find an example configuration here.
-
FYI, this has also been merged to main in beta! https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/docs/rlhf.md