DPO #1008
-
Hi @winglian, is it possible to train a model with DPO using axolotl?
-
I don't have the answer to this (sorry), but I just wanted to say that I'd also like to know. I've seen people using DPO to train their models, and they say they get significant quality gains, but I've never seen anyone explain the software or setup they used to do it.
-
Yes, it is possible, but you currently need to check out the rl-trainer branch. You can find an example configuration here.
-
FYI, this has also been merged to main in beta! https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/docs/rlhf.md