🚀 The feature, motivation and pitch
In Accelerating Direct Preference Optimization with Prefix Sharing, the authors propose an efficient way to reduce the total number of training tokens in paired preference optimization: the shared prompt and both the chosen and rejected responses are packed into a single sequence. As a result, the forward pass over the shared prompt is performed only once per training sample, eliminating redundant computation.
This is made possible by a custom attention mask that blocks the rejected response from attending to the chosen response, so the two responses are computed independently of each other, each conditioned only on the shared prompt.
For a visual explanation, see the prefix-sharing attention mask diagram in the paper.
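For illustration, here is a minimal sketch of how such a mask could be constructed with PyTorch, assuming a packed layout of `[prompt | chosen | rejected]`. The function name and shapes are illustrative, not taken from the paper's reference implementation:

```python
import torch

def prefix_sharing_mask(prompt_len: int, chosen_len: int, rejected_len: int) -> torch.Tensor:
    """Build a boolean attention mask (True = may attend) for a packed
    sequence laid out as [prompt | chosen | rejected].

    The mask is causal everywhere, except that the rejected block is
    additionally forbidden from attending to the chosen block, so both
    responses are conditioned only on the shared prompt.
    """
    total = prompt_len + chosen_len + rejected_len
    # Start from a standard causal (lower-triangular) mask.
    mask = torch.tril(torch.ones(total, total, dtype=torch.bool))
    # Zero out the region where rejected queries attend to chosen keys.
    rej_start = prompt_len + chosen_len
    mask[rej_start:, prompt_len:rej_start] = False
    return mask

# Example: a 4-token prompt, a 3-token chosen response, a 2-token rejected response.
print(prefix_sharing_mask(4, 3, 2).int())
```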
Although the paper demonstrates the method on DPO, it extends to any offline paired preference optimization algorithm, including ORPO and SimPO.
Alternatives
No response
Additional context
https://github.com/frankxwang/dpo-prefix-sharing