Dataset type conversion utilities #8
Labels
🏋 DDPO, 🏋 DPO, 🏋 GKD, 🏋 Iterative SFT, 🏋 KTO, 🎯 optimal import sentence, ❓ question
System Info
A few things in the dataset type conversion utilities documented at https://huggingface.co/docs/trl/main/en/dataset_formats#utilities-for-converting-dataset-types are not quite correct:
When converting a preference dataset to an unpaired preference dataset with unpair_preference_dataset(), we turn a relative ranking into absolute labels. In a preference dataset, despite having a "chosen" and a "rejected" completion, both can be good or both can be bad; one is just slightly better or worse than the other. See the example below.
So one should not convert a preference dataset to an unpaired preference dataset without also taking absolute ratings into account, e.g. from a reward model.
Suggestion: At least add a warning to the documentation and the conversion code, or even remove the conversion entirely.
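For illustration, here is a minimal sketch of a conversion that derives the labels from an absolute score instead of the relative ranking. Note that score_fn (e.g. a reward model scoring a prompt/completion pair) and threshold are hypothetical and not part of TRL; this is not how unpair_preference_dataset() works today.

from datasets import Dataset

def unpair_with_absolute_labels(dataset, score_fn, threshold):
    # Flatten chosen/rejected pairs into single completions, but assign the
    # label from an absolute quality score rather than from the ranking alone.
    prompts, completions, labels = [], [], []
    for row in dataset:
        for completion in (row["chosen"], row["rejected"]):
            prompts.append(row["prompt"])
            completions.append(completion)
            # A rejected completion can still be labeled True if its absolute
            # score clears the threshold, and vice versa.
            labels.append(bool(score_fn(row["prompt"], completion) >= threshold))
    return Dataset.from_dict({"prompt": prompts, "completion": completions, "label": labels})

With the dataset from the reproduction below, a reward model that rates factual correctness would plausibly give both " blue." and " above." a True label, which is exactly the information the current conversion throws away.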
When converting from unpaired preference or stepwise supervision to an un-labeled type like language modeling or prompt-completion, only the good (label=True) examples should be used, just like the conversion from a preference dataset only uses the chosen completions.
Suggestion: This can easily be fixed in the example conversion code, e.g. as sketched below.
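A minimal sketch of what the fixed unpaired preference → prompt-completion conversion could look like, under the assumption that filtering on the label column is the intended fix (the filter step is the only addition compared to simply dropping the label):

from datasets import Dataset

unpaired = Dataset.from_dict({
    "prompt": ["The sky is", "The sky is"],
    "completion": [" blue.", " green."],
    "label": [True, False],
})

# Keep only the completions labeled as good before dropping the label column,
# mirroring how the preference -> prompt-completion conversion keeps only "chosen".
prompt_completion = unpaired.filter(lambda example: example["label"]).remove_columns("label")
print(list(prompt_completion))  # [{'prompt': 'The sky is', 'completion': ' blue.'}]

The same filter would apply to stepwise supervision data before discarding the per-step labels.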
Information
Tasks
An officially supported task in the examples folder
Reproduction
from datasets import Dataset
from trl import unpair_preference_dataset

dataset_dict = {
    "prompt": ["The sky is", "The sun is"],
    "chosen": [" blue.", " in our solar system"],
    "rejected": [" above.", " in the sky."],
}
dataset = Dataset.from_dict(dataset_dict)
dataset = unpair_preference_dataset(dataset)
# print both unpaired rows for the prompt "The sky is"
for row in dataset:
    if row["prompt"] == "The sky is":
        print(row)
Expected behavior
{'prompt': 'The sky is', 'completion': ' blue.', 'label': True}
{'prompt': 'The sky is', 'completion': ' above.', 'label': True}
Both completions are true statements, so in an absolute sense both could carry label=True. The current conversion instead assigns label=False to ' above.' simply because it was the rejected completion of the pair.
Checklist