All datasets are in JSONL format, where:
id
:str
- A distinct identifier for each data entry, represented as a string.
- Chat Dataset
- Pair Preferences Dataset
- KTO Dataset
- Sampling Dataset
- Multimodal Dataset
- Classification Dataset
- DPPO Dataset (⌛️ Work in progress...)
messages
:list[ChatMessage]
— This is a sequence of messages that make up the chat history. EachChatMessage
includes:role
- The participant's role in the conversation (e.g.,user
orbot
).content
- The textual content of the message.
Example:
{
"id": 0,
"source": "example",
"messages": [
{"role": "user", "content": "Can you play chess?"},
{"role": "bot", "content": "Yes, of course"}
]
}
context
:list[ChatMessage]
— This is a sequence of messages that make up the chat history.answer_w
:ChatMessage
— The more preferable response.answer_l
:ChatMessage
— The less preferable response.
Example:
{
"id": 0,
"source": "example",
"context": [
{"role": "user", "content": "Can you play chess?"}
],
"answer_w": {"role": "bot", "content": "Yes, of course"},
"answer_l": {"role": "bot", "content": "Get out, I don't want to talk to you!"}
}
context
:list[ChatMessage]
— This is a sequence of messages that make up the chat history.answer
:ChatMessage
— The given response.is_desirable
:bool
— Indicator if the provided response is considered as desirable or no.
Example:
{
"id": 0,
"source": "example",
"context": [
{"role": "user", "content": "Can you play chess?"}
],
"answer": {"role": "bot", "content": "Yes, of course"},
"is_desirable": true
}
{
"id": 1,
"source": "example",
"context": [
{"role": "user", "content": "Can you play chess?"}
],
"answer": {"role": "bot", "content": "Get out, I don't want to talk to you!"},
"is_desirable": false
}
messages
:list[ChatMessage]
— This is a sequence of messages that make up the chat history.answers
:list[ChatInferenceOutput]
- A list of generated responses. EachChatInferenceOutput
is structured as:id
:str
- A unique identifier for the generated response.content
:str
- The content of generated completion
Example:
{
"id": "0",
"dataset_name": "example",
"messages": [
{"role": "user", "content": "hi"},
{"role": "bot", "content": "hi"},
{"role": "user", "content": "how are you"}
],
"answers": [
{"content": "good", "id": "0"},
{"content": "not bad", "id": "1"}
]
}
messages
:list[MultimodalChatMessage]
— This is a sequence of messages that make up the chat history. EachChatMessage
includes:role
- The participant's role in the conversation (e.g.,user
orbot
).type
– The type of modality (e.g.,text
orimage
)content
- If thetype
istext
, it's the textual content of the message. If it'simage
, it's the file path.
Example:
{
"id": "0",
"messages": [
{
"role": "system",
"type": "text",
"content": "You are a Multimodal AI assistant."
},
{
"role": "user",
"type": "image",
"content": "/path/to/cat.jpg"
},
{
"role": "user",
"type": "image",
"content": "/path/to/dog.jpg"
},
{
"role": "user",
"type": "text",
"content": "What's the difference between these two images?"
},
{
"role": "bot",
"type": "text",
"content": "The two images in question both feature animals, albeit of different species. The first image depicts a dog, which is generally perceived as an animal that elicits positive emotional responses. The second image features a cat, which is also regarded as an animal that evokes a positive emotional response."
}
]
}
messages
:list[ChatMessage]
— This is a sequence of messages that make up the chat history.label
:int
— Label of provided chat history.
Example:
{
"id": 0,
"source": "example",
"messages": [
{"role": "user", "content": "Can you play chess?"},
{"role": "bot", "content": "Yes, of course"}
],
"label": 1
}
{
"id": 1,
"source": "example",
"messages": [
{"role": "user", "content": "Can you play chess?"},
{"role": "bot", "content": "Get out, I don't want to talk to you!"}
],
"label": 0
}
context
:list[ChatMessage]
— This is a sequence of messages that make up the chat history.answer_w
:ChatMessage
— The more preferable response.answer_l
:ChatMessage
— The less preferable response.
Example:
{
"id": 0,
"source": "example",
"context": [
{"role": "user", "content": "Can you play chess?"}
],
"answer_w": {"role": "bot", "content": "Yes, of course"},
"answer_l": {"role": "bot", "content": "Get out, I don't want to talk to you!"}
}