Skip to content

kixlab/chatgpt-dataset

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 

Repository files navigation

Dataset Description

This dataset is released from the paper Understanding Users’ Dissatisfaction with ChatGPT Responses: Types, Resolving Tactics, and the Effect of Knowledge Level.

Access to the dataset is based on an End-User License Agreement. The use of the dataset is strictly restricted to non-commercial research.

💁 For those interested in full access, please make a request through this Dataset Request Form.

A sample of the dataset can be found in the sample.json file.

Structure

The dataset is hierarchically organized, comprising the following components:

  • User (N=94)
  • ChatGPT conversation links and logs (N=249)
  • User's recollected experience data on dissatisfactory ChatGPT responses (N=377)
  • User's strategies to respond to the dissatisfactory response (N=459)

The dataset is provided in JSON format and the structure is as follows:

NOTE: Each field of this dataset is self-reported data from the user.

user_id
├── user_info
│   ├── age
│   ├── gender
│   ├── job
│   ├── first_language
│   └── LLM_knowledge
│       ├── knowledge_level
│       └── knowledge_level_reason
├── chatgpt_general_experiences
│   ├── chatgpt_usage_language
│   │   ├── language
│   │   ├── writing_level
│   │   └── reading_level
│   ├── chatgpt_usage_purpose
│   ├── chatgpt_usage_period
│   ├── chatgpt_usage_frequency
│   └── chatgpt_overall_satisfaction
│       ├── satisfaction_level
│       └── satisfaction_level_reason
├── chat_data_1
│   ├── chat_date
│   ├── memory_level
│   ├── model_version
│   ├── chat_usage_language
│   ├── chat_usage_reason
│   ├── chat_purpose
│   ├── why_chatgpt
│   ├── dissatisfaction_responses (list)
│   │   ├── dissatisfaction_chatgpt_response 
│   │   ├── dissatisfaction_chat_num
│   │   ├── dissatisfaction_category_score
│   │   │   ├── D_intent
│   │   │   ├── D_depth
│   │   │   ├── D_accuracy
│   │   │   ├── D_transparency
│   │   │   ├── D_refuse
│   │   │   ├── D_ethics
│   │   │   └── D_format
│   │   ├── dissatisfaction_overall_score
│   │   ├── dissatisfaction_reason
│   │   └── tactics (list)
│   │       ├── tactic_type
│   │       ├── tactic_prompt
│   │       ├── tactic_prompt_chat_num
│   │       ├── tactic_effectiveness_score
│   │       ├── tactic_effectiveness_reason
│   │       ├── tactic_theme
│   │       └── tactic_code
│   ├── chat_shared_link
│   └── chat_log (list)
│       ├── role
│       └── content
└── created_at

Description of Each field

Field Type Description Example
user_id string Unique identifier for a user cCbpmERyodVxuVXXHFBl6Q2h
user_info Object Nested structure containing the information about the user
age number User's age 23
gender string User's gender female
job string User's job Engineer
first_language string User's first language English
LLM_knowledge Object
knowledge_level ordinal number User's knowledge level regarding LLM on a 7-point scale
(1: very low, 7: very high)
6
knowledge_level_reason string The reason for user's knowledge level I have been studying LLM for 3 years.
chatgpt_general_experiences Object Nested structure containing various fields related to the
user's overall experiences with using ChatGPT.
chatgpt_usage_language Object Sub-structure containing information about
the user's commonly used language with ChatGPT.
usage_language string User's commonly used language for interacting with ChatGPT. English
writing_level ordinal number User's writing proficiency level about the language
on a 7-point scale (1: very low, 7: very high)
7
reading_level ordinal number User's reading comprehension level about the language
on a 7-point scale (1: very low, 7: very high)
6
chatgpt_usage_purpose string The primary purpose or use case of ChatGPT for the user. I used it to get some useful information,
summarisation, translation, etc.
chatgpt_usage_period string The duration of the user's usage of ChatGPT. 1 month
chatgpt_usage_frequency string The frequency of the user's usage of ChatGPT. 1 time a week
chatgpt_overall_satisfaction Object Nested structure containing the user's overall satisfaction with ChatGPT
satisfaction_level ordinal number User's overall satisfaction level with ChatGPT
on a 7-point scale (1: very low, 7: very high)
5
satisfaction_level_reason string The reason for the user's satisfaction level. I am satisfied with ChatGPT
because it is very useful.
chat_data_n Object Nested structure that contains detailed information about a specific chat session with ChatGPT.
n could be 1 ~ 5.
chat_date string Date of that chat session. One week ago
memory_level ordinal number User's memory level of that chat session.
on a 7-point scale (1: very poor, 7: Excellent)
7
model_version string The version of ChatGPT model used in that chat session (gpt3.5 or gpt4). gpt3.5
chat_usage_language string The language used in that chat session. English
chat_usage_language_reason string The reason for using that language in that chat session. because this is my first language
chat_purpose string The specific purose of that chat session. I want to get some summarise of the article
why_chatgpt string The motivation or reason for using ChatGPT for that purpose. I thought it is good at summarising
dissatisfaction_responses Array List of user's experience data related to their dissatisfaction with responses generated by ChatGPT.
Each element in the list contains data on each dissatisfactory ChatGPT response.
dissatisfaction_chatgpt_response string ChatGPT response that the user selected as dissatisfactory. Of course, the best books of the year are subjective,
but I think the best ...
dissatisfaction_chat_num number The chat turn number of dissatisfaction_chatgpt_response among ChatGPT responses.
Starts with 0.
2
dissatisfaction_category_score Object Nested structure containing dissatisfaction scores for seven dissatisfaction categories.
The dissatisfaction score is an integer value ranging from 0 to 10.
- 0: No dissatisfaction
- 1: A little dissatisfied
- 10: Extremely dissatisfied
D_intent number Dissatisfaction score in the aspect of "Intent understanding (D_intent)". 2
D_depth number Dissatisfaction score in the aspect of "Content depth and originality (D_depth)". 0
D_accuracy number Dissatisfaction score in the aspect of "Information accuracy (D_acc)". 7
D_transparency number Dissatisfaction score in the aspect of "Transparency (D_trans)". 0
D_refuse number Dissatisfaction score in the aspect of "Refusal to answer (D_refuse)". 0
D_ethic number Dissatisfaction score in the aspect of "Content ethichs and integrity (D_ethic)". 0
D_format number Dissatisfaction score in the aspect of "Response format and attitude (D_format)". 3
dissatisfaction_overall_score number Overall dissatisfaction score of dissatisfaction_chatgpt_response ranging from 1 to 10.
- 1: A little dissatisfied
- 10: Extremely dissatisfied
6
dissatisfaction_reason string The reason for the user's dissatisfaction I think the summarisation is not accurate
tactics Array List of user's tactics employed to address the dissatisfaction.
tactic_type string The type of tactic used among user_prompt, no_tactic, end_conversation. user_prompt
tactic_prompt string (Optional)
When tactic_type is user_prompt, user's prompt associated with the tactic.
When tactic_type is no_tactic, empty.
When tactic_type is end_conversation, user's last prompt before terminating the conversation.
Please give me more accurate summarisation
tactic_prompt_chat_num number (Optional)
When tactic_type is user_prompt, the chat turn number of tactic_prompt among user's prompt.
Starts with 0.
3
tactic_effectiveness_score number The effectiveness score of the tactic. 8
tactic_effectiveness_reason string The reason for the tactic's effectiveness. I think the answer is more accurate than before.
tactic_theme string Thematic category of the tactic among T_repeat, T_specify, T_error, T_adapt, or No_tactic. T_specify
tactic_code string Corresponding tactic codes among T_1 ~ T_13. T4
chat_shared_link string The shared link (URL) of the chat session.
It could not be accessed if the user deletes the conversation.
https://chat.openai.com/share/aaaaa-aaaaa-aaaaa-aaaaa
chat_log Arrary List of all chat log entries for the chat session,
role string user or assistant.
When role is user, the contents is user's prompt.
When role is assistant, the content is response generated by ChatGPT.
user
content string User's prompt or ChatGPT response. Give me a summarisation of this article. Article title:...
created_at string The timestamp indicates the exact time when the data was created after receiving the user's response. response. 09/04/2023, 15:07:05

Personal Information Protection

Our dataset includes actual conversation logs between users and ChatGPT. It may contain a risk of users' personal information leakage. To prevent this issue, we thoroughly reviewed all data and masked the data that have a risk of personal information leakage into the following MASKED TYPE.

MASKED TYPE

  • [MASKED: LINK] : Masked the ChatGPT shared link
  • [MASKED: WHOLE CHAT] : Masked the whole chat log
  • [MASKED: WHOLE RESPONSE] : Masked the one ChatGPT response entirely
  • [MASKED: WHOLE PROMPT] : Masked the one user prompt entirely
  • [MASKED: URL] : Masked the privacy related url
  • [MASKED: PROPER NOUN] : Masked the privacy related proper noun
  • [MASKED: USER CONTENT] : Masked user content such as user's phone number or email address.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published