This repository supports our paper submitted to MSR 2024.
The full CSV files are available at: [link anonymized to adhere to the double-blind policy].
The data will be disclosed after the paper is accepted, in accordance with the double-blind guidelines.
In the RQ1 folder, you can find the file used to produce the Active users dataset and the script that analyzes the privacy profiles. The script is run on a portion of the Users dataset.
In the RQ2 folder, you can find the script for preprocessing the corpus. The script active_user_pr_comments extracts the pull_request comments of active users only. The folder PREPRO_DICTIONARY contains the results of parsing the entire corpus with the Privacy Dictionary.
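The extraction step described above can be illustrated with a minimal sketch. It is not the actual active_user_pr_comments script: the column names (user_id, comment_id, body), the helper name filter_active_user_comments, and the sample rows are all hypothetical, chosen only to show the idea of keeping the pull_request comments written by active users.

```python
import csv
import io


def filter_active_user_comments(comments, active_users):
    """Yield only the comment rows whose author is in active_users.

    `comments` is any iterable of dict-like rows with a "user_id" key
    (hypothetical column name, not taken from the real dataset).
    """
    active = set(active_users)  # set lookup keeps the filter linear in the corpus size
    for row in comments:
        if row["user_id"] in active:
            yield row


# Tiny in-memory stand-ins for the real CSV files (contents are made up).
active_csv = io.StringIO("user_id\nu1\nu3\n")
comments_csv = io.StringIO(
    "comment_id,user_id,body\n"
    "c1,u1,Please remove the token\n"
    "c2,u2,LGTM\n"
    "c3,u3,Email is exposed here\n"
)

active_users = [row["user_id"] for row in csv.DictReader(active_csv)]
kept = list(
    filter_active_user_comments(csv.DictReader(comments_csv), active_users)
)
print([row["comment_id"] for row in kept])  # ['c1', 'c3']
```

With the real data, the two io.StringIO objects would be replaced by open file handles on the Active users dataset and the pull_request comments CSV, assuming both share a user identifier column.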
In the RQ3 folder, you can find the corpus of comments that the raters considered sensitive.