In digital communities and forums on the internet, users often choose to remain anonymous, as real names are not required when conversing with strangers online. With this anonymity comes the freedom to express one's thoughts without fear of being judged or recognized, but it also means that users can post abusive comments with little to no repercussions. While most online forums and social media sites have various ways to moderate content (e.g., moderators and staff who manually review posts and comments, a report button under messages, voting on comments and posts), these methods are not enough to combat the sheer number of toxic comments being made. Automated toxicity detection in online text should therefore be improved to help foster a safe and respectful online environment.
The Toxic Comment Classification Challenge is a Kaggle challenge by the Conversation AI team, which is composed of researchers from both Jigsaw and Google. The challenge invites participants to build a multi-headed model that detects the different types of toxicity (i.e., toxic, severe toxic, obscene, threat, insult, and identity hate) more accurately than Perspective's current models. To this end, the given dataset contains a large number of Wikipedia comments that have been labeled by human raters for toxic behavior.
This project's best model received a private ROC AUC score of 0.97559 and a public ROC AUC score of 0.97622.
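The notebooks contain the full pipeline; purely as an illustration of the task, the sketch below trains a simple TF-IDF plus one-vs-rest logistic regression baseline on the six labels and scores it with the competition metric, the mean column-wise ROC AUC. This is a minimal sketch of the problem setup, not the project's actual model, and the file path `data/train.csv` assumes the original Kaggle layout inside the `data` folder.

```python
# Minimal multi-label baseline sketch (not the project's actual model):
# TF-IDF features + one-vs-rest logistic regression, scored with the
# competition metric (mean column-wise ROC AUC).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Assumed location of the original Kaggle train file (columns: id, comment_text, six labels).
train = pd.read_csv("data/train.csv")
X_train, X_val, y_train, y_val = train_test_split(
    train["comment_text"], train[LABELS].values, test_size=0.2, random_state=42
)

# One independent logistic regression per label ("multi-headed" in the simplest sense).
model = make_pipeline(
    TfidfVectorizer(max_features=50_000, ngram_range=(1, 2), sublinear_tf=True),
    OneVsRestClassifier(LogisticRegression(C=4.0, max_iter=1000)),
)
model.fit(X_train, y_train)

# predict_proba returns one probability column per label; average the per-label AUCs.
val_probs = model.predict_proba(X_val)
print("mean ROC AUC:", roc_auc_score(y_val, val_probs, average="macro"))
```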
This GitHub repository contains three folders and two main files.
| Folders | Description |
|---|---|
| `cleaned_data` | Folder that holds the cleaned versions of the train and test data |
| `data` | Folder that holds the original data from the Kaggle challenge |
| `results` | Folder that holds the predictions of the different algorithms tried (format sketched below the tables) |
| Files | Description |
|---|---|
| `ToxicComment_S13_Group8.ipynb` | Main notebook that also holds the data cleaning, pre-processing, and EDA |
| `ToxicComment_S13_Group8_Supplementary.ipynb` | Other solutions tried to solve the challenge |
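For reference, prediction files in `results` would typically follow the Kaggle submission layout: one row per test comment `id` and one probability column per label. The exact filenames produced by each algorithm are not listed here, so the snippet below only sketches that format, reusing `model` from the earlier sketch and assuming the original test file sits at `data/test.csv`; `results/sample_predictions.csv` is a hypothetical output name.

```python
# Sketch of the Kaggle submission layout that files in results/ would follow.
# Reuses `model` and `LABELS` from the baseline sketch above.
import pandas as pd

test = pd.read_csv("data/test.csv")                 # assumed original Kaggle test file (id, comment_text)
probs = model.predict_proba(test["comment_text"])   # one probability column per label

submission = pd.DataFrame(probs, columns=LABELS)
submission.insert(0, "id", test["id"].values)
submission.to_csv("results/sample_predictions.csv", index=False)  # hypothetical filename
```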
- Extract the folder from the zipped file that you can download through this DownGit link.
- Launch Jupyter Notebook or JupyterLab.
- Navigate to the project folder containing `ToxicComment_S13_Group8.ipynb`.
- Open `ToxicComment_S13_Group8.ipynb`.
- Francheska Josefa Vicente ([email protected])
- Sophia Danielle S. Vista ([email protected])