Toxic Comment Classification

Background

Discussing things you care about can be difficult. The threat of abuse and harassment online means that many people stop expressing themselves and give up on seeking different opinions. Platforms struggle to effectively facilitate conversations, leading many communities to limit or completely shut down user comments. Improvements to the current model will hopefully help online discussion become more productive and respectful.

Dataset

This dataset contains a large number of Wikipedia comments which have been labeled by human raters for toxic behavior.

In this competition, you’re challenged to build a multi-headed model that’s capable of detecting different types of toxicity, such as:

toxic
severe_toxic
obscene
threat
insult
identity_hate
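
Because a single comment can carry several of these labels at once, each comment is naturally represented as a vector of six 0/1 flags rather than a single class. The following is a minimal sketch of reading such rows, assuming the CSV has a `comment_text` column plus one column per label (as in the Kaggle data); the `read_rows` helper and the two sample rows are made up for illustration.

```python
import csv
import io

# Label names taken from the list above.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def read_rows(handle, text_column="comment_text"):
    """Yield (text, label_vector) pairs from an open CSV file handle.

    `text_column` is an assumption about the file layout.
    """
    for row in csv.DictReader(handle):
        yield row[text_column], [int(row[name]) for name in LABELS]


# Tiny invented sample standing in for train.csv.
SAMPLE = io.StringIO(
    "id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate\n"
    "a1,thanks for the fix,0,0,0,0,0,0\n"
    "a2,placeholder rude comment,1,0,1,0,1,0\n"
)

rows = list(read_rows(SAMPLE))
for text, labels in rows:
    active = [name for name, flag in zip(LABELS, labels) if flag]
    print(text, "->", active or ["(clean)"])
```

To read the real file, the same helper could be called with `open("train.csv", newline="", encoding="utf-8")` in place of `SAMPLE`.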

Disclaimer: the dataset for this competition contains text that may be considered profane, vulgar, or offensive.

Guidance

This is not an exhaustive list of tasks; the points below are provided to guide you:

Preprocessing

Try various methods to preprocess the comments into tokens.
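
As one possible starting point, the sketch below compares two simple tokenization schemes using only the standard library: lowercase word tokens and character n-grams. The function names and the regular expression are illustrative choices, not something prescribed by the task.

```python
import re

# Lowercase words and digits, keeping simple contractions such as "you're".
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def tokenize(comment):
    """Lowercase a comment and split it into word tokens."""
    return TOKEN_PATTERN.findall(comment.lower())


def char_ngrams(comment, n=3):
    """Return overlapping character n-grams after collapsing whitespace.

    Character n-grams can be more robust to deliberate misspellings
    than word tokens.
    """
    text = " ".join(comment.lower().split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(tokenize("You're NOT helping!!"))
print(char_ngrams("Hi  you"))
```

Other options worth trying include keeping punctuation or capitalization as features, since shouting and repeated exclamation marks may themselves be informative.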

Model

Test the performance of different model architectures. Tune your model to improve its performance.
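
A simple baseline to compare other architectures against is one independent binary classifier per label. The sketch below is a pure-Python illustration of that idea (logistic regression over bag-of-words counts, trained with plain SGD); the class name, hyperparameters, and toy data are all invented for the example, and a real experiment would more likely use an established library.

```python
import math


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


class OneVsRestLogReg:
    """One independent binary logistic regression per label."""

    def __init__(self, labels, lr=0.5, epochs=50):
        self.labels = list(labels)
        self.lr = lr
        self.epochs = epochs
        self.weights = [{} for _ in self.labels]  # token -> weight, per label
        self.bias = [0.0] * len(self.labels)

    def _score(self, k, tokens):
        w = self.weights[k]
        return self.bias[k] + sum(w.get(t, 0.0) for t in tokens)

    def predict_proba(self, tokens):
        """Return one probability per label for a tokenized comment."""
        return [sigmoid(self._score(k, tokens)) for k in range(len(self.labels))]

    def fit(self, docs, targets):
        for _ in range(self.epochs):
            for tokens, y in zip(docs, targets):
                for k, y_k in enumerate(y):
                    # Gradient of the log loss with respect to the score.
                    error = sigmoid(self._score(k, tokens)) - y_k
                    self.bias[k] -= self.lr * error
                    w = self.weights[k]
                    for t in tokens:
                        w[t] = w.get(t, 0.0) - self.lr * error
        return self


# Toy, invented training data with three of the six labels.
toy_labels = ["toxic", "threat", "insult"]
docs = [
    ["thanks", "for", "help"],
    ["you", "idiot"],
    ["i", "will", "hurt", "you"],
    ["nice", "edit"],
]
targets = [[0, 0, 0], [1, 0, 1], [1, 1, 0], [0, 0, 0]]

model = OneVsRestLogReg(toy_labels).fit(docs, targets)
probs = model.predict_proba(["you", "idiot"])
clean_probs = model.predict_proba(["thanks", "for", "help"])
print(dict(zip(toy_labels, (round(p, 2) for p in probs))))
```

Treating the labels independently ignores correlations between them (for example, `severe_toxic` comments are presumably also `toxic`), which is one reason a shared-representation, multi-headed model may do better.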

Evaluation

Report your results using appropriate metrics. Check whether your model performs equally well across classes. Suggest possible improvements.
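
One way to check per-class performance is to compute a metric separately for each label and then average. The sketch below uses ROC AUC per label, which is the metric the referenced Kaggle challenge averaged column-wise; the helper names and the toy scores are invented for illustration, and the pairwise implementation is only suitable for small samples.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Returns None when only one class
    is present, because the AUC is undefined in that case.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def per_label_auc(y_true_rows, score_rows, labels):
    """Compute ROC AUC independently for each label column."""
    return {
        name: roc_auc([row[k] for row in y_true_rows],
                      [row[k] for row in score_rows])
        for k, name in enumerate(labels)
    }


# Toy, invented ground truth and predicted scores for two labels.
labels = ["toxic", "threat"]
y_true = [[1, 0], [0, 0], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.1, 0.1], [0.4, 0.3], [0.6, 0.8]]

aucs = per_label_auc(y_true, scores, labels)
defined = [v for v in aucs.values() if v is not None]
mean_auc = sum(defined) / len(defined)
print(aucs, mean_auc)
```

A large gap between labels in such a table would suggest the model handles some classes worse than others. With heavily imbalanced labels, per-label precision and recall may be worth reporting alongside AUC.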

Reference

Toxic Comment Classification Challenge by Jigsaw/Conversation AI https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge
