The companion blog post on text moderation can be found here.
The dataset is from the Kaggle Toxic Comment Classification Challenge and can be downloaded from here.
The labels in the dataset are:
labels = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']
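Each comment can carry several of these labels at once, so the classification targets are multi-label: one binary indicator per label. A minimal sketch of building such a target vector, in pure Python (the real pipeline would read these columns from the Kaggle train.csv; the helper `to_target_vector` is an illustrative name, not part of the project code):

```python
labels = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']

def to_target_vector(tags):
    """Convert a set of label names into a binary multi-label target vector."""
    # Hypothetical helper for illustration; order follows the labels list above.
    return [1 if label in tags else 0 for label in labels]

# A comment tagged both 'toxic' and 'insult':
print(to_target_vector({'toxic', 'insult'}))  # -> [1, 0, 0, 0, 1, 0]
```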
The CNN architecture used is an implementation of this, as found here. We use the Hugging Face Transformers library to obtain word embeddings for each of our comments, transfer those embeddings into the CNN's embedding layer, and train the model on our classification targets.
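The core mechanism of a text CNN of this kind is a filter sliding over the sequence of word-embedding vectors, followed by max-over-time pooling. A minimal NumPy sketch of that idea, where the dimensions and the random "embeddings" are stand-ins for the actual pretrained Transformer embeddings and learned filters in the real model:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, emb_dim, filter_width = 10, 8, 3  # illustrative sizes, not the model's

# One comment as a (tokens x embedding_dim) matrix; random values stand in
# for the pretrained embeddings the real model transfers in.
embeddings = rng.standard_normal((seq_len, emb_dim))
conv_filter = rng.standard_normal((filter_width, emb_dim))

# Valid 1-D convolution over the token axis: one activation per window.
feature_map = np.array([
    np.sum(embeddings[i:i + filter_width] * conv_filter)
    for i in range(seq_len - filter_width + 1)
])

# Max-over-time pooling reduces each filter's map to one scalar feature.
pooled = feature_map.max()
print(feature_map.shape)  # -> (8,)
```

In the full model, many such filters of several widths run in parallel, and their pooled features feed a dense layer with one sigmoid output per label.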