Tensorflow constrained optimization with different Deep Learning Models
The code repository contains intgeration with tensorflow-constrained-optimization for introducing fairness in ML models. Models used here are
- Simple LSTM
- CNN
- Bi-Directional LSTM
All these models are available at notebook Wiki_toxicity_fairness-lstm-cnn-bi-lstm.ipynb
It also contains
- Stacked LSTM and CNN
- Stacked BI-LSTM and CNN
These models are available at Wiki_toxicity_fairness-stacked-lstm-cnn.ipynb
It further contains integration with
- BERT in toxicity_classification_fainess_bert.ipynb
Note : Since creating bert embeddings and tokenizing them takes time, we have limited training to 5000 data points. Of course this can ve extended to full dataset.
How to Download the data?
- Ensure you have a folder called fair_data in inside the main repo
- Create a sub-folder jigsaw-unintended-bias-in-toxicity-classification
- Download data from https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data . In the scope of this POC, we limit data training only from train.csv ie. 50% for training and remaining 50% for training and validation, yielding to 902437 training points and 451219 points for each testing and validation.
For Bert models:
Ensure you are downloading the model and the following lines are uncommneted.
!wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip !unzip uncased_L-12_H-768_A-12.zip os.makedirs("model", exist_ok=True) !mv uncased_L-12_H-768_A-12/ model
Ensure you are training the bert model using:
model = multi_cls_create_model(max_seq_len=128, bert_ckpt_file).
If model is set to something else, reset to bert model