Improving the performance of a personality trait classifier trained on ambiguous labels

Abstract

In this project I use Natural Language Processing (NLP) techniques and several machine learning models to classify interviewees from the EmotiW 2017 dataset into 6 personality types from interview transcript data, and I compare the models' performance. In the ground truth (i.e. the labels), most subjects have ambiguous personality type labels because the labelers disagree, as measured by Fleiss’s kappa coefficient. To increase the contrast among personality traits, I compare several ways of dichotomizing the labels so as to maximize the subsequent classification performance: first by finding ambiguity thresholds and truncating the data, and later by comparing two different weighting functions. Finally, I explore NLP design choices, dimensionality reduction with PCA, Random Forest and Multinomial Naive Bayes models, and model tuning to further improve the classifier's performance.
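Fleiss’s kappa compares the observed agreement among raters with the agreement expected by chance. The minimal NumPy sketch below assumes the annotations are available as a subjects × categories matrix of rating counts, with the same number of raters for every subject; the dataset's actual label format is not described here, and the function name `fleiss_kappa` is hypothetical. Besides the dataset-level kappa, it returns the per-subject agreement P_i, which is one possible way to score how ambiguous each individual label is.

```python
import numpy as np


def fleiss_kappa(counts):
    """Fleiss's kappa for an (N subjects x k categories) matrix of rating counts.

    counts[i, j] is the number of raters who put subject i in category j.
    Every row must sum to the same number of raters n.
    Returns (kappa, per_subject_agreement).
    """
    counts = np.asarray(counts, dtype=float)
    n_subjects = counts.shape[0]
    n_raters = counts[0].sum()
    if not np.allclose(counts.sum(axis=1), n_raters):
        raise ValueError("every subject must be rated by the same number of raters")

    # p_j: share of all assignments that went to category j.
    p_j = counts.sum(axis=0) / (n_subjects * n_raters)
    # P_i: fraction of rater pairs that agree on subject i.
    p_i = (np.sum(counts ** 2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # P_e: agreement expected by chance given the category shares.
    p_e = np.sum(p_j ** 2)
    kappa = (p_bar - p_e) / (1.0 - p_e)
    return kappa, p_i
```

For example, with three raters and two categories, `fleiss_kappa([[3, 0], [0, 3], [2, 1], [1, 2]])` gives a kappa of 1/3 and per-subject agreement `[1, 1, 1/3, 1/3]`, so the last two subjects are the ambiguous ones. Kappa is undefined when every rating falls into a single category, because the chance agreement P_e is then 1.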

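The abstract does not spell out how the labels are dichotomized, so the sketch below rests on stated assumptions. Each trait label is taken to be a score in [0, 1], for instance the fraction of annotators who rated the subject high on the trait, with 0.5 as the point of maximal ambiguity. Truncation drops subjects whose score lies within a chosen threshold of 0.5. The two weighting functions are illustrative parabolic and Gaussian-shaped choices that down-weight ambiguous subjects instead of dropping them. All function names and parameter values (`sigma=0.15`, `n_components=2`, `n_estimators=100`) are hypothetical; only the scikit-learn classes are real.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def dichotomize(scores, threshold=0.0):
    """Turn trait scores in [0, 1] into binary labels (hypothetical scheme).

    Returns the labels (1 if score >= 0.5) and a boolean mask that is False
    for subjects within `threshold` of 0.5, i.e. the ambiguous ones to drop.
    """
    scores = np.asarray(scores, dtype=float)
    labels = (scores >= 0.5).astype(int)
    keep = np.abs(scores - 0.5) >= threshold
    return labels, keep


def parabolic_weight(scores):
    """Hypothetical weight: 0 at the most ambiguous score (0.5), 1 at 0 or 1."""
    d = np.asarray(scores, dtype=float) - 0.5
    return 4.0 * d ** 2


def normal_weight(scores, sigma=0.15):
    """Hypothetical weight: one minus a Gaussian bump centred on 0.5."""
    d = np.asarray(scores, dtype=float) - 0.5
    return 1.0 - np.exp(-(d ** 2) / (2.0 * sigma ** 2))


def to_dense(X):
    """PCA expects a dense array, while TfidfVectorizer returns a sparse matrix."""
    return X.toarray()


def build_models(n_components=2):
    """Two candidate classifiers on TF-IDF features of the transcripts."""
    # MultinomialNB rejects negative inputs, so it is fed TF-IDF directly;
    # PCA components can be negative and are only used ahead of the forest.
    nb = make_pipeline(TfidfVectorizer(), MultinomialNB())
    rf = make_pipeline(
        TfidfVectorizer(),
        FunctionTransformer(to_dense),
        PCA(n_components=n_components),
        RandomForestClassifier(n_estimators=100, random_state=0),
    )
    return nb, rf
```

Truncation and weighting are alternatives: either fit a model only on the transcripts where `keep` is True, or fit on all subjects and pass the weights through the pipeline, e.g. `nb.fit(docs, labels, multinomialnb__sample_weight=parabolic_weight(scores))`. `make_pipeline` names each step after its lowercased class, which is where the `multinomialnb__` and `randomforestclassifier__` prefixes come from.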
guillembp/nlp_personality_classifier