A hybrid model combining an ensemble classifier, k-prototypes clustering, and association rule mining for customer churn analysis. It uses majority voting for both feature selection and churn prediction on a telecommunications dataset (IBM Watson dataset).
Contributors: Arghadip Chakraborty, Sohel Raja Molla, Disha Sinha, Shankhadeep Giri.
The project finds the best features of the dataset by comparing the rules obtained from a Decision Tree classifier, clustering models (K-modes and K-prototypes), and association rule mining (the Apriori algorithm). It then trains the other classifier models and a Voting Classifier (ensemble learning) on those best features to see whether prediction accuracy improves.
We used the following telecom customer churn dataset for this project: WA_Fn-UseC_-Telco-Customer-Churn.csv
We cleaned the dataset and encoded the categorical data as dummy variables for classification. The resulting dataset is new_telco.csv
First, we ran an initial classification with a Decision Tree classifier using all the features of the dataset. With the entropy criterion, we got an accuracy of 79.83% at depth=5. See the notebook
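To make the entropy criterion concrete, here is a minimal sketch of how a decision tree scores a candidate split by information gain. The function names and the toy data are assumptions made for illustration; they are not taken from the notebook or from the Telco dataset.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting `labels` on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the groups produced by the split
    weighted = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

# Toy data (hypothetical): the feature separates the two classes perfectly
contract = ["monthly", "monthly", "yearly", "yearly"]
churn = ["Yes", "Yes", "No", "No"]
gain = information_gain(contract, churn)
```

A tree built with the entropy criterion picks, at each node, the split with the highest gain, and stops growing once the chosen maximum depth is reached.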
We implemented K-modes and K-prototypes clustering to get the clusters and centroids for each feature of the dataset. These are used to find the best features. See the notebook
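K-prototypes handles mixed data by combining a numeric distance with a categorical mismatch count. The sketch below shows that dissimilarity measure only, not the full clustering loop; the function name, the `gamma` value, and the sample record are assumptions chosen for illustration.

```python
def mixed_dissimilarity(record, prototype, numeric_idx, categorical_idx, gamma=1.0):
    """Squared Euclidean distance on numeric fields plus
    gamma times the number of mismatching categorical fields."""
    numeric_part = sum((record[i] - prototype[i]) ** 2 for i in numeric_idx)
    categorical_part = sum(record[i] != prototype[i] for i in categorical_idx)
    return numeric_part + gamma * categorical_part

# Hypothetical record and cluster prototype: (scaled tenure, InternetService, PhoneService)
record = (0.2, "DSL", "Yes")
prototype = (0.5, "Fiber optic", "Yes")
dist = mixed_dissimilarity(record, prototype, numeric_idx=[0], categorical_idx=[1, 2], gamma=0.5)
```

K-modes is the special case with no numeric fields, where the prototype of each cluster is the per-feature mode.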
We implemented the Apriori algorithm (association rule mining) to get rules over the features, which we then use to find the best features of the dataset. See the notebooks
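Association rules are ranked by support and confidence. The following sketch computes both for a single rule over a handful of transactions; the item labels and transactions are invented for illustration and do not reproduce the rules found in the notebooks.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical transactions: each customer is a set of "feature=value" items
transactions = [
    {"Contract=Month-to-month", "Churn=Yes"},
    {"Contract=Month-to-month", "Churn=Yes"},
    {"Contract=Month-to-month", "Churn=No"},
    {"Contract=Two year", "Churn=No"},
]
rule_support = support(transactions, {"Contract=Month-to-month", "Churn=Yes"})
rule_confidence = confidence(transactions, {"Contract=Month-to-month"}, {"Churn=Yes"})
```

Apriori avoids scoring every possible itemset by only extending itemsets whose support already meets a minimum threshold.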
Comparing the results of the Decision Tree classifier, clustering, and association rule mining, we get the following best features: 'tenure', 'InternetService', 'PhoneService'.
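One way to express this comparison as a majority vote is to keep every feature proposed by at least two of the three methods. This is a sketch under that assumption: the `min_votes` threshold and the three candidate lists below are invented for illustration and are not the actual outputs of the notebooks.

```python
from collections import Counter

def vote_features(candidate_sets, min_votes=2):
    """Return the features proposed by at least `min_votes` of the methods."""
    votes = Counter(f for candidates in candidate_sets for f in set(candidates))
    return sorted(f for f, n in votes.items() if n >= min_votes)

# Hypothetical candidate lists, one per method
from_tree = ["tenure", "InternetService", "Contract"]
from_clustering = ["tenure", "PhoneService", "InternetService"]
from_rules = ["PhoneService", "tenure", "MonthlyCharges"]

best = vote_features([from_tree, from_clustering, from_rules])
# best == ['InternetService', 'PhoneService', 'tenure']
```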
We implemented other classifiers (K-Nearest Neighbors, Logistic Regression, Support Vector Machine, Random Forest, and Naive Bayes), first with all the features and then with only the best features of the dataset. We then compared the accuracy of the different models. See the notebooks
With all features:
With best features:
We implemented a Voting Classifier, which combines several classifier models into one predictive model to improve performance. See the notebook
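The idea behind hard (majority) voting can be sketched in a few lines: each model predicts a label for every sample, and the most common label wins. The helper name and the prediction lists are assumptions for illustration; scikit-learn's `VotingClassifier` with `voting="hard"` applies the same rule to fitted estimators.

```python
from collections import Counter

def hard_vote(predictions_per_model):
    """Combine per-model prediction lists by majority vote, sample by sample."""
    combined = []
    for sample_preds in zip(*predictions_per_model):
        # most_common(1) returns the label with the highest vote count
        combined.append(Counter(sample_preds).most_common(1)[0][0])
    return combined

# Hypothetical churn predictions (1 = churn) from three models on four customers
model_a = [1, 0, 1, 0]
model_b = [1, 1, 0, 0]
model_c = [0, 1, 1, 0]

ensemble = hard_vote([model_a, model_b, model_c])
# ensemble == [1, 1, 1, 0]
```

Using an odd number of models with binary labels avoids ties in the vote.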