Churn prevention model for Die Zeit Verlag

Team: Jonas Bechthold, Silas Mederer & Carlotta von Ulm Erbach

Business Case: Churn prevention is a proactive strategy for keeping customers. It involves examining the underlying reasons for churn and then formulating a plan to address issues that may lead to churn before they occur.

Repository content:

EDA Notebook
ML Notebook
ANN Notebook
Model Tuning: KNN, Logistic Regression, Random Forest I, Random Forest II, SVC, XGBoost
Error analysis
Customer segmentation
Deployment
Presentation

Goal: Improve the existing churn model of Die Zeit Verlag so that it makes better predictions.

Metrics:

Recall: to identify as many real subscription churns as possible
Precision: to avoid disturbing loyal customers with unnecessary mails
AUC and probability distribution
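
As a minimal sketch of how the first two metrics are defined, the following computes precision and recall for the "churn" class from plain label lists. The function name `precision_recall` and the toy labels are illustrative assumptions, not taken from the project notebooks.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the positive class ("churn" = 1 here)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # churns caught
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # loyal customers flagged
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # churns missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (made up for illustration): 4 real churns, 6 loyal customers.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this toy case one churn is missed (lowering recall) and one loyal customer is flagged (lowering precision).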

Overview: Because the dataset contains a large number of features (177), feature selection and feature engineering as part of an extensive EDA were essential.

For supervised (labeled) ML, the class imbalance in the data must be handled. The aim is to identify as many “real” subscription churns as possible. Incorrectly flagging subscribers who would not churn is of lower priority (recall over precision), since we would recommend churn mitigation measures for these subscribers anyway. The main target is therefore to understand the behavioral patterns of customers and to optimize churn prevention while reducing the overall cost.
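
The text does not say which technique was used to handle the imbalance. One common option is class weighting, sketched below under that assumption: each class is weighted by `n_samples / (n_classes * class_count)`, which is the heuristic scikit-learn applies for `class_weight="balanced"`. The helper name and the 90/10 split are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency:
    n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalance: 90 "no churn" (0) vs. 10 "churn" (1) subscriptions.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, an error on the rare "churn" class costs the model more than an error on the majority class, which pushes it toward higher recall on churn.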

We used different machine learning approaches (supervised and unsupervised) to predict subscription churn for the German weekly newspaper “Die Zeit”. As this is a binary classification problem (“churn” or “no churn”), we used Logistic Regression as a baseline model, and KNN, Random Forest and XGBoost for further optimization. We also implemented a stacking and a voting classifier, each combining all classifiers into one. Unsupervised machine learning methods were used to improve the prediction.
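
A soft voting ensemble of this kind could be sketched with scikit-learn as follows. This is an illustration on synthetic data, not the project's pipeline: the dataset, hyperparameters and variable names are assumptions, and XGBoost is left out to keep the example free of extra dependencies.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, imbalanced stand-in for the subscription data (about 20% "churn").
X, y = make_classification(
    n_samples=500, n_features=20, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Soft voting averages the predicted class probabilities of all base models.
voting = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier()),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",
)
voting.fit(X_train, y_train)

proba = voting.predict_proba(X_test)  # column 1 = probability of "churn"
y_pred = voting.predict(X_test)
print("recall:   ", recall_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred, zero_division=0))
```

Because soft voting exposes averaged probabilities, the decision threshold on `proba[:, 1]` can be lowered below 0.5 to trade precision for recall.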

We built an ANN and a DNN and implemented the “deep and wide” approach. We were able to improve on our baseline ANN, but not to achieve a higher recall on “churn”. At the end of the project we therefore decided to implement the stacking and voting classifiers, and wrote a script and a readme to hand over to Die Zeit Verlag so that their operators can use our trained models.

Methods and Technologies: Preprocessing (Cleaning, EDA), supervised ML classification (Logistic Regression, KNN, RF, XGBoost, Stacking, Voting), advanced methods consisting of ANN (DNN, deep and wide Network)

Recommendations and findings: We recommend that our stakeholders contact customers for the first time 10 months after the start of the subscription and then every 12 months. This recommendation is based on the feature “lesedauer” and its distribution. We also recommend using more features, especially user-specific and time-related ones. As digital subscriptions become more and more important, it should be easy for the publisher to collect this kind of information. With such data, it would be possible to train a CNN model on click rates or customer behavior, which would be more precise and could predict not only “churn” or “no churn” but also the point in time at which a customer churns. This information is important if you want to retain your customers.
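
The recommended contact rhythm can be written down as a small helper. The sketch assumes that "every 12 months" is counted from the first contact (months 10, 22, 34, ...), which the text does not state explicitly; the function name and the 48-month horizon are hypothetical.

```python
def contact_months(horizon_months, first=10, interval=12):
    """Months since subscription start at which to contact a customer:
    first after `first` months, then every `interval` months."""
    return list(range(first, horizon_months + 1, interval))


# Contact points within the first four years of a subscription.
schedule = contact_months(48)
print(schedule)
```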

Overall, we improved on the reference model by a relative 14% in recall and 25% in precision, and we recommend using the soft voting classifier (option 2 in the script) for predictions. It achieves a recall of 79%, a precision of 59% and an accuracy of 77%. This means, however, that the model still misclassifies about 23% of the dataset.
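
The arithmetic behind these figures can be checked as follows. The sketch assumes that the relative gains refer to the soft voting classifier and that "relative" means `new = reference * (1 + gain)`. The resulting reference values are therefore only implied estimates, not numbers reported by the project.

```python
def implied_reference(new_value, relative_gain):
    """Back out the reference score from the new score and a relative gain,
    assuming new_value = reference * (1 + relative_gain)."""
    return new_value / (1.0 + relative_gain)


ref_recall = implied_reference(0.79, 0.14)     # roughly 0.69 under this assumption
ref_precision = implied_reference(0.59, 0.25)  # roughly 0.47 under this assumption
error_rate = 1.0 - 0.77                        # share of misclassified samples

print(f"implied reference recall:    {ref_recall:.3f}")
print(f"implied reference precision: {ref_precision:.3f}")
print(f"misclassified share:         {error_rate:.2f}")
```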

jb-ds2020/nf-ds3-capstone-churn-prevention
