Skip to content

In our team capstone project (Carlotta U., Silas M. and Jonas B.) of our bootcamp at neue fische churn prevention was applied to a dataset of the german newspaper “Die Zeit”. By predicting who is going to quit a subscription and doing churn prevention activities, a loss of subscribers can be avoided and revenues can be increased.

License

Notifications You must be signed in to change notification settings

jb-ds2020/nf-ds3-capstone-churn-prevention

Repository files navigation

Model for churn prevention die Zeit Verlag

Team: Jonas Bechthold, Silas Mederer & Carlotta von Ulm Erbach

Business Case: Churn prevention is your proactive strategy for keeping customers around. It involves looking at the underlying reasons for churn and then formulating a plan to combat issues that may lead to churn before they happen.

Repository content:

Goal: Improve the model of ‘Die ZEITVerlag’ to do better predictions.

Metric:

  • Recall: to identify as many real subscription churns
  • Precision: to avoid disturbing loyal customers with unnecessary mails
  • AUC and probability distribution

Overview: Since a large number of features (177) exists, feature selection and engineering as part of an extensive EDA was essential.

For the supervised (with label) ML the data imbalance must be handled. The aim is to identify as many as possible “real” subscription churns as possible. Incorrectly identifying “non real” churns is not of the highest priority (Recall vs. Accuracy), since we would recommend churn mitigation methods for these subscribers. The main target is therefore to understand the behavioral patterns of customers and to optimize the churn prevention while reducing the overall cost.

We used different machine learning (supervised and unsupervised) approaches to predict churns of subscriptions of the german weekly newspaper “Die Zeit”. As it is a binary classification problem (“churn” or “no churn”), we used Logistic Regression as a baseline model, and KNN, Random Forest and XGBoost for further optimization. We also implemented a stacking and voting classifier, where we combined all classifiers to one. Unsupervised machine learning methods have been used to improve the prediction.

We built ANN and DNN and implemented the “deep and wide” approach. We were able to improve our baseline ANN but not able to score higher in recall on “churn”. So at the end of this project we decided to implement the stacking and voting classifiers, wrote a script and readme, to hand it over for die ZEITVerlag, so that their operators could use our trained models.

Picture Deployment

Methods and Technologies: Preprocessing (Cleaning, EDA), supervised ML classification (Logistic Regression, KNN, RF, XGBoost, Stacking, Voting), advanced methods consisting of ANN (DNN, deep and wide Network)

Recommendations and findings: We recommend to our stakeholders to contact the customers 10 month after the subscription for the first time and then every 12 month. This recommendation is based on the feature “lesedauer” and its distribution. Also we would recommend to use more and especially more user specific and time related features, the digital subscription becomes more and more important, it must be easy to scrape this kind of information for the publisher. If this will be done, it would be possible to use a CNN model on clickrates or customer behavior which are more precise and could be used to not only predict “churn” or “no churn” but also the point of churn in time. These are important information if you want to bound your customer.

Overall we were able to relatively improve the referenced model by 14% in recall and 25% in precision and would recommend to use a soft voting classifier (option 2 in the script) for predictions. This one provides a recall of 79%, precision of 59% and an accuracy of 77%. But as you can see we still miss about 23% of the dataset respectively predict them wrong.

Picture Scores

About

In our team capstone project (Carlotta U., Silas M. and Jonas B.) of our bootcamp at neue fische churn prevention was applied to a dataset of the german newspaper “Die Zeit”. By predicting who is going to quit a subscription and doing churn prevention activities, a loss of subscribers can be avoided and revenues can be increased.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •