Silas Mederer edited this page Oct 29, 2020 · 17 revisions

Model for churn prevention at Die Zeit Verlag

Team: Jonas Bechthold, Silas Mederer & Carlotta Ulm

Business Case

Churn prevention is a proactive strategy for keeping customers. It involves identifying the underlying reasons for churn and then formulating a plan to address issues that may lead to churn before they occur. Reducing the number of churns through prevention helps to maximize profit.

Goal

  • Outperform the model that is currently in use (around 0.70 AUC).
  • Give recommendations on when to contact customers and on the cycle at which the model should be run.

Target Metric

  • Recall: to avoid a large number of false negatives (actual churns that are not detected)
  • F1: if we don’t want to disturb loyal customers with churn prevention actions
  • ROC/AUC to compare different models
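
The three metrics above can be illustrated with a minimal sketch in plain Python. The labels and scores are made up for illustration (1 = churn, 0 = no churn), the 0.5 decision threshold is an assumption, and the function names are hypothetical helpers rather than part of the project code.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means churn."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def recall(y_true, y_pred):
    """Share of actual churns that were detected."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(y_true, y_pred):
    """Share of predicted churns that were actual churns."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


def f1(y_true, y_pred):
    """Harmonic mean of precision and recall."""
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def roc_auc(y_true, scores):
    """Probability that a random churner is scored above a random
    non-churner; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example data, not taken from the Die Zeit dataset.
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.3, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(recall(y_true, y_pred), f1(y_true, y_pred), roc_auc(y_true, scores))
```

Recall and F1 depend on the chosen threshold, while ROC/AUC is computed from the raw scores, which is why it is convenient for comparing different models.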

Basic idea

We want to use different ML approaches (supervised and unsupervised) to predict subscription churns at the German newspaper “Die Zeit”. Supervised ML methods (classification methods) can be used to predict subscription churns based on the given dataset. Since the dataset contains a large number of features, feature selection as part of an extensive EDA is essential. Unsupervised ML methods (clustering methods) could be applied to the dataset first to identify whether there are certain “groups of subscribers” who share a range of features. This clustering could be used to investigate churn mitigation methods specific to each subscriber group (e.g. not only writing emails, as given in the dataset description).

For the supervised ML methods, a possible data imbalance must be handled. The aim is to identify as many “real” subscription churns as possible. Avoiding the incorrect flagging of “non-real” churns is not the highest priority. This focus determines which target metric we use for our model (recall rather than accuracy). The main target is therefore to understand the behavioral patterns of customers who are willing to churn and to optimize churn prevention while reducing the overall cost.
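
To make the imbalance argument concrete, here is a small sketch with made-up numbers (90 loyal subscribers, 10 churns; the real class ratio in the dataset is not stated here). It shows why accuracy can look good while recall is zero, and computes inverse-frequency class weights as one possible way to handle the imbalance. The weighting formula, n_samples / (n_classes * class_count), is an assumption about how the imbalance might be addressed, not the method the team settled on.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Share of actual churns (label 1) that were detected."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical imbalanced data: 90 loyal subscribers (0), 10 churns (1).
y_true = [0] * 90 + [1] * 10
always_loyal = [0] * 100  # a trivial model that never predicts churn

print(accuracy(y_true, always_loyal))   # high accuracy
print(recall(y_true, always_loyal))     # but no churn is detected
print(balanced_class_weights(y_true))   # churn class gets the larger weight
```

A model that never predicts churn is useless for churn prevention even though its accuracy is high, which is the reason for preferring recall here.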

Bonus

Customer lifecycle target groups (cluster), GUI, SQL Database

Methods and Technologies

Preprocessing (cleaning, EDA), unsupervised learning for clustering, supervised ML classification, advanced methods such as ANN (Artificial Neural Networks) and CNN (Convolutional Neural Networks)

Project Goal and Business Plan

The main goal is to predict subscriber churn. We want to predict who will churn and when. For the time estimation we want to find out the typical timespan from subscription start to the date on which the churn is received. This timespan is an important threshold for deciding when to contact the subscriber in order to dissuade them from churning.

We want to make estimates of the following form: with a probability of xxx percent the subscriber will churn. We can then combine this probability with the typical timespan we have determined in order to plan churn prevention actions and to recommend when to contact the subscribers.
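
The combination of a typical timespan and a contact recommendation could be sketched as follows. The dates are invented, the median is used as one possible definition of "typical", and the 30-day lead time and the function names are illustrative assumptions only.

```python
from datetime import date, timedelta
from statistics import median


def typical_days_to_churn(subscriptions):
    """Median number of days between subscription start and receipt of churn.

    `subscriptions` is a list of (start_date, churn_received_date) pairs.
    """
    durations = [(churned - start).days for start, churned in subscriptions]
    return median(durations)


def recommended_contact_date(start, typical_days, lead_days=30):
    """Contact `lead_days` before the typical churn point, never before start.

    The lead time of 30 days is an assumed placeholder value.
    """
    offset = max(typical_days - lead_days, 0)
    return start + timedelta(days=offset)


# Invented history of churned subscriptions.
history = [
    (date(2019, 1, 1), date(2019, 7, 1)),
    (date(2019, 3, 1), date(2020, 3, 1)),
    (date(2019, 6, 1), date(2019, 9, 1)),
]

typical = typical_days_to_churn(history)
contact = recommended_contact_date(date(2020, 1, 1), typical)
print(typical, contact)
```

In practice the contact would only be triggered for subscribers whose predicted churn probability exceeds a chosen threshold.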

When considering time, we also want to investigate the user interactions of subscribers who churned. For this purpose we can consider different time spans:

  • one week (1w),
  • one month (1m),
  • three months (3m),
  • six months (6m).

For these time spans we have data from different user activities related to the date on which the subscription was cancelled.
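
As a rough illustration of such window features, the sketch below counts activity events in each time span ending at the cancellation date. The window lengths in days (7, 30, 90, 180) and the event dates are assumptions made for this example; the actual dataset already provides aggregated activity data per time span.

```python
from datetime import date, timedelta

# Assumed window lengths in days for 1w, 1m, 3m and 6m.
WINDOWS = {"1w": 7, "1m": 30, "3m": 90, "6m": 180}


def activity_counts(event_dates, reference_date):
    """Count events in each window that ends at `reference_date`.

    An event counts for a window if it lies after the window start and
    on or before the reference date.
    """
    counts = {}
    for name, days in WINDOWS.items():
        window_start = reference_date - timedelta(days=days)
        counts[name] = sum(
            1 for d in event_dates if window_start < d <= reference_date
        )
    return counts


# Invented activity dates for one subscriber.
cancel_date = date(2020, 6, 30)
events = [
    date(2020, 6, 28),
    date(2020, 6, 10),
    date(2020, 4, 15),
    date(2020, 1, 20),
    date(2019, 11, 1),
]

print(activity_counts(events, cancel_date))
```

Because the windows are nested, each longer window includes the events of the shorter ones.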

One question we would like to answer is how often the churn prediction model should be run.

For the developed machine learning model the target metric will be recall, in order to produce as few false negatives as possible. If we don't want to disturb loyal subscribers, the emphasis should be placed on the F1 score or accuracy.

Since there is an existing model by Die Zeit which uses logistic regression, we want to achieve a higher score than this baseline model.

When prioritizing which subscribers should be contacted, we can focus on customer value, which depends on how long the customer has been a subscriber and how much revenue has already been generated.
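
One possible way to turn this into a contact order is sketched below. The scoring rule (churn probability times revenue to date, with longer tenure as tie-breaker) is purely an assumption for illustration, as are the `Subscriber` fields and the example values.

```python
from dataclasses import dataclass


@dataclass
class Subscriber:
    subscriber_id: str
    churn_probability: float  # output of the churn model, between 0 and 1
    tenure_months: int        # how long the customer has been a subscriber
    revenue_to_date: float    # revenue already generated


def priority_score(s):
    """Assumed score: churn probability weighted by revenue to date."""
    return s.churn_probability * s.revenue_to_date


def contact_order(subscribers):
    """Sort by descending score; longer tenure wins ties."""
    return sorted(
        subscribers,
        key=lambda s: (-priority_score(s), -s.tenure_months),
    )


# Invented subscribers.
subscribers = [
    Subscriber("A", churn_probability=0.8, tenure_months=6, revenue_to_date=120.0),
    Subscriber("B", churn_probability=0.3, tenure_months=48, revenue_to_date=900.0),
    Subscriber("C", churn_probability=0.6, tenure_months=24, revenue_to_date=480.0),
]

ranked = contact_order(subscribers)
print([s.subscriber_id for s in ranked])
```

Subscriber A has the highest churn probability but ranks last here, because little revenue is at stake; other weightings of tenure and revenue are equally conceivable.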