The scripts have only been tested on Anaconda distribution 4.5.11 with Python 3.6.6, and they require additional Python libraries.
Run the following commands in Anaconda Prompt to install the libraries needed by the scripts in this git repository.
conda install scikit-learn
conda install pandas
conda install numpy
conda install seaborn
conda install matplotlib
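To confirm that the installation worked, a small check along the following lines can be run from the same environment. This is a minimal sketch and not part of the repository: the helper name `check_libraries` is hypothetical, and the only assumption is that each library is importable under its usual import name (scikit-learn imports as `sklearn`).

```python
import importlib

# Import names of the libraries installed above.
# Note: the scikit-learn package is imported as "sklearn".
REQUIRED = ["sklearn", "pandas", "numpy", "seaborn", "matplotlib"]


def check_libraries(names):
    """Map each import name to its version string, or None if it cannot be imported.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Fall back to "unknown" for modules without a __version__ attribute.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in check_libraries(REQUIRED).items():
        status = version if version is not None else "NOT INSTALLED"
        print(f"{name}: {status}")
```

Any library reported as `NOT INSTALLED` can be installed again with the matching `conda install` command.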
Two quick start options are available:
- Download the latest release.
- Clone the repo:
git clone https://github.com/FrankTub/Arvato.git
I worked on this project during the first term of Udacity's "Become a Data Scientist" nanodegree. In this project I used clustering algorithms to analyse the German population and to identify which population groups are overrepresented and underrepresented among the customers of the webshop Arvato.
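The idea of overrepresented and underrepresented groups can be illustrated with a short sketch. It assumes that every person in the general population and every customer has already been assigned a cluster label, for example by a clustering model such as scikit-learn's `KMeans`. The function names and the label lists below are hypothetical and made up for illustration; they are not taken from the notebook or from the project data.

```python
from collections import Counter


def cluster_shares(labels):
    """Return the fraction of records assigned to each cluster label."""
    counts = Counter(labels)
    total = len(labels)
    return {cluster: count / total for cluster, count in counts.items()}


def representation_ratio(population_labels, customer_labels):
    """Divide the customer share of each cluster by its population share.

    A ratio above 1 means the cluster is overrepresented among customers;
    a ratio below 1 means it is underrepresented.
    """
    population = cluster_shares(population_labels)
    customers = cluster_shares(customer_labels)
    return {
        cluster: customers.get(cluster, 0.0) / share
        for cluster, share in sorted(population.items())
    }


# Hypothetical cluster labels, invented for illustration only (not project data).
population_labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
customer_labels = [0, 1, 1, 1, 1, 1, 1, 2, 2, 2]

ratios = representation_ratio(population_labels, customer_labels)
for cluster, ratio in ratios.items():
    print(f"cluster {cluster}: customer share / population share = {ratio:.2f}")
```

Comparing shares rather than raw counts keeps the comparison meaningful when the customer data set is much smaller than the population data set.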
Within the download you'll find the following directories and files. Note that the data cannot be published in this repository.
Arvato/
├── Data_Dictionary.md
├── Identify_Customer_Segments.html
├── Identify_Customer_Segments.ipynb
└── README.md
The clustering shows that the mail-order company is popular among people who have an above-average interest in finance, have a lot of money saved and belong to the top earners in Germany.
Frank Tubbing
Thanks to Udacity for setting up the projects where we can learn cool stuff!
Thanks to Arvato for providing cool data with which we can create a cutting-edge project!