Project: Customer Segmentation with K-Nearest Neighbors Algorithm
This project aims to segment customers in the teleCust1000T dataset using the K-Nearest Neighbors (KNN) algorithm, assigning each customer to one of the dataset's customer categories. The project involves data visualization, feature analysis, model training and evaluation, and identification of the optimal number of neighbors (K) for KNN.
The project uses the following:
- Data: teleCust1000T.csv, containing information about customers such as tenure, age, income, and customer category.
- Libraries: NumPy, Pandas, Scikit-learn, Seaborn, Matplotlib
You can install these libraries using pip:
pip install numpy
pip install pandas
pip install scikit-learn
pip install seaborn
pip install matplotlib
The project is organized into the following sections:
- Data Import and Exploration: Reads the CSV data, analyzes the data distribution, and identifies potential outliers.
- Feature Selection: Selects relevant features for the KNN model.
- Data Preprocessing: Standardizes numeric features and encodes categorical features.
- Train-Test Split: Divides the data into training and testing sets for model training and evaluation.
- KNN Model Training: Trains KNN models with different values of K.
- Model Evaluation: Evaluates the trained models using metrics such as accuracy and the confusion matrix.
- Finding Optimal K: Identifies the optimal number of neighbors for KNN based on model performance.
- Visualization: Plots data distributions, accuracy curves, and confusion matrices for different K values.
- Results and Conclusion: Summarizes key findings and interprets the KNN model's performance.
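The data import, exploration, and feature selection steps might look roughly like the pandas sketch below. It assumes the target column is named `custcat` and that the feature columns are numeric; verify both against the CSV header. The helper names are hypothetical, and the IQR rule (Tukey's fences) is only one common way to flag outliers, not necessarily the method used in the project.

```python
import pandas as pd


def class_balance(df: pd.DataFrame, target: str = "custcat") -> pd.Series:
    """Count customers per category to check how balanced the target is."""
    return df[target].value_counts().sort_index()


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


def split_features(df: pd.DataFrame, target: str = "custcat", features=None):
    """Return (X, y): the chosen feature columns as floats and the target labels.

    If `features` is None, every column except the target is used.
    """
    if features is None:
        features = [c for c in df.columns if c != target]
    X = df[list(features)].to_numpy(dtype=float)
    y = df[target].to_numpy()
    return X, y


# Usage, assuming the CSV sits in the working directory:
#   data = pd.read_csv("teleCust1000T.csv")
#   print(data.describe())                     # summary statistics per column
#   print(class_balance(data))                 # customers per category
#   print(data.apply(iqr_outlier_mask).sum())  # outlier count per column
#   X, y = split_features(data, features=["tenure", "age", "income"])
```

Passing an explicit `features` list keeps the feature selection step visible in one place, which makes it easy to compare models trained on different subsets.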
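To make the role of K concrete, here is a from-scratch NumPy sketch of how a KNN classifier predicts a label: standardize the features using training-set statistics, compute the Euclidean distance from a query point to every training point, and take a majority vote among the K closest. It illustrates the algorithm only; the project itself relies on Scikit-learn, and the function names here are hypothetical.

```python
import numpy as np


def standardize(train: np.ndarray, other: np.ndarray):
    """Scale both arrays with the training set's per-column mean and standard deviation."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (train - mean) / std, (other - mean) / std


def knn_predict(X_train: np.ndarray, y_train: np.ndarray,
                X_query: np.ndarray, k: int) -> np.ndarray:
    """Predict one label per query row by majority vote of its k nearest neighbors."""
    predictions = []
    for x in X_query:
        # Euclidean distance from this query point to every training point
        distances = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(distances, kind="stable")[:k]
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        # np.unique returns sorted labels and argmax picks the first maximum,
        # so vote ties go to the smallest label.
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)
```

A small K follows local structure closely but is sensitive to noise; a larger K smooths the decision but can blur neighboring categories, which is why the project compares several K values.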
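The train-test split, model training, evaluation, and search for the optimal K could be sketched with Scikit-learn as follows. The 80/20 split, `random_state=4`, and the default range of K values are illustrative assumptions, not the project's confirmed settings, and `evaluate_k_values` is a hypothetical helper. Placing `StandardScaler` inside a pipeline means the scaler is fit on the training split only, so no test-set statistics leak into preprocessing.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_k_values(X, y, k_values=range(1, 11), test_size=0.2, random_state=4):
    """Train one KNN model per K; return the best K, all test accuracies,
    and the confusion matrix of the best model."""
    X, y = np.asarray(X), np.asarray(y)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    accuracies = {}
    best_k, best_acc, best_cm = None, -1.0, None
    for k in k_values:
        # The scaler is fit on the training split only, as part of the pipeline
        model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        acc = accuracy_score(y_test, y_pred)
        accuracies[k] = acc
        if acc > best_acc:  # strict '>' keeps the smallest K on ties
            best_k, best_acc = k, acc
            best_cm = confusion_matrix(y_test, y_pred)
    return best_k, accuracies, best_cm
```

Choosing K on the same test split that reports the final accuracy tends to be optimistic; a separate validation split or cross-validation gives a fairer estimate.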
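For the visualization step, an accuracy curve over K can be drawn with Matplotlib. This sketch assumes the results are held in a dict mapping each K to its test accuracy; the function name `plot_accuracy_curve` is hypothetical, and the `Agg` backend is used only so the code runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt


def plot_accuracy_curve(accuracies: dict):
    """Plot test accuracy against the number of neighbors K and return the figure."""
    ks = sorted(accuracies)
    scores = [accuracies[k] for k in ks]
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(ks, scores, marker="o")
    ax.set_xlabel("Number of neighbors (K)")
    ax.set_ylabel("Test accuracy")
    ax.set_title("KNN accuracy for different K values")
    ax.set_xticks(ks)
    return fig


# Usage: fig = plot_accuracy_curve({1: 0.30, 2: 0.35, 3: 0.32}); fig.savefig("accuracy.png")
```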
Clone the repository and change into its directory:
git clone https://github.com/Prometheussx/knn-customer-segmentation.git
cd knn-customer-segmentation
- The implemented KNN model achieved a best accuracy of [accuracy value]% with [optimal K value] neighbors.
- The model identified patterns in the customer features and used them to assign customers to different categories.
- The results demonstrate the effectiveness of KNN for customer segmentation and provide valuable insights for targeted marketing campaigns.
- This project can be extended by incorporating additional features and exploring other machine learning algorithms for customer segmentation.
- Further analysis could be done to understand the influence of individual features on customer segmentation and develop explainable models.
- The model could be integrated into a real-world application for customer targeting and personalized recommendations.
The README.md file includes images of the data distributions, accuracy curves, and confusion matrices for different K values, which make the results easier to follow and help in interpreting the KNN model's performance.
This project is released under the MIT License.
- Mail Address: Erdem Taha Sokullu
- LinkedIn Profile: Erdem Taha Sokullu
- GitHub Profile: Prometheussx
- Kaggle Profile: @erdemtaha
Feel free to reach out if you have any questions or need further information about the project.