This was my first project during my Data Science Bootcamp. I worked with the King County House Sales dataset, focusing on EDA, linear regression and the data science lifecycle.
The project had two goals: to derive actionable recommendations for home sellers, such as real estate agents, from the EDA, and to predict the sale price with a linear regression model.
Recommendations:
- Sell in summer, buy in winter
- Concentrate on houses built in 1980-2020 or 1900-1940
- Renovate houses to sell them at a higher price
- Do not focus on maximizing the number of floors and bedrooms
- Pay more attention to location, living space and grade
Model results:
- MAPE: 14 - 18%
- R-squared: 0.75 - 0.77
- Jupyter Notebook
- Presentation
- Overview of data (Source: https://de.slideshare.net/PawanShivhare1/predicting-king-county-house-prices, p. 2)
- Dataset
- Matplotlib
- Sklearn
- Scipy
- Pandas
- Numpy
- Seaborn
- Imports
- Data Overview
- Data Cleaning
- Exploratory Data Analysis
- Data Preparation
- Linear Regression (with & without outliers)
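
The comparison "with & without outliers" and the two reported metrics (MAPE and R-squared) could be sketched roughly as follows. This is a minimal illustration, not the code from the notebook: it runs on synthetic data, so its scores are not comparable to the results above. The helper names (`make_synthetic_sales`, `drop_price_outliers`, `fit_and_score`), the choice of features (`sqft_living`, `grade`), the 1.5 x IQR outlier rule and the 80/20 train/test split are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error, r2_score
from sklearn.model_selection import train_test_split


def make_synthetic_sales(n=500, seed=0):
    """Build a small synthetic table standing in for the real dataset (assumption)."""
    rng = np.random.default_rng(seed)
    sqft_living = rng.uniform(500, 4000, n)
    grade = rng.integers(4, 12, n)  # integer grades from 4 to 11
    noise = rng.normal(0, 30_000, n)
    price = 50_000 + 150 * sqft_living + 20_000 * grade + noise
    price[:10] *= 5  # inject a few extreme prices to act as outliers
    return pd.DataFrame(
        {"sqft_living": sqft_living, "grade": grade, "price": price}
    )


def drop_price_outliers(df, k=1.5):
    """Keep rows whose price lies within k * IQR of the quartiles (assumed rule)."""
    q1 = df["price"].quantile(0.25)
    q3 = df["price"].quantile(0.75)
    iqr = q3 - q1
    mask = df["price"].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


def fit_and_score(df, seed=0):
    """Fit a linear regression and return MAPE (as a fraction) and R-squared."""
    X = df[["sqft_living", "grade"]]
    y = df["price"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    pred = model.predict(X_test)
    return {
        "mape": mean_absolute_percentage_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }


if __name__ == "__main__":
    sales = make_synthetic_sales()
    print("with outliers:   ", fit_and_score(sales))
    print("without outliers:", fit_and_score(drop_price_outliers(sales)))
```

Note that `mean_absolute_percentage_error` returns a fraction, so a value of 0.15 corresponds to a MAPE of 15%.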