Data Engineering HDB Resale Price Prediction Project

avintech/resale-price

HDB Resale Price Predictor

Project Overview

The HDB Resale Price Predictor is a Python-based application that predicts HDB resale prices for a specified location. It uses machine learning models such as Linear Regression and Random Forest to forecast public housing resale prices. The data integration step incorporates macroeconomic factors such as the Consumer Price Index (CPI), giving users a fuller picture of what drives price fluctuations. Hyperparameter tuning improves the performance and generalisation of the models, and the predictions are validated with the evaluation metrics listed under Performance Measurement. The aim is to help individuals, real estate professionals, and other stakeholders make well-informed decisions about property prices. The application is hosted on Streamlit, which provides an interactive web interface.
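The README does not say how CPI is joined to the resale records. One plausible approach, sketched here purely as an assumption, is to attach the monthly CPI to each transaction by month so it can be used as a model feature, and optionally to derive a CPI-adjusted price. The function `add_cpi_feature` and the column names `month`, `resale_price`, and `cpi` are hypothetical.

```python
import pandas as pd


def add_cpi_feature(resale: pd.DataFrame, cpi: pd.DataFrame) -> pd.DataFrame:
    """Attach monthly CPI to each transaction and derive a CPI-adjusted price.

    Hypothetical schema: both frames have a "month" column ("YYYY-MM"),
    `resale` has "resale_price" and `cpi` has "cpi".
    """
    # Left join keeps every transaction; validate that CPI has one row per month.
    merged = resale.merge(cpi[["month", "cpi"]], on="month", how="left",
                          validate="many_to_one")
    # Rescale prices to the CPI level of the latest month in the CPI table.
    base_cpi = cpi.sort_values("month")["cpi"].iloc[-1]
    merged["cpi_adjusted_price"] = merged["resale_price"] * base_cpi / merged["cpi"]
    return merged


# Toy data, not taken from the project's dataset.
resale = pd.DataFrame({
    "month": ["2023-01", "2023-02", "2023-02"],
    "resale_price": [500000.0, 510000.0, 480000.0],
})
cpi = pd.DataFrame({"month": ["2023-01", "2023-02"], "cpi": [100.0, 102.0]})
enriched = add_cpi_feature(resale, cpi)
print(enriched)
```

A left join is used so that a month missing from the CPI table shows up as a missing value rather than silently dropping transactions.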

Technologies Used

Data Manipulation and Analysis

  • pandas: For data manipulation and analysis.
  • numpy: For numerical computations.

Data Visualization

  • matplotlib.pyplot: For creating static, animated, and interactive visualizations.
  • seaborn: For data visualization based on matplotlib.
  • pywaffle: For creating waffle charts.
  • joypy: For visualizing distributions of variables using joy plots (also called ridgeline plots).

Statistical Analysis

  • statsmodels: For estimating and interpreting models for statistical analysis.
  • scipy.stats: For statistical functions including spearmanr and pearsonr.

Machine Learning

  • scikit-learn: For implementing machine learning algorithms such as Linear Regression and Random Forest Regressor.
  • GridSearchCV: For hyperparameter tuning of machine learning models.

Model Evaluation and Validation

  • sklearn.metrics: For model evaluation metrics such as R² score and mean absolute error.
  • yellowbrick.regressor: For visualization of model diagnostics.
  • CooksDistance, ResidualsPlot: For identifying influential observations and plotting residuals of models.
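yellowbrick's `CooksDistance` visualizer plots Cook's distance for every observation of a linear fit. The numpy sketch below computes the same quantity directly, D_i = e_i² / (p · MSE) · h_ii / (1 − h_ii)², where e_i is the residual and h_ii the leverage. The README does not state how outliers were removed for the "without outliers" Linear Regression, so the 4/n cutoff and the helper names `cooks_distance` and `drop_influential` are assumptions for illustration.

```python
import numpy as np


def cooks_distance(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Cook's distance of each observation for an OLS fit with an intercept."""
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])   # design matrix with intercept column
    p = Xd.shape[1]                         # number of fitted coefficients
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    mse = resid @ resid / (n - p)
    # Leverage h_ii: diagonal of the hat matrix Xd (Xd^T Xd)^-1 Xd^T.
    h = np.einsum("ij,jk,ik->i", Xd, np.linalg.pinv(Xd.T @ Xd), Xd)
    return resid**2 / (p * mse) * h / (1 - h) ** 2


def drop_influential(X, y, threshold=None):
    """Drop rows whose Cook's distance exceeds the threshold (default 4/n)."""
    d = cooks_distance(X, y)
    if threshold is None:
        threshold = 4 / len(y)  # common rule of thumb, assumed here
    keep = d <= threshold
    return X[keep], y[keep]


# Toy data: a clean line y = 2x + 1 with one large outlier at x = 9.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * X.ravel() + 1
y[9] += 50
d = cooks_distance(X, y)
X_kept, y_kept = drop_influential(X, y)
```

Here the outlier sits at the edge of the x range, so it combines a large residual with high leverage and is the only point above the cutoff.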

Preprocessing

  • StandardScaler: For feature scaling.
  • train_test_split: For splitting the data into training and test sets.

Model Persistence

  • joblib: For saving and loading machine learning models.
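joblib's `dump` and `load` serialise a fitted scikit-learn estimator to a file and restore it, which is how a trained model can be reused without retraining. The toy `LinearRegression` model and the file name `resale_model.joblib` below are illustrative stand-ins, not the project's actual artefacts.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy model standing in for the trained resale-price model.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
model = LinearRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "resale_model.joblib")  # hypothetical file name
    joblib.dump(model, path)        # serialise the fitted estimator
    restored = joblib.load(path)    # load it back, e.g. when the app starts
    prediction = restored.predict(np.array([[4.0]]))

print(prediction)
```

Files written by joblib are pickles, so they should only be loaded from trusted sources and with compatible library versions.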

Performance Measurement

Several evaluation metrics were used to assess the resale price prediction models in the HDB Resale Price Predictor:

  • R² Score: Used to measure the proportion of variance in the target variable explained by the predictors. This allowed comparison of the predictive power of different models:
    • Linear Regression (with outliers): R² Score = 0.90
    • Linear Regression (without outliers): R² Score = 0.87
    • Random Forest (out-of-bag estimate): R² Score = 0.966
    • Random Forest (K-fold cross-validation): R² Score = 0.967
  • Mean Absolute Error (MAE): Calculated for the Random Forest models as the average absolute difference between predicted and actual resale prices, expressed in the same units as the price, which makes the typical prediction error easy to interpret.
  • Correlation Coefficients (Spearman and Pearson): Used to measure how closely predicted resale prices track actual prices, with Pearson capturing linear association and Spearman capturing rank (monotonic) association.
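The metrics listed above map directly onto `sklearn.metrics` and `scipy.stats`. The sketch below bundles them into a hypothetical helper, `evaluate`; R² is 1 − SS_res / SS_tot and MAE is the mean of |y − ŷ|. The price values are toy numbers, not results from the project.

```python
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import mean_absolute_error, r2_score


def evaluate(y_true, y_pred):
    """Return R², MAE, Pearson r and Spearman rho for one set of predictions."""
    pearson_r, _ = pearsonr(y_true, y_pred)      # linear association
    spearman_rho, _ = spearmanr(y_true, y_pred)  # rank (monotonic) association
    return {
        "r2": r2_score(y_true, y_pred),              # 1 - SS_res / SS_tot
        "mae": mean_absolute_error(y_true, y_pred),  # mean |y - y_hat|
        "pearson": float(pearson_r),
        "spearman": float(spearman_rho),
    }


# Toy prices, not results from the project.
y_true = [300000.0, 400000.0, 500000.0, 600000.0]
y_pred = [310000.0, 390000.0, 520000.0, 590000.0]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

For these values the MAE is 12,500 and R² is 0.986, while Spearman's rho is exactly 1 because the predictions preserve the ordering of the actual prices.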

Hyperparameter tuning was carried out mainly for the Random Forest model, using GridSearchCV to search over parameters such as the number of trees in the forest and the maximum depth of each tree. The aim was to maximise predictive performance while avoiding overfitting or underfitting.
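A minimal GridSearchCV sketch for the two parameters named above, `n_estimators` and `max_depth`. The grid values, the 5-fold split, the helper name `tune_random_forest`, and the synthetic data are assumptions; the README does not list the values that were actually searched.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold


def tune_random_forest(X, y, random_state=0):
    # Hypothetical grid: the README names the parameters but not the values searched.
    param_grid = {
        "n_estimators": [50, 100],   # number of trees in the forest
        "max_depth": [None, 10],     # maximum depth of each tree
    }
    search = GridSearchCV(
        RandomForestRegressor(random_state=random_state),
        param_grid,
        cv=KFold(n_splits=5, shuffle=True, random_state=random_state),
        scoring="r2",  # select the combination with the best mean validation R²
    )
    search.fit(X, y)  # refits the best combination on all of X, y
    return search.best_estimator_, search.best_params_, search.best_score_


# Synthetic stand-in data; real features would come from the resale dataset.
rng = np.random.default_rng(0)
X = rng.uniform(size=(120, 3))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.05, size=120)
best_model, best_params, best_r2 = tune_random_forest(X, y)
print(best_params, round(best_r2, 3))
```

Because every combination is scored on held-out folds, a deeper or larger forest is only preferred if it actually generalises better, which is how the search guards against overfitting.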

The Random Forest evaluated with K-fold cross-validation was chosen as the final model because it achieved the highest R² score (0.967), and K-fold cross-validation averages its performance over several held-out folds rather than relying on one. Its high R² score and strong correlation with actual prices indicate that it explains most of the variation in HDB resale prices.
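The two Random Forest figures above come from different validation schemes. The sketch below shows how both estimates can be obtained with scikit-learn: `oob_score=True` gives the out-of-bag R², and `cross_val_score` with `KFold` gives the K-fold R². The helper name `oob_and_kfold_r2`, the fold count, the tree count, and the synthetic data are assumptions, not the project's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score


def oob_and_kfold_r2(X, y, n_estimators=100, n_splits=5, random_state=0):
    # Out-of-bag R²: each training row is predicted only by trees whose
    # bootstrap sample did not contain it, so no separate hold-out is needed.
    rf = RandomForestRegressor(n_estimators=n_estimators, oob_score=True,
                               random_state=random_state)
    rf.fit(X, y)

    # K-fold R²: refit on k-1 folds, score on the held-out fold, repeat k times.
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    fold_r2 = cross_val_score(
        RandomForestRegressor(n_estimators=n_estimators, random_state=random_state),
        X, y, cv=cv, scoring="r2",
    )
    return rf.oob_score_, fold_r2.mean(), fold_r2


# Synthetic stand-in data; real features would come from the resale dataset.
rng = np.random.default_rng(0)
X = rng.uniform(size=(120, 3))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.05, size=120)
oob_r2, kfold_r2, per_fold = oob_and_kfold_r2(X, y)
print(f"OOB R²: {oob_r2:.3f}  K-fold R²: {kfold_r2:.3f}")
```

The out-of-bag estimate comes essentially for free from a single fit, while K-fold cross-validation costs k extra fits but also reports the spread across folds in `per_fold`.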

Installation and Setup

To set up this project locally:

  1. Clone the repository to your local machine.
  2. Navigate to the project directory.
  3. Install the required dependencies:
    pip install -r requirements.txt
  4. Run the Streamlit application:
    streamlit run streamlit_app.py

Acknowledgments

- Dataset source: HDB Resale Dataset
- Streamlit: Streamlit website
