Skip to content

Created and evaluated a linear regression algorithm to attempt to predict the average cost of the housing market (ZHVI from Zillow) in given metropolitan statistical areas. Compiled model using features including the federal funds rate and MSA demographics and market features such as unoccupied housing and net migration.

Notifications You must be signed in to change notification settings

dmtiblin/UR-Project4-Predicting-Housing-Market

 
 

Repository files navigation

Housing Market Prediction Model

The effects of the COVID 19 pandemic have had a major effect on the housing market and home values/listing prices. Using zillow to determine average home values per month for multiple neighborhood, and combining that with Federal Reserve Interest Rates and data from the US Census, we train a machine learning model to predict future home values.

The final presentation can be viewed in the included powerpoint deck.

Team Members: Bonnie Bailey, Diane Tiblin, Joseph March

Data Sources:

Resources:

  • AWS S3 Buckets
  • Census and json
  • Google Colab
  • Scikit-learn
  • PySpark
  • Pandas

Coding:

  • Pulled API data from US Census.
  • Cleaned the data by modifying headers, formatting data and converting to rates.
  • Uploaded Zillow and Federal Funding data to AWS in S3 buckets.
  • Pulled Zillow and Federal Funding data using PySpark.
  • Cleaned MSA columns so Census data would match Zillow in order to join data frames together.
  • Converted Federal Funding data into lists to join to the merged data frame with date columns.
  • Removed duplicate and non-zero values from the merged data frame. Performed one-hot encoding on the categorical data.
  • Scaled the data.
  • Created and tested linear regression model.

Model Performance

We tested several versions of the model before coming to our final version.

modeldata

Challenges

  • Difficulty finding free data on property sales.
  • Slight narrowing of our data source with the elimination of fields with null data or duplications.
  • Data was limited to metropolitan areas and the suburbs surrounding.

Results/Findings

A model can be created to predict the market value to within 99.99% accuracy within a metropolitan area.

Note: The housing market is naturally unpredictable as a result of human behavior.

How Could We Expand The Project?

  • Currently the model predicts one month ahead. With additional work the model could be expanded to predict further into the future.
  • Currently the model predicts averages within a metropolitan area. With funding, data could be pulled for individual properties.

Contacts

About

Created and evaluated a linear regression algorithm to attempt to predict the average cost of the housing market (ZHVI from Zillow) in given metropolitan statistical areas. Compiled model using features including the federal funds rate and MSA demographics and market features such as unoccupied housing and net migration.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 100.0%