This repository contains the code and documentation for the Boston Utilities project. The project analyzes and models utility data for the City of Boston, covering data cleaning, exploratory data analysis, linear regression, gradient boosting, and scalable processing with Apache Spark.
The dataset used in this project can be found on the City of Boston Utility Data website. Download the dataset and place it in the project directory before running the notebooks.
- 01-Data-Cleaning.ipynb: Data cleaning. This notebook cleans and preprocesses the raw utility data.
- 02-EDA.ipynb: Exploratory data analysis (EDA). This notebook explores the dataset, visualizes key insights, and identifies patterns.
- 03-linear-regression-model.ipynb: Linear regression modeling. This notebook builds and evaluates a linear regression model on the preprocessed data.
- 04-gradientboost-model.ipynb: Gradient boosting modeling. This notebook implements a gradient boosting model for predicting utility-related metrics.
- 05-spark-session.ipynb: Spark session. This notebook demonstrates the use of Apache Spark for scalable data processing and analysis.
To get started with the Boston Utilities project, follow these steps:
- Clone the repository to your local machine:
  git clone https://github.com/your-username/Energy_Consumption_Analysis.git
- Download the dataset from City of Boston Utility Data and place it in the project directory.
- Install the required dependencies. You can use a virtual environment or the package manager of your choice.
- Replace placeholders such as your-username with your actual GitHub username, and update any other project-specific details.
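
Before opening the notebooks, it can help to confirm that the setup steps above left the project directory in the expected state. The following is a minimal sketch, not part of the repository: the `preflight` helper and the `DATA_PATTERN` constant are hypothetical names, and the sketch assumes the downloaded dataset is a CSV file, which this README does not state. The notebook names are the ones listed above.

```python
from pathlib import Path

# Notebook names as listed in this README, in the order they appear.
NOTEBOOKS = [
    "01-Data-Cleaning.ipynb",
    "02-EDA.ipynb",
    "03-linear-regression-model.ipynb",
    "04-gradientboost-model.ipynb",
    "05-spark-session.ipynb",
]

# Assumption: the dataset is a CSV file. Adjust the pattern if it is not.
DATA_PATTERN = "*.csv"


def preflight(project_dir="."):
    """Return (missing_notebooks, data_files) for the given project directory."""
    root = Path(project_dir)
    # Notebooks from the list above that are not present in the directory.
    missing = [name for name in NOTEBOOKS if not (root / name).is_file()]
    # Candidate dataset files found directly in the project directory.
    data_files = sorted(p.name for p in root.glob(DATA_PATTERN))
    return missing, data_files


if __name__ == "__main__":
    missing, data_files = preflight()
    if missing:
        print("Missing notebooks:", ", ".join(missing))
    if data_files:
        print("Dataset candidates:", ", ".join(data_files))
    else:
        print("No dataset found; download it and place it in the project directory.")
```

Running the script from the project directory prints any notebooks that are missing and any CSV files it finds. An empty list of dataset candidates suggests the download step has not been completed yet, or that the dataset uses a different file format than assumed here.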