Getting Started in Jupyter Notebook

Sylvia Tran edited this page Mar 10, 2020 · 2 revisions

To get started in Jupyter Notebook, there are a few key steps that you need to follow:

I. Download the Data

Note that there are several Zillow datasets on Kaggle. This link will take you to the correct one for this project.

Note that you do not have to unzip the files when you download them to your local machine.
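One reason unzipping is optional: pandas can read a zipped CSV directly, inferring the compression from the `.zip` extension. A minimal self-contained sketch (the file name and sample data here are made up for illustration; the real Kaggle file names will differ):

```python
import os
import tempfile
import zipfile

import pandas as pd

# Write a small zipped CSV, then let pandas read it without unzipping.
# (Stand-in for a zipped Kaggle download in data/raw/.)
tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, 'sample.csv.zip')
with zipfile.ZipFile(zip_path, 'w') as zf:
    zf.writestr('sample.csv', 'parcelid,value\n1,100\n2,200\n')

# pandas infers zip compression from the extension; works when the
# archive contains exactly one file
df = pd.read_csv(zip_path)
print(df.shape)  # → (2, 2)
```

Note that this works only when the archive contains a single file; multi-file archives still need to be extracted first.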

II. Upload the Data

This assumes the following: (A) you already have Anaconda or Jupyter Notebook installed on your machine, (B) you have already downloaded/cloned this git repository to your local machine, and (C) you are in the repository's working directory on the command line.

Since data/ is listed in .gitignore, any data a teammate has downloaded onto their local machine will not be available to you. Therefore, you must run the following commands from your terminal (in the project directory):

computer_name:rent-v-buy user$ mkdir data
computer_name:rent-v-buy user$ cd data/
computer_name:rent-v-buy/data user$ mkdir raw
computer_name:rent-v-buy/data user$ mkdir interim
computer_name:rent-v-buy/data user$ mkdir processed
  • Navigate back to your project root directory (rent-v-buy). From the command line type: computer_name:rent-v-buy user$ jupyter notebook
  • Your default browser should open with a GUI view of your repository directory.
  • Navigate to the data/raw/ folder
  • Upload each of the downloaded files from Kaggle
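The directory setup above can also be done in a single command; a minimal sketch, run from the project root (`-p` creates missing parent directories and does not error if they already exist):

```shell
# Create the full data directory tree in one step
mkdir -p data/raw data/interim data/processed
```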

III. Load & Serialize the Data

Once the upload is complete:

  • Navigate back to the notebooks/ directory
  • Open the 00-Load-Data.ipynb notebook
  • Run all the cells in 00-Load-Data.ipynb, which should take approximately 10-20 minutes to complete (depending on your machine's specs & available RAM).
  • Alternatively, instead of navigating to notebooks/, navigate to src/ in your terminal and run the following command: computer_name:rent-v-buy/src user$ python make_dataset.py This should also take approximately 10-20 minutes to complete.
  • Thereafter, you should be able to load the files into a separate notebook that you create, and do any necessary data transformations/cleaning there.

In the new notebook, make the same package imports as in 00-Load-Data.ipynb and execute that cell. Data can then be loaded with commands such as: cities_crosswalk = pd.read_pickle('../data/interim/city_crosswalk.pickle')

  • Store the cleaned/transformed data in the data/interim/ directory and DO NOT overwrite the files that were uploaded to data/raw/.
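As a sketch of that save-to-interim workflow, the snippet below builds a toy table, writes it to data/interim/, and reads it back to confirm the round trip. The file name and columns are illustrative, not the project's actual data; the point is that cleaned output goes to data/interim/ while data/raw/ stays untouched:

```python
from pathlib import Path

import pandas as pd

# Cleaned output goes to data/interim/ -- never back into data/raw/
interim = Path('data/interim')
interim.mkdir(parents=True, exist_ok=True)

# Toy stand-in for a cleaned table (real data comes from the raw pickles)
df = pd.DataFrame({'city': ['Austin', 'Boston'], 'zip': ['73301', '02101']})

# Serialize the cleaned table
df.to_pickle(interim / 'city_crosswalk_clean.pickle')

# Round-trip to confirm the pickle loads back intact
loaded = pd.read_pickle(interim / 'city_crosswalk_clean.pickle')
print(loaded.equals(df))  # → True
```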

Please reach out to Sylvia for help if necessary.