Getting Started in Jupyter Notebook
To get started in Jupyter Notebook, there are a few key steps that you need to follow:
- Download all the files from this Kaggle Link.
Note that there are several Zillow datasets on Kaggle. This link will take you to the correct one for purposes of completing this project.
Note that you do not have to unzip the files when you download them to your local machine.
This assumes the following: (A) You already have Anaconda or Jupyter Notebook installed on your machine (B) You have already downloaded / cloned this git repository to your local machine (C) You are in the repository working directory on the command line.
Since data/ is part of the .gitignore, the data a teammate has downloaded onto their local machine will not be available to you. Therefore, you must run the following commands from your terminal (in the project directory):
computer_name:rent-v-buy user$ mkdir data
computer_name:rent-v-buy user$ cd data/
computer_name:rent-v-buy/data user$ mkdir raw
computer_name:rent-v-buy/data user$ mkdir interim
computer_name:rent-v-buy/data user$ mkdir processed
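
The same folder layout can also be created from Python, which may be convenient where the shell commands above differ (for example, on Windows). The following is a minimal sketch, assuming it is called with the project root as its argument; the helper name make_data_dirs is hypothetical and is not part of this repository.

```python
from pathlib import Path


def make_data_dirs(project_root="."):
    """Create data/raw, data/interim and data/processed under project_root.

    make_data_dirs is a hypothetical helper, not part of this repository.
    """
    data_dir = Path(project_root) / "data"
    created = []
    for name in ("raw", "interim", "processed"):
        sub = data_dir / name
        # parents=True also creates data/ itself; exist_ok=True makes re-runs harmless
        sub.mkdir(parents=True, exist_ok=True)
        created.append(sub)
    return created
```

Calling make_data_dirs() from the project root should leave the same three folders in place as the mkdir commands, and running it a second time does nothing.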
- Navigate back to your project root directory (rent-v-buy). From the command line type:
computer_name:rent-v-buy user$ jupyter notebook
- Your default browser should open up with what appears to be a GUI for your repository directory.
- Navigate to the data/raw/ folder
- Upload each of the downloaded files from Kaggle
Once the upload is complete:
- Navigate back to the notebooks/ directory
- Open the 00-Load-Data.ipynb notebook
- Run all the cells in 00-Load-Data.ipynb, which should take approximately 10-20 minutes to complete (depending on your machine specs and available RAM).
- Alternatively, instead of navigating to notebooks/, navigate to src/ in your terminal and run the following command from the command line:
computer_name:rent-v-buy/src user$ python make_dataset.py
This should also take approximately 10-20 minutes to complete.
- Thereafter, you should be able to load the files into a separate notebook that you create to do any data transformations / cleaning necessary.
In the new notebook, make all the same package imports as the ones made in 00-Load-Data.ipynb and execute that cell. Data can subsequently be loaded by passing commands such as:
cities_crosswalk = pd.read_pickle('../data/interim/city_crosswalk.pickle')
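
Before loading, it can help to see which pickle files the load step actually produced. The following is a minimal sketch using only the standard library; interim_pickles is a hypothetical helper (not part of this repository), and the default path assumes the new notebook lives in notebooks/.

```python
from pathlib import Path


def interim_pickles(interim_dir="../data/interim"):
    """Map each pickle's base name to its path, e.g. 'city_crosswalk' -> Path.

    interim_pickles is a hypothetical helper, not part of this repository.
    """
    folder = Path(interim_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"{folder} not found; has the data been loaded yet?")
    # sorted() gives a stable, alphabetical listing; other file types are ignored
    return {p.stem: p for p in sorted(folder.glob("*.pickle"))}
```

Each returned path can then be passed to pd.read_pickle in the same way as the command above.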
- Store the cleaned/transformed data in the data/interim/ directory and DO NOT overwrite the files that were uploaded in the data/raw/ directory.
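
One way to guard against accidentally writing into data/raw/ is to build every output path through a small check. The following is a minimal sketch, not something this repository provides: safe_output_path is a hypothetical helper, and the default paths assume the notebook lives in notebooks/.

```python
from pathlib import Path


def safe_output_path(filename, interim_dir="../data/interim", raw_dir="../data/raw"):
    """Return a path under interim_dir, refusing anything that resolves into raw_dir.

    safe_output_path is a hypothetical helper, not part of this repository.
    """
    raw = Path(raw_dir).resolve()
    target = (Path(interim_dir) / filename).resolve()
    # Reject the raw directory itself and anything nested beneath it
    if target == raw or raw in target.parents:
        raise ValueError(f"refusing to write inside {raw}")
    return target
```

A cleaned DataFrame could then be saved with, for example, df.to_pickle(safe_output_path('cities_clean.pickle')), where df and the file name are placeholders.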
Please reach out to Sylvia for help if necessary.