Scrape data from the New York Times's real estate section

This program scrapes the listing price, number of beds, number of baths, latitude, longitude, neighborhood name, and address of current property listings, based on your search criteria, and outputs the data in CSV format. Head to the New York Times real estate section and enter a neighborhood, city, or state:



Image1 of nytimes website



Note that after entering a location and clicking 'See Available Homes', some additional filtering tools become available, such as 'Open House', 'Reduced', and 'ADVANCED FILTERS' (see next image).

Run:

To run this program, run the following command at the command line:

python nytimes_v2.py

After the script starts, it prompts you for a url. Copy and paste a url from the nytimes web page into the command prompt; the scraper will pull the listings from the web and write the data to an output file in your local directory. For the program to work, the url must end with '-asc' or '-desc'. Please look at the format of the url in the following picture.

Image2 of nytimes website

If your url does not currently end with '-asc' or '-desc', use the 'Sort By' function, as shown above. This will generate a new url with the correct tail. If 'p', '&p', or '&p=' (plus a number) follows the '-asc' or '-desc' tail, the script will also handle these tails.
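
If you want to check a url before pasting it in, the tail rule described above can be sketched as follows. This is only an illustration of the rule and not the code used in nytimes_v2.py: the function names, the regular expression, and the example.com urls are hypothetical, and the sketch assumes Python 3.

```python
import re

# Hypothetical pattern: the url ends with '-asc' or '-desc', optionally
# followed by a paging tail such as 'p', '&p', or '&p=' plus a number.
TAIL_PATTERN = re.compile(r"-(?:asc|desc)(?:&?p=?\d*)?$")


def has_valid_tail(url):
    """Return True if the url ends with an accepted sort tail."""
    return TAIL_PATTERN.search(url.strip()) is not None


def strip_page_suffix(url):
    """Drop any paging tail so the url ends exactly with '-asc' or '-desc'."""
    return re.sub(r"(-(?:asc|desc))(?:&?p=?\d*)?$", r"\1", url.strip())


# Placeholder urls, not real nytimes addresses.
print(has_valid_tail("https://example.com/listings?sort=price-asc"))
print(has_valid_tail("https://example.com/listings?sort=price"))
print(strip_page_suffix("https://example.com/listings?sort=price-desc&p=3"))
```

A url that fails this check most likely needs the 'Sort By' step first.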

System Requirements

The program uses the 'requests' and 'beautifulsoup' packages, both of which are available in the Anaconda distribution.
Python 2.7.11 [Anaconda 2.3.0] or later will do.
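
To confirm that the required packages are installed before running the scraper, a quick check along these lines may help. It is a sketch with assumptions: it uses Python 3's importlib.util, the helper name missing_packages is hypothetical, and it assumes 'beautifulsoup' is imported under the module name bs4 (BeautifulSoup 4). The script itself may import an older release under a different name.

```python
import importlib.util


def missing_packages(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# 'bs4' is an assumed module name for the 'beautifulsoup' package.
missing = missing_packages(["requests", "bs4"])
if missing:
    print("Missing packages: " + ", ".join(missing))
else:
    print("All required packages were found.")
```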