Skip to content

Latest commit

 

History

History
69 lines (45 loc) · 2.25 KB

README.md

File metadata and controls

69 lines (45 loc) · 2.25 KB

Scrapy for Craigslists Apartments

A project by: Tsung Hung @masterfung

Scrapy is the open-source web scrapper (in Python) and it is used to power this library. Please visit this link for more information if you are not familiar with it.

Purpose:

The purpose of Scrapy Craigslist (CL) is to obtain useful CL apartment data for application- specific data analysis and representation.

A model example that utilizes the data from this project is located here. This project is a Django-powered project.

The code inside utilizes San Francisco as a model city for CL searches but you can easily change the link to one of your interests.

Requirements:

You will need:

  • Python 2.x or 3.x
  • pip
  • Scrapy (pip install scrapy)

Project:

Clone the project with git clone [email protected]:masterfung/scrapy-craigslist.git

Craigslist City Codes:

If you need to change the city on the project to harvest a different city, please reference this link for more information. Once you change that component, you would be able to obtain the data from the city of interest.

Running the Project

Here are the steps to take to modify the the returning city data of interests:

  • Look into the spiders folder and click on the craigslist_scrapy.py
  • Click on the city codes and replace link with the city of your interest. There are three areas this appears.
  • Run the code and use this command: scrapy crawl craigslist -o FILENAME.json

NOTE: This may take awhile, depends on how many listings are on Craigslist and how many parameters you selected.

Outputted JSON files should be saved on the applicational level of this project.

If there are any issues or request, please let me know. I am happy to help. All the feedback are welcomed. Let us learn and build things together!

Happy Scraping!

User Agent

Visit [this link] (http://api.useragent.io/) to obtain newly generated user agent to run your this Scrapy. You can change the User Agent in the settings.py file.

The link was a project by the wonderful [Randall Degges] (https://github.com/rdegges). He is awesome!!

Others

Beware of IP ban from Craigslist.