This project scrapes quotes and author information from a website, stores the data in JSON files, and then loads that data into a MongoDB database, demonstrating a complete pipeline covering web scraping, data parsing, and database integration.
- Clone the repository.
- Install the required Python packages (Scrapy, plus a MongoDB driver such as pymongo).
- Set up MongoDB and ensure it's running.
Run the Scrapy spider to scrape the quotes and author information:
python main_crawler.py
This will save the scraped quotes to data/quotes.json and the author information to data/authors.json.
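The exact structure of the exported files depends on the spider's item definitions, which are not shown here. A minimal sketch of inspecting such a file with the standard library (the field names below are assumptions, not taken from the project):

```python
import json

# Hypothetical record shape -- the real field names depend on the
# spider's item definitions in this project.
sample_quotes = [
    {"text": "Sample quote.", "author": "Jane Doe", "tags": ["sample"]},
]

# Writing and reading back mirrors how data/quotes.json could be consumed.
with open("quotes_sample.json", "w", encoding="utf-8") as f:
    json.dump(sample_quotes, f, ensure_ascii=False, indent=2)

with open("quotes_sample.json", "r", encoding="utf-8") as f:
    quotes = json.load(f)

print(f"Read {len(quotes)} quote(s); first author: {quotes[0]['author']}")
```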
Run the data loader script to load the scraped data into MongoDB:
python load_data.py
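The internals of load_data.py are not shown here; a minimal sketch of such a loader, assuming the standard pymongo driver and a locally running MongoDB instance (the database and collection names below are hypothetical):

```python
import json


def read_records(path):
    """Read a list of scraped records from a JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    # The scraper is assumed to export a JSON array of objects.
    return data if isinstance(data, list) else [data]


def load_into_mongo(records, collection_name, db_name="quotes_db",
                    uri="mongodb://localhost:27017"):
    """Insert records into a MongoDB collection (names are hypothetical)."""
    from pymongo import MongoClient  # requires pymongo and a running MongoDB
    client = MongoClient(uri)
    collection = client[db_name][collection_name]
    if records:  # insert_many rejects an empty list
        collection.insert_many(records)
    return len(records)


# Example usage (requires a running MongoDB instance):
#   quotes = read_records("data/quotes.json")
#   authors = read_records("data/authors.json")
#   load_into_mongo(quotes, "quotes")
#   load_into_mongo(authors, "authors")
```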
- main_crawler.py: initializes and runs the Scrapy spider.
- The Scrapy spider: scrapes the quotes and author information.
- load_data.py: reads the scraped data from the JSON files and loads it into MongoDB.
This project showcases building a complete data pipeline, from web scraping through data parsing to database storage in MongoDB.