The following is documentation on how to set up and use ScrapeOps with your Scrapy spiders.
Scrapy Job Stats & Visualisation
- 📈 Individual Job Progress Stats
- 📊 Compare Jobs versus Historical Jobs
- 💯 Job Stats Tracked
- ✅ Pages Scraped & Missed
- ✅ Items Parsed & Missed
- ✅ Item Field Coverage
- ✅ Runtimes
- ✅ Response Status Codes
- ✅ Success Rates & Average Latencies
- ✅ Errors & Warnings
- ✅ Bandwidth
Health Checks & Alerts
- 🕵️‍♂️ Custom Spider & Job Health Checks
- 📦 Out of the Box Alerts - Slack (More coming soon!)
- 📑 Daily Scraping Reports
ScrapyD Cluster Management
- 🔗 Integrate With ScrapyD Servers
- ⏰ Schedule Periodic Jobs
- 💯 All Scrapyd JSON API Endpoints Supported (see the example after this list)
- 🔐 Secure Your ScrapyD with BasicAuth, HTTPS or Whitelisted IPs
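Scrapyd exposes its JSON API over plain HTTP, so every Scrapyd endpoint that ScrapeOps wraps can also be called directly. As a minimal sketch, scheduling a job via Scrapyd's schedule.json endpoint looks like this; the host, project, spider and credentials below are hypothetical placeholders:

```python
import requests

# Schedule a job via Scrapyd's JSON API (schedule.json endpoint).
# Host, project, spider and credentials are hypothetical placeholders.
response = requests.post(
    'http://your-scrapyd-server:6800/schedule.json',
    data={'project': 'myproject', 'spider': 'myspider'},
    auth=('user', 'pass'),  # only needed if your ScrapyD sits behind BasicAuth
)

# Scrapyd responds with JSON, e.g. {'status': 'ok', 'jobid': '...'}
print(response.json())
```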
Proxy Monitoring (Coming Soon)
- 📈 Monitor Your Proxy Account Usage
- 📉 Track Your Proxy Providers' Performance
- 📊 Compare Proxy Performance Versus Other Providers
To use ScrapeOps, you first need to create a free account and get your free API_KEY.
There are two ways you can use ScrapeOps:
1. ScrapeOps SDK: In this mode, the ScrapeOps SDK logs all your scraping stats, generates statistics and graphs, and triggers alerts on the ScrapeOps dashboard. Getting set up is easy: just add 3 lines to your Scrapy project's settings.py file and the ScrapeOps SDK will take care of the rest.
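As a minimal sketch of that setup, the settings.py additions look something like the snippet below. The API key is a placeholder, and the exact extension path and priority value should be checked against the installation guide linked below:

```python
# settings.py -- ScrapeOps SDK integration (sketch; see the installation
# guide below for the canonical extension path and priority value)

SCRAPEOPS_API_KEY = 'YOUR_API_KEY'  # placeholder: the free API key from your dashboard

EXTENSIONS = {
    # Registers the ScrapeOps monitor so job stats are sent to the dashboard
    'scrapeops_scrapy.extension.ScrapeOpsMonitor': 500,
}
```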
Detailed Read: ScrapeOps SDK Installation Guide
2. ScrapyD Integration: In this mode, you connect ScrapeOps with your ScrapyD servers so you can schedule and manage your ScrapyD spiders via the ScrapeOps dashboard.
❗ Note: To use the stats, graphs and alerts functionality of ScrapeOps, you need to install the ScrapeOps SDK in your Scrapy spiders.