Change Log

HEAD

v1.5.0

  • Strip URLs found in Sitemaps
  • Inline robots dependency, closes #51
  • Update Sitemap XML parsing to work better with newer versions of REXML
  • Fix issue calling Spidr with an options hash (i.e. use the double splat operator)
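
For context on the last entry, here is a minimal sketch of what passing an options hash with the double splat operator looks like. The `crawl` method and its keyword names are hypothetical stand-ins, not Spidr's or WaybackArchiver's actual API. The sketch assumes the fix relates to Ruby 3.0 and later, where a hash is no longer implicitly converted to keyword arguments.

```ruby
# Hypothetical method that accepts keyword arguments
# (illustration only, not part of Spidr or WaybackArchiver).
def crawl(url, hosts: [], robots: false)
  { url: url, hosts: hosts, robots: robots }
end

# Options collected in a plain hash with symbol keys.
options = { hosts: ["example.com"], robots: true }

# The double splat (**) expands the hash into keyword arguments,
# so each key is matched to the keyword parameter of the same name.
result = crawl("https://example.com", **options)
```

Without `**`, the hash would be passed as an extra positional argument rather than as keywords.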

v1.4.0

  • Don't respect robots.txt file by default, PR#41
  • Add WaybackArchiver::respect_robots_txt= configuration option, to control whether to respect robots.txt file or not
  • Update spidr gem, resolves issue#25
  • Set default concurrency to 1 due to harsher rate limiting on Wayback Machine
  • Support for crawling multiple hosts, for example www.example.com, example.com and app.example.com, PR#27

v1.3.0

v1.2.1

  • Track which URLs have been visited in the sitemapper and don't visit them twice
  • Protect against sitemap index duplicates

v1.2.0

Is history...