All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Support `waitFor` for crawler.queue()'s options.
- Fix a bug that did not allow setting the `timeout` option per request.
- Fix a bug of crawling twice if one URL has a trailing slash on the root folder and the other does not.
- Support `browserCache` for crawler.queue()'s options.
- Support the `depthPriority` option again.
- Drop `depthPriority` for crawler.queue()'s options.
- Emit the `newpage` event.
- Support `deniedDomains` and `depthPriority` for crawler.queue()'s options.
- Allow the `allowedDomains` option to accept a list of regular expressions.
- Support `followSitemapXml` for crawler.queue()'s options.
- Fix a bug of not showing console messages properly.
- Fix a bug of listing response properties as methods.
- Fix a bug of not obeying robots.txt.
- Add HCCrawler.defaultArgs() method.
- Emit the `requestretried` event.
- Use the `cache` option not only for remembering already requested URLs but also as a request queue for distributed environments.
- Move the `onSuccess`, `onError` and `maxDepth` options from HCCrawler.connect() and HCCrawler.launch() to crawler.queue().
- Support `obeyRobotsTxt` for crawler.queue()'s options.
- Support `persist` for RedisCache's constructing options.
- Make the `cache` option required for HCCrawler.connect() and HCCrawler.launch().
- Provide `skipDuplicates` to remember and skip duplicate URLs, instead of passing `null` to the `cache` option.
- Modify the `BaseCache` interface.
- Support CSV and JSON Lines formats for exporting results.
- Emit `requeststarted`, `requestskipped`, `requestfinished`, `requestfailed`, `maxdepthreached`, `maxrequestreached` and `disconnected` events.
- Improve debug logs by tracing public APIs and events.
- Allow the `onSuccess` and `evaluatePage` options to be `null`.
- Change `crawler.isPaused`, `crawler.queueSize`, `crawler.pendingQueueSize` and `crawler.requestedCount` from read-only properties to methods.
- Fix a bug of ignoring the `maxDepth` option.
- Refactor by changing the style of requiring the cache directory.
- Fix a bug of starting more crawlers than `maxConcurrency` when requests fail.
- Automatically collect and follow links found in the requested page.
- Support `maxDepth` for crawler.queue()'s options.
- Support `screenshot` for crawler.queue()'s options.
- Rename `ensureCacheClear` to `persistCache` for HCCrawler.connect() and HCCrawler.launch()'s options.
- Support `maxRequest` for HCCrawler.connect() and HCCrawler.launch()'s options.
- Support `allowedDomains` and `userAgent` for crawler.queue()'s options.
- Support pluggable caches such as SessionCache and RedisCache, and the `BaseCache` interface for customizing caches.
- Add crawler.setMaxRequest(), crawler.pause() and crawler.resume() methods.
- Add crawler.pendingQueueSize and crawler.requestedCount read-only properties.
- Add CHANGELOG.md based on Keep a Changelog.
- Add unit tests.
- Automatically dismiss dialogs.
- Improve performance by setting up pages in parallel.
- Support `extraHeaders` for crawler.queue()'s options.
- Add comments in JSDoc style.
- The public API to launch a browser has changed. Now you can launch a browser with HCCrawler.launch().
- Rename `shouldRequest` to `preRequest` for crawler.queue()'s options.
- Refactor by separating the `HCCrawler` and `Crawler` classes.
- Refactor handlers for options.
- Add test with mocha and power-assert.
- Add coverage with istanbul.
- Add setting for CircleCI.
- Add .editorconfig.
- Add debug log.
- Migrate from NPM to Yarn.
- Refactor helper to class static method style.