update dependencies and add new features #33

Open · wants to merge 9 commits into base: master
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
node_modules
14 changes: 11 additions & 3 deletions README.md
@@ -5,7 +5,7 @@
From source:

<pre>
git clone git://github.com/mikeal/spider.git
cd spider
npm link ../spider
</pre>
@@ -26,11 +26,19 @@ The `options` object can have the following fields:
* `userAgent` - The user-agent string sent to the remote server with each request. Defaults to `Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.41 Safari/534.7` (a Chrome user-agent string).
* `cache` - The cache object to use. Defaults to `NoCache`; see the code for the methods a custom cache object must implement.
* `pool` - A hash object containing the agents to use for requests. If omitted, requests use the global pool, whose size is limited by `maxSockets`.
* `timeout` - Integer number of milliseconds to wait for a server to send response headers (and start the response body) before aborting the request.
* `delay` - Integer number of milliseconds to wait after the current request finishes before sending the next one.
* `retry` - Integer number of times to retry a request that receives an error response.
* `concurrency` - Integer maximum number of requests to send in parallel.
* `cookieJar` - A custom cookie jar to use instead of request’s global cookie jar; set `cookieJar` to an instance of `request.jar()`.
* `proxy` - An HTTP proxy to use. [learn more](https://github.com/request/request/#requestoptions-callback)
* `taskTrigger` - A function that is called when a task in the queue is done.
* `finish` - A function that is called when the queue is drained.
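The way `concurrency`, `delay`, `retry`, `taskTrigger` and `finish` fit together can be pictured with a small, self-contained queue. This is a minimal sketch, not spider's implementation: the `createQueue` helper is hypothetical, tasks are assumed to be functions that return promises, and the `taskTrigger(err, result)` signature is an assumption made for illustration.

```javascript
'use strict';

// Hypothetical helper, NOT spider's internal queue: it only illustrates how
// the `concurrency`, `delay`, `retry`, `taskTrigger` and `finish` options
// could interact. Each task is assumed to be a function returning a promise.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function createQueue({ concurrency = 1, delay = 0, retry = 0, taskTrigger, finish } = {}) {
  const pending = [];
  let running = 0;

  // Try a task once, then up to `retry` more times if it rejects.
  async function runWithRetry(task) {
    for (let attempt = 0; ; attempt++) {
      try {
        return await task();
      } catch (err) {
        if (attempt >= retry) throw err;
      }
    }
  }

  // Fill free slots up to `concurrency`; call `finish` once nothing is left.
  function next() {
    if (pending.length === 0 && running === 0) {
      if (finish) finish();
      return;
    }
    while (running < concurrency && pending.length > 0) {
      const task = pending.shift();
      running++;
      runWithRetry(task)
        .then(
          (result) => { if (taskTrigger) taskTrigger(null, result); },
          (err) => { if (taskTrigger) taskTrigger(err); }
        )
        // Hold this slot for `delay` ms before the next task may use it.
        .then(() => sleep(delay))
        .then(() => {
          running--;
          next();
        });
    }
  }

  return {
    push(task) { pending.push(task); },
    start: next,
  };
}
```

Holding the slot for `delay` milliseconds is one reading of "wait after the current request finishes"; a real crawler could instead space out requests globally.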

### Adding a Route Handler

#### spider.route(hosts, pattern, cb)
Where the parameters are:

* `hosts` - A string, or an array of strings, representing the `host` part of the targeted URL(s).
* `pattern` - The pattern that spider matches against the rest of the URL(s) (`pathname` + `search` + `hash`).
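The split between `hosts` (matched against the URL's `host`) and `pattern` (matched against `pathname` + `search` + `hash`) can be illustrated with a small predicate. This is a hypothetical sketch, not spider's router: `matchesRoute` is an illustrative name, and treating `pattern` as either a `RegExp` or a string in which `*` matches any run of characters is an assumption made for the example.

```javascript
'use strict';

// Hypothetical predicate, NOT spider's router. Assumes `pattern` is either a
// RegExp or a string in which `*` matches any run of characters.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function toRegExp(pattern) {
  if (pattern instanceof RegExp) return pattern;
  // Escape everything literally, then let each `*` match anything.
  const body = pattern.split('*').map(escapeRegExp).join('.*');
  return new RegExp('^' + body + '$');
}

function matchesRoute(hosts, pattern, rawUrl) {
  const url = new URL(rawUrl);
  const hostList = Array.isArray(hosts) ? hosts : [hosts];
  // `hosts` is compared with the URL's `host` part (hostname plus any port).
  if (!hostList.includes(url.host)) return false;
  // `pattern` is tested against the remainder: pathname + search + hash.
  const rest = url.pathname + url.search + url.hash;
  return toRegExp(pattern).test(rest);
}
```

Because `host` includes a non-default port, `'localhost:8080'` and `'localhost'` count as different hosts under this sketch.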
@@ -43,7 +51,7 @@ Where the params are the following :

`spider.get(url)`, where `url` is the URL to fetch.

### Extending / Replacing the MemoryCache

Currently the MemoryCache must provide the following methods:
