- Add `.learn()` to generate a selector for a selected node
- Add `.listen()` for easily creating DOM event listeners
- Add `.trigger()` for easily triggering DOM events
- Add `.on()` for binding callback to a local-only event
- Add `.url()` to set the current URL
- Add `.params()` to set the current URL parameters
- Add `.save()` to save response data to a file
- Add `.add()`, `.remove()` for node creation/deletion?
- Add `.scroll()` to scrape infinite scroll pages
- Add warnings for parser errors?
- Switch to semantic versioning?
- Event/error handling
- Error.code = 404, 'timeout', etc.
- Error.module = 'http', 'dom', etc.
- return true = retry, false = stop, anything else = continue
- Event for discontinued context/data
- Module system using osmosis.require and modules prefixed with `osmosis-`
- Way to trigger DOM
- Throw unhandled errors?
- `.while()` to do things more than once as long as they call next()
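
The proposed error-handling convention above (return `true` to retry, `false` to stop, anything else to continue) could look roughly like the sketch below. None of these function names exist in Osmosis; `decideAction`, `makeError`, and `exampleHandler` are hypothetical and only illustrate the idea.

```javascript
// Hypothetical sketch: these helpers are NOT part of Osmosis.
// The handler's return value decides what happens after an error:
//   true -> retry, false -> stop, anything else -> continue.
function decideAction(handlerResult) {
  if (handlerResult === true) return "retry";
  if (handlerResult === false) return "stop";
  return "continue";
}

// Build an error carrying the proposed `code` and `module` fields.
function makeError(message, code, moduleName) {
  const err = new Error(message);
  err.code = code;         // e.g. 404 or 'timeout'
  err.module = moduleName; // e.g. 'http' or 'dom'
  return err;
}

// Example handler: retry timeouts, stop on 404, otherwise carry on.
function exampleHandler(err) {
  if (err.code === "timeout") return true;
  if (err.code === 404) return false;
  // Returning nothing (undefined) means "continue".
}

const action = decideAction(exampleHandler(makeError("Not found", 404, "http")));
console.log(action); // "stop"
```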
- Fixed bug where `.get()` without `params` caused empty query string ('?')
- Preserve sort order for `.follow()` results within `.set()`
- Removed `opts` and `callback` arguments
- Supports an array as the root data object
- Fixed case where nested `.find` searches the entire document
- parseHtml uses `huge` option by default
- Fixed nested Osmosis instances inside `set`
- Update to `libxmljs-dom` v0.0.5
- Fixed nested Osmosis instances inside `set`
- Added tests for nested set data
- Proper `submit` button handling
- Accepts a `submit` button selector as the first argument
- Supports `submit` button attributes: "form", "formaction", "formenctype" and "formmethod"
- Added tests for `submit` button handling
- Update to `libxmljs-dom` v0.0.4
- `proxy` option can now be an array of multiple proxies
- Added `.proxy()` to easily set the `proxy` configuration option
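
The changelog does not say how Osmosis chooses among multiple proxies. As an illustration only, the sketch below assumes simple round-robin rotation; `makeProxyPicker` and the proxy addresses are hypothetical and not part of the library.

```javascript
// Hypothetical sketch: round-robin over a `proxy` option that may be
// a single string or an array of strings. Rotation order is an assumption.
function makeProxyPicker(proxy) {
  const list = Array.isArray(proxy) ? proxy : [proxy];
  let index = 0;
  return function nextProxy() {
    const chosen = list[index % list.length];
    index += 1;
    return chosen;
  };
}

// Placeholder proxy addresses, for illustration only.
const nextProxy = makeProxyPicker(["localhost:8080", "localhost:8081"]);
console.log(nextProxy()); // "localhost:8080"
console.log(nextProxy()); // "localhost:8081"
console.log(nextProxy()); // "localhost:8080"
```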
- If the first argument's name is:
- "document" - The callback is given the current document
- "window" - The callback is given the Window object
- "$" - The callback is given a jQuery object (if available)
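
Choosing what to pass based on the name of the callback's first argument can be done by inspecting the function's source text. The sketch below shows one way to do that in plain JavaScript; `firstParamName`, `pickArgument`, and the `env` object are hypothetical and may differ from how Osmosis does it internally.

```javascript
// Read the name of a function's first declared parameter from its source.
// Handles `function (a) {}`, `function f(a) {}`, `(a) => ...` and `a => ...`;
// it does not handle method shorthand, destructuring, or default values.
function firstParamName(fn) {
  const src = fn.toString();
  const match = src.match(
    /^(?:async\s+)?(?:function\b[^(]*)?\(?\s*([A-Za-z_$][\w$]*)?/
  );
  return match && match[1] ? match[1] : null;
}

// Pick the value to hand to the callback. `env` is a stand-in object
// holding whatever document, window, and jQuery objects are available.
function pickArgument(fn, env) {
  const name = firstParamName(fn);
  if (name === "document") return env.document;
  if (name === "window") return env.window;
  if (name === "$") return env.$;
  return undefined; // behaviour for other names is not specified here
}

const env = { document: { kind: "document" }, window: { kind: "window" }, $: null };
console.log(pickArgument(function (window) {}, env).kind); // "window"
```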
- Uses 'use strict'
- Minimize use of array.forEach
- Added libxml specific memoryUsage monitoring
- Switched to static `libxmljs-dom` version
- Added `ignore_http_errors` option
- Added `:internal` for selecting internal links
- Added `:external` for selecting external links
- Added `:domain` for searching by domain name
- Added `:path` for searching by path
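
To illustrate the distinction behind `:internal` and `:external`, the sketch below classifies a link by comparing hostnames using the standard `URL` class. Treating "internal" as "same hostname as the current page" is an assumption, and `classifyLink` is a hypothetical helper, not an Osmosis function.

```javascript
// Hypothetical sketch: a link is "internal" when it resolves to the same
// hostname as the page it appears on, otherwise "external".
function classifyLink(href, pageUrl) {
  const page = new URL(pageUrl);
  const target = new URL(href, page); // resolves relative hrefs against the page
  return target.hostname === page.hostname ? "internal" : "external";
}

console.log(classifyLink("/about", "http://example.com/index.html"));       // "internal"
console.log(classifyLink("http://other.org/", "http://example.com/index.html")); // "external"
```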
- Configuration options are inherited down the chain
- Added `.contains(string)` to discard nodes whose contents do not match `string`
- Added `.do()` to call one or more commands using the current context
- Added `.failure(selector)` to discard nodes that match the given selector
- Added `.filter(selector)` to discard nodes that do not match the given selector
- Accepts a tokenized URL string
- @{...} - Request info (url, method, params, headers, etc.)
- %{...} - `data` object
- ${...} - `context` search
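
The three token types above could be expanded along the lines of the following sketch. It is not the Osmosis implementation: `expandUrl` is a hypothetical helper, the request and data objects are plain stand-ins, and the context search is passed in as a function because a real selector lookup needs a parsed document.

```javascript
// Hypothetical sketch of tokenized URL expansion:
//   @{key} -> request info, %{key} -> data object, ${selector} -> context search.
// Tokens that resolve to nothing are left untouched (an assumption).
function expandUrl(template, request, data, findInContext) {
  return template.replace(/([@%$])\{([^}]*)\}/g, (whole, sigil, key) => {
    let value;
    if (sigil === "@") value = request[key];
    else if (sigil === "%") value = data[key];
    else value = findInContext(key);
    return value === undefined || value === null ? whole : String(value);
  });
}

// Stand-in for a selector search on the current context.
const find = (selector) => (selector === "a.ref" ? "home" : undefined);

// Use ordinary quotes so JavaScript does not interpolate ${...} itself.
const url = expandUrl(
  "http://example.com/user/%{id}?ref=${a.ref}&via=@{method}",
  { method: "get" },
  { id: 42 },
  find
);
console.log(url); // "http://example.com/user/42?ref=home&via=get"
```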
- Added `headers({ key: value })` and `header(key, value)` to set HTTP headers
- Added `.match([selector], RegExp)` to discard nodes whose contents do not match
- Added `.rewrite(callback)` to set a URL rewriting function for the preceding request
- `promise.args` is now an object (used to be an array)
- HTTP 400 errors are now logged and the requests are retried.
- DOM and css2xpath functionality have been moved to `libxmljs-dom`
- Added `keep_data` option to retain the original HTTP response
- Added `process_response` option for processing data before parsing
- Added test suite
- Added `.click()` for interacting with JS-only content
- Added `.delay(n)` for waiting n seconds before calling next. Accepts a decimal value.
- Accepts an array of selectors as the first argument
- Accepts a second argument: a Boolean (true = follow external links) or a URL rewriting function.
- Accepts `function(context, data)` as the first argument. The function must return a URL string.
- Added second argument to associate a base-url to the document
- Added optional `done` argument
- Added `.select` for finding elements within the current context
- Replaces previously set values
- Enhanced stack counting
- Added data object ref counting
- Added domain specific cookie handling
- Improved stability of deep instance nesting with `.set()`
- Osmosis instances operate more independently
- Request queues are now a single array for each instance
- Promises must accept and call `done` if they asynchronously send more than one output context per input context
- If `.then` sends more than one output context per input context, then it must accept `done()` as its last argument and call it after calling `next()` for the last time.
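
The `next()`/`done()` contract described above can be pictured with the small runner below. It is a simplified, synchronous sketch rather than Osmosis code: `runStep` and `splitStep` are hypothetical, the `(context, data, next, done)` argument order is assumed, and contexts are plain objects instead of parsed documents.

```javascript
// Hypothetical runner: calls a step with (context, data, next, done),
// collects every output sent through next(), and records whether done()
// was called. Calling next() after done() is treated as an error.
function runStep(step, inputContext, inputData) {
  const outputs = [];
  let finished = false;
  const next = (context, data) => {
    if (finished) throw new Error("next() called after done()");
    outputs.push({ context, data });
  };
  const done = () => {
    finished = true;
  };
  step(inputContext, inputData, next, done);
  return { outputs, finished };
}

// A step that emits several output contexts for one input context,
// so it accepts `done` last and calls it after the final next().
function splitStep(context, data, next, done) {
  for (const item of context.items) {
    next(item, Object.assign({}, data, { item: item }));
  }
  done();
}

const result = runStep(splitStep, { items: ["a", "b", "c"] }, { page: 1 });
console.log(result.outputs.length); // 3
console.log(result.finished);       // true
```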
- Ensure non-default `needle` options propagate
- Added a more intuitive method for pagination
- Added easy form submission
- Added easy login support
- Added pause, resume, and stop functionality
- Searches the entire document by default
- Supports innerHTML using `:html` or `:source` in selectors
- Supports deep JSON structures and nested Osmosis instances
- `.data(null)` clears the data object
- `.data({})` appends keys to data object
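
The two `.data()` behaviours above amount to "reset" versus "merge". A minimal sketch of that semantics on a plain object follows; `applyData` is a hypothetical helper, not the library's implementation.

```javascript
// Hypothetical sketch of the two behaviours:
//   null   -> start over with an empty data object
//   object -> add its keys to the existing data object
function applyData(current, arg) {
  if (arg === null) return {};
  return Object.assign({}, current, arg);
}

let data = { title: "Example" };
data = applyData(data, { author: "someone" });
console.log(data); // { title: 'Example', author: 'someone' }
data = applyData(data, null);
console.log(data); // {}
```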
- `.dom()` is still a work in progress and can now run jQuery