fix xpath selector namespace handling for HTML DOMs #17

Open
blahah opened this issue Apr 10, 2015 · 0 comments

blahah commented Apr 10, 2015

The latest versions of the xpath library break namespace-free selectors for normal HTML documents (goto100/xpath#27), so we're stuck using v0.0.6. For now this is fine, but eventually we should seek a resolution, either with a fork of the main xpath lib, or by just maintaining it inside this project.
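
A minimal sketch of the likely mechanism, assuming the failure comes from spec-style name tests: in XPath 1.0 an unprefixed name such as `div` only matches elements in no namespace, while a DOM built from normal HTML typically puts its elements in the XHTML namespace. The function names below (`strictNameTest`, `lenientNameTest`) are hypothetical and the elements are plain objects standing in for DOM nodes; none of this is the xpath library's actual code.

```javascript
// Illustration only: hypothetical helpers, not the xpath library's API.
var XHTML_NS = 'http://www.w3.org/1999/xhtml';

// Spec-style name test: an unprefixed name matches only elements
// that are in no namespace at all.
function strictNameTest(element, name) {
  return element.namespaceURI === null && element.localName === name;
}

// Lenient name test (assumed to resemble the older behaviour):
// an unprefixed name matches on the local name alone.
function lenientNameTest(element, name) {
  return element.localName === name;
}

// Plain objects standing in for DOM elements.
var htmlDiv = { localName: 'div', namespaceURI: XHTML_NS }; // from an HTML parser
var plainDiv = { localName: 'div', namespaceURI: null };    // from namespace-free XML
var elements = [htmlDiv, plainDiv];

var strictResults = elements.filter(function (el) {
  return strictNameTest(el, 'div');
});
var lenientResults = elements.filter(function (el) {
  return lenientNameTest(el, 'div');
});

console.log(strictResults.length);  // 1: the XHTML-namespaced div is skipped
console.log(lenientResults.length); // 2: both divs are selected
```

Under this assumption, a selector like `//div` run against a scraped page would hit only the strict path and return nothing, which matches the behaviour described above.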

@blahah blahah added the bug label Apr 10, 2015
blahah added a commit that referenced this issue Apr 10, 2015
The npm xpath package has been updated to more closely match
the xpath spec. However, this leads to problems using generic
xpath selectors to select into a DOM rendered from normal HTML.
Selectors that don't have a namespace fail to select anything.

This behaviour makes scraping of the kind enabled by thresher
impossible, but the previous version of the xpath lib was still
the best working implementation we found in npm during testing.

The fix is to bring the last working version of the xpath lib
into this repo (it's only a single file). At a later stage, we
can take later versions of the xpath lib that might have other
improvements, but restore the historical handling of namespaces.
This is tracked in issue #17.