Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

w3lib-based url classes #46

Draft
wants to merge 1 commit into
base: url-page-inputs
Choose a base branch
from
Draft

w3lib-based url classes #46

wants to merge 1 commit into from

Conversation

kmike
Copy link
Member

@kmike kmike commented Jun 2, 2022

An experiment: how a w3lib-based URL class could look like. Again, a minimal version.

@kmike kmike changed the base branch from master to url-page-inputs June 2, 2022 19:04
@codecov
Copy link

codecov bot commented Jun 2, 2022

Codecov Report

Merging #46 (2fd79d0) into url-page-inputs (5b0e889) will not change coverage.
The diff coverage is 100.00%.

@@                Coverage Diff                @@
##           url-page-inputs       #46   +/-   ##
=================================================
  Coverage           100.00%   100.00%           
=================================================
  Files                   15        15           
  Lines                  344       355   +11     
=================================================
+ Hits                   344       355   +11     
Impacted Files Coverage Δ
web_poet/_base.py 100.00% <100.00%> (ø)

def test_urljoin():
url = _Url("http://example.com/foo/bar?x=y#fragment")
assert str(url.join("baz")) == "http://example.com/foo/baz"
assert str(url / "baz") == "http://example.com/foo/baz"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is different from yarl, for the cases path doesn't end with "/": in yarl you need to call url.parent / "baz" to get urljoin behavior. I'm not sure what's better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to the proposed behavior, clean and fit to web scraping scenarios.


def update_query(self: T_url,
new_parameters: Dict[str, str]) -> T_url:
new_url = add_or_replace_parameters(self._url,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yarl's update_query looks way more powerful than w3lib's add_or_replace_parameters, see https://yarl.readthedocs.io/en/latest/api.html#yarl.URL.update_query for the supported arguments. I like that multiple values are handled in yarl, and duplicate query parameters in general.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants