feat: add join method to Url class #1378
base: main
- added support for URL path joining with optional trailing slashes and multiple arguments.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #1378 +/- ##
==========================================
- Coverage 90.21% 89.10% -1.11%
==========================================
Files 106 112 +6
Lines 16339 17892 +1553
Branches 36 40 +4
==========================================
+ Hits 14740 15943 +1203
- Misses 1592 1929 +337
- Partials 7 20 +13
... and 52 files with indirect coverage changes
CodSpeed Performance Report
Merging #1378 will not alter performance
Thanks for the PR! The signature looks fine, though I think the implementation can be simpler.
I think this just needs test cases and then I would be happy to see this merged. Sorry for the long delay 😬
- Refactor URL join function for better handling of relative paths
- Add tests for joining URLs with and without trailing slashes
- Cover various edge cases in line with URL specification that the previous function would fail to handle
I've added tests and changed the implementation. Now, it only takes one argument instead of multiple.
Sorry for yet another long review cycle. I would like to just agree what the right default for trailing_slash is, and have a suggestion to avoid the cannot_be_a_base() restriction.
Thanks for the many iterations here and slow reviews by me. I am happy with the design here now. I would like to see the __floordiv__ operator removed (see comment below), and then let's merge 👍
fn __truediv__(&self, other: &str) -> PyResult<Self> {
    self.join(other, true)
}

fn __floordiv__(&self, other: &str) -> PyResult<Self> {
    self.join(other, false)
}
Ok, sorry I missed these in the last round of review. I think the difference between the / and // operators here is subtle and hard to document.
I think it is better that we just have /, and make it match the default of append_trailing_slash=False. This will also simplify testing, I think.
Suggested change:
- fn __truediv__(&self, other: &str) -> PyResult<Self> {
-     self.join(other, true)
- }
- fn __floordiv__(&self, other: &str) -> PyResult<Self> {
-     self.join(other, false)
- }
+ fn __truediv__(&self, other: &str) -> PyResult<Self> {
+     self.join(other, false)
+ }
Okay, __floordiv__ can be removed, but I feel that __truediv__ should have append_trailing_slash=True, because this overloaded operator would likely be used to join multiple paths in shorter code. This behaviour would feel familiar to Python users, as it resembles pathlib's path joining.
For example,
a = Url("http://a")
print(a / "b" / "c" / "d")
# http://a/b/c/d/
a = Url("file:///home/user/")
print(a / "music" / "pop")
# file:///home/user/music/pop/
With append_trailing_slash=False it would instead result in http://a/d and file:///home/user/pop, which I think is not what the user would expect.
I chose to add __floordiv__ too because it would simplify adding files at the end.
print(a / "dir" / "dir" / "dir" // "file.txt") # file:///home/user/dir/dir/dir/file.txt
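The effect of the two defaults on chained joins can be reproduced with the standard library alone. This is only a sketch, not the PR's implementation: `join_sketch` and `chain` are hypothetical helpers, and `urllib.parse.urljoin` is used as a stand-in for the join performed by the url crate, so edge cases may differ.

```python
from functools import reduce
from urllib.parse import urljoin, urlsplit


def join_sketch(base: str, path: str, append_trailing_slash: bool) -> str:
    """Hypothetical stand-in for Url.join, built on urllib's reference resolution."""
    parts = urlsplit(urljoin(base, path))
    if append_trailing_slash and not parts.path.endswith("/"):
        # Mark the result as a "directory" so the next join appends below it
        parts = parts._replace(path=parts.path + "/")
    return parts.geturl()


def chain(base: str, *segments: str, append_trailing_slash: bool) -> str:
    """Join each segment in turn, the way a / "b" / "c" would."""
    return reduce(
        lambda url, seg: join_sketch(url, seg, append_trailing_slash),
        segments,
        base,
    )


with_slash = chain("http://a", "b", "c", "d", append_trailing_slash=True)
without_slash = chain("http://a", "b", "c", "d", append_trailing_slash=False)
print(with_slash)     # http://a/b/c/d/
print(without_slash)  # http://a/d
```

Without the trailing slash, each join replaces the last path segment of the previous result, which is why only the final segment survives.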
Oh, I see. Yikes, there are so many subtleties here!
It seems to me that our .join() method really works like urllib.parse.urljoin when it comes to semantics, e.g.
>>> urllib.parse.urljoin("https://foo.com/a", "b")
'https://foo.com/b'
versus pathlib's
>>> pathlib.Path("/foo/a").joinpath("b")
PosixPath('/foo/a/b')
Given these are inconsistent, I think we should perhaps back away from trying to have pathlib-like semantics at all.
Would you be open to the idea of dropping the operators from the PR completely, so we can get .join() merged? We could then open a pydantic issue to discuss the design of the operators and move forward with an implementation when there's consensus?
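For reference, the inconsistency is easy to see side by side using only the standard library. The snippet below is an illustration of the two existing behaviours, not of anything in this PR; `PurePosixPath` is used instead of `Path` so the output is the same on every platform.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin

# URL resolution replaces the last segment unless the base ends with "/"
no_slash = urljoin("https://foo.com/a", "b")
with_slash = urljoin("https://foo.com/a/", "b")

# pathlib always appends, whether or not a trailing slash was written
path_no_slash = PurePosixPath("/foo/a").joinpath("b")
path_with_slash = PurePosixPath("/foo/a/").joinpath("b")

print(no_slash)         # https://foo.com/b
print(with_slash)       # https://foo.com/a/b
print(path_no_slash)    # /foo/a/b
print(path_with_slash)  # /foo/a/b
```

The trailing slash on the base is significant for URL resolution and irrelevant for pathlib, which is the root of the disagreement about what the operator should do.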
Alternatively, we could also have joinpath(), which works like pathlib and doesn't accept query strings or fragments as the whole input?
And then we could have the / operator work like joinpath? 🤔
joinpath() would certainly make things cleaner. Should I implement joinpath() in this PR, or should we drop the operators for now and discuss it in the issues instead?
Great question. I think I'd prefer we just had .join() here and worried about .joinpath() and the operators later. That said, there's potentially a desire to agree a sketch of the follow ups here. @pydantic/oss - any ideas?
I think without comment from anyone else, let's just do .join() here and then follow-up with an issue in the main pydantic repo where we can discuss .joinpath() and operators.
I think it would be great to have more time to discuss the semantics (does it need to match urllib? What about other libraries like furl? Should we double check with the current RFCs? We should also check what was said in this discussion).
> I think it would be great to have more time to discuss the semantics (does it need to match urllib? What about other libraries like furl? Should we double check with the current RFCs? We should also check what was said in this discussion).
Really sorry for the late response. The main URL joining part is handled by rust-url's join method, which implements the WHATWG URL spec. So the semantics of the / operator are left to the libraries, in our case Python libraries like urllib and furl. While furl provides a / operator, urllib does not. urllib.parse.urljoin and furl.furl.join follow RFC 3986 to resolve the new URL.
furl uses the / operator only for adding a path to the URL, like pathlib.Path.
I think the furl approach is good. I should not have written my function signature as signature=(path, append_trailing_slash=false) but rather as signature=(url, append_trailing_slash=false), because the argument can be a relative or absolute URL, not just a path.
IMO, we can have Url.join() for URL joining, similar to furl.furl.join, and Url.__truediv__ for the sole purpose of adding a path without a trailing slash, just like furl.furl.__truediv__.
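A rough sketch of that split, under stated assumptions: `UrlSketch` is a hypothetical wrapper and not pydantic-core's Url, `urllib.parse.urljoin` stands in for the url crate's join, and rejecting `?` and `#` in the operator is borrowed from the joinpath() suggestion earlier in this thread rather than from furl.

```python
from urllib.parse import urljoin, urlsplit


class UrlSketch:
    """Hypothetical wrapper used only to illustrate the proposed split."""

    def __init__(self, url: str) -> None:
        self.url = url

    def join(self, other: str) -> "UrlSketch":
        # Full reference resolution: `other` may be a relative or absolute URL
        return UrlSketch(urljoin(self.url, other))

    def __truediv__(self, segment: str) -> "UrlSketch":
        # Path-only append: always adds below the current path, no trailing slash
        if "?" in segment or "#" in segment:
            raise ValueError("only path segments can be appended with /")
        parts = urlsplit(self.url)
        new_path = parts.path.rstrip("/") + "/" + segment.strip("/")
        return UrlSketch(parts._replace(path=new_path).geturl())

    def __str__(self) -> str:
        return self.url


base = UrlSketch("https://foo.com/a")
print(base.join("b"))              # https://foo.com/b
print(base.join("https://x.y/z"))  # https://x.y/z
print(base / "b" / "c")            # https://foo.com/a/b/c
```

Keeping the two entry points separate means join() can stay faithful to URL resolution rules while the operator keeps the pathlib-like feel, at the cost of having two behaviours to document.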
Let's do this before moving forward
Change Summary
This PR implements a feature based on pydantic/pydantic#9794 to join a URL path onto the base URL. It uses the join method from the url crate.
Related issue number
fix pydantic/pydantic#9794
Checklist
pydantic-core
(except for expected changes)