feat: refactor url - object split (motivated by fsspec integration) #976
Perhaps it would make sense to add
When this is done, fsspec will become a strict dependency.
This would avoid potentially different behaviours depending on whether a user has fsspec installed.
If that's possible, then you would be introducing a behavior change now for those users who happen to have fsspec installed for other reasons. We don't want the path-handling to change at all. Could you get more certainty about that?
Meanwhile, this is not a change that has to happen now (or ever, technically). The original splitting was using a Python standard library function. Even when we start relying on fsspec for all file backends, there's no reason we couldn't still use the standard urlparse for URL parsing. Does fsspec.utils.urlsplit do something special?
I'm on the fence about this one.
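For reference, a minimal sketch of what the standard-library parser does with the kind of strings discussed here. The URL, the file name, and the `:Events` suffix are made-up examples, not taken from the project's tests.

```python
from urllib.parse import urlparse

# A full URL with an illustrative ":object" suffix. The suffix is not
# interpreted by urlparse; it simply stays inside the path component.
remote = urlparse("https://example.org/data/file.root:Events/tree")
print(remote.scheme, remote.netloc, remote.path)

# A bare local name with the same suffix. Everything before the first colon
# is a syntactically valid scheme, so urlparse reports "file.root" as the
# scheme and "Events" as the path.
local = urlparse("file.root:Events")
print(local.scheme, local.path)
```

The second case suggests why the file/object split needs its own handling on top of plain URL parsing, whichever library does the parsing.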
Co-authored-by: Jim Pivarski <[email protected]>
Yes, I think you are right: we should not use the fsspec URL parsing when we can just use urllib. In the end I refactored the helper method to use I feel that this method is more concise, but I am worried it does not fully reproduce the old one (everything looks good so far, and all the tests pass).
Might be worth reviewing all the supported syntaxes of
The colon-splitting between filename and object (not related to #974; that's a colon in an object path) has been a lot of trouble. There's a list of problems it's caused on #920 (comment), including platform-dependent issues with

On page 19 of this talk I analyzed a large dataset of user code to see how many people are using it, to see if we could ever get rid of it. The result was 10% of

So the path interpretation is unfortunately complex and we should leave it untouched if possible. It seems to me that it should be possible, since replacing the backend doesn't change the path-or-URL that we send to the backend.
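To make the difficulty concrete, here is a rough sketch of a colon-based file/object splitter. It is not the project's actual helper: the function name, the regular expression, and the choice to split on the first colon after the protected prefix are all assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Assumed pattern for a Windows drive prefix such as "C:\" or "C:/".
_WINDOWS_DRIVE = re.compile(r"^[A-Za-z]:[\\/]")


def split_file_and_object(path):
    """Hypothetical helper: 'file.root:obj' -> ('file.root', 'obj')."""
    parsed = urlparse(path)
    if _WINDOWS_DRIVE.match(path):
        # The drive letter's colon must not be treated as the separator.
        n = 2
    elif parsed.scheme and parsed.netloc:
        # Skip "scheme://host:port" so a port colon is not the separator.
        n = len(parsed.scheme) + len("://") + len(parsed.netloc)
    else:
        n = 0
    prefix, rest = path[:n], path[n:]

    # Assumption: the first remaining colon separates file from object.
    file_part, sep, obj = rest.partition(":")
    if not sep:
        return path, None
    return prefix + file_part, obj


print(split_file_and_object("root://host:1094//file.root:Events/tree"))
print(split_file_and_object("C:\\data\\file.root:Events"))
print(split_file_and_object("https://example.org/file.root"))
```

Even this small version needs special cases for drive letters and ports, and it still makes an arbitrary choice for file names that themselves contain colons.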
Personally I'd be in favor of dropping
I think we have consistency across the file-opening functions under control. File syntax is never interpreted differently by different file-opening functions, but some functions have more options than others.
The above is complex, but I believe that it is under control. Each one of these methods was motivated by a request (a long history...).
To clarify (after an in-person discussion with @jpivarski; I think there was some confusion):
We've been staring at this for a while, it should be okay. Let's try it out in production and find out!
(We'll know where to look if anyone runs into any new problems with colon-parsing.)
Currently the uproot._util.py file contains some helper methods for things such as URL parsing, splitting the ROOT object from the file path, etc.
Apart from splitting the ROOT object from a URI, which is non-standard, fsspec should be able to handle everything else, so we could remove or simplify some helper methods.
Since this uses fsspec, but fsspec is not currently a dependency, my solution was to do the new URL split using fsspec only if it's installed and otherwise fall back to the old one, which may cause some problems.