api.get_url: use index storage for getting remote URL #9676
Codecov Report
Patch coverage:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9676      +/-   ##
==========================================
- Coverage   90.55%   90.53%   -0.02%
==========================================
  Files         480      480
  Lines       36399    36422      +23
  Branches     5230     5235       +5
==========================================
+ Hits        32960    32976      +16
- Misses       2850     2855       +5
- Partials      589      591       +2
View full report in Codecov by Sentry.
Should it just use |
I'd be fine with making this change. I went with dvcfs initially because the |
@pmrowla Yeah, those are there for historical reasons: we didn't have the index before, and dvcfs was the most convenient way to access those. We've removed some over the past months (e.g. isdvc and isout from datafs, but not from dvcfs yet). I think this PR is alright, but maybe take a look at doing it with the index; it might even be easier and cleaner.
if not fs_info and "md5-dos2unix" in dvc_info:
    ret["md5-dos2unix"] = dvc_info["md5-dos2unix"]
This is unrelated to get_url after the index change, but legacy hashes still need to be handled in dvcfs.
if repo is None:
    repo = self._make_repo(url=url, rev=rev, subrepos=subrepos, **repo_kwargs)
    assert repo is not None
    # pylint: disable=protected-access
    repo_factory = repo._fs_conf["repo_factory"]
    self._repo_stack.enter_context(repo)
Any time that dvcfs constructs a new repo (or subrepo) instance we need to ensure that it gets closed cleanly, otherwise the index context for that subrepo could end up being opened/locked and never closed/unlocked
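The cleanup pattern described in this comment can be sketched with contextlib.ExitStack, which is what a call like self._repo_stack.enter_context(repo) suggests. This is a minimal illustration only: FakeRepo and FakeFileSystem are hypothetical stand-ins, not DVC classes, and the real repo object does more than flip a flag on close.

```python
from contextlib import ExitStack


class FakeRepo:
    """Hypothetical stand-in for a repo that holds a lock while it is open."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()

    def close(self):
        # In a real repo this is where index contexts would be closed/unlocked.
        self.closed = True


class FakeFileSystem:
    """Hypothetical filesystem that creates repos lazily and owns their lifetime."""

    def __init__(self):
        self._repo_stack = ExitStack()
        self.repos = []

    def make_repo(self, name):
        repo = FakeRepo(name)
        # Register the repo so that closing the filesystem also closes it.
        self._repo_stack.enter_context(repo)
        self.repos.append(repo)
        return repo

    def close(self):
        # Exits every registered context, most recently registered first.
        self._repo_stack.close()


fs = FakeFileSystem()
fs.make_repo("root")
fs.make_repo("subrepo")
fs.close()
all_closed = all(repo.closed for repo in fs.repos)
```

Registering each repo at construction time means no caller has to remember which subrepos were opened along the way; one close() on the filesystem releases all of them.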
@@ -67,6 +67,7 @@ def open_repo(url, *args, **kwargs):
        url = os.getcwd()

    if os.path.exists(url):
        url = os.path.abspath(url)
This was previously broken in the case where you passed a relative local path like repo='.' into dvc.api calls.
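The normalization in the hunk above can be tried in isolation. The sketch below wraps the same two checks in a hypothetical helper, normalize_repo_url, which is not part of DVC; it only shows that a relative local path becomes absolute while a remote URL passes through unchanged.

```python
import os


def normalize_repo_url(url=None):
    """Hypothetical helper mirroring the checks shown in the diff above."""
    if url is None:
        url = os.getcwd()
    # Only paths that exist locally are rewritten; remote URLs are left as-is.
    if os.path.exists(url):
        url = os.path.abspath(url)
    return url


local = normalize_repo_url(".")
remote = normalize_repo_url("https://example.com/repo.git")
```

Here local is the absolute path of the current directory, and remote is returned untouched because no such local path exists.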
I have followed the Contributing to DVC checklist.
If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible.
Fixes #9176
Previously, dvc.api.get_url just tried to get the default or specified remote and then manually generated a URL based on a hard-coded 'md5' dict lookup. As a result, it did not support per-output remote: settings and did not support cases where a legacy 2.x output would return md5-dos2unix info instead of md5.
DVCFileSystem.info() now returns a properly generated dvc_info['remote_url'] field for DVC outs that gets populated using the actual remote storage for the given data index entry (meaning per-output remote: is used when set, and default or --remote flag otherwise).
api.get_url and dvc get --show-url now use the remote_url field from dvcfs.info().
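The difference between the old and new behaviour described above can be illustrated with a small standalone sketch. Everything in it is an assumption made for illustration: the function names, the dictionary shapes, the remote names, and the simplified two-character-prefix URL layout are hypothetical and do not reproduce DVC's internals or its actual remote layout.

```python
# Hash keys to try, in order; the second is the legacy 2.x name mentioned above.
HASH_KEYS = ("md5", "md5-dos2unix")


def legacy_style_url(entry, remote_base):
    """Old approach as described: hard-coded 'md5' lookup against one remote."""
    md5 = entry["md5"]  # raises KeyError if only 'md5-dos2unix' is present
    return f"{remote_base}/{md5[:2]}/{md5[2:]}"


def index_style_url(entry, remotes, default_remote, remote_flag=None):
    """New approach, sketched: the per-output remote wins when set,
    otherwise the --remote flag or the default remote is used."""
    remote_name = entry.get("remote") or remote_flag or default_remote
    base = remotes[remote_name]
    for key in HASH_KEYS:
        if key in entry:
            value = entry[key]
            # Simplified, illustrative layout: <base>/<first 2 chars>/<rest>.
            return f"{base}/{value[:2]}/{value[2:]}"
    raise KeyError("entry has no known hash")


remotes = {"default": "s3://bucket/default", "archive": "s3://bucket/archive"}
legacy_entry = {
    "md5-dos2unix": "d41d8cd98f00b204e9800998ecf8427e",
    "remote": "archive",
}
url = index_style_url(legacy_entry, remotes, default_remote="default")
```

With this entry, the old-style lookup fails because there is no 'md5' key, while the index-style lookup resolves the URL against the per-output "archive" remote.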
cli:
api: