Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fetch: warn if cloud-versioned file has no version ID #9669

Merged
merged 1 commit into from
Jul 3, 2023

Conversation

pmrowla
Copy link
Contributor

@pmrowla pmrowla commented Jun 28, 2023

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

Closes #9226

  • fetch/pull will now warn and subsequently fail when trying to fetch a file with no version ID from a cloud-versioned remote (i.e. when the file was not previously pushed to the versioned remote)
$ dvc pull                                    ⏎
WARNING: Some files are missing cloud version information and will not be fetched from the remote:
azure://test-versioned/cats-dogs/data/train/cats/cat.1.jpg
azure://test-versioned/cats-dogs/data/train/cats/cat.10.jpg
ERROR: failed to pull data from the cloud - 2 files failed to download

@pmrowla pmrowla added the A: cloud-versioning Related to cloud-versioned remotes label Jun 28, 2023
@pmrowla pmrowla self-assigned this Jun 28, 2023
@codecov
Copy link

codecov bot commented Jun 28, 2023

Codecov Report

Patch coverage: 55.55% and project coverage change: -0.02 ⚠️

Comparison is base (1f31a9c) 90.55% compared to head (0fc8170) 90.53%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9669      +/-   ##
==========================================
- Coverage   90.55%   90.53%   -0.02%     
==========================================
  Files         479      479              
  Lines       36309    36335      +26     
  Branches     5219     5224       +5     
==========================================
+ Hits        32878    32897      +19     
- Misses       2843     2850       +7     
  Partials      588      588              
Impacted Files Coverage Ξ”
dvc/repo/fetch.py 79.10% <42.85%> (-16.55%) ⬇️
dvc/repo/pull.py 100.00% <100.00%> (ΓΈ)

... and 2 files with indirect coverage changes

β˜” View full report in Codecov by Sentry.
πŸ“’ Do you have feedback about the report comment? Let us know in this issue.

@pmrowla pmrowla marked this pull request as draft June 28, 2023 07:54
@pmrowla pmrowla requested a review from dberenbaum June 28, 2023 08:47
@pmrowla pmrowla marked this pull request as ready for review June 28, 2023 08:47
@dberenbaum
Copy link
Collaborator

Looks great @pmrowla! Should we also give some more info about why checkout failed (either by default or in verbose mode)?

@pmrowla
Copy link
Contributor Author

pmrowla commented Jun 29, 2023

Looks great @pmrowla! Should we also give some more info about why checkout failed (either by default or in verbose mode)?

It's not really straightforward for us to tell why the checkout failed. The checkout step is completely separate from the fetch (and the fetch technically succeeds, since we fetch the latest version of the file). If anything, we could probably just update the linked troubleshooting page with a note about the cloud versioned use case

@dberenbaum
Copy link
Collaborator

πŸ€” Should it really be the default behavior that fetching from a cloud-versioned remote should fetch the latest versions when there is no version ID? I wonder if we should fail on fetch unless maybe -f is passed. What's the use case where it's expected by default?

@pmrowla
Copy link
Contributor Author

pmrowla commented Jun 29, 2023

πŸ€” Should it really be the default behavior that fetching from a cloud-versioned remote should fetch the latest versions when there is no version ID? I wonder if we should fail on fetch unless maybe -f is passed. What's the use case where it's expected by default?

When we had the discussion about the original issue a while back we decided that the warning would be sufficient for now, but if we want to just not fetch anything we can do that.

I think fetching the latest version by default was more useful when worktree was a thing and it was possible that users may be updating files in the remote.

@dberenbaum
Copy link
Collaborator

Sorry for the lack of clarity. I'm trying to think if I'm missing some scenario, but by default it makes more sense to me to fail fast here rather than try to fetch everything first.

@pmrowla
Copy link
Contributor Author

pmrowla commented Jun 30, 2023

@dberenbaum updated the initial comment in the PR, we now warn and then fail if files are unversioned (we will still try fetch whatever data was available for files that are properly versioned)

@pmrowla pmrowla merged commit 24d9a05 into iterative:main Jul 3, 2023
18 checks passed
@pmrowla pmrowla deleted the fetch-warn-unversioned branch July 3, 2023 05:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: cloud-versioning Related to cloud-versioned remotes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dvc pull: does not warn about missing versions in a versioned S3
2 participants