Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Git LFS support to uv-git crate #10335

Merged
merged 9 commits into from
Jan 13, 2025
Merged

Conversation

sydduckworth
Copy link
Contributor

@sydduckworth sydduckworth commented Jan 6, 2025

Summary

Closes #3312.

This PR adds Git LFS support to the uv-git crate by using the git-lfs CLI to fetch required LFS objects for a revision following the call to git fetch.

The LFS fetch step is disabled by default and only enabled if the environment variable UV_GIT_LFS is set.

When enabled, the LFS fetch step is run for all repositories regardless of whether they have associated LFS objects. The step is skipped if the git-lfs CLI tool isn't installed.

Test Plan

I verified that the minimal example in the linked issue passes, i.e. this command now succeeds:

UV_GIT_LFS=1 uv pip install git+https://github.com/grebnetiew/lfs-py.git

I also verified that non-LFS repositories still work, with or without git-lfs installed.

To Replicate

Attempt to use uv to install a Git dependency that contains LFS objects (e.g. uv pip install git+https://github.com/grebnetiew/lfs-py.git). This should fail with a smudge filter error.

Re-run the same command with the added environment variable UV_GIT_LFS=1. The install should now succeed.

Potential Changes / Improvements

With this change LFS objects in a given revision will always be downloaded if the user has Git LFS installed, which may not always be desired behavior. It might be helpful to add a field to the uv settings and/or an environment variable so that the LFS step can be disabled if needed.

Enabling/disabled via environment variable has now been implemented.

- Added a `git lfs fetch` step after fetching the repository but before cloning it locally.
- Git LFS step is ignored if `git-lfs` is not installed.
- Errors while executing Git LFS are reported as normal.
/// We search for the Git LFS binary instead of using `git lfs` so that we can
/// distinguish Git LFS not being installed (which we can ignore) from
/// LFS errors (which should be returned).
static GIT_LFS: LazyLock<Option<PathBuf>> = LazyLock::new(|| which::which("git-lfs").ok());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not certain this detection method is portable to all end user systems. For example, I can have git lfs installed and working but I not have git-lfs binary on the path. I think probing git lfs directly is likely the most portable approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've updated the logic so now instead of searching for git-lfs on the path, it just tries to run git lfs version the first time LFS is invoked and uses the success of that command as an indicator of LFS availability.

I'd definitely be interested in if anyone has a cleaner way of determining whether LFS is available. I was initially going to just check if the LFS fetch command returned 127 (command not found), but that return code isn't available in Powershell, so that approach isn't portable across shells.

Now instead of checking for a `git-lfs` binary on the path, we just try to run `git lfs version` to verify LFS is available.
@sydduckworth sydduckworth requested a review from samypr100 January 9, 2025 15:43
@samypr100
Copy link
Collaborator

With this change LFS objects in a given revision will always be downloaded if the user has Git LFS installed, which may not always be desired behavior. It might be helpful to add a field to the uv settings and/or an environment variable so that the LFS step can be disabled if needed.

Agree, since this is technically a possible breaking change a way to opt-in would be great. I think an env var can be used here to trigger the behavior change as its relatively simple. I'd rather have this be opt-in than opt-out as fetching lfs objects may not always be desired, particularly on large downloads.

Thoughts @zanieb?

sydduckworth and others added 2 commits January 12, 2025 12:18
- Added `UV_GIT_LFS` environment variable
- Moved LFS fetch step so it is now gated behind `UV_GIT_LFS` flag.
@sydduckworth
Copy link
Contributor Author

@samypr100 I've updated the PR so that the LFS fetch step is off by default, and is enabled by setting the environment variable UV_GIT_LFS.

@zanieb
Copy link
Member

zanieb commented Jan 12, 2025

Opt-in is a great idea — it makes us much more likely to accept the feature.

@zanieb zanieb self-requested a review January 12, 2025 18:09
@zanieb zanieb self-assigned this Jan 12, 2025
@zanieb
Copy link
Member

zanieb commented Jan 12, 2025

Did you do any benchmarking of this?

@sydduckworth
Copy link
Contributor Author

Testing on my local system with a repository with a small number of tracked files (git+https://github.com/astral-test/uv-public-pypackage), I couldn't produce a statistically significant difference in execution time for uv pip install with or without LFS support enabled.

That said, the added overhead should be equal to the time required to run git lfs version once (4.6 ms on my system) plus the time required to run git lfs fetch for each repository (15 ms for the test repo on my system). Execution time of git lfs fetch scales with number of tracked files but even testing with larger repositories I got about the same result.
So the predicted overhead (on my machine) with LFS enabled (assuming the command interacts with at least one Git repository) would be about 5 ms + about 15 ms per Git repository.

@samypr100
Copy link
Collaborator

samypr100 commented Jan 12, 2025

You might be able to eliminate the 5ms overhead by moving the env var check before the lfs check nevermind, I hadn't seen the latest code.

@zanieb zanieb enabled auto-merge (squash) January 13, 2025 18:21
@zanieb zanieb merged commit 97c1877 into astral-sh:main Jan 13, 2025
64 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

uv doesn't correctly checkout Git dependencies with Git LFS assets
3 participants