Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use sys.path for package discovery #9849

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

pradyunsg
Copy link

Resolves #2500
Intend to resolve #4466 before coming out of draft mode here.

x-ref #2413, since this PR removes an assumption introduced in that PR that a missing site-packages directory means that there are no packages to be found.

Summary

  • Change installed package discovery logic to use sys.path rather than inferred paths based on sysconfig's purelib and platlib directories.
  • [TODO] Change uninstallation logic to skip over packages that are installed outside the working environment.
  • [TODO] Remove any dead code, if the site_packages we were computing for this is not used anywhere else.

Test Plan

Only manually tested as of now.

The plan for automated tests is: Use a .pth file as an extra path to be injected to the virtual environment being modified, with the extra path pointing to site-packages from another virtual environment.

This should be less complicated than the proposed approaches of using sitecustomize.py as well as relying on system-site-packages with a mock system interpreter -- all these approaches modify the sys.path in a manner visible to the interpreter introspection script and using a .pth file is likely to be the easiest to reason about.

@charliermarsh
Copy link
Member

Wow, new contributor just dropped!

Screenshot 2024-12-12 at 3 15 03 PM

@charliermarsh charliermarsh self-assigned this Dec 12, 2024
@pradyunsg
Copy link
Author

sigh

Locally, I'm hitting rust-lang/rust-analyzer#16959 trying to diagnose what's happening in the two spots that the most recent commit has added more granular bail! logs to, and looks like I can't step through with a debugger on this without resorting to hackery around CARGO_MANIFEST_DIR. 😞

All the tests run via the VS Code UI for debug end up panicking before running with:

---- tool_upgrade::tool_upgrade_multiple_names stdout ----
thread 'tool_upgrade::tool_upgrade_multiple_names' panicked at crates/uv/tests/it/common/mod.rs:292:84:
called `Result::unwrap()` on an `Err` value: NotPresent

Looks like I'll be falling back to using regular println! to diagnose things. 🌈

@pradyunsg
Copy link
Author

pradyunsg commented Dec 13, 2024

Doing a brain dump since I need a break now -- it took me a bit of time to separate a bunch of tooling failures/noise, and actually understand what's broken here...

let interpreter = interpreter.with_virtualenv(virtualenv);

The virtual environment's Interpreter object is created based on the base environments' Interpreter object. This step, notably, does not update the information about sys.path, which keeps pointing to paths from the base environment. A bunch of the failures in tests are happening since the logic ends up using a "stale" sys.path information from the base environment when working with the virtual environment's interpreter.

An easy reproducer for this, with this PR, is using uv tool install <whatever> for a tool that doesn't already have an environment for it. You'll see it fail to find the distributions that were just installed in the virtualenv created.

This used to be fine earlier since the logic didn't care at all about sys.path and the interpreter's prefix was being updated which was effectively "updating" the site-packages path that'd be used for the lookups. Ignoring to update sys.path is not viable, at least, not with the goal of this change of using appropriately using sys.path for package discovery.

Still thinking through what to do here... The "slow" option here would be to recreate the Interpreter object with the interpreter query logic again (caching that as appropriate). However, the only attribute that's incorrect in a manner that matters semantically today is sys.path and all the other attributes could be inferred from the base Python. Plus, this approach is also gonna make nearly all locations where uv works with a virtualenv it just created meaningfully slower by requiring it to wait on the Python interpreter query logic ~every time (it's very unlikely that the user will have the newly created interpreters' information cached).

@pradyunsg
Copy link
Author

Ok, an alternative that I haven't fully thought through the implications of... I think this is more of a reasonable guess of what sys.path will be, but it might just work. Here's the idea:

Instead of querying the interpreter again, use the base_interpreter.sys_path_after_site_import + [virtualenv.site_packages] when system-site-packages is true / use base_interpreter.sys_path_before_site_import + [virtualenv.site_packages] when system-site-packages is false. This, of course, relies on the fact that python lets us disable site on startup & uv has control over the complete lifetime/invocation for its interpreter introspection logic.

/opt/pradyunsg/bin/python3.12 -S -c "import sys; print(sys.path)"                                                
['', '/opt/pradyunsg/lib/python312.zip', '/opt/pradyunsg/lib/python3.12', '/opt/pradyunsg/lib/python3.12/lib-dynload']/opt/pradyunsg/bin/python3.12 -c "import sys; print(sys.path)" 
['', '/opt/pradyunsg/lib/python312.zip', '/opt/pradyunsg/lib/python3.12', '/opt/pradyunsg/lib/python3.12/lib-dynload', '/opt/pradyunsg/lib/python3.12/site-packages']

Let me know if this seems reasonable, and also... I'd be very happy to hear if someone can think of a scenario where this might be broken!

@paveldikov
Copy link
Contributor

paveldikov commented Dec 15, 2024

I'd be very happy to hear if someone can think of a scenario where this might be broken!

Broken might be a bit harsh, but 'incomplete'?

  • this will indeed work for system-site-packages, since the paths all come from the base interpreter anyway
  • but I don't think it will work for the broader pattern of modifying the module search path via sitecustomize.py/.pth files?
    • This is commonly used to implement 'layered'/'stacked' venvs (see comment)

I do note that the former usage is much more common than the latter.

If the 'correct' thing to do is unacceptably slow -- and the guess is good enough for most use cases -- maybe an acceptable compromise is to do the 'good enough' thing by default and provide an option for those who really need that extra correctness?

@pradyunsg
Copy link
Author

pradyunsg commented Dec 16, 2024

sitecustomize.py/.pth files

You won't have these in an environment that was just-created in-process by uv, as I understand it. You could have these in the base environment though, so the "guess" model doesn't work in general for environments with include-system-site-packages = true.

I think that's fine. We could stick with this computational approach for environments with include-system-site-packages = false, since that's likely the 90% case and fall back to doing the subprocess for environments with include-system-site-packages = true.

@charliermarsh
Copy link
Member

I think it's ok to re-query information if we have to. We cache it anyway, and ideally we only do this when system-site-packages = true.

@charliermarsh
Copy link
Member

Do you need any guidance on the code changes, beyond that?

@paveldikov
Copy link
Contributor

paveldikov commented Dec 29, 2024

I think it's ok to re-query information if we have to. We cache it anyway, and ideally we only do this when system-site-packages = true.

A-ha! Maybe an elegant solution would be to tune the cache key for sys.path?

That way the 'expensive'/side-effectful recomputation will be done at most once, and ideally not at all (ideally cache entry populated at end of venv creation)

The cache key could be some sort of serialised struct that encapsulates the myriad things that could influence sys.path:

  • base prefix
  • venv site-packages path
  • boolean value of system-site-packages
  • hashes of any/all sitecustomize.py, usercustomize.py, *.pth files found. (Or, if feeling cheap, just list of filenames found)

@pradyunsg
Copy link
Author

Do you need any guidance on the code changes, beyond that?

Not at the moment. I'll only be able to pick this back up early next year.

This better reflects the intended behavior of this object, which is to
serve as a database of InstalledPackages rather than a representation of
the relevant environment/interpreter path.
When a virtual environment is created, `sys.path` is modified. Instead
of reusing the base interpreter's `sys.path`, query the interpreter
again to get the new `sys.path` values.
@pradyunsg pradyunsg force-pushed the sys-path-for-discovery branch from cdb36f2 to b8d0038 Compare January 13, 2025 15:22
@pradyunsg
Copy link
Author

pradyunsg commented Jan 13, 2025

Whelp, looks like I broke something about regular virtualenv package discovery with the most recent commit. 🙃

🤦🏽 Figured it out.

Make it clear with the variable names that Interpreter struct's members
are not VirtualEnvironment.
@pradyunsg
Copy link
Author

{purelib:"/Users/pgedam/Developer/github/astral-sh/uv/workarea.ignore/.venv/opt/homebrew/Cellar/[email protected]/3.13.0_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages", platlib:"/Users/pgedam/Developer/github/astral-sh/uv/workarea.ignore/.venv/opt/homebrew/Cellar/[email protected]/3.13.0_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages", scripts:"/Users/pgedam/Developer/github/astral-sh/uv/workarea.ignore/.venv/opt/homebrew/Cellar/[email protected]/3.13.0_1/Frameworks/Python.framework/Versions/3.13/bin", data:"/Users/pgedam/Developer/github/astral-sh/uv/workarea.ignore/.venv/opt/homebrew/Cellar/[email protected]/3.13.0_1/Frameworks/Python.framework/Versions/3.13", include:"/Users/pgedam/Developer/github/astral-sh/uv/workarea.ignore/.venv/include/site/python3.13"}

It looks like the virtualenv.scheme's value is incorrect when it gets passed into Interpreter.with_virtualenv. I'm gonna look into what's going on with the path composition for those schemes tomorrow.

@pradyunsg
Copy link
Author

Looks like it's wrong at the point of querying the interpreter.

        virtualenv: Scheme {
            purelib: "opt/homebrew/Cellar/[email protected]/3.13.0_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages",
            platlib: "opt/homebrew/Cellar/[email protected]/3.13.0_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages",
            scripts: "opt/homebrew/Cellar/[email protected]/3.13.0_1/Frameworks/Python.framework/Versions/3.13/bin",
            data: "opt/homebrew/Cellar/[email protected]/3.13.0_1/Frameworks/Python.framework/Versions/3.13",
            include: "include/site/python3.13",
        },

The most recent commit "fixes" that to:

        virtualenv: Scheme {
            purelib: "lib/python3.13/site-packages",
            platlib: "lib/python3.13/site-packages",
            scripts: "bin",
            data: "",
            include: "include/site/python3.13",
        },

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants