-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resolver preformance regression in 21.2 #10201
Comments
This is not a bug. Please kindly read this section of the documentation: https://pip.pypa.io/en/stable/user_guide/#dependency-resolution-backtracking To propose improvements to the dependency resolution algorithm, please see #9187. |
Reopening since I missed the part you said 21.1 worked reasonably well. I'll investigate whether we can bring back the 21.1 performance for you. |
Looks like a state pollution issue! |
Thanks @uranusjr. Checking the logs, it wasn't backtracking at all in 21.1 - it immediately accepted the latest dependency version (as it should). Hopefully the repo and CI logs help in narrowing this down, but let me know if there's anything else I can do. |
I think it’s simply trying to find a boto3 version that specifies a botocore version that is compatible with another dependency. The situation is probablyt something like this
Step 3-ii. and 3-iii. can be optimised with tree pruning, but that still leaves 3-i. To have a useful optimisation, we need to find what exactly is causing |
If you look closely, 21.1 still backtracks, but on different packages (specifically |
Yeah, I should have been more specific - I meant that 2.1 is not backtracking on botocore/boto3, so I'm surprised it needs to backtrack now with 21.2 - considering the botocore/boto3 versions are the same
It always is 🙂 |
One of the difficulties here is that botocore has so many releases (1224 in my extract, but it's probably gone up since then). with the current limitations on the packaging infrastructure (getting metadata needs a download) that's always going to be an issue. Less than 0.8% of packages on PyPI have more than 100 releases, so in principle it's entirely reasonable for the trade-offs that pip makes to favour projects with fewer releases. Unfortunately, projects like botocore are too popular to be considered "outliers" in a general sense. I wonder if we could prioritise projects with fewer releases over those with more? Or is that something we've already tried? On the other hand, given that it is only a few projects, maybe we should be looking at asking those projects to meet us half way? Maybe botocore/boto3 could yank older, unsupported versions? The support page isn't very clear on precisely which minor versions are supported for how long, but I can't imagine Amazon support all 1224 versions of botocore... |
Didn't we try to resolve projects with fewer releases before projects with lots of releases? Or am I remembering it the other way round? |
We only resolve packages with "tighter constraints" first, but don't actually count the number of releases. I think the main reason is there's currently no mechanism to do that; the information is of course available on the index page, but we don't have a way to bubble it up past PackageFinder. |
So while true most packages only have a few release, unfortunately looking at the top 10 packages on PyPi 50% have over 100 releases: botocore, awscli, boto3, requests, setuptools. In the past (not sure if it's still true) setuptools has been identified in the resolver as a special case, I think it was to deprioritize checking for older versions? IMO I think it makes sense to add the other 4 packages listed here to that special case, but at the least add botocore, awcli, and boto3 as it looks like all 3 packages release daily. What makes the pip resolver interesting about it's dependency resolution vs. other dependency resolver like conda is that exploring the dependency graph is a non-trivial cost. And it makes intuitive sense that packages with very frequent release (e.g. daily) will not change their dependency requirements every release and therefore trying to backtrack through those packages is going to be highly costly (this is my intuition and unsubstantiated by any data, I'll have a think if there's anyway I could empirically collect this information). But as @uranusjr points out this information currently isn't propagated to the pip resolver. So maybe it makes sense to special case very popular very commonly released packages? My 2 cents. |
I put together another commit that focuses only on Hypothesis 2, i.e. it tries to resolve the current conflicts before anything else. It prefers the causes and parents of causes of the last known conflicts. In my testing it fixes OPs issue and other test cases I have to hand. The hypothesis here is that "If package A version n depends on package B then package A version n-1 probably also depends on package B". Therefore "If package A and C both depend on package B but on different versions of package B the resolution will probably be faster it it focuses on packages A and C and not on some package X which does not depend on package B". Feel free to try here, I make no promises it will improve your particular case: Code diff: notatallshaw/pip@21.2.3...notatallshaw:third_attempt_at_prefer_non_conflicts Compared to before it doesn't try to hack the resolution steps of Pip, so it's worst case is the same as the current worst case. I think this approach is now simple enough that it's worth me trying to work a real PR rather than just PoC commits. FYI it also includes a fix for #10415 which I think may also help some use cases. |
Fourth reproducible test case:
Further the below test case has no solution (changed
|
Second ResolutionImpossible test case, taken from #9187, I don't think anyone has run the current pip resolver long enough to for this to reach ResolutionImpossible. But on my latest patched version of pip it gets to ResolutionImpossible in a few minutes. Again I think because it prefers the failure causes it narrows down the possible search space much more efficiently, at least for real world dependency trees (it's definitely possible to construct an artificial dependency tree where this approach is much slower). Linux only, must install python package
|
Thanks for continuing to look into this @notatallshaw - much appreciated. |
I tried this known extremely problematic test case:
While I believe my patched version of Pip will solve this many orders of magnitude faster than the current resolver on my computer it still didn't resolve after an hour. The problem is one of the failing causes is Mutating the state like this would be very complicated and require a lot of code / test cases, where as my current proposed PoC is just a few extra lines and probably just a few extra test cases. I think therefore while I've shown many real world test cases to be dramatically improved in performance by the current PoC, the limit is where failure causes are already pinned and changing the preference order won't cause them to be backtracked any time soon. That case would need to be a different more carefully thought out optimization.
Thanks for testing, really appreciate it! |
Side comment. I really like how |
Fifth real world test case (taken from: #9187 (comment) ) where my PoC installs the requirements in a few minutes but the current resolver gets stuck taking an unknown amount of time to install: (Linux only, may require installing some OS libs)
|
First of all, the work you're doing is really appreciated, so please don't take this comment the wrong way. But have you collected any evidence as to whether your approach makes any cases that are currently fast, slower? Obviously it's almost impossible to know how to check that - maybe it would be worth looking at some of the issues that were slow before 21.2, which the changes in 21.2 improved? Confirming that we don't make them worse again would be a helpful data point. Actually, what would be perfect would be if we could add performance tests to the test suite, so we could capture some of these bad cases and ensure we don't regress. But I haven't the foggiest idea how we could do this 🙁 |
Instead of measuring performance in terms of say duration, how about doing so in terms of expected number of backtracks/...? This would presumably be less flaky than traditional performance tests. |
Not taken the wrong way, I would really like to test this and find examples. But a couple of points: 1) If the resolver never backtracks there will be no difference, 2) As my latest version only changes the preference order then the worst possible times are the same. In theory, I think, the worst possible case for my optimization (and therefore presumably slower than the current resolver) would be: Package A version n depends on Package B and Package C, but they are both 100% incompatible , Package A version n-1 depends on 2 different packages Package D and Package E, there are also incompatible, etc. until eventually package A version 1 installs. Packages B, C, D, E, ... have similar dependency incompatibilities to Package A. But I don't think this represents a real world dependency tree, the assumption of the optimization is if package A version n depends on package B, then package A version n-1 probably also dependends on package B. Where this doesn't hold the optimization will be slower, but I think changing dependencies usually only happens once in a while and when the dependencies do change the user presumably changes them to a valid installable set of dependencies, in which case my optimization should still be faster.
I have been backtracking through open and closed issues that look performance based and trying to find any real world reproducible examples, there are shockingly few, most users just write something to the effect of "takes too long" and don't provide an actual reproducible example. If you have specific cases that improved from 21.1 to 21.2 I would love to test them. But I would like to point out my optimization is only adding 1 extra parameter to the pinning preference, most of the improvements that came with 21.2 should also be included in my optimization (in fact even better I fixed a bug in one of the improvements: #10415 )
An idea I had was trying to build a tool that takes a real world test case and turns it in to an offline repeatable test case. I imagine this would be pretty tricky but hopefully possible. But as of right now I can not get pip's test suite to even start to run on my machine, I spent an hour trying yesterday and tox/pytest just keep throwing me cryptic errors. I'll give it another shot at the weekend when I have more time to work on it.
@edmorley I think the issue is having any kinda of benchmark suite right now, yes the performance should probably be measured in iterations not time. |
I have summarized my proposal to assist with dependency trees like this and linked the PRs here: #10479 If anyone has any ideas of how to improve or where I need to make a stronger argument let me know. |
Dropping this from the release milestone, since there's no clear actionable items listed here -- and #10481 is already a part of the milestone independently. |
@notatallshaw Are all (applicable) reports in this issue in your test cases already? If that's the case, I feel we can just close this one and track those cases on their own. |
Yes, of the ones I could reproduce. Some I simply could never reproduce and some have had their dependencies fixed (In those cases I've done my best to keep the test cases working by adding {package} <= {last known bad version} to my test requirement files). I agree on closing, and once #10481 lands I think all pip issues related to heavy backtracking should be closed (e.g. #10373 #9187 #9215), as any new reports should be tested against that and if they experiencing heavy backtracking still it will take further analysis and the reasoning will not be the same. |
The scientific Python world has a tool airspeed velocity which runs the same benchmarks against different commits in a project's history, trying to identify performance regressions. It's more usually used for CPU-bound number crunching (I'm guessing this is network bound?), but when you're looking at orders of magnitude changes from different recursive algorithms, it's probably easy to distinguish signal from noise. And of course you could take the machinery for walking a project's history, running benchmarks, storing & viewing results, and change what it actually measures when running benchmarks. |
See #9187 (comment), if you're here to report the behaviour of pip's dependency resolver. |
Description
As of 21.2, the resolver decides it needs to download every possible version,
to determine which version is compatible with other requirements.
, for multiple dependencies.Note that this works fine for some dependencies (coverage, pytest_cov), as it only downloads a few.
For dependencies
botocore
andboto3
, however, it downloads every possible version.A minimal reproducible example can be found here:
https://github.com/bblommers/pipresolverbug
Not sure what the offending dependency/installation method is - I could only reproduce it for this specific workflow.
The CI for this project runs this specific example for both 21.1 and 21.2.
Version 21.1 completes in 15 seconds - 21.2 is still running after 20 minutes.
The CI logs can be found here:
https://github.com/bblommers/pipresolverbug/actions/runs/1064350454
Expected behavior
To only download the latest version of a dependency
pip version
21.2
Python version
3.7
OS
Linux (Ubuntu)
How to Reproduce
Output
Code of Conduct
The text was updated successfully, but these errors were encountered: