Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always prefer causes when backtracking #132

Closed

Conversation

notatallshaw
Copy link
Collaborator

@notatallshaw notatallshaw commented Jun 3, 2023

As per #131 this moves the responsibility from the downstream library to prefer backtracking causes into resolvelib itself solving the issue of an O(n2) performance problem that can happen.

It provides additional performance benefits under those circumstances because get_preference will be called a lot less.

After working on this PR I decided not to have this as optional, it keeps the resolver and tests much simpler and while it may be possible to construct a dependency graph where this approach is slower we have no such reports come from Pip and as per this comment in real world benchmarking there is over a 2x improvement in resolution speed to preferring backtracking, this makes intuitive sense as any successful solution must not have conflicts so resolving them first will generally be faster.

But let me know if you would prefer this approach to be optional, I can do so.

I have been testing these change in Pip on this branch: pypa/pip@main...notatallshaw:pip:always-prefer-causes

@notatallshaw notatallshaw force-pushed the prefer-causes-always branch from d3d7b9b to cf55c75 Compare June 3, 2023 22:01
self._r.backtracking_on(
backtrack_cause_names, unsatisfied_causes_names
)

# Choose the most preferred unpinned criterion to try.
name = min(unsatisfied_names, key=self._get_preference)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could put a check here to see if the length of unsatisfied names is 1 and skip calling self._get_preference altogether under this scenario?

information,
backtrack_causes,
):
def get_preference(self, identifier, resolutions, candidates, information):
Copy link
Member

@uranusjr uranusjr Aug 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We’re at version 1 now so it’d be better if we could implement this in a backward-compatible way. Maybe we can still pass the backtrack causes anyway? We could potentially mark it as deprecated and pending removal in 2.0.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it was just to simplify the interface and make sure downstream libraries aren't duplicating this logic, but I don't have any strong opinions on whether to keep providing it or not.

@uranusjr
Copy link
Member

Logic looks good, one question about interface change. Would it be possible to have a test for this?

@notatallshaw
Copy link
Collaborator Author

Logic looks good, one question about interface change. Would it be possible to have a test for this?

I will look at the existing tests and see if I can sufficiently adapt one.

@notatallshaw
Copy link
Collaborator Author

notatallshaw commented Nov 12, 2023

I'm closing this, this optimization is relatively niche in real world cases (only one reported case of significant slowdown and never been able to reproduce).

But I do have a bigger optimization that will extend this strategey to "always prefer conflicting causes, then causes", that optimization will have a stronger justification with lots of real world examples I can show it speeds up. I will make a new PR for that one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants