Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass an iterator to add_incompatibility_from_dependencies #226

Merged

Conversation

konstin
Copy link
Member

@konstin konstin commented Jun 2, 2024

Previously, add_incompatibility_from_dependencies was hardcoded to take a hash map. By relaxing this to impl IntoIterator<_>, we avoid converting from uv's ordered Vec<(_, _)> to a HashMap<_, _> and avoid cloning.

This intentionally does not change the public api of the DependencyProvider, only the internal interface that we are using in uv.

@konstin konstin added the enhancement New feature or request label Jun 2, 2024
Previously, `Dependencies::Available` was hardcoded to a hash map. By relaxing this to `impl IntoIterator<_> + Clone`, we avoid converting from uv's ordered `Vec<(_, _)>` to a `HashMap<_, _>` and avoid cloning.

## Design considerations

We implement this using the return type `Dependencies<impl IntoIterator<Item = (DP::P, DP::VS)> + Clone, DP::M>`. This allows using `get_dependencies` without knowing the actual (which a generic bound would require) or adding another associated type to the dependency provider. The `impl` bound also captures all input lifetimes, keeping `p` and `v` borrowed. We could bind the iterator to only `&self` instead, but this would only move two clones (package and version) to the implementer.

Co-authored-by: Jacob Finkelman <[email protected]>
Co-authored-by: Zanie <[email protected]>
Co-authored-by: Charlie Marsh <[email protected]>
@konstin konstin force-pushed the konsti/dev/dependencies-are-an-iterator branch from 0fba094 to ab20eb2 Compare June 3, 2024 06:03
Copy link
Member

@Eh2406 Eh2406 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been indecisive on this work for a long time.

Pros:

  • It can be used to reduce some allocations, but only with odd constructs and a lot of attention. I should rerun benchmarks to see how much difference this makes.
  • It allows more accurate modeling of multiple dependency edges on the same package. Which in turn leads to better error messages for this already confusing situation. (I believe this is functionality we NEED to support for UV.)

Cons:

  • Is a significantly more complicated signature. This is a large next step toward DependencyProvider becoming an advanced user only interface. Although that may be an inevitable consequence of having production users.
  • It is not obvious that this is the "right" abstraction for the situation. Would something else be more ergonomic for users? Would something else be more compatible with future improvements iterator?

The worst that can happen is we decide it's the wrong abstraction after releasing 0.3 and have to change it for 0.4. So I'm leaning toward accepting this change. But I like to hear some other people's thoughts and arguments, especially @mpizenberg about the ergonomics.

src/solver.rs Outdated
Ok(match self.dependencies(package, version) {
None => {
Dependencies::Unavailable("its dependencies could not be determined".to_string())
}
Some(dependencies) => Dependencies::Available(dependencies),
Some(dependencies) => Dependencies::Available(dependencies.clone()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The allocation in this clone can be removed by making it Dependencies::Available(dependencies.iter().map(|(p, vs)| (p.clone(), vs.clone()))).

Copy link
Member Author

@konstin konstin Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx i accidentally worsened this because i missed that this isn't impl IntoIterator<_> yet.

return Ok(match dependencies {
Dependencies::Unavailable(reason) => Dependencies::Unavailable(reason),
Dependencies::Available(available) => Dependencies::Available(
available.into_iter().collect::<Vec<(Self::P, Self::VS)>>(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The allocations in these collect calls make me sad. This cache hit case is exactly where I was hoping to avoid allocations by accepting an iterator. My most recent attempt got a little closer to something elegant, but with its own set of trade-offs.

Fundamentally were hitting https://smallcultfollowing.com/babysteps/blog/2018/09/02/rust-pattern-iterating-an-over-a-rc-vec-t/ I think the unstable gen blocks may be an ergonomic fix for this, but I have not succeeded at trying it in my own projects.

@konstin
Copy link
Member Author

konstin commented Jun 3, 2024

Is a significantly more complicated signature. This is a large next step toward DependencyProvider becoming an advanced user only interface. Although that may be an inevitable consequence of having production users.

We can also keep the DependencyProvider more constraint, we only call add_incompatibility_from_dependencies.

If it's relevant, i can do some benchmarking with this changed and with it rolled back in uv.

It is not obvious that this is the "right" abstraction for the situation. Would something else be more ergonomic for users? Would something else be more compatible with future improvements iterator?

I tried to abstract it into a trait and other tricks, but (my) rust is too limited to figure out something better.

@konstin
Copy link
Member Author

konstin commented Jun 3, 2024

I've reverted the biggest change to the dependency provider interface, we can also merge Dependencies<DependencyConstraints<String, SemVS>, Self::M> into a single type again when we're aligned on this PR in general.

@Eh2406
Copy link
Member

Eh2406 commented Jun 3, 2024

It is not obvious that this is the "right" abstraction for the situation. Would something else be more ergonomic for users? Would something else be more compatible with future improvements iterator?

I tried to abstract it into a trait and other tricks, but (my) rust is too limited to figure out something better.

Having ask you to make this PR and then had second thoughts while reviewing it with the second thoughts being abstract "does there exist" questions I owed you my full time and attention today. I went to a lot of harebrained ideas, most of which obviously wouldn't work without even looking at an editor. Eventually I had the idea of a DependencyAdder which was a wrapper around &mut state for the basic operations we want to stabilize. (Add one dependency, and iterator of dependencies, Mark this version is unavailable. That's all for now but we could add more later.) This didn't work because of type shenanigans. Then I thought about generalizing it so that it invoked a user provided callbacks for each stabilized method. Halfway through implement it, I realized I was hand rolling a vtable. So I tried making a trait. The result is dev...Eh2406:pubgrub:add_deps_take_2

What are people's thoughts? Should I open a PR for us to discuss that proposal?

@mpizenberg
Copy link
Member

struct DependencyAdder<'c, DP: DependencyProvider> {
    state: &'c mut State<DP>,
    package: &'c DP::P,
    version: &'c DP::V,
    err: Option<PubGrubError<DP>>,
}

Isn’t the fact that this struct holds a mut ref to the state going to be super annoying for implementations with async concurrency? Or is this structure always only short lived? (Sorry if my question doesn’t make sense)

Waiting on Konstin feedback on this.

Otherwise, I don’t remember exactly why we did not use impl IntoIterator. I’m pretty sure we tried. Was it because of the simplicity tradeoff? or because we did it and benchmarked no perf improvement? or maybe because it was not well supported by the compiler at the time (we had a few decisions related to compiler support I recall)?

@konstin
Copy link
Member Author

konstin commented Jun 4, 2024

The &mut data passing makes DependencyProvider a lot more complex. I do like the add_incompatibility_from_dependency a lot though, i have to check if that may be all we need, though i probably won't get to it today.

@Eh2406
Copy link
Member

Eh2406 commented Jun 4, 2024

Isn’t the fact that this struct holds a mut ref to the state going to be super annoying for implementations with async concurrency? Or is this structure always only short lived?

It is short lived. So I don't expect trouble. But I don't maintain a async api, so not 100%. If it makes problems for UV, they can use Mutex<&mut State> as they have their own version of resolve already.

Otherwise, I don’t remember exactly why we did not use impl IntoIterator. ... maybe because it was not well supported by the compiler at the time (we had a few decisions related to compiler support I recall)?

impl trait in trait only recently stabilized. So that is why it did not work when we last tried it. If you like the ergonomics of that API, we should merge this PR as originally proposed.

@mpizenberg
Copy link
Member

I think the lib is already too complex to be used without looking at docs. If we want people making useful things with pubgrub they should implement quite a lot themselve already so I don’t think this change would add too much barrier to entry.

@Eh2406
Copy link
Member

Eh2406 commented Jun 4, 2024

I'm having a lot of thoughts, but I'm not sure what they are. I'm going to start writing in the hope that I can figure it out by the time I get to the end. If this ends up being incoherent I'm sorry.

There bunch of things I want the get_dependencies API to have. But the world is not perfect and were going to have to compromise on some of them.

  1. "simplicity", this is hard to measure is something like how much esoteric features of rust do you need to understand to use the API for basic functionality. But on the other hand it's also how well does it use the functionality of the language to make normal-looking code.
  2. "compatibility", it would be nice if code that existed on previous version still worked on the new API.
  3. "minimal cloning", it should not require constructing a data structure (or cloning an existing one) to work with.
  4. "duplicate", it should be possible to have two dependency edges on the same package. ( This is required because it actually exists in Python and Rust, and the error messages are significantly better if we express this duality directly to PubGrub instead of calling intersection beforehand.)
  5. "future-proof", there are many kinds of edges we have recently added or might want to add in the future and they should not require breaking changes to the API. Examples include: "unavailable with metadata" (already merged), "dependencies with metadata", "constraints", "constraints with metadata", "week dependencies". let's not argue about whether any of these are good ideas in this thread, but the list is long enough that at some point we will be convinced to add at least some of these.
  6. "unrepresentability", it only makes limited sense to return a list of dependencies AND that a package is unavailable. Our API should not let you represent nonsensical things.

So let's start by evaluate the existing API: ) -> Result<Dependencies<DP::P, DP::VS, DP::M>, DP::Err>.

  1. "simplicity": Definitely the simplest option, but already has a lot going on. Especially if you elaborate all the type aliases.
  2. "compatibility": definitionally.
  3. "minimal cloning": No. You need to return an owned map.
  4. "duplicate": No. It is a map.
  5. "future-proof": Not really. We can add new variance to Dependencies (if we market nonexhaustive), but that's a really awkward way to add a new kind of edge. The new syntax would need to duplicate all the existing functionality, and add more functionality, and then we need to document why you would use one or the other.
  6. "unrepresentability": Yes.

Next let's look at the original proposal from this PR. ) -> Result<Dependencies<impl IntoIterator<Item = (Self::P, Self::VS)> + Clone, Self::M>, Self::Err,>.

  1. "simplicity": That is very long to read, and it uses impl Trait, and Item = (I don't even remember the name of that syntax).
  2. "compatibility": Yes. A Map is a impl IntoIterator<Item = (Self::P, Self::VS)> + Clone
  3. "minimal cloning": If you have a ordinary & then yes. (.iter().map...) If you have a Rc or RefCell then no.
  4. "duplicate": Yes.
  5. "future-proof": Not really. Basically the same situation as we had before.
  6. "unrepresentability": Yes.

What if we add a new Enum for "future-proof". Basically combining this PR with #124. Something like ) -> Result<Dependencies<impl IntoIterator<Item = Requirement<Self::P, Self::VS>> + Clone, Self::M>, Self::Err,> where Requirement is Required | Constrained.

  1. "simplicity": All the same problems as before but even longer.
  2. "compatibility": No.
  3. "minimal cloning": Same as before. If you have a ordinary & then yes. (.iter().map...) If you have a Rc or RefCell then no.
  4. "duplicate": Yes.
  5. "future-proof": Yes, we can add new kinds of edges to Requirement without breaking things. As long as what we are adding are "edges" were good.
  6. "unrepresentability": Yes.

Having two different Enum is really stretching our simplicity budget. Perhaps we can combine them? Something like ) -> Result<impl IntoIterator<Item = Requirement<Self::P, Self::VS, Self::M>> + Clone, Self::Err,> where Requirement is Required | Constrained | Unavailable.

  1. "simplicity": Not great, but slightly better than before.
  2. "compatibility": No.
  3. "minimal cloning": Same as before. If you have a ordinary & then yes. (.iter().map...) If you have a Rc or RefCell then no.
  4. "duplicate": Yes.
  5. "future-proof": Yes, we can add new kinds of edges to Requirement without breaking things. As long as what we are adding are "edges" were good.
  6. "unrepresentability": No. This is sad but I don't know how important it is in reality.

A note on the + Clone, it's only really required for checking for self dependencies. We may be able to avoid it with .inspect or some such. But I don't think it changes any of the fundamental argument.

Now let's evaluate my harebrained idea from yesterday, add_deps_take_2. dependency_adder: &mut DependencyVisitor<Self::P, Self::VS, Self::M>>) -> Result<(), Self::Err>

  1. "simplicity": On one hand this is a lot shorter and does not require anything fancy. On the other hand the "mutable argument you need to interact with in this method call and then return ()" is not a normal Rust pattern.
  2. "compatibility": Not even close.
  3. "minimal cloning": Yes. this should be zero overhead to interact with no matter what your data pattern is. (Except for possibly asink.)
  4. "duplicate": Yes.
  5. "future-proof": Yes. We can easily add arbitrary methods to DependencyVisitor without breaking anyone.
  6. "unrepresentability": No.

So I've gotten to the end, and I still don't know what I was trying to say. It seems to have turned into a brainstorming about #148. One take away is that eventually we will probably grow a SimplifiedDependencyProvider with an api like 0.2 that can be used if the full complexity of DependencyProvider is not needed.
In the time I spent writing this Matthieu commented that the additional API complexity is not a critical blocker. So we should probably merge this PR with its original configuration. But we should do so recognizing that there are probably other changes that will be needed. This is not a "one-way door", it is totally okay to merge this and then make more changes.

@mpizenberg
Copy link
Member

One take away is that eventually we will probably grow a SimplifiedDependencyProvider with an api like 0.2

I’d think the same, so as long as it’s possible to rewrite a v0.2-like api from v0.3 I don’t think your point (2) will be an issue.

What if we add a new Enum for "future-proof". Basically combining this PR with #124. Something like ) -> Result<Dependencies<impl IntoIterator<Item = Requirement<Self::P, Self::VS>> + Clone, Self::M>, Self::Err,> where Requirement is Required | Constrained.

I remember we discussed a lot about this. My point of view is that we should base this type of changes on the theory (incompatibilities). I think this type of change, enabling dependencies to provide more fine-grain type of incompatibilities, instead of just the shape {a, not b} should be approached holistically, not iteratively with enums variants appended one after the other.

But also, I’m also not worried about compatibility, because just like it’s possible to make v0.2-like wrappers when v0.3 is released, it can be exactly the same for v0.4 because this is a strict superset of capabilities.

@Eh2406
Copy link
Member

Eh2406 commented Jun 5, 2024

I think this type of change, enabling dependencies to provide more fine-grain type of incompatibilities, instead of just the shape {a, not b} should be approached holistically, not iteratively with enums variants appended one after the other.

I don't quite follow. Trying to figure out what you had in mind. If we want to provide direct access to the solver, that will probably involve public access to "state" which is probably worth doing but its own headache. Even if we did that, we would still want DependencyProvider to be quite flexible. So I don't think it helps me answer the question what API DependencyProvider should offer. Taking your comment in different direction, any approach that succeeds at "future-proof" can always add a direct "raw incompatibility" version (either in a non-variant "Raw" or a .add_raw_incompatibility() depending). Which seems like even more reason to figure out the future proofing story. Even if we provided such a raw API, I would want to have some amount of API that makes it clear what is "a pattern we know works" and what is "feel free to try it and see what happens". Perhaps by having named variance or named callbacks or by having the constructors of incompatibilities for the known patterns. So those are some thoughts I'm having. What did you have in mind by "approached holistically"?

@mpizenberg
Copy link
Member

What I mean by "holistically" is that all possibilities should be evaluated and added to the api at the same time. There is a finite number of incompatibility types that can be considered. {a}, {-a}, {a, b}, {a, -b} etc. For all the incompatibilities with 1 element and two elements it’s important to give appropriate names. And for incompatibilities with more than two, I imagine it we can customize some patterns, like "only one positive" or "only one negative", otherwise fallback to asking people to enter raw incompatibilities.

What I’d like to avoid is saying ok now we enable {a, b} and {a, -b}, let’s see in the future if we need something else. Instead, it would make sense to just verify that with the past two years of optimizations we didn’t break a fundamental property that would prevent pubgrub from working with any incompatibility provided. And if ok, add these apis in a cohesive fashion.

Let me know if I’ve expressed my opinion better this time.

@konstin konstin changed the title Pass dependencies as iterator instead of a hashmap Pass an iterator to add_incompatibility_from_dependencies to avoid cloning Jun 5, 2024
@Eh2406
Copy link
Member

Eh2406 commented Jun 5, 2024

That makes a lot of sense. Let's get a full list of reasonable possibilities and name/document what they might be useful for before adding flexible APIs.

In office hours @konstin convinced me that avoiding the hash map construction is only an important performance optimization when the index data has been prefetched. If the data actually needs to be loaded from somewhere (whether that's network or disk) reading the data will be so much more expensive then constructing the map to make the optimization irrelevant. Ironically, the only place the optimization is visible is while doing benchmarking.

UV also rools its own resolve loop, and does not use dependency provider. so we agreed that the best next step was to change the API for State and leave the API the same for dependency provider. we will discuss changes to dependency provider in follow-up PR's and after incorporating your holistic approach.

And while I was writing that comment it looks like the new revision is in.

@Eh2406 Eh2406 changed the title Pass an iterator to add_incompatibility_from_dependencies to avoid cloning Pass an iterator to add_incompatibility_from_dependencies Jun 5, 2024
Copy link
Member

@Eh2406 Eh2406 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently a very small internal change, but it helps UV.

@Eh2406 Eh2406 enabled auto-merge June 5, 2024 18:05
@konstin konstin force-pushed the konsti/dev/dependencies-are-an-iterator branch from 55a73e4 to 08549f3 Compare June 5, 2024 18:06
@Eh2406 Eh2406 added this pull request to the merge queue Jun 5, 2024
Merged via the queue into pubgrub-rs:dev with commit 749abfa Jun 5, 2024
4 checks passed
@konstin konstin deleted the konsti/dev/dependencies-are-an-iterator branch June 5, 2024 18:09
konstin added a commit to astral-sh/uv that referenced this pull request Jun 5, 2024
We had previously changed the signature of `DependencyProvider::get_dependencies` to return an iterator instead of a hashmap to avoid the conversion cost from our dependencies `Vec` to the pubgrub's hashmap. These changes are difficult to make in pubgrub since they complicate the public api. But we don't actually use `DependencyProvider::get_dependencies`, so we rolled those customizations back in pubgrub-rs/pubgrub#226 and instead opted to change only the internal `add_incompatibility_from_dependencies` method that we exposed in our fork. This aligns us closer with upstream, removes the design questions about `DependencyProvider` from our concerns and reduces our diff (not counting the github action) to +36 -12.
konstin added a commit to astral-sh/uv that referenced this pull request Jun 5, 2024
We had previously changed the signature of
`DependencyProvider::get_dependencies` to return an iterator instead of
a hashmap to avoid the conversion cost from our dependencies `Vec` to
the pubgrub's hashmap. These changes are difficult to make in pubgrub
since they complicate the public api. But we don't actually use
`DependencyProvider::get_dependencies`, so we rolled those
customizations back in pubgrub-rs/pubgrub#226
and instead opted to change only the internal
`add_incompatibility_from_dependencies` method that we exposed in our
fork. This aligns us closer with upstream, removes the design questions
about `DependencyProvider` from our concerns and reduces our diff (not
counting the github action) to +36 -12.
@konstin
Copy link
Member Author

konstin commented Jun 5, 2024

To summarize, in the initial version of this PR we were changing DependencyProvider::get_dependencies from returning a hashmap to return an impl IntoIterator<_> to avoid the conversion cost from our dependencies Vec to the pubgrub's hashmap. But in uv, we're not actually using DependencyProvider, we're driving our own solver loop and call add_incompatibility_from_dependencies directly. In the merged version, we only change this internal api to take an iterator instead of a hashmap.

Personally, i don't think the current signature of DependencyProvider::get_dependencies is fine. I'd prioritize releasing 0.3 and the new community engagement that comes with it. While the hashmap clone shows up in benchmarks, in practice you have to be far gone into microoptimizations for this to matter, i'm not even sure if it's noticeable in uv (and if you're that complex and that far, you want to run the solver loop yourself anyway, which is an entirely different interface). It would help a lot to get the changes we already made (associated type, better error handling, both the error type and the custom metadata, the massive speedups, etc.) out so we can get users reporting on how well the apis works for them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants