Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial working example of lending iterator for combinations. #682

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

Easyoakland
Copy link
Contributor

I was working on another project and realized that 30% of the runtime was the itertools .combinations() adapter allocating vectors on the heap.

Added an implementation for a lending iterator that is able to avoid heap allocations with a lending iterator and completely eliminated the heap allocation speed penalties. It called the adapter combinations_lending. I also added an optional feature that can be enabled to utilize this and maybe in the future other lending iterators.

Added benchmarks comparing .combinations() and .combinations_lending(). They show the lending implementation is ~1x-7x speed depending on use case. I did not measure a noticeable speed penalty in the benchmarks so a future change could theoretically implement the normal .combinations() iterator in terms of the lending iterator's .into_iter() method.

Feature allows for conditional compilation so those who don't want a lending iterator don't need to compile the related dependencies.

More info in the source code documentation.

Let me know what you think!

Added benchmarks comparing `combinations` and `combinations_lending`.
They show the lending implementation is ~1x-7x speed.

Feature works either on or off for conditional compilation.
@Easyoakland Easyoakland force-pushed the initial_lending_iterator_support branch from f8b22e7 to 0eefe16 Compare March 10, 2023 04:08
Copy link
Member

@phimuemue phimuemue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi there, thanks for the effort you put into this.

Here's my two cents (personal - so no final word):

  • I saw that the PR contains quite some formatting/documentation changes. You can surely include them, but it will simplify the final reviewer's life when they're in a separate commit.
  • I am always wary when it comes to including another crate - even if it is behind a feature flag, it is code to properly maintain and test.
  • If you are looking for maximum performance, I am not sure whether the "manipulate-indices and collect indexed items"-approach is the best. It may be, but it may also be faster to manipulate items (or references to items) instead. (E.g. if you really just want combinations from the Vec containing 0,1,2,3 you could simply work on the numbers directly.) Did you happen to research that? Does anybody reading this have relevant experience?

As per the above, merging this PR might be a good idea, but before that we should really find out what we want to offer. Right now the raison d'être of combinations seems to be convenience (at the price of some performance penalty). If we want to have its efficient counterpart, I would strive to make it as efficient as possible (possibly even sacrificing some convenience). I would vote against having something in between.

}
}

#[gat]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have never seen this attribute. Can you shed light on what it does?

Copy link
Contributor Author

@Easyoakland Easyoakland Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It enables usage of generic associated types in stable rust.
It is defined here in nougat and used here in lending_iterator. I believe it has to do with higher kinded types
It is what makes it possible to depend on the 'next lifetime in the following code. (I don't know how to link the location so here is a copy-paste)

     I: Iterator,
        I::Item: Clone,
    {
        type Item<'next>
        where
            Self: 'next,
        = Combination<'next, I>;

Comment on lines 185 to 224
fn next(&mut self) -> Option<Combination<I>> {
if self.first {
if self.k() > self.n() {
return None;
}
self.first = false;
} else if self.indices.is_empty() {
return None;
} else {
// Scan from the end, looking for an index to increment
let mut i: usize = self.indices.len() - 1;

// Check if we need to consume more from the iterator
if self.indices[i] == self.pool.len() - 1 {
self.pool.get_next(); // may change pool size
}

while self.indices[i] == i + self.pool.len() - self.indices.len() {
if i > 0 {
i -= 1;
} else {
// Reached the last combination
return None;
}
}

// Increment index, and reset the ones to its right
self.indices[i] += 1;
for j in i + 1..self.indices.len() {
self.indices[j] = self.indices[j - 1] + 1;
}
}

// Create result vector based on the indices
// let out: () = Some(self.indices.iter().map(|i| self.pool[*i].clone()));
Some(Combination {
combinations: &*self,
index: 0,
})
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this logic is non-trivial, it should possibly be shared with the existing combinations iterator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 7839cd3

@Easyoakland
Copy link
Contributor Author

I saw that the PR contains quite some formatting/documentation changes. You can surely include them, but it will simplify the final reviewer's life when they're in a separate commit.

I'm not sure I understand this point. Are you saying I should have separated out the comments and documentation additions to the new code from the new code being commented on?

I am always wary when it comes to including another crate - even if it is behind a feature flag, it is code to properly maintain and test.

That is the why there are automated tests, no? If in a worst case situation all the code relying on the dependency broke the remaining features would still work. It seems like there is not cost beyond the bug risk inherent in any code written.

If you are looking for maximum performance, I am not sure whether the "manipulate-indices and collect indexed items"-approach is the best.

I also do not know if the implementation is the best. I did not change the logic of how combinations are computed. Though it seems worth noting that the indexed items are not being collected as a group in the LendingIterator implementation. That's the point of the new lending implementation. An iterator is returned which returns individual copies instead of a collected Vec.

It may be, but it may also be faster to manipulate items (or references to items) instead. (E.g. if you really just want combinations from the Vec containing 0,1,2,3 you could simply work on the numbers directly.) Did you happen to research that? Does anybody reading this have relevant experience?

My thoughts on this are:

  • The combinations adapter appears generic over items that it is iterating with the only restriction being that the items implement Clone. The vec![0,1,2,3] situation of iterating over items directly would only work if this was limited to Copy items and I'm not sure it would be faster.
  • Is the current implementation of manipulating indices (usize numbers) really different from manipulating references somehow? How would references be manipulated? Aren't references just pointers which are themselves usize?
  • I don't fully understand LazyBuffer so I don't know if this could be changed to contain references instead of actual items. I also don't know if you would want this extra layer of indirection.
  • If the items of a collection are too unwieldy to clone directly (or don't implement clone at all) one can do collection.iter().combinations_lending(k) to make the items of iteration references, and cloning references is as cheap as it gets. If the collection is of elements that can be easily cloned then collection.into_iter().combinations_lending(k) can be used instead to avoid an unnecessary level of indirection. Best of both worlds.

The below code was a check I used to confirm that .iter() will result in clones of references. Notice how the function that requires a cloneable item receives a reference to a non-clonable struct and compiles fine.

fn need_cloneable<T: Clone>(y: T) -> T {
    y.clone()
}

#[derive(Debug)]
struct NotCloneableStruct(usize, Vec<usize>);

fn main() {
    let c = vec![
        NotCloneableStruct(1, vec![1, 1]),
        NotCloneableStruct(2, vec![2, 2]),
        NotCloneableStruct(3, vec![3, 3]),
        NotCloneableStruct(4, vec![4, 4]),
    ];

    c.iter()
        .map(|x: &NotCloneableStruct| {
            dbg!(&x);
            dbg!(x.clone());
            dbg!(x);
            dbg!(need_cloneable(x)); // All references implement Clone because of core's impl<T> Clone for &T.
        })
        .for_each(drop);
}

As per the above, merging this PR might be a good idea, but before that we should really find out what we want to offer. Right now the raison d'être of combinations seems to be convenience (at the price of some performance penalty). If we want to have its efficient counterpart, I would strive to make it as efficient as possible (possibly even sacrificing some convenience). I would vote against having something in between.

If I saw a way to further sacrifice convenience for performance I would have done so. I don't see how this would be done.

Combinations lending and non-lending use same function now.
@phimuemue
Copy link
Member

I'm not sure I understand this point. Are you saying I should have separated out the comments and documentation additions to the new code from the new code being commented on?

No, I'm talking about things as https://github.com/rust-itertools/itertools/pull/682/files#diff-01a3ceaed8205eb8f50a269c77b7888890da04e982c9c793943743a1ee4471c3L18-R27 or https://github.com/rust-itertools/itertools/pull/682/files#diff-01a3ceaed8205eb8f50a269c77b7888890da04e982c9c793943743a1ee4471c3L48-R62 or https://github.com/rust-itertools/itertools/pull/682/files#diff-01a3ceaed8205eb8f50a269c77b7888890da04e982c9c793943743a1ee4471c3L53-R108.

I also do not know if the implementation is the best. I did not change the logic of how combinations are computed. Though it seems worth noting that the indexed items are not being collected as a group in the LendingIterator implementation. That's the point of the new lending implementation. An iterator is returned which returns individual copies instead of a collected Vec.

I saw that you do not collect into a Vec - I like that.

  • Is the current implementation of manipulating indices (usize numbers) really different from manipulating references somehow? How would references be manipulated? Aren't references just pointers which are themselves usize?

With indices you'll have to look up the item from the buffer. That is, you essentially have to add the index to the begin-pointer of the buffer. Handling references directly could skip this addition, thus possibly save one indirection. (As always, compilers may be clever enough to optimize some things. And - on top of that - I did not benchmark; it just came to my mind.)

If I saw a way to further sacrifice convenience for performance I would have done so. I don't see how this would be done.

I could imagine something like fn combinations_inplace(items: &mut [Item]) -> CombinationsInplace where CombinationsInplace works directly on the slice items - without using indices. I.e. CombinationsInplace would repeatedly call the combinations-counterpart of something like C++'s std::next_permutation, and yield a slice of the first k elements of items. This way, you'd force the caller to first create a mutable slice of items (read: less convenience), but could possibly exploit clever techniques to make it more efficient.

Comment on lines 914 to 969
#[test]
#[cfg(feature = "lending_iters")]
fn combinations_lending_feature_parity_to_non_lending() {
assert!((1..3).combinations_lending(5).next().is_none());

let it = (1..3).combinations_lending(2).map_into_iter(|x| x.collect_vec()).collect::<Vec<_>>();
it::assert_equal(it, vec![
vec![1, 2],
]);

let mut out = Vec::new();
for i in (1..3).combinations_lending(2) {
out.push(i);
}
it::assert_equal(out, vec![vec![1, 2]]);

let it = (1..5).combinations_lending(2).collect_nested_vec();
it::assert_equal(it, vec![
vec![1, 2],
vec![1, 3],
vec![1, 4],
vec![2, 3],
vec![2, 4],
vec![3, 4],
]);

it::assert_equal((0..0).combinations_lending(2).map_into_iter(|x| x.collect_vec()).collect_vec(), <Vec<Vec<_>>>::new());

it::assert_equal((0..1).combinations_lending(1).collect_nested_vec(), vec![vec![0]]);
it::assert_equal((0..2).combinations_lending(1).collect_nested_vec(), vec![vec![0], vec![1]]);
it::assert_equal((0..2).combinations_lending(2).collect_nested_vec(), vec![vec![0, 1]]);

for i in 1..10 {
assert!((0..0).combinations_lending(i).next().is_none());
assert!((0..i - 1).combinations_lending(i).next().is_none());

}

it::assert_equal((1..3).combinations_lending(0), vec![vec![]]);
it::assert_equal((0..0).combinations_lending(0), vec![vec![]]);

it::assert_equal((1..3).combinations(0), (1..3).combinations_lending(0));
it::assert_equal((0..0).combinations(0), (0..0).combinations_lending(0));
assert_eq!((1..3).combinations(0).collect_vec(), (1..3).combinations_lending(0).collect_nested_vec()); // Should match exactly including type.
it::assert_equal((0..0).combinations(0), (0..0).combinations_lending(0));
}

// Below shouldn't compile because of attempt to reference an already mut reference.
// #[test]
// fn combinations_lending_cant_double_mut() {
// let mut out = (1..4).combinations_lending(2);
// let mut combination = out.next().unwrap();
// let combination2 = out.next().unwrap();
// combination.next();
// }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use property based testing heavily in itertools. Here, the property that you want to check is that alike invocations of combinations and combinations_lending will produce alike outputs. Could you formulate this as a quickcheck property based test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ce1e22a.

@@ -27,6 +27,7 @@ test = false

[dependencies]
either = { version = "1.0", default-features = false }
lending-iterator = { version = "0.1.6", optional = true}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielhenrymantilla This feature relies heavily on your lending-iterator crate. Do you think this would be appropriate functionality to lending-iterator? I think the performance story presented by the PR is compelling enough that this functionality should be somewhere, but I'm not yet sure where.

I share @phimuemue's hesitation for adding a new public dependency to itertools. The bar for adding dependencies, especially public ones, is very high. Itertools only has one dependency (either), and it's maintained by a maintainer of itertools

First, we need to be sure that new dependencies will not pose a risk of breakage for our users. At minimum, new dependencies must adhere to the itertools MSRV policy. As a foundational crate in the Rust ecosystem, we treat MSRV increases as breaking changes; our dependencies should do so too. We also maintain a very conservative MSRV of 1.36; adding lending-iterator as a dependency would require increasing that to 1.57.

Second, we need to be confident that new dependencies (and features, really) will not increase our maintenance burden too much. If we accept this PR, I think it would be reasonable to expect that we will eventually provide lending alternatives of many of our other iterator adapters. I don't know about @phimuemue, but I don't have any experience with lending-iterator. We'll need to consider whether this is a road we want itertools to go down.

@scottmcm
Copy link
Contributor

I think this needs a strong reason that these adapters shouldn't just be in the lending-iterator crate directly.

My instinct is that itertools itself shouldn't provide any lending iterators until there's a lending iterator trait in core.

@Easyoakland
Copy link
Contributor Author

@scottmcm @jswrenn The reason I though to implement this functionality here is that the actual logic of the lending iterators is almost identical. The new feature uses the same logic and the state is held in the same struct. It would be a shame if this would have to be implemented in @danielhenrymantilla 's lending-iterator crate because that would require copy-pasting most of itertools into lending-iterator for all the iterators from itertools that have an effective LendingIterator implementation. Multiple iterators in itertools can benefit from the same lending re-implementation while not changing the actual logic of the iterators.

Additionally the into_iterator definition can only be as simple as this because the iterator and lending_iterator implementations use the same struct and logic with a flag to indicate which is in use. If this wasn't implemented in this crate then the PhantomData trick to make usage of either implementation simple and unambiguous would not work.

It seems unlikely to me that the best solution to this issue would be to have two of every iterator in two different crates with the exact same struct and nearly identical methods, but I don't see how else to implement this if it is merged in a different crate.

@Easyoakland
Copy link
Contributor Author

@phimuemue

I'm not sure I understand this point. Are you saying I should have separated out the comments and documentation additions to the new code from the new code being commented on?

No, I'm talking about things as https://github.com/rust-itertools/itertools/pull/682/files#diff-01a3ceaed8205eb8f50a269c77b7888890da04e982c9c793943743a1ee4471c3L18-R27 or https://github.com/rust-itertools/itertools/pull/682/files#diff-01a3ceaed8205eb8f50a269c77b7888890da04e982c9c793943743a1ee4471c3L48-R62 or https://github.com/rust-itertools/itertools/pull/682/files#diff-01a3ceaed8205eb8f50a269c77b7888890da04e982c9c793943743a1ee4471c3L53-R108.

Oh I understand now. That looks like rustfmt. I forgot I had it set to auto format on save. I will endeavor to format in a separate commit for the future. Thank you.

I also do not know if the implementation is the best. I did not change the logic of how combinations are computed. Though it seems worth noting that the indexed items are not being collected as a group in the LendingIterator implementation. That's the point of the new lending implementation. An iterator is returned which returns individual copies instead of a collected Vec.

I saw that you do not collect into a Vec - I like that.

  • Is the current implementation of manipulating indices (usize numbers) really different from manipulating references somehow? How would references be manipulated? Aren't references just pointers which are themselves usize?

With indices you'll have to look up the item from the buffer. That is, you essentially have to add the index to the begin-pointer of the buffer. Handling references directly could skip this addition, thus possibly save one indirection. (As always, compilers may be clever enough to optimize some things. And - on top of that - I did not benchmark; it just came to my mind.)

How would one handle references directly? Wouldn't you need to store in an array/buffer/vec the references that were being manipulated anyway? I could very well be misunderstanding but it seems like the only difference between what you describe and what is implemented is that the current implementation will handle references if the iterator is over references and actual values if the iterator is over values. Rather than always references of what is being iterated over. Is the current implementation not preferable?

If I saw a way to further sacrifice convenience for performance I would have done so. I don't see how this would be done.

I could imagine something like fn combinations_inplace(items: &mut [Item]) -> CombinationsInplace where CombinationsInplace works directly on the slice items - without using indices. I.e. CombinationsInplace would repeatedly call the combinations-counterpart of something like C++'s std::next_permutation, and yield a slice of the first k elements of items. This way, you'd force the caller to first create a mutable slice of items (read: less convenience), but could possibly exploit clever techniques to make it more efficient.

The c++ implementation for permutations can use the fact that they define the ordering to be the "set of all permutations is ordered lexicographically with respect to operator<" to determine the next permutation from the current. What would the definition of the combinations ordering be? Also, extrapolating what the next combination should be by looking through the current combination seems like it would be slower than keeping track of which combination is next in the iterator's state as is currently the case.

@Easyoakland Easyoakland requested review from jswrenn and removed request for phimuemue March 23, 2023 01:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants