feat: add join_multiset() #804
Also removes the documentation about HalfJoinMultiset; the way to access that functionality now is to use join_multiset().
/// > 2 input streams of type <(K, V1)> and <(K, V2)>, 1 output stream of type <(K, (V1, V2))>
///
/// This operator is equivalent to `join` except that the LHS and RHS are collected into multisets rather than sets before joining.
hmm order is preserved I think so multisets might not be the right word
Given that there are many (multiset) join algorithms, that seems like a side-effect of the implementation that we may not want to provide as a guarantee, so I wouldn't sweat it.
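To make the set-versus-multiset distinction in the doc comment concrete, here is a minimal plain-Rust sketch of the two semantics. It does not use Hydroflow and is not how the operators are implemented; `join_set` and `join_multiset_sketch` are hypothetical helper names, and the value types are fixed to `i32` purely for illustration. Both functions sort their output before returning, since (per the comment above) output order is not something to rely on.

```rust
use std::collections::{HashMap, HashSet};

/// Set semantics: duplicate (key, value) pairs on each side are collapsed
/// before joining, so each distinct match appears once.
fn join_set<'a>(lhs: &[(&'a str, i32)], rhs: &[(&'a str, i32)]) -> Vec<(&'a str, (i32, i32))> {
    let lhs: HashSet<(&str, i32)> = lhs.iter().copied().collect();
    let rhs: HashSet<(&str, i32)> = rhs.iter().copied().collect();
    let mut out = Vec::new();
    for &(k1, v1) in &lhs {
        for &(k2, v2) in &rhs {
            if k1 == k2 {
                out.push((k1, (v1, v2)));
            }
        }
    }
    out.sort(); // order is not part of the contract; sort for comparison only
    out
}

/// Multiset semantics: duplicates are kept, so every LHS occurrence is
/// paired with every RHS occurrence that shares its key.
fn join_multiset_sketch<'a>(
    lhs: &[(&'a str, i32)],
    rhs: &[(&'a str, i32)],
) -> Vec<(&'a str, (i32, i32))> {
    // Index the RHS by key, keeping repeated values.
    let mut rhs_by_key: HashMap<&str, Vec<i32>> = HashMap::new();
    for &(k, v) in rhs {
        rhs_by_key.entry(k).or_default().push(v);
    }
    let mut out = Vec::new();
    for &(k, v1) in lhs {
        if let Some(values) = rhs_by_key.get(k) {
            for &v2 in values {
                out.push((k, (v1, v2)));
            }
        }
    }
    out.sort(); // order is not part of the contract; sort for comparison only
    out
}

fn main() {
    let lhs = [("a", 0), ("a", 0)]; // the same pair twice
    let rhs = [("a", 1)];

    // Set join collapses the duplicate LHS pair: one output row.
    println!("{:?}", join_set(&lhs, &rhs));
    // Multiset join keeps it: two identical output rows.
    println!("{:?}", join_multiset_sketch(&lhs, &rhs));
}
```

With a duplicated LHS pair and a single RHS pair, the set version yields one row and the multiset version yields two, which is the difference the doc comment is describing.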
Minor comments, non-critical
/// > 2 input streams of type <(K, V1)> and <(K, V2)>, 1 output stream of type <(K, (V1, V2))>
///
/// This operator is equivalent to `join` except that the LHS and RHS are collected into multisets rather than sets before joining.
/// For example:
/// ```hydroflow
/// lhs = source_iter([("a", 0), ("a", 0)]) -> tee();
/// rhs = source_iter([("a", 0)]) -> tee();
I'd encourage you to make the RHS distinguishable to avoid confusion in the example.
rhs = source_iter([("a", "hydro")]) -> tee();
@zzlk please fix the assert below the code as well
Do we also want to yank the extra functionality out of normal join? (Am I remembering correctly that that's how it worked before?)
Or in a separate commit
I think we can do it later. I removed the documentation for it, so people shouldn't use it. The equivalent functionality is now exposed via join_multiset(). We also still need it, because join_multiset works by just generating join::();
* feat: add join_multiset()
  also remove documentation about HalfJoinMultiset, the way to access that now is to use join_multiset()
* address comments
* fix assert
fixes #802