-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: improve the documentation for Aggregate code #12617
Conversation
@@ -36,7 +36,7 @@ use datafusion_physical_expr::binary_map::OutputType; | |||
|
|||
use hashbrown::raw::RawTable; | |||
|
|||
/// Compare GroupValue Rows column by column | |||
/// A [`GroupValues`] that stores multiple columns of group values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not very familiar with it and thus the question, what is group values?
is it group keys, or exact values attached for specific key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider the following example, (1, 'a') is one of the group values, so is (2, 'b') and (3, 'c')
statement ok
create table t(a int, b varchar) as values (1, 'a'), (2, 'b'), (1, 'a'), (3, 'c');
query ITI
select a, b, count(*) from t group by a, b;
----
1 a 2
2 b 1
3 c 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Rows implementation, we convert (1, 'a') to row and compare against it. In Column implementation, we compare iteratively from 1 to 'a' in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great example. I added in to the docs
@@ -60,11 +61,13 @@ pub trait ArrayRowEq: Send + Sync { | |||
fn take_n(&mut self, n: usize) -> ArrayRef; | |||
} | |||
|
|||
/// An implementation of [`ArrayRowEq`] for primitive types. | |||
pub struct PrimitiveGroupValueBuilder<T: ArrowPrimitiveType> { | |||
group_values: Vec<T::Native>, | |||
nulls: Vec<bool>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be it a BooleanArray? so this null checks will be faster and in 1 place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are foreshadowing :) This is an excellent point. @jayzhan211 and I are working on exactly this topic. #12623
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its good to me, leaving the final review to @jayzhan211
I would add an example or something on group keys/values, its not very obvious imho
@@ -60,11 +61,13 @@ pub trait ArrayRowEq: Send + Sync { | |||
fn take_n(&mut self, n: usize) -> ArrayRef; | |||
} | |||
|
|||
/// An implementation of [`ArrayRowEq`] for primitive types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was it renamed in #12619?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good call -- will fix
@@ -154,13 +157,14 @@ impl<T: ArrowPrimitiveType> ArrayRowEq for PrimitiveGroupValueBuilder<T> { | |||
} | |||
} | |||
|
|||
/// An implementation of [`ArrayRowEq`] for binary and utf8 types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed?
Done! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Which issue does this PR close?
N/A
Rationale for this change
As part of reviewing the PR for #12269 from @jayzhan211 I found some
documentation that would make the code easier to understand.
What changes are included in this PR?
Are these changes tested?
By docs CI
Are there any user-facing changes?
No functional changes, only docs