[BugFix] Fix statistics agg functions to return NULL incorrectly (backport #47904) #48339
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Why I'm doing:
Return NULL incorrectly
For now, the result nullable property of aggregation functions contains two cases:
is_output_nullable
withfalse
value to BE, such as such as count, count distinct, and bitmap_union_int.is_output_nullable
withtrue
value to BE. (NOTE that BE need determine whether the result is nullable by itself!)NULl
only when all the input rows areNULL
.This is OK, until the statistics aggregations occur. The reason is that the output of statistics aggregations is always nullable no matter what the input is, because they will return
NULL
when the number of input non-null rows is less than 2.COVARIANCE
According to Wikipedia, when merging two COVARIANCE states$C_A$ and $C_B$ , the formula should be as follows:
but we use a wrong one as follows:
where$C_n$ is defined as follows:
$$C_n = \sum_{i=1}^{n} (x_i - \overline{x}_n)(y_i - \overline{y}_n)$$
What I'm doing:
AggNullPred(State)
toNullableAggregate
.For now,
NullableAggregate
only returns null when the input is null, but the statistics function will returnNULL
when the number of input non-null rows is less than 2.Therefore, we need to add a new parameter
AggNullPred
to determine whether the output is nullable.And this is zero overhead when it is passed as a constexpr functor
AggNonNullPred
by default, since the compiler will eliminate this always false predicate.is_result_nullable
anduse_nullable_fn
according touse_intermediate_as_output
,is_input_nullable
,is_output_nullable
andis_always_nullable_result
.Fixes #47762.
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check:
This is an automatic backport of pull request #47904 done by [Mergify](https://mergify.com). ## Why I'm doing:
Return NULL incorrectly
For now, the result nullable property of aggregation functions contains two cases:
is_output_nullable
withfalse
value to BE, such as such as count, count distinct, and bitmap_union_int.is_output_nullable
withtrue
value to BE. (NOTE that BE need determine whether the result is nullable by itself!)NULl
only when all the input rows areNULL
.This is OK, until the statistics aggregations occur. The reason is that the output of statistics aggregations is always nullable no matter what the input is, because they will return
NULL
when the number of input non-null rows is less than 2.COVARIANCE
According to Wikipedia, when merging two COVARIANCE states$C_A$ and $C_B$ , the formula should be as follows:
but we use a wrong one as follows:
where$C_n$ is defined as follows:
$$C_n = \sum_{i=1}^{n} (x_i - \overline{x}_n)(y_i - \overline{y}_n)$$
What I'm doing:
AggNullPred(State)
toNullableAggregate
.For now,
NullableAggregate
only returns null when the input is null, but the statistics function will returnNULL
when the number of input non-null rows is less than 2.Therefore, we need to add a new parameter
AggNullPred
to determine whether the output is nullable.And this is zero overhead when it is passed as a constexpr functor
AggNonNullPred
by default, since the compiler will eliminate this always false predicate.is_result_nullable
anduse_nullable_fn
according touse_intermediate_as_output
,is_input_nullable
,is_output_nullable
andis_always_nullable_result
.Fixes #47762.
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist: