Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(isthmus): improved Calcite support for Substrait Aggregate rels #214

Merged
merged 12 commits into from
Jan 11, 2024

Conversation

vbarua
Copy link
Member

@vbarua vbarua commented Dec 21, 2023

Substrait Aggregates that contain expressions that are not field
references and/or grouping keys that are not in input order require
extra processing to be converted to Calcite Aggregates successfully AND
correctly

@vbarua vbarua force-pushed the vbarua/complex-aggr-expressions-support branch from 5a26b13 to b54175b Compare December 21, 2023 21:56
@vbarua
Copy link
Member Author

vbarua commented Dec 21, 2023

Fixes #155

expr,
"min",
// min output is always nullable
TypeCreator.asNullable(expr.getType()));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added arithmetic aggregate functions that take expression directly.


public Expression.ScalarFunctionInvocation divide(Expression left, Expression right) {
return arithmeticFunction("divide", left, right);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to nest some functions within aggregate function calls for testing purposes, so I added some easy ones.

@@ -271,6 +271,7 @@ Aggregate.Measure fromAggCall(RelNode input, Type.Struct inputType, AggregateCal
if (call.filterArg != -1) {
builder.preMeasureFilter(FieldReference.newRootStructReference(call.filterArg, inputType));
}
// TODO: handle the collation on the AggregateCall
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed an issues for this: #215

@vbarua vbarua force-pushed the vbarua/complex-aggr-expressions-support branch from b54175b to ba7b484 Compare December 21, 2023 22:09
Copy link
Contributor

@patientstreetlight patientstreetlight left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly had minor style questions/comments, but overall I think this looks good, and very welcome.

@vbarua
Copy link
Member Author

vbarua commented Dec 26, 2023

To briefly summarise what this PR is about. Calcite has a more restrictive definition of aggregate relations than Substrait does, which means that not all Substrait aggregates can be converted to Calcite aggregates. The primary restriction is that Calcite aggregates can only take field references in various places that Substrait allows full expressions.

Luckily, this is easy to address by simply moving those expressions to a project before the aggregate, and then updating the aggregate to use references.

The various comments in the PreCalciteAggregateValdidator try and document these various restrictions in more detail.

@vbarua
Copy link
Member Author

vbarua commented Dec 26, 2023

Addresses #155

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vbarua I am not very fluent with the Calcite component, so I will leave someone else to verify this part.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment: I see the expected output is hand-calculated (meaning determined by ourselves), is there a way to use Calcite to get the optimal and validate with it? This is an out of scope area for me, but just curious if we can do things differently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transformation code that I'm testing here is purely going from Substrait -> Substrait. My intent with these tests was to demonstrate that the transformToValidCalciteAggregate(aggregate) produces equivalent Substrait plans, no Calcite required.

Calcite only becomes relevant in the validation code when we call

    // Substrait POJO -> Calcite
    new SubstraitToCalcite(EXTENSION_COLLECTION, typeFactory).convert(pojo);

which tries to convert the given input to a Calcite plan. One thing that's hard to show here is that if you comment out

    if (!PreCalciteAggregateValidator.isValidCalciteAggregate(aggregate)) {
      aggregate =
          PreCalciteAggregateValidator.PreCalciteAggregateTransformer
              .transformToValidCalciteAggregate(aggregate);
    }

in the SubstraitRelNodeConverter#visit(Aggregate aggregate) method all of these fail.

is there a way to use Calcite to get the optimal and validate with it?

Something that's kind of cursed about this is that Calcite already does something like this, but only when you give it SQL input. When it converts from its SqlNode representation to the RelNode representation it applies similar transforms to this.

I don't think copying exactly what Calcite does is a good idea here, because I'd rather perform straightforward (if verbose) transformation here that may not necessarily be the best but is easier to reason about. This also happens before any actual optimizations a user would perform with Calcite, as this transformation is just to allow us to convert the Substrait plan into a Calcite representation.

Does this answer what you were asking?

@vibhatha
Copy link
Contributor

vibhatha commented Jan 5, 2024

@vbarua seems like there is a conflict.

Copy link
Contributor

@vibhatha vibhatha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vbarua Thanks for working on this. The overall change LGTM!
Though I would appreciate someone with better Calcite knowledge take a look.

@vbarua vbarua force-pushed the vbarua/complex-aggr-expressions-support branch from a7f5982 to 565bde0 Compare January 5, 2024 21:39
@vbarua
Copy link
Member Author

vbarua commented Jan 11, 2024

I'm going to merge these changes. They will only take effect for substrait plans that would previously have resulted in exceptions, so they shouldn't be surprising anyone.

@vbarua vbarua merged commit 1689c93 into main Jan 11, 2024
8 checks passed
@vbarua vbarua deleted the vbarua/complex-aggr-expressions-support branch January 11, 2024 17:50
ajegou pushed a commit to ajegou/substrait-java that referenced this pull request Mar 29, 2024
…ubstrait-io#214)

Substrait Aggregates that contain expressions that are not field
references and/or grouping keys that are not in input order require
extra processing to be converted to Calcite Aggregates successfully AND
correctly

BREAKING CHANGE: signatures for aggregate building utils have changed

* feat: additional builder methods for arithmetic aggregate functions
* feat: sortField builder method
* feat: grouping builder method
* feat: add, subtract, multiply, divide and negate methods for builder
* refactor: extract row matching assertions to PlanTestBase
* feat(isthmus): improved Calcite support for Substrait Aggregate rels
* refactor: builder functions for aggregates and aggregate functions now
consume and return Aggregate.Measure instead of
AggregateFunctionInvocation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants