-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(isthmus): improved Calcite support for Substrait Aggregate rels #214
Conversation
5a26b13
to
b54175b
Compare
Fixes #155 |
expr, | ||
"min", | ||
// min output is always nullable | ||
TypeCreator.asNullable(expr.getType())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added arithmetic aggregate functions that take expression directly.
|
||
public Expression.ScalarFunctionInvocation divide(Expression left, Expression right) { | ||
return arithmeticFunction("divide", left, right); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed to nest some functions within aggregate function calls for testing purposes, so I added some easy ones.
isthmus/src/main/java/io/substrait/isthmus/AggregateValidator.java
Outdated
Show resolved
Hide resolved
@@ -271,6 +271,7 @@ Aggregate.Measure fromAggCall(RelNode input, Type.Struct inputType, AggregateCal | |||
if (call.filterArg != -1) { | |||
builder.preMeasureFilter(FieldReference.newRootStructReference(call.filterArg, inputType)); | |||
} | |||
// TODO: handle the collation on the AggregateCall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed an issues for this: #215
b54175b
to
ba7b484
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly had minor style questions/comments, but overall I think this looks good, and very welcome.
isthmus/src/main/java/io/substrait/isthmus/AggregateValidator.java
Outdated
Show resolved
Hide resolved
isthmus/src/main/java/io/substrait/isthmus/AggregateValidator.java
Outdated
Show resolved
Hide resolved
To briefly summarise what this PR is about. Calcite has a more restrictive definition of aggregate relations than Substrait does, which means that not all Substrait aggregates can be converted to Calcite aggregates. The primary restriction is that Calcite aggregates can only take field references in various places that Substrait allows full expressions. Luckily, this is easy to address by simply moving those expressions to a project before the aggregate, and then updating the aggregate to use references. The various comments in the |
Addresses #155 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vbarua I am not very fluent with the Calcite component, so I will leave someone else to verify this part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comment: I see the expected output is hand-calculated (meaning determined by ourselves), is there a way to use Calcite to get the optimal and validate with it? This is an out of scope area for me, but just curious if we can do things differently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The transformation code that I'm testing here is purely going from Substrait -> Substrait. My intent with these tests was to demonstrate that the transformToValidCalciteAggregate(aggregate)
produces equivalent Substrait plans, no Calcite required.
Calcite only becomes relevant in the validation code when we call
// Substrait POJO -> Calcite
new SubstraitToCalcite(EXTENSION_COLLECTION, typeFactory).convert(pojo);
which tries to convert the given input to a Calcite plan. One thing that's hard to show here is that if you comment out
if (!PreCalciteAggregateValidator.isValidCalciteAggregate(aggregate)) {
aggregate =
PreCalciteAggregateValidator.PreCalciteAggregateTransformer
.transformToValidCalciteAggregate(aggregate);
}
in the SubstraitRelNodeConverter#visit(Aggregate aggregate)
method all of these fail.
is there a way to use Calcite to get the optimal and validate with it?
Something that's kind of cursed about this is that Calcite already does something like this, but only when you give it SQL input. When it converts from its SqlNode representation to the RelNode representation it applies similar transforms to this.
I don't think copying exactly what Calcite does is a good idea here, because I'd rather perform straightforward (if verbose) transformation here that may not necessarily be the best but is easier to reason about. This also happens before any actual optimizations a user would perform with Calcite, as this transformation is just to allow us to convert the Substrait plan into a Calcite representation.
Does this answer what you were asking?
@vbarua seems like there is a conflict. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vbarua Thanks for working on this. The overall change LGTM!
Though I would appreciate someone with better Calcite knowledge take a look.
Substrait Aggregates that contain expressions that are not field references and/or grouping keys that are not in input order require extra processing to be converted to Calcite Aggregates successfully AND correctly
consume and return Aggregate.Measure instead of AggregateFunctionInvocation BREAKING CHANGE: signatures for aggregate building utils have changed
a7f5982
to
565bde0
Compare
I'm going to merge these changes. They will only take effect for substrait plans that would previously have resulted in exceptions, so they shouldn't be surprising anyone. |
…ubstrait-io#214) Substrait Aggregates that contain expressions that are not field references and/or grouping keys that are not in input order require extra processing to be converted to Calcite Aggregates successfully AND correctly BREAKING CHANGE: signatures for aggregate building utils have changed * feat: additional builder methods for arithmetic aggregate functions * feat: sortField builder method * feat: grouping builder method * feat: add, subtract, multiply, divide and negate methods for builder * refactor: extract row matching assertions to PlanTestBase * feat(isthmus): improved Calcite support for Substrait Aggregate rels * refactor: builder functions for aggregates and aggregate functions now consume and return Aggregate.Measure instead of AggregateFunctionInvocation
Substrait Aggregates that contain expressions that are not field
references and/or grouping keys that are not in input order require
extra processing to be converted to Calcite Aggregates successfully AND
correctly