Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add support for emit in aggregate relations #122

Merged
merged 16 commits into from
Oct 1, 2024

Conversation

EpsilonPrime
Copy link
Member

@EpsilonPrime EpsilonPrime commented Sep 12, 2024

This adds support for emit in aggregations where there is no more than one grouping section.

This addresses #121 .

@EpsilonPrime EpsilonPrime force-pushed the aggregate_emits branch 2 times, most recently from 01631be to c31787d Compare September 12, 2024 07:51
src/substrait/textplan/PlanPrinterVisitor.cpp Outdated Show resolved Hide resolved
src/substrait/textplan/PlanPrinterVisitor.cpp Outdated Show resolved Hide resolved
auto relationData =
ANY_CAST(std::shared_ptr<RelationData>, actualScope->blob);
const SymbolInfo* symbol{nullptr};
if (isAggregate(currentScope)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much nicer, thanks. It might be worthwhile to include the type of currentScope (and therefore the range of fieldReferences which would have been valid) below in the out of range error message

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree that better error messages are useful but I believe the substrait validator is better suited for helping developers figure what is wrong with their plan generation (allowing this code to be simpler). I've added the type here though.

Comment on lines 633 to 634
// TODO -- Use a more explicit check.
if (!relationData->outputFieldReferences.empty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What sort of more explicit check were you thinking?

Suggested change
// TODO -- Use a more explicit check.
if (!relationData->outputFieldReferences.empty()) {
if (!relationData->outputFieldReferences.empty()) {
assert(relationData->relation.rel_type_case() == ::substrait::proto::Rel::kAggregate));

I'm not familiar enough with substrait to know at what level aggregate nodes are the only relations with output field references. Do you mean they are the only ones supported by substrait-cpp? The only relation for which textplan will ever provide a representation? I don't think they're the only relations which substrait itself allows to have an output mapping...

I think the comment should explain this in greater depth. If it's the last thing (IE, part of general substrait knowledge) maybe a link to an appropriate section of the substrait doc is in order.

In any case, if this block of code is specific to relationData->relation.rel_type_case() == ::substrait::proto::Rel::kAggregate, it should probably be moved into the switch above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was relation into the comment below. I've replaced the output field references with a check on the incoming proto's type. I did not move it above as the function is currently separated into a section where the fields are populated with a later emit handling step. I can see some potential refactoring where those two parts of this rather long function could be split up.

@@ -316,6 +316,13 @@ bool isRelationEmitDetail(SubstraitPlanParser::Relation_detailContext* ctx) {
nullptr;
}

bool isAggregate(const SymbolInfo* symbol) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are there two implementations of this function (PlanPrinterVisitor.cpp:74)? The definitions are even different. If the difference is intentional, that's definitely worth a comment or at least renaming the functions to clarify their distinct behaviors. If the difference is not intentional, could this become SymbolInfo::isAggregate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long term there shouldn't be this function at all and just a single type in the representation. Right now the intermediate form can have both types (depending on how it was parsed) which needs to be addressed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, a comment would definitely be advisable just to give future readers a clue that this is a stopgap and that in the future there should only be RelationType::kAggregate

// TODO -- Add support for aggregate relations.
errorListener_->addError(
token, "Aggregate relations do not yet support emit sections.");
if (relationData->relation.has_aggregate()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very similar to the code at the end of InitialPlanProtoVisitor::updateLocalSchema. Any chance it could be extracted to a helper function?

@@ -374,6 +372,13 @@ comparisonToProto(const std::string& text) {
Expression_Subquery_SetComparison_ComparisonOp_COMPARISON_OP_UNSPECIFIED;
}

bool isAggregate(const SymbolInfo* symbol) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

// TODO -- Add support for aggregate relations.
errorListener_->addError(
token, "Aggregate relations do not yet support emit sections.");
if (relationData->relation.has_aggregate()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more ditto

@EpsilonPrime EpsilonPrime requested a review from bkietz September 26, 2024 21:17
@@ -316,6 +316,13 @@ bool isRelationEmitDetail(SubstraitPlanParser::Relation_detailContext* ctx) {
nullptr;
}

bool isAggregate(const SymbolInfo* symbol) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, a comment would definitely be advisable just to give future readers a clue that this is a stopgap and that in the future there should only be RelationType::kAggregate

@EpsilonPrime EpsilonPrime merged commit 68036ca into substrait-io:main Oct 1, 2024
3 checks passed
@EpsilonPrime EpsilonPrime deleted the aggregate_emits branch October 1, 2024 16:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants