Adds Datum comparator #1545

Merged
merged 3 commits into from
Aug 21, 2024

Member

@johnedquinn johnedquinn commented Aug 12, 2024

Relevant Issues

Description

  • Adds a performant comparator for Datum. Please read the Javadocs associated with the comparator.
  • Replaced all usages of PartiQLValue.comparator() with Datum.comparator().
    • This should significantly speed up evaluation of aggregations, ordering, etc.
    • The only caveat is for the conformance tests -- I believe Robert was working on replacing the use of PartiQLValue with Datum with the Ion reader, so I didn't switch the comparator there.
  • Passes all tests previously passing.

Other Information

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@johnedquinn johnedquinn force-pushed the v1-datum-comparator branch 3 times, most recently from 0bbc039 to b694c6a Compare August 13, 2024 18:26
@johnedquinn johnedquinn marked this pull request as ready for review August 13, 2024 18:38
@johnedquinn johnedquinn requested a review from alancai98 August 13, 2024 18:42
Member

@alancai98 alancai98 left a comment

Nice work! The new factoring of the datum comparator looks to be more efficient than the previous versions of our value comparator. Left some questions and other minor comments.

// TODO: Add support for a datum comparator once the accumulator passes datums instead of PartiQL values.
@OptIn(PartiQLValueExperimental::class)
- private val seen = TreeSet(PartiQLValue.comparator())
+ private val seen = TreeSet(Datum.comparator())
Member

Could we go over why a TreeSet needs to be used rather than a HashSet? If a user of Datum wishes to store a set of Datums, we should make it clear in the docs and examples that a TreeSet must be used.

Member Author

Yeah, I'll add a comment. But, it comes down to the lack of support for hashCode and equals. The reason we can't support them outright for aggregations/grouping is because of how the PartiQL Specification dictates that the grouping should act similarly to the < operator. This operator compares the arithmetic values of INT, BIGINT, etc -- so a simple hashCode/equals is non-trivial. It may be fast to coerce values into some larger type specifically for aggregations -- but this requires further research.

So, a TreeSet is necessary since we can pass in the comparator.
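
To illustrate the point above, here is a minimal sketch in plain Java. It does not use the Datum API: boxed `Integer`, `Long`, and `BigDecimal` values stand in for INT, BIGINT, and DECIMAL, and `BY_VALUE` is a hypothetical stand-in for `Datum.comparator()`. The class and method names are assumptions made for this example only.

```java
import java.math.BigDecimal;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class NumericSetSketch {
    // Hypothetical stand-in for Datum.comparator(): orders numbers by arithmetic
    // value, ignoring whether the value is an Integer, Long, or BigDecimal.
    public static final Comparator<Number> BY_VALUE =
        (a, b) -> new BigDecimal(a.toString()).compareTo(new BigDecimal(b.toString()));

    // A HashSet relies on equals/hashCode, which are type-sensitive for boxed numbers,
    // so Integer 1 and Long 1L are kept as two separate elements.
    public static int distinctWithHashSet(List<Number> values) {
        return new HashSet<>(values).size();
    }

    // A TreeSet uses only the supplied comparator to decide whether two elements
    // are duplicates, so values that compare as equal collapse into one element.
    public static int distinctWithTreeSet(List<Number> values) {
        Set<Number> seen = new TreeSet<>(BY_VALUE);
        seen.addAll(values);
        return seen.size();
    }

    public static void main(String[] args) {
        List<Number> values = List.of(1, 1L, new BigDecimal("1.0"), 2);
        System.out.println("HashSet distinct: " + distinctWithHashSet(values));
        System.out.println("TreeSet distinct: " + distinctWithTreeSet(values));
    }
}
```

With these inputs the HashSet keeps four elements, while the TreeSet keeps two, because 1, 1L, and 1.0 compare as equal under the value-based comparator.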

Comment on lines +16 to +22
/**
* This class allows for the comparison between two {@link Datum}s. This is internally implemented by constructing
* a comparison table, where each cell contains a reference to a {@link DatumComparison} to compute the comparison.
* The table's rows and columns are indexed by the {@link PType.Kind#ordinal()}. The first dimension matches the
* left-hand-side's type of {@link #compare(Datum lhs, Datum rhs)}. The second dimension matches the right-hand-side's
* type of {@link #compare(Datum lhs, Datum rhs)}. As such, this implementation allows for O(1) comparison of scalars.
*/
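
For readers unfamiliar with the pattern, the Javadoc above can be read alongside the following minimal sketch of an ordinal-indexed comparison table. It is not the actual DatumComparator: the `Kind` enum, `Value` class, and `Comparison` interface are hypothetical stand-ins for `PType.Kind`, `Datum`, and `DatumComparison`, and the cross-kind ordering (by enum ordinal) is an assumption made only for this example.

```java
import java.util.Comparator;

public class ComparisonTableSketch {
    // Hypothetical stand-in for PType.Kind.
    public enum Kind { BOOL, INT, STRING }

    // Hypothetical stand-in for Datum: a kind tag plus an untyped payload.
    public static final class Value {
        final Kind kind;
        final Object payload;

        public Value(Kind kind, Object payload) {
            this.kind = kind;
            this.payload = payload;
        }
    }

    // Hypothetical stand-in for DatumComparison: one cell of the table.
    public interface Comparison {
        int compare(Value lhs, Value rhs);
    }

    // First dimension: ordinal of the left-hand kind. Second: ordinal of the right-hand kind.
    private static final Comparison[][] TABLE = buildTable();

    private static Comparison[][] buildTable() {
        Kind[] kinds = Kind.values();
        Comparison[][] table = new Comparison[kinds.length][kinds.length];
        // Default cells: values of different kinds are ordered by kind
        // (an assumption for this sketch, not the PartiQL ordering).
        for (Kind lhs : kinds) {
            for (Kind rhs : kinds) {
                int byKind = Integer.compare(lhs.ordinal(), rhs.ordinal());
                table[lhs.ordinal()][rhs.ordinal()] = (a, b) -> byKind;
            }
        }
        // Diagonal cells: same-kind values are compared by payload.
        table[Kind.BOOL.ordinal()][Kind.BOOL.ordinal()] =
            (a, b) -> Boolean.compare((Boolean) a.payload, (Boolean) b.payload);
        table[Kind.INT.ordinal()][Kind.INT.ordinal()] =
            (a, b) -> Long.compare((Long) a.payload, (Long) b.payload);
        table[Kind.STRING.ordinal()][Kind.STRING.ordinal()] =
            (a, b) -> ((String) a.payload).compareTo((String) b.payload);
        return table;
    }

    // Each comparison is two array lookups followed by one call to the selected cell.
    public static final Comparator<Value> COMPARATOR =
        (lhs, rhs) -> TABLE[lhs.kind.ordinal()][rhs.kind.ordinal()].compare(lhs, rhs);
}
```

Because the cell is selected by indexing rather than by a chain of type checks, the dispatch cost stays constant as more kinds are added; only the table grows.
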
Member

I wonder if you have run any benchmarks comparing the performance of PartiQLValueComparator and DatumComparator. If DatumComparator is significantly more performant than the previous PartiQLValueComparator, it could be a good callout for our upcoming v1 release.

Member Author

Just finished running a set of benchmarks that shadow this PR and the v1.0.0-perf.1 release. The benchmarks compare evaluation of aggregations, distinct, order by, and comparison operators for the latest engine (this PR), the v1.0.0-perf.1 engine, and the legacy engine.

See the benchmarks at this tag. See the results here.

Member Author

And, to hint: it's much faster. In 2 of the 3 benchmarks, the new implementation is over 2x faster than v1.0.0-perf.1 and over 5x* faster than the legacy implementation.

To get a better overall view of how much the comparator itself contributes to the performance, it could be a good idea to benchmark AND profile the comparator in action. For now though, we have a good glimpse of how Datum, PType, and DatumComparator together are improving performance.

The * above is discussed in the results link above. The legacy comparator is hard to quantify due to its immediate materialization of aggregations, but preliminary analysis indicates a huge performance improvement nonetheless.

Member

Awesome, thanks for creating the benchmarks and tests! The preliminary analysis looks good. Those statements seem like a good starting point for any future performance benchmarking test suite. Ahead of the v1 release, we should definitely benchmark more queries to show off the performance benefit of the new evaluator.

Member

@alancai98 alancai98 left a comment

Nice work!

@johnedquinn johnedquinn merged commit eda118f into partiql:v1 Aug 21, 2024
7 checks passed
@johnedquinn johnedquinn deleted the v1-datum-comparator branch August 21, 2024 17:42