-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds Datum comparator #1545
Adds Datum comparator #1545
Conversation
0bbc039
to
b694c6a
Compare
partiql-spi/src/test/kotlin/org/partiql/eval/value/DatumComparatorTest.kt
Outdated
Show resolved
Hide resolved
test/partiql-tests-runner/src/test/kotlin/org/partiql/runner/test/TestRunner.kt
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! The new factoring of the datum comparator looks to be more efficient than the previous versions of our value comparator. Left some questions and other minor comments.
// TODO: Add support for a datum comparator once the accumulator passes datums instead of PartiQL values. | ||
@OptIn(PartiQLValueExperimental::class) | ||
private val seen = TreeSet(PartiQLValue.comparator()) | ||
private val seen = TreeSet(Datum.comparator()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we go over why a TreeSet
needs to be used rather than a HashSet
? If a user of Datum
wishes to store a set of Datum
s, we should make it clear in the docs and examples that a TreeSet
must be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'll add a comment. But, it comes down to the lack of support for hashCode and equals. The reason we can't support them outright for aggregations/grouping is because of how the PartiQL Specification dictates that the grouping should act similarly to the <
operator. This operator compares the arithmetic values of INT, BIGINT, etc -- so a simple hashCode/equals is non-trivial. It may be fast to coerce values into some larger type specifically for aggregations -- but this requires further research.
So, a TreeSet is necessary since we can pass in the comparator.
partiql-spi/src/test/kotlin/org/partiql/eval/value/DatumComparatorTest.kt
Outdated
Show resolved
Hide resolved
partiql-spi/src/main/java/org/partiql/eval/value/DatumComparator.java
Outdated
Show resolved
Hide resolved
/** | ||
* This class allows for the comparison between two {@link Datum}s. This is internally implemented by constructing | ||
* a comparison table, where each cell contains a reference to a {@link DatumComparison} to compute the comparison. | ||
* The table's rows and columns are indexed by the {@link PType.Kind#ordinal()}. The first dimension matches the | ||
* left-hand-side's type of {@link #compare(Datum lhs, Datum rhs)}. The second dimension matches the right-hand-side's | ||
* type of {@link #compare(Datum lhs, Datum rhs)}. As such, this implementation allows for O(1) comparison of scalars. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonder if you had ran any benchmarks comparing the performance between PartiQLValueComparator
and the DatumComparator
. If DatumComparator
is significantly more performant than the previous PartiQLValueComparator
, it could be a good callout for our upcoming v1
release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just finished running a set of benchmarks that shadow this PR and the V1.0.0-perf.1 release. The benchmarks compare evaluation of aggregations, distinct, order by, and comparison operators for the latest engine (this PR), the v1.0.0-perf.1
engine, and the legacy engine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And, to hint: it's much faster. In 2 of the 3 benchmarks, the new implementation is over 2x faster than v1.0.0-perf.1
and over 5x* faster than the legacy implementation.
To get a better overall view of how much the comparator itself is contributing to the performance, it could be a good idea to benchmark AND profile the comparator in action. For now though, we have a good glimpse as for how Datum, PType, and DatumComparator in coordination are improving performance.
The *
above is discussed in the results link above. The legacy comparator is hard to quantify due to its immediate materialization of aggregations, but preliminary analysis indicates a huge performance improvement nonetheless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks for creating the benchmarks and tests! The preliminary analysis looks good. Those statements seem like a good starting point for any future performance benchmarking test suite. Ahead of the v1
release, we should definitely benchmark more queries to show off the performance benefit of the new evaluator.
b694c6a
to
5a00560
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
Relevant Issues
Description
PartiQLValue.comparator()
in favor ofDatum.comparator()
.Other Information
and Code Style Guidelines? YES
License Information
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.