-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[v1] Remove PartiQLValue from AST; refactor AST literals #1650
Conversation
095bf8d
to
8579b04
Compare
CROSS-ENGINE-REPORT ❌
Testing Details
Result Details
Now FAILING Tests ❌The following 3 test(s) were previously PASSING in BASE but are now FAILING in TARGET: Click here to see
Now IGNORED Tests ❌The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact. Now Passing Tests180 test(s) were previously failing in BASE (LEGACY-V0.14.8) but now pass in TARGET (EVAL-503D653). Before merging, confirm they are intended to pass. The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact. CROSS-COMMIT-REPORT ✅
Testing DetailsResult Details
|
8579b04
to
d6212ca
Compare
partiql-ast/src/main/java/org/partiql/ast/literal/LiteralDecimal.java
Outdated
Show resolved
Hide resolved
@NotNull | ||
public final Map<String, PartiQLValue> options; | ||
public final Map<String, Literal> options; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(self-review) model the EXPLAIN
option map values w/ a Literal
directly. Could also opt for ExprLit
but wasn't sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Literal makes sense
api(project(":partiql-types")) | ||
// TODO REMOVE ME ONCE PartiQLValue IS REMOVED | ||
// THE AST NEEDS ITS OWN "VALUE" REPRESENTATION | ||
api(project(":partiql-spi")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(self-review) partiql-ast now will not depend on other packages. Had to add the spi dependency to partiql-parser following this change
val newArgs = listOf(exprLit(symbolValue(dtField))) + node.args.drop(1) | ||
val dtField = ((node.args[0] as ExprLit).lit as LiteralString).value | ||
// Represent as an `ExprVarRef` to mimic a literal symbol. | ||
// TODO consider some other representation for unquoted strings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(self-review) I couldn't find a better way to represent the previous capability of printing PartiQLValue
symbols (i.e. text without any quoting). For now just represent with a var ref.
DataType.TIME, DataType.TIMESTAMP -> tail concat type(node.name(), node.precision, gap = true) | ||
DataType.TIME_WITH_TIME_ZONE -> tail concat type("TIME", node.precision, gap = true) concat(" WITH TIME ZONE") | ||
DataType.TIMESTAMP_WITH_TIME_ZONE -> tail concat type("TIMESTAMP", node.precision, gap = true) concat(" WITH TIME ZONE") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(self-review) was a bug w/ prior pretty-printing. Time and timestamp precision should follow the time/timestamp keyword.
is LiteralInt -> PType.integer() | ||
is LiteralLong -> PType.bigint() | ||
is LiteralDouble -> PType.real() | ||
is LiteralTypedString -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(self-review) move the datetime literal validation to the ast->plan conversion.
Marking as a draft to look at the conformance test failures. |
@NotNull | ||
public final Map<String, PartiQLValue> options; | ||
public final Map<String, Literal> options; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Literal makes sense
public class LiteralInt extends Literal { | ||
public int value; | ||
|
||
public LiteralInt(int value) { | ||
super(String.format("%d", value)); | ||
this.value = value; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe there should be both a LiteralInt and LiteralLong, there should be one "integral literal" which may just be called LiteralInteger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, it would be nice to just have LiteralNumber which has the necessary APIs / details to model floating point, decimals, and integers. But also there are places in the BNF where we might want to enforce exact numeric or integral.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah -- the literals, especially the numerics, had a lot of different ways to model. I played around w/ some different approaches but wasn't yet sure of which stage (i.e. parser, ast, planner) should perform the validation and conversion of the literal text.
I'll sync offline w/ you and @johnedquinn on where the text -> value conversion should take place and where to perform any validation.
af7f6d5
to
f6ea942
Compare
partiql-planner/src/main/kotlin/org/partiql/planner/internal/transforms/RexConverter.kt
Outdated
Show resolved
Hide resolved
@@ -0,0 +1,52 @@ | |||
package org.partiql.ast.literal; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I discussed a different modeling with @RCHowell using just one class rather than three for the numerics (LiteralApprox
, LiteralExact
, LiteralInt
) that used a enum/kind to distinguish between the three numeric types.
public class LiteralNumber extends Literal {
@Nullable
private final Long p1;
@Nullable
private final BigDecimal p2;
@NotNull
public final ParseContext kind;
private LiteralNumber(@Nullable Long p1, @Nullable BigDecimal p2, @NotNull ParseContext kind) {
this.p1 = p1;
this.p2 = p2;
this.kind = kind;
}
// Factory methods
public static LiteralNumber integer(long value) {
return new LiteralNumber(value, null, ParseContext.INTEGER);
}
public static LiteralNumber integer(int value) {
return new LiteralNumber((long) value, null, ParseContext.INTEGER);
}
public static LiteralNumber exact(BigDecimal value) {
return new LiteralNumber(null, value, ParseContext.EXACT);
}
public static LiteralNumber approx(BigDecimal value, long exponent) {
return new LiteralNumber(exponent, value, ParseContext.APPROX);
}
public static LiteralNumber approx(BigDecimal value) {
return new LiteralNumber(null, value, ParseContext.APPROX);
}
public static LiteralNumber approx(float value) {
return approx(BigDecimal.valueOf(value));
}
public static LiteralNumber approx(double value) {
return approx(BigDecimal.valueOf(value));
}
// Getting the value out
@NotNull
public BigDecimal getDecimal() {
if (kind == ParseContext.INTEGER) {
return p2;
} else {
throw new IllegalStateException("Unknown context: " + kind);
}
}
public long getInteger() {
if (kind == ParseContext.INTEGER) {
return p1;
} else {
throw new IllegalStateException("Unknown context: " + kind);
}
}
// similarly for getDouble
// TODO if we keep this representation, change to extend `AstEnum`
public enum ParseContext {
INTEGER,
EXACT,
APPROX
}
@NotNull
@Override
public String getText() {
switch (kind) {
// Since they're nullable, can be slightly annoying to extract the value but it's internal code
case INTEGER:
assert p1 != null;
return p1.toString();
case EXACT:
assert p2 != null;
return p2.toString();
case APPROX:
assert p1 != null;
assert p2 != null;
return p2 + "e" + p1;
default:
throw new IllegalStateException("Unknown context: " + kind);
}
}
}
and if we wanted to internalize the enum, could add some additional methods
public boolean isInteger() {
return kind == ParseContext.INTEGER;
}
public boolean isExact() {
return kind == ParseContext.EXACT;
}
public boolean isApprox() {
return kind == ParseContext.APPROX;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imo, the customer code using these different modelings is pretty similar. But I feel like the internal implementation code is simpler if we break apart the LiteralNumeric
class into separate classes since we don't have to do the null
checks or casing on the enum/kind.
Using the single class with an enum
val v = when (lit) {
is LiteralNumber -> {
val kind = lit.kind
when (kind) {
LiteralNumber.ParseContext.EXACT -> {
lit.decimal
}
LiteralNumber.ParseContext.INTEGER -> {
lit.integer
}
LiteralNumber.ParseContext.APPROX -> {
lit.double
}
else -> error("Unexpected numeric literal: $lit")
}
}
else -> error("Unexpected literal: $lit")
}
Not using the enum directly but using helper methods
val v2 = when (lit) {
is LiteralNumber -> {
if (lit.isExact) {
lit.decimal
}
else if (lit.isInteger) {
lit.integer
}
else if (lit.isApprox) {
lit.double
} else {
error("Unexpected numeric literal: $lit")
}
}
else -> error("Unexpected literal: $lit")
}
Using current PR's three classes
val v3 = when (lit) {
is LiteralInteger -> lit.integer
is LiteralExact -> lit.decimal
is LiteralApprox -> lit.double
else -> error("Unexpected literal: $lit")
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sent a comment elsewhere, but we don't have to do the null checks. This is internal code which will throw a NPE if we have a bug whereas now it throws an assertion exception, again if we have a bug. The checks don't gain us anything because an exception is thrown either way.
@@ -0,0 +1,52 @@ | |||
package org.partiql.ast.literal; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sent a comment elsewhere, but we don't have to do the null checks. This is internal code which will throw a NPE if we have a bug whereas now it throws an assertion exception, again if we have a bug. The checks don't gain us anything because an exception is thrown either way.
@NotNull | ||
public BigDecimal getDecimal() { | ||
return value; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we want to get the exact numeric as an int or long? Also float or double?
What is missing here is an interface LiteralNumber which has all of the toInt(), toFloat(), toLong() etc. methods. Because all of these methods should be shared for the literal number, I don't see that the additional classes actually gains you much. Also now the factory methods are spread out. Why factory methods if we already have the classes split AND constructors? If you want to split classes, use a common interface and a factory method. Then when you step back you will see that the split classes gains you little at the expense of complexity elsewhere (visitors/rewriters/printers/etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline. I had not considered those interface methods in the customer code I was testing out. Agree with your comment that the typical OO approach (i.e. effective java item 23 to prefer class hierarchies to tagged classes) may not be what we want here. Something more akin to PType, Datum, and AnyElement could offer a more cohesive interface.
As mentioned by @RCHowell, the Calcite follows the tagged class approach using their SqlTypeName
property -- https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/SqlLiteral.java. I'll try out a similar approach for literal modeling.
private final BigDecimal mantissa; | ||
|
||
private final int exponent; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wouldn't these be public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH I'm not quite sure what needs to be public atm. We could always add helper functions in the future.
When looking at an approximate numeric literal, are there use cases where an API user would actually need just the mantissa or just the exponent? The use cases we currently have use both the arguments (e.g. pretty-printing, AST->plan conversion).
* TODO docs | ||
*/ | ||
@EqualsAndHashCode(callSuper = false) | ||
public class LiteralApprox extends Literal { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this, I'm struggling to understand why the LiteralInteger class is necessary, especially since the EBNF for approximate numeric is:
<approximate numeric literal> ::= <mantissa> E <exponent>
<mantissa> ::= <exact numeric literal>
With the modeling here, you're implicitly saying that BigDecimal is sufficient for the mantissa
(an exact numeric literal, which encompasses the LiteralInteger
and LiteralExact
variants). Which makes me wonder why LiteralInteger
is distinct from LiteralExact
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had originally thought this as well. There are some edge cases here. For example, 123
and 123.
cannot both be represented by a BigDecimal alone since BigDecimal doesn't preserve whether the 123
mantissa is specified with or without a decimal point.
We could provide some boolean field to distinguish between the two (e.g. hasPeriod
); however I came to release that approximate numeric also has this same issue and may need that field as well. Example: 123e0
vs 123.e0
. To exactly preserve the text, we would even need to distinguish whether the approximate numeric's exponent had an implicit plus (123e0
), explicit plus (123e+0
), or explicit minus (123e-0
).
This led me to think perhaps we should just be passing around strings for the AST representation of numerics, leaving the parser to perform simple validation and type tagging.
} | ||
|
||
@NotNull | ||
public static LiteralExact litExact(BigInteger value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Along with my other comment in LiteralApprox
, I think it could be useful to have constructors for long and int here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could always add them later. With the current design, I didn't actually need a factory method or constructor to create an exact numeric literal using a long or int.
public static LiteralTypedString litTypedString(@NotNull DataType type, @NotNull String value) { | ||
return new LiteralTypedString(type, value); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make sure people don't put in bad types, I'd consider making this a final class and then exchanging this method for litTime(int precision, @NotNull String value)
, litTimeZ(int precision, @NotNull String value)
, and so on and so forth.
Or, just making them their own classes (LiteralTime
, LiteralTimestamp
, LiteralDate
). I imagine it would have the same benefits that you described in https://github.com/partiql/partiql-lang-kotlin/pull/1650/files#r1851189538, and it would more closely match the EBNF and more straightforward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make sure people don't put in bad types, I'd consider making this a final class and then exchanging this method for litTime(int precision, @NotNull String value), litTimeZ(int precision, @NotNull String value), and so on and so forth.
I like the idea to create date/time/timestamp-specific factory methods. Perhaps I could add those after I get the general design approved first? We could always add these in a subsequent PR or release.
Or, just making them their own classes (LiteralTime, LiteralTimestamp, LiteralDate). I imagine it would have the same benefits that you described in #1650 (files), and it would more closely match the EBNF and more straightforward.
For now, I liked the modeling of these <data type > '<string repr in quotes>'
to be under one class and reuse the data type enum. The parser does perform some validation to confirm that the strings match the EBNF rules.
LITERAL_DECIMAL | ||
: DIGIT+ '.' DIGIT* | ||
| '.' DIGIT+ | ||
; | ||
|
||
LITERAL_FLOAT | ||
: DIGIT+ ('.' DIGIT*)? 'E' [+-]? DIGIT+ | ||
| '.' DIGIT+ 'E' [+-]? DIGIT+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to fix the G4 to match the EBNF now regarding +/-, you can. Though, non-blocking. These could be parser rules rather than lexing rules. That way, you can simplify how you write this (and the PartiQLParserDefault).
LITERAL_DECIMAL | |
: DIGIT+ '.' DIGIT* | |
| '.' DIGIT+ | |
; | |
LITERAL_FLOAT | |
: DIGIT+ ('.' DIGIT*)? 'E' [+-]? DIGIT+ | |
| '.' DIGIT+ 'E' [+-]? DIGIT+ | |
literalNumeric | |
: SIGN? (literalExactNumeric | literalApproxNumeric) | |
; | |
literalExactNumeric | |
: INT | |
| lhs=INT PERIOD rhs=INT? | |
| PERIOD INT | |
; | |
literalApproxNumeric | |
: mantissa=literalExactNumeric 'E' exponent=signedInt | |
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I played around with some different ANTLR parsing modelings and it led to some other issues. I think the functionality is fine to parse as unsigned numerics. We could always change the modeling in the future or add some AstVisitor pass to unify unary +/- with AST numeric literals.
Changing literal parsing to be in the parser rather than lexer -- cannot use the single-quoted 'E'
, so E
would need to be in the tokens.
Changing literal parsing in the tokens may bind too eagerly -- e.g. 1 - 23
would tokenize -23
together.
535279b
to
4bd73cf
Compare
4bd73cf
to
f9a25b6
Compare
import java.util.Collection; | ||
import java.util.Collections; | ||
|
||
public abstract class Literal extends AstNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest design currently models things a bit differently than before. Key summary is
- Fat interface w/ a
LiteralKind
enum. Includes relevant constructors and property accessor methods. much like PType, Datum, AnyElement, and Calcite's SqlLiteral - Three numeric variants (approx, int, exact) with each containing just a string
- The parser will still perform some type ascription
- Modeling certain numerics with a string proved less cumbersome. E.g.
123
vs123.
-- BigDecimal does not actually preserve whether there was a period or not123e0
vs123.e0
-- BigDecimal for the mantissa of an approximate does not preserve whether there was a period or not123e0
vs123.e+0
-- a long/integer for the exponent does not preserve whether there was an explicit+
- Simplifying to just a string allows us to hold off on any eager computation
- Date/time/timestamp represented as a typed string
- The parser already performs some validations and basic type ascription.
- One potential issue with eagerly converting the fields into java is all of the additional properties that would need added. E.g. date would need year, month, day. Time would need hour, minute, second, precision, timezone. Timestamp would need all of date and time's fields. We would need to at some point convert these AST fields to the plan's representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One potential issue with eagerly converting the fields into java is all of the additional properties that would need added. E.g. date would need year, month, day. Time would need hour, minute, second, precision, timezone. Timestamp would need all of date and time's fields. We would need to at some point convert these AST fields to the plan's representation.
That's a good point, and I didn't think of that. We wouldn't be able to zero-pad and would need to insert nulls for those fields. Seems like a lot of extra work (on read aka scribe) when a (rule,string) is sufficient for capturing syntax which ultimately is the goal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
capturing syntax which ultimately is the goal
Yeah that was my mindset w/ this latest refactor -- AST at the end of the day just captures syntax with some slight validation w/ the parser. I think most users are not actually performing computation directly on the AST. Computation (like in our case) will come later in the plan or even pushed all the way to the evaluator.
import java.util.Collection; | ||
import java.util.Collections; | ||
|
||
public abstract class Literal extends AstNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One potential issue with eagerly converting the fields into java is all of the additional properties that would need added. E.g. date would need year, month, day. Time would need hour, minute, second, precision, timezone. Timestamp would need all of date and time's fields. We would need to at some point convert these AST fields to the plan's representation.
That's a good point, and I didn't think of that. We wouldn't be able to zero-pad and would need to insert nulls for those fields. Seems like a lot of extra work (on read aka scribe) when a (rule,string) is sufficient for capturing syntax which ultimately is the goal.
public Collection<AstNode> children() { | ||
return Collections.emptyList(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a quick follow-up PR, could you please change this to
public List<AstNode> getChildren() {
...
}
This way it has ordered semantics and the getFoo
while clunky (imo) plays better with kotlin accessors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I'll track this as a followup in #1610.
|
||
public abstract class Literal extends AstNode { | ||
@NotNull | ||
public abstract LiteralKind kind(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not exactly a fat-interface but similar pattern! We are doing a tagged-union, and we don't actually want a LiteralKind enum and the code (aka tag) should go directly on the Literal (like we did with DataType).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try using the code directly on the Literal. I had ran into some slightly repetitive + confusing code pattern when using a LiteralKind:
val litText = when (lit.kind().code()) {
LiteralKind.NULL -> "NULL"
LiteralKind.MISSING -> "MISSING"
LiteralKind.BOOLEAN -> lit.booleanValue().toString()
...
else -> error("Unsupported literal kind ${lit.kind()}")
}
^ lit.kind().code()
rather than just lit.code()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to be more like DataType
in the lateset commit.
public static Literal litTypedString(@NotNull DataType type, @NotNull String value) { | ||
return new LiteralTypedString(type, value); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to be as restrictive as possible, you can have explicit time(boolean withTimeZone, int precision, @NotNull String value)
for the datetime values, rather than the typed string allowing all types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think there's some value in having a public factory method to construct typed string literals w/ arguments DataType
and String
. For instance, if an API user had a DataType
already (date, time, timetz, timestamp, timestamptz), they could simply call this factory method. If we only make the specific typed string methods public (e.g. time(boolean withTimeZone, int precision, @NotNull String value)
, then they would need to perform additional conversion logic.
We could always add such methods in the future.
import java.util.Collection; | ||
import java.util.Collections; | ||
|
||
public abstract class Literal extends AstNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of the flexibility PType provides with the fat interface, I explicitly wrote in the Javadocs that consumers of this should not author their own implementations. See
partiql-lang-kotlin/partiql-types/src/main/java/org/partiql/types/PType.java
Lines 23 to 24 in f5c6eff
* Users should NOT author their own implementation. The current recommendation is to use the static methods | |
* (exposed by this interface) to instantiate a type. |
Could add that to this API as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up changing this to not be an abstract class and a concrete class extending AstEnum
(like DataType
) -- #1650 (comment). So adding another implementation shouldn't be an issue anymore.
} | ||
DataType.TIME, DataType.TIME_WITH_TIME_ZONE -> { | ||
val time = DateTimeUtils.parseTimeLiteral(typedString) | ||
val precision = type.precision ?: 6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe these are safe to !!
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataType
's precision
method returns a nullable integer (null indicates no precision was explicitly specified like for TIME WITH TIME ZONE '12:34:56'
), so I think we still need to check for null w/ the elvis operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm actually seems like the parser is eagerly inferring the precision from the time/timestamp string when the type precision argument is unspecified -- https://github.com/partiql/partiql-lang-kotlin/blob/v1-ast-remove-partiqlvalue/partiql-parser/src/main/kotlin/org/partiql/parser/internal/PartiQLParserDefault.kt#L2201-L2208. I don't think we should be doing this eager inference in the parser so I'll change.
Changed in latest commit.
/** | ||
* With the <string> and <int> nodes of a literal time expression, returns the parsed string and precision. | ||
* TIME (<int>)? (WITH TIME ZONE)? <string> | ||
*/ | ||
private fun getTimeStringAndPrecision( | ||
stringNode: TerminalNode, | ||
integerNode: TerminalNode?, | ||
): Pair<String, Int> { | ||
val timeString = stringNode.getStringValue() | ||
val precision = when (integerNode) { | ||
null -> { | ||
try { | ||
getPrecisionFromTimeString(timeString) | ||
} catch (e: Exception) { | ||
throw error(stringNode.symbol, "Unable to parse precision.", e) | ||
} | ||
} | ||
else -> { | ||
val p = integerNode.text.toBigInteger().toInt() | ||
if (p < 0 || 9 < p) throw error(integerNode.symbol, "Precision out of bounds") | ||
p | ||
} | ||
} | ||
return timeString to precision | ||
} | ||
|
||
private fun getPrecisionFromTimeString(timeString: String): Int { | ||
val matcher = GENERIC_TIME_REGEX.toPattern().matcher(timeString) | ||
if (!matcher.find()) { | ||
throw IllegalArgumentException("Time string does not match the format 'HH:MM:SS[.ddd....][+|-HH:MM]'") | ||
} | ||
val fraction = matcher.group(1)?.removePrefix(".") | ||
return fraction?.length ?: 0 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(self-review): this prior parsing logic was deleted for the following reasons
- was eagerly inferring the seconds precision for TIME and TIMESTAMP when the precision argument was unspecified. Thus it was classifying
TIME '12:34:56'
as the same asTIME (0) '12:34:56'"
- applying the time string's regex check for timestamp
- only performing the string regex check when the precision was unspecified
TIME and TIMESTAMP-specific validation has been moved to their respective ANTLR visitor methods.
55ec171
to
ebd8411
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved on the modeling, no comment/preferences on changes to partiql-parser since the proof is in the tests and the details are internal. I have some nits but I'll spare you those considering the cycles of this PR. Thanks for closing on this.
* TODO docs | ||
*/ | ||
@EqualsAndHashCode(callSuper = false) | ||
public class Literal extends AstEnum { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modeling looks good to me; very concise union and easy to extend if needed.
Addressed comments. Received ship it offline.
Relevant Issues
Description
PartiQLValue
from ASTOther Information
Updated Unreleased Section in CHANGELOG: [NO]
Any backward-incompatible changes? [YES]
Any new external dependencies? [NO]
Do your changes comply with the Contributing Guidelines
and Code Style Guidelines? [YES]
License Information
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.