Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v1] Remove PartiQLValue from AST; refactor AST literals #1650

Merged
merged 7 commits into from
Dec 7, 2024

Conversation

alancai98
Copy link
Member

Relevant Issues

Description

  • Removes PartiQLValue from AST
  • Refactors AST literals to own representation

Other Information

  • Updated Unreleased Section in CHANGELOG: [NO]

    • No on v1 branch.
  • Any backward-incompatible changes? [YES]

    • Yes but on v1 branch
  • Any new external dependencies? [NO]

  • Do your changes comply with the Contributing Guidelines
    and Code Style Guidelines? [YES]

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@alancai98 alancai98 self-assigned this Nov 19, 2024
@alancai98 alancai98 force-pushed the v1-ast-remove-partiqlvalue branch from 095bf8d to 8579b04 Compare November 19, 2024 01:53
Copy link

github-actions bot commented Nov 19, 2024

CROSS-ENGINE-REPORT ❌

BASE (LEGACY-V0.14.8) TARGET (EVAL-503D653) +/-
% Passing 89.67% 94.39% 4.72% ✅
Passing 5287 5565 278 ✅
Failing 609 50 -559 ✅
Ignored 0 281 281 🔶
Total Tests 5896 5896 0 ✅

Testing Details

  • Base Commit: v0.14.8
  • Base Engine: LEGACY
  • Target Commit: 503d653
  • Target Engine: EVAL

Result Details

  • ❌ REGRESSION DETECTED. See Now Failing/Ignored Tests. ❌
  • Passing in both: 2643
  • Failing in both: 17
  • Ignored in both: 0
  • PASSING in BASE but now FAILING in TARGET: 3
  • PASSING in BASE but now IGNORED in TARGET: 108
  • FAILING in BASE but now PASSING in TARGET: 180
  • IGNORED in BASE but now PASSING in TARGET: 0

Now FAILING Tests ❌

The following 3 test(s) were previously PASSING in BASE but are now FAILING in TARGET:

Click here to see
  1. undefinedUnqualifiedVariableWithUndefinedVariableBehaviorMissing, compileOption: PERMISSIVE
  2. undefinedUnqualifiedVariableIsNullExprWithUndefinedVariableBehaviorMissing, compileOption: PERMISSIVE
  3. undefinedUnqualifiedVariableIsMissingExprWithUndefinedVariableBehaviorMissing, compileOption: PERMISSIVE

Now IGNORED Tests ❌

The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.

Now Passing Tests

180 test(s) were previously failing in BASE (LEGACY-V0.14.8) but now pass in TARGET (EVAL-503D653). Before merging, confirm they are intended to pass.

The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.

CROSS-COMMIT-REPORT ✅

BASE (EVAL-F5C6EFF) TARGET (EVAL-503D653) +/-
% Passing 94.39% 94.39% 0.00% ✅
Passing 5565 5565 0 ✅
Failing 50 50 0 ✅
Ignored 281 281 0 ✅
Total Tests 5896 5896 0 ✅

Testing Details

  • Base Commit: f5c6eff
  • Base Engine: EVAL
  • Target Commit: 503d653
  • Target Engine: EVAL

Result Details

  • Passing in both: 5565
  • Failing in both: 50
  • Ignored in both: 281
  • PASSING in BASE but now FAILING in TARGET: 0
  • PASSING in BASE but now IGNORED in TARGET: 0
  • FAILING in BASE but now PASSING in TARGET: 0
  • IGNORED in BASE but now PASSING in TARGET: 0

@alancai98 alancai98 force-pushed the v1-ast-remove-partiqlvalue branch from 8579b04 to d6212ca Compare November 19, 2024 02:00
@NotNull
public final Map<String, PartiQLValue> options;
public final Map<String, Literal> options;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(self-review) model the EXPLAIN option map values w/ a Literal directly. Could also opt for ExprLit but wasn't sure

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Literal makes sense

Comment on lines -23 to -26
api(project(":partiql-types"))
// TODO REMOVE ME ONCE PartiQLValue IS REMOVED
// THE AST NEEDS ITS OWN "VALUE" REPRESENTATION
api(project(":partiql-spi"))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(self-review) partiql-ast now will not depend on other packages. Had to add the spi dependency to partiql-parser following this change

val newArgs = listOf(exprLit(symbolValue(dtField))) + node.args.drop(1)
val dtField = ((node.args[0] as ExprLit).lit as LiteralString).value
// Represent as an `ExprVarRef` to mimic a literal symbol.
// TODO consider some other representation for unquoted strings
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(self-review) I couldn't find a better way to represent the previous capability of printing PartiQLValue symbols (i.e. text without any quoting). For now just represent with a var ref.

Comment on lines +209 to +212
DataType.TIME, DataType.TIMESTAMP -> tail concat type(node.name(), node.precision, gap = true)
DataType.TIME_WITH_TIME_ZONE -> tail concat type("TIME", node.precision, gap = true) concat(" WITH TIME ZONE")
DataType.TIMESTAMP_WITH_TIME_ZONE -> tail concat type("TIMESTAMP", node.precision, gap = true) concat(" WITH TIME ZONE")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(self-review) was a bug w/ prior pretty-printing. Time and timestamp precision should follow the time/timestamp keyword.

is LiteralInt -> PType.integer()
is LiteralLong -> PType.bigint()
is LiteralDouble -> PType.real()
is LiteralTypedString -> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(self-review) move the datetime literal validation to the ast->plan conversion.

@alancai98 alancai98 marked this pull request as draft November 19, 2024 02:03
@alancai98
Copy link
Member Author

Marking as a draft to look at the conformance test failures.

@NotNull
public final Map<String, PartiQLValue> options;
public final Map<String, Literal> options;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Literal makes sense

Comment on lines 9 to 37
public class LiteralInt extends Literal {
public int value;

public LiteralInt(int value) {
super(String.format("%d", value));
this.value = value;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe there should be both a LiteralInt and LiteralLong, there should be one "integral literal" which may just be called LiteralInteger.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, it would be nice to just have LiteralNumber which has the necessary APIs / details to model floating point, decimals, and integers. But also there are places in the BNF where we might want to enforce exact numeric or integral.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah -- the literals, especially the numerics, had a lot of different ways to model. I played around w/ some different approaches but wasn't yet sure of which stage (i.e. parser, ast, planner) should perform the validation and conversion of the literal text.

I'll sync offline w/ you and @johnedquinn on where the text -> value conversion should take place and where to perform any validation.

@alancai98 alancai98 force-pushed the v1-ast-remove-partiqlvalue branch 2 times, most recently from af7f6d5 to f6ea942 Compare November 20, 2024 01:18
@@ -0,0 +1,52 @@
package org.partiql.ast.literal;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed a different modeling with @RCHowell using just one class rather than three for the numerics (LiteralApprox, LiteralExact, LiteralInt) that used a enum/kind to distinguish between the three numeric types.

public class LiteralNumber extends Literal {
    @Nullable
    private final Long p1;

    @Nullable
    private final BigDecimal p2;

    @NotNull
    public final ParseContext kind;

    private LiteralNumber(@Nullable Long p1, @Nullable BigDecimal p2, @NotNull ParseContext kind) {
        this.p1 = p1;
        this.p2 = p2;
        this.kind = kind;
    }

    // Factory methods
    public static LiteralNumber integer(long value) {
        return new LiteralNumber(value, null, ParseContext.INTEGER);
    }

    public static LiteralNumber integer(int value) {
        return new LiteralNumber((long) value, null, ParseContext.INTEGER);
    }

    public static LiteralNumber exact(BigDecimal value) {
        return new LiteralNumber(null, value, ParseContext.EXACT);
    }

    public static LiteralNumber approx(BigDecimal value, long exponent) {
        return new LiteralNumber(exponent, value, ParseContext.APPROX);
    }

    public static LiteralNumber approx(BigDecimal value) {
        return new LiteralNumber(null, value, ParseContext.APPROX);
    }

    public static LiteralNumber approx(float value) {
        return approx(BigDecimal.valueOf(value));
    }

    public static LiteralNumber approx(double value) {
        return approx(BigDecimal.valueOf(value));
    }

    // Getting the value out
    @NotNull
    public BigDecimal getDecimal() {
        if (kind == ParseContext.INTEGER) {
            return p2;
        } else {
            throw new IllegalStateException("Unknown context: " + kind);
        }
    }

    public long getInteger() {
        if (kind == ParseContext.INTEGER) {
            return p1;
        } else {
            throw new IllegalStateException("Unknown context: " + kind);
        }
    }

    // similarly for getDouble

    // TODO if we keep this representation, change to extend `AstEnum`
    public enum ParseContext {
        INTEGER,
        EXACT,
        APPROX
    }

    @NotNull
    @Override
    public String getText() {
        switch (kind) {
            // Since they're nullable, can be slightly annoying to extract the value but it's internal code
            case INTEGER:
                assert p1 != null;
                return p1.toString();
            case EXACT:
                assert p2 != null;
                return p2.toString();
            case APPROX:
                assert p1 != null;
                assert p2 != null;
                return p2 + "e" + p1;
            default:
                throw new IllegalStateException("Unknown context: " + kind);
        }
    }
}

and if we wanted to internalize the enum, could add some additional methods

    public boolean isInteger() {
        return kind == ParseContext.INTEGER;
    }

    public boolean isExact() {
        return kind == ParseContext.EXACT;
    }

    public boolean isApprox() {
        return kind == ParseContext.APPROX;
    }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo, the customer code using these different modelings is pretty similar. But I feel like the internal implementation code is simpler if we break apart the LiteralNumeric class into separate classes since we don't have to do the null checks or casing on the enum/kind.

Using the single class with an enum

val v = when (lit) {
    is LiteralNumber -> {
        val kind = lit.kind
        when (kind) {
            LiteralNumber.ParseContext.EXACT -> {
                lit.decimal
            }
            LiteralNumber.ParseContext.INTEGER -> {
                lit.integer
            }
            LiteralNumber.ParseContext.APPROX -> {
                lit.double
            }
            else -> error("Unexpected numeric literal: $lit")
        }
    }
    else -> error("Unexpected literal: $lit")
}

Not using the enum directly but using helper methods

val v2 = when (lit) {
    is LiteralNumber -> {
        if (lit.isExact) {
            lit.decimal
        }
        else if (lit.isInteger) {
            lit.integer
        }
        else if (lit.isApprox) {
            lit.double
        } else {
            error("Unexpected numeric literal: $lit")
        }
    }
    else -> error("Unexpected literal: $lit")
}

Using current PR's three classes

val v3 = when (lit) {
    is LiteralInteger -> lit.integer
    is LiteralExact -> lit.decimal
    is LiteralApprox -> lit.double
    else -> error("Unexpected literal: $lit")
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sent a comment elsewhere, but we don't have to do the null checks. This is internal code which will throw a NPE if we have a bug whereas now it throws an assertion exception, again if we have a bug. The checks don't gain us anything because an exception is thrown either way.

@alancai98 alancai98 marked this pull request as ready for review November 21, 2024 01:28
@@ -0,0 +1,52 @@
package org.partiql.ast.literal;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sent a comment elsewhere, but we don't have to do the null checks. This is internal code which will throw a NPE if we have a bug whereas now it throws an assertion exception, again if we have a bug. The checks don't gain us anything because an exception is thrown either way.

Comment on lines 31 to 33
@NotNull
public BigDecimal getDecimal() {
return value;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we want to get the exact numeric as an int or long? Also float or double?

What is missing here is an interface LiteralNumber which has all of the toInt(), toFloat(), toLong() etc. methods. Because all of these methods should be shared for the literal number, I don't see that the additional classes actually gains you much. Also now the factory methods are spread out. Why factory methods if we already have the classes split AND constructors? If you want to split classes, use a common interface and a factory method. Then when you step back you will see that the split classes gains you little at the expense of complexity elsewhere (visitors/rewriters/printers/etc.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline. I had not considered those interface methods in the customer code I was testing out. Agree with your comment that the typical OO approach (i.e. effective java item 23 to prefer class hierarchies to tagged classes) may not be what we want here. Something more akin to PType, Datum, and AnyElement could offer a more cohesive interface.

As mentioned by @RCHowell, the Calcite follows the tagged class approach using their SqlTypeName property -- https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/SqlLiteral.java. I'll try out a similar approach for literal modeling.

@alancai98 alancai98 marked this pull request as draft November 22, 2024 20:54
Comment on lines 14 to 16
private final BigDecimal mantissa;

private final int exponent;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wouldn't these be public?

Copy link
Member Author

@alancai98 alancai98 Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I'm not quite sure what needs to be public atm. We could always add helper functions in the future.
When looking at an approximate numeric literal, are there use cases where an API user would actually need just the mantissa or just the exponent? The use cases we currently have use both the arguments (e.g. pretty-printing, AST->plan conversion).

* TODO docs
*/
@EqualsAndHashCode(callSuper = false)
public class LiteralApprox extends Literal {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this, I'm struggling to understand why the LiteralInteger class is necessary, especially since the EBNF for approximate numeric is:

<approximate numeric literal> ::= <mantissa> E <exponent>
<mantissa> ::= <exact numeric literal>

With the modeling here, you're implicitly saying that BigDecimal is sufficient for the mantissa (an exact numeric literal, which encompasses the LiteralInteger and LiteralExact variants). Which makes me wonder why LiteralInteger is distinct from LiteralExact.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had originally thought this as well. There are some edge cases here. For example, 123 and 123. cannot both be represented by a BigDecimal alone since BigDecimal doesn't preserve whether the 123 mantissa is specified with or without a decimal point.

We could provide some boolean field to distinguish between the two (e.g. hasPeriod); however I came to release that approximate numeric also has this same issue and may need that field as well. Example: 123e0 vs 123.e0. To exactly preserve the text, we would even need to distinguish whether the approximate numeric's exponent had an implicit plus (123e0), explicit plus (123e+0), or explicit minus (123e-0).

This led me to think perhaps we should just be passing around strings for the AST representation of numerics, leaving the parser to perform simple validation and type tagging.

}

@NotNull
public static LiteralExact litExact(BigInteger value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Along with my other comment in LiteralApprox, I think it could be useful to have constructors for long and int here.

Copy link
Member Author

@alancai98 alancai98 Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could always add them later. With the current design, I didn't actually need a factory method or constructor to create an exact numeric literal using a long or int.

Comment on lines 25 to 30
public static LiteralTypedString litTypedString(@NotNull DataType type, @NotNull String value) {
return new LiteralTypedString(type, value);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure people don't put in bad types, I'd consider making this a final class and then exchanging this method for litTime(int precision, @NotNull String value), litTimeZ(int precision, @NotNull String value), and so on and so forth.

Or, just making them their own classes (LiteralTime, LiteralTimestamp, LiteralDate). I imagine it would have the same benefits that you described in https://github.com/partiql/partiql-lang-kotlin/pull/1650/files#r1851189538, and it would more closely match the EBNF and more straightforward.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure people don't put in bad types, I'd consider making this a final class and then exchanging this method for litTime(int precision, @NotNull String value), litTimeZ(int precision, @NotNull String value), and so on and so forth.

I like the idea to create date/time/timestamp-specific factory methods. Perhaps I could add those after I get the general design approved first? We could always add these in a subsequent PR or release.


Or, just making them their own classes (LiteralTime, LiteralTimestamp, LiteralDate). I imagine it would have the same benefits that you described in #1650 (files), and it would more closely match the EBNF and more straightforward.

For now, I liked the modeling of these <data type > '<string repr in quotes>' to be under one class and reuse the data type enum. The parser does perform some validation to confirm that the strings match the EBNF rules.

Comment on lines +393 to +403
LITERAL_DECIMAL
: DIGIT+ '.' DIGIT*
| '.' DIGIT+
;

LITERAL_FLOAT
: DIGIT+ ('.' DIGIT*)? 'E' [+-]? DIGIT+
| '.' DIGIT+ 'E' [+-]? DIGIT+
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to fix the G4 to match the EBNF now regarding +/-, you can. Though, non-blocking. These could be parser rules rather than lexing rules. That way, you can simplify how you write this (and the PartiQLParserDefault).

Suggested change
LITERAL_DECIMAL
: DIGIT+ '.' DIGIT*
| '.' DIGIT+
;
LITERAL_FLOAT
: DIGIT+ ('.' DIGIT*)? 'E' [+-]? DIGIT+
| '.' DIGIT+ 'E' [+-]? DIGIT+
literalNumeric
: SIGN? (literalExactNumeric | literalApproxNumeric)
;
literalExactNumeric
: INT
| lhs=INT PERIOD rhs=INT?
| PERIOD INT
;
literalApproxNumeric
: mantissa=literalExactNumeric 'E' exponent=signedInt
;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played around with some different ANTLR parsing modelings and it led to some other issues. I think the functionality is fine to parse as unsigned numerics. We could always change the modeling in the future or add some AstVisitor pass to unify unary +/- with AST numeric literals.

Changing literal parsing to be in the parser rather than lexer -- cannot use the single-quoted 'E', so E would need to be in the tokens.

Changing literal parsing in the tokens may bind too eagerly -- e.g. 1 - 23 would tokenize -23 together.

@alancai98 alancai98 force-pushed the v1-ast-remove-partiqlvalue branch from 4bd73cf to f9a25b6 Compare December 6, 2024 04:02
import java.util.Collection;
import java.util.Collections;

public abstract class Literal extends AstNode {
Copy link
Member Author

@alancai98 alancai98 Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest design currently models things a bit differently than before. Key summary is

  • Fat interface w/ a LiteralKind enum. Includes relevant constructors and property accessor methods. much like PType, Datum, AnyElement, and Calcite's SqlLiteral
  • Three numeric variants (approx, int, exact) with each containing just a string
    • The parser will still perform some type ascription
    • Modeling certain numerics with a string proved less cumbersome. E.g.
      • 123 vs 123. -- BigDecimal does not actually preserve whether there was a period or not
      • 123e0 vs 123.e0 -- BigDecimal for the mantissa of an approximate does not preserve whether there was a period or not
      • 123e0 vs 123.e+0 -- a long/integer for the exponent does not preserve whether there was an explicit +
    • Simplifying to just a string allows us to hold off on any eager computation
  • Date/time/timestamp represented as a typed string
    • The parser already performs some validations and basic type ascription.
    • One potential issue with eagerly converting the fields into java is all of the additional properties that would need added. E.g. date would need year, month, day. Time would need hour, minute, second, precision, timezone. Timestamp would need all of date and time's fields. We would need to at some point convert these AST fields to the plan's representation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One potential issue with eagerly converting the fields into java is all of the additional properties that would need added. E.g. date would need year, month, day. Time would need hour, minute, second, precision, timezone. Timestamp would need all of date and time's fields. We would need to at some point convert these AST fields to the plan's representation.

That's a good point, and I didn't think of that. We wouldn't be able to zero-pad and would need to insert nulls for those fields. Seems like a lot of extra work (on read aka scribe) when a (rule,string) is sufficient for capturing syntax which ultimately is the goal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

capturing syntax which ultimately is the goal

Yeah that was my mindset w/ this latest refactor -- AST at the end of the day just captures syntax with some slight validation w/ the parser. I think most users are not actually performing computation directly on the AST. Computation (like in our case) will come later in the plan or even pushed all the way to the evaluator.

@alancai98 alancai98 marked this pull request as ready for review December 6, 2024 04:47
import java.util.Collection;
import java.util.Collections;

public abstract class Literal extends AstNode {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One potential issue with eagerly converting the fields into java is all of the additional properties that would need added. E.g. date would need year, month, day. Time would need hour, minute, second, precision, timezone. Timestamp would need all of date and time's fields. We would need to at some point convert these AST fields to the plan's representation.

That's a good point, and I didn't think of that. We wouldn't be able to zero-pad and would need to insert nulls for those fields. Seems like a lot of extra work (on read aka scribe) when a (rule,string) is sufficient for capturing syntax which ultimately is the goal.

Comment on lines 19 to 21
public Collection<AstNode> children() {
return Collections.emptyList();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a quick follow-up PR, could you please change this to

public List<AstNode> getChildren() {
    ...
}

This way it has ordered semantics and the getFoo while clunky (imo) plays better with kotlin accessors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I'll track this as a followup in #1610.


public abstract class Literal extends AstNode {
@NotNull
public abstract LiteralKind kind();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly a fat-interface but similar pattern! We are doing a tagged-union, and we don't actually want a LiteralKind enum and the code (aka tag) should go directly on the Literal (like we did with DataType).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try using the code directly on the Literal. I had ran into some slightly repetitive + confusing code pattern when using a LiteralKind:

val litText = when (lit.kind().code()) {
    LiteralKind.NULL -> "NULL"
    LiteralKind.MISSING -> "MISSING"
    LiteralKind.BOOLEAN -> lit.booleanValue().toString()
    ...
    else -> error("Unsupported literal kind ${lit.kind()}")
}

^ lit.kind().code() rather than just lit.code().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to be more like DataType in the lateset commit.

Comment on lines 77 to 79
public static Literal litTypedString(@NotNull DataType type, @NotNull String value) {
return new LiteralTypedString(type, value);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to be as restrictive as possible, you can have explicit time(boolean withTimeZone, int precision, @NotNull String value) for the datetime values, rather than the typed string allowing all types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think there's some value in having a public factory method to construct typed string literals w/ arguments DataType and String. For instance, if an API user had a DataType already (date, time, timetz, timestamp, timestamptz), they could simply call this factory method. If we only make the specific typed string methods public (e.g. time(boolean withTimeZone, int precision, @NotNull String value), then they would need to perform additional conversion logic.

We could always add such methods in the future.

import java.util.Collection;
import java.util.Collections;

public abstract class Literal extends AstNode {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the flexibility PType provides with the fat interface, I explicitly wrote in the Javadocs that consumers of this should not author their own implementations. See

* Users should NOT author their own implementation. The current recommendation is to use the static methods
* (exposed by this interface) to instantiate a type.
.

Could add that to this API as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up changing this to not be an abstract class and a concrete class extending AstEnum (like DataType) -- #1650 (comment). So adding another implementation shouldn't be an issue anymore.

}
DataType.TIME, DataType.TIME_WITH_TIME_ZONE -> {
val time = DateTimeUtils.parseTimeLiteral(typedString)
val precision = type.precision ?: 6
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe these are safe to !!.

Copy link
Member Author

@alancai98 alancai98 Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataType's precision method returns a nullable integer (null indicates no precision was explicitly specified like for TIME WITH TIME ZONE '12:34:56'), so I think we still need to check for null w/ the elvis operator.

Copy link
Member Author

@alancai98 alancai98 Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm actually seems like the parser is eagerly inferring the precision from the time/timestamp string when the type precision argument is unspecified -- https://github.com/partiql/partiql-lang-kotlin/blob/v1-ast-remove-partiqlvalue/partiql-parser/src/main/kotlin/org/partiql/parser/internal/PartiQLParserDefault.kt#L2201-L2208. I don't think we should be doing this eager inference in the parser so I'll change.

Changed in latest commit.

Comment on lines -2205 to -2238
/**
* With the <string> and <int> nodes of a literal time expression, returns the parsed string and precision.
* TIME (<int>)? (WITH TIME ZONE)? <string>
*/
private fun getTimeStringAndPrecision(
stringNode: TerminalNode,
integerNode: TerminalNode?,
): Pair<String, Int> {
val timeString = stringNode.getStringValue()
val precision = when (integerNode) {
null -> {
try {
getPrecisionFromTimeString(timeString)
} catch (e: Exception) {
throw error(stringNode.symbol, "Unable to parse precision.", e)
}
}
else -> {
val p = integerNode.text.toBigInteger().toInt()
if (p < 0 || 9 < p) throw error(integerNode.symbol, "Precision out of bounds")
p
}
}
return timeString to precision
}

private fun getPrecisionFromTimeString(timeString: String): Int {
val matcher = GENERIC_TIME_REGEX.toPattern().matcher(timeString)
if (!matcher.find()) {
throw IllegalArgumentException("Time string does not match the format 'HH:MM:SS[.ddd....][+|-HH:MM]'")
}
val fraction = matcher.group(1)?.removePrefix(".")
return fraction?.length ?: 0
}
Copy link
Member Author

@alancai98 alancai98 Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(self-review): this prior parsing logic was deleted for the following reasons

  • was eagerly inferring the seconds precision for TIME and TIMESTAMP when the precision argument was unspecified. Thus it was classifying TIME '12:34:56' as the same as TIME (0) '12:34:56'"
  • applying the time string's regex check for timestamp
  • only performing the string regex check when the precision was unspecified

TIME and TIMESTAMP-specific validation has been moved to their respective ANTLR visitor methods.

@alancai98 alancai98 force-pushed the v1-ast-remove-partiqlvalue branch from 55ec171 to ebd8411 Compare December 6, 2024 21:28
Copy link
Member

@RCHowell RCHowell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved on the modeling, no comment/preferences on changes to partiql-parser since the proof is in the tests and the details are internal. I have some nits but I'll spare you those considering the cycles of this PR. Thanks for closing on this.

* TODO docs
*/
@EqualsAndHashCode(callSuper = false)
public class Literal extends AstEnum {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modeling looks good to me; very concise union and easy to extend if needed.

@alancai98 alancai98 dismissed johnedquinn’s stale review December 7, 2024 01:08

Addressed comments. Received ship it offline.

@alancai98 alancai98 merged commit 44117b5 into v1 Dec 7, 2024
14 checks passed
@alancai98 alancai98 deleted the v1-ast-remove-partiqlvalue branch December 7, 2024 01:09
@alancai98 alancai98 linked an issue Dec 10, 2024 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[V1] Remove PartiQLValue from the AST
3 participants