Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Identifier mode #1511

Closed
wants to merge 3 commits into from
Closed

Identifier mode #1511

wants to merge 3 commits into from

Conversation

yliuuuu
Copy link
Contributor

@yliuuuu yliuuuu commented Jul 12, 2024

Relevant Issues

Note

  • This is a draft PR that allows us to further experiment with the idea of adding modes for lvalue and rvalue behavior.
  • Some tests cases are still to-be-fixed, but the purpose of this PR is as a code reference point to further discuss the identifier behavior.

Semantics

  • This draft PR enables adding planner modes to determine Case Preservation and Variable Look Up behavior.

  • Case Preservation:

    • A binary flag set on the planner:
    • This flag determines the lvalue behavior (As Binding, DDL, etc).
    • If this flag is included: Lvalue (AS binding, DDL, etc) are case-sensitive and stored in the environment as-is.
    • if this flag is not included: Lvalue are normalized using an implementation defined normalization rule.
      • In this PR we made a conscious decision to set this implementation defined normalization rule to be using the input string.
      • This is: even without the case preservation flag, the lvalue will behave case sensitively.
      • This allows us to reduce one moving element for the purpose of testing.
      • While technically not breaking the semantics of "implementation-defined normalization rule", it does bug the question: do we also want to allow for different case preservation flag? If so, does it make sense to set case preservation to "FOLDING DOWN" and set Variable look up behavior to "FOLDING UP" ?
  • Variable Look Up (RValue):

    • FOLDING UP: Using upper case text to look up
    • FOLDING DOWN: Using lower case text to look up
    • SENSITIVE: Using original text to look up
    • INSENSITIVE: "Match" behavior - text comparison with case ignored. This is the current behavior.

Modeling

  • Behavior

    • Consider the follow expression SELECT bAr .....
      • Folding UP -> the query should behave the same as SELECT "BAR" ...
      • Folding DOWN -> the query should behave the same as SELECT "bar" ...
      • SENSITIVE -> the query should behave the same as SELECT "bAR" ...
    • Based on the above observation: we can keep the resolution logic as-is, with the only change to be normalize the identifier in "planning"
    • This PR does this during AST normalization steps, which is an internal process during planning.
  • AST Node: Binder

    • The Binder node differs from Identifier in two aspect:
      • It is used for LValue.
      • It is always present as unqualified.
      • i.e., t.c.a AS A, t.c.a is a qualified identifier and t, c, a are identifier symbol. A is a binder.
    • The addition of binder node allow us to modifier the lvalue and rvalue on the AST level easily, without the need to go down each node and distinguish the type of identifier based on the node parameter. (check if the identifier symbol is an as_alias, etc).

Removed to reduce noise in this PR.
- Decouple SelectNormalization with RelConverter.
- The previous invocation of SelectNormalization is invoked by RelConverter.
- This is due to the need of handling subquery coercion.
- I found this jumping between AST and Relational Operation introduces confusion, but to decouple the two, we are forced to add a boolean flag in the AST to indicate coercible. Not something I necessarily like either...
- We can revert this decouple per later discussion.

Other Information

  • Updated Unreleased Section in CHANGELOG: [YES/NO]

    • No.
  • Any backward-incompatible changes? [YES/NO]

    • Yes.
  • Any new external dependencies? [YES/NO]

    • No.
  • Do your changes comply with the Contributing Guidelines
    and Code Style Guidelines? [YES/NO]

  • Yes.

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link

github-actions bot commented Jul 13, 2024

Conformance comparison report-Cross Engine

Base (eval) legacy +/-
% Passing 87.05% 92.60% 5.55%
✅ Passing 5097 5421 324
❌ Failing 758 433 -325
🔶 Ignored 0 0 0
Total Tests 5855 5854 -1
Number passing in both: 4894

Number failing in both: 230

Number passing in eval engine but fail in legacy engine: 203

Number failing in eval engine but pass in legacy engine: 528
⁉️ CONFORMANCE REPORT REGRESSION DETECTED ⁉️
The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.
528 test(s) were failing in eval but now pass in legacy. Before merging, confirm they are intended to pass.
The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.

Conformance comparison report-Cross Commit-EVAL

Base (8b4e0d3) 2080440 +/-
% Passing 90.59% 87.05% -3.53%
✅ Passing 5322 5097 -225
❌ Failing 553 758 205
🔶 Ignored 0 0 0
Total Tests 5875 5855 -20
Number passing in both: 5047

Number failing in both: 499

Number passing in Base (8b4e0d3) but now fail: 260

Number failing in Base (8b4e0d3) but now pass: 51
⁉️ CONFORMANCE REPORT REGRESSION DETECTED ⁉️
The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.
51 test(s) were previously failing but now pass. Before merging, confirm they are intended to pass
The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.

Conformance comparison report-Cross Commit-LEGACY

Base (8b4e0d3) 2080440 +/-
% Passing 92.51% 92.60% 0.09%
✅ Passing 5434 5421 -13
❌ Failing 440 433 -7
🔶 Ignored 0 0 0
Total Tests 5874 5854 -20
Number passing in both: 5417

Number failing in both: 432

Number passing in Base (8b4e0d3) but now fail: 1

Number failing in Base (8b4e0d3) but now pass: 4
⁉️ CONFORMANCE REPORT REGRESSION DETECTED ⁉️. The following test(s) were previously passing but now fail:

Click here to see
  • MYSQL_SELECT_29, compileOption: LEGACY
The following test(s) were previously failing but now pass. Before merging, confirm they are intended to pass:
Click here to see
  • pathIndexStructLiteral, compileOption: PERMISSIVE

  • subscript with variable in lowercase, compileOption: PERMISSIVE

  • subscript with variable in uppercase, compileOption: PERMISSIVE

  • subscript with variable in mixed case, compileOption: PERMISSIVE

Comment on lines +280 to +286
binder::{
symbol: string,
case_sensitivity: [
SENSITIVE,
INSENSITIVE,
],
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
binder::{
symbol: string,
case_sensitivity: [
SENSITIVE,
INSENSITIVE,
],
}
binder::{
text: string,
normal: bool,
}

partiql-ast/src/main/resources/partiql_ast.ion Outdated Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I somewhat think we should modify the definition for identifier to be like,

identifier::{
   qualifier: list::[binder],
   identifier: binder,
}

Or something similar. The point is, having identifier with a qualifier rather than a root and steps. I guess this is also difficult because identifier paths in PartiQL are like path expressions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Binder is different than Identifier.

Identifier: used at place where we do variable reference. It can be optionally qualified.

Binder: used at place where we do variable declaration. It can not be qualified (AFAIK).

public fun default(): PartiQLPlanner = PartiQLPlannerBuilder().build()
public fun default(): PartiQLPlanner =
PartiQLPlannerBuilder()
.casePreserve()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave out casePreserve for now because we agreed this is is the only lvalue behavior for now.

public fun default(): PartiQLPlanner =
PartiQLPlannerBuilder()
.casePreserve()
.lookUpBehavior("INSENSITIVE")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest an enum like

enum class CaseNormalForm {
   LOWERCASE,
   UPPERCASE,
   EXACTCASE,
}

This determines if we normalize identifiers, ie how we handle SQL/PartiQL "normal" identifiers.

LOWERCASE -> AbC -> "abc"
UPPERCASE -> AbC -> "ABC"
EXACTCASE -> AbC -> "AbC"

// null (aka insensitive, no normalization)
AbC -> AbC

@yliuuuu yliuuuu marked this pull request as ready for review July 16, 2024 00:04
@yliuuuu yliuuuu changed the title [Draft] - Identifier mode Identifier mode Jul 16, 2024
@yliuuuu
Copy link
Contributor Author

yliuuuu commented Jul 18, 2024

Close in favor of PR #1519

@yliuuuu yliuuuu closed this Jul 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants