Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

identifier normalization #1519

Closed
wants to merge 2 commits into from
Closed

identifier normalization #1519

wants to merge 2 commits into from

Conversation

yliuuuu
Copy link
Contributor

@yliuuuu yliuuuu commented Jul 18, 2024

Relevant Issues

Description

  • This PR adds a planner flag that determines the case normalization behavior.

    • UPPERCASE: normalize Lvalue and Rvalue by uppercasing the original text.
    • LOWERCASE: normalize Lvalue and Rvalue by lowercasing the original text.
    • EXACTCASE: normalize Lvalue and Rvalue by preserving the original text.
    • If no flag is set: normalize Lvalue by preserving the original text, no Rvalue normalization.
  • To better separate Lvalue and Rvalue, this PR introduce a Binder Node in AST.

  • The Identifier/Binder normalization happens at the planner package, as the first step for AST normalization.

    • Note that order matter for conducting AST normalization.
    • Subsequence passes may generate binder, i.e., AS alias for from source.
    • The binder generated should always be delimited.
    • Example:
UPPERCASE -> FROM bAr -> FROM "BAR" -> FROM "BAR" AS "BAR"
LOWERCASE -> FROM bAr -> FROM "bar" -> FROM "bar" AS "bar"
EXACTCASE -> FROM bAr -> FROM "bAr" -> FROM "bAr" AS "bAr"
NO FLAG -> FROM bAr -> FROM bAr -> FROM bAr AS "bAr"

Other Information

  • Updated Unreleased Section in CHANGELOG: [YES/NO]

    • No. Targeting v1
  • Any backward-incompatible changes? [YES/NO]

  • Any new external dependencies? [YES/NO]

    • No.
  • Do your changes comply with the Contributing Guidelines
    and Code Style Guidelines? [YES/NO]
    Yes.

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@yliuuuu yliuuuu mentioned this pull request Jul 18, 2024
@yliuuuu yliuuuu marked this pull request as ready for review July 18, 2024 22:41
Copy link

github-actions bot commented Jul 18, 2024

Conformance comparison report-Cross Engine

Base (legacy) eval +/-
% Passing 92.49% 92.60% 0.10%
✅ Passing 5433 5440 7
❌ Failing 441 435 -6
🔶 Ignored 0 0 0
Total Tests 5874 5875 1
Number passing in both: 5179

Number failing in both: 180

Number passing in legacy engine but fail in eval engine: 255

Number failing in legacy engine but pass in eval engine: 261
⁉️ CONFORMANCE REPORT REGRESSION DETECTED ⁉️
The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.
261 test(s) were failing in legacy but now pass in eval. Before merging, confirm they are intended to pass.
The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.

Conformance comparison report-Cross Commit-LEGACY

Base (ee6c814) 6fabd0d +/-
% Passing 92.49% 92.49% 0.00%
✅ Passing 5433 5433 0
❌ Failing 441 441 0
🔶 Ignored 0 0 0
Total Tests 5874 5874 0
Number passing in both: 5433

Number failing in both: 441

Number passing in Base (ee6c814) but now fail: 0

Number failing in Base (ee6c814) but now pass: 0

Conformance comparison report-Cross Commit-EVAL

Base (ee6c814) 6fabd0d +/-
% Passing 92.60% 92.60% 0.00%
✅ Passing 5440 5440 0
❌ Failing 435 435 0
🔶 Ignored 0 0 0
Total Tests 5875 5875 0
Number passing in both: 5440

Number failing in both: 435

Number passing in Base (ee6c814) but now fail: 1

Number failing in Base (ee6c814) but now pass: 1
⁉️ CONFORMANCE REPORT REGRESSION DETECTED ⁉️. The following test(s) were previously passing but now fail:

Click here to see
  • Example 6 — Value Coercion, compileOption: LEGACY
The following test(s) were previously failing but now pass. Before merging, confirm they are intended to pass:
Click here to see
  • Example 6 — Value Coercion, compileOption: LEGACY

@yliuuuu yliuuuu requested review from RCHowell and am357 July 18, 2024 22:56
Comment on lines +18 to +20
internal class NormalizeIdentifier(
caseNormalization: CaseNormalization?
) : AstPass {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Merge this with IdentifierManager
  • Use init to pick the normalization function ie (string) -> string

For example,

internal class NormalizeIdentifier(
    mode: CaseNormalization,
) {

   val normalize: (String) -> String

  init {
      normalize = when (mode) {
          UPPERCASE -> Strings.uppercase
          LOWERCASE -> Strings.lowercase
          EXACTCASE -> { it } 
      }
  }

}

Something like that so we aren't null checking and swithcing on mode every time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this suggestion is based off the assumption that we also case preserve (after normalization), which is indeed how this PR is implemented.

But my thinking is to keep those two separate layer of indirection until at least the RFC is in draft.

@yliuuuu yliuuuu requested a review from RCHowell July 25, 2024 23:28
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, but please keep the same package name "normalize" rather than changing it. I'm ok with moving this to planner, in fact, I think that's a great idea!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove IdentifierManager per my previous comments. The identifier normalization pass should hold/take an enum for the normalization type — that is all. There is no need for the identifier manager.

import org.partiql.ast.identifierSymbol

// The identifier manager class takes in two parameters,
// in case we want to decouple case preservation and case normalization behavior in the future.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have explicitly decided NOT to decouple these per our latest discussion. This is an internal class, so we can always change this later, but for now let's not create edge-cases that we've explicitly decided against.

@yliuuuu
Copy link
Contributor Author

yliuuuu commented Aug 15, 2024

Closing for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants