Updates catalog names for case-insensitive lookup #1510

RCHowell · 2024-07-11T23:01:43Z

Relevant Issues

#1496

Description

Adds session, path, and identifiers to the catalog package with support for case-insensitive identifiers.

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

johnedquinn · 2024-07-15T18:26:26Z

partiql-planner/src/main/kotlin/org/partiql/planner/catalog/Identifier.kt

+        if (ignoreCase && !(this.identifier.matches(other.identifier))) {
+            return false


What's the purpose of ignoreCase here? If I write (pseudo-code):

Id("geh").matches(Id("geH"), ignoreCase = true)

Assume both of the above are delimited.

The above won't actually ignore the case. The Part.matches requires that one of them be non-delimited.
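The part-matching rule described here can be sketched in Kotlin as follows. `Part`, its `Regular`/`Delimited` variants, and this `matches` body are hypothetical stand-ins written for illustration, not the code in this PR. The one rule carried over from the comment is that case is ignored only when at least one side is non-delimited.

```kotlin
// Hypothetical model of one identifier part; not the PR's actual type.
sealed interface Part {
    val text: String

    // Case is ignored only if at least one side is regular (non-delimited);
    // two delimited parts must match exactly.
    fun matches(other: Part): Boolean =
        if (this is Delimited && other is Delimited) {
            text == other.text
        } else {
            text.equals(other.text, ignoreCase = true)
        }

    // A regular (unquoted) part, e.g. geh
    data class Regular(override val text: String) : Part

    // A delimited (double-quoted) part, e.g. "geH"
    data class Delimited(override val text: String) : Part
}
```

Under this rule, `Part.Delimited("geh").matches(Part.Delimited("geH"))` is false no matter what an outer `ignoreCase` flag says, which is the reviewer's point.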

Beyond this, what is the purpose of matches? From what I'm gathering, Name is the resolved unix-style path of a DB object, whereas Identifier is unresolved. Why would you want to compare two identifiers? In the current implementation, we have BindingPath.matches(ConnectorPath) (AKA unresolved.matches(resolved))

RCHowell

Correct that the above won't actually ignore the case (because both are delimited), but consider a scenario where one or more of the identifiers is "regular" while the planner is in case-sensitive mode.

Setting ignoreCase = false is an explicit way to ignore regular vs. delimited and just compare the text.
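One reading of the flag as explained here: `ignoreCase = true` applies the regular/delimited rules, and `ignoreCase = false` skips them and compares raw text. A minimal sketch under that assumption follows. `Part`, `Identifier`, and this `matches` signature are hypothetical and only mirror the names in the discussion.

```kotlin
// Hypothetical identifier part: regular = unquoted, otherwise delimited.
data class Part(val text: String, val regular: Boolean) {
    // Regular/delimited rule: case is ignored if either side is regular.
    fun matches(other: Part): Boolean =
        if (regular || other.regular) text.equals(other.text, ignoreCase = true)
        else text == other.text
}

// Hypothetical multi-part identifier, e.g. catalog.schema.table
data class Identifier(val parts: List<Part>) {
    fun matches(other: Identifier, ignoreCase: Boolean): Boolean {
        if (parts.size != other.parts.size) return false
        return parts.zip(other.parts).all { (a, b) ->
            if (ignoreCase) {
                // Honor regular vs. delimited when deciding case sensitivity.
                a.matches(b)
            } else {
                // Ignore regular vs. delimited; compare only the raw text.
                a.text == b.text
            }
        }
    }
}
```

With `ignoreCase = false`, a regular `TBL` no longer matches a delimited `"tbl"`, which is the behavior a case-sensitive planner would want.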

On the second point (what is the purpose of matches?), I don't believe we actually need to compare two identifiers, but I do know the catalog resolution implementation will differ from today's BindingPath.matches(ConnectorPath). I can remove the match logic later.

johnedquinn · 2024-07-15T19:12:06Z

partiql-planner/src/main/kotlin/org/partiql/planner/catalog/Catalogs.kt

+/**
+ * Catalogs is used to provide the default catalog and possibly others by name.
+ */
+public interface Catalogs {


Does this need to be public?

RCHowell

Yes. The idea here is to abstract the Map<String, ConnectorMetadata> away from the session so that a customer may implement their own catalog provider. I understand that you can have a default implementation (which is really just a map) that you construct with the planner builder, but consider something like a Catalogs implementation backed by DynamoDB.
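The following is a rough sketch of the abstraction described above: a public `Catalogs` interface plus a default implementation that is "really just a map". The `Catalog` type and the `default()`/`get()` member names are hypothetical and are not taken from the PR.

```kotlin
// Hypothetical catalog handle; stands in for ConnectorMetadata.
interface Catalog {
    val name: String
}

// Provides the default catalog and possibly others by name.
interface Catalogs {
    fun default(): Catalog
    fun get(name: String): Catalog?
}

// Default implementation constructed from a map (e.g. by a planner builder).
class MapCatalogs(
    private val defaultName: String,
    private val catalogs: Map<String, Catalog>,
) : Catalogs {
    init {
        require(defaultName in catalogs) { "Unknown default catalog: $defaultName" }
    }

    override fun default(): Catalog = catalogs.getValue(defaultName)

    override fun get(name: String): Catalog? = catalogs[name]
}
```

Because the planner would only see the `Catalogs` interface, a customer could replace `MapCatalogs` with an implementation whose `get` queries an external store such as DynamoDB.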

yliuuuu

I would need to spend more time on the base branch, v1-metadata, to better understand how the planner / SPI uses those interfaces.

Here is my understanding on the subject of identifier case sensitivity.

Consider the following PartiQL query, planned in CASE-SENSITIVE mode for reduced complexity:

SELECT tbl.a FROM TBL as tbl

We should have a normalization pass to produce an equivalent query:

SELECT VALUE {'a' : "tbl"."a"} FROM "TBL" as "tbl"

After the normalization, during the planning stage, our planner should request metadata on "TBL", that is, a case-sensitive identifier with symbol "TBL".

In the specific connector, the connector implementer should be able to make a conscious choice on whether to honor the case sensitivity flag requested by PartiQL.

In other words, the connector implementer may choose to honor the case-sensitivity rules of PartiQL and only search for an exact match of "TBL", or it could possibly match with case ignored, or even fold the case down.

In that case, the connector implementer is responsible for configuring PartiQL and implementing the connector-specific lookup rule.

In the above example, retrieving the FROM-source metadata (the metadata of "TBL") is the only time we need to call the connector. After the FROM source, we bind "TBL" to "tbl", and everything else happens within PartiQL.

Assume the DDL for relation "TBL" is:

CREATE TABLE "TBL" (
    a INT2
)

Case One: "FooConnector", which connects to a relational database "Foo" that lowercases case-insensitive identifiers.

retrieval of "TBL" -> BAG<STRUCT<a: INT2>>

Projection: "tbl"."a" -> INT2

Case Two: "BarConnector", which connects to a relational database "Bar" that uppercases case-insensitive identifiers.

retrieval of "TBL" -> BAG<STRUCT<A: INT2>>

Projection: "tbl"."a" -> Unresolved.
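To see why Case Two fails while Case One succeeds, here is a small sketch of the delimited projection "tbl"."a" resolved against each connector's schema. Modeling a schema as a plain map from field name to type name is an assumption made for illustration, and `resolveDelimited`, `fooSchema`, and `barSchema` are hypothetical names.

```kotlin
// Resolve a delimited (case-sensitive) field reference: exact match only.
fun resolveDelimited(schema: Map<String, String>, field: String): String? = schema[field]

// Hypothetical schema from FooConnector: Foo lowercased the regular column a.
val fooSchema: Map<String, String> = mapOf("a" to "INT2")

// Hypothetical schema from BarConnector: Bar uppercased the same column to A.
val barSchema: Map<String, String> = mapOf("A" to "INT2")
```

`resolveDelimited(fooSchema, "a")` yields `INT2`, while `resolveDelimited(barSchema, "a")` yields `null`, mirroring the "Unresolved" projection above.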

In this case, this is a bad configuration with regard to the PartiQL mode.

Case Three: "BazConnector", which connects to a relational database "Baz" that ignores delimited identifiers and lowercases everything:

In this case, the BazConnector implementation, upon receiving a request for "TBL", should consider searching for the lowercased "tbl".

RCHowell · 2024-07-15T22:13:00Z

@yliuuuu, please consider this outside the context of connector/SPI, as that is merely one pattern for implementing a catalog.

After the normalization, during the planning stage, our planner should request metadata on "TBL", that is, a case-sensitive identifier with symbol "TBL".
In the specific connector, the connector implementer should be able to make a conscious choice on whether to honor the case sensitivity flag requested by PartiQL.

I argue they MUST honor the sensitivity flag if we want to have consistent semantics.

In other words, the connector implementer may choose to honor the case-sensitivity rules of PartiQL and only search for an exact match of "TBL", or it could possibly match with case ignored, or even fold the case down.

The implementer must honor the case configuration or else it could lead to inconsistent behavior.

In that case, the connector implementer is responsible for configuring PartiQL and implementing the connector-specific lookup rule.

The customer is indeed responsible for lookup; this is precisely their implementation of either:

- getTable(name: Name) // get by exact/resolved name
- getTable(identifier: Identifier) // lookup based on identifier (MAYBE case-insensitive)
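The two lookups above can be sketched as a catalog interface with two `getTable` overloads. `Name`, `Part`, `Identifier`, `Table`, `Catalog`, and `InMemoryCatalog` are hypothetical types for illustration only. The in-memory catalog here honors the regular/delimited flag by matching regular parts case-insensitively and delimited parts exactly.

```kotlin
// Exact, resolved path of a table.
data class Name(val parts: List<String>)

// Unresolved identifier part: regular = unquoted, otherwise delimited.
data class Part(val text: String, val regular: Boolean)
data class Identifier(val parts: List<Part>)

data class Table(val name: Name, val schema: String)

interface Catalog {
    // Get by exact/resolved name.
    fun getTable(name: Name): Table?

    // Look up based on an identifier; regular parts MAY match case-insensitively.
    fun getTable(identifier: Identifier): Table?
}

class InMemoryCatalog(tables: List<Table>) : Catalog {
    private val byName: Map<Name, Table> = tables.associateBy { it.name }

    override fun getTable(name: Name): Table? = byName[name]

    // Returns the first match; a real catalog might report ambiguity instead.
    override fun getTable(identifier: Identifier): Table? =
        byName.values.firstOrNull { table ->
            table.name.parts.size == identifier.parts.size &&
                table.name.parts.zip(identifier.parts).all { (stored, part) ->
                    if (part.regular) stored.equals(part.text, ignoreCase = true)
                    else stored == part.text
                }
        }
}
```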

In the above example, retrieving the FROM-source metadata (the metadata of "TBL") is the only time we need to call the connector. After the FROM source, we bind "TBL" to "tbl", and everything else happens within PartiQL.

Assume the DDL for relation "TBL" is:

CREATE TABLE "TBL" (a INT2)

Case One: "FooConnector", which connects to a relational database "Foo" that lowercases case-insensitive identifiers.
retrieval of "TBL" -> BAG<STRUCT<a: INT2>>
Projection: "tbl"."a" -> INT2

I think you are misunderstanding what the identifier is. If you look more closely, you will see that an identifier is composed of parts which are either regular or delimited. We send the identifier to a catalog implementation to get any associated metadata.

It looks more like this:

SELECT tbl.a FROM TBL as tbl

val tbl = env.getTable(Identifier.regular("TBL")) // we have a REGULAR identifier (not-delimited)
val schema = tbl.getSchema()

println(schema)

// << { a: int2 } >>

Case Two: "BarConnector", which connects to a relational database "Bar" that uppercases case-insensitive identifiers.
retrieval of "TBL" -> BAG<STRUCT<A: INT2>>
Projection: "tbl"."a" -> Unresolved.

Again, we only forward that the identifier is REGULAR (not delimited); it is the responsibility of the catalog implementer to handle regular identifier resolution based on their system.

In this case, this is a bad configuration with regard to the PartiQL mode.
Case Three: "BazConnector", which connects to a relational database "Baz" that ignores delimited identifiers and lowercases everything:
In this case, the BazConnector implementation, upon receiving a request for "TBL", should consider searching for the lowercased "tbl".

This is correct! But only when the identifier is REGULAR. If the identifier is DELIMITED, you cannot change its case. However, if the identifier is REGULAR, then the catalog is responsible for matching it, i.e. case normalization or case-insensitive matching.
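Here is a sketch of the rule as a catalog for a "Baz"-like system that stores every name lowercased. It normalizes only REGULAR parts and leaves a DELIMITED part's case untouched. `Part`, `BazCatalog`, and storing schemas as strings are hypothetical choices for illustration.

```kotlin
// Hypothetical identifier part: regular = unquoted, otherwise delimited.
data class Part(val text: String, val regular: Boolean)

// Hypothetical catalog for a system that stores all table names lowercased.
class BazCatalog(private val tables: Map<String, String>) {

    // Normalize only REGULAR parts; a DELIMITED part's case is never changed.
    fun normalize(part: Part): String =
        if (part.regular) part.text.lowercase() else part.text

    fun getTable(part: Part): String? = tables[normalize(part)]
}
```

A regular `TBL` is lowercased and found as `tbl`. A delimited `"TBL"` is looked up verbatim and not found.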

yliuuuu · 2024-07-15T23:13:09Z

My argument is that PartiQL only has case-insensitive identifiers in "Case-Insensitive" mode.

The PartiQL Semantics for

SELECT tbl.a FROM TBL as "tbl"

is mode-dependent.

Assuming case preservation on:

-- Folding up
-- asking for ("TBL", regular = false)
SELECT "TBL"."A" FROM "TBL" as "tbl"

-- Folding down
-- asking for ("tbl", regular = false)
SELECT "tbl"."a" FROM "tbl" as "tbl"

-- Case Sensitive
-- asking for ("TBL", regular = false)
SELECT "tbl"."a" FROM "TBL" as "tbl"

-- Case Insensitive
-- asking for ("TBL", regular = true)
SELECT tbl.a FROM TBL as "tbl"
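The four modes above can be summarized as a function from the regular (unquoted) `TBL` in the query text to what the planner would ask the catalog for. This is a sketch of the table above, not planner code. `Mode`, `Request`, and `normalizeRegular` are hypothetical names, and already-delimited identifiers such as the alias "tbl" are assumed to pass through unchanged.

```kotlin
// Hypothetical enumeration of the modes listed above.
enum class Mode { FOLD_UP, FOLD_DOWN, CASE_SENSITIVE, CASE_INSENSITIVE }

// What the planner sends to the catalog: the text plus the regular flag.
data class Request(val text: String, val regular: Boolean)

// Normalize one REGULAR identifier from the query text according to the mode.
fun normalizeRegular(text: String, mode: Mode): Request = when (mode) {
    Mode.FOLD_UP -> Request(text.uppercase(), regular = false)
    Mode.FOLD_DOWN -> Request(text.lowercase(), regular = false)
    Mode.CASE_SENSITIVE -> Request(text, regular = false)
    Mode.CASE_INSENSITIVE -> Request(text, regular = true)
}
```

Only the case-insensitive mode leaves `regular = true`, so it is the only mode in which the catalog is asked to do its own case matching.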

I argue they MUST honor the sensitivity flag if we want to have consistent semantics.

I think the invocation of the connector is data/metadata retrieval only. It does not affect PartiQL semantics.

Ideally, with the different modes, there should be no need for custom normalization logic unless they are in "case-insensitive" mode, but we have no control over how connectors implement the retrieval function.

RCHowell · 2024-07-15T23:47:54Z

I think the invocation of the connector is data/metadata retrieval only. It does not affect PartiQL semantics.

It does because it determines what "TBL" actually refers to!

My argument is that PartiQL only has case-insensitive identifiers in "Case-Insensitive" mode.

By normalization I mean we literally rewrite the AST based upon the mode.

-- input
select a from tbl as t

-- normalized in case-sensitive mode
select "a" from "tbl" as t

We put quotes on everything and then send ("tbl", regular = false) to the catalog APIs. But you are right that we have no control over whether a customer respects the regular flag.

Commit 9f558e4: Updates catalog names for case-insensitive lookup

RCHowell requested a review from johnedquinn July 11, 2024 23:01

RCHowell mentioned this pull request Jul 15, 2024

PartiQL-Environment Tracking Issue #1496 (Closed, 17 tasks)

johnedquinn reviewed Jul 15, 2024

yliuuuu reviewed Jul 15, 2024

johnedquinn approved these changes Jul 15, 2024

RCHowell closed this Jul 16, 2024

RCHowell reopened this Jul 16, 2024

RCHowell merged commit 1c3085b into v1-metadata Jul 16, 2024
8 of 20 checks passed
