-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updates catalog names for case-insensitive lookup #1510
Conversation
if (ignoreCase && !(this.identifier.matches(other.identifier))) { | ||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of ignoreCase
here? If I write (pseudo-code):
Id("geh").matches(Id("geH"), ignoreCase = true)
Assume both of the above are delimited.
The above won't actually ignore the case. The Part.matches
requires that one of them be non-delimited.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beyond this, what is the purpose of matches
? From what I'm gathering, Name
is the resolved unix-style path of a DB object, whereas Identifier
is unresolved. Why would you want to compare two identifiers? In the current implementation, we have BindingPath.matches(ConnectorPath)
(AKA unresolved.matches(resolved)
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct that the above won't actually ignore the case (because these are delimited), but consider if one or more of the identifiers is "regular" but the planner is in case-sensitive mode.
The ignoreCase = false
is an explicit way to ignore regular vs. delimited and just compare the text.
On the second point, what is the purpose of matches? I don't actually believe we need to compare two identifiers, but I do know the catalog resolution implementation will be different than bindingpath.matches(connectorpath) of today. I can remove the match logic later.
/** | ||
* Catalogs is used to provide the default catalog and possibly others by name. | ||
*/ | ||
public interface Catalogs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the idea here is to abstract the Map<String, ConnectorMetadata>
from the session so that a customer may implement their own catalog provider. I understand that you can have a default impl (which is really just a map) that you construct with the planner builder, but consider something like Catalogs
backed by a DynamoDB implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would need to spend more time on the base branch v1-metadata
, to understand more on how planner / spi uses those interface.
Here is my understanding on the subject of identifier case sensitivity.
Consider the following PartiQL Query, planned in CASE-SENSITIVE Mode for reduced complicity
SELECT tbl.a FROM TBL as tbl
We should have a normalization pass to produce an equivalent query:
SELECT VALUE {'c' : "tbl"."a"} FROM "TBL" as "tbl"
After the normalization, during the planning stage, our planner should be request metadata on "TBL". This is: a case-sensitive identifier with symbol "TBL".
In the specific connector, the connector implementer should be able to make a conscious choice on whether to honor the case sensitivity flag requested by PartiQL.
In other words: the connector implementor may choose to honor the case sensitivity rules of partiql, and only search for exact match of "TBL", or it could possible do matching with case ignored or even folding down the case.
In which case the connector implementer is responsible for configuring PartiQL and implement the connector-specific lookup rule.
In the above example, the retrieval of from source metadata (metadata of "TBL") is the only time we need to call the connector, post from source, we bind the "TBL" with "tbl" and everything in within PartiQL.
Assume the DDL on for relation "TBL" is:
CREATE TABLE "TBL" (
a INT2
)
Case One: "FooConnector", which connects to a relational database "Foo" that lowercase the case insensitive identifier.
retrieve of "TBL" -> BAG<STRUCT<a: INT2>>
Projection: "tbl"."a" -> INT2
Case Two: "BarConnector", which connects to a relational database "Bar" that uppercase the case insensitive identifier.
retrieve of "TBL" -> BAG<STRUCT<A: INT2>>
Projection: "tbl"."a" -> Unresolved.
In which case: this is a bad configuration in regards to PartiQL mode.
Case three: "BazConnector", which connects to a relational database "Baz" that dishonor delimited identifier and lower case everything:
In which case, the implementation of Baz Connector, upon receiving request for "TBL", should consider searching for lowered case "tbl"
@yliuuuu please consider this outside the context of connector/spi as that is merely a pattern for implementing a catalog.
I argue they MUST honor the sensitivity flag if we want to have consistent semantics.
The implementer must honor the case configuration or else it could lead to inconsistent behavior.
The customer is indeed responsible for lookup, this is precisely their implementation of either
I think you are misunderstanding what the identifier is. If you look more closely, you will see that an identifier is composed of parts which are either regular or delimited. We send the identifier to a catalog implementation to get any associated metadata. It looks more like this,
Again we only forward that the identifier is REGULAR (not delimited) — it is the responsibility of the catalog implementor to handle the regular identifier resolution based upon their system.
This is correct! but only when the identifier is REGULAR. If the identifier is DELIMITED you cannot change its case. However, if the identifier is REGULAR then the catalog is responsible for its matching ie case normalization or insensitive matching. |
My argument is that PartiQL only have case insensitive identifier in "Case-Insensitive" mode. The PartiQL Semantics for
is mode dependent. Assuming case preservation on:
I think the invocation to connector is a data/metadata retrieval only. It does not effect PartiQL Semantic. Ideally with the different modes their should be not need to do the custom normalization logic unless they are in "case-insensitive" mode, but we have no control over how connectors implement the retrieval function. |
It does because it determines what "TBL" actually refers to!
By normalization I mean we literally rewrite the AST based upon the mode. -- input
select a from tbl as t
-- normalized in case-sensitive mode
select "a" from "tbl" as t Put quotes on everything, then we send |
Relevant Issues
#1496
Description
Adds session, path, and identifiers to the catalog package with support for case-insensitive identifiers.
License Information
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.