Harmonize representations for field names to Vec<String>
#591
Comments
Allowing such a case would also require a change to the table scan API, e.g. we need a way to allow the user to tell us what
Also cc @rdblue
FWIW. Can we introduce a structure similar to
This is an implementation detail that is not part of the spec. But I don't think it is worth bothering to enable both The reference implementation has been this way for about 7 years and no one has ever complained or, to my knowledge, hit a problem with this in practice. I highly recommend focusing time and effort on other improvements.
This is not so much about this specific use case, which I don't care much about either, but about having two different representations for the same entity. It took me a while to formulate it this clearly. Let's assume, for example, that we want to add a field to a schema; then the dot representation in the https://github.com/apache/iceberg/blob/e06b069529be3d3d389b156646e751de3753feb0/core/src/main/java/org/apache/iceberg/SchemaUpdate.java#L97-L103 We are even inclined to document in some places that a dot is nothing special here:
All of this could be solved by using a globally unique identifier for a field, which for me is very clearly I stumbled across this again when implementing
Currently, due to the way the name-to-id mapping is built, we cannot have dots in column names if they collide with a struct.
The following schema fails to build:
By prohibiting this we follow the Java implementation.
There is nothing in the iceberg spec that prohibits these names, so I think we should allow them.
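To make the collision concrete, here is a minimal, self-contained sketch. It is not the iceberg-rust implementation and not the schema from this issue: `Field`, `visit_dotted`, and `index_by_dotted_name` are hypothetical names, and the sketch only assumes that the name-to-id index is keyed by dot-joined paths.

```rust
use std::collections::HashMap;

/// Toy schema field (hypothetical): a leaf when `children` is empty,
/// otherwise a struct with nested fields.
pub struct Field {
    pub name: String,
    pub id: i32,
    pub children: Vec<Field>,
}

impl Field {
    pub fn leaf(name: &str, id: i32) -> Self {
        Field { name: name.to_string(), id, children: Vec::new() }
    }

    pub fn nested(name: &str, id: i32, children: Vec<Field>) -> Self {
        Field { name: name.to_string(), id, children }
    }
}

/// Record `field` under its dot-joined full name, then recurse into children.
fn visit_dotted(
    prefix: &str,
    field: &Field,
    index: &mut HashMap<String, i32>,
) -> Result<(), String> {
    let full = if prefix.is_empty() {
        field.name.clone()
    } else {
        format!("{}.{}", prefix, field.name)
    };
    // Two different fields that flatten to the same string cannot coexist.
    if let Some(existing) = index.insert(full.clone(), field.id) {
        return Err(format!(
            "name '{}' maps to both id {} and id {}",
            full, existing, field.id
        ));
    }
    for child in &field.children {
        visit_dotted(&full, child, index)?;
    }
    Ok(())
}

/// Build a name-to-id index keyed by dot-joined paths.
pub fn index_by_dotted_name(fields: &[Field]) -> Result<HashMap<String, i32>, String> {
    let mut index = HashMap::new();
    for field in fields {
        visit_dotted("", field, &mut index)?;
    }
    Ok(index)
}
```

With this model, a schema holding both a struct `foo` with child `bar` and a top-level column literally named `foo.bar` is rejected, because both flatten to the key `foo.bar`.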
As a user, I expect column names to behave like namespaces or databases that contain dots: the dot needs to be escaped. As escaping depends on the query engine, and neither iceberg-rust nor Iceberg Java has an SQL parser, those libraries should not take that option away from the engine or make that decision in its stead.
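As an illustration of what "the engine decides" could look like, the sketch below shows a hypothetical engine-side helper that turns a user-written identifier into path segments. The backtick quoting rule and the name `parse_identifier` are assumptions for this example only; real engines differ, and escaped backticks inside a quoted segment are not handled.

```rust
/// Split an identifier on unquoted dots. A backtick-quoted segment may
/// contain dots. (Hypothetical quoting rule, chosen for illustration.)
pub fn parse_identifier(input: &str) -> Result<Vec<String>, String> {
    let mut segments = Vec::new();
    let mut current = String::new();
    let mut quoted = false;

    for ch in input.chars() {
        match ch {
            // A backtick toggles quoting and is not part of the name.
            '`' => quoted = !quoted,
            // An unquoted dot ends the current segment.
            '.' if !quoted => segments.push(std::mem::take(&mut current)),
            other => current.push(other),
        }
    }
    if quoted {
        return Err(format!("unterminated quote in '{}'", input));
    }
    segments.push(current);

    if segments.iter().any(|s| s.is_empty()) {
        return Err(format!("empty name segment in '{}'", input));
    }
    Ok(segments)
}
```

Under these assumptions, `foo.bar` parses to the two segments `foo` and `bar`, while `` `foo.bar` `` parses to the single segment `foo.bar`; the library only ever sees the resulting segments.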
I propose to change the representation of a column name to Vec<String> instead of just a "string with dots". It would also make accessors compatible between schemas, even if we decide to keep this artificial restriction.
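A minimal sketch of the proposed representation, assuming a toy schema model rather than the actual iceberg-rust types (`Field`, `path`, `visit_path`, and `index_by_path` are hypothetical names): when the index is keyed by path segments, a nested field and a top-level column whose name contains a dot no longer share a key.

```rust
use std::collections::HashMap;

/// Toy schema field (hypothetical): a leaf when `children` is empty,
/// otherwise a struct with nested fields.
pub struct Field {
    pub name: String,
    pub id: i32,
    pub children: Vec<Field>,
}

/// Convenience helper to build a path from string literals.
pub fn path(segments: &[&str]) -> Vec<String> {
    segments.iter().map(|s| s.to_string()).collect()
}

/// Record `field` under its segment path, then recurse into children.
fn visit_path(
    prefix: &mut Vec<String>,
    field: &Field,
    index: &mut HashMap<Vec<String>, i32>,
) -> Result<(), String> {
    prefix.push(field.name.clone());
    // Only true duplicates (same segments at the same depth) are rejected.
    if let Some(existing) = index.insert(prefix.clone(), field.id) {
        return Err(format!(
            "path {:?} maps to both id {} and id {}",
            prefix, existing, field.id
        ));
    }
    for child in &field.children {
        visit_path(prefix, child, index)?;
    }
    prefix.pop();
    Ok(())
}

/// Build a name-to-id index keyed by path segments instead of a dotted string.
pub fn index_by_path(fields: &[Field]) -> Result<HashMap<Vec<String>, i32>, String> {
    let mut index = HashMap::new();
    let mut prefix = Vec::new();
    for field in fields {
        visit_path(&mut prefix, field, &mut index)?;
    }
    Ok(index)
}
```

Here `path(&["foo", "bar"])` and `path(&["foo.bar"])` are distinct keys, so both fields can be indexed side by side; how a user-facing string is split into segments is left to the caller.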