Add ExtensionType for uuid and map to parquet logical type
#5822
base: master
Maybe ExtensionType could be a trait, so that it is externally implementable and not limited to canonical extension types?
This makes sense to me, and seems like an unobtrusive way to provide better ergonomics for extension types.
That being said I've limited exposure to them so getting some broader perspectives might be valuable, perhaps on the mailing list or something?
I haven't had time to work on this, but I'm planning to pick this up later.
Thanks @mbrobbel -- marking this PR as draft as I think it still has planned but not yet completed work.
Thanks for the PR. This seems very useful for supporting logical types that are not yet mapped, e.g. json.
Rationale for this change
It would be nice to better support reading and writing the canonical uuid extension type with the arrow and parquet crates, i.e. mapping between the arrow extension type and the parquet logical uuid type.
What changes are included in this PR?
This adds an ExtensionType trait, some impls for canonical extension types, and a CanonicalExtensionTypes enum for canonical extension types.
Are there any user-facing changes?
Users can now annotate their logical types with extension types, and for uuid they are propagated via the arrow writer to map to the parquet uuid logical type.
This needs better tests and better docs, but I'd like to get some feedback on the approach first, because there are many different ways to implement this.
I quickly tested this change with narrow, and those uuid fields (in the parquet file) are now picked up as uuid instead of blob by DuckDB.
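To make the approach described above easier to discuss, here is a minimal, std-only sketch of the general idea: a trait for extension types, a uuid impl, and a lookup that maps an annotated field to a parquet logical type. It is not the code in this PR and not the arrow or parquet crate API. `StorageType`, `ParquetLogicalType`, `Field`, `with_extension_type`, and `parquet_logical_type` are hypothetical stand-ins invented for illustration. The only details taken from the Arrow format itself are the `ARROW:extension:name` metadata key, the canonical name `arrow.uuid`, and its 16-byte fixed-size binary storage.

```rust
use std::collections::HashMap;

/// Field metadata key that the Arrow format uses to carry the extension name.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";

/// Hypothetical, simplified stand-in for an Arrow storage data type.
#[derive(Debug, Clone, PartialEq)]
enum StorageType {
    FixedSizeBinary(usize),
    Utf8,
}

/// Hypothetical, simplified stand-in for a parquet logical type annotation.
#[derive(Debug, Clone, PartialEq)]
enum ParquetLogicalType {
    Uuid,
}

/// Hypothetical trait: any crate could implement it, so it is not limited
/// to canonical extension types. The trait in the PR may look different.
trait ExtensionType {
    /// Name stored under `ARROW:extension:name`.
    const NAME: &'static str;

    /// Whether this extension type may annotate the given storage type.
    fn supports(storage: &StorageType) -> bool;
}

/// Canonical uuid extension type: 16 bytes of fixed-size binary storage.
struct Uuid;

impl ExtensionType for Uuid {
    const NAME: &'static str = "arrow.uuid";

    fn supports(storage: &StorageType) -> bool {
        matches!(storage, StorageType::FixedSizeBinary(16))
    }
}

/// Hypothetical, simplified stand-in for an Arrow field.
struct Field {
    storage: StorageType,
    metadata: HashMap<String, String>,
}

impl Field {
    fn new(storage: StorageType) -> Self {
        Field { storage, metadata: HashMap::new() }
    }

    /// Annotate the field with extension type `E`, rejecting storage types
    /// that the extension type does not support.
    fn with_extension_type<E: ExtensionType>(mut self) -> Result<Self, String> {
        if !E::supports(&self.storage) {
            return Err(format!("{} does not support {:?}", E::NAME, self.storage));
        }
        self.metadata
            .insert(EXTENSION_NAME_KEY.to_string(), E::NAME.to_string());
        Ok(self)
    }
}

/// What a writer could do: inspect the extension name on the field and
/// pick the matching parquet logical type, if one exists.
fn parquet_logical_type(field: &Field) -> Option<ParquetLogicalType> {
    match field.metadata.get(EXTENSION_NAME_KEY) {
        Some(name) if name == Uuid::NAME && Uuid::supports(&field.storage) => {
            Some(ParquetLogicalType::Uuid)
        }
        _ => None,
    }
}
```

In this sketch, `Field::new(StorageType::FixedSizeBinary(16)).with_extension_type::<Uuid>()` yields a field that `parquet_logical_type` maps to `ParquetLogicalType::Uuid`. A field without the annotation maps to `None` and would be written as a plain binary column.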