Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Duration in JSON Reader #6683

Merged
merged 2 commits into from
Nov 16, 2024

Conversation

simonvandel
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Add support for deserializing durations.

Are there any user-facing changes?

Support for Duration in JSON deserialization.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Nov 5, 2024
@tustvold
Copy link
Contributor

tustvold commented Nov 5, 2024

AFAICT this expects durations to be encoded as integers, this isn't necessarily wrong, but I'm also not sure this is what people would necessarily expect? When formatting durations for display we currently use a string based representation.

Perhaps it might be worth filing an issue to discuss what representation is expected for duration types, e.g. with reference to other arrow implementations, and then tracking adding support to both the read and write sides?

@simonvandel
Copy link
Contributor Author

simonvandel commented Nov 6, 2024

Yes, the current PR only supports durations encoded as integers or strings parsed as integers (it just uses the existing decoders for i64)

I think it's a good idea to explore if we should support other duration representations.

Do you think the current PR can be merged as-is? We can then create a follow-up issue to discuss the other representations.

@tustvold
Copy link
Contributor

tustvold commented Nov 6, 2024

Do you think the current PR can be merged as-is

I don't think we should merge this if it isn't consistent with other implementations

@simonvandel
Copy link
Contributor Author

I tried pyarrow here: https://gist.github.com/simonvandel/ff420a412cc9a34d95a23086ecec3b15 using uv run python hello.py

It seems like duration is not supported:

pyarrow.lib.ArrowNotImplementedError: JSON conversion to duration[s] is not supported

@alamb
Copy link
Contributor

alamb commented Nov 8, 2024

Do you think the current PR can be merged as-is? We can then create a follow-up issue to discuss the other representations.

I recommend we at least have the property that data written by the JSON encoder can be read by the JSON encoder

So in other words, if we encode a Duration as JSON can it then be read back as the same Duration? As long as the answer is yes this seems like a reasonable change to me.

Thank you @simonvandel and @tustvold

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a reasonable change to me -- thank you @simonvandel and @tustvold

@alamb alamb merged commit 008b5fe into apache:master Nov 16, 2024
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants