Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement json_get_array UDF #36

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

vigimite
Copy link

Implements json_get_array which unwraps the JSON array into an arrow array containing the raw JSON strings.

The current behavior will leave the contents of the underlying JSON data unparsed so it needs additional parsing when accessing the individual array elements

@vigimite
Copy link
Author

@samuelcolvin I tried to get it to work using the JsonUnion struct contained in this lib but ended up making my own struct as it was easier to work with initially. I will try to make it work with the JsonUnion struct, any hints would be appreciated

@vigimite
Copy link
Author

nvm figured it out but it's a tiny bit hacky to support both the jiter index based access syntax and the new udf. It might be better to split this up...

Copy link
Collaborator

@samuelcolvin samuelcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise looks like a good start.

@@ -111,6 +114,25 @@ impl FromIterator<Option<JsonUnionField>> for JsonUnion {
}
}

/// Tries to collect an array represented as a string to a `JsonUnion`
impl FromIterator<Option<String>> for JsonUnion {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand this, if we're just collecting some optional strings, how can we know they all represent arrays?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right way to go about what (I think) you're trying to do.

If you need this logic, better to put it where you need it.

}
}

fn jiter_json_get_array(json_data: Option<&str>, path: &[JsonPath]) -> Result<String, GetError> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you want to build up and return a UnionArray here, not return a string.

@vigimite
Copy link
Author

Sorry for the noise I tried a couple of different appoaches but I think the final one I landed on is ok. I implemented it using the JsonUnion struct by adding a more refined type for the json_get_array case without changing the behavior for any of the existing functions. I also ran cargo clippy --fix which resulted in a bunch of small edits, hope thats ok.

If this approach is acceptable, I would also proceed adding docs. Anything else I should include in this PR?

@vigimite
Copy link
Author

As I understand according to apache/arrow-rs#6218 it is not possible to extract arrays from arrow union types right?

Otherwise it would be possible to support something like the statement below which would completely remove the need for the json_get_array udf. The type change would be a breaking change though.

SELECT json_get('{"foo": [1, 2, 3]}', foo)::text[];

@@ -93,24 +96,6 @@ impl JsonUnion {
}
}

/// So we can do `collect::<JsonUnion>()`
impl FromIterator<Option<JsonUnionField>> for JsonUnion {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved down to the JsonUnionField definition

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants