Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(api): add API for unwrapping JSON values into backend-native values #8958

Merged
merged 1 commit into from
Apr 15, 2024

Conversation

cpcloud
Copy link
Member

@cpcloud cpcloud commented Apr 13, 2024

Adds str, int, float, and bool properties to JSONValue as well as an unwrap_as method for easier programmatic usage and more fine-grained casting.

Unless someone really hates the static property names, I'd prefer to keep them as they are. Open to alternative names for unwrap_as though.

In theory this can all be done with casting, but if you look at what's being done in the various backends it's typically a lot more involved than that. Trino in particular requires queries over JSON to be VARCHAR inputs, when then have to be cast back to its JSON type to be able to cast that to the desired output type.

Complicating the cast branching just for the JSON -> not JSON case seemed like the wrong tradeoff.

I went with these names to match the map and array APIs, and to match the short type names we have for the specific types (str, int, float, and bool), which exist to match the equivalent Python types.

@cpcloud cpcloud force-pushed the json-unwrap branch 2 times, most recently from 73ec035 to 7d762f2 Compare April 13, 2024 14:03
@cpcloud cpcloud added postgres The PostgreSQL backend sqlite The SQLite backend mysql The MySQL backend bigquery The BigQuery backend duckdb The DuckDB backend snowflake The Snowflake backend trino The Trino backend risingwave The RisingWave backend feature Features or general enhancements labels Apr 13, 2024
@cpcloud
Copy link
Member Author

cpcloud commented Apr 13, 2024

@dixdotdev I've added your example into the docstring for the JSONValue.str API.

Would you mind taking this PR a spin?

@cpcloud cpcloud force-pushed the json-unwrap branch 6 times, most recently from 14b508a to da17d78 Compare April 14, 2024 16:11
@dixdotdev
Copy link

Yep happy to try this out tomorrow. I've reviewed the code from mobile and it seems like a pretty elegant solution, but proof is in the execution!

@cpcloud cpcloud changed the title feat(api): add API for unwrapping JSON string values into backend-native string values feat(api): add API for unwrapping JSON values into backend-native values Apr 14, 2024
@cpcloud cpcloud force-pushed the json-unwrap branch 7 times, most recently from e5d7720 to be98190 Compare April 14, 2024 20:10
@dixdotdev
Copy link

So I've taken a spin through it, and while it does work as intended I think that as a replacement for .cast(type) it deviates a bit syntactically, particularly when plucking out datatypes from iterators or objects.

I think ideally the solution should allow for a similar syntactic expression so your original naming with .unwrap(type), or .unwrap_as(type) might make more sense when considering other functional uses.

@cpcloud cpcloud force-pushed the json-unwrap branch 2 times, most recently from 556cb1f to a8d3498 Compare April 15, 2024 11:21
Copy link

@dixdotdev dixdotdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried it out for my implementations and it seems to work as expected with the .unwrap_as() as we discussed- just a couple of queries in the docstrings for clarity.

ibis/expr/types/json.py Show resolved Hide resolved
ibis/expr/types/json.py Show resolved Hide resolved
@cpcloud cpcloud force-pushed the json-unwrap branch 2 times, most recently from 0046fa2 to ddf3cd9 Compare April 15, 2024 12:01
@cpcloud cpcloud added this to the 9.0 milestone Apr 15, 2024
@gforsyth
Copy link
Member

new unwrap tests are passing on clouds:

🐚 pytest -m "bigquery or snowflake" ibis/backends/tests/test_json.py -k "unwrap"
================================ test session starts ================================
platform linux -- Python 3.12.2, pytest-8.1.1, pluggy-1.4.0
Using --randomly-seed=2966770701
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/gil/github.com/ibis-project/ibis
configfile: pyproject.toml
plugins: xdist-3.5.0, hypothesis-6.100.1, snapshot-0.9.0, randomly-3.15.0, mock-3.14.0, benchmark-4.0.0, cov-5.0.0, repeat-0.9.3, clarity-1.0.1, pytest_httpserver-1.0.10
collected 240 items / 224 deselected / 16 selected                                  

ibis/backends/tests/test_json.py ................                             [100%]

=================== 16 passed, 224 deselected in 80.76s (0:01:20) ===================

Copy link
Member

@gforsyth gforsyth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! I suppose this will show Risingwave as supporting these ops in the support matrix, but we can add explicit NotImplementeds in a followup.

@gforsyth gforsyth merged commit aebb5cf into ibis-project:main Apr 15, 2024
84 checks passed
@cpcloud cpcloud deleted the json-unwrap branch April 15, 2024 21:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bigquery The BigQuery backend duckdb The DuckDB backend feature Features or general enhancements mysql The MySQL backend postgres The PostgreSQL backend risingwave The RisingWave backend snowflake The Snowflake backend sqlite The SQLite backend trino The Trino backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants