Add json_encode to pl.List #14029
Comments
Yeah, I think I mentioned it at the time, but it may have gotten lost in translation. Essentially the equivalent of:
import datetime
import duckdb
import polars as pl

df = pl.DataFrame({
"a": [["1", "2"], ["3", "4"]],
"b": [[dict(B=1)], [dict(B=2)]],
"c": [dict(C=3), dict(C=4)],
"d": [datetime.date.today(), None],
"e": [5, 6]
})
# shape: (2, 5)
# ┌────────────┬─────────────────┬───────────┬────────────┬─────┐
# │ a          ┆ b               ┆ c         ┆ d          ┆ e   │
# │ ---        ┆ ---             ┆ ---       ┆ ---        ┆ --- │
# │ list[str]  ┆ list[struct[1]] ┆ struct[1] ┆ date       ┆ i64 │
# ╞════════════╪═════════════════╪═══════════╪════════════╪═════╡
# │ ["1", "2"] ┆ [{1}]           ┆ {3}       ┆ 2024-01-27 ┆ 5   │
# │ ["3", "4"] ┆ [{2}]           ┆ {4}       ┆ null       ┆ 6   │
# └────────────┴─────────────────┴───────────┴────────────┴─────┘
duckdb.sql("""
from df
select a::json, b::json, c::json, d::json, e::json
""").pl()
# shape: (2, 5)
# ┌───────────────────┬───────────────────┬───────────────────┬───────────────────┬───────────────────┐
# │ CAST(a AS "json") ┆ CAST(b AS "json") ┆ CAST(c AS "json") ┆ CAST(d AS "json") ┆ CAST(e AS "json") │
# │ ---               ┆ ---               ┆ ---               ┆ ---               ┆ ---               │
# │ str               ┆ str               ┆ str               ┆ str               ┆ str               │
# ╞═══════════════════╪═══════════════════╪═══════════════════╪═══════════════════╪═══════════════════╡
# │ ["1","2"]         ┆ [{"B":1}]         ┆ {"C":3}           ┆ "2024-01-27"      ┆ 5                 │
# │ ["3","4"]         ┆ [{"B":2}]         ┆ {"C":4}           ┆ null              ┆ 6                 │
# └───────────────────┴───────────────────┴───────────────────┴───────────────────┴───────────────────┘
(decoding is (I'm assuming |
Yeah after I posted the example I realised I also need to cast/encode a |
Instead use
|
@deanm0000 thanks for the suggestion, but I actually want it to process one item at a time. I don't want a single valid JSON string for the entire column; I specifically want to convert each element to a JSON fragment. Both are valid use cases: I often work with JSON Lines, and in this case I just want to dump the entire frame to CSV.
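To make the distinction concrete, here is a minimal standard-library sketch (no Polars involved) of per-element encoding versus encoding the whole column as one JSON document, followed by a CSV dump of the per-element form. The sample rows and the helper name encode_cell are illustrative assumptions, not code from this thread.

```python
import csv
import io
import json

# Illustrative rows mirroring the nested columns discussed above (assumed data).
rows = [
    {"a": ["1", "2"], "c": {"C": 3}, "e": 5},
    {"a": ["3", "4"], "c": {"C": 4}, "e": 6},
]


def encode_cell(value):
    """Hypothetical helper: encode one cell as a compact JSON fragment."""
    return json.dumps(value, separators=(",", ":"))


# Per-element: one JSON fragment per row, which is what the request asks for.
per_element = [encode_cell(row["a"]) for row in rows]

# Whole column: a single JSON document covering every row.
whole_column = encode_cell([row["a"] for row in rows])

# Dump the table to CSV with every cell stored as its own JSON fragment.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["a", "c", "e"])
writer.writeheader()
for row in rows:
    writer.writerow({key: encode_cell(value) for key, value in row.items()})
csv_text = buffer.getvalue()
```

Reading the CSV back and calling json.loads on each cell recovers the nested values row by row, which is not possible when the column is stored as one document.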
As far as I can tell, all that is needed is to add this to polars/crates/polars-plan/src/dsl/function_expr/struct_.rs Lines 123 to 134 in 2c5f4f3
and change But perhaps someone can answer if this should be implemented as And not limited to lists/structs? |
Seems to be a duplicate of #8482. Seems like a pretty easy solution. Would be awesome to be able to use |
Description
pl.Struct contains a json_encode / json_decode, which amongst other things is useful if you want to dump a table to CSV but it contains nested fields.
It would be useful if pl.List also contained at least json_encode (json_decode might be trickier, as it would have to check that the JSON was a list).
At the moment my workaround is
I considered using to_struct, but that doesn't preserve the list structure.
Also, the error from my original attempt:
ComputeError: TypeError: Object of type Series is not JSON serializable
I guess internally a list is a Series, but that was surprising. Ideally the engine would internally cast to a Python list before invoking the function being used by map_elements?