Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tables.materializer should return a StructArray #291

Closed
oschulz opened this issue Dec 15, 2023 · 3 comments
Closed

Tables.materializer should return a StructArray #291

oschulz opened this issue Dec 15, 2023 · 3 comments

Comments

@oschulz
Copy link
Contributor

oschulz commented Dec 15, 2023

We currently have this behavior, which is not optimal:

julia> X = StructArray(a = rand(5), b = rand(5));

julia> Tables.materializer(X)(Tables.columns(X))
5-element Vector{@NamedTuple{a::Float64, b::Float64}}:
 (a = 0.2940266626814607, b = 0.6915915110907411)
 (a = 0.6392677058537343, b = 0.5609264578636665)
 [...]

When using Tables.materializer on a StructArray we probably don't want to get something that generates plain vectors of named tuples. If X isa StructArray then we should also have Tables.materializer(X)(Tables.columns(X)) isa StructArray (TypeTables.Table, for example, behaves like this).

I propose to specialize Tables.materializer like this

_materialize_sa(cols) = StructArray(cols)
Tables.materializer(X::StructArray) = _materialize_sa

(Returning a function _materialize_sa instead of StructArray directly helps with type stability). This will result in

julia> Tables.materializer(X)(Tables.columns(X))
5-element StructArray(::Vector{Float64}, ::Vector{Float64}) with eltype @NamedTuple{a::Float64, b::Float64}:
 (a = 0.2940266626814607, b = 0.6915915110907411)
 (a = 0.6392677058537343, b = 0.5609264578636665)
 [...]

Questions is, should this be considered to be breaking, requiring a v0.7 release?

@piever
Copy link
Collaborator

piever commented Dec 20, 2023

The proposed implementation is probably something like

fromtable(cols) = StructArray(Tables.columntable(cols)) # so we ensure that _all_ Tables compliant objects are supported
Tables.materializer(X::StructArray) = fromtable

And one can use StructArrays.fromtable or materialize(StructArray) as a sink, eg

CSV.read(filepath, StructArrays.fromtable)

or

Arrow.Table(file) |> StructArrays.fromtable

(see also brief discussion in #276 (comment)).

As #289 and #290 also involve slight changes in behavior, maybe it's best to be completely safe and mark those two plus this change as 0.7

I'll be travelling for the holidays but was planning to make a PR for this shortly after that.

@oschulz
Copy link
Contributor Author

oschulz commented Dec 20, 2023

I'll be travelling for the holidays but was planning to make a PR for this shortly after that.

Thanks @piever !

@aplavin
Copy link
Member

aplavin commented Feb 18, 2024

Added fromtable and materializer to my "table-support" PR #276.

@aplavin aplavin closed this as completed Mar 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants