ak.records_to_regular to convert [{"x": 1, "y": 2}, {"x": 3, "y": 4}] into [[1, 2], [3, 4]] #3257
Labels: feature (New feature or request)
Description of new feature
Awkward Array's idiomatic form for data points with named features is to use RecordArray, which keeps each record field in a separate array (useful for loading or working with a subset of columns).
Machine learning libraries like to see a feature-set (an input vector into a neural network) as a regular dimension, either RegularArray or NumpyArray with inner_shape != () (which become the same thing after conversion out of Awkward). Unlike a RecordArray, the different features of the same vector are contiguous in memory. Also unlike a RecordArray, the elements of a feature vector have no names. I do not know if there's a way to preserve these feature names, in PyTorch for instance, but it would be nice to do so in a conversion from Awkward Arrays into PyTorch Tensors.
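To make the layout difference concrete, here is a small sketch in plain NumPy. It does not use Awkward's internals; the dict of columns only stands in for a RecordArray, and the names `columns`, `features`, and `feature_names` are illustrative assumptions.

```python
import numpy as np

# Columnar, record-like layout: one separate buffer per named feature.
columns = {"x": np.array([1, 3]), "y": np.array([2, 4])}

# Regular layout: a single 2D buffer in which the features of one vector
# sit next to each other. ascontiguousarray guarantees row-major storage.
features = np.ascontiguousarray(
    np.stack([columns[name] for name in columns], axis=-1)
)

# The 2D array carries no field names, so they have to be kept alongside it,
# in the same order as the columns.
feature_names = list(columns)

print(features.tolist())               # [[1, 2], [3, 4]]
print(features.flags["C_CONTIGUOUS"])  # True
print(feature_names)                   # ['x', 'y']
```

Keeping the names in a separate list is only one possible convention; whether a tensor library can carry them itself is the open question raised above.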
A version of ak.records_to_regular in which the records are one level deep can be implemented directly, but we're interested in a function that can be applied regardless of how deep the first level of records is. It would be written with recursively_apply. At some level of recursively_apply, you'd have passed through the list-type node and would be seeing the RecordArray directly, and then you'd want to combine its fields into a regular dimension (this preserves the length, 3, so it's good for recursively_apply).

This function would be useful for Awkward → ML conversions regardless of whether the data are ragged or not.
If more than one RecordArray is nested within each other, this function can be applied multiple times to turn each record-type into a dimension.