ak.parents_index with axis #3256

Open
jpivarski opened this issue Sep 25, 2024 · 1 comment
Labels
feature New feature or request

Comments

@jpivarski
Member

Description of new feature

Similar to ak.local_index, this function would give what we have been calling "parents" at some chosen axis.

Given a ragged dimension (ListOffsetArray node or ListArray node) like this:

>>> array = ak.Array([[0.0, 1.1, 2.2], [], [3.3, 4.4], [5.5], [6.6, 7.7, 8.8, 9.9]])
>>> array.layout.offsets
<Index dtype='int64' len='6'>
    [ 0  3  3  5  6 10]
</Index>

the "parents" are:

>>> counts = np.diff(np.r_[0, array.layout.offsets])
>>> indices = np.arange(-1, len(array.layout.offsets) - 1)
>>> parents = np.repeat(indices, counts)
>>> parents
array([0, 0, 0, 2, 2, 3, 4, 4, 4, 4])

(I don't think nplike has an r_, but something can be arranged. We have implementations of "parents" in the reducer code, for instance.)
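For reference, the same "parents" can be computed without `np.r_`, using only `np.diff`, `np.arange`, and `np.repeat`. This is a minimal NumPy sketch, not the implementation used in the reducer code; the helper name `parents_from_offsets` is hypothetical, and the offsets are assumed to be monotonically non-decreasing.

```python
import numpy as np

def parents_from_offsets(offsets):
    """Hypothetical helper: map each item to the index of the list that contains it."""
    offsets = np.asarray(offsets, dtype=np.int64)
    # Length of each list is the difference between consecutive offsets.
    counts = np.diff(offsets)
    # Repeat each list's index once per item in that list; empty lists contribute nothing.
    return np.repeat(np.arange(len(counts)), counts)

# Offsets from the example above: list 1 is empty, so index 1 never appears.
offsets = [0, 3, 3, 5, 6, 10]
parents = parents_from_offsets(offsets)
print(parents)
```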

We want this as a high-level interface now because it is a step in preparing data for DeepSets and GNNs: PyTorch-Geometric has a DeepSetsAggregation and an aggr.MeanAggregation that do vectorized segmented reduction using "parents" as input (called "index").
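To illustrate what a segmented reduction driven by "parents" looks like, here is a NumPy stand-in for a per-segment mean. It is only a sketch of the idea, not PyTorch-Geometric's code: the function name `segment_mean` is hypothetical, and returning NaN for empty segments is a choice made here for illustration.

```python
import numpy as np

def segment_mean(values, parents, num_segments):
    """Hypothetical helper: mean of `values` within each segment labelled by `parents`."""
    values = np.asarray(values, dtype=np.float64)
    parents = np.asarray(parents, dtype=np.int64)
    # Sum and count per segment in one vectorized pass each.
    sums = np.bincount(parents, weights=values, minlength=num_segments)
    counts = np.bincount(parents, minlength=num_segments)
    # Assumption for this sketch: empty segments get NaN rather than 0.
    means = np.full(num_segments, np.nan)
    nonempty = counts > 0
    means[nonempty] = sums[nonempty] / counts[nonempty]
    return means

# Flattened content and "parents" of the example array above.
values = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9]
parents = [0, 0, 0, 2, 2, 3, 4, 4, 4, 4]
means = segment_mean(values, parents, num_segments=5)
print(means)
```

The flattened values together with the "parents" carry the same information as the ragged array, which is why the two would be passed to such an aggregation as a pair.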

But to be a normal Awkward function, it should take axis as an argument, like ak.local_index. That can be implemented with our internal recursively_apply. Suppose we have

>>> deep = ak.Array([[[[[[1.1, 2.2, 3.3], [4.4]]]], []]])

Functions like ak.num and ak.local_index do this:

>>> ak.num(deep, axis=1)
<Array [2] type='1 * int64'>
>>> ak.num(deep, axis=2)
<Array [[1, 0]] type='1 * var * int64'>
>>> ak.num(deep, axis=3)
<Array [[[1], []]] type='1 * var * var * int64'>
>>> ak.num(deep, axis=4)
<Array [[[[2]], []]] type='1 * var * var * var * int64'>
>>> ak.num(deep, axis=5)
<Array [[[[[3, 1]]], []]] type='1 * var * var * var * var * int64'>

>>> ak.local_index(deep, axis=1)
<Array [[0, 1]] type='1 * var * int64'>
>>> ak.local_index(deep, axis=2)
<Array [[[0], []]] type='1 * var * var * int64'>
>>> ak.local_index(deep, axis=3)
<Array [[[[0]], []]] type='1 * var * var * var * int64'>
>>> ak.local_index(deep, axis=4)
<Array [[[[[0, 1]]], []]] type='1 * var * var * var * var * int64'>
>>> ak.local_index(deep, axis=5)
<Array [[[[[[0, 1, 2], [0]]]], []]] type='1 * var * var * var * var * var *...'>

We would want ak.parents_index(deep, axis=5) to return

<Array [[[[[0, 0, 0, 1]]], []]] type='1 * var * var * var * var * int64'>

Note that this is the same axis interpretation as ak.flatten:

>>> ak.flatten(deep, axis=5)
<Array [[[[[1.1, 2.2, 3.3, 4.4]]], []]] type='1 * var * var * var * var * float64'>

and it makes an array with the same list-lengths. I think that ak.parents_index and ak.flatten would be used together a lot.
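The intended axis semantics can be sketched on plain nested Python lists. This is an illustration only: it does not use Awkward's `recursively_apply` or layout nodes, and the names `parents_index_lists` and `flatten_lists` are hypothetical. It assumes `axis >= 1` and that the data is nested at least `axis + 1` levels deep.

```python
def parents_index_lists(data, axis):
    """Hypothetical sketch: "parents" at `axis`, using the same axis meaning as ak.flatten."""
    if axis < 1:
        raise ValueError("axis must be >= 1 in this sketch")
    if axis == 1:
        # One entry per item of each sublist, holding that sublist's position.
        return [i for i, sublist in enumerate(data) for _ in sublist]
    # Otherwise descend one level and leave the outer structure untouched.
    return [parents_index_lists(sublist, axis - 1) for sublist in data]

def flatten_lists(data, axis):
    """Hypothetical companion: merge the lists at `axis` into their parent lists."""
    if axis == 1:
        return [item for sublist in data for item in sublist]
    return [flatten_lists(sublist, axis - 1) for sublist in data]

array = [[0.0, 1.1, 2.2], [], [3.3, 4.4], [5.5], [6.6, 7.7, 8.8, 9.9]]
deep = [[[[[[1.1, 2.2, 3.3], [4.4]]]], []]]

flat_parents = parents_index_lists(array, axis=1)
deep_parents = parents_index_lists(deep, axis=5)
deep_flat = flatten_lists(deep, axis=5)
print(flat_parents)
print(deep_parents)
print(deep_flat)
```

In this sketch, the "parents" and the flattened values have identical nesting and list lengths, so they line up item by item.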

