optimize get_ith #286

aplavin · 2023-11-21T18:05:55Z

Makes getindex and related functions much faster on arrays with a few tens of components.

Narrow arrays:

julia> a1 = StructArray(x=1:10, y=1:10, z=1:10)
julia> @btime $a1[2]
# before
  1.625 ns (0 allocations: 0 bytes)
# after
  1.666 ns (0 allocations: 0 bytes)

Intermediate arrays:

julia> a2 = StructArray(;[Symbol("x$i") => 1:10 for i in 1:50]...)

julia> @btime $a2[2]
# before
  42.166 μs (854 allocations: 45.86 KiB)
# after
  3.182 μs (56 allocations: 4.91 KiB)

I guess originally there was some reason why the current implementation was chosen instead of a more naive one? The latter turns out to be faster now.

aplavin · 2023-11-21T21:51:34Z

Nightly failures unrelated

N5N3 · 2023-11-24T02:09:55Z

We just met a Vararg length limitation here. Tuple reducer using Base.tail plays bad for these long inputs as expected.

It should be noticed that the current version gives a stronger promising on inlining. As map has no @inline, and callsite inlining is not supported before 1.8.

There's also some concern on inferability on nested StructArray, though it should be usually ok in this case.

aplavin · 2023-11-24T02:29:07Z

some concern on inferability on nested StructArray

Could you please suggest a few @inferred tests to add to the testsuite, so that to check for that? Not just for this PR, but for anything that happens in the future as well. Otherwise it's very easy to forget!

the current version gives a stronger promising on inlining

Are you talking about the @inline annotation? Docs say that it doesn't force nor guarantee anything: it's an "extra nudge" as docs phrase it. Also they say small functions typically don't need this annotation at all...
map on Tuples is very fundamental in Julia, pretty sure its performance is kept in check.

N5N3 · 2023-11-24T03:37:05Z

I believe map itself is not designed for handling this kind of long tuples though. It would give up rescusion and fallback to a collect to Vector{Any} algorithm which punishes inference. You need ntuple or generated function to forse unroll in this case.

As for inlining, the current code would has more possibility to be inlined. In fact, getindex is not that cheap, especially for AbstractArray. So I think we'd better remove all inline blocker here.

aplavin · 2023-11-24T03:55:43Z

Indeed, sounds like ntuple() should bring the best from both worlds here :) Will update the PR, always forget about that function.

And would be really great if you could suggest more inference or related tests for StructArrays, so that we could catch (some) of these potential issues automatically in the future.

N5N3 · 2023-11-24T04:16:49Z

ntuple is also a inline blocker here. I think a StructArray of SMatrix{4,4} (or bigger) is enough to see the difference.

In fact, the limitation comes from Base.argtail. So the simpliest fix here is defining tail ourselves.

As for inference, I believe a nested StructArray with 4 or 5 layers is enough to check stability. Though I wound if this has a realistic usage (just like the big struct in MWE). IIUC
, Array should be preferred for these cases.

piever · 2023-11-30T13:57:41Z

I also seem to remember we switched to this version as a map-based (or rather ntuple-based) get_ith caused problem in CUDA kernels (see #114 and #87). We should probably add tests that StructArrays work in CUDA kernels, but I'm unsure what is a way to do that without complicating CI (I don't think the standard GitHub CI has access to a GPU).

aplavin · 2023-11-30T21:51:22Z

Looks like lots of stuff is interconnected here! Wide, deep, GPU arrays...
Maybe just do @generated get_ith then, should always be no-overhead and work with any GPU type?

Pinging @vchuravy @lcw (those from the PR you linked): do you guys have any suggestions?

aplavin · 2023-12-09T00:30:31Z

Pushed a straightforward @generated implementation: seems to solve all mentioned issues!

Don't know how to add something relevant to tests, but here are benchmarks:

A = StructArray([SMatrix{6,6}(rand(6,6)) for _ in 1:10])
@btime $A[5]
# before:
# 6.948 μs (155 allocations: 5.36 KiB)
# after:
# 1.021 μs (39 allocations: 1.45 KiB)

B = StructArray(a=StructArray(b=StructArray(c=StructArray(d=StructArray(e=StructArray(f=1:10))))))
@btime $B[5]
# before:
# 1.500 ns (0 allocations: 0 bytes)
# after:
# 1.500 ns (0 allocations: 0 bytes)

C = StructArray(;[Symbol("x$i") => 1:10 for i in 1:50]...)
@btime $C[5]
# before:
# 42.375 μs (854 allocations: 45.86 KiB)
# after:
# 1.092 μs (3 allocations: 1.31 KiB)

N5N3 · 2023-12-09T01:22:36Z

So there are still remaining allocations? Perhaps it's another similar problem caused by splatted Tuple. I guess you need to fix that too to make your case fully optimized.

If we design to optimize this package for eltype with big struct, it would be good to ensure it works in more cases. Just for example, apparently inference would be blocked by map in _getindex below.

aplavin · 2023-12-09T03:52:14Z

That's a bit above my head – tried to specialize more functions involved, but it didn't help. Suggestions welcome!

And this PR is a strict improvement anyway, with more than an order of magnitude speedup. This can easily turn the overhead from "dominates the profview" to "barely visible there".

lcw · 2023-12-11T22:31:15Z

I haven't tested it but the new @generated version of the code looks great. It should work on the GPU as well.

src/structarray.jl

piever · 2023-12-14T15:20:30Z

I also like the generated function approach, it does seem very simple and elegant. I've left a couple minor suggestions, but overall looks good to me.

Co-authored-by: Pietro Vertechi <[email protected]>

aplavin · 2023-12-26T13:26:20Z

Followed those comments, should be ready!
Still, not sure if something relevant is possible to put into tests, but that would be great.

aplavin · 2024-01-12T03:57:37Z

gentle bump...

aplavin · 2024-02-14T18:24:07Z

bump

aplavin · 2024-02-23T11:38:58Z

I'll take the liberty to merge this, following an advice on slack #arrays. The PR didn't get a strong "no" from maintainers, changes are nonbreaking, and quite local (ie not an overhaul).
Please let me know if anything is wrong, it's the first time I do this in JuliArrays.

aplavin added 2 commits November 21, 2023 13:02

optimize get_ith

0a5f18e

typo

af916b8

generated

e0d3313

add @inbounds back

4229023

piever reviewed Dec 14, 2023

View reviewed changes

src/structarray.jl Outdated Show resolved Hide resolved

src/structarray.jl Outdated Show resolved Hide resolved

aplavin and others added 2 commits December 26, 2023 08:20

Update src/structarray.jl

e2662d3

Co-authored-by: Pietro Vertechi <[email protected]>

upd

e35d7e7

aplavin merged commit 3880403 into JuliaArrays:master Feb 23, 2024
7 checks passed

aplavin deleted the patch-1 branch June 15, 2024 13:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

optimize get_ith #286

optimize get_ith #286

aplavin commented Nov 21, 2023

aplavin commented Nov 21, 2023

N5N3 commented Nov 24, 2023

aplavin commented Nov 24, 2023

N5N3 commented Nov 24, 2023 •

edited

Loading

aplavin commented Nov 24, 2023

N5N3 commented Nov 24, 2023 •

edited

Loading

piever commented Nov 30, 2023

aplavin commented Nov 30, 2023 •

edited

Loading

aplavin commented Dec 9, 2023 •

edited

Loading

N5N3 commented Dec 9, 2023 •

edited

Loading

aplavin commented Dec 9, 2023

lcw commented Dec 11, 2023

piever commented Dec 14, 2023

aplavin commented Dec 26, 2023

aplavin commented Jan 12, 2024

aplavin commented Feb 14, 2024

aplavin commented Feb 23, 2024

optimize get_ith #286

optimize get_ith #286

Conversation

aplavin commented Nov 21, 2023

aplavin commented Nov 21, 2023

N5N3 commented Nov 24, 2023

aplavin commented Nov 24, 2023

N5N3 commented Nov 24, 2023 • edited Loading

aplavin commented Nov 24, 2023

N5N3 commented Nov 24, 2023 • edited Loading

piever commented Nov 30, 2023

aplavin commented Nov 30, 2023 • edited Loading

aplavin commented Dec 9, 2023 • edited Loading

N5N3 commented Dec 9, 2023 • edited Loading

aplavin commented Dec 9, 2023

lcw commented Dec 11, 2023

piever commented Dec 14, 2023

aplavin commented Dec 26, 2023

aplavin commented Jan 12, 2024

aplavin commented Feb 14, 2024

aplavin commented Feb 23, 2024

N5N3 commented Nov 24, 2023 •

edited

Loading

N5N3 commented Nov 24, 2023 •

edited

Loading

aplavin commented Nov 30, 2023 •

edited

Loading

aplavin commented Dec 9, 2023 •

edited

Loading

N5N3 commented Dec 9, 2023 •

edited

Loading