@batch slows down other non-@batched code on x86 #110
Comments
I see the same thing with
julia> using Polyester
julia> function reset_x!(x)
           x .= 0
       end
reset_x! (generic function with 1 method)
julia> function without_batch(x)
           for i in 1:100_000
               if x[i] != 0
                   # This is never executed, x contains only zeros
                   sin(x[i])
               end
           end
       end
without_batch (generic function with 1 method)
julia> function with_batch(x)
           @batch for i in 1:100_000
               if x[i] != 0
                   # This is never executed, x contains only zeros
                   sin(x[i])
               end
           end
       end
with_batch (generic function with 1 method)
julia> function with_thread(x)
           Threads.@threads for i in 1:100_000
               if x[i] != 0
                   # This is never executed, x contains only zeros
                   sin(x[i])
               end
           end
       end
with_thread (generic function with 1 method)
julia> x = zeros(Int, 1_000_000);
julia> using BenchmarkTools
julia> @benchmark reset_x!($x) setup=without_batch(x)
BenchmarkTools.Trial: 2881 samples with 1 evaluation.
Range (min … max): 272.081 μs … 319.111 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 273.518 μs ┊ GC (median): 0.00%
Time (mean ± σ): 273.863 μs ± 2.094 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▅▇▂▄▄██▄▅▅▇▅▁▃▃▂
▆████████████████▆▆▇▆▇▇▆▅█▇▅▄▄▃▃▂▂▁▂▁▁▁▂▂▁▂▁▁▁▂▂▁▂▂▁▁▁▁▁▂▂▂▁▂ ▄
272 μs Histogram: frequency by time 280 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark reset_x!($x) setup=with_batch(x)
BenchmarkTools.Trial: 2809 samples with 1 evaluation.
Range (min … max): 312.843 μs … 350.793 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 316.288 μs ┊ GC (median): 0.00%
Time (mean ± σ): 316.484 μs ± 2.223 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▂ ▂▄▄▆▅▅▄▅▅▅▅▇▇▅▇███▆█▆▄▁▁▃▁▂▂▁▂ ▂ ▁ ▁ ▁
▃███▆▅▆▆▄▆██████████████████████████████████████▆█▇▄▄▃▃▃▂▃▄▄▂ ▆
313 μs Histogram: frequency by time 321 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark reset_x!($x) setup=with_thread(x)
BenchmarkTools.Trial: 2063 samples with 1 evaluation.
Range (min … max): 313.300 μs … 392.157 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 359.390 μs ┊ GC (median): 0.00%
Time (mean ± σ): 358.813 μs ± 10.205 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▄▄▄█▄▅▄▇▅▅▄▃▂▂ ▁
▂▃▁▁▂▂▂▂▁▃▂▁▂▂▂▂▃▂▂▃▂▃▃▃▄▃▃▄▄▅▅▇███████████████████▆▆▅▄▄▃▃▃▃▃ ▄
313 μs Histogram: frequency by time 381 μs <
Memory estimate: 0 bytes, allocs estimate: 0. |
You're right. It's even worse for me:
Where would I report this to then? JuliaLang/julia? |
It may be interesting to uncover why.
You could experiment by playing with https://github.com/JuliaSIMD/ThreadingUtilities.jl/blob/e7f2f4ba725cf8862f42cb34b83916e3561c15f8/src/threadtasks.jl#L24 |
I tried this as well.
Wouldn't that rule out your first idea? |
Yes. |
You could try counting cpu cycles with cpucycle. |
I'll come back to this when I have more time. For now, I opened an issue in JuliaLang/julia. |
Note that by default, Polyester only uses one thread per physical core. Those experiments could tell you if the |
Consider the following functions.
Running with_batch between reset_x! calls seems to somehow slow down reset_x! significantly:
I also tested and reproduced this on an Intel i9-10980XE, but the difference there was only ~10% on 8 threads.
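For anyone who wants to cross-check the effect without BenchmarkTools, here is a minimal sketch that uses only Base. The helper median_time_ns is hypothetical (it is not part of Polyester or BenchmarkTools); it only mimics the idea of the setup= keyword by running an untimed function before each timed call. The sketch uses Threads.@threads so that it runs without extra packages; a @batch version can be substituted once Polyester is loaded. Absolute numbers will depend on the machine and thread count.

```julia
# Hypothetical helper, not from any package: median wall time of `f(x)` in
# nanoseconds, with `setup(x)` run (untimed) before every timed call.
function median_time_ns(f, setup, x; samples = 200)
    times = Vector{UInt64}(undef, samples)
    for s in 1:samples
        setup(x)                  # untimed work between the timed calls
        t0 = time_ns()
        f(x)                      # the call being measured
        times[s] = time_ns() - t0
    end
    sort!(times)
    return times[cld(samples, 2)] # (lower) median sample
end

reset_x!(x) = (x .= 0; x)

# Serial loop; the branch is never taken because x contains only zeros.
function without_batch(x)
    for i in 1:100_000
        if x[i] != 0
            sin(x[i])
        end
    end
end

# Same loop, but multithreaded via Base.
function with_thread(x)
    Threads.@threads for i in 1:100_000
        if x[i] != 0
            sin(x[i])
        end
    end
end

x = zeros(Int, 1_000_000)

# Warm-up runs so that compilation is not part of the measurement.
median_time_ns(reset_x!, without_batch, x; samples = 5)
median_time_ns(reset_x!, with_thread, x; samples = 5)

t_plain  = median_time_ns(reset_x!, without_batch, x)
t_thread = median_time_ns(reset_x!, with_thread, x)

println("serial loop in between:   ", t_plain, " ns")
println("@threads loop in between: ", t_thread, " ns")
```

Start Julia with more than one thread (for example julia -t 8) when running this; with a single thread the two cases should behave essentially the same.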
CC: @sloede @ranocha @LasNikas.