Speedup _observed_ with dynamic broadcasting #40
What's the problem? Your EDIT: oh, even when broadcasting
FWIW, I got:
julia> using FastBroadcast
julia> function fast_foo9(a, b, c, d, e, f, g, h, i)
@.. a = b + 0.1 * (0.2c + 0.3d + 0.4e + 0.5f + 0.6g + 0.6h + 0.6i)
nothing
end
fast_foo9 (generic function with 1 method)
julia> function foo9(a, b, c, d, e, f, g, h, i)
@. a = b + 0.1 * (0.2c + 0.3d + 0.4e + 0.5f + 0.6g + 0.6h + 0.6i)
nothing
end
foo9 (generic function with 1 method)
julia> a, b, c, d, e, f, g, h, i = [rand(100, 100, 2) for i in 1:9];
julia> using BenchmarkTools
julia> @btime fast_foo9($a, $b, $c, $d, $e, $f, $g, $h, $i);
38.674 μs (0 allocations: 0 bytes)
julia> @btime foo9($a, $b, $c, $d, $e, $f, $g, $h, $i);
83.503 μs (0 allocations: 0 bytes)
julia> b = [1.0];
julia> @btime foo9($a, $b, $c, $d, $e, $f, $g, $h, $i);
85.732 μs (0 allocations: 0 bytes)
julia> @btime fast_foo9($a, $b, $c, $d, $e, $f, $g, $h, $i);
30.452 μs (0 allocations: 0 bytes)
So I can reproduce.
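The @pstats runs that follow call foreachf, a helper that is never defined in this thread. A minimal sketch of what such a helper might look like, assuming it simply evaluates f(args...) N times so the profiler sees a long, steady workload (this definition is a guess, not necessarily the one used here):

```julia
# Hypothetical helper: the thread calls foreachf but never defines it.
# Calls f(args...) N times and discards the results, so that a profiler
# such as LinuxPerf's @pstats measures a long, steady workload.
# The F and A type parameters make Julia specialize on the function and
# on the number of arguments instead of falling back to dynamic dispatch.
function foreachf(f::F, N::Int, args::Vararg{Any,A}) where {F,A}
    foreach(_ -> f(args...), Base.OneTo(N))
    return nothing
end
```

With this sketch, foreachf(fast_foo9, 30_000, a, b, c, d, e, f, g, h, i) evaluates fast_foo9 30,000 times with the same arguments.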
Comparing 30k evaluations of each function, with the small array bs and the full-size array b:
julia> @pstats "cpu-cycles,(instructions,branch-instructions,branch-misses),(task-clock,context-switches,cpu-migrations,page-faults),(L1-dcache-load-misses,L1-dcache-loads,L1-icache-load-misses),(dTLB-load-misses,dTLB-loads)" begin
foreachf(fast_foo9, 30_000, a, bs, c, d, e, f, g, h, i)
end
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
╶ cpu-cycles 3.60e+09 49.9% # 3.6 cycles per ns
┌ instructions 2.96e+09 50.0% # 0.8 insns per cycle
│ branch-instructions 2.27e+08 50.0% # 7.7% of insns
└ branch-misses 1.85e+06 50.0% # 0.8% of branch insns
┌ task-clock 1.01e+09 100.0% # 1.0 s
│ context-switches 0.00e+00 100.0%
│ cpu-migrations 0.00e+00 100.0%
└ page-faults 4.00e+00 100.0%
┌ L1-dcache-load-misses 6.09e+08 25.0% # 48.6% of dcache loads
│ L1-dcache-loads 1.25e+09 25.0%
└ L1-icache-load-misses 8.45e+06 25.0%
┌ dTLB-load-misses 1.23e+05 25.0% # 0.0% of dTLB loads
└ dTLB-loads 1.25e+09 25.0%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
julia> @pstats "cpu-cycles,(instructions,branch-instructions,branch-misses),(task-clock,context-switches,cpu-migrations,page-faults),(L1-dcache-load-misses,L1-dcache-loads,L1-icache-load-misses),(dTLB-load-misses,dTLB-loads)" begin
foreachf(fast_foo9, 30_000, a, b, c, d, e, f, g, h, i)
end
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
╶ cpu-cycles 3.90e+09 49.9% # 3.6 cycles per ns
┌ instructions 1.43e+09 50.0% # 0.4 insns per cycle
│ branch-instructions 7.52e+07 50.0% # 5.3% of insns
└ branch-misses 3.01e+04 50.0% # 0.0% of branch insns
┌ task-clock 1.09e+09 100.0% # 1.1 s
│ context-switches 0.00e+00 100.0%
│ cpu-migrations 0.00e+00 100.0%
└ page-faults 0.00e+00 100.0%
┌ L1-dcache-load-misses 6.76e+08 25.0% # 112.4% of dcache loads
│ L1-dcache-loads 6.02e+08 25.0%
└ L1-icache-load-misses 1.71e+04 25.0%
┌ dTLB-load-misses 4.01e+00 25.0% # 0.0% of dTLB loads
└ dTLB-loads 6.02e+08 25.0%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
julia> @pstats "cpu-cycles,(instructions,branch-instructions,branch-misses),(task-clock,context-switches,cpu-migrations,page-faults),(L1-dcache-load-misses,L1-dcache-loads,L1-icache-load-misses),(dTLB-load-misses,dTLB-loads)" begin
foreachf(foo9, 30_000, a, b, c, d, e, f, g, h, i)
end
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
╶ cpu-cycles 9.86e+09 50.0% # 3.8 cycles per ns
┌ instructions 3.07e+10 50.0% # 3.1 insns per cycle
│ branch-instructions 6.37e+08 50.0% # 2.1% of insns
└ branch-misses 6.58e+06 50.0% # 1.0% of branch insns
┌ task-clock 2.59e+09 100.0% # 2.6 s
│ context-switches 0.00e+00 100.0%
│ cpu-migrations 0.00e+00 100.0%
└ page-faults 4.80e+01 100.0%
┌ L1-dcache-load-misses 6.80e+08 25.0% # 5.5% of dcache loads
│ L1-dcache-loads 1.24e+10 25.0%
└ L1-icache-load-misses 1.13e+06 25.0%
┌ dTLB-load-misses 7.47e+03 25.0% # 0.0% of dTLB loads
└ dTLB-loads 1.24e+10 25.0%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
julia> @pstats "cpu-cycles,(instructions,branch-instructions,branch-misses),(task-clock,context-switches,cpu-migrations,page-faults),(L1-dcache-load-misses,L1-dcache-loads,L1-icache-load-misses),(dTLB-load-misses,dTLB-loads)" begin
foreachf(foo9, 30_000, a, bs, c, d, e, f, g, h, i)
end
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
╶ cpu-cycles 1.12e+10 50.0% # 3.8 cycles per ns
┌ instructions 3.18e+10 50.0% # 2.8 insns per cycle
│ branch-instructions 8.62e+08 50.0% # 2.7% of insns
└ branch-misses 9.41e+06 50.0% # 1.1% of branch insns
┌ task-clock 2.93e+09 100.0% # 2.9 s
│ context-switches 0.00e+00 100.0%
│ cpu-migrations 0.00e+00 100.0%
└ page-faults 1.26e+03 100.0%
┌ L1-dcache-load-misses 6.16e+08 25.0% # 4.3% of dcache loads
│ L1-dcache-loads 1.45e+10 25.0%
└ L1-icache-load-misses 1.59e+07 25.0%
┌ dTLB-load-misses 2.51e+05 25.0% # 0.0% of dTLB loads
└ dTLB-loads 1.45e+10 25.0%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
fast_foo9 needs twice as many instructions with the small bs as with the full-size b (2.96e9 vs. 1.43e9).
Should we fix this inaccuracy by inserting a sleep call in the dynamic broadcasting branch?
Probably better to update the README instead, as the README claims FastBroadcast is slower than base broadcasting for dynamic broadcasts.
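For context on the README claim: "dynamic broadcasting" here is the case exercised above with b = [1.0], where an argument has a size-1 dimension that must be stretched across the other arrays. A Vector{Float64} has the same type whether it holds 1 element or 20,000, so that decision can only be made at run time. The hypothetical add_dynamic! below (a hand-written sketch, not FastBroadcast's implementation) shows such a runtime check for a single addition:

```julia
# Hypothetical sketch, not FastBroadcast's code: computes out .= A .+ b where
# b is either length 1 (stretched over every element) or as long as A.
# Both cases have the same type, Vector{Float64}, so the choice is a
# runtime check on length(b) rather than something the compiler can see.
function add_dynamic!(out::Array, A::Array, b::Vector)
    stretch = length(b) == 1
    if !stretch && length(b) != length(A)
        throw(DimensionMismatch("b must have length 1 or length(A)"))
    end
    # eachindex(out, A) also checks that out and A have matching indices.
    @inbounds for k in eachindex(out, A)
        out[k] = A[k] + (stretch ? b[1] : b[k])
    end
    return out
end

A = rand(100, 100, 2)
out = similar(A)
add_dynamic!(out, A, [1.0])    # b is stretched, as in foo9(a, [1.0], ...)
@assert out == A .+ [1.0]      # agrees with base broadcasting
add_dynamic!(out, A, vec(A))   # same length: element by element, no stretching
@assert out == 2 .* A
```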
Despite the claims in the README, I actually get: