
Remove cache structs and use Tuples, allocating only when necessary #21

Merged (5 commits) on Sep 15, 2024

Conversation

DanielVandH (Member)

This PR removes the old structs for caches (and thus the TLS approach) and uses the ideas from #20 to improve performance. All the predicates now try to use Tuples for as long as possible and only allocate once an expansion could not be computed within 32 components. The modified code from Delaunator.jl in #20 works, with some further fixes that I mentioned in JuliaGeometry/Delaunator.jl#20.

Since it is possible for an expansion to be computed with length > 32 while later derived expansions have length < 32, for predicates other than incircle it is not so simple to just check the length of a final expansion. Thus, every expansion that could exceed length 32 is checked, and if it does we immediately allocate. This is made a bit easier with a @check_length macro. The code is a bit cluttered with these checks, but it's still fine for me to work with.
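
To illustrate the idea behind these checks, here is a rough, hypothetical sketch (maybe_allocate and MAX_TUPLE_LENGTH are illustrative names only, not the actual @check_length implementation in this PR):

const MAX_TUPLE_LENGTH = 32  # assumed threshold, matching the 32-component limit mentioned above

# Keep an expansion as a Tuple while it fits within the limit, and fall back to
# a Vector (allocating) only when the next derived expansion could exceed it.
function maybe_allocate(expansion::NTuple{N,Float64}, needed::Integer) where {N}
    if needed > MAX_TUPLE_LENGTH
        return collect(expansion)  # slow path: allocate and continue with Vectors
    else
        return expansion           # fast path: stay allocation-free with Tuples
    end
end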

Caches can still be provided manually to help reduce allocations a bit (although allocations are rare anyway), so I have made the cache functions public.
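
For illustration, manual cache usage might look roughly like this (a hypothetical sketch; incircle_cache and the trailing cache argument are assumptions about the public API, not something this PR pins down):

using AdaptivePredicates  # assumed package name for this repository

p, q, r, s = (nextfloat(1.0), eps()), (4eps()/3, -4eps()/3), (eps(), prevfloat(1.0)), (1.0, 1.0)

cache = incircle_cache(p, q, r, s)  # hypothetical public cache constructor
incircle(p, q, r, s, cache)         # reuse the cache so the rare slow paths avoid fresh allocations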

To also help with testing, I defined (in the test functions, not the package itself, obviously) some failure vectors that store test failures, and a retry function for retrying those failures.
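
A rough sketch of what this looks like (illustrative only, not the PR's actual test code):

# Store failing cases as (arguments, expected result) pairs so they can be re-run later.
failures = Vector{Tuple}()

record_failure!(failures, args, expected) = push!(failures, (args, expected))

# Re-run each stored failure with a predicate `pred` (e.g. incircle) and return
# only the cases that still disagree with the expected result.
retry(pred, failures) = [(args, expected) for (args, expected) in failures if pred(args...) != expected]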

Will merge after I test it a bit more inside DelaunayTriangulation.jl.

DanielVandH linked an issue on Sep 15, 2024 that may be closed by this pull request.

DanielVandH (Member, Author) commented on Sep 15, 2024

As a comparison, on this PR

julia> @benchmark incircle((nextfloat(1.0), eps()), (4eps()/3, -4eps()/3), (eps(), prevfloat(1.0)), (1.0, 1.0))
BenchmarkTools.Trial: 10000 samples with 204 evaluations.
 Range (min … max):  359.314 ns …  42.923 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     433.333 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   597.103 ns ± 866.459 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄█▅▅▅▄▅▂▁▁▁ ▁                   ▂▁▃▁                          ▁
  ██████████████▇██▇▆▆▅█▅▇▅▆▆▇▇▇▆▇██████▇▇▆▆▅▅▅▅▄▅▂▄▅▄▃▄▃▃▄▅▄▂▅ █
  359 ns        Histogram: log(frequency) by time       2.05 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

while on main

julia> @benchmark incircle((nextfloat(1.0), eps()), (4eps()/3, -4eps()/3), (eps(), prevfloat(1.0)), (1.0, 1.0))
BenchmarkTools.Trial: 10000 samples with 112 evaluations.
 Range (min … max):  766.964 ns …   5.227 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):       1.692 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.793 μs ± 401.194 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃▁▁                  █▄▄▃▂▂▂▁▁▁▁▁▁▁▁▁▁                        ▁
  ███▆▆▅▆▆▃▅▅▃▃▃▄▁▃▁▁▃▄█████████████████████▇▇▇▇▇▇▇▇▇▆▆▆▆▇▆▆▅▅▇ █
  767 ns        Histogram: log(frequency) by time        3.4 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

Delaunator.jl's version is comparable:

julia> @benchmark Delaunator.incircle((nextfloat(1.0), eps()), (4eps()/3, -4eps()/3), (eps(), prevfloat(1.0)), (1.0, 1.0))
BenchmarkTools.Trial: 10000 samples with 200 evaluations.
 Range (min … max):  408.000 ns …   2.918 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):       1.133 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.199 μs ± 309.719 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂▃                     █▅▅▄▃▃▃▂▂▂▂▂▁▁▁                        ▁
  ███▇▇▇▅▇▆▅▆▇▇▆▇█▅▅▅▄▅▃▆█████████████████▇███▇▇██▇▇█▇▇▇▆▇▆▇▇▇▅ █
  408 ns        Histogram: log(frequency) by time       2.26 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

I would show the other benchmarks, but I can't figure out why even

p, q, r, s = (nextfloat(1.0), eps()), (4eps()/3, -4eps()/3), (eps(), prevfloat(1.0)), (1.0, 1.0)
@benchmark incircle($(Ref(p))[], $(Ref(q))[], $(Ref(r))[], $(Ref(s))[])
p, q, r, s = (nextfloat(1.0), eps(), -eps()), (4eps()/3, -4eps()/3, 3eps()), (eps(), prevfloat(1.0), -2eps()), (1.0, 1.0, 1.0)
@benchmark orient3($(Ref(p))[], $(Ref(q))[], $(Ref(r))[], $(Ref(s))[])
p, q, r, s, t = (nextfloat(1.0), eps(), -eps()), (4eps()/3, -4eps()/3, 3eps()), (eps(), prevfloat(1.0), -2eps()), (1.0, 1.0, 1.0), (0.0, -1.0, nextfloat(1.0))
@benchmark insphere($(Ref(p))[], $(Ref(q))[], $(Ref(r))[], $(Ref(s))[], $(Ref(t))[])

gives results on the order of 1 nanosecond only for orient3 and insphere, and I don't remember the other tricks for avoiding this. I don't really mind; at least the old approach can be replaced.

Successfully merging this pull request may close these issues: incircle performance may be able to be faster...