
Remove cache structs and use Tuples, allocating only when necessary #21

Merged (5 commits) on Sep 15, 2024

Conversation

DanielVandH (Member)

This PR removes the old structs for caches (and thus the TLS approach) and uses the ideas from #20 to improve performance. All the predicates now try to use Tuples for as long as possible and only allocate once an expansion could not be computed within 32 components. The modified code from Delaunator.jl in #20 works, with some further fixes that I mentioned in JuliaGeometry/Delaunator.jl#20.

Since it is possible for an expansion to be computed with length > 32 while later derived expansions have length < 32, for predicates other than incircle it is not so simple to just check the length of a final expansion. Thus, every expansion that could exceed length 32 is checked, and if it does we immediately allocate. This is made a bit easier with a @check_length macro. The code is a bit cluttered with these checks, but it's still fine for me to work with.
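
To illustrate the idea behind these checks, here is a rough, hypothetical sketch (maybe_allocate and MAX_TUPLE_LENGTH are illustrative names only, not the actual @check_length implementation in this PR):

const MAX_TUPLE_LENGTH = 32  # assumed threshold, matching the 32-component limit mentioned above

# Keep an expansion as a Tuple while it fits within the limit, and fall back to
# a Vector (allocating) only when the next derived expansion could exceed it.
function maybe_allocate(expansion::NTuple{N,Float64}, needed::Integer) where {N}
    if needed > MAX_TUPLE_LENGTH
        return collect(expansion)  # slow path: allocate and continue with Vectors
    else
        return expansion           # fast path: stay allocation-free with Tuples
    end
end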

Caches can still be provided manually to help reduce allocations a bit (although allocations are rare anyway), so I have made the cache functions public.
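
For illustration, manual cache usage might look roughly like this (a hypothetical sketch; incircle_cache and the trailing cache argument are assumptions about the public API, not something this PR pins down):

using AdaptivePredicates  # assumed package name for this repository

p, q, r, s = (nextfloat(1.0), eps()), (4eps()/3, -4eps()/3), (eps(), prevfloat(1.0)), (1.0, 1.0)

cache = incircle_cache(p, q, r, s)  # hypothetical public cache constructor
incircle(p, q, r, s, cache)         # reuse the cache so the rare slow paths avoid fresh allocations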

To also help with testing, I defined (in the test functions, not the package itself, obviously) some failure vectors that store test failures, and a retry function for retrying those failures.
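
A rough sketch of what this looks like (illustrative only, not the PR's actual test code):

# Store failing cases as (arguments, expected result) pairs so they can be re-run later.
failures = Vector{Tuple}()

record_failure!(failures, args, expected) = push!(failures, (args, expected))

# Re-run each stored failure with a predicate `pred` (e.g. incircle) and return
# only the cases that still disagree with the expected result.
retry(pred, failures) = [(args, expected) for (args, expected) in failures if pred(args...) != expected]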

Will merge after I test it a bit more inside DelaunayTriangulation.jl.

DanielVandH linked an issue on Sep 15, 2024 that may be closed by this pull request.

DanielVandH (Member, Author) commented on Sep 15, 2024

As a comparison, on this PR

julia> @benchmark incircle((nextfloat(1.0), eps()), (4eps()/3, -4eps()/3), (eps(), prevfloat(1.0)), (1.0, 1.0))
BenchmarkTools.Trial: 10000 samples with 204 evaluations.
 Range (min … max):  359.314 ns …  42.923 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     433.333 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   597.103 ns ± 866.459 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄█▅▅▅▄▅▂▁▁▁ ▁                   ▂▁▃▁                          ▁
  ██████████████▇██▇▆▆▅█▅▇▅▆▆▇▇▇▆▇██████▇▇▆▆▅▅▅▅▄▅▂▄▅▄▃▄▃▃▄▅▄▂▅ █
  359 ns        Histogram: log(frequency) by time       2.05 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

while on main

julia> @benchmark incircle((nextfloat(1.0), eps()), (4eps()/3, -4eps()/3), (eps(), prevfloat(1.0)), (1.0, 1.0))
BenchmarkTools.Trial: 10000 samples with 112 evaluations.
 Range (min … max):  766.964 ns …   5.227 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):       1.692 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.793 μs ± 401.194 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃▁▁                  █▄▄▃▂▂▂▁▁▁▁▁▁▁▁▁▁                        ▁
  ███▆▆▅▆▆▃▅▅▃▃▃▄▁▃▁▁▃▄█████████████████████▇▇▇▇▇▇▇▇▇▆▆▆▆▇▆▆▅▅▇ █
  767 ns        Histogram: log(frequency) by time        3.4 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

Delaunator.jl's version is comparable:

julia> @benchmark Delaunator.incircle((nextfloat(1.0), eps()), (4eps()/3, -4eps()/3), (eps(), prevfloat(1.0)), (1.0, 1.0))
BenchmarkTools.Trial: 10000 samples with 200 evaluations.
 Range (min … max):  408.000 ns …   2.918 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):       1.133 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.199 μs ± 309.719 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂▃                     █▅▅▄▃▃▃▂▂▂▂▂▁▁▁                        ▁
  ███▇▇▇▅▇▆▅▆▇▇▆▇█▅▅▅▄▅▃▆█████████████████▇███▇▇██▇▇█▇▇▇▆▇▆▇▇▇▅ █
  408 ns        Histogram: log(frequency) by time       2.26 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

I would show the other benchmarks, but I can't figure out why even

p, q, r, s = (nextfloat(1.0), eps()), (4eps()/3, -4eps()/3), (eps(), prevfloat(1.0)), (1.0, 1.0)
@benchmark incircle($(Ref(p))[], $(Ref(q))[], $(Ref(r))[], $(Ref(s))[])
p, q, r, s = (nextfloat(1.0), eps(), -eps()), (4eps()/3, -4eps()/3, 3eps()), (eps(), prevfloat(1.0), -2eps()), (1.0, 1.0, 1.0)
@benchmark orient3($(Ref(p))[], $(Ref(q))[], $(Ref(r))[], $(Ref(s))[])
p, q, r, s, t = (nextfloat(1.0), eps(), -eps()), (4eps()/3, -4eps()/3, 3eps()), (eps(), prevfloat(1.0), -2eps()), (1.0, 1.0, 1.0), (0.0, -1.0, nextfloat(1.0))
@benchmark insphere($(Ref(p))[], $(Ref(q))[], $(Ref(r))[], $(Ref(s))[], $(Ref(t))[])

gives results on the order of 1 nanosecond only for orient3 and insphere, and I don't remember the other tricks for avoiding this. I don't really mind; at least the old approach can be replaced.

Successfully merging this pull request may close these issues: incircle performance may be able to be faster...