incircle performance could be faster... #20
Comments
Is it easy for you to see what happens to the timings if Delaunator.jl uses your implementation vs AdaptivePredicates.jl's? It might be easier to see what the difference is with a larger set of calls. Most expansions have fewer than 32 coefficients (from what I can recall, I think most get to around 10), so I suspect that your approach is worth the effort anyway. But seeing it in action would be nice too.
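For readers following along: the coefficient counts above refer to Shewchuk-style floating-point expansions, which grow by at most one component per error-free addition. A minimal sketch of the building blocks (function names here are illustrative, not AdaptivePredicates.jl's API):

```julia
# Knuth's two-sum: a + b == x + y exactly, where x = fl(a + b) and y
# is the rounding error. This is the primitive behind Shewchuk-style
# expansions.
function two_sum(a::Float64, b::Float64)
    x = a + b
    bv = x - a
    av = x - bv
    y = (a - av) + (b - bv)
    return x, y
end

# Grow an expansion (a tuple of nonoverlapping components) by one more
# Float64; each call adds at most one coefficient, which is why most
# expansions stay short in practice.
function grow_expansion(e::NTuple{N,Float64}, b::Float64) where {N}
    h = ()
    q = b
    for i in 1:N
        q, hi = two_sum(q, e[i])
        h = (h..., hi)
    end
    return (h..., q)
end
```

The two components returned by `two_sum` represent the true sum exactly, which is what lets the adaptive predicates escalate precision only when needed.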
My default test case didn't hit the adaptive parts of the test, so it wouldn't show any difference in this portion. Will try and see if any of the other ones do.
Okay, on the two robustness tests in the Delaunator library, there is a noticeable difference:
Interesting. Thanks for that. When I have time I will try to implement your approach.
To give a bit of context: originally I implemented this using StaticArrays' MVector. In #1 @dgleich removed the StaticArrays dependency, replacing MVector with Tuple and using the copying setindex version. Generally this is fine as long as the tuples remain smallish. I do wonder if the MArray version would have been able to scale beyond 32 elements. MArrays require a bit of care while coding, but since this code is in the hot loop that care seems warranted. Lastly, the TLS caching approach seems too complicated for its own good. It is probably better and faster to just hit the allocator like @dgleich's code does. Especially if you are doing an
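For context on the tradeoff (a generic example, not the package's code): `Base.setindex` on a Tuple builds a fresh tuple on every update, whereas a mutable buffer (an MVector, or a plain Vector as a dependency-free stand-in below) is updated in place:

```julia
# Copying update: tuples are immutable, so Base.setindex returns a
# brand-new tuple with one slot changed. Cheap for small tuples, but
# the copy grows with the tuple length.
t = (1.0, 2.0, 3.0)
t2 = Base.setindex(t, 9.0, 2)   # t itself is unchanged

# In-place update: a mutable buffer writes one slot, no copy.
# (MVector behaves like this; a Vector is used here to keep the
# example dependency-free.)
v = [1.0, 2.0, 3.0]
v[2] = 9.0
```

This is why the copying approach is described as fine "as long as the tuples remain smallish": the compiler can keep small tuples in registers, but the per-update copy cost scales with length.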
Can the code handle the creation of an
I don't know how this would be guaranteed in general.
I'm happy to look into it again, but when I benchmarked it while implementing it, caching was fine.
The simple pattern I use is: allocate within a function, and if I return a
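The pattern being described sounds like the usual escape-analysis-friendly one; a hypothetical sketch (`sum_abs` is a made-up name for illustration):

```julia
# Hypothetical sketch of "just allocate inside the function": the
# buffer never escapes (only a scalar is returned), so each call pays
# one short-lived allocation that the GC reclaims cheaply.
function sum_abs(xs::Vector{Float64})
    buf = Vector{Float64}(undef, length(xs))  # fresh allocation per call
    @inbounds for i in eachindex(xs)
        buf[i] = abs(xs[i])
    end
    return sum(buf)
end
```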
You mean e.g. the functions where a cached |
I think either way caching is fine from what I have benchmarked; I don't see the complications. It's not really relevant to the idea of trying to do the most with 32-length
Looking at it, I wonder why we have structs that now contain both NTuples and Vecs. The code is less clear, and the compiler has to do more work to first get rid of them. I think the strategy of going to a Vector when we are past 32 or 64 elements in an ntuple is fine. So for me the complication is that the code has gotten more complex through the introduction of a cache. I would simply allocate an array at those sizes and not do any form of caching. It ought to be the "rare" case, and array allocation is fast on Julia 1.11.
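A minimal sketch of that no-cache strategy (`push_coeff` and `TUPLE_LIMIT` are hypothetical names, not the package's API): stay in an NTuple while small, and spill to a freshly allocated Vector past the threshold:

```julia
# Hypothetical sketch: keep coefficients in a stack-friendly tuple up
# to a fixed limit, then fall back to a heap-allocated Vector with no
# caching. The tuple -> Vector boundary is where the type changes.
const TUPLE_LIMIT = 32

push_coeff(e::NTuple{N,Float64}, x::Float64) where {N} =
    N < TUPLE_LIMIT ? (e..., x) : push!(collect(e), x)

push_coeff(e::Vector{Float64}, x::Float64) = push!(e, x)
```

Callers then just rebind `e = push_coeff(e, x)`; the one complication is that the return type changes at the boundary, which is exactly the kind of union the compiler has to sort out when a struct holds both representations.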
The code is still very similar to the C version. I'll think about your approach again, then, but I don't think the performance will change drastically if at all - the adaptive methods are only even needed rarely. My plan for now is to just try a vector for > 32 elements, and later I can consider this and see what happens to the implementation. If you want to make a PR rather than wait for me to get to it, that's fine too. For me it's really only worth considering if it does actually make a difference inside DelaunayTriangulation.jl or Delaunator.jl (or any other algorithms making heavy use of this).
Since I still had the code up ... using MVectors instead of any type of caching...
This uses ntuples for <= 32 and MVector for >= 48 ... Just for completeness, here's the routine.
#21 addresses this
Just a quick note that the Delaunator port of the incircle routine seems to be consistently faster. It's using ntuples vs. the cached values whenever the size is <= 32, which may have some speed advantage (e.g. it can use vector registers... more efficiently?)
On my machine, I get:
I don't think this is due to the difference in the cache as this case doesn't hit the cache ... just wanted to mention it in case you wanted to move any of that implementation back here :)