Polygon performance #109

Merged
merged 9 commits into from
Apr 2, 2024

@rafaqz rafaqz commented Apr 1, 2024

For Shapefile.Polygon, this PR speeds up looping over GI.getgeom(poly, i) by ~100x on the first run and ~1000x on later runs (for very large shapefiles at least!)

  1. It caches the ring order as a Vector{Vector{Int}} and only calculates it on the first getgeom.
  2. It checks Extents.intersects before actually checking _inring for another more modest speedup.
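The second point can be sketched in plain Julia. This is only an illustration under assumptions: `bbox`, `boxes_intersect`, `point_in_ring` and `hole_in_exterior` are hypothetical stand-ins written for this example, not the PR's `_inring` or the Extents.jl API, and rings are assumed to be vectors of `(x, y)` tuples.

```julia
# Hypothetical stand-ins: a ring is a vector of (x, y) tuples and a bounding
# box is a NamedTuple. The PR itself uses Extents.intersects and _inring.

# Bounding box of a ring.
function bbox(ring)
    xs = first.(ring)
    ys = last.(ring)
    return (X = extrema(xs), Y = extrema(ys))
end

# True when the two boxes overlap (touching edges count as overlapping).
boxes_intersect(a, b) =
    a.X[1] <= b.X[2] && b.X[1] <= a.X[2] &&
    a.Y[1] <= b.Y[2] && b.Y[1] <= a.Y[2]

# Ray-casting point-in-ring test: the comparatively expensive check.
function point_in_ring(pt, ring)
    x, y = pt
    inside = false
    n = length(ring)
    j = n
    for i in 1:n
        xi, yi = ring[i]
        xj, yj = ring[j]
        if (yi > y) != (yj > y) && x < (xj - xi) * (y - yi) / (yj - yi) + xi
            inside = !inside
        end
        j = i
    end
    return inside
end

# A hole can only belong to an exterior ring if their boxes overlap, so the
# cheap box test short-circuits the expensive ring test.
function hole_in_exterior(hole, hole_box, exterior, exterior_box)
    boxes_intersect(hole_box, exterior_box) || return false
    return point_in_ring(first(hole), exterior)
end

exterior = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (0.0, 0.0)]
hole     = [(2.0, 2.0), (2.0, 4.0), (4.0, 4.0), (4.0, 2.0), (2.0, 2.0)]
far_hole = [(20.0, 20.0), (20.0, 22.0), (22.0, 22.0), (22.0, 20.0), (20.0, 20.0)]
```

With many exterior rings, most candidate pairs fail the box test, so the ray-casting loop runs for only a few of them.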

It deliberately does not fill the cache eagerly on construction, because some use cases don't need it: GI.getring, for example, returns the rings in their stored order (which is 20x faster for e.g. calculating hulls or rasterization).

Users may also only want a subset of polygons, and this skips doing most of the work in that case.
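A minimal sketch of the lazy-cache idea follows, assuming a hypothetical `LazyPolygon` type rather than the real `Shapefile.Polygon` layout. `compute_ring_order` is a placeholder for the expensive hole-to-exterior assignment, and `ncomputed` exists only to make the behaviour observable.

```julia
# Hypothetical type, not Shapefile.Polygon: it only illustrates filling a
# cache on first use rather than in the constructor.
mutable struct LazyPolygon
    rings::Vector{Vector{Tuple{Float64,Float64}}}
    ring_order::Union{Nothing,Vector{Vector{Int}}}  # nothing until first needed
    ncomputed::Int                                   # counts cache fills (demo only)
end
LazyPolygon(rings) = LazyPolygon(rings, nothing, 0)

# Placeholder for the expensive step. Here every ring is treated as its own
# exterior; real code would group holes under the exterior containing them.
compute_ring_order(rings) = [[i] for i in eachindex(rings)]

function ring_order!(p::LazyPolygon)
    if p.ring_order === nothing
        p.ring_order = compute_ring_order(p.rings)
        p.ncomputed += 1
    end
    return p.ring_order
end

# Ordered access: fills the cache on the first call only.
getpart(p::LazyPolygon, i) = [p.rings[j] for j in ring_order!(p)[i]]

# Raw access in stored order: never touches the cache.
getrawring(p::LazyPolygon, i) = p.rings[i]

p = LazyPolygon([[(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)], [(5.0, 5.0), (6.0, 5.0), (6.0, 6.0)]])
```

Calling `getrawring` leaves `p.ncomputed` at zero, while any number of `getpart` calls raises it to one, which is the trade-off described above.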

@asinghvi17 if you want to review that would be great :)

(There is a tiny bit of type instability, but this will be fixed in Extents.jl; it's just a lack of inlining stopping constant propagation.)

@rafaqz rafaqz requested review from evetion and visr and removed request for evetion April 1, 2024 19:48
@asinghvi17 asinghvi17 left a comment

Looks good for the most part! I will test in a bit and let you know how the performance is :)

rafaqz and others added 3 commits April 1, 2024 22:05
Co-authored-by: Anshul Singhvi
Co-authored-by: Anshul Singhvi
Co-authored-by: Anshul Singhvi
Comment on lines +170 to +176
# TODO add this to `Extents.union` for any `Tuple/AbstractArray` point
function _union(extent::Extents.Extent, point::Tuple)
    X = min(extent.X[1], point[1]), max(extent.X[2], point[1])
    Y = min(extent.Y[1], point[2]), max(extent.Y[2], point[2])
    return Extents.Extent(; X, Y)
end

Member

Want to move this to rafaqz/Extents.jl#24, or maybe GeoInterface?

It would be cool to have Extents.union(extent, geometry).

Member Author

Yeah, I will, but it's not worth holding this PR for the release cycle.

Member Author

We can add it for AbstractArray and NamedTuple as well
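One way the extra methods could look, sketched without any packages. `Box`, `_grow` and `box_union` are hypothetical names for this example only. `Box` stands in for `Extents.Extent`, and the `NamedTuple` method assumes the point has fields named `X` and `Y`, which the thread does not specify.

```julia
# Hypothetical stand-in for Extents.Extent, so the sketch runs without packages.
struct Box
    X::Tuple{Float64,Float64}
    Y::Tuple{Float64,Float64}
end

# Shared implementation: grow the box so it includes the coordinates (x, y).
_grow(b::Box, x, y) = Box((min(b.X[1], x), max(b.X[2], x)),
                          (min(b.Y[1], y), max(b.Y[2], y)))

# One method per point representation mentioned in the thread.
box_union(b::Box, p::Tuple) = _grow(b, p[1], p[2])
box_union(b::Box, p::AbstractArray) = _grow(b, p[1], p[2])
box_union(b::Box, p::NamedTuple) = _grow(b, p.X, p.Y)  # assumes fields X and Y

b = Box((0.0, 1.0), (0.0, 1.0))
grown = foldl(box_union, Any[(3.0, -1.0), [0.5, 4.0], (X = -2.0, Y = 0.5)]; init = b)
```

Because `NamedTuple` is not a subtype of `Tuple` in Julia, the three methods do not conflict, and folding over mixed point types yields one box covering all of them.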

Member Author

Annoying that we can't do it for all GeoInterface.jl points...

@visr visr left a comment

Looks good to me. Would be good to add a test explicitly for the cached and uncached getgeom.
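Such a test might take roughly the following shape. It uses a hypothetical toy type, `CachedSquares`, with a lazily filled cache instead of a real polygon, and simply runs the same accesses twice so that both the cache-filling and cache-reading paths are covered.

```julia
using Test

# Hypothetical toy type with a lazily filled cache, standing in for a polygon.
mutable struct CachedSquares
    values::Vector{Int}
    cache::Union{Nothing,Vector{Int}}
end
CachedSquares(values) = CachedSquares(values, nothing)

function getsquare(c::CachedSquares, i)
    if c.cache === nothing
        c.cache = c.values .^ 2   # computed once, on first access
    end
    return c.cache[i]
end

@testset "cached and uncached access agree" begin
    c = CachedSquares([1, 2, 3])
    @test c.cache === nothing                    # nothing computed at construction
    uncached = [getsquare(c, i) for i in 1:3]    # first call fills the cache
    @test c.cache !== nothing
    cached = [getsquare(c, i) for i in 1:3]      # second pass reads the cache
    @test uncached == cached == [1, 4, 9]
end
```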

rafaqz commented Apr 2, 2024

OK, it's running all the test geometry checks twice. This should be good to go.

@rafaqz rafaqz merged commit 94811c8 into main Apr 2, 2024
13 of 14 checks passed
@rafaqz rafaqz deleted the polygon_performance branch April 2, 2024 21:42
evetion commented Apr 3, 2024

Good to have this 👍🏻. Might be good to comment on how you go about finding these optimizations? Similar work is probably required for GeoArrow (once it's there) and other packages when we want the most performance.

rafaqz commented Apr 3, 2024

I found this by putting a println on the number of rings counted for each polygon in a dataset and loading a huge shapefile, then seeing the same number 50 times rather than once. I was trying to optimise something else at the time and it turned out to be irrelevant!

Not very fancy... but "print summary metrics while slow things run" is a good motto I guess
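A rough sketch of that habit, with everything here hypothetical: `CALLS` and `count_rings` are made up for illustration, and a counter replaces the raw `println` so the summary can be printed once at the end.

```julia
# Hypothetical counter: record how many times the expensive step runs per input.
const CALLS = Dict{Int,Int}()

function count_rings(id, rings)
    CALLS[id] = get(CALLS, id, 0) + 1
    return length(rings)
end

# Simulate a loop that accidentally recomputes the same thing for every part.
rings = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for part in 1:length(rings)
    count_rings(1, rings)   # same polygon id each time: wasted work
end

# Summary metric: any id counted more than once is a candidate for caching.
repeated = [id => n for (id, n) in CALLS if n > 1]
println("recomputed inputs: ", repeated)
```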
