Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spatial join bounding box #1546

Merged
merged 25 commits into from
Nov 25, 2024

Conversation

Jonathan24680
Copy link
Collaborator

@Jonathan24680 Jonathan24680 commented Oct 11, 2024

We now have two efficient implementations of spatial joins: One using Google's S2 library, and now a new one using boost::geometry::rtree. Currently both of these implementations are limited to point geometries, the new, rtree-based implementation is easier to extend to arbitrary geometries, which will be implemented soon.

As boost::geometry purely works on Cartesian coordinates, which are especially unsuited for the spherical form of the earth, this PR manually implements the required expansion of bounding boxes to also show the correct behavior for bounding boxes that cross one of the poles or the 180th degree of longitude.

Copy link

codecov bot commented Oct 11, 2024

Codecov Report

Attention: Patch coverage is 99.20000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.34%. Comparing base (5f28e83) to head (ce5db45).
Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
src/engine/SpatialJoinAlgorithms.cpp 99.12% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1546      +/-   ##
==========================================
+ Coverage   89.24%   89.34%   +0.09%     
==========================================
  Files         374      375       +1     
  Lines       35683    35925     +242     
  Branches     4027     4063      +36     
==========================================
+ Hits        31845    32096     +251     
+ Misses       2538     2533       -5     
+ Partials     1300     1296       -4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@Jonathan24680 Jonathan24680 force-pushed the SpatialJoinBoundingBox branch from a9eb289 to e9ed3e4 Compare October 18, 2024 21:26
@Jonathan24680 Jonathan24680 requested a review from joka921 November 3, 2024 14:01
Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A first detailed round of reviews for the Code, and not so detailed for the tests.

src/engine/SpatialJoinAlgorithms.h Outdated Show resolved Hide resolved
Comment on lines 29 to 37
std::vector<box> OnlyForTestingWrapperComputeBoundingBox(
const point& startPoint) const {
return computeBoundingBox(startPoint);
}

bool OnlyForTestingWrapperContainedInBoundingBoxes(
const std::vector<box>& bbox, point point1) const {
return containedInBoundingBoxes(bbox, point1);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As they are const, and thus safe to use, you don't need those funnily named wrappers, but just can make the called functions public. Maybe some of them can even be static.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i want the names to convey that these methods are not supposed to get called by the user of the class and that he does not need to care what those methods do. Otherwise i could just make the two wrapped functions public and completely avoid the wrapping altogether. Don't you think it is confusing for users of the class?

src/engine/SpatialJoinAlgorithms.h Outdated Show resolved Hide resolved
src/engine/SpatialJoinAlgorithms.h Outdated Show resolved Hide resolved
src/engine/SpatialJoinAlgorithms.h Outdated Show resolved Hide resolved
test/engine/SpatialJoinAlgorithmsTest.cpp Outdated Show resolved Hide resolved

// testing only possible, if lat bounds are correct and the lon bounds
// don't cover everything (as then left or right of the box is again
// inside the box because of the spherical geometry)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And what do we have to check in that case? Or can't we do anything?

Copy link
Collaborator Author

@Jonathan24680 Jonathan24680 Nov 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have a bounding box, which goes from -180 to 180 longitude, then left of the bounding box is just in the bounding box again. (i.e. -180.00001 is the same as +179.99999). As all longitudes are covered, a left or right bound does not exist (on the sphere this makes intuitive sense). A test in that case is not necessary, because this test is about testing the edges and if everything is covered an edge doesn't exist there is no need for testing in that case

I added the above explanation to the code

test/engine/SpatialJoinAlgorithmsTest.cpp Outdated Show resolved Hide resolved
Comment on lines 865 to 870
// broad grid test
for (int lon = -180; lon < 180; lon += 20) {
for (int lat = -90; lat < 90; lat += 20) {
checkOutside(point(lon, lat), startPoint, bbox, &spatialJoinAlgs);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably you should choose some distribution that has more values closer to the startPoint, because that's where it is interesting.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think that's not necessary as the testBounds function is doing exactly that: heavily testing the values barely in or out of the bounding box. This function checks everything but that so the combination covers everything.

}
}

} // namespace boundingBox
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't yet look at all those tests in detail,
but something you could also do:
Get a bounding box from your implementation, and get a bounding box via S2, and then
check that they are consistent etc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's a possibility. Although i think the current tests are already sufficient. If you want to i can implement it though.

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some additional comments.

src/engine/SpatialJoinAlgorithms.h Outdated Show resolved Hide resolved
src/engine/SpatialJoinAlgorithms.h Outdated Show resolved Hide resolved
src/engine/SpatialJoinAlgorithms.cpp Outdated Show resolved Hide resolved
src/engine/SpatialJoinAlgorithms.cpp Outdated Show resolved Hide resolved
src/engine/SpatialJoinAlgorithms.cpp Outdated Show resolved Hide resolved
rightJoinCol, numColumns, maxDist, maxResults] = params_;
IdTable result{numColumns, qec_->getAllocator()};

// create r-tree for smaller result table
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely.

src/engine/SpatialJoinAlgorithms.cpp Show resolved Hide resolved
src/engine/SpatialJoinAlgorithms.cpp Outdated Show resolved Hide resolved
src/engine/SpatialJoinAlgorithms.cpp Outdated Show resolved Hide resolved
src/engine/SpatialJoin.h Outdated Show resolved Hide resolved
Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only small things missing

  1. Potentially the runtime info
  2. Coverage (small things)
  3. Sonarcloud (make functions const).

src/engine/SpatialJoinAlgorithms.cpp Outdated Show resolved Hide resolved
src/engine/SpatialJoinAlgorithms.cpp Show resolved Hide resolved
src/engine/SpatialJoinAlgorithms.cpp Outdated Show resolved Hide resolved
src/engine/SpatialJoinAlgorithms.h Outdated Show resolved Hide resolved
src/engine/SpatialJoin.cpp Outdated Show resolved Hide resolved
src/engine/SpatialJoinAlgorithms.cpp Show resolved Hide resolved
src/engine/SpatialJoinAlgorithms.cpp Outdated Show resolved Hide resolved
Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small changes left, but only very minor ones.
There is one test coverage gap for one of your numeric-accuracy-safety-buffer things, how easy is it to add a test for those?

src/engine/SpatialJoinAlgorithms.cpp Outdated Show resolved Hide resolved
src/engine/SpatialJoinAlgorithms.cpp Outdated Show resolved Hide resolved
src/engine/SpatialJoinAlgorithms.cpp Outdated Show resolved Hide resolved
@sparql-conformance
Copy link

@joka921 joka921 merged commit 7bffcd5 into ad-freiburg:master Nov 25, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants