Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept single polygons as prediction area boundaries #146

Merged
merged 4 commits into from
Jun 30, 2023
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,10 @@
* `get_rsplit()` is now re-exported from the rsample package. This provides a
more natural, pipe-able interface for accessing individual splits;
`get_rsplit(rset, 1)` is identical to `rset$splits[[1]]`.

* Passing a single polygon (or multipolygon) to the `prediction_sites` argument
of `spatial_nndm_cv()` will result in prediction sites being sampled from that
polygon, rather than from its bounding box.

# spatialsample 0.4.0

Expand Down
42 changes: 36 additions & 6 deletions R/spatial_nndm_cv.R
Original file line number Diff line number Diff line change
Expand Up @@ -18,10 +18,11 @@
#' @param prediction_sites An `sf` or `sfc` object describing the areas to be
#' predicted. If `prediction_sites` are all points, then those points are
#' treated as the intended prediction points when calculating target nearest
#' neighbor distances. If any element of `prediction_sites` is not a single
#' point, then points are sampled from within the bounding box of
#' `prediction_sites` and those points are then used as the intended prediction
#' points.
#' neighbor distances. If `prediction_sites` is a single (multi-)polygon, then
#' points are sampled from within the boundaries of that polygon. Otherwise,
#' if `prediction_sites` is of length > 1 and not made up of points,
#' then points are sampled from within the bounding box of `prediction_sites`
#' and used as the intended prediction points.
#' @param ... Additional arguments passed to [sf::st_sample()]. Note that the
#' number of points to sample is controlled by `prediction_sample_size`; trying
#' to pass `size` via `...` will cause an error.
Expand Down Expand Up @@ -117,12 +118,41 @@ spatial_nndm_cv <- function(data, prediction_sites, ...,
# we check both for length > 1 (in order to avoid the "condition has length"
# error) and to see if the input is already only points
pred_geometry <- unique(sf::st_geometry_type(prediction_sites))
if (length(pred_geometry) > 1 || pred_geometry != "POINT") {
prediction_sites <- sf::st_sample(

# these are more for clarity than control flow -- they do not get used
# outside of the below if/else
use_provided_points <- length(pred_geometry) == 1 && pred_geometry == "POINT"
sample_provided_poly <- length(pred_geometry) == 1 && pred_geometry %in% c(
"POLYGON",
"MULTIPOLYGON"
)
sample_bbox <- !use_provided_points && !sample_provided_poly

if (sample_bbox) {
sample_points <- sf::st_sample(
x = sf::st_as_sfc(sf::st_bbox(prediction_sites)),
size = prediction_sample_size,
...
)
} else if (sample_provided_poly) {
sample_points <- sf::st_sample(
x = sf::st_geometry(prediction_sites),
size = prediction_sample_size,
...
)
} else if (use_provided_points) {
sample_points <- prediction_sites
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it!

Imo you can drop the comment about this being about clarity, I'd say that's a well-established pattern and doesn't need the explanation.

I'd make the order of the ifelse statement the same as you laid it out above: sampling from the bounding box is the fall back so it shouldn't be the first condition to check. If the order is polygon > points > bbox, then that should be the order for both the definitions up in line 124-128 (no need for sample_bbox unless you want it) and the condition checking in the if-else block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes a ton of sense, thanks!


# st_sample can _sometimes_ use geographic coordinates (for SRS, mainly)
# and will _sometimes_ warn instead (systematic sampling)
# but will _often_ strip CRS from the returned data;
# enforce here that our output prediction sites share a CRS with input data
if (is.na(sf::st_crs(sample_points))) {
prediction_sites <- sf::st_set_crs(
sample_points,
sf::st_crs(prediction_sites)
)
}

# Set autocorrelation range, if not specified, to be the distance between
Expand Down
9 changes: 5 additions & 4 deletions man/spatial_nndm_cv.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

29 changes: 29 additions & 0 deletions tests/testthat/_snaps/spatial_nndm_cv.md
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,35 @@
14 <split [7/1]> Fold14
15 <split [14/1]> Fold15

# can pass a single polygon to sample within

Code
spatial_nndm_cv(Smithsonian_sf, example_poly)
Output
# A tibble: 20 x 2
splits id
<list> <chr>
1 <split [10/1]> Fold01
2 <split [10/1]> Fold02
3 <split [10/1]> Fold03
4 <split [18/1]> Fold04
5 <split [10/1]> Fold05
6 <split [10/1]> Fold06
7 <split [10/1]> Fold07
8 <split [10/1]> Fold08
9 <split [14/1]> Fold09
10 <split [10/1]> Fold10
11 <split [10/1]> Fold11
12 <split [15/1]> Fold12
13 <split [18/1]> Fold13
14 <split [10/1]> Fold14
15 <split [17/1]> Fold15
16 <split [10/1]> Fold16
17 <split [10/1]> Fold17
18 <split [11/1]> Fold18
19 <split [10/1]> Fold19
20 <split [10/1]> Fold20

# printing

# A tibble: 15 x 2
Expand Down
30 changes: 27 additions & 3 deletions tests/testthat/test-spatial_nndm_cv.R
Original file line number Diff line number Diff line change
Expand Up @@ -25,13 +25,12 @@ test_that("bad args", {

test_that("can pass the dots to st_sample", {
skip_if_not(sf::sf_use_s2())
expect_error(
expect_no_error(
spatial_nndm_cv(
Smithsonian_sf[1:15, ],
Smithsonian_sf[16:20, ],
type = "regular"
),
NA
)
)
})

Expand Down Expand Up @@ -74,6 +73,31 @@ test_that("normal usage", {
)
})

test_that("can pass a single polygon to sample within", {
library(sf)
skip_if_not(sf::sf_use_s2())

example_poly <- sf::st_as_sfc(
list(
sf::st_point(c(-77.03, 40)),
sf::st_point(c(-76, 40.5)),
sf::st_point(c(-76.5, 39.5))
)
)
example_poly <- sf::st_set_crs(example_poly, sf::st_crs(Smithsonian_sf))
example_poly <- sf::st_union(example_poly)
example_poly <- sf::st_cast(example_poly, "POLYGON")

expect_snapshot(
spatial_nndm_cv(
Smithsonian_sf,
example_poly
)
)
})



test_that("printing", {
skip_if_not(sf::sf_use_s2())
# The default RNG changed in 3.6.0
Expand Down