With prediction_sites as a single polygon, spatial_nndm_cv() produces a LOOCV #160

Closed
julienvollering opened this issue Aug 27, 2024 · 1 comment · Fixed by #161

@julienvollering

The problem

Related to #145
I think there's a bug when passing a polygon as prediction_sites to spatial_nndm_cv(). When I use a single sfc_POLYGON, the assessment folds do not have any buffer around them; the CV just becomes a LOOCV. When I pass an sfc_POINT object sampled from the same sfc_POLYGON, the NNDM folds come out as expected.

Wrapping the sfc_POLYGON in a call to st_sample is of course a very easy workaround, but the documentation clearly states the desired behavior for a single polygon.
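
For reference, a minimal sketch of that workaround, assuming the same ames data and concave hull used elsewhere in this report. The names `train`, `prediction_pts`, `analysis_n`, and `is_loocv` are only illustrative, and the final check is just one rough way to tell whether the result has collapsed to a LOOCV; it is not part of spatialsample.

```r
library(sf)
library(spatialsample)
data(ames, package = "modeldata")

ames_sf <- st_as_sf(ames, coords = c("Longitude", "Latitude"), crs = 4326)
train <- ames_sf[1:100, ]

# Polygon describing the prediction area.
ch <- st_concave_hull(st_union(ames_sf), ratio = 0.4, allow_holes = TRUE)

# Workaround: sample points from the polygon explicitly and pass the points
# (not the polygon) as prediction_sites.
set.seed(123)
prediction_pts <- st_sample(ch, size = 1000)

set.seed(123)
nndm <- spatial_nndm_cv(train, prediction_pts)

# Illustrative check (an assumption, not a package feature): in a LOOCV every
# analysis set holds nrow(train) - 1 rows, so smaller analysis sets indicate
# that a buffer was actually applied.
analysis_n <- vapply(
  nndm$splits,
  function(s) nrow(rsample::analysis(s)),
  integer(1)
)
is_loocv <- all(analysis_n == nrow(train) - 1L)
is_loocv
```

If `is_loocv` comes back `TRUE` for a polygon input but `FALSE` for points sampled from that polygon, that matches the behavior described above.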

Having looked quickly at the source code, it seems to me that sample_points is not being carried forward as the prediction_sites (unless it lacks a CRS).

sample_points <- sf::st_sample(

Reproducible example

library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE
library(spatialsample)
data(ames, package = "modeldata")

ames_sf <- st_as_sf(ames, coords = c("Longitude", "Latitude"), crs = 4326)

# Passing prediction_sites as single sfc_POLYGON
ch <- st_concave_hull(st_union(ames_sf), ratio = 0.4, allow_holes = TRUE)
str(ch)
#> sfc_POLYGON of length 1; first list element: List of 2
#>  $ : num [1:72, 1:2] -93.7 -93.7 -93.7 -93.7 -93.7 ...
#>  $ : num [1:6, 1:2] -93.6 -93.6 -93.6 -93.6 -93.6 ...
#>  - attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"

set.seed(123)
nndm <- spatial_nndm_cv(ames_sf[1:100, ], ch) 
print(nndm, n=2)
#> # A tibble: 100 × 2
#>   splits         id     
#>   <list>         <chr>  
#> 1 <split [99/1]> Fold001
#> 2 <split [99/1]> Fold002
#> # ℹ 98 more rows
foldN <- nndm |> 
  rsample::tidy() |> 
  dplyr::count(Resample) |> 
  dplyr::pull(n)
all(foldN == 100)
#> [1] TRUE
autoplot(get_rsplit(nndm, 1))

# Passing prediction_sites as sfc_POINT
set.seed(123)
chpts <- st_sample(ch, size = 1000)
str(chpts)
#> sfc_POINT of length 1000; first list element:  'XY' num [1:2] -93.7 42

set.seed(123)
nndm2 <- spatial_nndm_cv(ames_sf[1:100, ], chpts) 
print(nndm2, n=2)
#> # A tibble: 100 × 2
#>   splits         id     
#>   <list>         <chr>  
#> 1 <split [50/1]> Fold001
#> 2 <split [50/1]> Fold002
#> # ℹ 98 more rows
autoplot(get_rsplit(nndm2, 1))

Created on 2024-08-27 with reprex v2.1.1

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14 ucrt)
#>  os       Windows 10 x64 (build 19045)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United Kingdom.utf8
#>  ctype    English_United Kingdom.utf8
#>  tz       Europe/Oslo
#>  date     2024-08-27
#>  pandoc   3.2 @ C:/PROGRA~1/Pandoc/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package       * version date (UTC) lib source
#>  class           7.3-22  2023-05-03 [2] CRAN (R 4.4.1)
#>  classInt        0.4-10  2023-09-05 [1] CRAN (R 4.4.1)
#>  cli             3.6.3   2024-06-21 [1] CRAN (R 4.4.1)
#>  codetools       0.2-20  2024-03-31 [2] CRAN (R 4.4.1)
#>  colorspace      2.1-1   2024-07-26 [1] CRAN (R 4.4.1)
#>  curl            5.2.1   2024-03-01 [1] CRAN (R 4.4.1)
#>  DBI             1.2.3   2024-06-02 [1] CRAN (R 4.4.1)
#>  digest          0.6.37  2024-08-19 [1] CRAN (R 4.4.1)
#>  dplyr           1.1.4   2023-11-17 [1] CRAN (R 4.4.1)
#>  e1071           1.7-14  2023-12-06 [1] CRAN (R 4.4.1)
#>  evaluate        0.24.0  2024-06-10 [1] CRAN (R 4.4.1)
#>  fansi           1.0.6   2023-12-08 [1] CRAN (R 4.4.1)
#>  farver          2.1.2   2024-05-13 [1] CRAN (R 4.4.1)
#>  fastmap         1.2.0   2024-05-15 [1] CRAN (R 4.4.1)
#>  fs              1.6.4   2024-04-25 [1] CRAN (R 4.4.1)
#>  furrr           0.3.1   2022-08-15 [1] CRAN (R 4.4.1)
#>  future          1.34.0  2024-07-29 [1] CRAN (R 4.4.1)
#>  generics        0.1.3   2022-07-05 [1] CRAN (R 4.4.1)
#>  ggplot2         3.5.1   2024-04-23 [1] CRAN (R 4.4.1)
#>  globals         0.16.3  2024-03-08 [1] CRAN (R 4.4.0)
#>  glue            1.7.0   2024-01-09 [1] CRAN (R 4.4.1)
#>  gtable          0.3.5   2024-04-22 [1] CRAN (R 4.4.1)
#>  highr           0.11    2024-05-26 [1] CRAN (R 4.4.1)
#>  htmltools       0.5.8.1 2024-04-04 [1] CRAN (R 4.4.1)
#>  KernSmooth      2.23-24 2024-05-17 [2] CRAN (R 4.4.1)
#>  knitr           1.48    2024-07-07 [1] CRAN (R 4.4.1)
#>  lifecycle       1.0.4   2023-11-07 [1] CRAN (R 4.4.1)
#>  listenv         0.9.1   2024-01-29 [1] CRAN (R 4.4.1)
#>  lwgeom          0.2-14  2024-02-21 [1] CRAN (R 4.4.1)
#>  magrittr        2.0.3   2022-03-30 [1] CRAN (R 4.4.1)
#>  munsell         0.5.1   2024-04-01 [1] CRAN (R 4.4.1)
#>  parallelly      1.38.0  2024-07-27 [1] CRAN (R 4.4.1)
#>  pillar          1.9.0   2023-03-22 [1] CRAN (R 4.4.1)
#>  pkgconfig       2.0.3   2019-09-22 [1] CRAN (R 4.4.1)
#>  proxy           0.4-27  2022-06-09 [1] CRAN (R 4.4.1)
#>  purrr           1.0.2   2023-08-10 [1] CRAN (R 4.4.1)
#>  R6              2.5.1   2021-08-19 [1] CRAN (R 4.4.1)
#>  Rcpp            1.0.13  2024-07-17 [1] CRAN (R 4.4.1)
#>  reprex          2.1.1   2024-07-06 [1] CRAN (R 4.4.1)
#>  rlang           1.1.4   2024-06-04 [1] CRAN (R 4.4.1)
#>  rmarkdown       2.28    2024-08-17 [1] CRAN (R 4.4.1)
#>  rsample         1.2.1   2024-03-25 [1] CRAN (R 4.4.1)
#>  rstudioapi      0.16.0  2024-03-24 [1] CRAN (R 4.4.1)
#>  s2              1.1.7   2024-07-17 [1] CRAN (R 4.4.1)
#>  scales          1.3.0   2023-11-28 [1] CRAN (R 4.4.1)
#>  sessioninfo     1.2.2   2021-12-06 [1] CRAN (R 4.4.1)
#>  sf            * 1.0-16  2024-03-24 [1] CRAN (R 4.4.1)
#>  spatialsample * 0.5.1   2023-11-08 [1] CRAN (R 4.4.1)
#>  tibble          3.2.1   2023-03-20 [1] CRAN (R 4.4.1)
#>  tidyr           1.3.1   2024-01-24 [1] CRAN (R 4.4.1)
#>  tidyselect      1.2.1   2024-03-11 [1] CRAN (R 4.4.1)
#>  units           0.8-5   2023-11-28 [1] CRAN (R 4.4.1)
#>  utf8            1.2.4   2023-10-22 [1] CRAN (R 4.4.1)
#>  vctrs           0.6.5   2023-12-01 [1] CRAN (R 4.4.1)
#>  withr           3.0.1   2024-07-31 [1] CRAN (R 4.4.1)
#>  wk              0.9.2   2024-07-09 [1] CRAN (R 4.4.1)
#>  xfun            0.47    2024-08-17 [1] CRAN (R 4.4.1)
#>  xml2            1.3.6   2023-12-04 [1] CRAN (R 4.4.1)
#>  yaml            2.3.10  2024-07-26 [1] CRAN (R 4.4.1)
#> 
#>  [1] C:/Users/julienv/AppData/Local/R/win-library/4.4
#>  [2] C:/Program Files/R/R-4.4.1/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@mikemahoney218
Member

Yikes! Thanks for the report (and the excellent reprex); this will be fixed once I merge #161 into main.

mikemahoney218 added a commit that referenced this issue Sep 4, 2024
mikemahoney218 added a commit that referenced this issue Sep 4, 2024