When using spatial_clustering_cv to create spatial resamples, the geometry column is retained within the folds. This causes fit_resamples to fail with an error indicating that not all columns of y are known outcome types. It's unclear whether spatial_clustering_cv should drop the spatial information in the folds or if fit_resamples should exclude the geometry information. There might be something I'm missing.
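As background, sf objects keep their geometry column "sticky" when columns are subset, which may be how geometry ends up alongside the outcome. The sketch below only illustrates that subsetting behaviour on the same nc data used in the example; that this is the actual cause of the fit_resamples error is an assumption, not something confirmed in this thread.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Subsetting an sf object by column name keeps the geometry column attached
names(nc["BIR74"])
#> [1] "BIR74"    "geometry"

# After dropping the geometry, the same subset holds only the requested column
names(st_drop_geometry(nc)["BIR74"])
#> [1] "BIR74"
```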
Reproducible example
# Load package
library(dplyr, warn.conflicts=FALSE)
library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE
library(spatialsample)
library(workflows)
library(parsnip)
library(tune)
# Example data
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
# Making spatial clusters
nc_folds <- spatial_clustering_cv(nc, v = 5)
# Workflow for linear regression
lr_recipe <- workflow() %>%
  add_variables(outcomes = BIR74,
                predictors = AREA) %>%
  add_model(linear_reg(engine = "lm"))
# Tuning parameters: Fail
(spatial_lr<- fit_resamples(lr_recipe, nc_folds))
#> → A | error: Not all columns of `y` are known outcome types. These columns have unknown types: 'geometry'.
#> There were issues with some computations A: x1
#> There were issues with some computations A: x5
#>
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> # Resampling results
#> # 5-fold spatial cross-validation
#> # A tibble: 5 × 4
#>   splits          id    .metrics .notes
#>   <list>          <chr> <list>   <list>
#> 1 <split [77/23]> Fold1 <NULL>   <tibble [1 × 3]>
#> 2 <split [75/25]> Fold2 <NULL>   <tibble [1 × 3]>
#> 3 <split [79/21]> Fold3 <NULL>   <tibble [1 × 3]>
#> 4 <split [84/16]> Fold4 <NULL>   <tibble [1 × 3]>
#> 5 <split [85/15]> Fold5 <NULL>   <tibble [1 × 3]>
#>
#> There were issues with some computations:
#>
#>   - Error(s) x5: Not all columns of `y` are known outcome types. These columns hav...
#>
#> Run `show_notes(.Last.tune.result)` for more information.
# Best tuning parameters: Fail
collect_metrics(spatial_lr)
#> Error in `estimate_tune_results()`:
#> ! All models failed. Run `show_notes(.Last.tune.result)` for more information.
# Try with st_drop_geometry:
orig_class <- class(nc_folds)
nc_folds <- nc_folds %>%
  mutate(splits = purrr::map(splits, ~ {
    .x$data <- st_drop_geometry(.x$data)
    .x
  }))
class(nc_folds) <- orig_class
# Tuning parameters
(spatial_lr<- fit_resamples(lr_recipe, nc_folds))
#> # Resampling results
#> # -fold spatial cross-validation
#> # A tibble: 5 × 4
#>   splits          id    .metrics         .notes
#>   <list>          <chr> <list>           <list>
#> 1 <split [77/23]> Fold1 <tibble [2 × 4]> <tibble [0 × 3]>
#> 2 <split [75/25]> Fold2 <tibble [2 × 4]> <tibble [0 × 3]>
#> 3 <split [79/21]> Fold3 <tibble [2 × 4]> <tibble [0 × 3]>
#> 4 <split [84/16]> Fold4 <tibble [2 × 4]> <tibble [0 × 3]>
#> 5 <split [85/15]> Fold5 <tibble [2 × 4]> <tibble [0 × 3]>
# Best tuning parameters
collect_metrics(spatial_lr)
#> # A tibble: 2 × 6
#>   .metric .estimator     mean     n  std_err .config
#>   <chr>   <chr>         <dbl> <int>    <dbl> <chr>
#> 1 rmse    standard   3542.        5 634.     Preprocessor1_Model1
#> 2 rsq     standard      0.178     5   0.0616 Preprocessor1_Model1
Try using add_formula instead of add_variables as a workaround.
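A minimal, untested sketch of what that suggestion might look like, reusing the nc data and model from the reprex. The name lr_formula_wf is hypothetical, and whether this actually avoids the geometry error has not been verified in this thread.

```r
library(dplyr, warn.conflicts = FALSE)
library(sf)
library(spatialsample)
library(workflows)
library(parsnip)
library(tune)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc_folds <- spatial_clustering_cv(nc, v = 5)

# Same model as the reprex, but outcome and predictor are given as a formula
# instead of through add_variables() (lr_formula_wf is a hypothetical name)
lr_formula_wf <- workflow() %>%
  add_formula(BIR74 ~ AREA) %>%
  add_model(linear_reg(engine = "lm"))

# Assumption: with a formula preprocessor only BIR74 and AREA are pulled
# from the sf data, so the geometry column should not reach the outcome
spatial_lr <- fit_resamples(lr_formula_wf, nc_folds)
collect_metrics(spatial_lr)
```

If this runs without the error, it would also leave nc_folds unmodified, so the class juggling around st_drop_geometry in the reprex would not be needed.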
(Sorry for the brief reply -- I'm traveling at the moment so can't run stuff, but wanted to make sure I could try to help you get unstuck. This is definitely a bug somewhere)
Created on 2024-07-19 with reprex v2.1.1