Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

constantDensitySampling gives wrong output with MULTIPOLYGONS #26

Open
dylanbeaudette opened this issue Mar 31, 2020 · 5 comments
Open

Comments

@dylanbeaudette
Copy link
Member

dylanbeaudette commented Mar 31, 2020

constantDensitySampling() was designed to work with sp class objects which don't differentiate between POLYGON and MULTIPOLYGON types. Exploding the MULTIPOLYGONS into POLYGONS via sf is a simple work-around. Long-term, a more robust solution is required, likely switching to sf-class objects / methods.

Not really repeatable, but the basic idea.

library(rgdal)
library(sp)
library(sf)
library(soilDB)
library(sharpshootR)

# load as sp
x <- readOGR(dsn='e:/temp/HI-sampling/Hawaii', layer = 'MLRA_v51_HI')

# check, must be projected CRS
proj4string(x)

# feature ID
x$pID <- 1:nrow(x)

# convert to sf
x.sf <- st_as_sf(x)
x.sf.poly <- st_cast(x.sf, 'POLYGON')

# and back
x.sp.poly <- as(x.sf.poly, 'Spatial')

# sampling on MULTIPOLYGONS
s <- constantDensitySampling(x, min.samples = 0, n.pts.per.ac = 0.001)

# sampling on POLYGONS
x.sp.poly$pID <- 1:nrow(x.sp.poly)
s.2 <- constantDensitySampling(x.sp.poly, min.samples = 0, n.pts.per.ac = 0.001)

# save
writeOGR(s, dsn='e:/temp/HI-sampling', layer='samples', driver = 'ESRI Shapefile', overwrite_layer = TRUE)
writeOGR(s.2, dsn='e:/temp/HI-sampling', layer='samples-2', driver = 'ESRI Shapefile', overwrite_layer = TRUE)
@brownag
Copy link
Member

brownag commented Mar 31, 2020

To me this looks like the shapefile contains one or more multipart features -- so the st_as_sf() result has MULTIPOLYGON geometry.

constantDensitySampling number of sample heuristics are based on total polygon area -- without respect to whether it is multipart or not. It was designed for "clean" SSURGO-topologically-correct (no multipart) features.

The spurious samples are a degenerate case of spsample operating on a multipart polygon. Converting to POLYGON with sf has the net effect of "exploding" the multipart feature, so that
constantDensitySampling can catch that those are really small polygons. If you set the minimum samples to zero, and sampling density is low enough, they would be "ignored'.

As far as a clean way to do this with sp... nah. sf is the way to go. And explode your multipart features before sampling.

@dylanbeaudette dylanbeaudette changed the title constantDensitySampling gives wrong output with MULTIPOLYGONS constantDensitySampling gives wrong output with MULTIPOLYGONS Mar 31, 2020
@dylanbeaudette dylanbeaudette changed the title constantDensitySampling gives wrong output with MULTIPOLYGONS constantDensitySampling gives wrong output with MULTIPOLYGONS Mar 31, 2020
@brownag
Copy link
Member

brownag commented Mar 31, 2020

I'll add that this seems to work as intended on MULTIPOLYGON -- if those multipart features weren't so comically small compared to the big one we would see constant density samples across all parts (I predict) -- as we should

@dylanbeaudette
Copy link
Member Author

Long-term lets move this over to sf and force iteration of POLYGONs.

@brownag
Copy link
Member

brownag commented Apr 1, 2020

Here is a slightly more obvious depiction of what is going on with MULTIPOLYGON feature input to constantDensitySampling. EDIT: I was thinking that there might have been an issue, but the previous code had an error in it.

image

library(sf)
library(sharpshootR)

# replace with any shapefile containing two POLYGON features
f <- st_read("E:/poly_test.shp")

# one MULTIPOLYGON
f.m <- f[1,]
geo <- st_combine(st_cast(f, "MULTIPOLYGON"))
f.m$geometry <- geo

# inspect
par(mfrow=c(1,2), mar=c(1,1,1,1))
plot(f$geometry)
f$pID <- 1:2
f.s <- constantDensitySampling(as_Spatial(f), min.samples = 0, n.pts.per.ac = 0.1)
points(f.s)
box()

plot(f.m$geometry)
f.m$pID <- 1
f.m.s <- constantDensitySampling(as_Spatial(f.m), min.samples = 0, n.pts.per.ac = 0.1)
points(f.m.s)
box()

@brownag
Copy link
Member

brownag commented Apr 1, 2020

I had made a nasty mistake:

# one MULTIPOLYGON
f.m <- f
geo <- st_combine(st_cast(f, "MULTIPOLYGON"))
f.m$geometry <- geo

versus

# one MULTIPOLYGON
f.m <- f[1,]
geo <- st_combine(st_cast(f, "MULTIPOLYGON"))
f.m$geometry <- geo

I was inadvertently duping the geometry -- creating overlapping features that got sampled "correctly". It does seem that depending on the way the multipolygons were generated, the constantDensitySampling results could vary.

@dylanbeaudette
Copy link
Member Author

Thanks for testing this. I'll likely return to this after closing some aqp issues.

1 similar comment
@dylanbeaudette
Copy link
Member Author

Thanks for testing this. I'll likely return to this after closing some aqp issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants