site(d) <- results in corrupt profiles #171

Closed
dylanbeaudette opened this issue Nov 18, 2020 · 2 comments · Fixed by #192

Comments

dylanbeaudette commented Nov 18, 2020

For some reason, the NEON data, once converted into a SoilProfileCollection (SPC), does not play nicely with:

subset(S, siteID %in% c('SJER', 'SOAP', 'TEAK'))

There is a lot of "magic" involved with getting the NEON data into R, resulting in a global environment full of objects. Also, the final object S contains many duplicated columns resulting from joins.

This has nothing to do with aqp::subset; rather, the cause is the novel input data combined with site(x) <-.

Setup:

# Load required packages
library(neonUtilities)
library(aqp)
library(dplyr)

## ----download-data, results='hide'--------------------------------------------------------------------

MP <- loadByProduct(dpID = "DP1.00096.001", check.size = FALSE)

# Unlist to environment - see download/explore tutorial for description
list2env(MP, .GlobalEnv)



## ----join-bgc-----------------------------------------------------------------------------------------

# duplicate the 'horizon' information into a new table
S <- mgp_perhorizon

# duplicate the biogeochemical information into a new table
B <- mgp_perbiogeosample

# Select only 'Regular' samples (not audit)
B <- B[B$biogeoSampleType=="Regular" & 
         !is.na(B$biogeoSampleType),]

# Join biogeochem data to horizon data
S <- left_join(S, B, by=c('horizonID', 'siteID',
                          'pitID','setDate',
                          'collectDate',
                          'domainID',
                          'horizonName'))
S <- arrange(S, siteID, horizonTopDepth)



## ----site-labels--------------------------------------------------------------------------------------

## combine 'domainID' and 'siteID' into a new label variable
S$siteLabel <- paste(S$domainID, S$siteID, sep = "-")


# init SPC
depths(S) <- siteLabel ~ horizonTopDepth + horizonBottomDepth

Corrupt SPC here:

# move site-level attributes to @site
site(S) <- ~ siteID + pitNamedLocation.x + pitID + labProjID + laboratoryName + nrcsDescriptionID

View error here:

S[42, ]
dylanbeaudette changed the title from "odd subset related error, most likely the source data" to "site(d) <- results in corrupt profiles" Nov 18, 2020
brownag commented Nov 18, 2020

Narrowed the error condition (the SPC validity method returns FALSE) down to the inclusion of labProjID and/or laboratoryName in the site<- (formula) normalization call.

That is:

site(S) <- ~ siteID + pitNamedLocation.x + pitID + nrcsDescriptionID
S[42,]

works as expected.

The values of those attributes do not behave the way site-level attributes are supposed to under site<-: we get duplicated siteLabel values in the site table. Accessing any of the duplicated row indices (4, 5, 28, 29, 32, 33, 42, 43, etc.) produces an invalid SPC, because multiple profile IDs are matched.
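To illustrate the duplication, here is a minimal base R sketch on toy data. It assumes that formula-based normalization amounts to taking the unique combinations of the profile ID and the named columns, which is a simplification and not necessarily how aqp implements site<-. The lab names are hypothetical.

```r
# Toy horizon table (hypothetical values): profile "D17-SJER" carries
# two different laboratoryName values across its horizons
h <- data.frame(
  siteLabel = c("D17-SJER", "D17-SJER", "D17-SOAP", "D17-SOAP"),
  siteID = c("SJER", "SJER", "SOAP", "SOAP"),
  laboratoryName = c("Lab A", "Lab B", "Lab C", "Lab C"),
  stringsAsFactors = FALSE
)

# Assumed normalization: unique combinations of ID + site-level columns
s <- unique(h[, c("siteLabel", "siteID", "laboratoryName")])

# Two profiles, but three rows in the would-be site table
n_site_rows <- nrow(s)

# Profile IDs that appear more than once in the would-be site table
dup_ids <- unique(s$siteLabel[duplicated(s$siteLabel)])
```

Any profile listed in dup_ids would end up with more than one site row, which matches the symptom described above.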


If we look at the attributes shared between S and B (the input data tables), we find that we probably need to separate the preparation of horizon data for promotion to an SPC from the creation of a unique set of site data (which can then be joined via site<-).

brownag commented Jan 21, 2021

We should add some checks to site<- ~ to make sure it does not allow this.
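A rough sketch of what such a check could look like, in base R. The helper name find_nonsite_vars is hypothetical and is not part of aqp; it only reports which candidate columns take more than one distinct value within some profile, so they could be dropped or fixed before calling site<-.

```r
# Hypothetical helper (not an aqp function): return the names in `vars`
# that are not constant within every profile identified by column `id`.
# Note: NA counts as a distinct value here.
find_nonsite_vars <- function(h, id, vars) {
  varies <- vapply(vars, function(v) {
    # number of distinct values of v within each profile
    n_per_id <- tapply(h[[v]], h[[id]], function(x) length(unique(x)))
    any(n_per_id > 1, na.rm = TRUE)
  }, logical(1))
  vars[varies]
}

# Toy data: `lab` varies within profile "a", `siteID` does not
h <- data.frame(
  id = c("a", "a", "b"),
  siteID = c("X", "X", "Y"),
  lab = c("L1", "L2", "L3"),
  stringsAsFactors = FALSE
)

bad <- find_nonsite_vars(h, "id", c("siteID", "lab"))
```

Here bad contains only "lab", so only siteID would be safe to promote to the site table in this toy case.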

I would love to work up a proper example for #172 that also pulls in some pertinent information from our databases (SSURGO from SDA).

brownag added a commit that referenced this issue Jan 22, 2021
* Tests for "round trip site normalization/denormalization"; fix target for #171

* Add warnings for bad site<-~ normalization; closes #171; remove ddply #157
brownag added a commit that referenced this issue Jan 22, 2021
* proper data.table generalization of #171 #192

* Revert main change in #194

* cleanup