Merge pull request #47 from ncss-tech/WRB2022
World Reference Base for Soil Resources (4th Edition, 2022)
brownag authored Oct 1, 2024
2 parents 6f81a38 + d381897 commit 79397ec
Showing 11 changed files with 2,052 additions and 18 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -22,7 +22,7 @@ Suggests:
soilDB,
ape,
data.tree
RoxygenNote: 7.2.3
RoxygenNote: 7.3.2
Roxygen: list(markdown = TRUE)
VignetteBuilder: knitr
LazyData: false
46 changes: 34 additions & 12 deletions R/data-documentation.R
@@ -1,7 +1,7 @@
#'
#' @title Soil Taxonomy Hierarchy
#' Soil Taxonomy Hierarchy
#'
#' @description The first 4 levels of the US Soil Taxonomy hierarchy (soil order, suborder, greatgroup, subgroup), presented as a \code{data.frame} (denormalized) and a \code{list} of unique taxa.
#' The first 4 levels of the US Soil Taxonomy hierarchy (soil order, suborder, greatgroup, subgroup), presented as a \code{data.frame} (denormalized) and a \code{list} of unique taxa.
#' @details Ordered based on the unique letter codes denoting taxa from the 13th edition of the Keys to Soil Taxonomy.
#' @usage data(ST)
#'
@@ -20,9 +20,9 @@
#'
"ST"

#' @title Family-level Classes for Soil Taxonomy
#' Family-level Classes for Soil Taxonomy
#'
#' @description A database of family-level class names for Soil Taxonomy.
#' A database of family-level class names for Soil Taxonomy.
#'
#' @references
#' Soil Survey Staff. 2014. Keys to Soil Taxonomy, 12th ed. USDA-Natural Resources Conservation Service, Washington, DC.
@@ -34,9 +34,9 @@
#'
"ST_family_classes"

#' @title Epipedons, Diagnostic Horizons, Characteristics and Features in Soil Taxonomy
#' Epipedons, Diagnostic Horizons, Characteristics and Features in Soil Taxonomy
#'
#' @description A `data.frame` with columns "group", "name", "chapter", "page", "description", "criteria". Currently page numbers and contents are referenced to 12th Edition Keys to Soil Taxonomy and derived from products in the ncss-tech SoilKnowledgeBase repository (https://github.com/ncss-tech/SoilKnowledgeBase).
#' A `data.frame` with columns "group", "name", "chapter", "page", "description", "criteria". Currently page numbers and contents are referenced to 12th Edition Keys to Soil Taxonomy and derived from products in the ncss-tech SoilKnowledgeBase repository (https://github.com/ncss-tech/SoilKnowledgeBase).
#'
#' @references
#' Soil Survey Staff. 2014. Keys to Soil Taxonomy, 12th ed. USDA-Natural Resources Conservation Service, Washington, DC.
@@ -48,9 +48,9 @@
#'
"ST_features"

#' @title Formative Elements used by Soil Taxonomy
#' Formative Elements used by Soil Taxonomy
#'
#' @description A database of formative elements used by the first 4 levels of US Soil Taxonomy hierarchy (soil order, suborder, greatgroup, subgroup).
#' A database of formative elements used by the first 4 levels of US Soil Taxonomy hierarchy (soil order, suborder, greatgroup, subgroup).
#'
#' @references
#' S. W. Buol and R. C. Graham and P. A. McDaniel and R. J. Southard. Soil Genesis and Classification, 5th edition. Iowa State Press, 2003.
@@ -61,9 +61,9 @@
#'
"ST_formative_elements"

#' @title Letter Code Lookup Table for Position of Taxa within the Keys to Soil Taxonomy (12th Edition)
#' Letter Code Lookup Table for Position of Taxa within the Keys to Soil Taxonomy (12th Edition)
#'
#' @description A lookup table mapping unique taxonomic Order, Suborder, Great Group and Subgroups to letter codes that denote their logical position within the Keys.
#' A lookup table mapping unique taxonomic Order, Suborder, Great Group and Subgroups to letter codes that denote their logical position within the Keys.
#'
#' @details The lookup table has been corrected to reflect errata that were posted after the print publication of the 12th Edition Keys, as well as typos in the Spanish language edition.
#'
@@ -81,9 +81,9 @@
#'
"ST_higher_taxa_codes_12th"

#' @title Letter Code Lookup Table for Position of Taxa within the Keys to Soil Taxonomy (13th Edition)
#' Letter Code Lookup Table for Position of Taxa within the Keys to Soil Taxonomy (13th Edition)
#'
#' @description A lookup table mapping unique taxonomic Order, Suborder, Great Group and Subgroups to letter codes that denote their logical position within the Keys.
#' A lookup table mapping unique taxonomic Order, Suborder, Great Group and Subgroups to letter codes that denote their logical position within the Keys.
#'
#' @references
#'
@@ -95,3 +95,25 @@
#' @keywords datasets
#'
"ST_higher_taxa_codes_13th"

#' World Reference Base for Soil Resources (4th Edition, 2022)
#'
#' A _list_ containing three _data.frame_ elements `"rsg"`, `"pq"`, and `"sq"` providing information on the 'Reference Soil Groups', 'Principal Qualifiers', and 'Supplementary Qualifiers', respectively.
#'
#' @details
#'
#' Each element has the column `"code"`, a number (1-32) giving the position in the Reference Soil Groups, and the column `"reference_soil_group"`, the corresponding group name. The `"pq"` and `"sq"` qualifier name columns (`principal_qualifiers` and `supplementary_qualifiers`) contain individual qualifier terms. Related qualifiers are identified using the `qualifier_group` column, derived from qualifier names separated with a forward slash (`" / "`).
#'
#' - The _data.frame_ `"rsg"` has column `"criteria"`, describing the logical criteria for each Reference Soil Group.
#' - The _data.frame_ `"pq"` has qualifier names in column `"principal_qualifiers"`.
#' - The _data.frame_ `"sq"` has qualifier names in column `"supplementary_qualifiers"`.
#'
#' @references
#'
#' IUSS Working Group WRB. 2022. World Reference Base for Soil Resources. International soil classification system for naming soils and creating legends for soil maps. 4th edition. International Union of Soil Sciences (IUSS), Vienna, Austria.
#'
#' @usage data(WRB_4th_2022)
#'
#' @keywords datasets
#'
"WRB_4th_2022"
6 changes: 3 additions & 3 deletions R/higherTaxaCodes.R
@@ -1,6 +1,6 @@
#' Decompose taxon letter codes
#'
#' @description Find all codes that logically comprise the specified codes. For instance, code "ABC" ("Anhyturbels") returns "A" ("Gelisols"), "AB" ("Turbels"), "ABC" ("Anhyturbels"). Use in conjunction with a lookup table that maps Order, Suborder, Great Group and Subgroup taxa to their codes (see \code{\link{taxon_code_to_taxon}} and \code{\link{taxon_to_taxon_code}}).
#' Find all codes that logically comprise the specified codes. For instance, code "ABC" ("Anhyturbels") returns "A" ("Gelisols"), "AB" ("Turbels"), "ABC" ("Anhyturbels"). Use in conjunction with a lookup table that maps Order, Suborder, Great Group and Subgroup taxa to their codes (see \code{\link{taxon_code_to_taxon}} and \code{\link{taxon_to_taxon_code}}).
#'
#' @details Accounts for Keys that run out of capital letters (more than 26 subgroups) and use lowercase letters for a unique subdivision within the "fourth character position."
#'
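The decomposition described above can be sketched in a few lines of base R. This is a simplified illustration, not the package's `decompose_taxon_code()` implementation: the helper name `code_prefixes()` is hypothetical, the code `"ABCDe"` is a made-up example, and the sketch assumes a lowercase letter always belongs to the uppercase letter immediately before it.

```r
# Hypothetical helper (not part of the package): list every code that
# logically comprises `code`, treating "uppercase + optional lowercase"
# as a single character position.
code_prefixes <- function(code) {
  # Split the code into positions, e.g. "ABCDe" -> "A", "B", "C", "De"
  pos <- regmatches(code, gregexpr("[A-Z][a-z]?", code))[[1]]
  # Accumulate the positions into successively longer prefixes
  Reduce(paste0, pos, accumulate = TRUE)
}

res_simple <- code_prefixes("ABC")
res_lower  <- code_prefixes("ABCDe")  # made-up code with a lowercase suffix
print(res_simple)
print(res_lower)
```

Keeping the lowercase letter attached to the fourth position means the sketch never emits an intermediate four-letter code that may not exist on its own.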
@@ -49,7 +49,7 @@ decompose_taxon_code <- function(codes) {

#' Get taxon codes of preceding taxa
#'
#' @description Find all codes that logically precede the specified codes. For instance, code "ABC" ("Anhyturbels") returns "AA" ("Histels"), "ABA" ("Histoturbels") and "ABB" ("Aquiturbels"). Use in conjunction with a lookup table that maps Order, Suborder, Great Group and Subgroup taxa to their codes (see \code{\link{taxon_code_to_taxon}} and \code{\link{taxon_to_taxon_code}}).
#' Find all codes that logically precede the specified codes. For instance, code "ABC" ("Anhyturbels") returns "AA" ("Histels"), "ABA" ("Histoturbels") and "ABB" ("Aquiturbels"). Use in conjunction with a lookup table that maps Order, Suborder, Great Group and Subgroup taxa to their codes (see \code{\link{taxon_code_to_taxon}} and \code{\link{taxon_to_taxon_code}}).
#'
#' @details Accounts for Keys that run out of capital letters (more than 26 subgroups) and use lowercase letters for a unique subdivision within the "fourth character position."
#'
@@ -187,7 +187,7 @@ taxon_to_taxon_code <- function(taxon) {

#' Determine relative position of taxon within Keys to Soil Taxonomy (Order to Subgroup)
#'
#' @description The relative position of a taxon is `[number of preceding Key steps] + 1`, or `NA` if it does not exist in the lookup table.
#' The relative position of a taxon is `[number of preceding Key steps] + 1`, or `NA` if it does not exist in the lookup table.
#'
#' @param code A character vector of taxon codes to determine the relative position of.
#'
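As a rough illustration of "preceding codes" and the `[number of preceding Key steps] + 1` rule, the following base R sketch enumerates the codes that precede a given code at each level. Both function names are hypothetical, and the sketch makes two simplifying assumptions the package does not: codes contain only capital letters, and every enumerated code exists in the lookup table (the real functions check the table and return `NA` for unknown taxa).

```r
# Hypothetical sketch: codes that logically precede `code` in the Keys.
# Assumes uppercase-only codes; no lookup table is consulted.
preceding_codes_sketch <- function(code) {
  stopifnot(grepl("^[A-Z]+$", code))
  ch <- strsplit(code, "")[[1]]
  out <- character(0)
  for (i in seq_along(ch)) {
    k <- match(ch[i], LETTERS)
    if (k > 1) {
      # Same parent prefix, but an earlier letter at position i
      out <- c(out, paste0(substr(code, 1, i - 1), LETTERS[seq_len(k - 1)]))
    }
  }
  out
}

# Relative position: number of preceding Key steps plus one
relative_position_sketch <- function(code) {
  length(preceding_codes_sketch(code)) + 1
}

prec <- preceding_codes_sketch("ABC")
pos  <- relative_position_sketch("ABC")
print(prec)
print(pos)
```

For `"ABC"` the sketch yields the same three codes named in the documentation above ("AA", "ABA", "ABB"), so its relative position under these assumptions is 4.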
105 changes: 105 additions & 0 deletions data-raw/wrb_4th_2022.R
@@ -0,0 +1,105 @@
## code to prepare `WRB_4th_2022` dataset goes here
library(pdftools)

## SETUP
##
# dir.create("misc/WRB2022")
# download.file("https://wrb.isric.org/files/WRB_fourth_edition_2022-12-18.pdf",
# destfile = "misc/WRB2022/WRB_fourth_edition_2022-12-18.pdf")

## does not work for RSG/qualifiers; tables used in formatting
## can be used for definitions of diagnostics and qualifiers
# x <- pdf_text("misc/WRB2022/WRB_fourth_edition_2022-12-18.pdf")
# x <- unlist(strsplit(x, "\n"))
# ldx <- cumsum(grepl("Key to the Reference Soil Groups", x))
# y <- split(x, ldx)
# data.frame(y[[11]]) |> View()

## nope
# x <- pdf_data("misc/WRB2022/WRB_fourth_edition_2022-12-18.pdf")
# y <- do.call('rbind', x)
#

x <- readLines("misc/WRB2022/WRB_RSG.txt")
x <- gsub("\u003c", "<", gsub("\u003E", ">", gsub("\u2264", "<=", gsub("\u2265", ">=", x))))
n <- grep("^[A-Z]+$", x)
z.names <- x[n]
x <- x[-n]
idx <- grep("^(Soils having|Other soils)", x)
ldx <- rep(FALSE, length(x))
ldx[idx] <- TRUE
xx <- split(x, cumsum(ldx))
z <- lapply(xx, function(y) {
  i <- grep("(; (and|or)|\\.|:)$", y) + 1
  i <- i[i < length(y)]
  l <- rep(FALSE, length(y))
  l[i] <- TRUE
  sapply(split(y, cumsum(l)), paste0, collapse = " ")
})
names(z) <- z.names

wrb_rsg <- do.call('rbind', lapply(seq(z), function(i) {
  data.frame(code = i, reference_soil_group = z.names[i], criteria = z[[z.names[i]]])
}))
rownames(wrb_rsg) <- NULL
# View(wrb_rsg)

x <- readLines("misc/WRB2022/WRB_PQ.txt")
n <- grep("^[A-Z]+$", x)
z.names <- x[n]
x <- x[-n]
idx <- grep("Principal qualifiers", x)
ldx <- rep(FALSE, length(x))
ldx[idx] <- TRUE
xx <- split(x, cumsum(ldx))
z <- lapply(xx, function(y) {
  y <- trimws(gsub("([^ ])/ ", "\\1 / ", y))
  y[y != "Principal qualifiers"]
})
names(z) <- z.names

wrb_pq <- do.call('rbind', lapply(seq(z), function(i) {
  pq <- lapply(strsplit(z[[z.names[i]]], "/"), trimws)
  pg <- lapply(seq(pq), function(j) rep(z[[z.names[i]]][j], length(pq[[j]])))
  data.frame(code = i,
             reference_soil_group = z.names[i],
             qualifier_group = unlist(pg),
             principal_qualifiers = unlist(pq))
}))
rownames(wrb_pq) <- NULL
# View(wrb_pq)

x <- readLines("misc/WRB2022/WRB_SQ.txt")
n <- grep("^[A-Z]+$", x)
z.names <- x[n]
x <- x[-n]
idx <- grep("Supplementary qualifiers", x)
ldx <- rep(FALSE, length(x))
ldx[idx] <- TRUE
xx <- split(x, cumsum(ldx))
z <- lapply(xx, function(y) {
  y <- trimws(gsub("([^ ])/ ", "\\1 / ", y))
  y[y != "Supplementary qualifiers"]
})
names(z) <- z.names

wrb_sq <- do.call('rbind', lapply(seq(z), function(i) {
  sq <- lapply(strsplit(z[[z.names[i]]], "/"), trimws)
  sg <- lapply(seq(sq), function(j) rep(z[[z.names[i]]][j], length(sq[[j]])))
  data.frame(code = i,
             reference_soil_group = z.names[i],
             qualifier_group = unlist(sg),
             supplementary_qualifiers = unlist(sq))
}))
rownames(wrb_sq) <- NULL
# View(wrb_sq)

WRB_4th_2022 <- list(
  rsg = wrb_rsg,
  pq = wrb_pq,
  sq = wrb_sq
)

stopifnot(all(sapply(WRB_4th_2022, function(x) max(x$code)) == 32))

usethis::use_data(WRB_4th_2022, overwrite = TRUE)
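The parsing pattern used in the script above (drop the all-caps group names, start a new chunk at each header line with `split()` and `cumsum()`, then expand slash-separated qualifier groups into one row per qualifier) can be tried on a small in-memory vector. This is a sketch on made-up input: the group names and qualifier terms below are placeholders, not WRB content, and the vector merely stands in for a text file such as `WRB_PQ.txt`.

```r
# Made-up lines standing in for the contents of a qualifier text file
x <- c("GROUPA", "Principal qualifiers", "Alpha/ Beta", "Gamma",
       "GROUPB", "Principal qualifiers", "Delta")

# Lines consisting only of capital letters are treated as group names
n <- grep("^[A-Z]+$", x)
group_names <- x[n]
x <- x[-n]

# Start a new chunk at every header line, then drop the header itself
starts <- grepl("Principal qualifiers", x)
chunks <- lapply(split(x, cumsum(starts)), function(y) {
  y <- trimws(gsub("([^ ])/ ", "\\1 / ", y))  # normalize "A/ B" to "A / B"
  y[y != "Principal qualifiers"]
})
names(chunks) <- group_names

# One row per individual qualifier, keeping the group it came from
pq <- do.call("rbind", lapply(seq_along(chunks), function(i) {
  parts <- lapply(strsplit(chunks[[i]], "/"), trimws)
  data.frame(code = i,
             reference_soil_group = group_names[i],
             qualifier_group = rep(chunks[[i]], lengths(parts)),
             principal_qualifiers = unlist(parts),
             stringsAsFactors = FALSE)
}))
rownames(pq) <- NULL
print(pq)
```

Because the chunk index doubles as `code`, the approach relies on the group names and header lines appearing in the same order and in equal numbers, which is what the script's final `stopifnot()` check guards against.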
Binary file added data/WRB_4th_2022.rda
Binary file not shown.
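Since the binary dataset is not displayed, here is a hedged sketch of how a table shaped like the `"pq"` element might be queried. The data frame below is a toy stand-in with placeholder values (not WRB terms); its column names follow `data-raw/wrb_4th_2022.R`, and the helper `related_qualifiers()` is hypothetical.

```r
# Toy stand-in for WRB_4th_2022$pq; all values are placeholders
pq <- data.frame(
  code = c(1L, 1L, 1L, 2L),
  reference_soil_group = c("GROUPA", "GROUPA", "GROUPA", "GROUPB"),
  qualifier_group = c("Alpha / Beta", "Alpha / Beta", "Gamma", "Delta"),
  principal_qualifiers = c("Alpha", "Beta", "Gamma", "Delta"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: qualifiers sharing a qualifier_group with `q`
related_qualifiers <- function(tbl, q) {
  grp <- tbl$qualifier_group[tbl$principal_qualifiers == q]
  unique(tbl$principal_qualifiers[tbl$qualifier_group %in% grp])
}

related <- related_qualifiers(pq, "Alpha")

# Qualifiers listed per group, ordered by group code
by_group <- split(pq$principal_qualifiers, pq$code)
print(related)
print(by_group)
```

With the package installed, the same calls would presumably work on the real table after `data(WRB_4th_2022)`, substituting `WRB_4th_2022$pq` for the toy data frame.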
2 changes: 1 addition & 1 deletion man/SoilTaxonomy-package.Rd


27 changes: 27 additions & 0 deletions man/WRB_4th_2022.Rd


2 changes: 1 addition & 1 deletion misc/.gitignore
@@ -1,5 +1,5 @@
*.json
subgroups.tgz
.Rproj.user
.Rhistory
*.Rproj
WRB2022/WRB_fourth_edition_2022-12-18.pdf