-
Notifications
You must be signed in to change notification settings - Fork 1
/
README.Rmd
246 lines (163 loc) · 11.9 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# rosettaPTF
<!-- badges: start -->
[![R-CMD-check](https://github.com/ncss-tech/rosettaPTF/workflows/R-CMD-check/badge.svg)](https://github.com/ncss-tech/rosettaPTF/actions)
[![HTML Docs](https://img.shields.io/badge/docs-HTML-informational)](https://ncss-tech.github.io/rosettaPTF/)
[![codecov](https://codecov.io/gh/ncss-tech/rosettaPTF/branch/main/graph/badge.svg?token=BYBKW7PKC3)](https://codecov.io/gh/ncss-tech/rosettaPTF)
<!-- badges: end -->
Rosetta is a neural network-based model for predicting unsaturated soil hydraulic parameters from basic soil characterization data. The model predicts parameters for the van Genuchten unsaturated soil hydraulic properties model, using sand, silt, and clay, bulk density and water content.
{rosettaPTF} uses {reticulate} to wrap the Python [rosetta-soil](https://github.com/usda-ars-ussl/rosetta-soil) module: providing several versions of the [Rosetta](http://ncss-tech.github.io/AQP/soilDB/ROSETTA-API.html) pedotransfer functions in an R environment.
This package is primarily intended for more demanding use cases (such as calling Rosetta "continuously" on each cell in a stack of rasters), or for accessing the uncertainty and shape metrics from Zhang & Schaap (2017). High-throughput input to the pedotransfer function is possible by using RasterStack ([raster](https://github.com/rspatial/raster/)) or SpatRaster ([terra](https://github.com/rspatial/terra/)) objects as input.
## Install {rosettaPTF}
First, install the package from GitHub:
```{r, eval = FALSE}
if (!require("remotes")) install.packages("remotes")
remotes::install_github("ncss-tech/rosettaPTF")
```
Then load the `rosetta-soil` module by loading the R package. If you do not have an available `python` installation or `rosetta-soil` module you will be notified.
```{r}
library(rosettaPTF)
```
### `rosetta-soil` Python module
The [rosetta-soil](https://github.com/usda-ars-ussl/rosetta-soil) module is a Python package maintained by Dr. Todd Skaggs (USDA-ARS) and other U.S. Department of Agriculture employees.
The Rosetta pedotransfer function predicts five parameters for the van Genuchten model of unsaturated soil hydraulic properties
- `theta_r` : residual volumetric water content
- `theta_s` : saturated volumetric water content
- `log10(alpha)` : retention shape parameter `[log10(1/cm)]`
- `log10(n)` : retention shape parameter (also referred to as `npar`)
- `log10(ksat)` : saturated hydraulic conductivity `[log10(cm/d)]`
For each set of input data a mean and standard deviation of each parameter is given.
Less demanding use cases are encouraged to use the web interface or API endpoint. There are additional wrappers of the API endpoints provided by the soilDB R package `ROSETTA()` method. For small amounts of data consider using the interactive version that has copy/paste functionality: https://www.handbook60.org/rosetta.
# Input Data
The [Rosetta](http://ncss-tech.github.io/AQP/soilDB/ROSETTA-API.html) model relies on a minimum of 3 soil properties, with increasing (expected) accuracy as additional properties are included:
* Required, `sand`, `silt`, `clay`: USDA soil texture separates (percentages) that sum to 100%
* Optional, `bulk density (any moisture basis)`: mass per volume after accounting for >2mm fragments, units of grams/cm3
* Optional, `volumetric water content at 33 kPa`: roughly “field capacity” for most soils, units of cm3/cm3
* Optional, `volumetric water content at 1500 kPa`: roughly “permanent wilting point” for most plants, units of cm3/cm3
The default order of inputs is: `sand`, `silt`, `clay`, `bulk density (any basis)`, `water content (field capacity; 33 kPa)`, `water content (permanent wilting point; 1500 kPa)` of which the first three are required.
If you specify field capacity water content, you must specify bulk density. If you specify permanent wilting point water content you must also specify bulk density and field capacity water content.
## {reticulate} Setup
If you are using this package for the first time you will need to have Python installed and you will need to download the necessary modules.
You can set up {reticulate} to install modules into a virtual environment. {reticulate} offers `reticulate::install_python()` to download and set up Python if you have not yet done so.
For example, install a recent version of Python, and create a virtual environment called `"r-reticulate"`
```{r, eval = FALSE}
# download latest python 3.10.x
reticulate::install_python(version = "3.10:latest")
reticulate::virtualenv_create("r-reticulate")
```
### Finding the `python` binaries
```{r}
rosettaPTF::find_python()
```
`find_python()` provides heuristics for setting up {reticulate} to use Python in commonly installed locations.
The first attempt makes use of `Sys.which()` to find installations available in the user path directory.
<!--
`find_python()` also provides an option for using ArcGIS Pro Conda environments--which may be needed for users who cannot install Conda by some other means. To use this option specify the `arcpy_path` argument or the `rosettaPTF.arcpy_path` option to locate both the ArcGIS Pro Conda environment and Python binaries in _C:/Program Files/ArcGIS/Pro/bin/Python_, for example:
```{r, eval=FALSE}
rosettaPTF::find_python(arcpy_path = "C:/Program Files/ArcGIS/Pro/bin/Python")
```
-->
If automatic configuration via `find_python()` fails (returns `NULL`) you can manually set a path to the `python` executable with the {reticulate} `RETICULATE_PYTHON` environment variable: `Sys.setenv(RETICULATE_PYTHON = "path/to/python")` or `reticulate::use_python("path/to/python")`
### Install `rosetta-soil` Python Module
The {rosettaPTF} `install_rosetta()` method wraps `reticulate::py_install("rosetta-soil")`. You may not need to install the `rosetta-soil` module if your environment is set up, as {reticulate} will install/upgrade dependencies of packages as specified in the package configuration section of the DESCRIPTION file.
You can use `install_rosetta()` to install into custom environments by specifying `envname` as needed. After installing a new version of the module you should restart your R session.
```{r}
rosettaPTF::install_rosetta()
```
Alternately, to install the module manually with `pip` you can run the following command. This assumes a Python 3 binary called `python` can be found on your path.
```sh
python -m pip install rosetta-soil
```
## `run_rosetta()`
Batch runs of Rosetta models can be done using using `list`, `data.frame`, `matrix`, `RasterStack`, `RasterBrick` and `SpatRaster` objects as input.
### `list()` Input Example
```{r}
run_rosetta(list(c(30, 30, 40, 1.5), c(55, 25, 20), c(55, 25, 20, 1.1)),
rosetta_version = 3)
```
Output `model_code` reflects the number of parameters in the input.
### `data.frame()` Input Example
The `data.frame` interface allows for using using custom column names and order. If the `vars` argument is not specified it is assumed that the columns are in the order specified in the `run_rosetta()` manual page.
```{r}
run_rosetta(data.frame(
d = c(NA, 1.5),
b = 60,
a = 20,
c = 20
), vars = letters[1:4])
```
### Soil Data Access / SSURGO Mapunit Aggregate Input Example
This example pulls mapunit/component data from Soil Data Access (SDA). We use the {soilDB} function `get_SDA_property()` to obtain representative values for `sand`, `silt`, `clay`, and `bulk density (1/3 bar)` we run Rosetta on the resulting data.frame (one row per mapunit) then use raster attribute table (RAT) to display the results (1:1 with `mukey`).
```{r}
library(soilDB)
library(terra)
library(rosettaPTF)
# obtain mukey map from SoilWeb Web Coverage Service (800m resolution SSURGO derived)
res <- mukey.wcs(aoi = list(aoi = c(-114.16, 47.65,-114.08, 47.68), crs = 'EPSG:4326'))
# request input data from SDA
varnames <- c("sandtotal_r", "silttotal_r", "claytotal_r", "dbthirdbar_r")
resprop <- get_SDA_property(property = varnames,
method = "Dominant Component (numeric)",
mukeys = unique(values(res$mukey)))
# keep only those where we have a complete set of 4 parameters (sand, silt, clay, bulk density; model code #3)
soildata <- resprop[complete.cases(resprop), c("mukey", varnames)]
# run Rosetta on the mapunit-level aggregate data
system.time(resrose <- run_rosetta(soildata[,varnames]))
# transfer mukey to result
resprop$mukey <- as.numeric(resprop$mukey)
resrose$mukey <- as.numeric(soildata$mukey)
# merge property (input) and rosetta parameters (output) into RAT
levels(res) <- merge(cats(res)[[1]], resprop, by.x = "ID", by.y = "mukey", all.x = TRUE, sort = FALSE)
levels(res) <- merge(cats(res)[[1]], resrose, by.x = "ID", by.y = "mukey", all.x = TRUE, sort = FALSE)
# convert categories based on mukey to numeric values
res2 <- catalyze(res)
# make a plot of the predicted Ksat
plot(res2, "log10_Ksat_mean")
```
### _SpatRaster_ (terra) Input Example
The above example shows how to create raster output based on _discrete_ (SSURGO polygon derived) data. A more general case is when each raster cell has "unique" values (i.e. _continuous_ raster inputs). `run_rosetta()` has an S3 method defined for _SpatRaster_ input.
We previously merged the input data from SDA (an ordinary _data.frame_) into the RAT of `res`; exploiting the linkage between `mukey` and raster cells to make the map. For comparison with the `mukey` results above we stack de-ratified input layers and create a new _SpatRaster_.
```{r}
res3 <- rast(list(
res2[["sandtotal_r"]],
res2[["silttotal_r"]],
res2[["claytotal_r"]],
res2[["dbthirdbar_r"]]
))
# SpatRaster to data.frame interface (one call on all cells)
system.time(test2 <- run_rosetta(res3))
# make a plot of the predicted Ksat (identical to mukey-based results)
plot(test2, "log10_Ksat_mean")
```
You will notice the results for Ksat distribution are identical since the same input values were used, but the latter approach took longer to run. The time difference is the difference of estimating ~40 (1 estimate per mapunit key) versus ~30,000 (1 estimate per raster cell) sets of Rosetta parameters.
## Extended Output with `Rosetta` S3 Class
### Make a _Rosetta_ class instance for running extended output methods
Note that each instance of _Rosetta_ has a fixed version and model code, so if you have heterogeneous input you need to iterate over model code.
```{r}
# defaults are version 3 and model code 3 (4 parameters: sand, silt, clay and bulk density)
my_rosetta <- Rosetta(rosetta_version = 3, model_code = 3)
```
### `predict()` Rosetta Parameter Values and Standard Deviations from a _Rosetta_ instance
```{r}
predict(my_rosetta, list(c(30, 30, 40, 1.5), c(55, 25, 20, 1.1)))
```
### Extended _Rosetta_ Predictions, Parameter Distributions and Summary Statistics after Zhang & Schaap (2017) with `ann_predict()`
```{r}
ann_predict(my_rosetta, list(c(30, 30, 40, 1.5), c(55, 25, 20, 1.1)))
```
## Selected References
Three versions of the ROSETTA model are available, selected using `rosetta_version` argument.
- `rosetta_version` 1 - Schaap, M.G., F.J. Leij, and M.Th. van Genuchten. 2001. ROSETTA: a computer program for estimating soil hydraulic parameters with hierarchical pedotransfer functions. Journal of Hydrology 251(3-4): 163-176. doi: 10.1016/S0022-1694(01)00466-8.
- `rosetta_version` 2 - Schaap, M.G., A. Nemes, and M.T. van Genuchten. 2004. Comparison of Models for Indirect Estimation of Water Retention and Available Water in Surface Soils. Vadose Zone Journal 3(4): 1455-1463. doi: 10.2136/vzj2004.1455.
- `rosetta_version` 3 - Zhang, Y., and M.G. Schaap. 2017. Weighted recalibration of the Rosetta pedotransfer model with improved estimates of hydraulic parameter distributions and summary statistics (Rosetta3). Journal of Hydrology 547: 39-53. doi: 10.1016/j.jhydrol.2017.01.004.