---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
fig.align = 'center',
out.width = "75%",
dev.args = list(png = list(type = "cairo"))
)
```
# whitewater
<!-- badges: start -->
[![R-CMD-check](https://github.com/joshualerickson/whitewater/workflows/R-CMD-check/badge.svg)](https://github.com/joshualerickson/whitewater/actions) [![codecov](https://codecov.io/gh/joshualerickson/whitewater/branch/main/graph/badge.svg)](https://app.codecov.io/gh/joshualerickson/whitewater)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
<!-- badges: end -->
The goal of whitewater is to provide sequential and parallel processing for USGS stations in a tidy-style format. The package lets users `plan()` their choice of parallel processing strategy and then pass `parallel = TRUE` to whitewater function calls. Every output is returned as a `tibble`, with data munging of sites, parameter, and stat codes, resulting in a __tidy__ style data frame.
## Attention!
**Please read the new information about USGS Water Services via [{dataRetrieval}](https://github.com/DOI-USGS/dataRetrieval). I will try to get this updated as soon as I can, thank you for your patience!**
**To avoid crashing the [USGS Water Services](https://waterservices.usgs.gov/) REST services, parallel processing is rate-limited to 120 requests/min. By following this rate limit we can still benefit from parallel processing while being mindful/respectful of the USGS Water Services via [{dataRetrieval}](https://github.com/DOI-USGS/dataRetrieval) and REST services. Thank you!**
## Installation
You can install the development version of whitewater from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("joshualerickson/whitewater")
```
## Example
This is a basic example that shows how to solve a common problem: getting daily values of discharge for multiple sites (all active sites in the Pacific Northwest, Region 17) using parallel processing. Please see [furrr](https://CRAN.R-project.org/package=furrr) and [future](https://CRAN.R-project.org/package=future) for more details on parallel processing methods.
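Before the worked example, here is a minimal sketch of how a {future} backend is chosen (the worker count is illustrative; `multisession` works on all platforms):

``` r
library(future)

# Launch background R sessions for parallel evaluation.
# `availableCores()` reports how many cores this machine offers.
plan(multisession, workers = availableCores() - 1)

# whitewater calls with `parallel = TRUE` will now run on that plan.
# Switch back to single-process evaluation at any time:
plan(sequential)
```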
### Running in parallel
```{r, warning=F, message=F}
library(whitewater)
library(tidyverse)
library(sf)
library(future)
library(dataRetrieval)
huc17_sites <- dataRetrieval::whatNWISdata(huc = 17,
siteStatus = 'active',
service = 'dv',
parameterCd = '00060',
drainAreaMax = 2000)
cat("# of sites: ", nrow(huc17_sites))
st_as_sf(huc17_sites, coords = c('dec_long_va', 'dec_lat_va')) %>%
ggplot() +
geom_sf() +
borders('state', xlim = c(-130, -110), ylim = c(20, 50)) +
theme_bw()
```
```{r}
#need to call future::plan()
##### Remember, please use 10 or less workers #####
plan(multisession, workers = 10)
#running on 10 cores
system.time({
pnw_dv <- ww_dvUSGS(huc17_sites$site_no,
parameter_cd = '00060',
wy_month = 10,
parallel = TRUE,
verbose = FALSE)
})
nrow(pnw_dv)
pnw_dv
```
Now we can use other `ww_` functions to filter the data by water year, month, or water year and month, as well as for stat reporting (percentiles compared against current readings).
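As a sketch of that workflow, the `ww_dvUSGS()` object created above can be piped straight into the other filtering functions (function names below follow the package's `ww_` convention; check the package documentation for the exact signatures):

``` r
# Summaries by month across the period of record.
pnw_month <- ww_monthUSGS(pnw_dv)

# Summaries by water year *and* month.
pnw_wym <- ww_wymUSGS(pnw_dv)
```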
### Water Year
Same as above, we can just call `parallel = TRUE` to run in parallel since we'll be getting peak flows from `dataRetrieval::readNWISpeak()`.
```{r, warning = F, message=F, error=F}
system.time({
pnw_wy <- suppressMessages(ww_wyUSGS(pnw_dv,
parallel = TRUE,
verbose = FALSE))
})
pnw_wy
```
```{r, echo=F, warning = F, message=F, error=F}
p1 <- pnw_wy %>%
group_by(site_no) %>%
filter(!is.na(peak_va)) %>%
add_count() %>%
filter( n > 10) %>%
nest() %>%
mutate(model = map(data, ~broom::tidy(Kendall::MannKendall(.$peak_va))['p.value'])) %>%
unnest() %>% slice(1) %>%
ggplot(aes(p.value)) +
geom_histogram(aes(fill = ..x..)) +
geom_vline(xintercept = 0.05, linetype = 2) +
geom_text(aes(x = 0.15, y = 50, label = '0.05')) +
scale_fill_gradientn(colors = rev(hcl.colors('Zissou1', n = 32))) +
theme_bw() + labs(fill = 'p.value')
p2 <- pnw_wy %>%
group_by(site_no) %>%
filter(!is.na(peak_va)) %>%
add_count() %>%
filter(n > 10) %>%
nest() %>%
mutate(model = map(data, ~broom::tidy(Kendall::MannKendall(.$peak_va))['p.value'])) %>%
unnest() %>% slice(1) %>%
st_as_sf(coords = c('long', 'lat')) %>%
ggplot() +
geom_sf(aes(color = p.value), show.legend = F, size = .5)+
scale_color_gradientn(colors = rev(hcl.colors('Zissou1', n = 32))) +
borders('state', xlim = c(-130, -110), ylim = c(20, 50)) +
theme_bw()
library(patchwork)
(p2|p1 + theme(legend.position = 'bottom', legend.margin = margin(l = -10, unit='cm'))) +
plot_annotation(title = 'Mann-Kendall trend analysis (Peak Flow) with active USGS sites in Region 17', subtitle = '10 years or more of data and drainage area < 2,000 sq. miles')
```
### Without using parallel
If you just want a few sites (or one) and don't need parallel processing, go for it! You'll still get the advantages of filtering and stats. In addition, you don't always have to pipe a `ww_dvUSGS()` object into the `ww_*()` functions and can just use the `sites` argument. The example below does this, but IMO it's nice to start with a `ww_dvUSGS()` object because you'll likely come back to it.
```{r, warning=F}
withlacoochee_temp_and_flow <- ww_wyUSGS(sites="02319394",
parameter_cd = c("00010", "00060"))
withlacoochee_temp_and_flow %>%
pivot_longer(c('Wtemp_max', 'Flow_max')) %>%
ggplot(aes(wy, value)) +
geom_point() +
geom_line() +
theme_bw() +
labs(x = 'Water Year') +
facet_wrap(~name, scales = 'free')
```
### Stats
Sometimes you just want to compare the current flow (water year or past week, whatever you want) to historical flows. The `ww_statsUSGS()` function does this for you! It takes the historical values for your parameter (flow in this example), returns percentiles (via `dataRetrieval::readNWISstat()`), and combines them with the current values.
```{r}
yaak_river_dv <- ww_dvUSGS('12304500')
yaak_daily_report <- ww_statsUSGS(yaak_river_dv,
temporalFilter = 'daily',
days = 365)
yaak_daily_report %>%
pivot_longer(c('Flow', 'p25_va', 'p75_va')) %>%
ggplot() +
geom_line(aes(Date, value, color = name, alpha = name %in% c('p25_va', 'p75_va'))) +
scale_color_manual(values = c('black', 'red', 'blue')) +
scale_alpha_manual(values = c(1,.5), guide = 'none') +
labs(y = 'Discharge (cfs)', color = '', title = 'Comparing current flow to 25th and 75th percentiles') +
theme_bw()
```
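`temporalFilter` also accepts coarser windows; the value below is an assumption based on the pattern above, so check `?ww_statsUSGS` for the authoritative list:

``` r
# Compare recent flows against monthly historical percentiles.
yaak_monthly_report <- ww_statsUSGS(yaak_river_dv,
                                    temporalFilter = 'monthly')
```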