---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
fig.align = 'center',
out.width = "75%",
dev.args = list(png = list(type = "cairo"))
)
```
# whitewater
<!-- badges: start -->
[![R-CMD-check](https://github.com/joshualerickson/whitewater/workflows/R-CMD-check/badge.svg)](https://github.com/joshualerickson/whitewater/actions) [![codecov](https://codecov.io/gh/joshualerickson/whitewater/branch/main/graph/badge.svg)](https://app.codecov.io/gh/joshualerickson/whitewater)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
<!-- badges: end -->
The goal of whitewater is to provide sequential and parallel processing for USGS stations in a tidy-style format. The package lets users `plan()` their choice of parallel processing strategy and then pass `parallel = TRUE` to whitewater function calls. Every output is returned as a `tibble`, with data munging of sites, parameter, and stat codes, resulting in a __tidy__ style data frame.
## Attention!
**Please read the new information about USGS Water Services via [{dataRetrieval}](https://github.com/DOI-USGS/dataRetrieval). I will try to get this updated as soon as I can, thank you for your patience!**
**To avoid crashing the [USGS Water Services](https://waterservices.usgs.gov/) REST services, parallel processing is rate-limited to 120 requests/min. By following this rate limit we can still benefit from parallel processing while being mindful/respectful of the USGS Water Services via [{dataRetrieval}](https://github.com/DOI-USGS/dataRetrieval) and REST services. Thank you!**
## Installation
You can install the development version of whitewater from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("joshualerickson/whitewater")
```
## Example
This is a basic example that shows how to solve a common problem: getting daily values of discharge for multiple sites (all active sites in the Pacific Northwest, Region 17) using parallel processing. Please see [furrr](https://CRAN.R-project.org/package=furrr) and [future](https://CRAN.R-project.org/package=future) for more details on parallel processing methods.
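Before the worked example, here is a minimal sketch of how a {future} backend is chosen (the worker count is illustrative; `multisession` works on all platforms):

``` r
library(future)

# Launch background R sessions for parallel evaluation.
# `availableCores()` reports how many cores this machine offers.
plan(multisession, workers = availableCores() - 1)

# whitewater calls with `parallel = TRUE` will now run on that plan.
# Switch back to single-process evaluation at any time:
plan(sequential)
```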
### Running in parallel
```{r, warning=F, message=F}
library(whitewater)
library(tidyverse)
library(sf)
library(future)
library(dataRetrieval)
huc17_sites <- dataRetrieval::whatNWISdata(huc = 17,
siteStatus = 'active',
service = 'dv',
parameterCd = '00060',
drainAreaMax = 2000)
cat("# of sites: ", nrow(huc17_sites))
st_as_sf(huc17_sites, coords = c('dec_long_va', 'dec_lat_va')) %>%
ggplot() +
geom_sf() +
borders('state', xlim = c(-130, -110), ylim = c(20, 50)) +
theme_bw()
```
```{r}
#need to call future::plan()
##### Remember, please use 10 or less workers #####
plan(multisession, workers = 10)
#running on 10 cores
system.time({
pnw_dv <- ww_dvUSGS(huc17_sites$site_no,
parameter_cd = '00060',
wy_month = 10,
parallel = TRUE,
verbose = FALSE)
})
nrow(pnw_dv)
pnw_dv
```
Now we can use other `ww_` functions to filter the data by water year, month, or water year and month, as well as for stat reporting (percentiles compared against current readings).
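As a sketch of that workflow, the `ww_dvUSGS()` object created above can be piped straight into the other filtering functions (function names below follow the package's `ww_` convention; check the package documentation for the exact signatures):

``` r
# Summaries by month across the period of record.
pnw_month <- ww_monthUSGS(pnw_dv)

# Summaries by water year *and* month.
pnw_wym <- ww_wymUSGS(pnw_dv)
```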
### Water Year
Same as above, we can just call `parallel = TRUE` to run in parallel since we'll be getting peak flows from `dataRetrieval::readNWISpeak()`.
```{r, warning = F, message=F, error=F}
system.time({
pnw_wy <- suppressMessages(ww_wyUSGS(pnw_dv,
parallel = TRUE,
verbose = FALSE))
})
pnw_wy
```
```{r, echo=F, warning = F, message=F, error=F}
p1 <- pnw_wy %>%
group_by(site_no) %>%
filter(!is.na(peak_va)) %>%
add_count() %>%
filter( n > 10) %>%
nest() %>%
mutate(model = map(data, ~broom::tidy(Kendall::MannKendall(.$peak_va))['p.value'])) %>%
unnest() %>% slice(1) %>%
ggplot(aes(p.value)) +
geom_histogram(aes(fill = ..x..)) +
geom_vline(xintercept = 0.05, linetype = 2) +
geom_text(aes(x = 0.15, y = 50, label = '0.05')) +
scale_fill_gradientn(colors = rev(hcl.colors('Zissou1', n = 32))) +
theme_bw() + labs(fill = 'p.value')
p2 <- pnw_wy %>%
group_by(site_no) %>%
filter(!is.na(peak_va)) %>%
add_count() %>%
filter(n > 10) %>%
nest() %>%
mutate(model = map(data, ~broom::tidy(Kendall::MannKendall(.$peak_va))['p.value'])) %>%
unnest() %>% slice(1) %>%
st_as_sf(coords = c('long', 'lat')) %>%
ggplot() +
geom_sf(aes(color = p.value), show.legend = F, size = .5)+
scale_color_gradientn(colors = rev(hcl.colors('Zissou1', n = 32))) +
borders('state', xlim = c(-130, -110), ylim = c(20, 50)) +
theme_bw()
library(patchwork)
(p2|p1 + theme(legend.position = 'bottom', legend.margin = margin(l = -10, unit='cm'))) +
plot_annotation(title = 'Mann-Kendall trend analysis (Peak Flow) with active USGS sites in Region 17', subtitle = '10 years or more of data and drainage area < 2,000 sq. miles')
```
### Without using parallel
If you just want a few sites (or one) and don't need parallel processing, go for it! You'll still get the advantages of filtering and stats. In addition, you don't always have to pipe a `ww_dvUSGS()` object into the `ww_*()` functions and can just use the `sites` argument. The example below does this, but IMO it's nice to start with a `ww_dvUSGS()` object because you'll likely come back to it.
```{r, warning=F}
withlacoochee_temp_and_flow <- ww_wyUSGS(sites="02319394",
parameter_cd = c("00010", "00060"))
withlacoochee_temp_and_flow %>%
pivot_longer(c('Wtemp_max', 'Flow_max')) %>%
ggplot(aes(wy, value)) +
geom_point() +
geom_line() +
theme_bw() +
labs(x = 'Water Year') +
facet_wrap(~name, scales = 'free')
```
### Stats
Sometimes you just want to compare the current flow (water year or past week, whatever you want) to historical flows. The `ww_statsUSGS()` function does this for you! It takes the historical values for your parameter (flow in this example), returns percentiles (via `dataRetrieval::readNWISstat()`), and combines them with the current values.
```{r}
yaak_river_dv <- ww_dvUSGS('12304500')
yaak_daily_report <- ww_statsUSGS(yaak_river_dv,
temporalFilter = 'daily',
days = 365)
yaak_daily_report %>%
pivot_longer(c('Flow', 'p25_va', 'p75_va')) %>%
ggplot() +
geom_line(aes(Date, value, color = name, alpha = name %in% c('p25_va', 'p75_va'))) +
scale_color_manual(values = c('black', 'red', 'blue')) +
scale_alpha_manual(values = c(1,.5), guide = 'none') +
labs(y = 'Discharge (cfs)', color = '', title = 'Comparing current flow to 25th and 75th percentiles') +
theme_bw()
```
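`temporalFilter` also accepts coarser windows; the value below is an assumption based on the pattern above, so check `?ww_statsUSGS` for the authoritative list:

``` r
# Compare recent flows against monthly historical percentiles.
yaak_monthly_report <- ww_statsUSGS(yaak_river_dv,
                                    temporalFilter = 'monthly')
```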