Bump dependencies on Delphi packages #12

Open
wants to merge 7 commits into
base: main
6 changes: 6 additions & 0 deletions .Rprofile
@@ -1 +1,7 @@
source("renv/activate.R")

# Check if user .Rprofile exists
if (file.exists("~/.Rprofile")) {
  # Source user .Rprofile
  source("~/.Rprofile")
}
11 changes: 6 additions & 5 deletions DESCRIPTION
@@ -2,11 +2,12 @@ Package: delphitoolingbook
Title: Delphi Tooling
Version: 0.0.0.9999
Authors@R: c(
person("Daniel", "McDonald", "J.", "[email protected]", role = c("cre", "aut"),
person("Logan", "Brooks", role = c("cre","aut"),
person("Rachel", "Lobay", role = "aut"))
person("Ryan", "Tibshirani", "J.", "[email protected]", role = "aut"),
Description:
person("Daniel", "McDonald", "J.", "[email protected]", role = c("cre", "aut")),
person("Logan", "Brooks", role = c("cre","aut")),
person("Rachel", "Lobay", role = "aut"),
person("Ryan", "Tibshirani", "J.", "[email protected]", role = "aut")
)
Description:
| This book is a longform introduction to analysing and forecasting epidemiological data.
License: MIT + file LICENSE
Imports:
22 changes: 22 additions & 0 deletions README.md
@@ -0,0 +1,22 @@
# Delphi Tooling Book

The book is a collection of articles and tutorials on how to use the Delphi tooling effectively.

## Compiling the book

The book is written with [Quarto](https://quarto.org/docs/guide/) (which can be installed [here](https://quarto.org/docs/get-started/)). To compile the book, run the following commands:

```sh
# Install the R dependencies
R -e 'install.packages(c("pak", "rspm", "renv"))'
R -e 'renv::restore()'

# Compile the book and preview it
quarto preview
```

We use Quarto's freeze feature to re-render only the qmd files that have changed. To force a re-render of a page, run this command:

```sh
quarto render <name.qmd>
```
11 changes: 11 additions & 0 deletions _common.R
@@ -42,3 +42,14 @@ options(

ggplot2::theme_set(ggplot2::theme_bw())

# Workaround for interleaved `cat`s and `message`s (from `cli`) getting
# intercepted and not combined properly by `collapse: true`:
with_messages_cat_to_stdout <- function(code) {
  withCallingHandlers(
    code,
    message = function(m) {
      cat(m$message)
      tryInvokeRestart("muffleMessage")
    }
  )
}
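
A minimal sketch of how the helper above behaves, assuming R 4.0 or later (needed for `tryInvokeRestart()`). The function is repeated so the example is self-contained; the strings and the `out` variable are purely illustrative and not part of this PR. Messages are re-emitted with `cat()`, so they land on stdout in the same order as ordinary `cat()` output:

```r
# Copy of the helper from _common.R, repeated here for self-containment.
with_messages_cat_to_stdout <- function(code) {
  withCallingHandlers(
    code,
    message = function(m) {
      cat(m$message)                     # re-emit the message text on stdout
      tryInvokeRestart("muffleMessage")  # suppress the original stderr message
    }
  )
}

# capture.output() only sees stdout, so both lines are captured in order.
out <- capture.output(
  with_messages_cat_to_stdout({
    cat("from cat\n")
    message("from message")
  })
)
out
```

In a Quarto chunk, the helper would wrap whatever expression mixes `cat()` and `message()` output.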
4 changes: 2 additions & 2 deletions _freeze/archive/execute-results/html.json

Large diffs are not rendered by default.

1,465 changes: 1,465 additions & 0 deletions _freeze/archive/figure-html/unnamed-chunk-8-1.svg
1,179 changes: 588 additions & 591 deletions _freeze/archive/figure-html/unnamed-chunk-9-1.svg
297 changes: 147 additions & 150 deletions _freeze/correlations/figure-html/unnamed-chunk-10-1.svg
345 changes: 171 additions & 174 deletions _freeze/correlations/figure-html/unnamed-chunk-4-1.svg
381 changes: 189 additions & 192 deletions _freeze/correlations/figure-html/unnamed-chunk-6-1.svg
271 changes: 134 additions & 137 deletions _freeze/correlations/figure-html/unnamed-chunk-8-1.svg
4 changes: 2 additions & 2 deletions _freeze/epidf/execute-results/html.json

Large diffs are not rendered by default.

461 changes: 229 additions & 232 deletions _freeze/epidf/figure-html/unnamed-chunk-11-1.svg
729 changes: 363 additions & 366 deletions _freeze/epidf/figure-html/unnamed-chunk-13-1.svg
2,951 changes: 1,474 additions & 1,477 deletions _freeze/epidf/figure-html/unnamed-chunk-15-1.svg
4 changes: 2 additions & 2 deletions _freeze/epipredict/execute-results/html.json

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions _freeze/flatline-forecaster/execute-results/html.json

Large diffs are not rendered by default.

691 changes: 344 additions & 347 deletions _freeze/flatline-forecaster/figure-html/unnamed-chunk-12-1.svg
813 changes: 349 additions & 464 deletions _freeze/flatline-forecaster/figure-html/unnamed-chunk-13-1.svg
875 changes: 436 additions & 439 deletions _freeze/flatline-forecaster/figure-html/unnamed-chunk-14-1.svg
879 changes: 879 additions & 0 deletions _freeze/flatline-forecaster/figure-html/unnamed-chunk-15-1.svg
4 changes: 2 additions & 2 deletions _freeze/forecast-framework/execute-results/html.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion _freeze/growth-rates/execute-results/html.json

Large diffs are not rendered by default.

599 changes: 298 additions & 301 deletions _freeze/growth-rates/figure-html/unnamed-chunk-11-1.svg
653 changes: 325 additions & 328 deletions _freeze/growth-rates/figure-html/unnamed-chunk-11-2.svg
3,799 changes: 1,888 additions & 1,911 deletions _freeze/growth-rates/figure-html/unnamed-chunk-4-1.svg
377 changes: 187 additions & 190 deletions _freeze/growth-rates/figure-html/unnamed-chunk-5-1.svg
611 changes: 304 additions & 307 deletions _freeze/growth-rates/figure-html/unnamed-chunk-7-1.svg
615 changes: 306 additions & 309 deletions _freeze/growth-rates/figure-html/unnamed-chunk-9-1.svg
4 changes: 2 additions & 2 deletions _freeze/index/execute-results/html.json

Large diffs are not rendered by default.

418 changes: 208 additions & 210 deletions _freeze/index/figure-html/unnamed-chunk-8-1.svg
2 changes: 1 addition & 1 deletion _freeze/outliers/execute-results/html.json

Large diffs are not rendered by default.

537 changes: 267 additions & 270 deletions _freeze/outliers/figure-html/unnamed-chunk-3-1.svg
1,278 changes: 637 additions & 641 deletions _freeze/outliers/figure-html/unnamed-chunk-7-1.svg
1,224 changes: 610 additions & 614 deletions _freeze/outliers/figure-html/unnamed-chunk-7-2.svg
553 changes: 275 additions & 278 deletions _freeze/outliers/figure-html/unnamed-chunk-9-1.svg
4 changes: 2 additions & 2 deletions _freeze/preprocessing-and-models/execute-results/html.json

Large diffs are not rendered by default.

481 changes: 239 additions & 242 deletions _freeze/preprocessing-and-models/figure-html/unnamed-chunk-9-1.svg
4 changes: 2 additions & 2 deletions _freeze/slide/execute-results/html.json

Large diffs are not rendered by default.

3,219 changes: 1,559 additions & 1,660 deletions _freeze/slide/figure-html/unnamed-chunk-12-1.svg

Large diffs are not rendered by default.

13,652 changes: 6,817 additions & 6,835 deletions _freeze/slide/figure-html/unnamed-chunk-8-1.svg

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions _freeze/sliding-forecasters/execute-results/html.json

Large diffs are not rendered by default.

3,495 changes: 1,750 additions & 1,745 deletions _freeze/sliding-forecasters/figure-html/plot-ar-asof-1.svg

Large diffs are not rendered by default.

3,458 changes: 1,731 additions & 1,727 deletions _freeze/sliding-forecasters/figure-html/plot-arx-1.svg

Large diffs are not rendered by default.

9,685 changes: 4,947 additions & 4,738 deletions _freeze/sliding-forecasters/figure-html/plot-can-fc-boost-1.svg

Large diffs are not rendered by default.

9,453 changes: 4,979 additions & 4,474 deletions _freeze/sliding-forecasters/figure-html/plot-can-fc-lr-1.svg

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion _freeze/tidymodels-intro/execute-results/html.json

Large diffs are not rendered by default.

647 changes: 322 additions & 325 deletions _freeze/tidymodels-intro/figure-html/unnamed-chunk-23-1.svg

Large diffs are not rendered by default.

273 changes: 136 additions & 137 deletions _freeze/tidymodels-intro/figure-html/unnamed-chunk-26-1.svg

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion _freeze/tidymodels-regression/execute-results/html.json

Large diffs are not rendered by default.

5,234 changes: 2,614 additions & 2,620 deletions _freeze/tidymodels-regression/figure-html/unnamed-chunk-21-1.svg

Large diffs are not rendered by default.

5,180 changes: 2,584 additions & 2,596 deletions _freeze/tidymodels-regression/figure-html/unnamed-chunk-24-1.svg

Large diffs are not rendered by default.

98 changes: 33 additions & 65 deletions archive.qmd
@@ -25,9 +25,8 @@ source("_common.R")

## Getting data into `epi_archive` format

An `epi_archive` object
can be constructed from a data frame, data table, or tibble, provided that it
has (at least) the following columns:
An `epi_archive` object can be constructed from a data frame, data table, or
tibble, provided that it has (at least) the following columns:

* `geo_value`: the geographic value associated with each row of measurements.
* `time_value`: the time value associated with each row of measurements.
@@ -55,10 +54,10 @@ class(x)
print(x)
```

An `epi_archive` is special kind of class called an R6 class. Its primary field
is a data table `DT`, which is of class `data.table` (from the `data.table`
package), and has columns `geo_value`, `time_value`, `version`, as well as any
number of additional columns.
An `epi_archive` is an S3 class. Its primary field is a data table `DT`, which
is of class `data.table` (from the `{data.table}` package), and has columns
`geo_value`, `time_value`, `version`, as well as any number of additional
columns.

```{r}
class(x$DT)
@@ -70,33 +69,18 @@ for the data table, as well as any other specified in the metadata (described
below). There can only be a single row per unique combination of key variables,
and therefore the key variables are critical for figuring out how to generate a
snapshot of data from the archive, as of a given version (also described below).

```{r, error=TRUE}
key(x$DT)
```

In general, the last version of each observation is carried forward (LOCF) to
fill in data between recorded versions. **A word of caution:** R6 objects,
unlike most other objects in R, have reference semantics. An important
consequence of this is that objects are not copied when modified.

```{r}
original_value <- x$DT$percent_cli[1]
y <- x # This DOES NOT make a copy of x
y$DT$percent_cli[1] = 0
head(y$DT)
head(x$DT)
x$DT$percent_cli[1] <- original_value
```

To make a copy, we can use the `clone()` method for an R6 class, as in `y <-
x$clone()`. You can read more about reference semantics in Hadley Wickham's
[Advanced R](https://adv-r.hadley.nz/r6.html#r6-semantics) book.
In general, the last version of each observation is carried forward (LOCF) to
fill in data between recorded versions.
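
The LOCF rule can be sketched in base R as follows. This is only an illustration under stated assumptions: the toy `versions` data frame, its values, and the helper `value_as_of()` are hypothetical and are not part of `{epiprocess}`, which implements this logic on the archive's data table.

```r
# Hypothetical version history for a single (geo_value, time_value) pair.
versions <- data.frame(
  version = as.Date(c("2021-06-01", "2021-06-03", "2021-06-07")),
  percent_cli = c(1.2, 1.5, 1.4)
)

# Value of the observation as of a given version: take the most recent
# recorded version on or before `as_of`; return NA if nothing was
# recorded yet at that point.
value_as_of <- function(versions, as_of) {
  seen <- versions[versions$version <= as_of, , drop = FALSE]
  if (nrow(seen) == 0) {
    return(NA_real_)
  }
  seen$percent_cli[which.max(as.numeric(seen$version))]
}

# No version was recorded on June 5, so the June 3 value is carried forward.
value_as_of(versions, as.Date("2021-06-05"))

# Before the first recorded version there is nothing to carry forward.
value_as_of(versions, as.Date("2021-05-31"))
```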

## Some details on metadata

The following pieces of metadata are included as fields in an `epi_archive`
object:
object:

* `geo_type`: the type for the geo values.
* `time_type`: the type for the time values.
@@ -112,10 +96,8 @@ call (as it did in the case above).

A key method of an `epi_archive` class is `as_of()`, which generates a snapshot
of the archive in `epi_df` format. This represents the most up-to-date values of
the signal variables as of a given version. This can be accessed via `x$as_of()`
for an `epi_archive` object `x`, but the package also provides a simple wrapper
function `epix_as_of()` since this is likely a more familiar interface for users
not familiar with R6 (or object-oriented programming).
the signal variables as of a given version. This can be accessed via
`epix_as_of()`.

```{r}
x_snapshot <- epix_as_of(x, max_version = as.Date("2021-06-01"))
@@ -125,7 +107,7 @@ max(x_snapshot$time_value)
attributes(x_snapshot)$metadata$as_of
```

We can see that the max time value in the `epi_df` object `x_snapshot` that was
We can see that the max time value in the `epi_df` object `x_snapshot` that was
generated from the archive is May 29, 2021, even though the specified version
date was June 1, 2021. From this we can infer that the doctor's visits signal
was 2 days latent on June 1. Also, we can see that the metadata in the `epi_df`
@@ -134,7 +116,7 @@ object has the version date recorded in the `as_of` field.
By default, using the maximum of the `version` column in the underlying data table in an
`epi_archive` object itself generates a snapshot of the latest values of signal
variables in the entire archive. The `epix_as_of()` function issues a warning in
this case, since updates to the current version may still come in at a later
this case, since updates to the current version may still come in at a later
point in time, due to various reasons, such as synchronization issues.

```{r}
@@ -143,15 +125,15 @@ x_latest <- epix_as_of(x, max_version = max(x$DT$version))

Below, we pull several snapshots from the archive, spaced one month apart. We
overlay the corresponding signal curves as colored lines, with the version dates
marked by dotted vertical lines, and draw the latest curve in black (from the
marked by dotted vertical lines, and draw the latest curve in black (from the
latest snapshot `x_latest` that the archive can provide).

```{r, fig.width = 8, fig.height = 7}
self_max <- max(x$DT$version)
versions <- seq(as.Date("2020-06-01"), self_max - 1, by = "1 month")
snapshots <- map(
versions,
function(v) {
versions,
function(v) {
epix_as_of(x, max_version = v) %>% mutate(version = v)
}) %>%
list_rbind() %>%
@@ -162,37 +144,35 @@ snapshots <- map(
```{r, fig.height=7}
#| code-fold: true
ggplot(snapshots %>% filter(!latest),
aes(x = time_value, y = percent_cli)) +
geom_line(aes(color = factor(version)), na.rm = TRUE) +
aes(x = time_value, y = percent_cli)) +
geom_line(aes(color = factor(version)), na.rm = TRUE) +
geom_vline(aes(color = factor(version), xintercept = version), lty = 2) +
facet_wrap(~ geo_value, scales = "free_y", ncol = 1) +
scale_x_date(minor_breaks = "month", date_labels = "%b %Y") +
scale_color_viridis_d(option = "A", end = .9) +
labs(x = "Date", y = "% of doctor's visits with CLI") +
labs(x = "Date", y = "% of doctor's visits with CLI") +
theme(legend.position = "none") +
geom_line(data = snapshots %>% filter(latest),
aes(x = time_value, y = percent_cli),
aes(x = time_value, y = percent_cli),
inherit.aes = FALSE, color = "black", na.rm = TRUE)
```

We can see some interesting and highly nontrivial revision behavior: at some
points in time the provisional data snapshots grossly underestimate the latest
curve (look in particular at Florida close to the end of 2021), and at others
they overestimate it (both states towards the beginning of 2021), though not
they overestimate it (both states towards the beginning of 2021), though not
quite as dramatically. Modeling the revision process, which is often called
*backfill modeling*, is an important statistical problem in and of itself.


## Merging `epi_archive` objects
## Merging `epi_archive` objects

Now we demonstrate how to merge two `epi_archive` objects together, e.g., so
that grabbing data from multiple sources as of a particular version can be
performed with a single `as_of` call. The `epi_archive` class provides a method
`merge()` precisely for this purpose. The wrapper function is called
`epix_merge()`; this wrapper avoids mutating its inputs, while `x$merge` will
mutate `x`. Below we merge the working `epi_archive` of versioned percentage CLI
from outpatient visits to another one of versioned COVID-19 case reporting data,
which we fetch the from the [COVIDcast
performed with a single `as_of` call. The `epiprocess` package provides
`epix_merge()` for this purpose. Below we merge the working `epi_archive` of
versioned percentage CLI from outpatient visits to another one of versioned
COVID-19 case reporting data, which we fetch from the [COVIDcast
API](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html/), on the
rate scale (counts per 100,000 people in the population).

@@ -209,39 +189,27 @@ When merging archives, unless the archives have identical data release patterns,
the other).

```{r, message = FALSE, warning = FALSE,eval=FALSE}
# This code is for illustration and doesn't run.
# This code is for illustration and doesn't run.
# The result is saved/loaded in the (hidden) next chunk from `{epidatasets}`
y <- covidcast(
data_source = "jhu-csse",
y <- pub_covidcast(
source = "jhu-csse",
signals = "confirmed_7dav_incidence_prop",
time_type = "day",
geo_type = "state",
time_values = epirange(20200601, 20211201),
geo_values = "ca,fl,ny,tx",
issues = epirange(20200601, 20211201)
) %>%
fetch() %>%
select(geo_value, time_value, version = issue, case_rate_7d_av = value) %>%
as_epi_archive(compactify = TRUE)

x$merge(y, sync = "locf", compactify = FALSE)
x <- epix_merge(x, y, sync = "locf", compactify = FALSE)
print(x)
head(x$DT)
```

```{r, echo=FALSE}
x <- archive_cases_dv_subset
print(x)
head(x$DT)
```

Importantly, see that `x$merge` mutated `x` to hold the result of the merge. We
could also have used `xy = epix_merge(x, y)` to avoid mutating `x`. See the
documentation for either for more detailed descriptions of what mutation,
pointer aliasing, and pointer reseating is possible.

## Sliding version-aware computations

::: {.callout-note}
TODO: need a simple example here.
:::
:::
6 changes: 3 additions & 3 deletions epidf.qmd
@@ -31,14 +31,14 @@ library(epidatr)
library(epiprocess)
library(withr)

cases <- covidcast(
data_source = "jhu-csse",
cases <- pub_covidcast(
source = "jhu-csse",
signals = "confirmed_cumulative_num",
time_type = "day",
geo_type = "state",
time_values = epirange(20200301, 20220131),
geo_values = "ca,fl,ny,tx"
) %>% fetch()
)

colnames(cases)
```
4 changes: 2 additions & 2 deletions epipredict.qmd
@@ -164,7 +164,7 @@ Another property of the basic model is the predictive interval. We describe this
```{r differential-levels}
out_q <- arx_forecaster(jhu, "death_rate", c("case_rate", "death_rate"),
args_list = arx_args_list(
levels = c(.01, .025, seq(.05, .95, by = .05), .975, .99))
quantile_levels = c(.01, .025, seq(.05, .95, by = .05), .975, .99))
)
```

@@ -188,7 +188,7 @@ Additional simple adjustments to the basic forecaster can be made using the func
```{r, eval = FALSE}
arx_args_list(
lags = c(0L, 7L, 14L), ahead = 7L, n_training = Inf,
forecast_date = NULL, target_date = NULL, levels = c(0.05, 0.95),
forecast_date = NULL, target_date = NULL, quantile_levels = c(0.05, 0.95),
symmetrize = TRUE, nonneg = TRUE, quantile_by_key = "geo_value"
)
```
20 changes: 9 additions & 11 deletions epiprocess.qmd
@@ -15,17 +15,17 @@ contains the most up-to-date values of the signals variables, as of a given
time.

By convention, functions in the `epiprocess` package that operate on `epi_df`
objects begin with `epi`. For example:
objects begin with `epi`. For example:

- `epi_slide()`, for iteratively applying a custom computation to a variable in
an `epi_df` object over sliding windows in time;

- `epi_cor()`, for computing lagged correlations between variables in an
`epi_df` object, (allowing for grouping by geo value, time value, or any other
variables).

Functions in the package that operate directly on given variables do not begin
with `epi`. For example:
with `epi`. For example:

- `growth_rate()`, for estimating the growth rate of a given signal at given
time values, using various methodologies;
@@ -35,20 +35,18 @@ Functions in the package that operate directly on given variables do not begin

## `epi_archive`: full version history of a data set

The second main data structure in the package is called
[`epi_archive`]. This is a special class (R6 format)
wrapped around a data table that stores the archive (version history) of some
signal variables of interest.
The second main data structure in the package is called [`epi_archive`]. This is
an S3 class containing a data table that stores the archive (version history) of
some signal variables of interest.

By convention, functions in the `epiprocess` package that operate on
By convention, functions in the `{epiprocess}` package that operate on
`epi_archive` objects begin with `epix` (the "x" is meant to remind you of
"archive"). These are just wrapper functions around the public methods for the
`epi_archive` R6 class. For example:
"archive"). For example:

- `epix_as_of()`, for generating a snapshot in `epi_df` format from the data
archive, which represents the most up-to-date values of the signal variables,
as of the specified version;

- `epix_fill_through_version()`, for filling in some fake version data following
simple rules, for use when downstream methods expect an archive that is more
up-to-date (e.g., if it is a forecasting deadline date and one of our data