Skip to content

Commit

Permalink
Plot distributions in workflow vignette (#781)
Browse files Browse the repository at this point in the history
* Add plots of the distributions

* Assign dists before printing/plotting

* Delete speedup vignette from this PR

* precompile to Rmd

* re-render

---------

Co-authored-by: Sebastian Funk <[email protected]>
  • Loading branch information
jamesmbaazam and sbfnk authored Sep 27, 2024
1 parent ae825f9 commit cb46acc
Show file tree
Hide file tree
Showing 6 changed files with 88 additions and 31 deletions.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified vignettes/estimate_infections_workflow-results-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
88 changes: 62 additions & 26 deletions vignettes/estimate_infections_workflow.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -24,11 +24,11 @@ Obtaining a good and full understanding of the data being used is an important f
_EpiNow2_ expects data in the format of a data frame with two columns, `date` and `confirm`, where `confirm` stands for the number of confirmed counts - although in reality this can be applied to any data including suspected cases and lab-confirmed outcomes.
The user might already have the data as such a time series provided, for example, on public dashboards or directly from public health authorities.
Alternatively, they can be constructed from individual-level data, for example using the [incidence2](https://CRAN.R-project.org/package=incidence2) R package.
An example data set called `example_confirm` is included in the package:
An example data set called `example_confirmed` is included in the package:


```r
head(example_confirmed)
``` r
head(EpiNow2::example_confirmed)
#> date confirm
#> <Date> <num>
#> 1: 2020-02-22 14
Expand All @@ -49,14 +49,19 @@ Some of these can be mitigated using the routines available in _EpiNow2_ as desc
We first load the _EpiNow2_ package.


```r
``` r
library("EpiNow2")
#>
#> Attaching package: 'EpiNow2'
#> The following object is masked from 'package:stats':
#>
#> Gamma
```

We then set the number of cores to use. We will want to run 4 MCMC chains in parallel so we set this to 4.


```r
``` r
options(mc.cores = 4)
```

Expand All @@ -74,34 +79,45 @@ They are defined using a common interface that involves functions that are named
For help with this function, see its manual page


```r
``` r
?EpiNow2::Distributions
```

In all cases, the distributions given can be *fixed* (i.e. have no uncertainty) or *variable* (i.e. have associated uncertainty).
For example, to define a fixed gamma distribution with mean 3, standard deviation (sd) 1 and maximum value 10, you would write


```r
Gamma(mean = 3, sd = 1, max = 10)
``` r
fixed_gamma <- Gamma(mean = 3, sd = 1, max = 10)
fixed_gamma
#> - gamma distribution (max: 10):
#> shape:
#> 9
#> rate:
#> 3
```

which looks like this when plotted


``` r
plot(fixed_gamma)
```

![plot of chunk plot_fixed_gamma](estimate_infections_workflow-plot_fixed_gamma-1.png)

If distributions are variable, the values with uncertainty are treated as [prior probability densities](https://en.wikipedia.org/wiki/Prior_probability) in the Bayesian inference framework used by _EpiNow2_, i.e. they are estimated as part of the inference.
For example, to define a variable gamma distribution where uncertainty in the mean is given by a normal distribution with mean 3 and sd 2, and uncertainty in the standard deviation is given by a normal distribution with mean 1 and sd 0.1, with a maximum value 10, you would write


```r
Gamma(mean = Normal(3, 2), sd = Normal(1, 0.1), max = 10)
#> Warning in new_dist_spec(params, "gamma"): Uncertain gamma distribution
#> specified in terms of parameters that are not the "natural" parameters of the
#> distribution (shape, rate). Converting using a crude and very approximate
#> method that is likely to produce biased results. If possible, it is preferable
#> to specify the distribution directly in terms of the natural parameters.
``` r
uncertain_gamma <- Gamma(mean = Normal(3, 2), sd = Normal(1, 0.1), max = 10)
#> Warning: ! Uncertain gamma distribution specified in terms of parameters that are not the "natural"
#> parameters of the distribution shape and rate.
#> ℹ Converting using a crude and very approximate method that is likely to produce biased results.
#> ℹ If possible it is preferable to specify the distribution directly in terms of the natural
#> parameters.
uncertain_gamma
#> - gamma distribution (max: 10):
#> shape:
#> - normal distribution:
Expand All @@ -117,6 +133,15 @@ Gamma(mean = Normal(3, 2), sd = Normal(1, 0.1), max = 10)
#> 1.4
```

which looks like this when plotted


``` r
plot(uncertain_gamma)
```

![plot of chunk plot_uncertain_gamma](estimate_infections_workflow-plot_uncertain_gamma-1.png)

Note the warning about parameters.
We used the mean and standard deviation to define this distribution with uncertain parameters, but it would be better to use the "natural" parameters of the gamma distribution, shape and rate, for example using the values estimate and reported back after calling the previous command.

Expand All @@ -134,7 +159,7 @@ In _EpiNow2_, the generation time distribution is defined by a call to `gt_opts(
For example, to define the generation time as gamma distributed with uncertain mean centered on 3 and sd centered on 1 with some uncertainty, a maximum value of 10 and weighted by the number of case data points we could use the shape and rate parameters suggested above (though notes that this will only very approximately produce the uncertainty in mean and standard deviation stated there):


```r
``` r
generation_time <- Gamma(
shape = Normal(9, 2.5), rate = Normal(3, 1.4), max = 10
)
Expand All @@ -151,14 +176,15 @@ Often, such counts are composed of multiple delays for which we only have separa
In this case, we can combine multiple delays with the plus (`+`) operator, e.g.


```r
``` r
incubation_period <- LogNormal(
meanlog = Normal(1.6, 0.05),
sdlog = Normal(0.5, 0.05),
max = 14
)
reporting_delay <- LogNormal(meanlog = 0.5, sdlog = 0.5, max = 10)
incubation_period + reporting_delay
combined_delays <- incubation_period + reporting_delay
combined_delays
#> Composite distribution:
#> - lognormal distribution (max: 14):
#> meanlog:
Expand All @@ -180,18 +206,27 @@ incubation_period + reporting_delay
#> 0.5
```

We can visualise this combined delay


``` r
plot(combined_delays)
```

![plot of chunk plot_combined_delay](estimate_infections_workflow-plot_combined_delay-1.png)

In _EpiNow2_, the reporting delay distribution is defined by a call to `delay_opts()`, a function that takes a single argument defined as a `dist_spec` object (returned by `LogNormal()`, `Gamma()` etc.).
For example, if our observations were by symptom onset we would use


```r
``` r
delay_opts(incubation_period)
```

If they were by date of lab confirmation that happens with a delay given by `reporting_delay`, we would use


```r
``` r
delay <- incubation_period + reporting_delay
delay_opts(delay)
```
Expand All @@ -209,7 +244,7 @@ In _EpiNow2_, this can be done using the `estimate_truncation()` method which re
For more details on the model used for this, see the [estimate_truncation](estimate_truncation.html) vignette.


```r
``` r
?estimate_truncation
```

Expand All @@ -227,7 +262,7 @@ In _EpiNow2_ we can specify the proportion of infections that we expect to be ob
For example, if we think that 40% (with standard deviation 1%) of infections end up in the data as observations we could specify.


```r
``` r
obs_scale <- list(mean = 0.4, sd = 0.01)
obs_opts(scale = obs_scale)
```
Expand All @@ -241,7 +276,7 @@ It can be changed using the `rt_opts()` function.
For example, if the user believes that at the very start of the data the reproduction number was 2, with uncertainty in this belief represented by a standard deviation of 1, they would use


```r
``` r
rt_prior <- list(mean = 2, sd = 1)
rt_opts(prior = rt_prior)
```
Expand All @@ -258,14 +293,15 @@ All the options are combined in a call to the `estimate_infections()` function.
For example, using some of the options described above one could call


```r
``` r
def <- estimate_infections(
example_confirmed,
generation_time = gt_opts(generation_time),
delays = delay_opts(delay),
rt = rt_opts(prior = rt_prior)
)
#> Warning: There were 3 divergent transitions after warmup. See
#> DEBUG [2024-09-26 20:23:03] estimate_infections: Running in exact mode for 2000 samples (across 4 chains each with a warm up of 250 iterations each) and 147 time steps of which 7 are a forecast
#> Warning: There were 6 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems
Expand All @@ -284,7 +320,7 @@ The package contains functionality to estimate the delay and scaling between mul
To visualise the results one can use the `plot()` function that comes with the package


```r
``` r
plot(def)
```

Expand Down
31 changes: 26 additions & 5 deletions vignettes/estimate_infections_workflow.Rmd.orig
Original file line number Diff line number Diff line change
Expand Up @@ -33,10 +33,10 @@ Obtaining a good and full understanding of the data being used is an important f
_EpiNow2_ expects data in the format of a data frame with two columns, `date` and `confirm`, where `confirm` stands for the number of confirmed counts - although in reality this can be applied to any data including suspected cases and lab-confirmed outcomes.
The user might already have the data as such a time series provided, for example, on public dashboards or directly from public health authorities.
Alternatively, they can be constructed from individual-level data, for example using the [incidence2](https://CRAN.R-project.org/package=incidence2) R package.
An example data set called `example_confirm` is included in the package:
An example data set called `example_confirmed` is included in the package:

```{r}
head(example_confirmed)
head(EpiNow2::example_confirmed)
```

Any estimation procedure is only as good as the data that feeds into it.
Expand Down Expand Up @@ -79,14 +79,28 @@ In all cases, the distributions given can be *fixed* (i.e. have no uncertainty)
For example, to define a fixed gamma distribution with mean 3, standard deviation (sd) 1 and maximum value 10, you would write

```{r}
Gamma(mean = 3, sd = 1, max = 10)
fixed_gamma <- Gamma(mean = 3, sd = 1, max = 10)
fixed_gamma
```

which looks like this when plotted

```{r plot_fixed_gamma}
plot(fixed_gamma)
```

If distributions are variable, the values with uncertainty are treated as [prior probability densities](https://en.wikipedia.org/wiki/Prior_probability) in the Bayesian inference framework used by _EpiNow2_, i.e. they are estimated as part of the inference.
For example, to define a variable gamma distribution where uncertainty in the mean is given by a normal distribution with mean 3 and sd 2, and uncertainty in the standard deviation is given by a normal distribution with mean 1 and sd 0.1, with a maximum value 10, you would write

```{r}
Gamma(mean = Normal(3, 2), sd = Normal(1, 0.1), max = 10)
uncertain_gamma <- Gamma(mean = Normal(3, 2), sd = Normal(1, 0.1), max = 10)
uncertain_gamma
```

which looks like this when plotted

```{r plot_uncertain_gamma}
plot(uncertain_gamma)
```

Note the warning about parameters.
Expand Down Expand Up @@ -128,7 +142,14 @@ incubation_period <- LogNormal(
max = 14
)
reporting_delay <- LogNormal(meanlog = 0.5, sdlog = 0.5, max = 10)
incubation_period + reporting_delay
combined_delays <- incubation_period + reporting_delay
combined_delays
```

We can visualise this combined delay

```{r plot_combined_delay}
plot(combined_delays)
```

In _EpiNow2_, the reporting delay distribution is defined by a call to `delay_opts()`, a function that takes a single argument defined as a `dist_spec` object (returned by `LogNormal()`, `Gamma()` etc.).
Expand Down

0 comments on commit cb46acc

Please sign in to comment.