CODE_OF_CONDUCT.md
+ We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.
+We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.
+Examples of behavior that contributes to a positive environment for our community include:
+Examples of unacceptable behavior include:
+Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.
+Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.
+This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.
+Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at [INSERT CONTACT METHOD]. All complaints will be reviewed and investigated promptly and fairly.
+All community leaders are obligated to respect the privacy and security of the reporter of any incident.
+Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct:
+Community Impact: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.
+Consequence: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.
+Community Impact: A violation through a single incident or series of actions.
+Consequence: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.
+Community Impact: A serious violation of community standards, including sustained inappropriate behavior.
+Consequence: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.
+Community Impact: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.
+Consequence: A permanent ban from any sort of public interaction within the community.
+This Code of Conduct is adapted from the Contributor Covenant, version 2.0, available at https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
+Community Impact Guidelines were inspired by Mozilla’s code of conduct enforcement ladder.
+For answers to common questions about this code of conduct, see the FAQ at https://www.contributor-covenant.org/faq. Translations are available at https://www.contributor-covenant.org/translations.
+.github/CONTRIBUTING.md
+ This outlines how to propose a change to EpiNow2. In general, we accept contributions in the form of issues and/or pull requests.
+You can fix typos, spelling mistakes, or grammatical errors in the documentation directly using the GitHub web interface, as long as the changes are made in the source file. This generally means you’ll need to edit roxygen2 comments in an .R file, not a .Rd file. You can find the .R file that generates the .Rd by reading the comment in the first line of the .Rd file in the /man directory.
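As a hint of what to look for, the first lines of a roxygen2-generated .Rd file point back to the .R source; the file names below are examples, not necessarily real files in this package:

```
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/epinow.R
```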
If you want to make a bigger change, it’s a good idea to first file an issue and make sure someone from the team agrees that it’s needed. Any of the following counts as a big change:
+You can suggest an idea for a new feature/enhancement. Please provide as much detail of its use case as possible. As an example, see this extensive issue about making the model outputs S3 classes.
+If you have found a bug, ideally illustrate it with a minimal reprex (this will also help you write a unit test, if you opt to fix it yourself). Here is an example of a bug report.
+If you find an issue with existing vignettes or would like to help improve them, outline the suggested changes in the submitted issue for discussion with the team. Use the various GitHub markdown features to (cross)reference lines, highlight suggested deletions/additions, etc.
+For new vignettes, please provide an outline of the vignette to be discussed with the team first. Since the models in EpiNow2 have long run times in most cases, we pre-compile the vignettes before merging. Please follow this guide on how to precompute vignettes or pkgdown articles. Here is an example where new pre-compiled vignettes were submitted.
+Fork the package and clone onto your computer. If you haven’t done this before, we recommend using usethis::create_from_github("epiforecasts/EpiNow2", fork = TRUE).
Install all development dependencies with devtools::install_dev_deps(), and then make sure the package passes R CMD check by running devtools::check(). If R CMD check doesn’t pass cleanly, it’s a good idea to ask for help before continuing.
Create a Git branch for your pull request (PR). We recommend using usethis::pr_init("brief-description-of-change").
We use pre-commit to check that our changes match our package standards. This is optional but can be enabled using the following steps.
+# if python is not installed on your system
+install.packages("reticulate")
+reticulate::install_miniconda()
+# install precommit if not already installed
+precommit::install_precommit()
+# set up precommit for use
+precommit::use_precommit()
Make your changes, commit to git, and then create a PR by running usethis::pr_push() and following the prompts in your browser. The title of your PR should briefly describe the change. The body of your PR should contain Fixes #issue-number.
For user-facing changes, add a bullet to the top of NEWS.md (i.e. just below the first header). Follow the style described in https://style.tidyverse.org/news.html.
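For instance, a tidyverse-style NEWS.md entry might look like the following (the header, issue number, and username here are placeholders):

```md
# EpiNow2 (development version)

- Fixed a bug in the handling of delay distributions (#123). By @username.
```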
PRs are reviewed by the team before they are merged. The review process only begins after the continuous integration checks, which have to be manually triggered by a maintainer for first-time contributors, have passed.
The GitHub Actions checks currently take a while (about an hour), so it might be helpful to “watch” the repository and check your email for a notification when they are done.
Usually, all the review conversations occur under the PR. The reviewer merges the PR when every issue has been resolved. Please use the “Resolve conversation” functionality in the GitHub web interface to indicate when a specific issue has been addressed, responding with a commit pointing to the change made where applicable.
When a PR is ready to be merged, you may be asked to rebase on the main branch. You can do this by checking out your branch and running git rebase main. If it is successful, your commits will be placed on top of the commit history of main in preparation for a merge. A rebase might result in some merge conflicts. Make sure that they are resolved, then push your changes to your branch again (using the --force option, that is, git push -f, if required).
A number of issues can cause the GitHub checks to fail. It can be helpful to safeguard against them by doing the following:
Check that there are no linting issues by running lintr::lint_package().
Run devtools::check() to check for wider package issues like mismatching documentation, etc. (this currently requires a fair bit of time/computation).
(Optional) Turn on continuous integration with GitHub Actions on your forked repository.
On a case-by-case basis, you may be asked to increment the package version in both the NEWS.md and DESCRIPTION files. Please do not do this unless you’re asked to. We follow the Tidyverse package versioning guide. You can run usethis::use_version() to make the changes for you interactively.
New code should follow the tidyverse style guide. You can use the styler package to apply these styles, but please don’t restyle code that has nothing to do with your PR.
We use roxygen2, with Markdown syntax, for documentation.
We use testthat for unit tests. Contributions with test cases included are easier to accept.
Please note that the EpiNow2 project is released with a Contributor Code of Conduct. By contributing to this project you agree to abide by its terms.
YEAR: 2020
COPYRIGHT HOLDER: EpiForecasts
LICENSE.md
+ Copyright (c) 2020 EpiForecasts
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+.github/PULL_REQUEST_TEMPLATE.md
+In the following section we give an overview of the simple use case for epinow() and regional_epinow().
The first step to using the package is to load it as follows.
+Distributions can be supplied in two ways. First, one can supply delay data to estimate_delay(), where a subsampled bootstrapped lognormal will be fitted to account for uncertainty in the observed data without being biased by changes in incidence (see ?EpiNow2::estimate_delay()).
Second, one can specify predetermined delays with uncertainty using distribution functions such as Gamma or LogNormal. An arbitrary number of delay distributions is supported in dist_spec(), with a common use case being an incubation period followed by a reporting delay. For more information on specifying distributions see ?EpiNow2::Distributions.
For example, if data on the delay between onset and infection were available, we could fit a distribution to it using estimate_delay(), with appropriate uncertainty, as follows (note this is a synthetic example):
+reporting_delay <- estimate_delay(
+ rlnorm(1000, log(2), 1),
+ max_value = 14, bootstraps = 1
+)
If data were not available, we could instead specify an informed estimate of the likely delay using the distribution functions Gamma or LogNormal. To demonstrate, we choose a lognormal distribution with mean 2, standard deviation 1, and a maximum of 10. This is just an example and unlikely to apply in any particular use case.
+reporting_delay <- LogNormal(mean = 2, sd = 1, max = 10)
+reporting_delay
+#> - lognormal distribution (max: 10):
+#> meanlog:
+#> 0.58
+#> sdlog:
+#> 0.47
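The printed meanlog and sdlog follow from the standard conversion of a natural-scale mean and standard deviation to lognormal parameters; a quick base R check (not package code):

```r
# convert a natural-scale mean and sd to lognormal meanlog/sdlog
mean <- 2
sd <- 1
meanlog <- log(mean^2 / sqrt(sd^2 + mean^2))
sdlog <- sqrt(log(1 + (sd / mean)^2))
round(c(meanlog, sdlog), 2)
#> [1] 0.58 0.47
```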
For the rest of this vignette, we will use inbuilt example literature estimates for the incubation period and generation time of COVID-19 (see here for the code that generates these estimates). These distributions are unlikely to be applicable for your use case. We strongly recommend investigating what might be the best distributions to use in any given use case.
+
+example_generation_time
+#> - gamma distribution (max: 14):
+#> shape:
+#> - normal distribution:
+#> mean:
+#> 1.4
+#> sd:
+#> 0.48
+#> rate:
+#> - normal distribution:
+#> mean:
+#> 0.38
+#> sd:
+#> 0.25
+example_incubation_period
+#> - lognormal distribution (max: 14):
+#> meanlog:
+#> - normal distribution:
+#> mean:
+#> 1.6
+#> sd:
+#> 0.064
+#> sdlog:
+#> - normal distribution:
+#> mean:
+#> 0.42
+#> sd:
+#> 0.069
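Distributions with uncertain parameters like those printed above can be constructed by passing normal priors as the parameters. As a sketch (assuming the current distribution-constructor interface, with values taken from the printout above):

```r
# a gamma generation time with normal priors on its parameters,
# mirroring the printed example_generation_time
gt <- Gamma(
  shape = Normal(mean = 1.4, sd = 0.48),
  rate = Normal(mean = 0.38, sd = 0.25),
  max = 14
)
```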
Now, to the functions.
+This function represents the core functionality of the package and includes results reporting, plotting, and optional saving. It requires a data frame of cases by date of report and the distributions defined above.
+Load example case data from EpiNow2.
+
+reported_cases <- example_confirmed[1:60]
+head(reported_cases)
+#> date confirm
+#> <Date> <num>
+#> 1: 2020-02-22 14
+#> 2: 2020-02-23 62
+#> 3: 2020-02-24 53
+#> 4: 2020-02-25 97
+#> 5: 2020-02-26 93
+#> 6: 2020-02-27 78
Estimate cases by date of infection, the time-varying reproduction number, and the rate of growth, and forecast these estimates 7 days into the future. Summarise the posterior and return a summary table and plots for reporting purposes. If a target_folder is supplied, results can be saved internally (with the option to also turn off explicit returning of results). Here we use the default model parameterisation, which prioritises real-time performance over run-time or other considerations. For other formulations see the documentation for estimate_infections().
+estimates <- epinow(
+ data = reported_cases,
+ generation_time = generation_time_opts(example_generation_time),
+ delays = delay_opts(example_incubation_period + reporting_delay),
+ rt = rt_opts(prior = list(mean = 2, sd = 0.2)),
+ stan = stan_opts(cores = 4, control = list(adapt_delta = 0.99)),
+ verbose = interactive()
+)
+names(estimates)
+#> [1] "estimates" "estimated_reported_cases"
+#> [3] "summary" "plots"
+#> [5] "timing"
Both summary measures and posterior samples are returned for all parameters in an easily explored format, which can be accessed using summary. The default is to return a summary table of estimates for key parameters at the latest date partially supported by data.
measure | estimate
---|---
New infections per day | 2199 (962 – 4757)
Expected change in daily reports | Likely decreasing
Effective reproduction no. | 0.88 (0.58 – 1.2)
Rate of growth | -0.034 (-0.15 – 0.076)
Doubling/halving time (days) | -20 (9.1 – -4.6)
Summarised parameter estimates can also easily be returned, either filtered for a single parameter or for all parameters.
+
+head(summary(estimates, type = "parameters", params = "R"))
+#> date variable strat type median mean sd lower_90
+#> <Date> <char> <char> <char> <num> <num> <num> <num>
+#> 1: 2020-02-22 R <NA> estimate 2.288484 2.293707 0.1613209 2.044132
+#> 2: 2020-02-23 R <NA> estimate 2.254878 2.259642 0.1403568 2.040359
+#> 3: 2020-02-24 R <NA> estimate 2.217611 2.222709 0.1251234 2.026980
+#> 4: 2020-02-25 R <NA> estimate 2.174854 2.182984 0.1150322 2.002758
+#> 5: 2020-02-26 R <NA> estimate 2.132196 2.140643 0.1089983 1.974475
+#> 6: 2020-02-27 R <NA> estimate 2.088958 2.095945 0.1056741 1.933753
+#> lower_50 lower_20 upper_20 upper_50 upper_90
+#> <num> <num> <num> <num> <num>
+#> 1: 2.181696 2.246458 2.328439 2.398968 2.574407
+#> 2: 2.162459 2.217716 2.289158 2.349965 2.508728
+#> 3: 2.136181 2.184912 2.247679 2.303535 2.440930
+#> 4: 2.101307 2.148035 2.204825 2.256175 2.381675
+#> 5: 2.064946 2.108764 2.160005 2.207901 2.331636
+#> 6: 2.023783 2.062603 2.114298 2.159626 2.285821
Reported cases are returned in a separate data frame in order to streamline the reporting of forecasts and for model evaluation.
+
+head(summary(estimates, output = "estimated_reported_cases"))
+#> date type median mean sd lower_90 lower_50 lower_20
+#> <Date> <char> <num> <num> <num> <num> <num> <num>
+#> 1: 2020-02-22 gp_rt 74.0 76.9430 21.47422 46 62 70.0
+#> 2: 2020-02-23 gp_rt 87.0 89.5820 24.71017 54 72 82.0
+#> 3: 2020-02-24 gp_rt 86.0 88.6670 24.76764 53 71 80.6
+#> 4: 2020-02-25 gp_rt 81.0 82.6625 22.56520 49 66 75.0
+#> 5: 2020-02-26 gp_rt 79.0 81.8240 23.31905 48 65 75.0
+#> 6: 2020-02-27 gp_rt 107.5 111.2375 30.27595 69 89 101.0
+#> upper_20 upper_50 upper_90
+#> <num> <num> <num>
+#> 1: 79 89 116.00
+#> 2: 92 104 135.00
+#> 3: 92 103 133.00
+#> 4: 87 96 121.00
+#> 5: 85 96 121.05
+#> 6: 115 129 166.00
A range of plots are returned (with the single summary plot shown below). These plots can also be generated using the following plot method.
+plot(estimates)
The regional_epinow() function runs the epinow() function across multiple regions in an efficient manner.
Define cases in multiple regions delineated by the region variable.
+
+reported_cases <- data.table::rbindlist(list(
+ data.table::copy(reported_cases)[, region := "testland"],
+ reported_cases[, region := "realland"]
+))
+head(reported_cases)
+#> date confirm region
+#> <Date> <num> <char>
+#> 1: 2020-02-22 14 testland
+#> 2: 2020-02-23 62 testland
+#> 3: 2020-02-24 53 testland
+#> 4: 2020-02-25 97 testland
+#> 5: 2020-02-26 93 testland
+#> 6: 2020-02-27 78 testland
Calling regional_epinow() runs epinow() on each region in turn (or in parallel, depending on the settings used). Here we switch to using a weekly random walk rather than the full Gaussian process model, giving us piecewise-constant estimates by week.
+estimates <- regional_epinow(
+ data = reported_cases,
+ generation_time = generation_time_opts(example_generation_time),
+ delays = delay_opts(example_incubation_period + reporting_delay),
+ rt = rt_opts(prior = list(mean = 2, sd = 0.2), rw = 7),
+ gp = NULL,
+ stan = stan_opts(cores = 4, warmup = 250, samples = 1000)
+)
+#> INFO [2024-05-10 07:58:14] Producing following optional outputs: regions, summary, samples, plots, latest
+#> INFO [2024-05-10 07:58:14] Reporting estimates using data up to: 2020-04-21
+#> INFO [2024-05-10 07:58:14] No target directory specified so returning output
+#> INFO [2024-05-10 07:58:14] Producing estimates for: testland, realland
+#> INFO [2024-05-10 07:58:14] Regions excluded: none
+#> INFO [2024-05-10 07:58:29] Completed estimates for: testland
+#> INFO [2024-05-10 07:58:41] Completed estimates for: realland
+#> INFO [2024-05-10 07:58:41] Completed regional estimates
+#> INFO [2024-05-10 07:58:41] Regions with estimates: 2
+#> INFO [2024-05-10 07:58:41] Regions with runtime errors: 0
+#> INFO [2024-05-10 07:58:41] Producing summary
+#> INFO [2024-05-10 07:58:41] No summary directory specified so returning summary output
+#> INFO [2024-05-10 07:58:41] No target directory specified so returning timings
Results from each region are stored in a regional list, with across-region summary measures and plots stored in a summary list. All results can be saved internally by setting the target_folder and summary_dir arguments. Each region can be estimated in parallel using the future package (in which case cores should usually be set to 1). For routine use, each MCMC chain can also be run in parallel (with future = TRUE), with a timeout (max_execution_time) allowing partial results to be returned if a subset of chains runs longer than expected. See the documentation for the future package for details on nested futures.
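The nested futures setup described above can be sketched as follows (a minimal sketch assuming the future package; the worker counts are arbitrary examples):

```r
library("future")
# outer level: distribute regions across 2 workers;
# inner level: up to 4 workers per region, e.g. one per MCMC chain
plan(list(
  tweak(multisession, workers = 2),
  tweak(multisession, workers = 4)
))
```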
Summary measures that are returned include a table formatted for reporting (along with raw results for further processing). Future updates will extend the S3 methods used above to smooth access to this output.
+
+knitr::kable(estimates$summary$summarised_results$table)
Region | New infections per day | Expected change in daily reports | Effective reproduction no. | Rate of growth | Doubling/halving time (days)
---|---|---|---|---|---
realland | 2060 (1007 – 4579) | Likely decreasing | 0.86 (0.59 – 1.2) | -0.039 (-0.12 – 0.056) | -18 (12 – -5.9)
testland | 2136 (1050 – 4356) | Likely decreasing | 0.86 (0.62 – 1.2) | -0.036 (-0.11 – 0.048) | -19 (14 – -6.4)
A range of plots are again returned (with the single summary plot shown below).
+
+estimates$summary$summary_plot
vignettes/case-studies.Rmd
This is a work in progress. Please consider submitting a PR to improve it.
Abbott, Sam, Joel Hellewell, Robin N. Thompson, Katharine Sherratt, Hamish P. Gibbs, Nikos I. Bosse, James D. Munday, et al. 2020. “Estimating the Time-Varying Reproduction Number of SARS-CoV-2 Using National and Subnational Case Counts.” Wellcome Open Research 5 (December): 112. http://dx.doi.org/10.12688/wellcomeopenres.16006.2.
Sherratt, Katharine, Sam Abbott, Sophie R. Meakin, Joel Hellewell, James D. Munday, Nikos Bosse, CMMID COVID-19 Working Group, Mark Jit, and Sebastian Funk. 2021. “Exploring Surveillance Data Biases When Estimating the Reproduction Number: With Insights into Subpopulation Transmission of COVID-19 in England.” Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 376 (1829): 20200283. http://dx.doi.org/10.1098/rstb.2020.0283.
Bosse, Nikos I., Sam Abbott, Johannes Bracher, Habakuk Hain, Billy J. Quilty, Mark Jit, Centre for the Mathematical Modelling of Infectious Diseases COVID-19 Working Group, Edwin van Leeuwen, Anne Cori, and Sebastian Funk. 2022. “Comparing Human and Model-Based Forecasts of COVID-19 in Germany and Poland.” PLoS Computational Biology 18 (9): e1010405. http://dx.doi.org/10.1371/journal.pcbi.1010405.
Davies, Nicholas G., Sam Abbott, Rosanna C. Barnard, Christopher I. Jarvis, Adam J. Kucharski, James D. Munday, Carl A. B. Pearson, et al. 2021. “Estimated Transmissibility and Impact of SARS-CoV-2 Lineage B.1.1.7 in England.” Science 372 (6538). https://doi.org/10.1126/science.abg3055.
Meakin, Sophie, Sam Abbott, Nikos Bosse, James Munday, Hugo Gruson, Joel Hellewell, Katharine Sherratt, CMMID COVID-19 Working Group, and Sebastian Funk. 2022. “Comparative Assessment of Methods for Short-Term Forecasts of COVID-19 Hospital Admissions in England at the Local Level.” BMC Medicine 20 (1): 86. http://dx.doi.org/10.1186/s12916-022-02271-x.
The EpiNow2 package contains functionality to run estimate_infections() in production mode, i.e. with full logging and with all relevant outputs and plots saved to dedicated folders on the hard drive. This is done with the epinow() function, which takes the same options as estimate_infections() plus some additional options that determine, for example, where output gets stored and exactly what is stored. The function can be a useful option when, e.g., running the model daily with updated data on a high-performance computing server to feed into a dashboard. For more detail on the various model options available, see the Examples vignette; for more on the general modelling approach, the Workflow vignette; and for theoretical background, the Model definitions vignette.
To run the model in production mode for a single region, set the parameters up in the same way as for estimate_infections() (see the Workflow vignette). Here we use the example delay and generation time distributions that come with the package. These should be replaced with parameters relevant to the system being studied.
+library("EpiNow2")
+#>
+#> Attaching package: 'EpiNow2'
+#> The following object is masked from 'package:stats':
+#>
+#> Gamma
+options(mc.cores = 4)
+reported_cases <- example_confirmed[1:60]
+reporting_delay <- LogNormal(mean = 2, sd = 1, max = 10)
+delay <- example_incubation_period + reporting_delay
+rt_prior <- list(mean = 2, sd = 0.1)
We can then run the epinow() function with the same arguments as estimate_infections().
+res <- epinow(reported_cases,
+ generation_time = generation_time_opts(example_generation_time),
+ delays = delay_opts(delay),
+ rt = rt_opts(prior = rt_prior)
+)
+#> Logging threshold set at INFO for the EpiNow2 logger
+#> Writing EpiNow2 logs to the console and: /tmp/RtmpJDlhZu/regional-epinow/2020-04-21.log
+#> Logging threshold set at INFO for the EpiNow2.epinow logger
+#> Writing EpiNow2.epinow logs to the console and: /tmp/RtmpJDlhZu/epinow/2020-04-21.log
+#> WARN [2024-05-10 07:52:36] epinow: There were 11 divergent transitions after warmup. See
+#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
+#> to find out why this is a problem and how to eliminate them. -
+#> WARN [2024-05-10 07:52:36] epinow: Examine the pairs() plot to diagnose sampling problems
+#> -
+res$plots$R
The initial messages here indicate where log files can be found. If you want summarised results and plots to be written out where they can be accessed later, you can use the target_folder argument.
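For example, results could be written to disk by adding target_folder to the call above (the path here is an example):

```r
# as before, but saving summarised results and plots under "results/"
res <- epinow(reported_cases,
  generation_time = generation_time_opts(example_generation_time),
  delays = delay_opts(delay),
  rt = rt_opts(prior = rt_prior),
  target_folder = "results"
)
```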
The package also contains functionality to conduct inference in production mode on multiple time series contemporaneously (though separately), e.g. to run the model on multiple regions. This is done with the regional_epinow() function.
Say, for example, we construct a dataset containing two regions, testland and realland (in this simple example both containing the same case data).
+cases <- example_confirmed[1:60]
+cases <- data.table::rbindlist(list(
+ data.table::copy(cases)[, region := "testland"],
+ cases[, region := "realland"]
+))
To then run this on multiple regions using the default options above, we could use
+
+region_rt <- regional_epinow(
+ reported_cases = cases,
+ generation_time = generation_time_opts(example_generation_time),
+ delays = delay_opts(delay),
+ rt = rt_opts(prior = rt_prior)
+)
+#> Warning: The `reported_cases` argument of `regional_epinow()` is deprecated as of EpiNow2 1.5.0.
+#> ℹ Please use the `data` argument instead.
+#> ℹ The argument will be removed completely in the next version.
+#> This warning is displayed once every 8 hours.
+#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
+#> INFO [2024-05-10 07:52:38] Producing following optional outputs: regions, summary, samples, plots, latest
+#> Logging threshold set at INFO for the EpiNow2 logger
+#> Writing EpiNow2 logs to the console and: /tmp/RtmpJDlhZu/regional-epinow/2020-04-21.log
+#> Logging threshold set at INFO for the EpiNow2.epinow logger
+#> Writing EpiNow2.epinow logs to: /tmp/RtmpJDlhZu/epinow/2020-04-21.log
+#> INFO [2024-05-10 07:52:38] Reporting estimates using data up to: 2020-04-21
+#> INFO [2024-05-10 07:52:38] No target directory specified so returning output
+#> INFO [2024-05-10 07:52:38] Producing estimates for: testland, realland
+#> INFO [2024-05-10 07:52:38] Regions excluded: none
+#> INFO [2024-05-10 07:53:40] Completed estimates for: testland
+#> INFO [2024-05-10 07:54:29] Completed estimates for: realland
+#> INFO [2024-05-10 07:54:29] Completed regional estimates
+#> INFO [2024-05-10 07:54:29] Regions with estimates: 2
+#> INFO [2024-05-10 07:54:29] Regions with runtime errors: 0
+#> INFO [2024-05-10 07:54:29] Producing summary
+#> INFO [2024-05-10 07:54:29] No summary directory specified so returning summary output
+#> INFO [2024-05-10 07:54:29] No target directory specified so returning timings
+## summary
+region_rt$summary$summarised_results$table
+#> Region New infections per day Expected change in daily reports
+#> <char> <char> <fctr>
+#> 1: realland 2211 (921 -- 4610) Likely decreasing
+#> 2: testland 2238 (980 -- 4646) Likely decreasing
+#> Effective reproduction no. Rate of growth
+#> <char> <char>
+#> 1: 0.87 (0.57 -- 1.2) -0.036 (-0.16 -- 0.076)
+#> 2: 0.88 (0.58 -- 1.2) -0.034 (-0.16 -- 0.076)
+#> Doubling/halving time (days)
+#> <char>
+#> 1: -19 (9.2 -- -4.3)
+#> 2: -20 (9.1 -- -4.4)
+## plot
+region_rt$summary$plots$R
If instead we wanted to use the Gaussian process for testland and a weekly random walk for realland, we could specify these separately using the opts_list() function from the package and modifyList() from base R.
+gp <- opts_list(gp_opts(), cases)
+gp <- modifyList(gp, list(realland = NULL), keep.null = TRUE)
+rt <- opts_list(rt_opts(), cases, realland = rt_opts(rw = 7))
+region_separate_rt <- regional_epinow(
+ reported_cases = cases,
+ generation_time = generation_time_opts(example_generation_time),
+ delays = delay_opts(delay),
+ rt = rt, gp = gp
+)
+#> INFO [2024-05-10 07:54:30] Producing following optional outputs: regions, summary, samples, plots, latest
+#> Logging threshold set at INFO for the EpiNow2 logger
+#> Writing EpiNow2 logs to the console and: /tmp/RtmpJDlhZu/regional-epinow/2020-04-21.log
+#> Logging threshold set at INFO for the EpiNow2.epinow logger
+#> Writing EpiNow2.epinow logs to: /tmp/RtmpJDlhZu/epinow/2020-04-21.log
+#> INFO [2024-05-10 07:54:30] Reporting estimates using data up to: 2020-04-21
+#> INFO [2024-05-10 07:54:30] No target directory specified so returning output
+#> INFO [2024-05-10 07:54:30] Producing estimates for: testland, realland
+#> INFO [2024-05-10 07:54:30] Regions excluded: none
+#> INFO [2024-05-10 07:56:06] Completed estimates for: testland
+#> INFO [2024-05-10 07:56:24] Completed estimates for: realland
+#> INFO [2024-05-10 07:56:24] Completed regional estimates
+#> INFO [2024-05-10 07:56:24] Regions with estimates: 2
+#> INFO [2024-05-10 07:56:24] Regions with runtime errors: 0
+#> INFO [2024-05-10 07:56:24] Producing summary
+#> INFO [2024-05-10 07:56:24] No summary directory specified so returning summary output
+#> INFO [2024-05-10 07:56:24] No target directory specified so returning timings
+## summary
+region_separate_rt$summary$summarised_results$table
+#> Region New infections per day Expected change in daily reports
+#> <char> <char> <fctr>
+#> 1: realland 2106 (1042 -- 4307) Likely decreasing
+#> 2: testland 2128 (843 -- 4774) Likely decreasing
+#> Effective reproduction no. Rate of growth
+#> <char> <char>
+#> 1: 0.86 (0.6 -- 1.2) -0.038 (-0.11 -- 0.046)
+#> 2: 0.86 (0.52 -- 1.3) -0.038 (-0.18 -- 0.081)
+#> Doubling/halving time (days)
+#> <char>
+#> 1: -18 (15 -- -6.3)
+#> 2: -18 (8.6 -- -3.8)
+## plot
+region_separate_rt$summary$plots$R
vignettes/estimate_infections.Rmd
estimate_infections() supports a range of model formulations. Here we describe the most commonly used ones and highlight other options. The two main models for how new infections arise in the model are a renewal equation model and a non-mechanistic infection model. The initialisation of both of these models involves estimating an initial infection trajectory during a seeding_time \(t_\mathrm{seed}\) (set to the mean of modelled delays from infection to observation) that precedes the first observation at time \(t=0\).
This is the default model and is used when rt != NULL. New infections are generated at discrete time steps of one day via the renewal equation [1]. These infections are then mapped to observations via discrete convolutions with delay distributions.
The model is initialised before the first observed data point by assuming constant exponential growth for the mean of modelled delays from infection to case report (called seeding_time \(t_\mathrm{seed}\) in the model):
\[\begin{align}
I_0 &\sim \mathcal{LogNormal}(I_\mathrm{obs}, 0.2) \\
r &\sim \mathcal{Normal}(r_\mathrm{obs}, 0.2) \\
I_{0 < t \leq t_\mathrm{seed}} &= I_0 \exp \left(r t \right)
\end{align}\]
+where \(I_{t}\) is the number of +latent infections on day \(t\), \(r\) is the estimate of the initial growth +rate, and \(I_\mathrm{obs}\) and \(r_\mathrm{obs}\) are estimated from the +first week of observed data: \(I_\mathrm{obs}\) as the mean of reported +cases in the first 7 days (or the mean of all cases if fewer than 7 days +of data are given), divided by the prior mean reporting fraction if less +than 1 (see Delays and scaling); and +\(r_\mathrm{obs}\) as the point +estimate from fitting a linear regression model to the first 7 days of +data (or all data if fewer than 7 days of data are given),
+\[\begin{equation}
+\log(C_t) = a + r_\mathrm{obs} t + \epsilon_t
+\end{equation}\] where \(C_{t}\) is the number of reported cases on day \(t\), \(a\) an estimated intercept, and \(\epsilon_{t}\) the error term.
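As an illustrative sketch of this initialisation step (using made-up case counts rather than the package internals, and assuming a reporting fraction of 1):

```r
# Sketch of the initialisation step: estimate I_obs and r_obs from the
# first 7 days of reported cases (illustrative values, not package code)
cases <- c(14, 62, 53, 97, 93, 78, 107)

# I_obs: mean of reported cases in the first week (reporting fraction = 1)
I_obs <- mean(cases)

# r_obs: slope of a log-linear regression log(C_t) = a + r_obs * t
t <- seq_along(cases) - 1
r_obs <- unname(coef(lm(log(cases) ~ t))["t"])
```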
+For the time window of the observed data and beyond infections are +then modelled by weighting previous infections with the generation time +and scaling by the instantaneous reproduction number:
+\[\begin{equation} + I_t = R_t \sum_{\tau = 1}^{g_\mathrm{max}} g(\tau | \mu_{g}, +\sigma_{g}) I_{t - \tau} +\end{equation}\]
+where \(g(\tau|\mu_{g}, +\sigma_{g})\) is the distribution of generation times (with +discretised gamma or discretised log normal distributions available as +options) with mean (or log mean in the case of lognormal distributions) +\(\mu_g\), standard deviation (or log +standard deviation in the case of lognormal distributions) \(\sigma_g\) and maximum \(g_\mathrm{max}\). Generation times can +either be specified as coming from a distribution with uncertainty by +giving mean and standard deviations of normal priors, weighted by +default by the number of observations (although this can be changed by +the user) and truncated to be positive where relevant for the given +distribution; or they can be specified as the parameters of a fixed +distribution, or as fixed values.
+The distribution of generation times \(g\) here represents the probability that +somebody who became infectious on day 0 and who infects someone else +during their course of infection does so on day \(\tau > 0\), assuming that infection +cannot happen on day 0. If not given this defaults to a fixed generation +time of 1, in which case \(R_{t}\) +represents the exponential of the daily growth rate of infections.
+Different options are available for setting a prior for \(R_t\), the instantaneous reproduction +number at time \(t\). The default prior +is an approximate zero-mean Gaussian Process (GP) for the first +differences in time on the log scale,
+\[\begin{equation} + \log R_{t} - \log R_{t-1} \sim \mathrm{GP}_t +\end{equation}\]
+More details on the mathematical form of the GP approximation and +implementation are given in the Gaussian Process +implementation details vignette. Other choices for the prior of +\(R_t\) are available such as a GP +prior for the difference between \(R_t\) and its mean value (implying that in +the absence of data \(R_t\) will revert +to its prior mean rather than the last value with robust support from +the data).
+\[\begin{equation} + \log R_{t} - \log R_0 \sim \mathrm{GP}_t +\end{equation}\]
+or, as a specific case of a Gaussian Process, a random walk of +arbitrary length \(w\).
+\[\begin{align} + \log R_{t \div w} &\sim \mathcal{Normal} (R_{t \div (w - 1)}, +\sigma_R)\\ + \sigma_R &\sim \mathcal{HalfNormal}(0, 0.1) +\end{align}\]
+where \(\div\) indicates +interval-valued division (i.e. the floor of the division), such that for +example \(w=1\) indicates a daily and +\(w=7\) a weekly random walk.
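The interval-valued division can be checked directly in R (a quick sketch): with \(w=7\), all days within the same week map to the same random-walk segment.

```r
# Floor division maps each day t to its random-walk segment
w <- 7
t <- 0:20
segment <- t %/% w  # days 0-6 -> segment 0, days 7-13 -> segment 1, ...
```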
+The choice of prior for the time-varying reproduction number impacts run-time, the smoothness of the estimates, and real-time behaviour, and may alter the best use case for the model.
+The initial reproduction number \(R_{0}\) has a log-normal prior with a given +log mean \(\mu_{R}\) and log standard +deviation \(\sigma_{R}\), calculated +from a given mean (default: 1) and standard deviation (default: 1).
+\[\begin{equation} + R_0 \sim \mathcal{LogNormal}(\mu_R, \sigma_R) +\end{equation}\]
+The simplest possible process model option is to use no time-varying prior and rely on just the initial fixed reproduction number \(R_0\).
+Beyond the end of the observation period (\(T\)), by default, the Gaussian process is assumed to continue. However, here again there are a range of options. These include fixing transmission dynamics (optionally this can also be done before the end of the observation period), and scaling \(R_t\) based on the proportion of the population that remains susceptible. This is defined as follows,
+\[\begin{equation} + I_t = (N - I^c_{t-1}) \left(1 - \exp \left(\frac{-I'_t}{N - +I^c_{T}}\right)\right), +\end{equation}\]
+where \(I^c_t = \sum_{s< t} I_s\)
+are cumulative infections by \(t-1\)
+and \(I'_t\) are the unadjusted
+infections defined above. This adjustment is based on the one
+implemented in the epidemia
R package[2].
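A minimal numeric sketch of this adjustment (illustrative values; not the package implementation):

```r
# Susceptible-depletion adjustment (sketch): scale unadjusted infections
# I'_t by the susceptible pool remaining in a population of size N
adjust_infections <- function(I_prime, I_cum_prev, I_cum_T, N) {
  (N - I_cum_prev) * (1 - exp(-I_prime / (N - I_cum_T)))
}

# With half the population already infected, 1000 unadjusted infections
# are scaled down, and can never exceed the remaining susceptibles
adjusted <- adjust_infections(I_prime = 1000, I_cum_prev = 5e5,
                              I_cum_T = 4e5, N = 1e6)
```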
This is an alternative model that can be used by setting
+rt = NULL
. It assumes less epidemiological mechanism by
+directly modelling infections on a log scale with a range of process
+models. By default, this uses a Gaussian Process prior for the number of
+new infections each day (on the log scale) although alternatively
+infections can be estimated using a prior based on a fixed backwards
+mapping of observed cases. In general, these model options will be more
+computationally efficient than the renewal process model but may be less
+robust due to the lack of an epidemiological process model (i.e. more
+dependence is placed on the assumptions of the Gaussian process
+prior).
In order to initialise this model, an initial estimate \(I_\mathrm{est}\) of the infection trajectory is created by first shifting observations back in time by \(t_\mathrm{seed}\) and then smoothing the observation data with a moving average of window size \(z\) (default: \(z=14\)), allocated to the centre of the window:
+\[\begin{align} + I_{\mathrm{est}, t \lt T - t_\mathrm{seed}} = \frac{1}{z} +\sum_{\max(-t_\mathrm{seed}, t - z/2)}^{\min(t_\mathrm{obs}, t + z/2)} +I_{\mathrm{obs}, t + t_\mathrm{seed}} +\end{align}\]
+where \(T\) is the day of the last +observation, \(z/2\) is rounded up to +the nearest integer in the limits of the sum, and \(I_\mathrm{obs, t \lt 0} = 0\). Any date +with \(I_{\mathrm{est}, t} = 0\) cases +following this procedure is then allocated 1 case to facilitate further +processing in the model.
+For any times \(t > T - +t_\mathrm{seed}\) the number of infections is then estimated by +fitting an exponential curve to the final week of data and extrapolating +this until the end of the forecast horizon.
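A rough sketch of the shift-and-smooth initialisation described above (illustrative, using a constant series; the internal implementation differs in its edge handling):

```r
# Shift observations back by the seeding time, then smooth with a
# centred moving average of window z (default 14)
seed_time <- 7
z <- 14
obs <- rep(100, 60)  # illustrative observed cases
shifted <- c(obs[(seed_time + 1):length(obs)], rep(NA, seed_time))
est <- stats::filter(shifted, rep(1 / z, z), sides = 2)
est[!is.na(est) & est == 0] <- 1  # zero estimates are set to 1 case
```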
+By default, a Gaussian Process prior is used for the number of +infections, resulting in smoother estimates of the infection curve. In +this case, as in the renewal equation model there are two alternative +formulations available. The default uses an approximate zero-mean GP for +the differences between modelled infections and the initial +estimate,
+\[\begin{equation} + \log I_{t} - \log I_{\mathrm{est}, t} \sim \mathrm{GP}_t +\end{equation}\]
+Alternatively, one can use an approximate zero-mean Gaussian Process (GP) for the first differences in time on the log scale,
+\[\begin{equation} + \log I_{t} - \log I_{t-1} \sim \mathrm{GP}_t +\end{equation}\]
+with \(\log I_{0} - \log I_{\mathrm{est}, 0} \sim \mathrm{GP}_{0}\).
+More details on the mathematical form of the Gaussian process +approximation are given in the Gaussian Process +implementation details vignette.
+As for the renewal equation model, the Gaussian process can be +replaced by a random walk of arbitrary length \(w\).
+When using a fixed shift from infections to reported cases there is +no process model and so \(I_\mathrm{est}\) is used as the estimated +infection curve (potentially scaled to take into account underreporting, +see section Delays and scaling).
+In this model there is no prior on the time-varying reproduction +number. Instead, this is calculated from the renewal equation as a +post-processing step
+\[\begin{equation} + R_t = \frac{I_{t}}{\sum_{\tau = 1}^{g_\mathrm{max}} g(\tau | \mu_{g}, +\sigma_{g}) I_{t - \tau}} +\end{equation}\]
+and optionally smoothed using a centred rolling mean with a window +size that can be set by the user.
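This post-processing calculation can be sketched as follows (illustrative generation interval and infection series, not package code):

```r
# Recover R_t from an infection series via the renewal equation (sketch)
g <- c(0.2, 0.4, 0.3, 0.1)      # illustrative discrete generation interval
I <- c(10, 12, 15, 18, 22, 27)  # illustrative infections
t <- 6
infectiousness <- sum(g * I[t - seq_along(g)])  # sum_tau g(tau) * I_{t - tau}
R_t <- I[t] / infectiousness
```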
+If infections are observed with a delay (for example, the incubation period if based on symptomatic cases, and any delay from onset to report), they are convolved in the model with delay distributions (with lognormal and gamma parameterisations available) \(\xi\) to obtain expected observations at the time scale of reporting, \(D_{t}\), scaled by an underreporting factor \(\alpha\) (which is 1 if all infections are observed). This model can be defined mathematically as follows,
+\[\begin{equation} + D_t = \alpha \sum_{\tau = 0}^{\xi_\mathrm{max}} \xi (\tau | \mu_{\xi}, +\sigma_{\xi}) I_{t-\tau} +\end{equation}\]
+where \(\xi(\tau|\mu_{\xi}, +\sigma_{\xi})\) is the combined discrete distribution of delays +(with discretised gamma or discretised log normal distributions +available as options) with mean (or log mean in the case of lognormal +distributions) \(\mu_\xi\), standard +deviation (or log standard deviation in the case of lognormal +distributions) \(\sigma_\xi\) and +maximum \(\xi_\mathrm{max}\).
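A sketch of this convolution for a single time point (illustrative delay distribution and infection series, with \(\tau\) running from 0):

```r
# Expected reports D_t as a discrete convolution of infections with a
# delay distribution, scaled by the reporting fraction alpha (sketch)
xi <- c(0.1, 0.3, 0.4, 0.2)   # illustrative discrete delay pmf, tau = 0..3
alpha <- 0.8                  # illustrative reporting fraction
I <- c(100, 110, 120, 130, 140)
t <- 5
D_t <- alpha * sum(xi * I[t - (seq_along(xi) - 1)])
```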
+Delays can either be specified as coming from a distribution with +uncertainty by giving mean and standard deviations of normal priors, +weighted by the number of observations and truncated to be positive +where relevant for the given distribution; or they can be specified as +the parameters of a fixed distribution, or as fixed values.
+The scaling factor \(\alpha\) +represents the proportion of cases that are ultimately reported, which +by default is set to 1 (i.e. no underreporting) but can instead be set +to come from a normal prior with given mean and standard deviation, +truncated to be between 0 and 1.
+The modelled counts \(D_{t}\) are +related to observations \(C_{t}\). By +default this is assumed to follow a negative binomial distribution with +overdispersion \(\varphi\) +(alternatively it can be modelled as a Poisson, in which case \(\varphi\) is not used):
+\[\begin{align} + C_t &\sim \mathcal{NegBinom}\left(\omega_{(t \mod n_\omega)}D_t, +\varphi\right) +\end{align}\]
+where \(\omega_{t \mod n_\omega}\) +is a daily reporting effect of cyclicity \(n_{\omega}\). If \(n_{\omega}=7\) this corresponds to a +day-of-the-week reporting effect.
+This model uses the following priors for the observation model,
+\[\begin{align} + \frac{\omega}{n_\omega} &\sim \mathcal{Dirichlet}(1, \ldots, 1) +\\ + \frac{1}{\sqrt{\varphi}} &\sim \mathcal{HalfNormal}(0, 1) +\end{align}\]
+The model supports counts that are right-truncated, i.e. reported with a delay leading to recent counts being subject to future upwards revision. Denoting the final truncated counts with \(D^{\ast}_{t}\), they are obtained from the final modelled cases \(D_{t}\) by applying a given discrete truncation distribution \(\zeta(\tau | \mu_{\zeta}, \sigma_{\zeta})\) with cumulative mass function \(Z(\tau | \mu_{\zeta}, \sigma_{\zeta})\):
+\[\begin{equation}
+  D^\ast_t = Z(T - t | \mu_{\zeta}, \sigma_{\zeta}) D_{t}
+\end{equation}\]
+If truncation is applied, the modelled cases \(D_{t}\) are replaced by the truncated +counts before confronting them with observations \(C_{t}\) as described above.
+vignettes/estimate_infections_options.Rmd
The estimate_infections()
function encodes a range of
+different model options. In this vignette we apply some of these to the
+example data provided with the EpiNow2 package, highlighting
+differences in inference results and run times. It is not meant as a
+comprehensive exploration of all the functionality in the package, but
+intended to give users a flavour of the kind of model options that exist
+for reproduction number estimation and forecasting within the package,
+and the differences in computational speed between them. For
+mathematical detail on the model please consult the model definition vignette, and for a
+more general description of the use of the function, the estimate_infections
+workflow vignette.
We first load the EpiNow2 package and also the +rstan package that we will use later to show the differences in +run times between different model options.
+
+library("EpiNow2")
+library("rstan")
+#> Loading required package: StanHeaders
+#>
+#> rstan version 2.32.6 (Stan version 2.32.2)
+#> For execution on a local, multicore CPU with excess RAM we recommend calling
+#> options(mc.cores = parallel::detectCores()).
+#> To avoid recompilation of unchanged Stan programs, we recommend calling
+#> rstan_options(auto_write = TRUE)
+#> For within-chain threading using `reduce_sum()` or `map_rect()` Stan functions,
+#> change `threads_per_chain` option:
+#> rstan_options(threads_per_chain = 1)
In this example we set the number of cores to 4, but the optimal value will depend on the computing resources available.
+
+options(mc.cores = 4)
We will use an example data set that is included in the package, +representing an outbreak of COVID-19 with an initial rapid increase +followed by decreasing incidence.
+
+library("ggplot2")
+reported_cases <- example_confirmed[1:60]
+ggplot(reported_cases, aes(x = date, y = confirm)) +
+ geom_col() +
+ theme_minimal() +
+ xlab("Date") +
+ ylab("Cases")
Before running the model we need to decide on some parameter values, +in particular any delays, the generation time, and a prior on the +initial reproduction number.
+Delays reflect the time that passes between infection and reporting, +if these exist. In this example, we assume two delays, an incubation +period (i.e. delay from infection to symptom onset) and a +reporting delay (i.e. the delay from symptom onset to being +recorded as a symptomatic case). These delays are usually not the same +for everyone and are instead characterised by a distribution. For the +incubation period we use an example from the literature that is included +in the package.
+
+example_incubation_period
+#> - lognormal distribution (max: 14):
+#> meanlog:
+#> - normal distribution:
+#> mean:
+#> 1.6
+#> sd:
+#> 0.064
+#> sdlog:
+#> - normal distribution:
+#> mean:
+#> 0.42
+#> sd:
+#> 0.069
For the reporting delay, we use a lognormal distribution with a mean of 2 days and a standard deviation of 1 day. Note that the mean and standard deviation are converted to the log scale (meanlog and sdlog) internally; this conversion can also be done manually using the convert_log_logmean() function.
+reporting_delay <- LogNormal(mean = 2, sd = 1, max = 10)
+reporting_delay
+#> - lognormal distribution (max: 10):
+#> meanlog:
+#> 0.58
+#> sdlog:
+#> 0.47
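The meanlog and sdlog values printed above can be checked against the standard lognormal moment-matching relations (a sketch of the conversion; the package's convert_log_logmean() function serves this purpose):

```r
# Standard lognormal moment matching: natural-scale mean/sd -> meanlog/sdlog
meanlog_from <- function(mean, sd) log(mean^2 / sqrt(sd^2 + mean^2))
sdlog_from <- function(mean, sd) sqrt(log(1 + sd^2 / mean^2))

meanlog_from(2, 1)  # ~0.58, matching the printout above
sdlog_from(2, 1)    # ~0.47
```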
EpiNow2 provides a feature that allows us to combine these delays into one by summing them up:
+
+delay <- example_incubation_period + reporting_delay
+delay
+#> Composite distribution:
+#> - lognormal distribution (max: 14):
+#> meanlog:
+#> - normal distribution:
+#> mean:
+#> 1.6
+#> sd:
+#> 0.064
+#> sdlog:
+#> - normal distribution:
+#> mean:
+#> 0.42
+#> sd:
+#> 0.069
+#> - lognormal distribution (max: 10):
+#> meanlog:
+#> 0.58
+#> sdlog:
+#> 0.47
If we want to estimate the reproduction number we need to provide a +distribution of generation times. Here again we use an example from the +literature that is included with the package.
+
+example_generation_time
+#> - gamma distribution (max: 14):
+#> shape:
+#> - normal distribution:
+#> mean:
+#> 1.4
+#> sd:
+#> 0.48
+#> rate:
+#> - normal distribution:
+#> mean:
+#> 0.38
+#> sd:
+#> 0.25
+Lastly we need to choose a prior for the initial value of the reproduction number. The model assumes this to be log-normally distributed, and we can set the mean and the standard deviation. We decide to set the mean to 2 and the standard deviation to 0.1.
+
+rt_prior <- list(mean = 2, sd = 0.1)
We are now ready to run the model and will in the following show a +number of possible options for doing so.
+By default the model uses a renewal equation for infections and a Gaussian Process prior for the reproduction number. Putting all the data and parameters together, we run:
+
+def <- estimate_infections(reported_cases,
+ generation_time = generation_time_opts(example_generation_time),
+ delays = delay_opts(delay),
+ rt = rt_opts(prior = rt_prior)
+)
+#> Warning: There were 11 divergent transitions after warmup. See
+#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
+#> to find out why this is a problem and how to eliminate them.
+#> Warning: Examine the pairs() plot to diagnose sampling problems
+# summarise results
+summary(def)
+#> measure estimate
+#> <char> <char>
+#> 1: New infections per day 2237 (956 -- 4859)
+#> 2: Expected change in daily reports Likely decreasing
+#> 3: Effective reproduction no. 0.88 (0.58 -- 1.3)
+#> 4: Rate of growth -0.034 (-0.16 -- 0.085)
+#> 5: Doubling/halving time (days) -21 (8.2 -- -4.4)
+# elapsed time (in seconds)
+get_elapsed_time(def$fit)
+#> warmup sample
+#> chain:1 22.546 17.871
+#> chain:2 43.610 35.665
+#> chain:3 21.929 18.197
+#> chain:4 18.185 17.063
+# summary plot
+plot(def)
To speed up the calculation of the Gaussian Process we could decrease +its accuracy, e.g. decrease the proportion of time points to use as +basis functions from the default of 0.2 to 0.1.
+
+agp <- estimate_infections(reported_cases,
+ generation_time = generation_time_opts(example_generation_time),
+ delays = delay_opts(delay),
+ rt = rt_opts(prior = rt_prior),
+ gp = gp_opts(basis_prop = 0.1)
+)
+#> Warning: There were 19 divergent transitions after warmup. See
+#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
+#> to find out why this is a problem and how to eliminate them.
+#> Warning: Examine the pairs() plot to diagnose sampling problems
+# summarise results
+summary(agp)
+#> measure estimate
+#> <char> <char>
+#> 1: New infections per day 2229 (1060 -- 4985)
+#> 2: Expected change in daily reports Likely decreasing
+#> 3: Effective reproduction no. 0.88 (0.61 -- 1.2)
+#> 4: Rate of growth -0.036 (-0.15 -- 0.076)
+#> 5: Doubling/halving time (days) -19 (9.1 -- -4.8)
+# elapsed time (in seconds)
+get_elapsed_time(agp$fit)
+#> warmup sample
+#> chain:1 14.264 17.067
+#> chain:2 14.501 16.719
+#> chain:3 17.512 24.243
+#> chain:4 16.937 17.436
+# summary plot
+plot(agp)
We might want to adjust for future susceptible depletion. Here, we do +so by setting the population to 1000000 and projecting the reproduction +number from the latest estimate (rather than the default, which fixes +the reproduction number to an earlier time point based on the given +reporting delays). Note that this only affects the forecasts and is done +using a crude adjustment (see the model definition).
+
+dep <- estimate_infections(reported_cases,
+ generation_time = generation_time_opts(example_generation_time),
+ delays = delay_opts(delay),
+ rt = rt_opts(
+ prior = rt_prior,
+ pop = 1000000, future = "latest"
+ )
+)
+#> Warning: There were 9 divergent transitions after warmup. See
+#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
+#> to find out why this is a problem and how to eliminate them.
+#> Warning: Examine the pairs() plot to diagnose sampling problems
+# summarise results
+summary(dep)
+#> measure estimate
+#> <char> <char>
+#> 1: New infections per day 2219 (934 -- 4735)
+#> 2: Expected change in daily reports Likely decreasing
+#> 3: Effective reproduction no. 0.88 (0.57 -- 1.2)
+#> 4: Rate of growth -0.034 (-0.16 -- 0.076)
+#> 5: Doubling/halving time (days) -20 (9.1 -- -4.3)
+# elapsed time (in seconds)
+get_elapsed_time(dep$fit)
+#> warmup sample
+#> chain:1 24.086 18.080
+#> chain:2 25.069 31.131
+#> chain:3 24.617 25.153
+#> chain:4 21.874 18.766
+# summary plot
+plot(dep)
We might further want to adjust for right-truncation of recent data, which can be estimated using the estimate_truncation model. Here, instead of doing so, we assume that truncation is known to follow a lognormal distribution with a mean of 1/2 day and a standard deviation of 1/2 day, with a maximum of three days.
+
+trunc_dist <- LogNormal(
+ mean = Normal(0.5, 0.1),
+ sd = Normal(0.5, 0.1),
+ max = 3
+)
+#> Warning in new_dist_spec(params, "lognormal"): Uncertain lognormal distribution
+#> specified in terms of parameters that are not the "natural" parameters of the
+#> distribution (meanlog, sdlog). Converting using a crude and very approximate
+#> method that is likely to produce biased results. If possible, it is preferable
+#> to specify the distribution directly in terms of the natural parameters.
+trunc_dist
+#> - lognormal distribution (max: 3):
+#> meanlog:
+#> - normal distribution:
+#> mean:
+#> -1
+#> sd:
+#> 0.14
+#> sdlog:
+#> - normal distribution:
+#> mean:
+#> 0.83
+#> sd:
+#> 0.13
We can then use this in the estimate_infections()
+function using the truncation
option.
+trunc <- estimate_infections(reported_cases,
+ generation_time = generation_time_opts(example_generation_time),
+ delays = delay_opts(delay),
+ truncation = trunc_opts(trunc_dist),
+ rt = rt_opts(prior = rt_prior)
+)
+#> Error in vapply(delays[parametric], attr, "weight_prior", FUN.VALUE = logical(1)): values must be length 1,
+#> but FUN(X[[3]]) result is length 0
+# summarise results
+summary(trunc)
+#> Error in object[[i]]: object of type 'builtin' is not subsettable
+# elapsed time (in seconds)
+get_elapsed_time(trunc$fit)
+#> Error in (function (cond) : error in evaluating the argument 'object' in selecting a method for function 'get_elapsed_time': object of type 'builtin' is not subsettable
+# summary plot
+plot(trunc)
Instead of keeping the reproduction number fixed from a certain time point, we might want to extrapolate the Gaussian Process into the future. This will lead to wider uncertainty, and the researcher should check whether this or fixing the reproduction number from an earlier time point is desirable.
+
+project_rt <- estimate_infections(reported_cases,
+ generation_time = generation_time_opts(example_generation_time),
+ delays = delay_opts(delay),
+ rt = rt_opts(
+ prior = rt_prior, future = "project"
+ )
+)
+#> Warning: There were 8 divergent transitions after warmup. See
+#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
+#> to find out why this is a problem and how to eliminate them.
+#> Warning: Examine the pairs() plot to diagnose sampling problems
+# summarise results
+summary(project_rt)
+#> measure estimate
+#> <char> <char>
+#> 1: New infections per day 2200 (1010 -- 4815)
+#> 2: Expected change in daily reports Likely decreasing
+#> 3: Effective reproduction no. 0.87 (0.59 -- 1.2)
+#> 4: Rate of growth -0.036 (-0.15 -- 0.08)
+#> 5: Doubling/halving time (days) -19 (8.6 -- -4.5)
+# elapsed time (in seconds)
+get_elapsed_time(project_rt$fit)
+#> warmup sample
+#> chain:1 23.881 34.602
+#> chain:2 18.612 18.302
+#> chain:3 26.161 28.837
+#> chain:4 21.156 29.987
+# summary plot
+plot(project_rt)
We might want to estimate a fixed reproduction number, i.e. assume +that it does not change.
+
+fixed <- estimate_infections(reported_cases,
+ generation_time = generation_time_opts(example_generation_time),
+ delays = delay_opts(delay),
+ gp = NULL
+)
+# summarise results
+summary(fixed)
+#> measure estimate
+#> <char> <char>
+#> 1: New infections per day 16780 (9407 -- 30551)
+#> 2: Expected change in daily reports Increasing
+#> 3: Effective reproduction no. 1.2 (1.1 -- 1.3)
+#> 4: Rate of growth 0.05 (0.035 -- 0.065)
+#> 5: Doubling/halving time (days) 14 (11 -- 20)
+# elapsed time (in seconds)
+get_elapsed_time(fixed$fit)
+#> warmup sample
+#> chain:1 1.270 0.951
+#> chain:2 1.109 0.804
+#> chain:3 1.151 0.893
+#> chain:4 1.242 0.926
+# summary plot
+plot(fixed)
Instead of assuming the reproduction number varies freely or is
+fixed, we can assume that it is fixed but with breakpoints. This can be
+done by adding a breakpoint
column to the reported case
+data set. For example, if we think that the reproduction number was constant but
+would like to allow it to change on the 16th of March 2020 we would
+define a new case data set using
+bp_cases <- data.table::copy(reported_cases)
+bp_cases <- bp_cases[,
+ breakpoint := ifelse(date == as.Date("2020-03-16"), 1, 0)
+]
We then use this instead of reported_cases
in the
+estimate_infections()
function:
+bkp <- estimate_infections(bp_cases,
+ generation_time = generation_time_opts(example_generation_time),
+ delays = delay_opts(delay),
+ rt = rt_opts(prior = rt_prior),
+ gp = NULL
+)
+# summarise results
+summary(bkp)
+#> measure estimate
+#> <char> <char>
+#> 1: New infections per day 2356 (1932 -- 2898)
+#> 2: Expected change in daily reports Decreasing
+#> 3: Effective reproduction no. 0.89 (0.86 -- 0.92)
+#> 4: Rate of growth -0.027 (-0.034 -- -0.02)
+#> 5: Doubling/halving time (days) -25 (-35 -- -20)
+# elapsed time (in seconds)
+get_elapsed_time(bkp$fit)
+#> warmup sample
+#> chain:1 1.819 2.313
+#> chain:2 1.927 2.225
+#> chain:3 2.038 3.406
+#> chain:4 2.024 2.143
+# summary plot
+plot(bkp)
Instead of a smooth Gaussian Process we might want the reproduction
+number to change step-wise, e.g. every week. This can be achieved using
+the rw
option which defines the length of the time step in
+a random walk that the reproduction number is assumed to follow.
+rw <- estimate_infections(reported_cases,
+ generation_time = generation_time_opts(example_generation_time),
+ delays = delay_opts(delay),
+ rt = rt_opts(prior = rt_prior, rw = 7),
+ gp = NULL
+)
+# summarise results
+summary(rw)
+#> measure estimate
+#> <char> <char>
+#> 1: New infections per day 2136 (1056 -- 4315)
+#> 2: Expected change in daily reports Likely decreasing
+#> 3: Effective reproduction no. 0.87 (0.63 -- 1.2)
+#> 4: Rate of growth -0.035 (-0.11 -- 0.044)
+#> 5: Doubling/halving time (days) -20 (16 -- -6.3)
+# elapsed time (in seconds)
+get_elapsed_time(rw$fit)
+#> warmup sample
+#> chain:1 4.182 6.196
+#> chain:2 4.202 6.631
+#> chain:3 4.999 7.652
+#> chain:4 4.127 4.635
+# summary plot
+plot(rw)
Whilst EpiNow2 allows the user to specify delays, it can +also run directly on the data as does e.g. the EpiEstim +package.
+
+no_delay <- estimate_infections(
+ reported_cases,
+ generation_time = generation_time_opts(example_generation_time)
+)
+#> Warning: There were 32 divergent transitions after warmup. See
+#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
+#> to find out why this is a problem and how to eliminate them.
+#> Warning: Examine the pairs() plot to diagnose sampling problems
+#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
+#> Running the chains for more iterations may help. See
+#> https://mc-stan.org/misc/warnings.html#bulk-ess
+#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
+#> Running the chains for more iterations may help. See
+#> https://mc-stan.org/misc/warnings.html#tail-ess
+# summarise results
+summary(no_delay)
+#> measure estimate
+#> <char> <char>
+#> 1: New infections per day 2759 (2325 -- 3311)
+#> 2: Expected change in daily reports Decreasing
+#> 3: Effective reproduction no. 0.87 (0.76 -- 0.98)
+#> 4: Rate of growth -0.042 (-0.098 -- 0.0045)
+#> 5: Doubling/halving time (days) -17 (150 -- -7.1)
+# elapsed time (in seconds)
+get_elapsed_time(no_delay$fit)
+#> warmup sample
+#> chain:1 34.378 41.249
+#> chain:2 24.455 27.261
+#> chain:3 28.171 43.757
+#> chain:4 32.226 33.559
+# summary plot
+plot(no_delay)
The package also includes a non-parametric infection model. This runs much faster but does not use the renewal equation to generate infections. Because of this, none of the options defining the behaviour of the reproduction number are available in this case, limiting user choice and model generality. It also means that the model is questionable for forecasting, which is why we set the predictive horizon to 0 here.
+
+non_parametric <- estimate_infections(reported_cases,
+ generation_time = generation_time_opts(example_generation_time),
+ delays = delay_opts(delay),
+ rt = NULL,
+ backcalc = backcalc_opts(),
+ horizon = 0
+)
+# summarise results
+summary(non_parametric)
+#> measure estimate
+#> <char> <char>
+#> 1: New infections per day 2539 (2066 -- 3083)
+#> 2: Expected change in daily reports Decreasing
+#> 3: Effective reproduction no. 0.92 (0.8 -- 0.99)
+#> 4: Rate of growth -0.023 (-0.044 -- -0.0015)
+#> 5: Doubling/halving time (days) -30 (-460 -- -16)
+# elapsed time (in seconds)
+get_elapsed_time(non_parametric$fit)
+#> warmup sample
+#> chain:1 1.631 0.451
+#> chain:2 1.862 0.363
+#> chain:3 1.969 0.454
+#> chain:4 2.113 0.539
+# summary plot
+plot(non_parametric)
vignettes/estimate_infections_workflow.Rmd
This vignette describes the typical workflow for estimating +reproduction numbers and performing short-term forecasts for a disease +spreading in a given setting using EpiNow2. The vignette uses +the default non-stationary Gaussian process model included in the +package. See other vignettes for a more thorough exploration of alternative model variants +and theoretical background.
+Obtaining a good and full understanding of the data being used is an
+important first step in any inference procedure such as the one applied
+here. EpiNow2 expects data in the format of a data frame with
+two columns, date
and confirm
, where
+confirm
stands for the number of confirmed counts -
+although in reality this can be applied to any data including suspected
+cases and lab-confirmed outcomes. The user might already have the data
+as such a time series provided, for example, on public dashboards or
+directly from public health authorities. Alternatively, they can be
+constructed from individual-level data, for example using the incidence2 R
+package. An example data set called example_confirmed is
is
+included in the package:
+head(example_confirmed)
+#> date confirm
+#> <Date> <num>
+#> 1: 2020-02-22 14
+#> 2: 2020-02-23 62
+#> 3: 2020-02-24 53
+#> 4: 2020-02-25 97
+#> 5: 2020-02-26 93
+#> 6: 2020-02-27 78
Any estimation procedure is only as good as the data that feeds into +it. A thorough understanding of the data that is used for +EpiNow2 and its limitations is a prerequisite for its use. This +includes but is not limited to biases in the population groups that are +represented (EpiNow2 assumes a closed population with all +infections being caused by other infections in the same population), +reporting artefacts and delays, and completeness of reporting. Some of +these can be mitigated using the routines available in EpiNow2 +as described below, but others will cause biases in the results and need +to be carefully considered when interpreting the results.
+We first load the EpiNow2 package.
+ +We then set the number of cores to use. We will want to run 4 MCMC +chains in parallel so we set this to 4.
+
+options(mc.cores = 4)
If we had fewer than 4 computing cores available, or wanted to run fewer than 4 chains (at the expense of some robustness), we could set it to that number instead. To find out the number of cores
+available one can use the detectCores
+function from the parallel
package.
Once a data set has been identified, a number of relevant parameters +need to be considered before using EpiNow2. As these will +affect any results, it is worth spending some time investigating what +their values should be.
+EpiNow2 works with different delays that apply to different
+parts of the infection and observation process. They are defined using a
+common interface that involves functions that are named after the
+probability distributions, i.e. LogNormal()
,
+Gamma()
, etc. For help with this function, see its manual
+page
+?EpiNow2::Distributions
In all cases, the distributions given can be fixed +(i.e. have no uncertainty) or variable (i.e. have associated +uncertainty). For example, to define a fixed gamma distribution with +mean 3, standard deviation (sd) 1 and maximum value 10, you would +write
+
+Gamma(mean = 3, sd = 1, max = 10)
+#> - gamma distribution (max: 10):
+#> shape:
+#> 9
+#> rate:
+#> 3
If distributions are variable, the values with uncertainty are +treated as prior probability +densities in the Bayesian inference framework used by +EpiNow2, i.e. they are estimated as part of the inference. For +example, to define a variable gamma distribution where uncertainty in +the mean is given by a normal distribution with mean 3 and sd 2, and +uncertainty in the standard deviation is given by a normal distribution +with mean 1 and sd 0.1, with a maximum value 10, you would write
+
+Gamma(mean = Normal(3, 2), sd = Normal(1, 0.1), max = 10)
+#> Warning in new_dist_spec(params, "gamma"): Uncertain gamma distribution
+#> specified in terms of parameters that are not the "natural" parameters of the
+#> distribution (shape, rate). Converting using a crude and very approximate
+#> method that is likely to produce biased results. If possible, it is preferable
+#> to specify the distribution directly in terms of the natural parameters.
+#> - gamma distribution (max: 10):
+#> shape:
+#> - normal distribution:
+#> mean:
+#> 9
+#> sd:
+#> 2.5
+#> rate:
+#> - normal distribution:
+#> mean:
+#> 3
+#> sd:
+#> 1.4
Note the warning about parameters. We used the mean and standard
+deviation to define this distribution with uncertain parameters, but it
+would be better to use the “natural” parameters of the gamma
+distribution, shape and rate, for example using the values estimated and
+reported back after calling the previous command.
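For reference, the mapping from the mean/sd parametrisation to the natural shape/rate parametrisation of a gamma distribution can be sketched in base R (this reproduces the point estimates above; the package's handling of the uncertain case is more involved):

```r
# Convert the mean/sd parametrisation of a gamma distribution to its
# natural shape/rate parametrisation: shape = (mean/sd)^2, rate = mean/sd^2
gamma_natural <- function(mean, sd) {
  list(shape = (mean / sd)^2, rate = mean / sd^2)
}
gamma_natural(mean = 3, sd = 1)
#> $shape
#> [1] 9
#>
#> $rate
#> [1] 3
```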
+There are various ways the specific delay distributions mentioned
+below might be obtained. Often, they will come directly from the
+existing literature reviewed by the user and studies conducted
+elsewhere. Sometimes it might be possible to obtain them from existing
+databases, e.g. using the epiparameter R
+package. Alternatively they might be obtainable from raw data,
+e.g. line-listed individual-level records. The EpiNow2 package
+contains functionality for estimating delay distributions from observed
+delays in the estimate_delays()
function. For a more
+comprehensive treatment of delays and their estimation avoiding common
+biases one can consider, for example, the dynamicaltruncation
+R package and associated paper.
The generation interval is a delay distribution that describes the
+amount of time that passes between an individual becoming infected and
+infecting someone else. In EpiNow2, the generation time
+distribution is defined by a call to
+generation_time_opts()
, a function that takes a single
+argument defined as a dist_spec
object (returned by the
+function corresponding to the probability distribution,
+i.e. LogNormal()
, Gamma()
, etc.). For example,
+to define the generation time as gamma distributed with uncertain mean
+centered on 3 and sd centered on 1 with some uncertainty, a maximum
+value of 10 and weighted by the number of case data points we could use
+the shape and rate parameters suggested above (though note that this
+will only very approximately produce the uncertainty in mean and
+standard deviation stated there):
+generation_time <- Gamma(
+ shape = Normal(9, 2.5), rate = Normal(3, 1.4), max = 10
+)
+generation_time_opts(generation_time)
EpiNow2 calculates reproduction numbers based on the
+trajectory of infection incidence. Usually this is not observed
+directly. Instead, we calculate case counts based on, for example, onset
+of symptoms, lab confirmations, hospitalisations, etc. In order to
+estimate the trajectory of infection incidence from this we need to
+either know or estimate the distribution of delays from infection to
+count. Often, such counts are composed of multiple delays for which we
+only have separate information, for example the incubation period (time
+from infection to symptom onset) and reporting delay (time from symptom
+onset to being a case in the data, e.g. via lab confirmation, if counts
+are not by the date of symptom onset). In this case, we can combine
+multiple delays with the plus (+
) operator, e.g.
+incubation_period <- LogNormal(
+ meanlog = Normal(1.6, 0.05),
+ sdlog = Normal(0.5, 0.05),
+ max = 14
+)
+reporting_delay <- LogNormal(meanlog = 0.5, sdlog = 0.5, max = 10)
+incubation_period + reporting_delay
+#> Composite distribution:
+#> - lognormal distribution (max: 14):
+#> meanlog:
+#> - normal distribution:
+#> mean:
+#> 1.6
+#> sd:
+#> 0.05
+#> sdlog:
+#> - normal distribution:
+#> mean:
+#> 0.5
+#> sd:
+#> 0.05
+#> - lognormal distribution (max: 10):
+#> meanlog:
+#> 0.5
+#> sdlog:
+#> 0.5
In EpiNow2, the reporting delay distribution is defined by a
+call to delay_opts()
, a function that takes a single
+argument defined as a dist_spec
object (returned by
+LogNormal()
, Gamma()
etc.). For example, if
+our observations were by symptom onset we would use
+delay_opts(incubation_period)
If they were by date of lab confirmation that happens with a delay
+given by reporting_delay
, we would use
+delay <- incubation_period + reporting_delay
+delay_opts(delay)
Besides the delay from infection to the event that is recorded in the +data, there can also be a delay from that event to being recorded in the +data. For example, data reported by symptom onset may only become part +of the dataset once lab confirmation has occurred, or even a day or two +after that confirmation. Statistically, this means our data is +right-truncated. In practice, it means that recent data will be unlikely +to be complete.
+The amount of such truncation that exists in the data can be
+estimated from multiple snapshots of the data, i.e. what the data looked
+like at multiple past dates. One can then use methods that use the
+amount of backfilling that occurred 1, 2, … days after data for a date
+are first reported. In EpiNow2, this can be done using the
+estimate_truncation()
method which returns, amongst others,
+posterior estimates of the truncation distribution. For more details on
+the model used for this, see the estimate_truncation vignette.
+?estimate_truncation
In the estimate_infections()
function, the truncation
+distribution is defined by a call to trunc_opts()
, a
+function that takes a single argument defined as a
+dist_spec
(either defined by the user or obtained from a
+call to estimate_truncation()
or any other method for
+estimating right truncation). This will then be used to correct for
+right truncation in the data.
The separation of estimation of right truncation on the one hand and +estimation of the reproduction number on the other may be attractive for +practical purposes but is questionable statistically as it separates two +processes that are not strictly separable, potentially introducing a +bias. An alternative approach where these are estimated jointly is being +implemented in the epinowcast package, which is +being developed by the EpiNow2 developers with +collaborators.
+Another issue affecting the progression from infections to reported
+outcomes is underreporting, i.e. the fact that not all infections are
+reported as cases. This varies both by pathogen and population (and
+e.g. the proportion of infections that are asymptomatic) as well as the
+specific outcome used as data and where it is located on the severity
+pyramid (e.g. hospitalisations vs. community cases). In EpiNow2
+we can specify the proportion of infections that we expect to be
+observed (with uncertainty assumed represented by a truncated normal
+distribution with bounds at 0 and 1) using the scale
+argument to the obs_opts()
function. For example, if we
+think that 40% (with standard deviation 1%) of infections end up in the
+data as observations we could specify.
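A sketch of this specification (assuming a version of EpiNow2 in which the scale argument accepts a distribution; older versions take a list with mean and sd elements instead; not run here):

```r
# Hypothetical observation options: 40% of infections observed, with
# sd 1%, represented as a normal distribution truncated to [0, 1]
obs <- obs_opts(scale = Normal(mean = 0.4, sd = 0.01))
```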
The default model that estimate_infections()
uses to
+estimate reproduction numbers requires specification of a prior
+probability distribution for the initial reproduction number. This
+represents the user’s initial belief of the value of the reproduction
+number, where there is no data yet to inform its value. By default this
+is assumed to be represented by a lognormal distribution with mean and
+standard deviation of 1. It can be changed using the
+rt_opts()
function. For example, if the user believes that
+at the very start of the data the reproduction number was 2, with
+uncertainty in this belief represented by a standard deviation of 1,
+they would use
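A sketch of this specification, assuming the prior is given as a dist_spec as elsewhere in this vignette (not run here):

```r
# Prior belief: an initial reproduction number of 2 with standard deviation 1
rt_prior <- LogNormal(mean = 2, sd = 1)
rt <- rt_opts(prior = rt_prior)
```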
When providing uncertain delay distributions one can end up in a
+situation where the estimated means are shifted a long way from the
+given distribution means, and possibly further than is deemed realistic
+by the user. In that case, one could specify narrower prior
+distributions (e.g., smaller mean_sd
) in order to keep the
+estimated means closer to the given mean, but this can be difficult to
+do in a principled manner in practice. As a more straightforward
+alternative, one can choose to weigh the generation time priors by the
+number of data points in the case data set by setting
+weigh_delay_priors = TRUE
(the default).
All the options are combined in a call to the
+estimate_infections()
function. For example, using some of
+the options described above one could call
+def <- estimate_infections(
+ example_confirmed,
+ generation_time = generation_time_opts(generation_time),
+ delays = delay_opts(delay),
+ rt = rt_opts(prior = rt_prior)
+)
+#> Warning: There were 3 divergent transitions after warmup. See
+#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
+#> to find out why this is a problem and how to eliminate them.
+#> Warning: Examine the pairs() plot to diagnose sampling problems
Alternatively, for production environments, we recommend using the
+epinow()
function. It uses
+estimate_infections()
internally and provides functionality
+for logging and saving results and plots in dedicated directories in the
+user’s file system.
The estimate_infections()
function works with a single
+time series of outcomes such as cases by symptom onset or
+hospitalisations. Sometimes one wants to further create forecasts of
+other secondary outcomes such as deaths. The package contains
+functionality to estimate the delay and scaling between multiple time
+series with the estimate_secondary()
function, as well as
+for using this to make forecasts with the
+forecast_secondary()
function.
To visualise the results one can use the plot()
function
+that comes with the package
+plot(def)
The results returned by the estimate_infections
model
+depend on the values assigned to all the parameters discussed in this
+vignette, i.e. delays, scaling, and reproduction numbers, as well as the
+model variant used and its parameters. Any interpretation of the results
+will therefore need to bear these in mind, as well as any properties of
+the data and/or the subpopulations that it represents. See the Model options vignette for
+an illustration of the impact of model choice.
vignettes/estimate_secondary.Rmd
+ estimate_secondary.Rmd
This is a work in progress. Please consider submitting a PR +to improve it.
+This model is based on a discrete convolution of primary cases, +scaled based on the fraction (here described as the secondary fraction +but depending on application potentially being the case fatality ratio, +case hospitalisation ratio, or the hospitalisation fatality ratio), and +a delay distribution that is assumed to follow a discretised daily log +normal distribution. This model can be thought of as a discrete time +ordinary differential equation approximation generalised to log normal, +rather than exponential, delay distributions.
+We generalise this simple model beyond the incidence cases to also +include prevalence indicators (for example hospital admissions and +occupancy) where the secondary notifications can be thought of as +depending on secondary notifications from the previous timestep, scaled +current primary notifications, and minus scaled historic primary +notifications weighted by some delay distribution.
+This model can be defined as follows,
+\[\begin{equation}
+    \hat{S}_{t} = \delta_p S_{t-1} + \alpha \left( \delta_p P_{t} +
+\delta_c \sum_{\tau = 0}^{D} \xi(\tau | \mu, \sigma) P_{t-\tau} \right)
+\end{equation}\]
+where \(S_t\) and \(P_t\) are observed secondary and primary
+notifications, respectively, \(\hat{S}_t\) are
+expected secondary notifications, \(\delta_p =
+1\) and \(\delta_c = -1\) when
+\(S_t\) is a prevalence measure, and \(\delta_p = 0\) and \(\delta_c = 1\) when it is an incidence-based
+measure. \(\alpha\) and \(\xi\) are defined as the secondary fraction
+and the delay from primary to secondary notification (or the delay from
+secondary notification to recovery etc. in the prevalence case), with
+\(\alpha\) typically being of most
+interest to those interpreting the model's posterior estimates. We
+further assume that \(\xi\) follows a
+discretised log normal distribution described by its mean \(\mu\) and standard deviation \(\sigma\) on the log scale (where we take
+the cumulative mass function for time \(t\) minus the cumulative mass function for
+\(t-1\)), normalised by the maximum
+allowed delay \(D\) such that \(\sum^D_{\tau=0}{ \xi(\tau | \mu, \sigma)} =
+1\).
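The discretised, truncated delay distribution \(\xi\) described above can be sketched in base R (an illustration of the definition, not the package's internal implementation):

```r
# Discretised log normal delay PMF: CMF at tau minus CMF at tau - 1,
# for tau = 0..D, renormalised so it sums to 1 over the maximum delay D
discretised_lnorm <- function(meanlog, sdlog, D) {
  cmf <- plnorm(0:D, meanlog, sdlog)   # F(0), ..., F(D)
  pmf <- diff(c(0, cmf))               # F(tau) - F(tau - 1)
  pmf / sum(pmf)                       # normalise over the max delay D
}
xi <- discretised_lnorm(meanlog = 1.6, sdlog = 0.5, D = 14)
sum(xi)  # 1 by construction
```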
+The above definition captures our mechanistic assumptions for the
+expectation of secondary notifications but does not account for
+potential observation noise or reporting patterns. Here we assume a
+negative binomial observation model (though our implementation also
+supports a Poisson observation model) in order to capture potential
+reporting overdispersion \(\phi\), and
+adjust expected counts using an optional day-of-the-week effect based on
+a simplex \(\omega_{(t \mod 7)}\) (such
+that \(\sum^6_{t=0}{\omega_t} = 7\), so the
+total effect over a week is balanced). This gives the following
+observation process,
+\[\begin{equation} + S_{t} \sim \mathrm{NB}\left(\omega_{t \mod 7} \hat{S}_t, \phi +\right). +\end{equation}\]
+vignettes/estimate_truncation.Rmd
+ estimate_truncation.Rmd
This is a work in progress. Please consider submitting a PR +to improve it.
+This model deals with the problem of nowcasting, or
+adjusting for right-truncation in reported count data. This occurs when
+the quantity being observed, for example cases, hospitalisations or
+deaths, is reported with a delay, resulting in an underestimation of
+recent counts. The estimate_truncation()
model attempts to
+infer parameters of the underlying delay distributions from multiple
+snapshots of past data. It is designed to be a simple model that can
+integrate with the other models in the package and therefore may not be
+ideal for all uses. For a more principled approach to nowcasting please
+consider using the epinowcast package.
Given snapshots \(C^{i}_{t}\)
+reflecting reported counts for time \(t\) where \(i=1\ldots S\) is in order of recency
+(earliest snapshots first) and \(S\) is
+the number of past snapshots used for estimation, we try to infer the
+parameters of a discrete truncation distribution \(\zeta(\tau | \mu_{\zeta}, \sigma_{\zeta})\)
+with corresponding cumulative mass function \(Z(\tau | \mu_{\zeta}, \sigma_{\zeta})\).
+The model assumes that final counts \(D_{t}\) are related to observed snapshots +via the truncation distribution such that
+\[\begin{equation}
+C^{i}_{t} \sim \mathrm{NegBinom}\left(Z(T_i - t | \mu_{\zeta},
+\sigma_{\zeta}) D_{t} + \sigma, \varphi\right), \qquad i < S
+\end{equation}\]
+where \(T_i\) is the date of the +final observation in snapshot \(i\), +\(Z(\tau)\) is defined to be zero for +negative values of \(\tau\) and \(\sigma\) is an additional error term.
+The final counts \(D_{t}\) are +estimated from the most recent snapshot as
+\[\begin{equation}
+D_t = \frac{C^{S}_{t}}{Z(T_S - t | \mu_{\zeta}, \sigma_{\zeta})}
+\end{equation}\]
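The back-calculation of final counts from the most recent snapshot can be sketched in base R, using a hypothetical reporting proportion \(Z\) (illustrative only; the model estimates \(Z\) from the data):

```r
# Correct the most recent counts for right truncation by dividing each
# count by the (hypothetical) proportion reported within T_S - t days
reported <- c(100, 95, 80, 50)  # counts for the 4 most recent dates
Z <- c(0.5, 0.8, 0.95, 1)       # proportion reported after 0, 1, 2, 3 days
final <- reported / rev(Z)      # estimated final counts D_t
final
#> [1] 100 100 100 100
```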
+Relevant priors are:
+\[\begin{align} +\mu_\zeta &\sim \mathrm{Normal}(0, 1)\\ +\sigma_\zeta &\sim \mathrm{HalfNormal}(0, 1)\\ +\varphi &\sim \mathrm{HalfNormal}(0, 1)\\ +\sigma &\sim \mathrm{HalfNormal}(0, 1) +\end{align}\]
+vignettes/gaussian_process_implementation_details.Rmd
+ gaussian_process_implementation_details.Rmd
We make use of Gaussian Processes in several places in
+EpiNow2
. For example, the default model for
+estimate_infections()
uses a Gaussian Process to model the
+1st order difference on the log scale of the reproduction number. This
+vignette describes the implementation details of the approximate
+Gaussian Process used in EpiNow2
.
The single dimension Gaussian Processes (\(\mathrm{GP}_t\)) we use can be written +as
+\[\begin{equation} +\mathcal{GP}(\mu(t), k(t, t')) +\end{equation}\]
+where \(\mu(t)\) and \(k(t,t')\) are the mean and covariance +functions, respectively. In our case as set out above, we have
+\[\begin{equation} +\mu(t) \equiv 0 \\ +k(t,t') = k(|t - t'|) = k(\Delta t) +\end{equation}\]
+where by default \(k\) is a Matern +3/2 covariance kernel,
+\[\begin{equation} +k(\Delta t) = \alpha \left( 1 + \frac{\sqrt{3} \Delta t}{l} \right) \exp +\left( - \frac{\sqrt{3} \Delta t}{l}\right) +\end{equation}\]
+with \(l>0\) and \(\alpha > 0\) the length scale and +magnitude, respectively, of the kernel. Alternatively, a squared +exponential kernel can be chosen to constrain the GP to be smoother.
+\[\begin{equation} +k(\Delta t) = \alpha \exp \left( - \frac{1}{2} \frac{(\Delta t^2)}{l^2} +\right) +\end{equation}\]
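The two kernels can be written directly in base R (a sketch of the formulas above):

```r
# Matern 3/2 and squared exponential covariance kernels as functions of
# the time difference dt, with magnitude alpha and length scale l
matern32 <- function(dt, alpha, l) {
  alpha * (1 + sqrt(3) * dt / l) * exp(-sqrt(3) * dt / l)
}
sq_exp <- function(dt, alpha, l) {
  alpha * exp(-0.5 * dt^2 / l^2)
}
# Both kernels equal alpha at dt = 0 and decay as dt grows, with the
# squared exponential giving smoother realisations
matern32(0, alpha = 0.05, l = 21)  # 0.05
```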
+In order to make our models computationally tractable, we approximate
+the Gaussian Process using a Hilbert space approximation[1], centered around
+mean zero.
+\[\begin{equation} +\mathcal{GP}(0, k(\Delta t)) \approx \sum_{j=1}^m +\left(S_k(\sqrt{\lambda_j}) \right)^\frac{1}{2} \phi_j(t) \beta_j +\end{equation}\]
+with \(m\) the number of basis +functions to use in the approximation, which we calculate from the +number of time points \(t_\mathrm{GP}\) +to which the Gaussian Process is being applied (rounded up to give an +integer value), as is recommended[1].
+\[\begin{equation} +m = b t_\mathrm{GP} +\end{equation}\]
+and values of \(\lambda_j\) given +by
+\[\begin{equation} +\lambda_j = \left( \frac{j \pi}{2 L} \right)^2 +\end{equation}\]
+where \(L\) is a positive number +termed boundary condition, and \(\beta_{j}\) are regression weights with +standard normal prior
+\[\begin{equation} +\beta_j \sim \mathcal{Normal}(0, 1) +\end{equation}\]
+The function \(S_k(x)\) is the
+spectral density relating to a particular covariance function \(k\). In the case of the Matern 3/2 kernel
+(the default in EpiNow2
) this is given by
\[\begin{equation}
+S_k(x) = 4 \alpha^2 \left( \frac{\sqrt{3}}{\rho}\right)^3 \left(\left(
+\frac{\sqrt{3}}{\rho} \right)^2 + x^2 \right)^{-2}
+\end{equation}\]
+and in the case of a squared exponential kernel by
+\[\begin{equation}
+S_k(x) = \alpha^2 \sqrt{2\pi} \rho \exp \left( -\frac{1}{2} \rho^2 x^2
+\right)
+\end{equation}\]
+The functions \(\phi_{j}(x)\) are +the eigenfunctions of the Laplace operator,
+\[\begin{equation} +\phi_j(t) = \frac{1}{\sqrt{L}} \sin\left(\sqrt{\lambda_j} (t^* + +L)\right) +\end{equation}\]
+with time rescaled linearly to be between -1 and 1,
+\[\begin{equation} +t^* = \frac{t - \frac{1}{2}t_\mathrm{GP}}{\frac{1}{2}t_\mathrm{GP}} +\end{equation}\]
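Putting the pieces together, the approximation can be sketched in base R, using the Matern 3/2 spectral density given above with illustrative parameter values (not the package's Stan implementation):

```r
# Hilbert space approximate GP: eigenvalues lambda_j, basis functions
# phi_j, spectral density weights, and standard normal regression weights
t_gp <- 50                                   # number of time points
b <- 0.2; L <- 1.5                           # defaults from the text
m <- ceiling(b * t_gp)                       # number of basis functions
t_star <- (1:t_gp - t_gp / 2) / (t_gp / 2)   # time rescaled to [-1, 1]
lambda <- ((1:m) * pi / (2 * L))^2           # eigenvalues of the Laplacian
phi <- sapply(1:m, function(j) {             # eigenfunctions
  sin(sqrt(lambda[j]) * (t_star + L)) / sqrt(L)
})
spd <- function(x, alpha, rho) {             # Matern 3/2 spectral density
  4 * alpha^2 * (sqrt(3) / rho)^3 * ((sqrt(3) / rho)^2 + x^2)^(-2)
}
beta <- rnorm(m)                             # regression weights ~ N(0, 1)
gp <- phi %*% (sqrt(spd(sqrt(lambda), alpha = 0.05, rho = 0.4)) * beta)
length(gp)  # one value per time point
```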
+Relevant priors are
+\[\begin{align} +\alpha &\sim \mathcal{Normal}(0, \sigma_{\alpha}) \\ +\rho &\sim \mathcal{LogNormal} (\mu_\rho, \sigma_\rho)\\ +\end{align}\]
+with \(\rho\) additionally +constrained to be between \(\rho_\mathrm{min}\) and \(\rho_\mathrm{max}\), \(\mu_{\rho}\) and \(\sigma_\rho\) calculated from given mean +\(m_{\rho}\) and standard deviation +\(s_\rho\), and default values (all of +which can be changed by the user):
+\[\begin{align} +b &= 0.2 \\ +L &= 1.5 \\ +m_\rho &= 21 \\ +s_\rho &= 7 \\ +\rho_\mathrm{min} &= 0\\ +\rho_\mathrm{max} &= 60\\ +\sigma_\alpha &= 0.05\\ +\end{align}\]