Comparison of machine learning methods for estimating case fatality ratios: An Ebola outbreak simulation study.
Alpha Forna1, PhD, Ilaria Dorigatti2, PhD, Pierre Nouvellet2,3, PhD, and Christl A. Donnelly2,4, ScD
1. School of Computing Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada.
2. MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, UK.
3. School of Life Sciences, University of Sussex, Brighton, UK.
4. Department of Statistics, University of Oxford, Oxford, UK.
Using simulated data, we apply a machine learning (ML) algorithmic framework to evaluate data imputation performance and the resulting case fatality ratio (CFR) estimates, focusing on the scale and type of data missingness: missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR).
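To make the three missingness mechanisms concrete, the following base R sketch removes outcomes from a toy line list under MCAR, MAR, and MNAR and computes a naive complete-case CFR for each. It is an illustration only and not the framework used in the study: the variable names, the function `impose_missingness`, the age-dependent fatality model, and all parameter values are assumptions introduced here.

```r
set.seed(42)

# Hypothetical simulated line list: age and a binary outcome (1 = died).
n <- 5000
age <- round(runif(n, min = 0, max = 80))
p_death <- plogis(-0.5 + 0.02 * age)  # assumed age-dependent fatality risk
died <- rbinom(n, size = 1, prob = p_death)

# Set outcomes to NA under a given mechanism (roughly `target` missing overall).
impose_missingness <- function(died, age, mechanism, target = 0.3) {
  p_miss <- switch(
    mechanism,
    # MCAR: every record has the same probability of a missing outcome.
    MCAR = rep(target, length(died)),
    # MAR: missingness depends only on an observed covariate (age).
    MAR  = plogis(qlogis(target) + 0.03 * (age - mean(age))),
    # MNAR: missingness depends on the unobserved outcome itself
    # (assumed here: deaths are less likely to be missing than survivals).
    MNAR = ifelse(died == 1, target / 2, target * 1.5),
    stop("unknown mechanism: ", mechanism)
  )
  observed <- died
  observed[runif(length(died)) < p_miss] <- NA
  observed
}

# Naive CFR: proportion of deaths among cases with a known outcome.
naive_cfr <- function(outcome) mean(outcome, na.rm = TRUE)

mechanisms <- c("MCAR", "MAR", "MNAR")
results <- data.frame(
  mechanism = mechanisms,
  prop_missing = NA_real_,
  cfr = NA_real_
)
for (i in seq_along(mechanisms)) {
  obs <- impose_missingness(died, age, mechanisms[i])
  results$prop_missing[i] <- mean(is.na(obs))
  results$cfr[i] <- naive_cfr(obs)
}

true_cfr <- mean(died)
print(true_cfr)
print(results)
```

Under these assumptions the MCAR estimate should stay close to `true_cfr`, while the MNAR estimate is biased upward by construction, because deaths are retained more often than survivals. This gap between complete-case estimates and the truth is the kind of bias that imputation is meant to reduce.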
Algorithmic framework used to simulate outbreak data characteristics.
Main R packages required to reproduce the simulation experiments.
Sample script for visualising the outputs of the simulation experiments.
Sample of the simulated data used for these experiments. (These data were simulated based on real outbreak data from the World Health Organisation (WHO). Requests for access to real-life outbreak data should be made directly to the WHO by individual researchers and/or research groups.)