-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposed interface for accumulation / missingness #839
Conversation
This looks really nice from reading the pitch but... My high-level concern is the proposed interaction between missingness and accumulation. The current design says that something is accumulated until an observation occurs which seems potentially error prone to me? Why did you discount the alternative where these are separable? The reason I think it concerning is that it seems very easy to make a mistake or to get unexpected behaviour. My impression is that having missing data in your weekly example would look like this:
(see the 24th) but would this reset the accumulation counter as you might expect? From the outline the mechanism to reset accumulation is implicit and I think occurs when data is present? A more restrictive separable approach would be: Example
The likelihood is evaluated on two days, 24 October and 28 October. Whilst removing the interaction this design switches to |
This is where I started but I think in this case we couldn't support partial ascertainment on the day that we reset (as we'd need |
Okay makes sense. Do we want to support partial ascertainment is the big question I have here. I am unclear vs how it interacts with other places so the answer is maybe for me? It could let you specify time-varying ascertainment kind of for free if you were careful which seems appealing If we do is there a way of addressing my concern and example within the framework you've outlined? If we can do that it just leaves the potentially tricky interaction but with good helpers that won't be so bad/. |
I think perhaps these things are perhaps best separated. We could easily support partial ascertaining here but this could be a separate numeric column. Accumulate could then be TRUE/FALSE with TRUE accumulating and FALSE resetting. Whether the likelihood is evaluated then depends only on whether the data is NA or not, i.e. it's evaluated at all non-missing data points. |
Yes agree |
I agree with making |
If anyone has an idea for a better name for this column than |
If people are happy with the latest version (now updated in the PR comment) please approve (or request more changes) and we can create an issue to implement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I don't have a better name suggestion.
Following discussion in #547 here's a proposed design for the future accumulation / missingness interface. As proposed this would also accommodate changes in reporting. As default behaviour would be unchanged it would require fairly minimal deprecation. All comments welcome.
Supporting missing data
We want to support reporting patterns that include accumulation (i.e. batch reporting of data from multiple dates, for example weekly) and missingness (dates which are lacking reports) in incidence and prevalence data.
Proposed interface
Missing data
Any dates between the minimum date and maximum date in the data that is either absent, or present with an
NA
value (currently calledconfirm
) is interpreted as missing and ignored in the likelihood.All other data points are used in the likelihood.
This matches the current default behaviour, introduced in version 1.5.0.
Accumulation
If instead modelled values on these days should be accumulated onto the next reporting date, the passed
data.frame
must have an additional logical column,accumulate
.If
accumulate
is TRUE then the modelled value of the observed latent variable on that day is added to any existing accumulation and stored for later observation.If
accumulate
is FALSE the modelled value is added to any stored accumulated variables before potentially being used in the likelihood on that day (if notNA
).Subsequently the stored accumulated variable is reset to zero.
Example
The likelihood is evaluated on two days, 24 October and 28 October.
On 24 October the data (10) is compared to (modelled value on 23 October) + (modelled value on 24 October).
On 28 October the data (17) is compared to (modelled value on 26 October) + (modelled value on 27 October) + (modelled value on 28 October).
Helper functions
A helper function,
fill_missing_dates()
can be used to convert weekly data to the required format, i.e.can be converted with
fill_missing_dates(missing = "accumulate", initial = 7)
to