Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposed interface for accumulation / missingness #839

Merged
merged 6 commits into from
Oct 31, 2024
Merged

Conversation

sbfnk
Copy link
Contributor

@sbfnk sbfnk commented Oct 24, 2024

Following discussion in #547 here's a proposed design for the future accumulation / missingness interface. As proposed this would also accommodate changes in reporting. As default behaviour would be unchanged it would require fairly minimal deprecation. All comments welcome.

Supporting missing data

We want to support reporting patterns that include accumulation (i.e. batch reporting of data from multiple dates, for example weekly) and missingness (dates which are lacking reports) in incidence and prevalence data.

Proposed interface

Missing data

Any dates between the minimum date and maximum date in the data that is either absent, or present with an NA value (currently called confirm) is interpreted as missing and ignored in the likelihood.
All other data points are used in the likelihood.
This matches the current default behaviour, introduced in version 1.5.0.

Accumulation

If instead modelled values on these days should be accumulated onto the next reporting date, the passed data.frame must have an additional logical column, accumulate.
If accumulate is TRUE then the modelled value of the observed latent variable on that day is added to any existing accumulation and stored for later observation.
If accumulate is FALSE the modelled value is added to any stored accumulated variables before potentially being used in the likelihood on that day (if not NA).
Subsequently the stored accumulated variable is reset to zero.

Example

date confirm accumulate
2024-10-23 NA TRUE
2024-10-24 10 FALSE
2024-10-26 NA TRUE
2024-10-27 NA TRUE
2024-10-28 17 FALSE

The likelihood is evaluated on two days, 24 October and 28 October.
On 24 October the data (10) is compared to (modelled value on 23 October) + (modelled value on 24 October).
On 28 October the data (17) is compared to (modelled value on 26 October) + (modelled value on 27 October) + (modelled value on 28 October).

Helper functions

A helper function, fill_missing_dates() can be used to convert weekly data to the required format, i.e.

date confirm
2024-10-24 10
2024-10-31 17
2024-11-07 11

can be converted with fill_missing_dates(missing = "accumulate", initial = 7) to

date confirm accumulate
2024-10-18 NA TRUE
2024-10-19 NA TRUE
2024-10-20 NA TRUE
2024-10-21 NA TRUE
2024-10-22 NA TRUE
2024-10-23 NA TRUE
2024-10-24 10 FALSE
2024-10-25 NA TRUE
2024-10-26 NA TRUE
2024-10-27 NA TRUE
2024-10-28 NA TRUE
2024-10-29 NA TRUE
2024-10-30 NA TRUE
2024-10-31 17 FALSE
2024-11-01 NA TRUE
2024-11-02 NA TRUE
2024-11-03 NA TRUE
2024-11-04 NA TRUE
2024-11-05 NA TRUE
2024-11-06 NA TRUE
2024-11-07 11 TRUE

@seabbs
Copy link
Contributor

seabbs commented Oct 24, 2024

This looks really nice from reading the pitch but...

My high-level concern is the proposed interaction between missingness and accumulation. The current design says that something is accumulated until an observation occurs which seems potentially error prone to me? Why did you discount the alternative where these are separable? The reason I think it concerning is that it seems very easy to make a mistake or to get unexpected behaviour. My impression is that having missing data in your weekly example would look like this:

date confirm accumulate
2024-10-18 NA 1
2024-10-19 NA 1
2024-10-20 NA 1
2024-10-21 NA 1
2024-10-22 NA 1
2024-10-23 NA 1
2024-10-24 NA 0
2024-10-25 NA 1
2024-10-26 NA 1
2024-10-27 NA 1
2024-10-28 NA 1
2024-10-29 NA 1
2024-10-30 NA 1
2024-10-31 17 1
2024-11-01 NA 1
2024-11-02 NA 1
2024-11-03 NA 1
2024-11-04 NA 1
2024-11-05 NA 1
2024-11-06 NA 1
2024-11-07 11 1

(see the 24th) but would this reset the accumulation counter as you might expect? From the outline the mechanism to reset accumulation is implicit and I think occurs when data is present?

A more restrictive separable approach would be:

Example

date confirm accumulate
2024-10-23 NA 1
2024-10-24 10 0
2024-10-26 NA 1
2024-10-27 NA 1
2024-10-28 17 0

The likelihood is evaluated on two days, 24 October and 28 October.
On 24 October the data (10) is compared to (modelled value on 23 October) + (modelled value on 24 October).
On 28 October the data (17) is compared to (modelled value on 26 October) + (modelled value on 27 October) + (modelled value on 28 October).

Whilst removing the interaction this design switches to accumulate being logical. It loses the partial ascertainment and the ability to skip accumulation (i.e adding a delay effect as you did on the 27th). I am unclear as to the utility of those but happy to hear? You could relax this to make accumulate numeric and so allow for partial ascertainment.

@sbfnk
Copy link
Contributor Author

sbfnk commented Oct 24, 2024

Whilst removing the interaction this design switches to accumulate being logical. It loses the partial ascertainment and the ability to skip accumulation (i.e adding a delay effect as you did on the 27th). I am unclear as to the utility of those but happy to hear? You could relax this to make accumulate numeric and so allow for partial ascertainment.

This is where I started but I think in this case we couldn't support partial ascertainment on the day that we reset (as we'd need accumulate == 0). But perhaps that's clearer as a separate feature/column anyway, and accumulate should just be binary.

@seabbs
Copy link
Contributor

seabbs commented Oct 24, 2024

Okay makes sense. Do we want to support partial ascertainment is the big question I have here. I am unclear vs how it interacts with other places so the answer is maybe for me? It could let you specify time-varying ascertainment kind of for free if you were careful which seems appealing

If we do is there a way of addressing my concern and example within the framework you've outlined? If we can do that it just leaves the potentially tricky interaction but with good helpers that won't be so bad/.

@sbfnk
Copy link
Contributor Author

sbfnk commented Oct 25, 2024

I think perhaps these things are perhaps best separated. We could easily support partial ascertaining here but this could be a separate numeric column. Accumulate could then be TRUE/FALSE with TRUE accumulating and FALSE resetting. Whether the likelihood is evaluated then depends only on whether the data is NA or not, i.e. it's evaluated at all non-missing data points.

@seabbs
Copy link
Contributor

seabbs commented Oct 25, 2024

We could easily support partial ascertaining here but this could be a separate numeric column

Yes agree

@jamesmbaazam
Copy link
Contributor

I agree with making accumulate binary as it's easier to interpret and without nuance.

@sbfnk
Copy link
Contributor Author

sbfnk commented Oct 29, 2024

If anyone has an idea for a better name for this column than accumulate now is the time to suggest. Technically if accumulate==FALSE then the value is still accumulated (i.e. added to what was previously), just that it's reset to zero after then using the accumulated value in the likelihood. We could have a reset column instead of accumulate with the opposite logic but I don't think that's clearer.

@sbfnk
Copy link
Contributor Author

sbfnk commented Oct 30, 2024

If people are happy with the latest version (now updated in the PR comment) please approve (or request more changes) and we can create an issue to implement.

@seabbs seabbs enabled auto-merge October 31, 2024 02:41
Copy link
Contributor

@seabbs seabbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I don't have a better name suggestion.

@seabbs seabbs added this pull request to the merge queue Oct 31, 2024
Merged via the queue into main with commit 7b9d5a0 Oct 31, 2024
9 checks passed
@seabbs seabbs deleted the accumulate-design branch October 31, 2024 04:25
@sbfnk sbfnk mentioned this pull request Nov 11, 2024
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants