Proposed interface for accumulation / missingness #839

sbfnk · 2024-10-24T14:42:34Z

Following discussion in #547 here's a proposed design for the future accumulation / missingness interface. As proposed this would also accommodate changes in reporting. As default behaviour would be unchanged it would require fairly minimal deprecation. All comments welcome.

Supporting missing data

We want to support reporting patterns that include accumulation (i.e. batch reporting of data from multiple dates, for example weekly) and missingness (dates which are lacking reports) in incidence and prevalence data.

Proposed interface

Missing data

Any dates between the minimum date and maximum date in the data that is either absent, or present with an NA value (currently called confirm) is interpreted as missing and ignored in the likelihood.
All other data points are used in the likelihood.
This matches the current default behaviour, introduced in version 1.5.0.

Accumulation

If instead modelled values on these days should be accumulated onto the next reporting date, the passed data.frame must have an additional logical column, accumulate.
If accumulate is TRUE then the modelled value of the observed latent variable on that day is added to any existing accumulation and stored for later observation.
If accumulate is FALSE the modelled value is added to any stored accumulated variables before potentially being used in the likelihood on that day (if not NA).
Subsequently the stored accumulated variable is reset to zero.

Example

date	confirm	accumulate
2024-10-23	NA	TRUE
2024-10-24	10	FALSE
2024-10-26	NA	TRUE
2024-10-27	NA	TRUE
2024-10-28	17	FALSE

The likelihood is evaluated on two days, 24 October and 28 October.
On 24 October the data (10) is compared to (modelled value on 23 October) + (modelled value on 24 October).
On 28 October the data (17) is compared to (modelled value on 26 October) + (modelled value on 27 October) + (modelled value on 28 October).

Helper functions

A helper function, fill_missing_dates() can be used to convert weekly data to the required format, i.e.

date	confirm
2024-10-24	10
2024-10-31	17
2024-11-07	11

can be converted with fill_missing_dates(missing = "accumulate", initial = 7) to

date	confirm	accumulate
2024-10-18	NA	TRUE
2024-10-19	NA	TRUE
2024-10-20	NA	TRUE
2024-10-21	NA	TRUE
2024-10-22	NA	TRUE
2024-10-23	NA	TRUE
2024-10-24	10	FALSE
2024-10-25	NA	TRUE
2024-10-26	NA	TRUE
2024-10-27	NA	TRUE
2024-10-28	NA	TRUE
2024-10-29	NA	TRUE
2024-10-30	NA	TRUE
2024-10-31	17	FALSE
2024-11-01	NA	TRUE
2024-11-02	NA	TRUE
2024-11-03	NA	TRUE
2024-11-04	NA	TRUE
2024-11-05	NA	TRUE
2024-11-06	NA	TRUE
2024-11-07	11	TRUE

seabbs · 2024-10-24T15:41:20Z

This looks really nice from reading the pitch but...

My high-level concern is the proposed interaction between missingness and accumulation. The current design says that something is accumulated until an observation occurs which seems potentially error prone to me? Why did you discount the alternative where these are separable? The reason I think it concerning is that it seems very easy to make a mistake or to get unexpected behaviour. My impression is that having missing data in your weekly example would look like this:

date	confirm	accumulate
2024-10-18	NA	1
2024-10-19	NA	1
2024-10-20	NA	1
2024-10-21	NA	1
2024-10-22	NA	1
2024-10-23	NA	1
2024-10-24	NA	0
2024-10-25	NA	1
2024-10-26	NA	1
2024-10-27	NA	1
2024-10-28	NA	1
2024-10-29	NA	1
2024-10-30	NA	1
2024-10-31	17	1
2024-11-01	NA	1
2024-11-02	NA	1
2024-11-03	NA	1
2024-11-04	NA	1
2024-11-05	NA	1
2024-11-06	NA	1
2024-11-07	11	1

(see the 24th) but would this reset the accumulation counter as you might expect? From the outline the mechanism to reset accumulation is implicit and I think occurs when data is present?

A more restrictive separable approach would be:

Example

date	confirm	accumulate
2024-10-23	NA	1
2024-10-24	10	0
2024-10-26	NA	1
2024-10-27	NA	1
2024-10-28	17	0

The likelihood is evaluated on two days, 24 October and 28 October.
On 24 October the data (10) is compared to (modelled value on 23 October) + (modelled value on 24 October).
On 28 October the data (17) is compared to (modelled value on 26 October) + (modelled value on 27 October) + (modelled value on 28 October).

Whilst removing the interaction this design switches to accumulate being logical. It loses the partial ascertainment and the ability to skip accumulation (i.e adding a delay effect as you did on the 27th). I am unclear as to the utility of those but happy to hear? You could relax this to make accumulate numeric and so allow for partial ascertainment.

sbfnk · 2024-10-24T16:03:07Z

Whilst removing the interaction this design switches to accumulate being logical. It loses the partial ascertainment and the ability to skip accumulation (i.e adding a delay effect as you did on the 27th). I am unclear as to the utility of those but happy to hear? You could relax this to make accumulate numeric and so allow for partial ascertainment.

This is where I started but I think in this case we couldn't support partial ascertainment on the day that we reset (as we'd need accumulate == 0). But perhaps that's clearer as a separate feature/column anyway, and accumulate should just be binary.

seabbs · 2024-10-24T16:25:25Z

Okay makes sense. Do we want to support partial ascertainment is the big question I have here. I am unclear vs how it interacts with other places so the answer is maybe for me? It could let you specify time-varying ascertainment kind of for free if you were careful which seems appealing

If we do is there a way of addressing my concern and example within the framework you've outlined? If we can do that it just leaves the potentially tricky interaction but with good helpers that won't be so bad/.

sbfnk · 2024-10-25T08:37:15Z

I think perhaps these things are perhaps best separated. We could easily support partial ascertaining here but this could be a separate numeric column. Accumulate could then be TRUE/FALSE with TRUE accumulating and FALSE resetting. Whether the likelihood is evaluated then depends only on whether the data is NA or not, i.e. it's evaluated at all non-missing data points.

inst/dev/accumulation.md

seabbs · 2024-10-25T14:34:36Z

We could easily support partial ascertaining here but this could be a separate numeric column

Yes agree

jamesmbaazam · 2024-10-25T16:22:04Z

I agree with making accumulate binary as it's easier to interpret and without nuance.

Co-authored-by: James Azam <[email protected]>

sbfnk · 2024-10-29T20:25:22Z

If anyone has an idea for a better name for this column than accumulate now is the time to suggest. Technically if accumulate==FALSE then the value is still accumulated (i.e. added to what was previously), just that it's reset to zero after then using the accumulated value in the likelihood. We could have a reset column instead of accumulate with the opposite logic but I don't think that's clearer.

sbfnk · 2024-10-30T14:49:29Z

If people are happy with the latest version (now updated in the PR comment) please approve (or request more changes) and we can create an issue to implement.

seabbs

LGTM. I don't have a better name suggestion.

design doc for accumulation / missingness

b5477b1

jamesmbaazam reviewed Oct 25, 2024

View reviewed changes

inst/dev/accumulation.md Outdated Show resolved Hide resolved

sbfnk and others added 4 commits October 29, 2024 16:19

fix typo

c69e2d8

Co-authored-by: James Azam <[email protected]>

update document following discussion

0b80f78

further clarification of likelihood

1d073a1

remove erroneous statement

601b574

sbfnk mentioned this pull request Oct 30, 2024

Add support for time-varying ascertainment #792

Open

Merge branch 'main' into accumulate-design

3a63d40

seabbs enabled auto-merge October 31, 2024 02:41

seabbs approved these changes Oct 31, 2024

View reviewed changes

seabbs added this pull request to the merge queue Oct 31, 2024

Merged via the queue into main with commit 7b9d5a0 Oct 31, 2024
9 checks passed

seabbs deleted the accumulate-design branch October 31, 2024 04:25

sbfnk mentioned this pull request Nov 11, 2024

updated interface for accumulation #851

Open

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposed interface for accumulation / missingness #839

Proposed interface for accumulation / missingness #839

sbfnk commented Oct 24, 2024 •

edited

Loading

seabbs commented Oct 24, 2024 •

edited

Loading

sbfnk commented Oct 24, 2024

seabbs commented Oct 24, 2024 •

edited

Loading

sbfnk commented Oct 25, 2024

seabbs commented Oct 25, 2024

jamesmbaazam commented Oct 25, 2024

sbfnk commented Oct 29, 2024

sbfnk commented Oct 30, 2024

seabbs left a comment

Proposed interface for accumulation / missingness #839

Proposed interface for accumulation / missingness #839

Conversation

sbfnk commented Oct 24, 2024 • edited Loading

Supporting missing data

Proposed interface

Missing data

Accumulation

Example

Helper functions

seabbs commented Oct 24, 2024 • edited Loading

Example

sbfnk commented Oct 24, 2024

seabbs commented Oct 24, 2024 • edited Loading

sbfnk commented Oct 25, 2024

seabbs commented Oct 25, 2024

jamesmbaazam commented Oct 25, 2024

sbfnk commented Oct 29, 2024

sbfnk commented Oct 30, 2024

seabbs left a comment

Choose a reason for hiding this comment

sbfnk commented Oct 24, 2024 •

edited

Loading

seabbs commented Oct 24, 2024 •

edited

Loading

seabbs commented Oct 24, 2024 •

edited

Loading