Issue 793: Improve documentation for those specifying a non-parametric delay #799

Merged
merged 8 commits into from
Oct 1, 2024

Conversation

kaitejohnson
Contributor

@kaitejohnson kaitejohnson commented Sep 29, 2024

Description

This PR closes #793.

It expands on the vignette details describing how to specify a generation interval and reporting delay. Specifically, it gives an example of how to provide a fixed non-parametric vector delay distribution. I added language explaining that the generation interval should be indexed starting at day 1 of an infection whereas the reporting delays should be indexed starting at day 0 (as is consistent with the model definition, but might not be clear to a user).

I looked through generation_time_opts() and the distribution handling but wasn't sure whether this distinction belongs in either of those. I'm happy to add some language about this to the documentation if there's a specific place you all think is most appropriate.

The context for this was that it's easy to accidentally pass in a generation interval distribution that is shifted by one if assuming the GI is 0-indexed.

I think a separate issue could be to use primarycensoreddist to generate a GI PMF in data-raw and pass that in as an example, either instead of or in addition to the current example_generation_time.

Initial submission checklist

  • My PR is based on a package issue and I have explicitly linked it.
  • I have tested my changes locally (using devtools::test() and devtools::check()).
  • I have added or updated unit tests where necessary.
  • I have updated the documentation if required and rebuilt docs if yes (using devtools::document()).
  • I have followed the established coding standards (and checked using lintr::lint_package()).
  • I have added a news item linked to this PR.

After the initial Pull Request

  • I have reviewed Checks for this PR and addressed any issues as far as I am able.

@kaitejohnson kaitejohnson marked this pull request as draft September 29, 2024 20:20
@seabbs seabbs requested a review from sbfnk September 30, 2024 09:25
@sbfnk
Contributor

sbfnk commented Sep 30, 2024

Thanks @kaitejohnson, this is great and definitely a very good idea to clarify.

I might be misremembering, but I think generation intervals are in fact zero-indexed, with the 0-component set to zero as of version 1.4.0, here (left_truncate being set to 1 for generation intervals):

pmf = append_row(

The context for this was that it's easy to accidentally pass in a generation interval distribution that is shifted by one if assuming the GI is 0-indexed.

The point that this is not clear and easy to get wrong by accident obviously stands regardless. I think it would be great to clarify also in gt_opts() as you suggest.

I think a separate issue could be to use primarycensoreddist to generate a GI PMF in data-raw and pass that in as an example, either instead of or in addition to the current example_generation_time.

I agree.

@kaitejohnson
Contributor Author

I see, I missed that change. I can adjust the language accordingly.

To make sure my interpretation is now correct: if a user passes in a GI PMF, it should be 0-indexed (just like the other delay PMFs), because the left truncation and renormalization occur under the hood (in the function you sent).

@sbfnk
Contributor

sbfnk commented Sep 30, 2024

To make sure my interpretation is now correct: if a user passes in a GI PMF, it should be 0-indexed (just like the other delay PMFs), because the left truncation and renormalization occur under the hood (in the function you sent).

Yes, exactly.
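
For readers following along, the behaviour confirmed above can be sketched in base R. This is only an illustration of "left-truncate at one, then renormalize" on a 0-indexed PMF; `left_truncate_pmf()` is a hypothetical helper written for this note, not the EpiNow2 function linked earlier in the thread, and the example vector is made up.

```r
# Hypothetical helper (not EpiNow2 code): drop the day-0 mass of a
# 0-indexed PMF and rescale the remaining mass so it sums to one.
left_truncate_pmf <- function(pmf) {
  stopifnot(is.numeric(pmf), length(pmf) > 1, all(pmf >= 0))
  pmf[1] <- 0  # R vectors are 1-based, so element 1 is day 0
  total <- sum(pmf)
  if (total == 0) {
    stop("No probability mass remains after removing the day-0 bin.")
  }
  pmf / total
}

# Made-up PMF with some mass on day 0
gi_with_day0_mass <- c(0.1, 0.3, 0.4, 0.2)
gi_truncated <- left_truncate_pmf(gi_with_day0_mass)
print(round(gi_truncated, 3))
```

The day-0 mass is redistributed proportionally over the later days, so the relative shape of the distribution from day 1 onwards is unchanged.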

@seabbs
Contributor

seabbs commented Sep 30, 2024

Looking at the code this does appear to be the case and I must say I had no idea.

I think we should throw an informational message in gt_opts() when a fixed distribution is passed, as I can see this being very confusing for those passing vectors.

@dylanhmorris

If NonParametric generation interval PMFs are going to be zero-indexed, I suggest erroring whenever there is non-zero mass in the 0 bin. I think this will cause the fewest (and the least costly) surprises for users.

If a user wishes to specify a PMF bin-by-bin, it's reasonable to ask them to decide explicitly how to handle any mass in the 0th bin. So failing if a user falsely assumes 1-indexing is imo worth the cost of forcing drop-and-renormalize users to perform that operation manually.
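
A rough sketch of what such a check might look like, in base R. The function name `check_gi_pmf()`, the tolerance, and the error wording are all assumptions made for illustration; this is not how EpiNow2 implements (or will implement) the check.

```r
# Hypothetical validator (not EpiNow2 code): fail loudly if a 0-indexed
# generation interval PMF puts any mass on day 0.
check_gi_pmf <- function(pmf, tol = 1e-8) {
  if (!is.numeric(pmf) || anyNA(pmf) || any(pmf < 0)) {
    stop("PMF must be a numeric vector of non-negative values.")
  }
  if (abs(pmf[1]) > tol) {
    stop("Generation interval PMF has non-zero mass on day 0 (first element).")
  }
  invisible(pmf)
}

# Passes: explicit zero on day 0
ok <- check_gi_pmf(c(0, 0.3, 0.5, 0.2))

# Fails: the user assumed the vector starts at day 1
result <- tryCatch(
  check_gi_pmf(c(0.3, 0.5, 0.2)),
  error = function(e) conditionMessage(e)
)
print(result)
```

With a check like this, a user who wants drop-and-renormalize behaviour has to do it explicitly before passing the vector in, which is the trade-off described above.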

@sbfnk
Contributor

sbfnk commented Sep 30, 2024

If NonParametric generation interval PMFs are going to be zero-indexed, I suggest erroring whenever there is non-zero mass in the 0 bin. I think this will cause the fewest (and the least costly) surprises for users.

That, I think, is a really good suggestion and would do away with the need for an explicit warning in the case that the first bin is zero.

@kaitejohnson
Contributor Author

The vignette is updated to explain that the GI should be 0-indexed with a mass of 0 on the first element. I didn't adjust any documentation in generation_time_opts() as that seems to be addressed in #808.

If this feels like too much detail for a vignette, feel free to ignore. I think the warning will flag this for most users of a fixed non-parametric GI.
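
To illustrate why the indexing matters, here is a small base R sketch comparing the implied mean generation time when the leading zero is included versus omitted. `pmf_mean()` is a hypothetical helper for this note and the vectors are illustrative; it only assumes, as discussed above, that the first element of the vector is read as day 0.

```r
# Hypothetical helper: mean of a discrete delay whose first element is day 0
pmf_mean <- function(pmf) {
  days <- seq_along(pmf) - 1  # 0, 1, 2, ...
  sum(days * pmf)
}

gi_zero_indexed <- c(0, 0.3, 0.5, 0.2)  # days 0 to 3, no mass on day 0
gi_missing_zero <- c(0.3, 0.5, 0.2)     # same masses, written as if starting at day 1

mean_zero_indexed <- pmf_mean(gi_zero_indexed)
mean_missing_zero <- pmf_mean(gi_missing_zero)

# The two interpretations differ by exactly one day
print(mean_zero_indexed - mean_missing_zero)
```

Omitting the leading zero shifts every bin one day earlier when the vector is read as 0-indexed, which is the off-by-one mistake this PR is trying to help users avoid.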

@kaitejohnson kaitejohnson marked this pull request as ready for review October 1, 2024 11:07
Contributor

@sbfnk sbfnk left a comment

Content-wise this looks great now.

Just one request: could you move these changes over to EpiNow2.Rmd.orig? The .Rmd files are generated from these with pre-computed results (so they don't have to be re-rendered every time we build the package which takes quite a lot of time/computation). We can then build the Rmd via the corresponding action in https://github.com/epiforecasts/EpiNow2/blob/main/.github/workflows/render-EpiNow2.yaml

@seabbs
Contributor

seabbs commented Oct 1, 2024

And add yourself to the DESCRIPTION as a contributor

vignettes/EpiNow2.Rmd (outdated, resolved)
@kaitejohnson
Contributor Author

I am guessing it's OK that I didn't remove the changes from EpiNow2.Rmd, since these will get overwritten by the GH action?

@sbfnk
Contributor

sbfnk commented Oct 1, 2024

I am guessing it's OK that I didn't remove the changes from EpiNow2.Rmd, since these will get overwritten by the GH action?

You’re guessing right, I think.

@seabbs seabbs enabled auto-merge October 1, 2024 12:51
@seabbs seabbs added this pull request to the merge queue Oct 1, 2024
Merged via the queue into epiforecasts:main with commit 9873893 Oct 1, 2024
9 checks passed
If this is not the case, a warning will indicate that the vector is being left-truncated and renormalized.

```r
example_non_parametric_gi <- NonParametric(pmf = c(0, 0.3, 0.5, 0.2))
```

Contributor

@jamesmbaazam jamesmbaazam Oct 7, 2024

I know this has been merged, but I was catching up and noticed the following: given the text preceding the chunk, I would have expected to see the warning showcased here, i.e., the example should use gt_opts() instead of the direct call to NonParametric(), like so:

> gt_opts(NonParametric(pmf = c(0.1, 0.3, 0.5, 0.2)))
- nonparametric distribution
  PMF: [0.091 0.27 0.45 0.18]
Warning message:
Specifying nonparametric generation times with nonzero first element was deprecated in
EpiNow2 1.6.0.Since zero generation times are not supported by the model, the generation time will be
  left-truncated at one.In future versions this will cause an error. Please ensure that the first element of
  the nonparametric generation interval is zero.The deprecated feature was likely used in the EpiNow2 package.
  Please report the issue at <https://github.com/epiforecasts/EpiNow2/issues>.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated. 

Contributor Author

@jamesmbaazam Do you think it makes sense to include a non-parametric pmf that produces this warning as you have shown?

The one currently included has a value of 0 on day 0, so it won't produce the warning! Perhaps the preceding text isn't clear enough.

Contributor

I think the text is fine, but personally I would have expected the code sample to showcase the warning thrown when the delay is not 0-indexed. Additionally, I would suggest explicitly printing the results of example_non_parametric_gi and example_non_parametric_delay in the vignette.

Contributor Author

I can open an issue and subsequent PR to add this.

Do you think there should be an example for both the correct specification (so with the 0 on day 0) and the incorrect specification that results in a warning (as you demonstrated)? My only concern is about bloating the vignette, otherwise I think it could make sense to show both.

Contributor

I'm not convinced we want to show a way of specifying that we're discouraging in the warning (and that will cause an error in the future). Perhaps we should just remove the highlighted sentence if it's confusing?

Contributor

@sbfnk You're right about not encouraging the wrong specification.

Maybe, we can reword this part "If this is not the case, a warning will indicate that the vector is being left-truncated and normalized." -> "If this is not the case, the vector will be left-truncated and normalized."

I do see that the doc of gt_opts() has the following wording: "Because the discretised renewal equation used in the package does not support zero generation times, any distribution specified here will be left-truncated at one, i.e. the first element of the nonparametric or discretised probability distribution used for the generation time is set to zero and the resulting distribution renormalised." Shouldn't we just reuse that here?

Contributor

I still think that in the vignette we should only tell users how to specify this correctly and not what happens if they fail to do so (which they'll find out with the warning anyway). So I'd vote for removing the sentence altogether.

Contributor

@kaitejohnson Would you like to take this one 😃 ?

Development

Successfully merging this pull request may close these issues.

Improve documentation for those passing in a GI
5 participants