Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conditionally throw message for obs_opts(na = "missing") #774

Open
wants to merge 33 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
ce27e9c
Add message about NA treatment for option "missing"
jamesmbaazam Sep 16, 2024
ca091d8
Add missing tests for obs_opts
jamesmbaazam Sep 16, 2024
aa85d56
Add NEWS item
jamesmbaazam Sep 16, 2024
06a4444
Improve messages
jamesmbaazam Sep 16, 2024
cfc0a52
Check if na is explicitly specified by user
jamesmbaazam Sep 16, 2024
e1103a7
Change less regular to non-daily
jamesmbaazam Sep 17, 2024
1ca8466
Create function to test for complete data
jamesmbaazam Sep 17, 2024
6edd773
Add tests for test_data_complete
jamesmbaazam Sep 17, 2024
89e7952
Document test_data_complete
jamesmbaazam Sep 17, 2024
3161ed9
Add function to crosscheck na as missing setting with data
jamesmbaazam Sep 17, 2024
5d57534
Move message elsewhere and only check if na is specified for later use
jamesmbaazam Sep 17, 2024
01fa3da
Add new global
jamesmbaazam Sep 17, 2024
d759fc1
Apply na cross checking function in main functions
jamesmbaazam Sep 17, 2024
eafc9fb
Rename dirty data
jamesmbaazam Sep 17, 2024
2ad382b
Add test for na cross checking function
jamesmbaazam Sep 17, 2024
49a4fce
Add more tests to obs_opts
jamesmbaazam Sep 17, 2024
71b144e
Remove global variable
jamesmbaazam Sep 17, 2024
a22557f
Use alternative syntax to avoid linting issues
jamesmbaazam Sep 17, 2024
255a980
copy before modifying data.table
jamesmbaazam Sep 17, 2024
b683db6
Fix test
jamesmbaazam Sep 17, 2024
1fa1975
Update NEWS.md
jamesmbaazam Sep 17, 2024
6dafdf4
Make message/warning verbose and test for message/warning
jamesmbaazam Sep 18, 2024
a3a98ad
Simplify comment
jamesmbaazam Sep 20, 2024
4188463
Use vapply for type safety
jamesmbaazam Sep 20, 2024
a07a6b7
Add lifecycle badge
jamesmbaazam Sep 20, 2024
c50e4c1
Add lifecycle badge
jamesmbaazam Sep 20, 2024
1b24c30
Improve docs
jamesmbaazam Sep 20, 2024
276b1db
Return unmodified obs_opts
jamesmbaazam Sep 20, 2024
ac1e639
Move removal of element of obs_opts() into checker function
jamesmbaazam Sep 20, 2024
bfa360c
Use setequal instead of checking lengths
jamesmbaazam Sep 20, 2024
2a64561
Remove new unnecessary new lines
jamesmbaazam Sep 20, 2024
f05c653
Remove trailing white space
jamesmbaazam Sep 24, 2024
4e4b8b5
Add test to check for expected element introduced
jamesmbaazam Sep 24, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,8 @@
- Switch to broadcasting the day of the week effect. By @seabbs in #746 and reviewed by @jamesmbaazam.
- A warning is now thrown if nonparametric PMFs passed to delay options have consecutive tail values that are below a certain low threshold as these lead to loss in speed with little gain in accuracy. By @jamesmbaazam in #752 and reviewed by @seabbs.
- `dist_fit()` can now accept any number of `samples` without throwing a warning when `samples` < 1000 in #751 by @jamesmbaazam and reviewed by @seabbs and @sbfnk.
- `obs_opts()` now informs users about how NA observations are treated to help them decide on existing alternatives. By @jamesmbaazam in # and reviewed by <REVIEWER>.
jamesmbaazam marked this conversation as resolved.
Show resolved Hide resolved
- Users are now informed that `NA` observations will be treated as missing instead of zero when using the default `obs_opts()`. Options to treat `NA` as zeros or accumulate them are also provided. By @jamesmbaazam in #774 and reviewed by @sbfnk.

## Package changes

Expand Down
79 changes: 79 additions & 0 deletions R/checks.R
Original file line number Diff line number Diff line change
Expand Up @@ -146,3 +146,82 @@ check_sparse_pmf_tail <- function(pmf, span = 5, tol = 1e-6) {
)
}
}

#' Check if data has either explicit NA values or implicit missing dates.
#'
#' @param data The data to be checked
#'
#' @return `TRUE` if data is complete, else if data has implicit or explicit
#' missingness, `FALSE`.
#' @keywords internal
test_data_complete <- function(data) {
data <- setDT(data) # Convert data to data.table

# Check for explicit missingness in required columns
# (date, confirm, primary, secondary)
columns_to_check <- c(
"date",
intersect(c("confirm", "primary", "secondary"), names(data))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoding these column names will require coming back to this when addressing #505 - perhaps we should just pass this as an argument (where the function asking for the tests specifies what to check)?

)
if (any(vapply(data[, columns_to_check, with = FALSE], anyNA, logical(1)))) {
return(FALSE)
}

# Check for implicit missingness by comparing the expected full date sequence
complete_dates <- seq(
min(data$date, na.rm = TRUE),
max(data$date, na.rm = TRUE),
by = "1 day"
)
if (!setequal(complete_dates, unique(data$date))) {
return(FALSE)
}

return(TRUE) # Return TRUE if no missing values or gaps in date sequence
}

#' Cross-check treatment of `NA` in obs_opts() against input data
#'
#' @description `r lifecycle::badge("experimental")`
#'
#' This function checks the input data for implicit and/or explicit missingness
#' and checks if the user specified `na = "missing"` in [obs_opts()].
#' If the two are TRUE, it throws a message about how the model treats
#' missingness and provides alternatives. It returns an unmodified [obs_opts()].
#'
#' This function is necessary because the data and observation model
#' do not currently interact internally. It will be deprecated in future
#' versions when the data specification interface is enhanced.
#'
#' @param obs A call to [obs_opts()]
#' @param data The raw data
#' @importFrom cli cli_inform col_red
#'
#' @return [obs_opts()]
#' @keywords internal
check_na_setting_against_data <- function(data, obs) {
# If users are using the default treatment of NA's and their data has
# implicit or explicit NA's, inform them of what's happening and alternatives
if (!obs$accumulate &&
obs$na_as_missing_default_used &&
!test_data_complete(data)) {
#nolint start: duplicate_argument_linter
cli_inform(
c(
"i" = "{col_red(\"As of version 1.5.0 missing dates or dates with `NA`
cases are treated as missing. This is in contrast to previous versions
where these were interpreted as dates with zero cases. \")}",
"i" = "In order to treat missing or `NA` cases as zeroes, see
solutions in {.url https://github.com/epiforecasts/EpiNow2/issues/767#issuecomment-2348805272}", #nolint
"i" = "If the data is reported at non-daily intervals (for example
weekly), consider using `obs_opts(na=\"accumulate\")`.",
"i" = "For more information on these options, see `?obs_opts`."
),
.frequency = "regularly",
.frequency_id = "check_na_setting_against_data"
)
#nolint end
}
obs$na_as_missing_default_used <- NULL
return(obs)
}
9 changes: 9 additions & 0 deletions R/estimate_infections.R
Original file line number Diff line number Diff line change
Expand Up @@ -167,6 +167,15 @@ estimate_infections <- function(data,
name = "EpiNow2.epinow.estimate_infections"
)
}

# If the user is using the default treatment of NA's as missing and
# their data has implicit or explicit NA's, inform them of what's
# happening and provide alternatives.
obs <- check_na_setting_against_data(
obs = obs,
data = dirty_reported_cases
)
# Create clean and complete cases
# Order cases
reported_cases <- create_clean_reported_cases(
data, horizon,
Expand Down
13 changes: 10 additions & 3 deletions R/estimate_secondary.R
Original file line number Diff line number Diff line change
Expand Up @@ -182,15 +182,22 @@ estimate_secondary <- function(data,
assert_logical(verbose)

reports <- data.table::as.data.table(data)
secondary_reports <- reports[, list(date, confirm = secondary)]
secondary_reports_dirty <- reports[, list(date, confirm = secondary)]
secondary_reports <- create_clean_reported_cases(
secondary_reports,
secondary_reports_dirty,
filter_leading_zeros = filter_leading_zeros,
zero_threshold = zero_threshold
)

# If the user is using the default treatment of NA's as missing and
# their data has implicit or explicit NA's, inform them of what's
# happening and provide alternatives.
obs <- check_na_setting_against_data(
obs = obs,
data = secondary_reports_dirty
)
## fill in missing data (required if fitting to prevalence)
complete_secondary <- create_complete_cases(secondary_reports)

## fill down
secondary_reports[, confirm := nafill(confirm, type = "locf")]
## fill any early data up
Expand Down
8 changes: 5 additions & 3 deletions R/opts.R
Original file line number Diff line number Diff line change
Expand Up @@ -621,6 +621,8 @@ obs_opts <- function(family = c("negbin", "poisson"),
na = c("missing", "accumulate"),
likelihood = TRUE,
return_likelihood = FALSE) {
# NB: This has to be checked first before the na argument is touched anywhere.
na_default_used <- missing(na)
na <- arg_match(na)
if (na == "accumulate") {
#nolint start: duplicate_argument_linter
Expand All @@ -637,9 +639,8 @@ obs_opts <- function(family = c("negbin", "poisson"),
.frequency = "regularly",
.frequency_id = "obs_opts"
)
#nolint end
}

#nolint end
if (length(phi) == 2 && is.numeric(phi)) {
cli_abort(
c(
Expand All @@ -657,7 +658,8 @@ obs_opts <- function(family = c("negbin", "poisson"),
scale = scale,
accumulate = as.integer(na == "accumulate"),
likelihood = likelihood,
return_likelihood = return_likelihood
return_likelihood = return_likelihood,
na_as_missing_default_used = na_default_used
)

for (param in c("phi", "scale")) {
Expand Down
29 changes: 29 additions & 0 deletions man/check_na_setting_against_data.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

19 changes: 19 additions & 0 deletions man/test_data_complete.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

84 changes: 83 additions & 1 deletion tests/testthat/test-checks.R
Original file line number Diff line number Diff line change
Expand Up @@ -143,6 +143,88 @@ test_that("check_reports_valid errors for bad 'secondary' specifications", {
})

test_that("check_sparse_pmf_tail throws a warning as expected", {
# NB: The warning is set to be thrown once every 8 hours, so hard to test
# regularly. The fix is to change the local setting here to throw the
# warning on demand for the sake of multiple runs of the test within
# 8 hours. That's what the rlang call below does
rlang::local_options(rlib_warning_verbosity = "verbose")
pmf <- c(0.4, 0.30, 0.20, 0.05, 0.049995, 4.5e-06, rep(1e-7, 5))
expect_warning(check_sparse_pmf_tail(pmf), "PMF tail has")
expect_warning(
check_sparse_pmf_tail(pmf),
"PMF tail has"
)
})

test_that("test_data_complete detects complete and incomplete data", {
# example_confirmed with explicit missing dates
ec_missing_date <- copy(example_confirmed)[c(1, 3), date := NA]
# example_confirmed with explicit missing confirm
ec_missing_confirm <- copy(example_confirmed)[c(1, 3), confirm := NA]
# example_confirmed with implicit missing (missing entries)
ec_implicit_missing <- copy(example_confirmed)[-c(1,3,5), ]
# Create a hypothetical complete example_secondary
es <- copy(example_confirmed)[
, primary := confirm
][
, secondary := primary * 0.4
]
# example_secondary with explicit missing primary
es_missing_primary <- copy(es)[c(1, 3), primary := NA]
# example_secondary with explicit missing secondary
es_missing_secondary <- copy(es)[c(1, 3), secondary := NA]

# Expectations
expect_true(test_data_complete(example_confirmed))
expect_true(test_data_complete(es))
expect_false(test_data_complete(ec_missing_date))
expect_false(test_data_complete(ec_missing_confirm))
expect_false(test_data_complete(es_missing_primary))
expect_false(test_data_complete(es_missing_secondary))
expect_false(test_data_complete(ec_implicit_missing))
})

test_that("check_na_setting_against_data works as expected", {
# If data is incomplete and the default na = "missing" is being used,
# expect a message thrown once every 8 hours.
# NB: We change the local setting here to throw the message on demand, rather
# than every 8 hours, for the sake of multiple runs of the test within
# 8 hours.
rlang::local_options(rlib_message_verbosity = "verbose")
expect_message(
check_na_setting_against_data(
obs = obs_opts(),
data = copy(example_confirmed)[c(1, 3), confirm := NA]
),
"version 1.5.0 missing dates or dates"
)
# If data is incomplete but the user explicitly set na = "missing", then
# expect no message
expect_no_message(
check_na_setting_against_data(
obs = obs_opts(na = "missing"),
data = copy(example_confirmed)[c(1, 3), confirm := NA]
)
)
# If data is complete, expect no message even when using default na as
# missing setting
expect_no_message(
check_na_setting_against_data(
obs = obs_opts(),
data = example_confirmed
)
)
expect_identical(
setdiff(
names(
obs_opts()
),
names(
check_na_setting_against_data(
obs = obs_opts(),
data = example_confirmed
)
)
),
"na_as_missing_default_used"
)
})
35 changes: 35 additions & 0 deletions tests/testthat/test-obs_opts.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
test_that("obs_opts returns expected default values", {
result <- suppressWarnings(obs_opts())

expect_s3_class(result, "obs_opts")
expect_equal(result$family, "negbin")
expect_equal(result$weight, 1)
expect_true(result$week_effect)
expect_equal(result$week_length, 7L)
expect_equal(result$scale, list(mean = 1, sd = 0))
expect_equal(result$accumulate, 0)
expect_true(result$likelihood)
expect_false(result$return_likelihood)
})

test_that("obs_opts returns expected messages", {
# The option na = "accumulate" informs the user of what is
# going to be done once every 8 hours, so hard to test regularly.
# NB: We change the local setting here to throw the message on demand, rather
# than every 8 hours, for the sake of multiple runs of the test within
# 8 hours.
rlang::local_options(rlib_message_verbosity = "verbose")
expect_message(
obs_opts(),
"NA values will be treated as missing"
)
sbfnk marked this conversation as resolved.
Show resolved Hide resolved
expect_message(
obs_opts(na = "accumulate"),
"modelled values that correspond to NA values"
)
})

test_that("obs_opts behaves as expected for user specified na treatment", {
# If user explicitly specifies NA as missing, then don't throw message
expect_false(obs_opts(na = "missing")$na_as_missing_default_used)
})
Loading