-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Time utils #528
base: dev
Are you sure you want to change the base?
Time utils #528
Changes from all commits
1125ce5
4804c34
44fe03f
a9375ce
09cf367
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -86,5 +86,6 @@ Collate: | |
'reexports.R' | ||
'revision_analysis.R' | ||
'slide.R' | ||
'time_converters.R' | ||
'utils.R' | ||
'utils_pipe.R' |
Original file line number | Diff line number | Diff line change | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,127 @@ | ||||||||||||||
#' convert an archive from daily data to weekly data, summing where appropriate | ||||||||||||||
#' @details | ||||||||||||||
#' this function is slow, so make sure you are calling it correctly, and | ||||||||||||||
#' consider testing it on a small portion of your archive first | ||||||||||||||
#' @param day_of_week integer, day of the week, starting from Sunday, select the | ||||||||||||||
#' date to represent the week in the time_value column, based on it's | ||||||||||||||
#' corresponding day of the week. The default value represents the week using | ||||||||||||||
#' Wednesday. | ||||||||||||||
#' @param day_of_week_end integer, day of the week starting on Sunday. | ||||||||||||||
#' Represents the last day, so the week consists of data summed to this day. | ||||||||||||||
#' The default value `6` means that the week is summed from Sunday through | ||||||||||||||
#' Saturday. | ||||||||||||||
daily_to_weekly <- function(epi_arch, | ||||||||||||||
agg_columns, | ||||||||||||||
agg_method = c("sum", "mean"), | ||||||||||||||
day_of_week = 4L, | ||||||||||||||
day_of_week_end = 6L) { | ||||||||||||||
agg_method <- arg_match(agg_method) | ||||||||||||||
if (agg_method == "total") { | ||||||||||||||
agg_fun <- epi_slide_sum | ||||||||||||||
} else if (agg_method == "mean") { | ||||||||||||||
agg_fun <- epi_slide_mean | ||||||||||||||
} | ||||||||||||||
keys <- grep("time_value", key_colnames(epi_arch), invert = TRUE, value = TRUE) | ||||||||||||||
too_many_tibbles <- epix_slide( | ||||||||||||||
epi_arch, | ||||||||||||||
before = 99999999L, | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Inf is default
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (this should also give some perf boost) |
||||||||||||||
ref_time_values = ref_time_values, | ||||||||||||||
function(x, group, ref_time) { | ||||||||||||||
x %>% | ||||||||||||||
group_by(across(all_of(keys))) %>% | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. after new updates you can just do and get rid of the keys assignment above
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. glad to hear it! I was getting very tired of recreating |
||||||||||||||
agg_fun(agg_columns, before = 6L) %>% | ||||||||||||||
dshemetov marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||
select(-all_of(agg_columns)) %>% | ||||||||||||||
rename_with(~ gsub("slide_value_", "", .x)) %>% | ||||||||||||||
# only keep 1/week | ||||||||||||||
filter(wday(time_value) == day_of_week_end) %>% | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
If you want to ensure 0 = Sun, 6 = Sat. Could maybe call this OR
Suggested change
If you want to ensure 1 = Mon, 6 = Sat, 7 = Sun. OR
Suggested change
If you want to ensure 1 = Sun, 7 = Sat. [Update default There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Based on you pointing out the default is customizable, I was just planning on doing the 3rd one. Most of our collaborators/expected users are in the US, so defaulting to Sunday being the first day of the week is probably safest. |
||||||||||||||
# switch time_value to the designated day of the week | ||||||||||||||
mutate(time_value = time_value - 7L + day_of_week) %>% | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. issue?: this might need to be
Suggested change
Probably easiest to construct a test case with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Did a little test, seems to back up current suggestion above. library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
crossing(end_date = as.Date("2020-01-01") + 0:6, day_of_week = 1:7) %>%
mutate(day_of_week_end = lubridate::wday(end_date, week_start = 7L)) %>%
mutate(
method1 = end_date - 7L + day_of_week,
method2 = end_date - 7L + (day_of_week - day_of_week_end) %% 7L,
method3 = end_date - (day_of_week_end - day_of_week) %% 7L
) %>%
pivot_longer(method1:method3, names_to = "method", values_to = "date") %>%
mutate(day_of_week_actual = lubridate::wday(date, week_start = 7L),
right_day_of_week = day_of_week == day_of_week_actual,
right_week = as.integer(end_date - date) %in% 0:6) %>%
summarize(.by = "method", issues = sum(!right_day_of_week | !right_week))
#> # A tibble: 3 × 2
#> method issues
#> <chr> <int>
#> 1 method1 42
#> 2 method2 7
#> 3 method3 0 Created on 2024-09-28 with reprex v2.1.1 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah this definite seems right, thanks for the catch |
||||||||||||||
as_tibble() | ||||||||||||||
} | ||||||||||||||
) | ||||||||||||||
too_many_tibbles %>% | ||||||||||||||
rename(version = time_value) %>% | ||||||||||||||
rename_with(~ gsub("slide_value_", "", .x)) %>% | ||||||||||||||
as_epi_archive(compactify = TRUE) | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
|
||||||||||||||
|
||||||||||||||
|
||||||||||||||
#' fill in values between, and divide any numeric values equally between them | ||||||||||||||
#' @param to_complete epi_archive | ||||||||||||||
#' @param groups to be grouped by. Should include both time_value and version, and any epi_keys | ||||||||||||||
#' @param columns_to_complete any columns that need their values extended | ||||||||||||||
#' @param aggregate_columns any columns which have numerical data that is a sum | ||||||||||||||
#' across days, and thus needs to be divided into equal parts distributed | ||||||||||||||
#' accross days | ||||||||||||||
convert_to_period_upsample <- function(to_complete, groups, columns_to_complete, | ||||||||||||||
aggregate_columns, source_period = 7, target_period = 1) { | ||||||||||||||
to_complete_datatable <- to_complete$DT | ||||||||||||||
completed_time_values <- | ||||||||||||||
to_complete_datatable %>% | ||||||||||||||
group_by(across(all_of(groups))) %>% | ||||||||||||||
reframe( | ||||||||||||||
time_value = seq(from = time_value, to = time_value + source_period - 1, by = target_period) | ||||||||||||||
) %>% | ||||||||||||||
unique() | ||||||||||||||
completed <- to_complete_datatable %>% | ||||||||||||||
full_join( | ||||||||||||||
completed_time_values, | ||||||||||||||
by = join_by(season, geo_value, version, time_value) | ||||||||||||||
Check warning on line 71 in R/time_converters.R GitHub Actions / lint
|
||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah, should probably just be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. there is epiprocess::key_colnames. unless you mean something like epiprocess::add_key which would probably require a join and recreating the epi_df under the hood. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I removed kill_time_value and instead you can now just do |
||||||||||||||
) %>% | ||||||||||||||
arrange(geo_value, version, time_value) %>% | ||||||||||||||
fill(all_of(columns_to_complete), .direction = "down") %>% | ||||||||||||||
mutate(across(all_of(aggregate_columns), \(x) x / 7)) | ||||||||||||||
completed %>% | ||||||||||||||
arrange(geo_value, time_value) %>% | ||||||||||||||
as_epi_archive(compactify = TRUE) | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
|
||||||||||||||
|
||||||||||||||
|
||||||||||||||
|
||||||||||||||
#' get a season e.g. "2020/21" for a given year-week pair | ||||||||||||||
#' @keywords internal | ||||||||||||||
convert_epiweek_to_season <- function(epiyear, epiweek, season_start_week = 40) { | ||||||||||||||
# Convert epiweek to season | ||||||||||||||
update_inds <- epiweek < season_start_week | ||||||||||||||
epiyear <- ifelse(update_inds, epiyear - 1, epiyear) | ||||||||||||||
|
||||||||||||||
season <- paste0(epiyear, "/", substr((epiyear + 1), 3, 4)) | ||||||||||||||
return(season) | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
#' get a total count of the epiweeks in a given year | ||||||||||||||
#' @keywords internal | ||||||||||||||
epiweeks_in_year <- function(year) { | ||||||||||||||
last_week_of_year <- seq.Date(as.Date(paste0(year, "-12-24")), | ||||||||||||||
as.Date(paste0(year, "-12-31")), | ||||||||||||||
by = 1 | ||||||||||||||
) | ||||||||||||||
return(max(as.numeric(MMWRweek(last_week_of_year)$MMWRweek))) | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
#' get the week in a season | ||||||||||||||
#' @keywords internal | ||||||||||||||
convert_epiweek_to_season_week <- function(epiyear, epiweek, season_start = 40) { | ||||||||||||||
season_week <- epiweek - season_start + 1 | ||||||||||||||
|
||||||||||||||
update_inds <- season_week <= 0 | ||||||||||||||
# last year's # of epiweeks determines which week in the season we're at at | ||||||||||||||
# the beginning of the year | ||||||||||||||
season_week[update_inds] <- season_week[update_inds] + | ||||||||||||||
sapply(epiyear[update_inds] - 1, epiweeks_in_year) | ||||||||||||||
|
||||||||||||||
return(season_week) | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
#' get a canonical date to represent a given epiweek | ||||||||||||||
#' @keywords internal | ||||||||||||||
convert_epiweek_to_date <- function(epiyear, epiweek, | ||||||||||||||
day_of_week = 1) { | ||||||||||||||
end_date <- MMWRweek2Date(epiyear, epiweek, day_of_week) | ||||||||||||||
|
||||||||||||||
return(end_date) | ||||||||||||||
} |
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.