Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time utils #528

Draft
wants to merge 5 commits into
base: dev
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -86,5 +86,6 @@ Collate:
'reexports.R'
'revision_analysis.R'
'slide.R'
'time_converters.R'
'utils.R'
'utils_pipe.R'
127 changes: 127 additions & 0 deletions R/time_converters.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,127 @@
#' convert an archive from daily data to weekly data, summing where appropriate
#' @details
#' this function is slow, so make sure you are calling it correctly, and
#' consider testing it on a small portion of your archive first
#' @param day_of_week integer, day of the week, starting from Sunday, select the
#' date to represent the week in the time_value column, based on it's
#' corresponding day of the week. The default value represents the week using
#' Wednesday.
#' @param day_of_week_end integer, day of the week starting on Sunday.
#' Represents the last day, so the week consists of data summed to this day.
#' The default value `6` means that the week is summed from Sunday through
#' Saturday.
daily_to_weekly <- function(epi_arch,
agg_columns,
agg_method = c("sum", "mean"),
day_of_week = 4L,
day_of_week_end = 6L) {
agg_method <- arg_match(agg_method)
if (agg_method == "total") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (agg_method == "total") {
if (agg_method == "sum") {

agg_fun <- epi_slide_sum
} else if (agg_method == "mean") {
agg_fun <- epi_slide_mean
}
keys <- grep("time_value", key_colnames(epi_arch), invert = TRUE, value = TRUE)
too_many_tibbles <- epix_slide(
epi_arch,
before = 99999999L,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inf is default

Suggested change
before = 99999999L,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this should also give some perf boost)

ref_time_values = ref_time_values,

Check warning on line 28 in R/time_converters.R

View workflow job for this annotation

GitHub Actions / lint

file=R/time_converters.R,line=28,col=23,[object_usage_linter] no visible binding for global variable 'ref_time_values'
function(x, group, ref_time) {
x %>%
group_by(across(all_of(keys))) %>%
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after new updates you can just do and get rid of the keys assignment above

Suggested change
group_by(across(all_of(keys))) %>%
group_epi_df(exclude = "time_value") %>%

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

glad to hear it! I was getting very tired of recreating keys all over the place

agg_fun(agg_columns, before = 6L) %>%
dshemetov marked this conversation as resolved.
Show resolved Hide resolved
select(-all_of(agg_columns)) %>%
rename_with(~ gsub("slide_value_", "", .x)) %>%

Check warning on line 34 in R/time_converters.R

View workflow job for this annotation

GitHub Actions / lint

file=R/time_converters.R,line=34,col=9,[object_usage_linter] no visible global function definition for 'rename_with'
# only keep 1/week
filter(wday(time_value) == day_of_week_end) %>%

Check warning on line 36 in R/time_converters.R

View workflow job for this annotation

GitHub Actions / lint

file=R/time_converters.R,line=36,col=16,[object_usage_linter] no visible global function definition for 'wday'
Copy link
Contributor

@brookslogan brookslogan Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
filter(wday(time_value) == day_of_week_end) %>%
filter(as.POSIXlt(time_value)$wday == day_of_week_end) %>%

If you want to ensure 0 = Sun, 6 = Sat. Could maybe call this lt_day_of_week_end, but as long as we avoid wday in the arg name (which I didn't realize you already did) without a clarifying prefix/suffix then that sounds good.

OR

Suggested change
filter(wday(time_value) == day_of_week_end) %>%
filter(wday(time_value, week_start = 1) == day_of_week_end) %>%

If you want to ensure 1 = Mon, 6 = Sat, 7 = Sun.

OR

Suggested change
filter(wday(time_value) == day_of_week_end) %>%
filter(wday(time_value, week_start = 7) == day_of_week_end) %>%

If you want to ensure 1 = Sun, 7 = Sat. [Update default day_of_week_end to match docs then. Note week_start = 7 is the default default but we need to pass it because user could have set the lubridate option to change the default.]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on you pointing out the default is customizable, I was just planning on doing the 3rd one. Most of our collaborators/expected users are in the US, so defaulting to Sunday being the first day of the week is probably safest.

# switch time_value to the designated day of the week
mutate(time_value = time_value - 7L + day_of_week) %>%
Copy link
Contributor

@brookslogan brookslogan Sep 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue?: this might need to be

Suggested change
mutate(time_value = time_value - 7L + day_of_week) %>%
mutate(time_value = time_value - (day_of_week_end - day_of_week) %% 7L) %>%

Probably easiest to construct a test case with day_of_week_end not Saturday & see if anything gives the right result. [And also one with day_of_week equal to day_of_week_end; I think I messed up in an earlier version of this suggestion for that case, but hopefully fixed.]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a little test, seems to back up current suggestion above.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
crossing(end_date = as.Date("2020-01-01") + 0:6, day_of_week = 1:7) %>%
  mutate(day_of_week_end = lubridate::wday(end_date, week_start = 7L)) %>%
  mutate(
    method1 = end_date - 7L + day_of_week,
    method2 = end_date - 7L + (day_of_week - day_of_week_end) %% 7L,
    method3 = end_date - (day_of_week_end - day_of_week) %% 7L
  ) %>%
  pivot_longer(method1:method3, names_to = "method", values_to = "date") %>%
  mutate(day_of_week_actual = lubridate::wday(date, week_start = 7L),
         right_day_of_week = day_of_week == day_of_week_actual,
         right_week = as.integer(end_date - date) %in% 0:6) %>%
  summarize(.by = "method", issues = sum(!right_day_of_week | !right_week))
#> # A tibble: 3 × 2
#>   method  issues
#>   <chr>    <int>
#> 1 method1     42
#> 2 method2      7
#> 3 method3      0

Created on 2024-09-28 with reprex v2.1.1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this definite seems right, thanks for the catch

as_tibble()
}
)
too_many_tibbles %>%
rename(version = time_value) %>%
rename_with(~ gsub("slide_value_", "", .x)) %>%

Check warning on line 44 in R/time_converters.R

View workflow job for this annotation

GitHub Actions / lint

file=R/time_converters.R,line=44,col=5,[object_usage_linter] no visible global function definition for 'rename_with'
as_epi_archive(compactify = TRUE)
}




#' fill in values between, and divide any numeric values equally between them
#' @param to_complete epi_archive
#' @param groups to be grouped by. Should include both time_value and version, and any epi_keys
#' @param columns_to_complete any columns that need their values extended
#' @param aggregate_columns any columns which have numerical data that is a sum
#' across days, and thus needs to be divided into equal parts distributed
#' accross days
convert_to_period_upsample <- function(to_complete, groups, columns_to_complete,
aggregate_columns, source_period = 7, target_period = 1) {
to_complete_datatable <- to_complete$DT
completed_time_values <-
to_complete_datatable %>%
group_by(across(all_of(groups))) %>%
reframe(

Check warning on line 64 in R/time_converters.R

View workflow job for this annotation

GitHub Actions / lint

file=R/time_converters.R,line=64,col=5,[object_usage_linter] no visible global function definition for 'reframe'
time_value = seq(from = time_value, to = time_value + source_period - 1, by = target_period)
) %>%
unique()
completed <- to_complete_datatable %>%
full_join(

Check warning on line 69 in R/time_converters.R

View workflow job for this annotation

GitHub Actions / lint

file=R/time_converters.R,line=69,col=5,[object_usage_linter] no visible global function definition for 'full_join'
completed_time_values,
by = join_by(season, geo_value, version, time_value)

Check warning on line 71 in R/time_converters.R

View workflow job for this annotation

GitHub Actions / lint

file=R/time_converters.R,line=71,col=12,[object_usage_linter] no visible global function definition for 'join_by'

Check warning on line 71 in R/time_converters.R

View workflow job for this annotation

GitHub Actions / lint

file=R/time_converters.R,line=71,col=20,[object_usage_linter] no visible binding for global variable 'season'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

season seems particular to whatever you were working on. this probably needs to relate somehow to groups above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, should probably just be keys. Relatedly, we probably need a function to create keys

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is epiprocess::key_colnames. unless you mean something like epiprocess::add_key which would probably require a join and recreating the epi_df under the hood.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

key_colnames includes time_value, unless there's an option I missed. So every time I have to remove time_value; Dan even added a kill_time_value function in epipredict because it's such a recurring pattern.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed kill_time_value and instead you can now just do key_colnames(edf, exclude = "time_value").

) %>%
arrange(geo_value, version, time_value) %>%
fill(all_of(columns_to_complete), .direction = "down") %>%

Check warning on line 74 in R/time_converters.R

View workflow job for this annotation

GitHub Actions / lint

file=R/time_converters.R,line=74,col=5,[object_usage_linter] no visible global function definition for 'fill'
mutate(across(all_of(aggregate_columns), \(x) x / 7))
completed %>%
arrange(geo_value, time_value) %>%
as_epi_archive(compactify = TRUE)
}





#' get a season e.g. "2020/21" for a given year-week pair
#' @keywords internal
convert_epiweek_to_season <- function(epiyear, epiweek, season_start_week = 40) {
# Convert epiweek to season
update_inds <- epiweek < season_start_week
epiyear <- ifelse(update_inds, epiyear - 1, epiyear)

season <- paste0(epiyear, "/", substr((epiyear + 1), 3, 4))
return(season)
}

#' get a total count of the epiweeks in a given year
#' @keywords internal
epiweeks_in_year <- function(year) {
last_week_of_year <- seq.Date(as.Date(paste0(year, "-12-24")),
as.Date(paste0(year, "-12-31")),
by = 1
)
return(max(as.numeric(MMWRweek(last_week_of_year)$MMWRweek)))

Check warning on line 103 in R/time_converters.R

View workflow job for this annotation

GitHub Actions / lint

file=R/time_converters.R,line=103,col=25,[object_usage_linter] no visible global function definition for 'MMWRweek'
}

#' get the week in a season
#' @keywords internal
convert_epiweek_to_season_week <- function(epiyear, epiweek, season_start = 40) {
season_week <- epiweek - season_start + 1

update_inds <- season_week <= 0
# last year's # of epiweeks determines which week in the season we're at at
# the beginning of the year
season_week[update_inds] <- season_week[update_inds] +
sapply(epiyear[update_inds] - 1, epiweeks_in_year)

return(season_week)
}

#' get a canonical date to represent a given epiweek
#' @keywords internal
convert_epiweek_to_date <- function(epiyear, epiweek,
day_of_week = 1) {
end_date <- MMWRweek2Date(epiyear, epiweek, day_of_week)

return(end_date)
}
12 changes: 12 additions & 0 deletions man/convert_epiweek_to_date.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

12 changes: 12 additions & 0 deletions man/convert_epiweek_to_season.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

12 changes: 12 additions & 0 deletions man/convert_epiweek_to_season_week.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

29 changes: 29 additions & 0 deletions man/convert_to_period_upsample.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

32 changes: 32 additions & 0 deletions man/daily_to_weekly.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/epiprocess.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

12 changes: 12 additions & 0 deletions man/epiweeks_in_year.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading