Slide improvements pass #477

Merged
merged 115 commits into from
Sep 27, 2024

Conversation

brookslogan commented Jun 26, 2024

Checklist

Please:

  • Make sure this PR is against "dev", not "main" (unless this is a release
    PR).
  • Request a review from one of the current main reviewers:
    brookslogan, nmdefries.
  • Make sure to bump the version number in DESCRIPTION. Always increment
    the patch version number (the third number), unless you are making a
    release PR from dev to main, in which case increment the minor version
    number (the second number).
  • Describe changes made in NEWS.md, making sure breaking changes
    (backwards-incompatible changes to the documented interface) are noted.
    Collect the changes under the next release number (e.g. if you are on
    1.7.2, then write your changes under the 1.8 heading).
  • See DEVELOPMENT.md for more information on the development
    process.

Change explanations for reviewer

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

brookslogan commented Jun 26, 2024

(Random implementation notes: I also investigated an as_slide_computations() approach that would give a length>=1 list of functions to apply in the tidyeval case and length-1 lists in the function and formula cases. Pros of that way include reducing code duplication/complexity/divergence when validating computation outputs [though perhaps we do want some differences, e.g., regarding interpretation of NULL (removing column vs. a missing element in an unnamed list of lists?)], [maybe reducing some naming and data-frame-combining-by-name overhead,] and having more freedom if we want to try to detect and optimize some slide computations by transforming them to calls to (internals of) epi_slide_opt() (as dplyr grouped_df's already do or did at some point). Cons of that way are that we may introduce some computational overhead in the function and formula cases (setting up, storing, munging results_envs --- this is assuming the loop over computations has been hoisted above the group and/or ref time values loops to enable other optimizations; if that's not the case, then there's the overhead of repeatedly executing a loop over 1 function, which might actually be relevant if you're doing an epi_slide() with an unrecognized simple function) + need for yet another argument (.data / results, not just .x --- either added user-side or via transformations which also have time/debuggability overhead) or code complexity to avoid this.)
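
For readers following along, here is a minimal sketch of what such a list-returning helper might look like. It is not the code that was investigated or anything in epiprocess: the name as_slide_computations_sketch() and its behavior are assumptions for illustration, built only on rlang's enquos(), as_function(), and eval_tidy().

```r
library(rlang)

# Hypothetical helper (illustration only, not part of epiprocess): normalize a
# slide computation into a list of functions that each take the window data `.x`.
as_slide_computations_sketch <- function(.f, ...) {
  if (!missing(.f)) {
    # Function or formula case: always a length-1 list.
    return(list(as_function(.f)))
  }
  # Tidyeval case: one function per captured expression (length >= 1).
  quos <- enquos(...)
  lapply(quos, function(quo) {
    force(quo)
    function(.x) {
      # Expose the window's columns and the window itself (as `.x`) to the expression.
      eval_tidy(quo, data = c(as.list(.x), list(.x = .x)))
    }
  })
}

# Usage: apply every computation to a single window of data.
window <- data.frame(cases = c(1, 2, 3))
comps <- as_slide_computations_sketch(cases_sum = sum(.x$cases), n = nrow(.x))
results <- lapply(comps, function(comp) comp(window))
```

With this shape, output validation could in principle run over one list regardless of how the computation was specified, at the cost of looping over a single function in the function and formula cases.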

@brookslogan

@dshemetov I've updated epi_slide() with some of the changes I had planned, though it's probably harder to mess around with than an epix_slide() due to size rules / broadcasting. Maybe you could play around with the new behavior and see if it's desirable + take a look and see if named-list handling could fit in nicely? For the latter, you might need to modify: 1. epi_slide() around the slide_values_list validation, and potentially 2. as_slide_computation() in the tidyeval handling.

brookslogan and others added 2 commits July 29, 2024 17:05
- Correct check for whether data masking was used
- Update checks and error messages for currently-accepted kinds of objects
- Make some variable naming more consistent between files
brookslogan force-pushed the lcb/slide-improvements-2024-06 branch from f0d2d62 to a1c7020 Compare July 30, 2024 05:21

dshemetov commented Sep 20, 2024

Performance shows a 3-4x slowdown, due to validate_tibble and as_environment in as_slide_computation. To be addressed later.


Perf comparisons of dev against this branch (Branch A and Branch B, respectively): profvis_slide_changes.zip, run with:

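# Profile
# Packages providing %>%, group_by(), epi_slide(), and the example dataset
library(dplyr)
library(epiprocess)

# Branch A
# epi_slide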
p <- profvis::profvis({
  jhu_csse_county_level_subset %>%
    group_by(geo_value) %>%
    epi_slide(cases_sum5 = sum(.x$cases), before = 5)
})
now <- format(Sys.time(), "%Y-%m-%d %H_%M_%S %Z")
htmlwidgets::saveWidget(p, glue::glue("profvis_{now}.html"), selfcontained = TRUE)

# Branch B
# epi_slide
p <- profvis::profvis({
  jhu_csse_county_level_subset %>% epi_slide(cases_sum5 = sum(.x$cases), .window_size = 5)
})
now <- format(Sys.time(), "%Y-%m-%d %H_%M_%S %Z")
htmlwidgets::saveWidget(p, glue::glue("profvis_{now}.html"), selfcontained = TRUE)

brookslogan mentioned this pull request Sep 26, 2024
dshemetov force-pushed the lcb/slide-improvements-2024-06 branch 2 times, most recently from a1948bb to 5ce988b Compare September 26, 2024 19:40
dshemetov force-pushed the lcb/slide-improvements-2024-06 branch 3 times, most recently from 41263d3 to e3dfa32 Compare September 26, 2024 20:52
dshemetov force-pushed the lcb/slide-improvements-2024-06 branch 2 times, most recently from cc3cb79 to a3e805d Compare September 26, 2024 21:35
* key_colnames order change
* replace kill_time_value with exclude arg in key_colnames
* move duplicate time_values check in epi_slide

dshemetov left a comment

Good to go. Will merge to main after this. epipredict PR will follow soon after.
