-
Notifications
You must be signed in to change notification settings - Fork 273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding keep_empty argument to list_c, list_cbind, and list_rbind. #1144
Adding keep_empty argument to list_c, list_cbind, and list_rbind. #1144
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this!
R/list-combine.R
Outdated
@@ -22,6 +22,7 @@ | |||
#' same size (i.e. number of rows). | |||
#' @param name_repair One of `"unique"`, `"universal"`, or `"check_unique"`. | |||
#' See [vctrs::vec_as_names()] for the meaning of these options. | |||
#' @param keep_empty An optional Logical to keep empty elements of a list as NA. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you try rewriting this using the formula "If FALSE
(the default), then ...; if TRUE
, then ...`.
R/list-combine.R
Outdated
@@ -30,17 +31,21 @@ | |||
#' | |||
#' x2 <- list( | |||
#' a = data.frame(x = 1:2), | |||
#' b = data.frame(y = "a") | |||
#' b = data.frame(y = "a"), | |||
#' c = data.frame(z = NULL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is equivalent to just data.frame()
?
R/list-combine.R
Outdated
) { | ||
check_list_of_data_frames(x) | ||
check_dots_empty() | ||
if(keep_empty) x <- convert_empty_element_to_NA(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure this applies to list_cbind()
? I don't know why but I think it shouldn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, it's unclear what type the empty column should have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also weird that for list_cbind()
the name of empty columns comes from the outer name rather than the inner name.
R/list-combine.R
Outdated
|
||
## used to convert empty elements into NA for list_binding functions | ||
convert_empty_element_to_NA = function(x) { | ||
map(x, \(x) if(vctrs::vec_is_empty(x)) NA else x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this anonymous function syntax requires R 4.1? Otherwise, this implementation looks good to me!
tests/testthat/test-list-combine.R
Outdated
df2 <- data.frame(y = 1) | ||
|
||
expect_equal(list_c(list(1, NULL, 2), keep_empty = TRUE), c(1, NA, 2)) | ||
expect_equal(list_rbind(list(df1, NULL, df1), keep_empty = TRUE), vec_rbind(df1, NA, df1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd find it a little easier to verify that this is correct if the expected value looked like data.frame(x = c(1, NA, 1)
…quires R4.1 for implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking this on!
R/list-combine.R
Outdated
) { | ||
check_list_of_data_frames(x) | ||
check_dots_empty() | ||
if(keep_empty) x <- convert_empty_element_to_NA(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, it's unclear what type the empty column should have.
@@ -58,19 +66,26 @@ list_cbind <- function( | |||
x, | |||
..., | |||
name_repair = c("unique", "universal", "check_unique"), | |||
size = NULL | |||
size = NULL, | |||
keep_empty = FALSE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DavisVaughan my intuition is that keep_empty
doesn't make sense for list_cbind()
, but I don't have anything concrete to back that up. Do you have any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropping in some initial comments while looking at this. No need to make any changes yet though.
But also see #1096 (comment). I think it is possible we made a small mistake by marking this as a TDD issue, since it can already be solved through tidyr::unchop()
in a way that I feel is much more robust for the motivating example given there (i.e. the size invariants are nicer when you use unchop()
).
@@ -22,6 +22,9 @@ | |||
#' same size (i.e. number of rows). | |||
#' @param name_repair One of `"unique"`, `"universal"`, or `"check_unique"`. | |||
#' See [vctrs::vec_as_names()] for the meaning of these options. | |||
#' @param keep_empty An optional logical. If `FALSE` (the default), then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think keep_empty
is the wrong name if double()
isn't promoted to NA_real_
and kept. It's more like keep_null
.
> list_c(list(1, double(), 2))
[1] 1 2
> list_c(list(1, double(), 2), keep_empty = T)
[1] 1 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compare with
> tidyr::unchop(tibble::tibble(x = list(1, double(), 2)), x)
# A tibble: 2 × 1
x
<dbl>
1 1
2 2
> tidyr::unchop(tibble::tibble(x = list(1, double(), 2)), x, keep_empty = TRUE)
# A tibble: 3 × 1
x
<dbl>
1 1
2 NA
3 2
list_c <- function(x, ..., ptype = NULL, keep_empty = FALSE) { | ||
vec_check_list(x) | ||
check_dots_empty() | ||
if (keep_empty) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd want to use check_bool(keep_empty)
everywhere
is_null <- map_lgl(x, is.null) | ||
x[is_null] <- list(NA) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vctrs::vec_detect_missing()
would be much faster at detecting NULL
s since we know x
is a list.
See tidyr:::list_replace_null()
for a robust and very fast version of this operation
list_cbind(list(df1, z = NULL, df2), keep_empty = TRUE), | ||
data.frame(df1, z = NA, df2) | ||
) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we decide to implement this then I'd like the chance to come through and write a lot of tests for the 3 cases individually. The keep_empty
logic in tidyr is quite hard to get 100% right, and required a lot of edge case tests.
We have decided not to implement this feature for now since:
@SokolovAnatoliy we apologize about that! We hope you had fun at TDD anyways, and it seems like you got to do some other PRs in other repos, which is awesome! Thanks again! |
Fixes #1096.