Adding keep_empty argument to list_c, list_cbind, and list_rbind. #1144

SokolovAnatoliy · 2024-08-15T21:21:23Z

hadley

Thanks for working on this!

hadley · 2024-08-15T21:31:36Z

R/list-combine.R

@@ -22,6 +22,7 @@
 #'   same size (i.e. number of rows).
 #' @param name_repair One of `"unique"`, `"universal"`, or `"check_unique"`.
 #'   See [vctrs::vec_as_names()] for the meaning of these options.
+#' @param keep_empty An optional Logical to keep empty elements of a list as NA.


Could you try rewriting this using the formula "If FALSE (the default), then ...; if TRUE, then ..."?

hadley · 2024-08-15T21:32:06Z

R/list-combine.R

@@ -30,17 +31,21 @@
 #'
 #' x2 <- list(
 #'   a = data.frame(x = 1:2),
-#'   b = data.frame(y = "a")
+#'   b = data.frame(y = "a"),
+#'   c = data.frame(z = NULL)


I think this is equivalent to just data.frame()?
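A quick way to check that claim in base R, as a minimal sketch (the variable names here are illustrative and not part of the PR): both calls should produce a data frame with zero rows and zero columns, because `data.frame()` drops `NULL` arguments.

```r
# data.frame() drops NULL arguments, so a lone `z = NULL` leaves no columns.
empty_named <- data.frame(z = NULL)
empty_plain <- data.frame()

# Compare the dimensions of the two results.
dim(empty_named)
dim(empty_plain)
```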

hadley · 2024-08-15T21:32:54Z

R/list-combine.R

  ) {
  check_list_of_data_frames(x)
  check_dots_empty()
+  if(keep_empty) x <- convert_empty_element_to_NA(x)


Are you sure this applies to list_cbind()? I don't know why but I think it shouldn't?

To me, it's unclear what type the empty column should have.

It's also weird that for list_cbind() the name of empty columns comes from the outer name rather than the inner name.
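Both concerns can be illustrated with the expected value used in the PR's test for list_cbind(). This is only a sketch in base R: `df1 <- data.frame(x = 1)` is an assumption (its definition is not shown in the diff), while `df2` matches the test file.

```r
df1 <- data.frame(x = 1)  # assumed definition, not shown in the diff
df2 <- data.frame(y = 1)  # as defined in the PR's test file

# The expected value from the list_cbind() test in this PR.
expected <- data.frame(df1, z = NA, df2)

# "x" and "y" are inner (column) names; "z" is the outer (list element) name.
names(expected)

# A bare NA is logical, so the placeholder column has no principled type.
class(expected$z)
```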

hadley · 2024-08-15T21:34:14Z

R/list-combine.R

+
+## used to convert empty elements into NA for list_binding functions
+convert_empty_element_to_NA = function(x) {
+  map(x, \(x) if(vctrs::vec_is_empty(x)) NA else x)


I think this anonymous function syntax requires R 4.1? Otherwise, this implementation looks good to me!
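For reference, the `\(x)` shorthand was introduced in R 4.1.0; writing `function(x)` avoids that requirement. A minimal base-R sketch of the same helper follows. The name `convert_empty_sketch` is hypothetical, and `length(el) == 0` is a simplification standing in for `vctrs::vec_is_empty()`, which measures data frames by rows rather than columns.

```r
# Hypothetical variant of the PR's helper that runs on R < 4.1:
# `function(el)` replaces the `\(el)` lambda shorthand.
convert_empty_sketch <- function(x) {
  # Simplification: length() == 0 is used instead of vctrs::vec_is_empty().
  lapply(x, function(el) if (length(el) == 0) NA else el)
}

result <- convert_empty_sketch(list(1, NULL, 2))
str(result)
```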

hadley · 2024-08-15T21:35:19Z

tests/testthat/test-list-combine.R

+  df2 <- data.frame(y = 1)
+
+  expect_equal(list_c(list(1, NULL, 2), keep_empty = TRUE), c(1, NA, 2))
+  expect_equal(list_rbind(list(df1, NULL, df1), keep_empty = TRUE), vec_rbind(df1, NA, df1))


I think I'd find it a little easier to verify that this is correct if the expected value looked like data.frame(x = c(1, NA, 1)).

krlmlr

Thanks for taking this on!

krlmlr · 2024-08-15T23:34:40Z

R/list-combine.R

  ) {
  check_list_of_data_frames(x)
  check_dots_empty()
+  if(keep_empty) x <- convert_empty_element_to_NA(x)


To me, it's unclear what type the empty column should have.

hadley · 2024-08-21T13:26:47Z

R/list-combine.R

@@ -58,19 +66,26 @@ list_cbind <- function(
    x,
    ...,
    name_repair = c("unique", "universal", "check_unique"),
-    size = NULL
+    size = NULL,
+    keep_empty = FALSE


@DavisVaughan my intuition is that keep_empty doesn't make sense for list_cbind(), but I don't have anything concrete to back that up. Do you have any thoughts?

DavisVaughan

Dropping in some initial comments while looking at this. No need to make any changes yet though.

But also see #1096 (comment). I think it is possible we made a small mistake by marking this as a TDD issue, since it can already be solved through tidyr::unchop() in a way that I feel is much more robust for the motivating example given there (i.e. the size invariants are nicer when you use unchop()).

DavisVaughan · 2024-08-28T14:10:38Z

R/list-combine.R

@@ -22,6 +22,9 @@
 #'   same size (i.e. number of rows).
 #' @param name_repair One of `"unique"`, `"universal"`, or `"check_unique"`.
 #'   See [vctrs::vec_as_names()] for the meaning of these options.
+#' @param keep_empty  An optional logical. If `FALSE` (the default), then


I think keep_empty is the wrong name if double() isn't promoted to NA_real_ and kept. It's more like keep_null.

> list_c(list(1, double(), 2))
[1] 1 2
> list_c(list(1, double(), 2), keep_empty = T)
[1] 1 2

Compare with

> tidyr::unchop(tibble::tibble(x = list(1, double(), 2)), x)
# A tibble: 2 × 1
      x
  <dbl>
1     1
2     2
> tidyr::unchop(tibble::tibble(x = list(1, double(), 2)), x, keep_empty = TRUE)
# A tibble: 3 × 1
      x
  <dbl>
1     1
2    NA
3     2

DavisVaughan · 2024-08-28T14:24:15Z

R/list-combine.R

+list_c <- function(x, ..., ptype = NULL, keep_empty = FALSE) {
  vec_check_list(x)
  check_dots_empty()
+  if (keep_empty) {


We'd want to use check_bool(keep_empty) everywhere.
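As a rough illustration of what such a check guards against, here is a base-R sketch. `check_bool_sketch()` is a hypothetical stand-in, not the helper purrr actually uses, and its error message is an assumption.

```r
# Hypothetical stand-in for a check_bool()-style validator:
# accepts only a single, non-missing TRUE or FALSE.
check_bool_sketch <- function(x, arg = "keep_empty") {
  ok <- is.logical(x) && length(x) == 1L && !is.na(x)
  if (!ok) {
    stop(sprintf("`%s` must be `TRUE` or `FALSE`.", arg), call. = FALSE)
  }
  invisible(NULL)
}

check_bool_sketch(TRUE)                      # passes silently
try(check_bool_sketch(NA), silent = TRUE)    # would error: missing value
try(check_bool_sketch("yes"), silent = TRUE) # would error: not logical
```

Without a check like this, `if (keep_empty)` would fail with a less helpful error for `NA` or a length-2 vector.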

DavisVaughan · 2024-08-28T14:43:12Z

R/list-combine.R

+  is_null <- map_lgl(x, is.null)
+  x[is_null] <- list(NA)


vctrs::vec_detect_missing() would be much faster at detecting NULLs since we know x is a list.

See tidyr:::list_replace_null() for a robust and very fast version of this operation.
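A minimal sketch of that suggestion, assuming the vctrs package is installed. `replace_null_sketch()` is a hypothetical name and is not the tidyr internal mentioned above; it relies on `vctrs::vec_detect_missing()` treating `NULL` as the missing element of a list.

```r
library(vctrs)

# Hypothetical helper: replace NULL elements of a list with `value`.
replace_null_sketch <- function(x, value = NA) {
  stopifnot(is.list(x))
  # For lists, vec_detect_missing() flags exactly the NULL elements.
  is_null <- vec_detect_missing(x)
  x[is_null] <- list(value)
  x
}

# Only the NULL is replaced; the zero-length double() is left untouched.
out <- replace_null_sketch(list(1, NULL, double(), 2))
str(out)
```

Because the empty `double()` is not replaced, this also shows why `keep_null` might describe the behaviour more accurately than `keep_empty`.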

DavisVaughan · 2024-08-28T14:45:12Z

tests/testthat/test-list-combine.R

+    list_cbind(list(df1, z = NULL, df2), keep_empty = TRUE),
+    data.frame(df1, z = NA, df2)
+  )
+})


If we decide to implement this then I'd like the chance to come through and write a lot of tests for the 3 cases individually. The keep_empty logic in tidyr is quite hard to get 100% right, and required a lot of edge case tests.

DavisVaughan · 2024-08-28T14:55:31Z

We have decided not to implement this feature for now since:

- tidyr::unchop() solves the problem already in a very robust way (see the comment on #1096, "FR: New keep_empty = FALSE argument to list_c() and list_rbind()")
- We can't seem to justify the argument for list_cbind(), suggesting something feels off with putting it here in purrr

@SokolovAnatoliy we apologize about that! We hope you had fun at TDD anyways, and it seems like you got to do some other PRs in other repos, which is awesome! Thanks again!

Adding keep_empty argument to list_c, list_cbind, and list_rbind. Fixes tidyverse#1096. (79cc196)

hadley reviewed Aug 15, 2024

Updating documentation, examples and tests for Fix 1096. No longer requires R4.1 for implementation. (53881b3)

krlmlr reviewed Aug 15, 2024

mgirlich reviewed Aug 16, 2024

hadley and others added 2 commits August 21, 2024 08:19

Apply suggestions from code review (c732e0e)

Polishing (47ced5c)

hadley reviewed Aug 21, 2024

Merge commit '581793305286b4a92b10013afead9804174cfeeb' (f726831)

DavisVaughan reviewed Aug 28, 2024

DavisVaughan closed this Aug 28, 2024

DavisVaughan mentioned this pull request Aug 28, 2024

FR: New keep_empty = FALSE argument to list_c() and list_rbind() #1096 (Closed)
