Support for Vectors in base::data.frame #61

Open
teunbrand opened this issue Jan 27, 2020 · 3 comments

@teunbrand

Following a recent discussion on the Bioconductor Slack, it appears that base R data.frames can hold S4 objects as columns. Below is a quick proof of principle:

suppressMessages(library(S4Vectors))

df <- data.frame(x = 1:6)
my_rle <- Rle(1:3, 3:1)
df$y <- my_rle

class(df$y)
#> [1] "Rle"
#> attr(,"package")
#> [1] "S4Vectors"

subsetted <- df[3:5, ]
identical(subsetted$y, my_rle[3:5])
#> [1] TRUE

Created on 2020-01-27 by the reprex package (v0.3.0)

I'm submitting this issue to serve as a discussion ground about possible ways to integrate S4Vectors derived classes more seamlessly into base R data.frames, in particular vector- and list-like classes that are linear but not rectangular in nature. Here are two ideas that would make that easier.

  1. As mentioned by @hpages on Slack, it would be neat to have format() methods so that base R data.frames print more nicely when they hold classes derived from S4Vectors, similar to how the showAsCell generic handles printing of DataFrames.

  2. The data.frame(...) constructor internally calls as.data.frame() on every element in .... As an example, the Rle class coerces/expands to a regular vector during that procedure, whereas it would also be nice if users could have an option to preserve the S4 class during data.frame construction. This would prevent having to resort to the awkward syntax in the example above. An alternative constructor that does this would fit the bill, as repurposing the as.data.frame generic probably would break a lot of existing code.

I'm sure there are other places where some compatibility with base R data.frames could be achieved, but these would be a neat start.
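
As a rough illustration of idea 1, the sketch below uses a toy S3 class (`myunits`, a hypothetical name that is not part of S4Vectors) instead of an S4 Vector. It only shows the mechanism: format.data.frame() calls format() on every column, so a column whose class has a format() method is rendered by that method. Whether an S4 method set up through setMethod() is picked up in the same way is not shown here.

```r
# Toy S3 class standing in for an S4Vectors class (hypothetical, illustration only).
x <- structure(1:3, class = "myunits")

# format() method for the toy class; `...` absorbs arguments such as
# `justify` that format.data.frame() passes along.
format.myunits <- function(x, ...) paste0(unclass(x), " kb")

df <- data.frame(a = 1:3)
df$u <- x                  # direct assignment keeps the column's class

formatted <- format(df)    # print.data.frame() formats the columns the same way
as.character(formatted$u)
```

The formatted column holds the strings produced by format.myunits(), while df$u itself keeps its original class.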

@teunbrand
Author

If I prepare a PR that adds the following functions, would it have a chance of being accepted?

  1. A format method for Vector that defaults to showAsCell. This leaves room for other classes to implement more specific format methods, while still covering a good number of the classes that inherit from Vector.
setMethod("format", "Vector",
          # `...` is needed: format.data.frame() passes extra arguments (e.g. justify)
          function(x, ...) showAsCell(x)
)
  2. A data.frame constructor that preserves the S4 class of the columns.
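as.preserved.data.frame <- function(from = NULL)
{
    # Anything that is not a DataFrame goes through the regular coercion
    if (!inherits(from, "DataFrame")) {
        return(as.data.frame(from))
    }

    # Take the columns as a plain list, without coercing them, and
    # relabel that list as a data.frame
    ans <- as.list(from, use.names = TRUE)
    class(ans) <- "data.frame"

    # Reuse existing row names, or fall back to compact automatic ones
    rn <- ROWNAMES(from)
    if (is.null(rn)) {
        rn <- .set_row_names(NROW(from))
    }

    attr(ans, "row.names") <- rn
    ans
}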

One possible problem I can imagine is that some base data.frame manipulators can get fussy about S4 columns. For example:

DF <- DataFrame(x = 1:10, y = Rle(2, 10))
df <- as.preserved.data.frame(DF)
rbind(df, df)

Error: subscript contains out-of-bounds indices

The rbind.data.frame function is so full of nested loops and uninformative variable names that I haven't been able to work out why this goes awry.

@lawremi
Collaborator

lawremi commented Mar 6, 2020

A format() method makes sense. Consider the list2DF() function in R-devel as an alternative to the as.preserved.data.frame() function. The rbind() gotcha reminds me of why we went to DataFrame in the first place.
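
For reference, a minimal base R sketch of what list2DF() does, assuming R 4.0.0 or later (the release in which the function became available). It uses a plain list column rather than an S4Vectors object, so it only illustrates that list2DF() stores each element as a column without coercing it. How it behaves with Rle or other Vector columns is not tested here.

```r
# list2DF() needs R >= 4.0.0.
# A list column whose elements have different lengths.
x <- list(character(), "A", c("B", "C"))
n <- lengths(x)

# Each list element becomes one column, stored as-is.
df <- list2DF(list(x = x, n = n))

is.data.frame(df)
is.list(df$x)    # the list column is not flattened or coerced
```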

@teunbrand
Author

You're right that list2DF() would probably be better, but I don't think it is a generic. It might be neat to mirror that function for the List class, e.g. a List2df() that just calls list2DF(as.list(x)).
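
A minimal sketch of such a wrapper, assuming R 4.0.0 or later for list2DF(). List2df() is only the name floated in this thread and does not exist in S4Vectors. The call below uses a plain base list as a stand-in, because the behaviour of as.list() on a real List object is not exercised here.

```r
# Hypothetical wrapper from this thread, not an existing S4Vectors function.
List2df <- function(x) {
    # as.list() is where a List-like object would be turned into a base list
    list2DF(as.list(x))
}

# Stand-in input: a plain base list with equal-length elements.
cols <- list(a = 1:2, b = c("x", "y"))
out <- List2df(cols)
names(out)
```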
