Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More consistency in methods to convert Series/DataFrame to R #921

Open
etiennebacher opened this issue Mar 15, 2024 · 3 comments
Open

More consistency in methods to convert Series/DataFrame to R #921

etiennebacher opened this issue Mar 15, 2024 · 3 comments
Labels
enhancement New feature or request
Milestone

Comments

@etiennebacher
Copy link
Collaborator

etiennebacher commented Mar 15, 2024

Currently this is what we have for DataFrame and Series:

library(polars)

options(polars.do_not_repeat_call = TRUE)

####### Series

series_vec = pl$Series(letters[1:3])

# output depends on datatype
series_vec$to_r()
#> [1] "a" "b" "c"

# identical
series_vec$to_r_vector() 
#> [1] "a" "b" "c"
series_vec$to_vector() 
#> [1] "a" "b" "c"

# not identical
series_vec$to_r_list() 
#> [[1]]
#> [1] "a"
#> 
#> [[2]]
#> [1] "b"
#> 
#> [[3]]
#> [1] "c"
series_vec$to_list() 
#> Error: Execution halted with the following contexts
#>    1: $ - syntax error: to_list is not a method/attribute of the class RPolarsSeries


####### DataFrame

df = pl$DataFrame(x = letters[1:3])

# output depends on datatype
df$to_r()
#> Error: Execution halted with the following contexts
#>    1: $ - syntax error: to_r is not a method/attribute of the class RPolarsDataFrame

# not identical
df$to_r_data_frame() 
#> Error: Execution halted with the following contexts
#>    1: $ - syntax error: to_r_data_frame is not a method/attribute of the class RPolarsDataFrame
df$to_data_frame() 
#>   x
#> 1 a
#> 2 b
#> 3 c

# not identical
df$to_r_list() 
#> Error: Execution halted with the following contexts
#>    1: $ - syntax error: to_r_list is not a method/attribute of the class RPolarsDataFrame
df$to_list()
#> $x
#> [1] "a" "b" "c"

I think we should have more consistency in the names of the methods:

  • remove the _r_ in the names of Series methods, so that we have to_r(), to_list() and to_vector(). That would be more consistent with to_data_frame() and to_list() that we have for DataFrame.
  • I'm even wondering if we should have to_r() for Series. When we use it we don't know the class of the output and I'm not sure we should allow that.

@eitsupi what do you think? Also, @grantmcdermott in case you want to participate (mostly about the second point)

@eitsupi
Copy link
Collaborator

eitsupi commented Mar 15, 2024

Agree.
I don't see the benefit of having more than one, so I think we need to remove some of them and encourage users to use the S3 method e.g. as.vector(), as.data.frame() instead of $to_vector(), $to_data_frame(). (i.e. should not use them on the document)

I really dislike not knowing at first glance whether the $to_data_frame() will be an RPolarsDataFrame or a data.frame, so I actually don't even have a problem with all these methods being private.

@eitsupi
Copy link
Collaborator

eitsupi commented Mar 19, 2024

Another thing I feel is does it make sense for <DataFrame>$to_list() to return "list of vectors" instead of "list of Serieses"?

@eitsupi
Copy link
Collaborator

eitsupi commented Mar 30, 2024

Another thing I feel is does it make sense for <DataFrame>$to_list() to return "list of vectors" instead of "list of Serieses"?

I noticed that there is already a method in DetaFrame called get_columns() that returns a list of Series.
https://docs.rs/polars/0.38.3/polars/frame/struct.DataFrame.html#method.get_columns

The name to_list() may not be consistent...

@eitsupi eitsupi added this to the Rewrite milestone Jun 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants