Do we have a way to create object and struct with classic R functions? #1012

Closed
etiennebacher opened this issue Apr 2, 2024 · 8 comments · Fixed by #1014
Labels
documentation Improvements or additions to documentation

Comments

@etiennebacher
Collaborator

etiennebacher commented Apr 2, 2024

I don't think we have a way to create Object and Struct Series from our standard c() and list(), but maybe I'm missing something?

It would be good to have a small table in the docs to show the R equivalent (if any) of each of these:

>>> pl.Series(values=[1])
shape: (1,)
Series: '' [i64]
[
        1
]

> pl$Series(values = 1)
polars Series: shape: (1,)
Series: '' [f64]
[
	1.0
]
>>> pl.Series(values=[[1]])
shape: (1,)
Series: '' [list[i64]]
[
        [1]
]

> pl$Series(values = list(1))
polars Series: shape: (1,)
Series: '' [list[f64]]
[
	[1.0]
]
>>> pl.Series(values=[{1}])
shape: (1,)
Series: '' [o][object]
[
        {1}
]

???
>>> pl.Series(values=[{"a": 1}])
shape: (1,)
Series: '' [struct[1]]
[
        {1}
]

???
@etiennebacher etiennebacher added the documentation Improvements or additions to documentation label Apr 2, 2024
@eitsupi
Collaborator

eitsupi commented Apr 2, 2024

Are you looking for pl$Series(values = data.frame(a = 1))?

@eitsupi
Collaborator

eitsupi commented Apr 2, 2024

IIUC, the object type is Python-specific, not a real Apache Arrow type (so we don't support it).

@etiennebacher
Collaborator Author

etiennebacher commented Apr 2, 2024

Are you looking for pl$Series(values = data.frame(a = 1))?

This gives the same result as passing a list:

> pl$Series(values = data.frame(a = 1))
polars Series: shape: (1,)
Series: '' [list[f64]]
[
	[1.0]
]
> pl$Series(values = list(a = 1))
polars Series: shape: (1,)
Series: '' [list[f64]]
[
	[1.0]
]

@eitsupi
Collaborator

eitsupi commented Apr 2, 2024

Oh, sorry. This is the one.

r-polars/R/as_polars.R

Lines 367 to 371 in 3c0d0ec

#' @rdname as_polars_series
#' @export
as_polars_series.data.frame = function(x, name = NULL, ...) {
  pl$DataFrame(unclass(x))$to_struct(name = name)
}

@eitsupi
Collaborator

eitsupi commented Apr 6, 2024

Can we close this now that #1015 has been merged?
As I commented, the Object type is for storing Python objects, so I don't see the point in supporting it here.
(Since R's list can contain a variety of things, we can always use a base R data.frame if we want to store something that is not supported by Apache Arrow.)

@etiennebacher
Collaborator Author

As I commented, the Object type is for storing Python objects, so I don't see the point in supporting it here.

That's something worth mentioning in the docs, I think. I'll add that in #1014 and close this issue with that PR.

@etiennebacher
Collaborator Author

etiennebacher commented Apr 10, 2024

Actually, it's hard to construct a Struct Series:

>>> pl.Series([{"a": 1, "b": ["x", "y"]}, {"a": 2, "b": ["z"]}])
shape: (2,)
Series: '' [struct[2]]
[
        {1,["x", "y"]}
        {2,["z"]}
]
as_polars_series(
  data.frame(a = 1:2, b = list(c("x", "y"), "z"))
)

polars Series: shape: (2,)
Series: '' [struct[3]]
[
	{1,"x","z"}
	{2,"y","z"}
]

And it doesn't work for DataFrame:

pl$DataFrame(
  data.frame(a = 1)
)

shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 1.0 │
└─────┘

Maybe we should say that we can't reliably create a Struct from scratch and point towards $to_struct() instead.

@eitsupi
Collaborator

eitsupi commented Apr 10, 2024

Actually it's hard to construct Struct for Series:

With data.frame(), we should wrap the list in I() to create a list type column; otherwise data.frame() expands each list element into its own column.
Alternatively, we can use tibble::tibble() or data.table::data.table() instead.

polars::as_polars_series(
  data.frame(a = 1:2, b = list(c("x", "y"), "z"))
)
#> polars Series: shape: (2,)
#> Series: '' [struct[3]]
#> [
#>  {1,"x","z"}
#>  {2,"y","z"}
#> ]

polars::as_polars_series(
  data.frame(a = 1:2, b = I(list(c("x", "y"), "z")))
)
#> polars Series: shape: (2,)
#> Series: '' [struct[2]]
#> [
#>  {1,["x", "y"]}
#>  {2,["z"]}
#> ]

polars::as_polars_series(
  tibble::tibble(a = 1:2, b = list(c("x", "y"), "z"))
)
#> polars Series: shape: (2,)
#> Series: '' [struct[2]]
#> [
#>  {1,["x", "y"]}
#>  {2,["z"]}
#> ]

polars::as_polars_series(
  data.table::data.table(a = 1:2, b = list(c("x", "y"), "z"))
)
#> polars Series: shape: (2,)
#> Series: '' [struct[2]]
#> [
#>  {1,["x", "y"]}
#>  {2,["z"]}
#> ]

Created on 2024-04-10 with reprex v2.1.0
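
The difference between the first two calls above comes from base R rather than polars. A minimal sketch that needs only base R (the variable names `expanded` and `kept` are illustrative, not from the package):

```r
# Without I(), data.frame() treats the list as a set of columns:
# each list element becomes its own column, recycled to the row count.
expanded <- data.frame(a = 1:2, b = list(c("x", "y"), "z"))
ncol(expanded)   # 3 columns: a, plus one per list element

# With I(), the list is kept "as is" as a single list column,
# with one list element per row.
kept <- data.frame(a = 1:2, b = I(list(c("x", "y"), "z")))
ncol(kept)       # 2 columns: a and b
lengths(kept$b)  # element lengths per row: 2 and 1
```

This matches the struct[3] versus struct[2] results shown in the reprex: the struct gets one field per column of the input data.frame.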

And it doesn't work for DataFrame:

pl$DataFrame() works like as_polars_df() when it receives a data.frame.
(I think this behavior is worth removing because I find it confusing, but the point is that data.frame() works the same way, and in Python, polars.DataFrame.__init__() will convert a pandas.DataFrame to a polars.DataFrame, so this is consistent behavior.)

polars::pl$DataFrame(data.frame(a = 1))
#> shape: (1, 1)
#> ┌─────┐
#> │ a   │
#> │ --- │
#> │ f64 │
#> ╞═════╡
#> │ 1.0 │
#> └─────┘
polars::pl$DataFrame(a = data.frame(a = 1))
#> shape: (1, 1)
#> ┌───────────┐
#> │ a         │
#> │ ---       │
#> │ struct[1] │
#> ╞═══════════╡
#> │ {1.0}     │
#> └───────────┘

Created on 2024-04-10 with reprex v2.1.0
