Polars Expression plugins for R #1024

Open
eitsupi opened this issue Apr 11, 2024 · 5 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@eitsupi
Collaborator

eitsupi commented Apr 11, 2024

We need:

  1. A mechanism for registering subnamespaces from outside the package, similar to https://docs.pola.rs/py-polars/html/reference/api.html
  2. A Rust crate similar to https://github.com/pola-rs/pyo3-polars
eitsupi added the enhancement (New feature or request) label on Apr 11, 2024
@eitsupi
Collaborator Author

eitsupi commented Apr 13, 2024

Note: the serialization and deserialization functions for R objects that may be needed are already defined here (I'm not sure whether they are sufficient):

// Serialize an arbitrary R object to bytes by calling R's `serialize(robj, NULL)`.
pub fn serialize_robj(robj: Robj) -> RResult<Vec<u8>> {
    call!("serialize", &robj, NULL)
        .map_err(RPolarsErr::from)
        .bad_robj(&robj)
        .when("serializing an R object")?
        .as_raw_slice()
        .ok_or(RPolarsErr::new())
        .bad_robj(&robj)
        .when("accessing raw bytes of a serialized R object")
        .map(|bits| bits.to_vec())
}

// Rebuild an R object from bytes produced by `serialize_robj`, via R's `unserialize()`.
pub fn deserialize_robj(bits: Vec<u8>) -> RResult<Robj> {
    call!("unserialize", &bits)
        .map_err(RPolarsErr::from)
        .bad_val(rdbg(bits))
        .when("deserializing an R object")
}

// Write a DataFrame to an in-memory buffer in Arrow IPC format.
pub fn serialize_dataframe(dataframe: &mut polars::prelude::DataFrame) -> RResult<Vec<u8>> {
    use polars::io::SerWriter;
    let mut dump = Vec::new();
    polars::io::ipc::IpcWriter::new(&mut dump)
        .finish(dataframe)
        .map_err(polars_to_rpolars_err)?;
    Ok(dump)
}

// Read a DataFrame back from an Arrow IPC buffer.
pub fn deserialize_dataframe(bits: &[u8]) -> RResult<polars::prelude::DataFrame> {
    use polars::io::SerReader;
    polars::io::ipc::IpcReader::new(std::io::Cursor::new(bits))
        .finish()
        .map_err(polars_to_rpolars_err)
}

// A Series is serialized as a single-column DataFrame.
pub fn serialize_series(series: PSeries) -> RResult<Vec<u8>> {
    serialize_dataframe(&mut std::iter::once(series).collect())
}

// Deserialize a DataFrame and require that it contains exactly one column.
pub fn deserialize_series(bits: &[u8]) -> RResult<PSeries> {
    let tn = std::any::type_name::<PSeries>();
    deserialize_dataframe(bits)?
        .get_columns()
        .split_first()
        .ok_or(RPolarsErr::new())
        .mistyped(tn)
        .and_then(|(s, r)| {
            r.is_empty()
                .then_some(s.clone())
                .ok_or(RPolarsErr::new())
                .mistyped(tn)
        })
}
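The `split_first` / `is_empty` chain in `deserialize_series` accepts the decoded frame only if it holds exactly one column. A minimal std-only sketch of that check, assuming a plain slice stands in for the frame's columns; `single_column` is a hypothetical helper name, not part of r-polars:

```rust
/// Return the only element of `columns`, or `None` if there are zero
/// or more than one (the same rule `deserialize_series` applies).
fn single_column<T: Clone>(columns: &[T]) -> Option<T> {
    columns
        .split_first()
        // `rest` must be empty, otherwise the input had extra columns.
        .and_then(|(first, rest)| rest.is_empty().then(|| first.clone()))
}
```

Returning `Option` keeps the sketch independent of r-polars' error types; the original maps the `None` case into `RPolarsErr` with `.mistyped(tn)`.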

@eitsupi
Collaborator Author

eitsupi commented Jun 16, 2024

> 1. Mechanism for registering subnamespaces from outside the package something like docs.pola.rs/py-polars/html/reference/api.html

I was able to make this work in an implementation that I am rewriting from scratch using py-polars as a reference.
https://github.com/eitsupi/neo-r-polars/blob/afac2ae8020e4dbe3d02f7515653a574283b577a/man/polars_api_register_series_namespace.Rd#L20-L44

# s: polars series
math_shortcuts <- function(s) {
  # Create a new environment to store the methods
  self <- new.env(parent = emptyenv())

  # Store the series
  self$`_s` <- s

  # Add methods
  self$square <- function() self$`_s` * self$`_s`
  self$cube <- function() self$`_s` * self$`_s` * self$`_s`

  # Set the class
  class(self) <- "polars_namespace_series"

  # Return the environment
  self
}

polars_api_register_series_namespace("math", math_shortcuts)

s <- as_polars_series(c(1.5, 31, 42, 64.5))
s$math$square()$rename("s^2")

s <- as_polars_series(1:5)
s$math$cube()$rename("s^3")
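For comparison, the same registration idea can be sketched in Rust: a registry maps a namespace name to a constructor, and accessing the namespace builds it around the series, roughly what `polars_api_register_series_namespace()` and `s$math` do above. This is a hypothetical sketch; `Series` is only a `Vec<f64>` stand-in, and `Registry`, `Namespace`, and `MathShortcuts` are illustrative names, not r-polars or Polars APIs.

```rust
use std::collections::HashMap;

// Stand-in for a Polars Series (assumption: a plain vector of f64 values).
type Series = Vec<f64>;

// A namespace object exposes zero-argument methods by name.
trait Namespace {
    fn call(&self, method: &str) -> Option<Series>;
}

// Rust analogue of the `math_shortcuts` environment in the R example.
struct MathShortcuts {
    s: Series,
}

impl Namespace for MathShortcuts {
    fn call(&self, method: &str) -> Option<Series> {
        match method {
            "square" => Some(self.s.iter().map(|x| x * x).collect()),
            "cube" => Some(self.s.iter().map(|x| x * x * x).collect()),
            _ => None,
        }
    }
}

// Constructor that wraps a series in a namespace object.
type Constructor = fn(Series) -> Box<dyn Namespace>;

#[derive(Default)]
struct Registry {
    namespaces: HashMap<String, Constructor>,
}

impl Registry {
    // Register a namespace constructor under `name`.
    fn register(&mut self, name: &str, ctor: Constructor) {
        self.namespaces.insert(name.to_string(), ctor);
    }

    // Build the namespace for `s`, like `s$math` in the R example.
    fn get(&self, name: &str, s: Series) -> Option<Box<dyn Namespace>> {
        self.namespaces.get(name).map(|ctor| ctor(s))
    }
}

fn math_shortcuts(s: Series) -> Box<dyn Namespace> {
    Box::new(MathShortcuts { s })
}
```

Usage mirrors the R code: `reg.register("math", math_shortcuts)` once, then `reg.get("math", series)` followed by `.call("square")`.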

My current concern is performance degradation caused by the frequent for loops (essentially one per method call).
As I understand it, the current r-polars implementation registers all active bindings and methods once, when the package is installed, whereas the new implementation registers methods every time an R class instance is constructed, which would degrade performance (though if the cost is acceptable, that's no problem).
https://github.com/eitsupi/neo-r-polars/blob/afac2ae8020e4dbe3d02f7515653a574283b577a/R/series-series.R#L7-L31
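The trade-off is whether the method table is built once and shared or rebuilt for every instance. A hypothetical Rust sketch of the "build once" option, assuming `std::sync::OnceLock` (Rust 1.70+) as the caching mechanism; the table is constructed on first use and reused afterwards, and the `BUILDS` counter exists only to show how often construction runs. None of these names come from r-polars.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// A method takes the instance's values and returns a new series.
type Method = fn(&[f64]) -> Vec<f64>;

// Counts how many times a method table has been built (illustration only).
static BUILDS: AtomicUsize = AtomicUsize::new(0);

fn square(s: &[f64]) -> Vec<f64> {
    s.iter().map(|x| x * x).collect()
}

fn cube(s: &[f64]) -> Vec<f64> {
    s.iter().map(|x| x * x * x).collect()
}

// Builds the name -> method table; this is the work we want to do only once.
fn build_methods() -> HashMap<&'static str, Method> {
    BUILDS.fetch_add(1, Ordering::SeqCst);
    let mut table: HashMap<&'static str, Method> = HashMap::new();
    table.insert("square", square);
    table.insert("cube", cube);
    table
}

// Shared table: built on first access, then reused by every instance.
fn shared_methods() -> &'static HashMap<&'static str, Method> {
    static TABLE: OnceLock<HashMap<&'static str, Method>> = OnceLock::new();
    TABLE.get_or_init(build_methods)
}

// Instances hold only their data; methods are looked up in the shared table.
struct Instance {
    values: Vec<f64>,
}

impl Instance {
    fn call(&self, name: &str) -> Option<Vec<f64>> {
        shared_methods().get(name).map(|f| f(self.values.as_slice()))
    }
}
```

Creating many `Instance` values costs nothing extra here, because per-instance construction no longer touches the method table.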

@eitsupi
Collaborator Author

eitsupi commented Jun 23, 2024

I looked into this, and it appears that Polars implements expression plugins by loading a dynamic library at runtime with the libloading crate.
https://docs.rs/libloading/latest/libloading/
https://github.com/pola-rs/polars/blob/5cad69e5d4af47e75ae0abbf88dc2bafbc8f66d2/crates/polars-plan/src/dsl/function_expr/plugin.rs#L5
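Conceptually, the host resolves an exported symbol by name and then calls it through the C ABI. Because libloading is an external crate, this std-only sketch replaces the dynamic library's symbol table with a static lookup function; the symbol name `plugin_double` and its signature are hypothetical and are not the actual Polars plugin ABI.

```rust
// Signature every (hypothetical) plugin symbol must have: C calling
// convention, input pointer + length, and an output buffer of the same length.
type PluginFn = unsafe extern "C" fn(input: *const f64, len: usize, out: *mut f64);

// A function a plugin might export. In a real shared library it would also
// need an unmangled symbol name; here it is just a local function.
unsafe extern "C" fn plugin_double(input: *const f64, len: usize, out: *mut f64) {
    unsafe {
        let src = std::slice::from_raw_parts(input, len);
        let dst = std::slice::from_raw_parts_mut(out, len);
        for (d, s) in dst.iter_mut().zip(src) {
            *d = s * 2.0;
        }
    }
}

// Stand-in for looking a symbol up in a loaded library by name.
fn lookup(symbol: &str) -> Option<PluginFn> {
    match symbol {
        "plugin_double" => Some(plugin_double as PluginFn),
        _ => None,
    }
}

// Host side: resolve the symbol, then call it across the C ABI.
fn call_plugin(symbol: &str, input: &[f64]) -> Option<Vec<f64>> {
    let f = lookup(symbol)?;
    let mut out = vec![0.0; input.len()];
    // Safety: both buffers are valid for `input.len()` elements.
    unsafe { f(input.as_ptr(), input.len(), out.as_mut_ptr()) };
    Some(out)
}
```

With libloading, `lookup` would become a symbol lookup on a library opened from a path; the call site through the `extern "C"` function pointer stays the same.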

For R packages, rustc builds a static library rather than a dynamic one.
The dynamic library is then built by R.

We need a way to generate the C ABI that Polars expects on the plugin side, but this is beyond my knowledge.
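On the Rust side, exposing a C ABI usually means three things: an unmangled symbol name (`no_mangle`), the C calling convention (`extern "C"`), and only `#[repr(C)]` types in the signature. A hypothetical sketch; `SliceF64`, `plugin_abi_version`, and `plugin_sum` are illustrative names, not what Polars expects, and whether such symbols are still exported from the shared object that R builds from the static library is the open question here. The `unsafe(...)` attribute wrapper is required in edition 2024 and accepted since Rust 1.82; older code writes `#[no_mangle]`.

```rust
/// A borrowed f64 buffer with a C-compatible memory layout.
#[repr(C)]
pub struct SliceF64 {
    pub ptr: *const f64,
    pub len: usize,
}

/// Lets a host check compatibility before calling anything else.
#[unsafe(no_mangle)]
pub extern "C" fn plugin_abi_version() -> u32 {
    1
}

/// Sums the values in `input`; returns 0.0 for a null pointer.
#[unsafe(no_mangle)]
pub extern "C" fn plugin_sum(input: SliceF64) -> f64 {
    if input.ptr.is_null() {
        return 0.0;
    }
    // Safety: the caller guarantees `ptr` points to `len` valid f64 values.
    let values = unsafe { std::slice::from_raw_parts(input.ptr, input.len) };
    values.iter().sum()
}
```

Exporting a version function first is a common way to let a loader reject incompatible plugins before calling anything with a richer signature.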

eitsupi added the help wanted (Extra attention is needed) label on Jun 26, 2024
@etiennebacher
Collaborator

> In the case of R packages, it is the static libraries, not the dynamic libraries, that are built by rustc. Dynamic libraries are built by R.
>
> We need to find a way to generate the proper expected C ABI on the plugin side, but this is obviously beyond my knowledge.

The recently released libr crate might be useful here: https://github.com/posit-dev/ark/tree/main/crates#readme

@eitsupi
Collaborator Author

eitsupi commented Jun 29, 2024

My understanding is that the dynamic library is built by R, so it doesn't matter which Rust crate is used to build the static library.
The open question is that I don't know how to expose the proper C ABI from the dynamic library that R builds.
