map_elements completely breaks after an error #1253
Hi, thanks for the report. I can reproduce but …
According to my observations when I implemented r-polars/src/rust/src/utils/extendr_concurrent.rs (lines 219 to 224 at commit 0122e29), the thread will not terminate successfully and will not be able to reconnect thereafter. It may be possible to fix this by having it terminate when an error occurs, as in the next branch: r-polars/src/rust/src/r_threads.rs (lines 159 to 163 at commit d5994dc).
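A minimal sketch of the "terminate when an error occurs" idea, using only std channels. This is not the code in r_threads.rs: `Request`, `user_fn`, `spawn_worker` and `call` are hypothetical names, and a negative input stands in for an R error. It shows how ending the worker loop on the first error turns a later silent hang into an immediate, detectable failure that a caller could react to (for example by spawning a new worker).

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical request type: an input plus a channel for the answer.
struct Request {
    input: i32,
    reply: mpsc::Sender<Result<i32, String>>,
}

// Hypothetical user function; a negative input stands in for an R error.
fn user_fn(x: i32) -> Result<i32, String> {
    if x < 0 {
        Err(format!("user function failed on {x}"))
    } else {
        Ok(x * 2)
    }
}

// Serve requests until the first error, forward that error, then stop.
// Leaving the loop drops the receiver, so later sends fail immediately
// instead of waiting on a thread that will never answer.
fn spawn_worker() -> (mpsc::Sender<Request>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<Request>();
    let handle = thread::spawn(move || {
        for req in rx {
            let result = user_fn(req.input);
            let failed = result.is_err();
            let _ = req.reply.send(result);
            if failed {
                break; // terminate on error
            }
        }
    });
    (tx, handle)
}

// Send one request and wait for the answer; a dead worker becomes an Err.
fn call(tx: &mpsc::Sender<Request>, input: i32) -> Result<i32, String> {
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Request { input, reply: reply_tx })
        .map_err(|_| "worker has terminated".to_string())?;
    reply_rx
        .recv()
        .map_err(|_| "worker dropped the reply".to_string())?
}

fn main() {
    let (tx, handle) = spawn_worker();
    println!("{:?}", call(&tx, 21)); // Ok(42)
    println!("{:?}", call(&tx, -1)); // Err("user function failed on -1")
    handle.join().unwrap(); // the worker has exited after the error
    println!("{:?}", call(&tx, 21)); // Err("worker has terminated")
}
```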
After a long time away from polars, with small kids and a new job, I was about to give a short introduction to r-polars next week and noticed this bug too. This is obviously very annoying :) I don't think it was always like this, but I cannot prove it. I can try to take a look at it within the next month.
I see my use of init_cell is not quite what the Rust crate author intended. If one ThreadCom (the link between rust-polars threads and the single R session) crashed, it was replaced with kill_global + update_global. My wild guess is that the threads in the polars thread pool still link to the crashed global ThreadCom, because init_cell does not support safely mutating the global state. It probably worked in the past, but since this is undefined behavior it could fail at any point.
The purpose of this global state is to let new, oblivious threads spawned by rust-polars look up and clone the currently active, functioning ThreadCom. chatty_gippity suggests trying once_cell::sync::Lazy instead:

```rust
use once_cell::sync::Lazy;
use std::sync::RwLock;

// Assuming ThreadCom<S, R> is defined elsewhere in your code (stubbed here)
pub struct ThreadCom<S, R> {
    _marker: std::marker::PhantomData<(S, R)>, // your fields here
}

// A static cannot be generic, so S and R must be concrete types.
// These aliases are placeholders only.
type Request = String;
type Response = String;

// Define the global state
static GLOBAL_STATE: Lazy<RwLock<Option<ThreadCom<Request, Response>>>> =
    Lazy::new(|| RwLock::new(None));
```

Or maybe something completely different.
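A small std-only sketch of the mechanism guessed at above: a global register that can be replaced (the role of kill_global + update_global) and that new threads look up and clone. To stay dependency-free it swaps once_cell/init_cell for a plain `static` `RwLock` (`RwLock::new` is a `const fn`). `ThreadCom` is reduced to an id, and `update_global`/`clone_global` here are hypothetical stand-ins, not the r-polars functions. It shows that a clone taken before the replacement keeps pointing at the old instance; note that a later comment in this thread reports this was not the actual cause of the bug.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical, simplified ThreadCom: just an id so instances can be told
// apart. The real type holds the channels to the R session.
struct ThreadCom {
    id: u32,
}

// Global register, initialised at compile time to "empty".
static GLOBAL_COM: RwLock<Option<Arc<ThreadCom>>> = RwLock::new(None);

// Replace whatever is registered. The register's reference to the old
// instance is released here; other clones of it stay alive.
fn update_global(com: ThreadCom) {
    let mut slot = GLOBAL_COM.write().unwrap();
    *slot = Some(Arc::new(com));
}

// What a freshly spawned thread does: look up and clone the current handle.
fn clone_global() -> Option<Arc<ThreadCom>> {
    GLOBAL_COM.read().unwrap().clone()
}

fn main() {
    update_global(ThreadCom { id: 1 });
    let cached = clone_global().unwrap(); // a thread caches the handle

    update_global(ThreadCom { id: 2 }); // the global is replaced...
    let fresh = clone_global().unwrap();

    // ...but the cached clone still refers to the old instance.
    println!("cached = {}, fresh = {}", cached.id, fresh.id); // cached = 1, fresh = 2
}
```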
It must have been this way for a long time, because it reproduces even in v0.9.0, the oldest version that can easily be installed today. (We need to use …
This bug was not caused by init_cell vs once_cell; swapping to once_cell changed nothing. However, that change could perhaps be adopted in another PR for tidiness' sake.

It turns out to be a plain bug in how user errors are handled and polars state is reset. If a user map_ function raises an R error, the R interpreter returns directly and does not gracefully shut down the polars query, which includes closing ("killing") the ThreadCom object (the object that lets multiple polars threads share the single R interpreter). Because there was no graceful shutdown, this ThreadCom survives in the global register (once_cell/init_cell), but it is defunct in the next polars query, hence the bug.

Solution 1: if I force the global register to be reset at every polars query, the bug goes away. However, I vaguely remember that this is a problem when a polars query is called within the user function of another polars query. In that case the inner polars query should not reset the global ThreadCom, as that would sever communication for any other map_ functions.

Solution 2 (my candidate): implicitly wrap every R user function in a tryCatch to ensure a graceful shutdown of polars. This might cost 1-5 ms or so per R user function call.

Solution 3a: whenever a new polars query recycles a ThreadCom from the global register, it could check once that it works by running a simple function. That might take about 1 ms, once only. If it does not work, the ThreadCom is reset.

Solution 3b: it would be even faster with some Rust-only verification of the ThreadCom, but then I might have to rewrite some function signatures to allow a non-R request via the ThreadCom. Maybe not worth the hassle.

I will look into 3b -> 3a -> 2 or so.
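A rough sketch of Solution 3a, using only std channels. Everything here is hypothetical: `ThreadCom` is reduced to a `Sender` to a thread standing in for the R session, the "R error" is simulated by a negative input that makes that thread exit without replying, and `is_alive`, `ensure_healthy` and `run_query` are invented names. Each query checks the recycled ThreadCom once with a request whose answer is known and replaces it if the round trip fails. In this sketch the register lock is held for the whole query, so it does not model nested polars queries.

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::Mutex;
use std::thread;

// Hypothetical job: an input plus a channel for the reply.
type Job = (i32, Sender<i32>);

// Hypothetical, heavily simplified ThreadCom.
struct ThreadCom {
    tx: Sender<Job>,
}

impl ThreadCom {
    // The stand-in "R session" doubles its input. A negative input plays the
    // role of an R error: the thread exits without replying or cleaning up.
    fn spawn() -> ThreadCom {
        let (tx, rx) = mpsc::channel::<Job>();
        thread::spawn(move || loop {
            let (x, reply) = match rx.recv() {
                Ok(job) => job,
                Err(_) => return, // every sender is gone
            };
            if x < 0 {
                drop(rx); // disconnect before `reply` is dropped on return
                return;
            }
            let _ = reply.send(x * 2);
        });
        ThreadCom { tx }
    }

    // One round trip; None means the connection is defunct.
    fn request(&self, x: i32) -> Option<i32> {
        let (reply_tx, reply_rx) = mpsc::channel();
        self.tx.send((x, reply_tx)).ok()?;
        reply_rx.recv().ok()
    }

    // Solution 3a: a cheap check with a known answer.
    fn is_alive(&self) -> bool {
        self.request(0) == Some(0)
    }
}

// Global register holding the ThreadCom that queries recycle.
static REGISTER: Mutex<Option<ThreadCom>> = Mutex::new(None);

// Reuse the registered ThreadCom if it passes the check, otherwise replace
// it. Returns true when a (re)set was needed.
fn ensure_healthy(slot: &mut Option<ThreadCom>) -> bool {
    let healthy = slot.as_ref().map_or(false, |com| com.is_alive());
    if !healthy {
        *slot = Some(ThreadCom::spawn());
    }
    !healthy
}

// One "polars query": check the register once, then use the ThreadCom.
fn run_query(x: i32) -> (bool, Option<i32>) {
    let mut slot = REGISTER.lock().unwrap();
    let reset = ensure_healthy(&mut slot);
    let result = slot.as_ref().unwrap().request(x);
    (reset, result)
}

fn main() {
    println!("{:?}", run_query(21)); // (true, Some(42)): first use creates it
    println!("{:?}", run_query(-1)); // (false, None): the "R error" kills it
    println!("{:?}", run_query(21)); // (true, Some(42)): check fails, reset
}
```

The check costs one extra round trip per query rather than per user function call, which is the trade-off described for 3a compared with wrapping every call as in Solution 2.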
These are the right docs, and our current usage does not seem to be discouraged. I should probably revert from once_cell::sync::Lazy back to InitCell. The two behave very similarly and are drop-in replacements for each other.
Hello,
I found quite an interesting bug that causes map_elements to stop working completely after it encounters an error at any point before the execution. This example code works well and as expected:
However, if an error occurs inside map_elements in an evaluation that happened before, the identical code stops working.
The first error is expected; however, the same function that ran successfully before now stops working. The only resolution is to restart the R session. I am using the latest polars version.
Any guess why that might be? Thanks!