Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Obscure error message from unclear error #203

Open
e-clin opened this issue Aug 25, 2023 · 1 comment
Open

Obscure error message from unclear error #203

e-clin opened this issue Aug 25, 2023 · 1 comment

Comments

@e-clin
Copy link

e-clin commented Aug 25, 2023

I tried to use the purrr and xlsx packages together to read multiple password-protected .xlsx files in a folder into a nested data frame. Each file has multiple sheets. I plan to:

  1. List and store the file names in a "file" column.
  2. Use purrr::map_chr() and xlsx::getSheets() to get the names of the sheets in each file in a "sheets" column.
  3. Use a nested purrr::map() and purrr::map_dfr() to apply xlsx::read.xlsx() to every cell in the "file" column and every cell in the "sheets" column to read all sheets in a file and row-bind them into a data frame into a "data" column.

My code is as follows:

df <- tibble(
  file = list.files(path_to_files),
  sheets = map(
    file,
    \(file) paste0(path_to_files, file) %>%
      xlsx::loadWorkbook() %>%
      xlsx::getSheets() # %>% 
  #    names()
  )#,
  #  Code below has not been run
  #  data = map(
  #    file,
  #    \(file) map_dfr(
  #      sheets,
  #      \(sheet) xlsx::read.xlsx(
  #         paste0(path_to_files, file),
  #         sheetName = sheet
  #      )
  #    )
  #  )
)

Somehow this code only ran once. When I tried to add the names() line to extract the list names (the sheet names) from the Java objects in the "sheets" column, I encountered the following error:

Error in ._jobjRef_dollar(x[["jobj"]], name) :
no field, method or inner class called 'use_cli_format'

The Traceback seemed to suggest that there was some overhead limit exceedance error:

stop(structure(list(message = "java.lang.OutOfMemoryError: GC overhead limit exceeded",
call = .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod",
cl, .jcast(if (inherits(o, "jobjRef") || inherits(o,
"jarrayRef")) o else cl, "java/lang/Object"), .jnew("java/lang/String", ...

Even if I removed the newly added names() line didn't help. I had to restart the R session to be able to rerun the code.

I also had this problem before with xlsx::read.xlsx(), when I was experimenting with my code. Once the above error occurred, even previously working code ran into this problem. Again, I had to restart the R session to be able to rerun the code.

Was it just me doing something wrong?

@colearendt
Copy link
Owner

Yeah, this sounds like an issue with RAM.

I thought we had some docs on this, but apparently not. You have to modify the setup for the JVM before it is started (i.e. before you load any packages, because many packages will start the JVM).

options(java.parameters = "-Xmx8000m")

https://stackoverflow.com/a/44143679/6570011

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants