Cache engine for reticulate using dill #1210

Open
wants to merge 29 commits into
base: main
228624b
added unit tests that cover knitr #1505
tmastny Feb 26, 2018
b52ecaa
added cache_eng_python to add Python session caching between chunks. …
tmastny Feb 26, 2018
c345ce2
dill caching engine for knitr, with tests
tmastny Feb 28, 2018
ff39889
changes from feedback on knitr #1518 with updated tests
tmastny Apr 18, 2018
8e07779
fixed testing utils source in dill tests
tmastny Apr 19, 2018
638a4e7
Merge 'rstudio/main' with 'tmastny/master' into branch 'cache-engine'
leogama May 13, 2022
9753870
cache engine: update 'r' object identification logic
leogama Apr 14, 2022
02c1771
fix 'cache_path' when 'output.dir' is different from 'knitr:::input_d…
leogama Apr 20, 2022
dbebab3
cache loading should run in the input directory
leogama Apr 20, 2022
bd29f84
remove duplicated conversion functions
leogama May 26, 2022
f8497a0
Merge branch 'main' into cache-engine
leogama Sep 3, 2022
fe4cd9f
remove trailing whitespaces and empty line
leogama Sep 3, 2022
5d6f7a7
First version of cache implementation with new knitr API
leogama Sep 13, 2022
a33ed39
Expose the cache$available() method to knitr
leogama Sep 14, 2022
266463c
Use the same warning for missing and old dill module cases
leogama Sep 15, 2022
445a5ca
Set environment() as default argument in eng_python_initialize()
leogama Sep 15, 2022
c6a88ad
Basic test for knitr engine cache
leogama Sep 17, 2022
975c1b0
minor
leogama Sep 17, 2022
401b1ba
Workflows: install module dill in the testing virtualenv
leogama Sep 17, 2022
62c77d8
Docs: remove @params from cache_eng_python, add it to pkgdown index
leogama Sep 17, 2022
f487b52
Correctly initialize Python in knitr, honoring 'engine.path'
leogama Sep 19, 2022
cb9ee1f
Implement the 'cache.vars' chunk option; some style changes
leogama Sep 19, 2022
7d4eeec
Remove unused 'envir' parameter from 'eng_python_initialize*' functions
leogama Sep 20, 2022
395627e
update cache engine docs
leogama Dec 13, 2022
55d1e03
cache: adapt code and tests to dill package v0.3.6
leogama Dec 13, 2022
d43b593
fix typo, update generated documentation
leogama Dec 13, 2022
f354f60
Workflow: use PR branch from knitr for testing
leogama Dec 16, 2022
38ef3ce
fix typo
leogama Dec 20, 2022
79b9732
Merge branch 'main' into cache-engine
t-kalinowski Jun 21, 2023
2 changes: 1 addition & 1 deletion .github/workflows/R-CMD-check.yaml
@@ -73,7 +73,7 @@ jobs:
reticulate::virtualenv_create("r-reticulate", Sys.which("python"))
reticulate::virtualenv_install("r-reticulate",
c("docutils", "pandas", "scipy", "matplotlib", "ipython",
"tabulate", "plotly", "psutil", "kaleido", "wrapt"))
"tabulate", "plotly", "psutil", "kaleido", "wrapt", "dill"))
python <- reticulate::virtualenv_python("r-reticulate")
writeLines(sprintf("RETICULATE_PYTHON=%s", python),
Sys.getenv("GITHUB_ENV"))
1 change: 1 addition & 0 deletions NAMESPACE
@@ -124,6 +124,7 @@ export("%as%")
export(PyClass)
export(array_reshape)
export(as_iterator)
export(cache_eng_python)
export(conda_binary)
export(conda_clone)
export(conda_create)
10 changes: 3 additions & 7 deletions R/config.R
@@ -774,11 +774,6 @@ python_config <- function(python,
}
}

as_numeric_version <- function(version) {
version <- clean_version(version)
numeric_version(version)
}

# check for numpy
numpy <- NULL
if (!is.null(config$NumpyPath)) {
@@ -927,8 +922,9 @@ is_rstudio_desktop <- function() {
identical(version$mode, "desktop")
}

clean_version <- function(version) {
gsub("\\.$", "", gsub("[A-Za-z_+].*$", "", version))
as_numeric_version <- function(version) {
version <- sub("\\.$", "", sub("[A-Za-z_+].*$", "", version))
numeric_version(version)
}

reticulate_python_versions <- function() {
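Review note on the consolidated `as_numeric_version()` helper above: the two nested `sub()` calls first drop everything from the first letter, underscore, or plus sign, then a trailing dot. The same cleaning can be sketched in Python to see what it does to pre-release strings. This is an illustrative analogue only, not part of the PR; the tuple return type is an assumption standing in for R's `numeric_version`.

```python
import re


def as_numeric_version(version: str) -> tuple:
    """Illustrative Python analogue of the R helper (not part of the PR)."""
    # drop everything from the first letter, underscore, or plus sign onward
    cleaned = re.sub(r"[A-Za-z_+].*$", "", version)
    # drop a trailing dot left behind by suffixes such as ".dev0"
    cleaned = re.sub(r"\.$", "", cleaned)
    # assumption: a tuple of ints stands in for R's numeric_version object
    return tuple(int(part) for part in cleaned.split("."))


dev_version = as_numeric_version("0.3.6.dev0")
rc_version = as_numeric_version("3.3.0rc1")
print(dev_version, rc_version)
```

Tuples compare element by element, so a check such as `as_numeric_version("0.3.6.dev0") >= (0, 3, 6)` behaves like the `dill_version >= MINIMUM_DILL_VERSION` comparison in the cache engine.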
102 changes: 102 additions & 0 deletions R/knitr-cache.R
@@ -0,0 +1,102 @@
#' A reticulate cache engine for knitr
#'
#' This provides caching of Python variables to the `reticulate` engine for
#' `knitr`. The cache allows `knitr` to save and load the state of Python
#' variables between cached chunks. The cache engine depends on the `dill`
#' Python module. Therefore, you must have a recent version of `dill` installed
#' in your Python environment.
#'
#' The Python cache is activated the same way as the R cache, by setting the
#' `cache` chunk option to `TRUE`. To _deactivate_ the Python cache globally
#' while keeping the R cache active, one may set the option `reticulate.cache`
#' to `FALSE`. For example:
#'
#' ```
#' knitr::opts_knit$set(reticulate.cache = FALSE)
#' ```
#'
#' @note Unlike `knitr`'s R cache, the Python cache can save most, but not
#' all, types of Python objects. Some Python objects are "unpickleable" and
#' will raise an error when the cache tries to save them.
#'
#' @export
cache_eng_python <- (function() {
  closure <- environment()
  dill <- NULL

  cache_path <- function(path) {
    paste(path, "pkl", sep=".")
  }

  check_cache_available <- function(options) {
    MINIMUM_PYTHON_VERSION <- "3.7"
    MINIMUM_DILL_VERSION <- "0.3.6"

    eng_python_initialize(options)

    # is the Python version supported by 'dill'?
    if (py_version() < MINIMUM_PYTHON_VERSION) {
      warning("Python cache requires Python version >= ", MINIMUM_PYTHON_VERSION)
      return(FALSE)
    }

    # is the module 'dill' loadable and recent enough?
    closure$dill <- tryCatch(import("dill"), error = identity)
    if (!inherits(dill, "error")) {
      dill_version <- as_numeric_version(dill$`__version__`)
      if (dill_version >= MINIMUM_DILL_VERSION)
        return(TRUE)
    } else {
      # handle non-import error
      error <- reticulate::py_last_error()
      if (!error$type %in% c("ImportError", "ModuleNotFoundError"))
        stop(error$value, call. = FALSE)
    }

    # 'dill' is missing or too old
    warning("Python cache requires module dill>=", MINIMUM_DILL_VERSION)
    FALSE
  }

  cache_available <- function(options) {
    if (is.null(closure$.cache_available))
      closure$.cache_available <- check_cache_available(options)
    .cache_available
  }

  cache_exists <- function(options) {
    file.exists(cache_path(options$hash))
  }

  cache_load <- function(options) {
    if (!cache_available(options)) return()
    dill$load_module(filename = cache_path(options$hash), module = "__main__")
  }

  cache_save <- function(options) {
    if (!cache_available(options)) return()

    # remove injected 'r' object before saving session (and after executing block)
    main <- import_main(convert = FALSE)
    if (py_has_attr(main, "r")) {
      builtins <- import_builtins(convert = TRUE)
      if (builtins$isinstance(main$r, builtins[["__R__"]]))
        py_del_attr(main, "r")
    }

    tryCatch({
      # refimported: save imported objects by reference when possible
      dill$dump_module(cache_path(options$hash), refimported = TRUE)
    }, error = function(e) {
      cache_purge(options$hash)
      stop(e)
    })
  }

  cache_purge <- function(glob_path) {
    unlink(cache_path(glob_path))
  }

  list(available = cache_available, exists = cache_exists, load = cache_load, save = cache_save,
       purge = cache_purge)
})()
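Review note on `cache_save()` above: when dumping the session fails, the engine purges the cache file and re-raises, so a later run never loads a truncated `.pkl`. A minimal Python sketch of that save, purge, and re-raise pattern follows. It uses the standard-library `pickle` on a plain dict as a stand-in for `dill.dump_module()` on `__main__`, and `save_state()` and `load_state()` are hypothetical names used only for illustration.

```python
import os
import pickle
import tempfile


def save_state(state: dict, path: str) -> None:
    """Pickle `state` to `path`; delete the partial file if pickling fails."""
    try:
        with open(path, "wb") as handle:
            pickle.dump(state, handle)
    except Exception:
        # same idea as cache_purge(): never leave a truncated cache behind
        if os.path.exists(path):
            os.remove(path)
        raise


def load_state(path: str) -> dict:
    """Read back a previously saved state."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


cache_dir = tempfile.mkdtemp()
good_path = os.path.join(cache_dir, "good.pkl")
bad_path = os.path.join(cache_dir, "bad.pkl")

save_state({"x": 1, "names": ["a", "b"]}, good_path)
restored = load_state(good_path)

try:
    # generator objects are a typical "unpickleable" value for pickle
    save_state({"gen": (i for i in range(3))}, bad_path)
    failed = False
except TypeError:
    failed = True
```

The failed save leaves no `bad.pkl` on disk, which is the behaviour the `tryCatch()` handler in `cache_save()` aims for.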
55 changes: 27 additions & 28 deletions R/knitr-engine.R
@@ -38,22 +38,12 @@ eng_python <- function(options) {
return(wrap(outputs, options))
}

engine.path <- if (is.list(options[["engine.path"]]))
options[["engine.path"]][["python"]]
else
options[["engine.path"]]

# if the user has requested a custom Python, attempt
# to honor that request (warn if Python already initialized
# to a different version)
if (is.character(engine.path)) {

# if Python has not yet been loaded, then try
# to load it with the requested version of Python
if (!py_available())
use_python(engine.path, required = TRUE)

# double-check that we've loaded the requested Python
# if the user has requested a custom Python, attempt to honor that request
eng_python_initialize(options)

# double-check that we've loaded the requested Python (warn if Python already
# initialized to a different version)
if (is.character(engine.path <- get_engine_path(options))) {
conf <- py_config()
requestedPython <- normalizePath(engine.path)
actualPython <- normalizePath(conf$python)
@@ -70,8 +60,6 @@ eng_python <- function(options) {
# a list of pending plots / outputs
.engine_context$pending_plots <- stack()

eng_python_initialize(options = options, envir = environment())

# helper function for extracting range of code, dropping blank lines
extract <- function(code, range) {
snippet <- code[range[1]:range[2]]
@@ -328,13 +316,24 @@ eng_python <- function(options) {

}

eng_python_initialize <- function(options, envir) {
get_engine_path <- function(options) {
option <- options[["engine.path"]]
engine.path <- if (is.list(option)) option[["python"]] else option
if (is.character(engine.path))
stopifnot(length(engine.path) == 1L)
engine.path
}

eng_python_initialize <- function(options) {

if (is.character(options$engine.path))
use_python(options$engine.path[[1]])
# if Python has not yet been loaded, then try
# to load it with the requested version of Python
engine.path <- get_engine_path(options)
if (is.character(engine.path) && !py_available())
use_python(engine.path, required = TRUE)

ensure_python_initialized()
eng_python_initialize_hooks(options, envir)
eng_python_initialize_hooks(options)

}
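Review note on `get_engine_path()` above: the `engine.path` chunk option can be either a single path or a list keyed by engine name, and the helper normalizes both to one value. A rough Python sketch of that lookup is shown here for readers less familiar with R list semantics. The dict-based `options` argument and the `TypeError` on non-string values are assumptions made for this illustration; the R helper instead asserts that a character vector has length one.

```python
from typing import Any, Mapping, Optional


def get_engine_path(options: Mapping[str, Any]) -> Optional[str]:
    """Illustrative analogue of the R helper; `options` mimics chunk options."""
    option = options.get("engine.path")
    # a per-engine mapping: pick the entry for the Python engine
    if isinstance(option, Mapping):
        option = option.get("python")
    if option is None:
        return None
    # assumption: anything other than a single string is rejected here
    if not isinstance(option, str):
        raise TypeError("engine.path must be a single string")
    return option


single = get_engine_path({"engine.path": "/usr/bin/python3"})
per_engine = get_engine_path(
    {"engine.path": {"python": "/opt/py/bin/python", "ruby": "/usr/bin/ruby"}}
)
unset = get_engine_path({})
```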

@@ -384,7 +383,7 @@ eng_python_matplotlib_show <- function(plt, options) {

}

eng_python_initialize_hooks <- function(options, envir) {
eng_python_initialize_hooks <- function(options) {

# set up hooks for matplotlib modules
matplotlib_modules <- c(
@@ -395,7 +394,7 @@ eng_python_initialize_hooks <- function(options, envir) {

for (module in matplotlib_modules) {
py_register_load_hook(module, function(...) {
eng_python_initialize_matplotlib(options, envir)
eng_python_initialize_matplotlib(options)
})
}

@@ -407,13 +406,13 @@ eng_python_initialize_hooks <- function(options, envir) {

for (module in plotly_modules) {
py_register_load_hook(module, function(...) {
eng_python_initialize_plotly(options, envir)
eng_python_initialize_plotly(options)
})
}

}

eng_python_initialize_matplotlib <- function(options, envir) {
eng_python_initialize_matplotlib <- function(options) {

# mark initialization done
if (identical(.globals$matplotlib_initialized, TRUE))
@@ -437,7 +436,7 @@ eng_python_initialize_matplotlib <- function(options, envir) {
if ("matplotlib.backends" %in% names(sys$modules)) {
matplotlib$pyplot$switch_backend("agg")
} else {
version <- numeric_version(matplotlib$`__version__`)
version <- as_numeric_version(matplotlib$`__version__`)
if (version < "3.3.0")
matplotlib$use("agg", warn = FALSE, force = TRUE)
else
@@ -478,7 +477,7 @@ eng_python_initialize_matplotlib <- function(options, envir) {

}

eng_python_initialize_plotly <- function(options, envir) {
eng_python_initialize_plotly <- function(options) {

# mark initialization done
if (identical(.globals$plotly_initialized, TRUE))
58 changes: 31 additions & 27 deletions R/python.R
@@ -475,14 +475,14 @@ as.environment.python.builtin.object <- function(x) {
if (inherits(x, "python.builtin.dict")) {

names <- py_dict_get_keys_as_str(x)
names <- names[substr(names, 1, 1) != '_']
names <- names[substr(names, 1, 1) != "_"]
Encoding(names) <- "UTF-8"
types <- rep_len(0L, length(names))

} else {
# get the names and filter out internal attributes (_*)
names <- py_suppress_warnings(py_list_attributes(x))
names <- names[substr(names, 1, 1) != '_']
names <- names[substr(names, 1, 1) != "_"]
# replace function with `function`
names <- sub("^function$", "`function`", names)
names <- sort(names, decreasing = FALSE)
@@ -1555,37 +1555,41 @@ py_inject_r <- function() {
if (py_has_attr(main, "r"))
return(FALSE)

# define our 'R' class
py_run_string("class R(object): pass")
builtins <- import_builtins(convert = FALSE)
if (!py_has_attr(builtins, "__R__")) {

# extract it from the main module
main <- import_main(convert = FALSE)
R <- main$R
# define our 'R' class
py_run_string("class R(object): pass")
R <- main$R

# define the getters, setters we'll attach to the Python class
getter <- function(self, code) {
envir <- py_resolve_envir()
object <- eval(parse(text = as_r_value(code)), envir = envir)
r_to_py(object, convert = is.function(object))
}
# copy it to 'builtins'
py_set_attr(builtins, "__R__", R)

setter <- function(self, name, value) {
envir <- py_resolve_envir()
name <- as_r_value(name)
value <- as_r_value(value)
assign(name, value, envir = envir)
}
# remove the 'R' class object from '__main__'
py_del_attr(main, "R")

# define the getters, setters we'll attach to the Python class
getter <- function(self, code) {
envir <- py_resolve_envir()
object <- eval(parse(text = as_r_value(code)), envir = envir)
r_to_py(object, convert = is.function(object))
}

py_set_attr(R, "__getattr__", getter)
py_set_attr(R, "__setattr__", setter)
py_set_attr(R, "__getitem__", getter)
py_set_attr(R, "__setitem__", setter)
setter <- function(self, name, value) {
envir <- py_resolve_envir()
name <- as_r_value(name)
value <- as_r_value(value)
assign(name, value, envir = envir)
}

# now define the R object
py_run_string("r = R()")
py_set_attr(R, "__getattr__", getter)
py_set_attr(R, "__setattr__", setter)
py_set_attr(R, "__getitem__", getter)
py_set_attr(R, "__setitem__", setter)
}

# remove the 'R' class object
py_del_attr(main, "R")
# now define the R object
py_run_string("r = __R__()")

# indicate success
TRUE
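Review note on the `py_inject_r()` change above: the proxy class is now registered once as `builtins.__R__` instead of living in `__main__`, and the cache engine only deletes `r` when it is an instance of that class. A small Python sketch of why the `isinstance` check matters is given below. The empty placeholder class, the `inject_r()` and `remove_injected_r()` names, and the use of throwaway module objects in place of `__main__` are all assumptions for illustration; the real class forwards attribute access to R.

```python
import builtins
import types


def inject_r(main) -> bool:
    """Bind `r` in `main` unless the name is already taken."""
    if hasattr(main, "r"):
        return False
    if not hasattr(builtins, "__R__"):
        class R:
            """Placeholder; the real proxy forwards attribute access to R."""
        # register the class outside the user's namespace, once per process
        builtins.__R__ = R
    main.r = builtins.__R__()
    return True


def remove_injected_r(main) -> bool:
    """Delete `r` only if it is the injected proxy, not a user variable."""
    r_class = getattr(builtins, "__R__", None)
    if r_class is not None and isinstance(getattr(main, "r", None), r_class):
        delattr(main, "r")
        return True
    return False


# a session where `r` was injected by the engine
session = types.ModuleType("fake_main")
injected = inject_r(session)
removed = remove_injected_r(session)

# a session where the user defined their own `r`
user_session = types.ModuleType("fake_main")
user_session.r = 2.5
user_injected = inject_r(user_session)
user_removed = remove_injected_r(user_session)
```

A user-defined `r` survives both steps, so it would still be part of the saved session, while the injected proxy is stripped before the dump and the class itself never appears in the session namespace.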
2 changes: 1 addition & 1 deletion R/testthat-helpers.R
@@ -109,7 +109,7 @@ skip_if_no_scipy <- function() {
skip("scipy not available for testing")

scipy <- import("scipy")
if (clean_version(scipy$`__version__`) < "1.0")
if (as_numeric_version(scipy$`__version__`) < "1.0")
skip("scipy version is less than v1.0")

}
1 change: 1 addition & 0 deletions _pkgdown.yml
@@ -53,6 +53,7 @@ reference:
contents:
- py_save_object
- py_load_object
- cache_eng_python

- title: "Low-Level Interface"
contents:
32 changes: 32 additions & 0 deletions man/cache_eng_python.Rd

Some generated files are not rendered by default.