feat: experimental $sql() method for LazyFrame and DataFrame #1065

Merged 5 commits on Apr 28, 2024
2 changes: 1 addition & 1 deletion .github/workflows/docs.yaml
@@ -103,7 +103,7 @@ jobs:
run: task build-website

- name: upload docs
- if: ${{ github.event_name == 'pull_request' }}
+ if: always()
uses: actions/upload-artifact@v4
with:
name: docs
1 change: 1 addition & 0 deletions NEWS.md
@@ -5,6 +5,7 @@
### New features

- New method `<SQLContext>$register_globals()` (#1064).
- New experimental method `$sql()` for DataFrame and LazyFrame (#1065).

## Polars R Package 0.16.2

51 changes: 51 additions & 0 deletions R/dataframe__frame.R
@@ -2435,3 +2435,54 @@ DataFrame_clear = function(n = 0) {

out
}


# TODO: we can't use % in the SQL query
# <https://github.com/r-lib/roxygen2/issues/1616>
#' Execute a SQL query against the DataFrame
#'
#' @inherit LazyFrame_sql description details params seealso
#' @inherit pl_DataFrame return
#' @examplesIf polars_info()$features$sql
#' df1 = pl$DataFrame(
#' a = 1:3,
#' b = c("zz", "yy", "xx"),
#' c = as.Date(c("1999-12-31", "2010-10-10", "2077-08-08"))
#' )
#'
#' # Query the DataFrame using SQL:
#' df1$sql("SELECT c, b FROM self WHERE a > 1")
#'
#' # Join two DataFrames using SQL.
#' df2 = pl$DataFrame(a = 3:1, d = c(125, -654, 888))
#' df1$sql(
#' "
#' SELECT self.*, d
#' FROM self
#' INNER JOIN df2 USING (a)
#' WHERE a > 1 AND EXTRACT(year FROM c) < 2050
#' "
#' )
#'
#' # Apply transformations to a DataFrame using SQL, aliasing "self" to "frame".
#' df1$sql(
#' query = r"(
#' SELECT
#' a,
#' MOD(a, 2) == 0 AS a_is_even,
#' CONCAT_WS(':', b, b) AS b_b,
#' EXTRACT(year FROM c) AS year,
#' 0::float AS 'zero'
#' FROM frame
#' )",
#' table_name = "frame"
#' )
DataFrame_sql = function(query, ..., table_name = NULL, envir = parent.frame()) {
  self$lazy()$sql(
    query,
    table_name = table_name,
    envir = envir
  )$collect() |>
    result() |>
    unwrap("in $sql():")
}
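
Note on the function above: the eager method delegates to the lazy one and then collects. A minimal sketch of that equivalence follows, assuming a polars build with the `sql` feature enabled; the frame, the query, and the variable names `eager` and `lazy` are made up for illustration and do not come from the PR.

```r
library(polars)

# Hypothetical example data; not taken from the package tests.
df = pl$DataFrame(a = 1:3, b = c("zz", "yy", "xx"))
query = "SELECT b FROM frame WHERE a > 1"

# Eager call added by this PR: returns a DataFrame directly.
eager = df$sql(query, table_name = "frame")

# The same query routed by hand through the lazy API, then collected.
lazy = df$lazy()$sql(query, table_name = "frame")$collect()
```

Both calls should yield the same rows; the eager form mainly saves the `$lazy()` and `$collect()` round trip.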
69 changes: 67 additions & 2 deletions R/lazyframe__lazy.R
@@ -169,8 +169,7 @@ LazyFrame_width = method_as_active_binding(\() length(self$schema))
#'
#' @param ... Anything that is accepted by `pl$DataFrame()`
#'
- #' @return LazyFrame
- #' @keywords LazyFrame_new
+ #' @return [LazyFrame][LazyFrame_class]
#'
#' @examples
#' pl$LazyFrame(
@@ -2078,3 +2077,69 @@ LazyFrame_to_dot = function(
LazyFrame_clear = function(n = 0) {
pl$DataFrame(schema = self$schema)$clear(n)$lazy()
}


# TODO: we can't use % in the SQL query
# <https://github.com/r-lib/roxygen2/issues/1616>
#' Execute a SQL query against the LazyFrame
#'
#' The calling frame is automatically registered as a table in the SQL context
#' under the name `"self"`. All [DataFrames][DataFrame_class] and
#' [LazyFrames][LazyFrame_class] found in the `envir` are also registered,
#' using their variable name.
#' More control over registration and execution behaviour is available by
#' using the [SQLContext][SQLContext_class] object.
#'
#' This functionality is considered **unstable**, although it is close to
#' being considered stable. It may be changed at any point without it being
#' considered a breaking change.
#' @inherit pl_LazyFrame return
#' @inheritParams SQLContext_execute
#' @inheritParams SQLContext_register_globals
#' @param table_name `NULL` (default) or a character of an explicit name for the table
#' that represents the calling frame (the alias `"self"` will always be registered/available).
#' @seealso
#' - [SQLContext][SQLContext_class]
#' @examplesIf polars_info()$features$sql
#' lf1 = pl$LazyFrame(a = 1:3, b = 6:8, c = c("z", "y", "x"))
#' lf2 = pl$LazyFrame(a = 3:1, d = c(125, -654, 888))
#'
#' # Query the LazyFrame using SQL:
#' lf1$sql("SELECT c, b FROM self WHERE a > 1")$collect()
#'
#' # Join two LazyFrames:
#' lf1$sql(
#' "
#' SELECT self.*, d
#' FROM self
#' INNER JOIN lf2 USING (a)
#' WHERE a > 1 AND b < 8
#' "
#' )$collect()
#'
#' # Apply SQL transforms (aliasing "self" to "frame") and subsequently
#' # filter natively (you can freely mix SQL and native operations):
#' lf1$sql(
#' query = r"(
#' SELECT
#' a,
#' MOD(a, 2) == 0 AS a_is_even,
#' (b::float / 2) AS 'b/2',
#' CONCAT_WS(':', c, c, c) AS c_c_c
#' FROM frame
#' ORDER BY a
#' )",
#' table_name = "frame"
#' )$filter(!pl$col("c_c_c")$str$starts_with("x"))$collect()
LazyFrame_sql = function(query, ..., table_name = NULL, envir = parent.frame()) {
  result({
    ctx = pl$SQLContext()$register_globals(envir = envir)$register("self", self)

    if (!is.null(table_name)) {
      ctx$register(table_name, self)
    }

    ctx$execute(query)
  }) |>
    unwrap("in $sql():")
}
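
Note on the function above: `$sql()` is a thin wrapper around an explicit `SQLContext`. The sketch below spells out the same steps by hand, assuming a polars build with the `sql` feature enabled; the data, the query, and the names `via_sql` and `via_ctx` are illustrative only.

```r
library(polars)

# Hypothetical example data.
lf = pl$LazyFrame(a = 1:3, b = c("x", "y", "z"))
query = "SELECT a, b FROM self WHERE a > 1"

# Shortcut added by this PR.
via_sql = lf$sql(query)$collect()

# Roughly what the method body does: register every polars frame found in
# the environment under its variable name, register the calling frame as
# "self", then execute the query (which returns a LazyFrame).
ctx = pl$SQLContext()$register_globals(envir = environment())$register("self", lf)
via_ctx = ctx$execute(query)$collect()
```

Because `"self"` is registered after the globals, a variable that happens to be named `self` in the environment should not shadow the calling frame. Building the context by hand remains the better fit when only a specific set of tables should be visible to the query.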
5 changes: 3 additions & 2 deletions R/sql.R
@@ -60,7 +60,7 @@ pl_SQLContext = function(...) {
#' Execute SQL query against the registered data
#'
#' Parse the given SQL query and execute it against the registered frame data.
- #' @param query A valid string SQL query.
+ #' @param query A character of the SQL query to execute.
#' @return A [LazyFrame][LazyFrame_class]
#' @examplesIf polars_info()$features$sql
#' query = "SELECT * FROM mtcars WHERE cyl = 4"
@@ -174,7 +174,8 @@ SQLContext_tables = function() {
#' Automatically maps variable names to table names.
#' @inherit SQLContext_register details return
#' @param ... Ignored.
- #' @param envir The environment to search for polars DataFrames/LazyFrames.
+ #' @param envir The environment to search for polars
+ #' [DataFrames][DataFrame_class]/[LazyFrames][LazyFrame_class].
#' @seealso
#' - [`<SQLContext>$register()`][SQLContext_register]
#' - [`<SQLContext>$register_many()`][SQLContext_register_many]
77 changes: 77 additions & 0 deletions man/DataFrame_sql.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

74 changes: 74 additions & 0 deletions man/LazyFrame_sql.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/SQLContext_execute.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion man/SQLContext_register_globals.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 1 addition & 2 deletions man/pl_LazyFrame.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

26 changes: 13 additions & 13 deletions tests/testthat/_snaps/after-wrappers.md
@@ -89,12 +89,12 @@
[41] "quantile" "rechunk" "rename" "reverse"
[45] "rolling" "sample" "schema" "select"
[49] "select_seq" "shape" "shift" "shift_and_fill"
- [53] "slice" "sort" "std" "sum"
- [57] "tail" "to_data_frame" "to_list" "to_series"
- [61] "to_struct" "transpose" "unique" "unnest"
- [65] "var" "width" "with_columns" "with_columns_seq"
- [69] "with_row_index" "write_csv" "write_ipc" "write_json"
- [73] "write_ndjson" "write_parquet"
+ [53] "slice" "sort" "sql" "std"
+ [57] "sum" "tail" "to_data_frame" "to_list"
+ [61] "to_series" "to_struct" "transpose" "unique"
+ [65] "unnest" "var" "width" "with_columns"
+ [69] "with_columns_seq" "with_row_index" "write_csv" "write_ipc"
+ [73] "write_json" "write_ndjson" "write_parquet"

---

Expand Down Expand Up @@ -164,13 +164,13 @@
[41] "shift_and_fill" "sink_csv"
[43] "sink_ipc" "sink_ndjson"
[45] "sink_parquet" "slice"
- [47] "sort" "std"
- [49] "sum" "tail"
- [51] "to_dot" "unique"
- [53] "unnest" "var"
- [55] "width" "with_columns"
- [57] "with_columns_seq" "with_context"
- [59] "with_row_index"
+ [47] "sort" "sql"
+ [49] "std" "sum"
+ [51] "tail" "to_dot"
+ [53] "unique" "unnest"
+ [55] "var" "width"
+ [57] "with_columns" "with_columns_seq"
+ [59] "with_context" "with_row_index"

---
