feat!: bump polars to 0.44.2 (#1271)
eitsupi authored Nov 17, 2024
1 parent a37b3c0 commit 5737455
Showing 24 changed files with 711 additions and 543 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -120,5 +120,5 @@ Collate:
'zzz.R'
Config/rextendr/version: 0.3.1
VignetteBuilder: knitr
Config/polars/LibVersion: 0.43.1
Config/polars/RustToolchainVersion: nightly-2024-09-19
Config/polars/LibVersion: 0.44.0
Config/polars/RustToolchainVersion: nightly-2024-10-28
62 changes: 33 additions & 29 deletions NEWS.md
@@ -4,6 +4,10 @@

### Breaking changes

- Updated Rust Polars to 0.44.2 (#1271).
- Minimum supported Rust version (MSRV) is now 1.82.0.
- `$reshape()`'s `nested_type` argument is removed.
- `$approx_n_unique()` no longer works on Categorical type.
- `<Series>$compare()` is removed. (#1272)

### Deprecations
@@ -18,7 +22,7 @@

## Polars R Package 0.20.0

- Updated rust-polars to 0.43.1 (#1230).
- Updated Rust Polars to 0.43.1 (#1230).

### Breaking changes

@@ -50,7 +54,7 @@

### Breaking changes

- Updated rust-polars to unreleased 2024-08-20, after 0.42.0 (#1183).
- Updated Rust Polars to unreleased 2024-08-20, after 0.42.0 (#1183).
- `$describe_plan()` and `$describe_optimized_plan()` are removed. Use
respectively `$explain(optimized = FALSE)` and `$explain()` instead (#1182).
- The parameter `inherit_optimization` is removed from all functions that had it
@@ -117,7 +121,7 @@

### Breaking changes

- Updated rust-polars to 0.41.3 (#1147, #1156).
- Updated Rust Polars to 0.41.3 (#1147, #1156).
- In `$n_chunks()`, the default value of `strategy` now is `"first"` (#1137).
- `$sample()` for Expr and DataFrame (#1136):
- the argument `frac` is renamed `fraction`;
@@ -174,7 +178,7 @@

### Breaking changes

- Updated rust-polars to unreleased version (> 0.40.0) (#1104, #1110, #1117, #1124):
- Updated Rust Polars to unreleased version (> 0.40.0) (#1104, #1110, #1117, #1124):
- In `$join()`, there is a new argument `coalesce` and the `how` options now
accept `"full"` instead of `"outer"` and `"outer_coalesce"`.
- `$top_k()` and `$bottom_k()` gain three arguments `nulls_last`,
@@ -290,7 +294,7 @@

## Polars R Package 0.16.1

This is a small hot-fix release to update dependent Rust polars to 0.39.1 (#1042).
This is a small hot-fix release to update dependent Rust Polars to 0.39.1 (#1042).

Also, there are some updates.

@@ -306,7 +310,7 @@ Also, there are some updates.

### Breaking changes

- Rust polars is updated to 0.39.0 (#937, #1034).
- Rust Polars is updated to 0.39.0 (#937, #1034).
- R objects inside an R list are now converted to Polars data types via
`as_polars_series()` (#1021, #1022, #1023). For example, up to polars 0.15.1,
a list containing a data.frame with a column of `{clock}` naive-time class
@@ -533,7 +537,7 @@ Also, there are some updates.
### New features
- rust-polars is updated to 0.38.2 (#907).
- Rust Polars is updated to 0.38.2 (#907).
- Minimum supported Rust version (MSRV) is now 1.76.0.
- `as_polars_df(<nanoarrow_array>)` is added (#893).
- It is now possible to create an empty `DataFrame` with a specific schema
@@ -551,9 +555,9 @@ Also, there are some updates.
## Polars R Package 0.15.0
### Breaking changes due to Rust-polars update
### Breaking changes due to Rust Polars update
- rust-polars is updated to 0.38.1 (#865, #872).
- Rust Polars is updated to 0.38.1 (#865, #872).
- in `$pivot()`, arguments `aggregate_function`, `maintain_order`,
`sort_columns` and `separator` must be named. Values that are passed
by position are ignored.
@@ -715,9 +719,9 @@ Also, there are some updates.
## Polars R Package 0.14.0
### Breaking changes due to Rust-polars update
### Breaking changes due to Rust Polars update
- rust-polars is updated to 0.37.0 (#776).
- Rust Polars is updated to 0.37.0 (#776).
- Minimum supported Rust version (MSRV) is now 1.74.1.
- `$with_row_count()` for `DataFrame` and `LazyFrame` is deprecated and
will be removed in 0.15.0. It is replaced by `$with_row_index()`.
@@ -904,9 +908,9 @@ a large amount of documentation improvements.
- `pl$polars_info()` is moved to `polars_info()`. `pl$polars_info()` is deprecated
and will be removed in 0.13.0 (#662).

### Rust-polars update
### Rust Polars update

- rust-polars is updated to 0.36.2 (#659). Most of the changes from 0.35.x to 0.36.2
- Rust Polars is updated to 0.36.2 (#659). Most of the changes from 0.35.x to 0.36.2
were covered in R polars 0.12.0.
The main change is that `pl$Utf8` is replaced by `pl$String`.
`pl$Utf8` is an alias and will keep working, but `pl$String` is now preferred
@@ -927,9 +931,9 @@ a large amount of documentation improvements.

## Polars R Package 0.12.0

### BREAKING CHANGES DUE TO RUST-POLARS UPDATE
### BREAKING CHANGES DUE TO Rust Polars UPDATE

- rust-polars is updated to 2023-12-25 unreleased version (#601, #622).
- Rust Polars is updated to 2023-12-25 unreleased version (#601, #622).
This is the same version of Python Polars package 0.20.2, so please check
the [upgrade guide](https://pola-rs.github.io/polars/releases/upgrade/0.20/) for details too.
- `pl$scan_csv()` and `pl$read_csv()`'s `comment_char` argument is renamed `comment_prefix`.
@@ -984,9 +988,9 @@ a large amount of documentation improvements.
## Polars R Package 0.11.0
### BREAKING CHANGES DUE TO RUST-POLARS UPDATE
### BREAKING CHANGES DUE TO Rust Polars UPDATE
- rust-polars is updated to 0.35.0 (2023-11-17) (#515)
- Rust Polars is updated to 0.35.0 (2023-11-17) (#515)
- changes in `$write_csv()` and `sink_csv()`: `has_header` is renamed
`include_header` and there's a new argument `include_bom`.
- `pl$cov()` gains a `ddof` argument.
@@ -1065,9 +1069,9 @@ a large amount of documentation improvements.

## Polars R Package 0.10.0

### BREAKING CHANGES DUE TO RUST-POLARS UPDATE
### BREAKING CHANGES DUE TO Rust Polars UPDATE

- rust-polars is updated to 2023-10-25 unreleased version (#442)
- Rust Polars is updated to 2023-10-25 unreleased version (#442)
- Minimum supported Rust version (MSRV) is now 1.73.
- New subnamespace `"name"` that contains methods `$prefix()`, `$suffix()`
`keep()` (renamed from `keep_name()`) and `map()` (renamed from `map_alias()`).
@@ -1109,9 +1113,9 @@ a large amount of documentation improvements.

## Polars R Package 0.9.0

### BREAKING CHANGES DUE TO RUST-POLARS UPDATE
### BREAKING CHANGES DUE TO Rust Polars UPDATE

- rust-polars is updated to 0.33.2 (#417)
- Rust Polars is updated to 0.33.2 (#417)
- In all date-time related methods, the argument `use_earliest` is replaced by `ambiguous`.
- In `$sample()` and `$shuffle()`, the argument `fixed_seed` is removed.
- In `$value_counts()`, the arguments `multithreaded` and `sort`
@@ -1121,7 +1125,7 @@ a large amount of documentation improvements.
- Using `$is_in()` with `NA` on both sides now returns `NA` and not `TRUE` anymore.
- Argument `pattern` of `$str$count_matches()` can now use expressions.
- Needs Rust toolchain `nightly-2023-08-26` to build with full features.
- Rename R functions to match rust-polars
- Rename R functions to match Rust Polars
- `$str$count_match()` -> `$str$count_matches()` (#417)
- `$str$strip()` -> `$str$strip_chars()` (#417)
- `$str$lstrip()` -> `$str$strip_chars_start()` (#417)
@@ -1187,9 +1191,9 @@ a large amount of documentation improvements.

## Polars R Package 0.8.0

### BREAKING CHANGES DUE TO RUST-POLARS UPDATE
### BREAKING CHANGES DUE TO Rust Polars UPDATE

rust-polars was updated to 0.32.0, which comes with many breaking changes and new
Rust Polars was updated to 0.32.0, which comes with many breaking changes and new
features. Unrelated breaking changes and new features are put in separate sections
(#334):

@@ -1259,7 +1263,7 @@ features. Unrelated breaking changes and new features are put in separate sectio
will trigger something like `Cargo build --features "full_features"` which is not exactly the same
as `Cargo build --all-features`. Some dev features are not included in "full_features" (#311).
- Fix bug to allow using polars without library(polars) (#355).
- New methods `<LazyFrame>$optimization_toggle()` + `$profile()` and enable rust-polars feature
- New methods `<LazyFrame>$optimization_toggle()` + `$profile()` and enable Rust Polars feature
CSE: "Activate common subplan elimination optimization" (#323)
- Named expression e.g. `pl$select(newname = pl$lit(2))` are no longer experimental
and allowed as default (#357).
@@ -1269,7 +1273,7 @@ features. Unrelated breaking changes and new features are put in separate sectio
can define a custom way to convert their format to Polars format. This generic
must return a Polars series. See #368 for an example (#369).
- Private API Support for Arrow Stream import/export of DataFrame between two R packages that uses
rust-polars. [See R package example here](https://github.com/rpolars/extendrpolarsexamples)
Rust Polars. [See R package example here](https://github.com/rpolars/extendrpolarsexamples)
(#326).

## Polars R Package 0.7.0
@@ -1278,7 +1282,7 @@ features. Unrelated breaking changes and new features are put in separate sectio

- Replace the argument `reverse` by `descending` in all sorting functions. This
is for consistency with the upstream Polars (#291, #293).
- Bump rust-polars from 2023-04-20 unreleased version to version 0.30.0 released in 2023-05-30 (#289).
- Bump Rust Polars from 2023-04-20 unreleased version to version 0.30.0 released in 2023-05-30 (#289).
- Rename `concat_lst` to `concat_list`.
- Rename `$str$explode` to `$str$str_explode`.
- Remove `tz_aware` and `utc` arguments from `str_parse`.
@@ -1299,7 +1303,7 @@ features. Unrelated breaking changes and new features are put in separate sectio
- Fix memory leak on error bug. Fix printing of `%` bug. Prepare for renaming of polars classes (#252).
- Add helpful reference landing page at `polars.github.io/reference_home` (#223, #264).
- Supports Rust 1.65 (#262, #280)
- rust-polars' `simd` feature is now disabled by default. To enable it, set the environment variable
- Rust Polars' `simd` feature is now disabled by default. To enable it, set the environment variable
`RPOLARS_ALL_FEATURES` to `true` when building r-polars (#262).
- `opt-level` of `argminmax` is now set to `1` in the `release` profile to support Rust < 1.66.
The profile can be changed by setting the environment variable `RPOLARS_PROFILE` (when set to `release-optimized`,
@@ -1332,7 +1336,7 @@ features. Unrelated breaking changes and new features are put in separate sectio

### BREAKING CHANGES

- Bump rust-polars from 2023-02-17 unreleased version to 2023-04-20 unreleased version. (#183)
- Bump Rust Polars from 2023-02-17 unreleased version to 2023-04-20 unreleased version. (#183)
- `top_k`'s `reverse` option is removed. Use the new `bottom_k` method instead.
- The name of the `fmt` argument of some methods (e.g. `parse_date`) has been changed to `format`.
13 changes: 4 additions & 9 deletions R/expr__expr.R
@@ -1758,8 +1758,7 @@ Expr_n_unique = use_extendr_wrapper
#' This is done using the HyperLogLog++ algorithm for cardinality estimation.
#' @return Expr
#' @examples
#' as_polars_df(iris[, 4:5])$
#' with_columns(count = pl$col("Species")$approx_n_unique())
#' as_polars_df(mtcars)$select(count = pl$col("cyl")$approx_n_unique())
Expr_approx_n_unique = use_extendr_wrapper

#' Count missing values
@@ -3022,9 +3021,6 @@ Expr_arctanh = use_extendr_wrapper
#' @param dimensions A integer vector of length of the dimension size.
#' If `-1` is used in any of the dimensions, that dimension is inferred.
#' Currently, more than two dimensions not supported.
#' @param nested_type The nested data type to create. [List][DataType_List] only
#' supports 2 dimensions, whereas [Array][DataType_Array] supports an arbitrary
#' number of dimensions.
#' @return [Expr][Expr_class].
#' If a single dimension is given, results in an expression of the original data
#' type. If a multiple dimensions are given, results in an expression of data
@@ -3042,11 +3038,10 @@ Expr_arctanh = use_extendr_wrapper
#' # One can specify more than 2 dimensions by using the Array type
#' df = pl$DataFrame(foo = 1:12)
#' df$select(
#' pl$col("foo")$reshape(c(3, 2, 2), nested_type = pl$Array(pl$Float32, 2))
#' pl$col("foo")$reshape(c(3, 2, 2))
#' )
Expr_reshape = function(dimensions, nested_type = pl$List()) {
is_list = nested_type$is_list()
.pr$Expr$reshape(self, dimensions, is_list) |>
Expr_reshape = function(dimensions) {
.pr$Expr$reshape(self, dimensions) |>
unwrap("in $reshape():")
}

4 changes: 2 additions & 2 deletions R/extendr-wrappers.R
@@ -98,7 +98,7 @@ import_arrow_ipc <- function(path, n_rows, cache, rechunk, row_name, row_index,

new_from_ndjson <- function(path, infer_schema_length, batch_size, n_rows, low_memory, rechunk, row_index_name, row_index_offset, ignore_errors) .Call(wrap__new_from_ndjson, path, infer_schema_length, batch_size, n_rows, low_memory, rechunk, row_index_name, row_index_offset, ignore_errors)

new_from_parquet <- function(path, n_rows, cache, parallel, rechunk, row_name, row_index, storage_options, use_statistics, low_memory, hive_partitioning, hive_schema, try_parse_hive_dates, glob, include_file_paths) .Call(wrap__new_from_parquet, path, n_rows, cache, parallel, rechunk, row_name, row_index, storage_options, use_statistics, low_memory, hive_partitioning, hive_schema, try_parse_hive_dates, glob, include_file_paths)
new_from_parquet <- function(path, n_rows, cache, parallel, rechunk, row_name, row_index, storage_options, use_statistics, low_memory, hive_partitioning, schema, hive_schema, try_parse_hive_dates, glob, include_file_paths, allow_missing_columns) .Call(wrap__new_from_parquet, path, n_rows, cache, parallel, rechunk, row_name, row_index, storage_options, use_statistics, low_memory, hive_partitioning, schema, hive_schema, try_parse_hive_dates, glob, include_file_paths, allow_missing_columns)

test_rpolarserr <- function() .Call(wrap__test_rpolarserr)

@@ -680,7 +680,7 @@ RPolarsExpr$arccosh <- function() .Call(wrap__RPolarsExpr__arccosh, self)

RPolarsExpr$arctanh <- function() .Call(wrap__RPolarsExpr__arctanh, self)

RPolarsExpr$reshape <- function(dimensions, is_list) .Call(wrap__RPolarsExpr__reshape, self, dimensions, is_list)
RPolarsExpr$reshape <- function(dimensions) .Call(wrap__RPolarsExpr__reshape, self, dimensions)

RPolarsExpr$shuffle <- function(seed) .Call(wrap__RPolarsExpr__shuffle, self, seed)

18 changes: 15 additions & 3 deletions R/io_parquet.R
@@ -12,11 +12,17 @@
#' @param rechunk In case of reading multiple files via a glob pattern, rechunk
#' the final DataFrame into contiguous memory chunks.
#' @param glob Expand path given via globbing rules.
#' @param schema Specify the datatypes of the columns. The datatypes must match the datatypes in the file(s).
#' If there are extra columns that are not in the file(s), consider also enabling `allow_missing_columns`.
#' @param use_statistics Use statistics in the parquet file to determine if pages
#' can be skipped from reading.
#' @param storage_options Experimental. List of options necessary to scan
#' parquet files from different cloud storage providers (GCP, AWS, Azure,
#' HuggingFace). See the 'Details' section.
#' @param allow_missing_columns When reading a list of parquet files, if a column existing in the first
#' file cannot be found in subsequent files, the default behavior is to raise an error.
#' However, if `allow_missing_columns` is set to `TRUE`, a full-NULL column is returned
#' instead of erroring for the files that do not contain the column.
#'
#' @rdname IO_scan_parquet
#' @details
@@ -101,12 +107,14 @@ pl_scan_parquet = function(
hive_schema = NULL,
try_parse_hive_dates = TRUE,
glob = TRUE,
schema = NULL,
rechunk = FALSE,
low_memory = FALSE,
storage_options = NULL,
use_statistics = TRUE,
cache = TRUE,
include_file_paths = NULL) {
include_file_paths = NULL,
allow_missing_columns = FALSE) {
new_from_parquet(
path = source,
n_rows = n_rows,
@@ -122,7 +130,9 @@ pl_scan_parquet = function(
try_parse_hive_dates = try_parse_hive_dates,
storage_options = storage_options,
glob = glob,
include_file_paths = include_file_paths
schema = schema,
include_file_paths = include_file_paths,
allow_missing_columns = allow_missing_columns
) |>
unwrap("in pl$scan_parquet():")
}
@@ -162,12 +172,14 @@ pl_read_parquet = function(
hive_schema = NULL,
try_parse_hive_dates = TRUE,
glob = TRUE,
schema = NULL,
rechunk = TRUE,
low_memory = FALSE,
storage_options = NULL,
use_statistics = TRUE,
cache = TRUE,
include_file_paths = NULL) {
include_file_paths = NULL,
allow_missing_columns = FALSE) {
.args = as.list(environment())
result({
do.call(pl$scan_parquet, .args)$collect()
3 changes: 1 addition & 2 deletions man/Expr_approx_n_unique.Rd


8 changes: 2 additions & 6 deletions man/Expr_reshape.Rd
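
As a rough migration sketch for the user-facing changes in this commit (not part of the commit itself), the calls below follow the updated roxygen examples for `$reshape()` and `$approx_n_unique()` and exercise the new `allow_missing_columns` argument of `pl$scan_parquet()`. The temporary directory, the file names `part1.parquet` and `part2.parquet`, and the toy data frames are assumptions made for illustration, and the snippet assumes an r-polars build that includes this commit.

```r
library(polars)

# $reshape() no longer accepts `nested_type`; only the dimensions are passed.
df = pl$DataFrame(foo = 1:12)
reshaped = df$select(pl$col("foo")$reshape(c(3, 2, 2)))

# $approx_n_unique() on a numeric column, as in the updated documentation example.
counts = as_polars_df(mtcars)$select(count = pl$col("cyl")$approx_n_unique())

# Hypothetical input files: the second one lacks column `y`.
dir = file.path(tempdir(), "parquet_demo")
dir.create(dir, showWarnings = FALSE)
pl$DataFrame(x = 1:2, y = c("a", "b"))$write_parquet(file.path(dir, "part1.parquet"))
pl$DataFrame(x = 3:4)$write_parquet(file.path(dir, "part2.parquet"))

# With allow_missing_columns = TRUE, `y` is expected to be filled with nulls
# for part2.parquet instead of the scan raising an error.
combined = pl$scan_parquet(
  file.path(dir, "*.parquet"),
  allow_missing_columns = TRUE
)$collect()
```

Code that previously passed `nested_type` to `$reshape()` would need that argument dropped, and code that called `$approx_n_unique()` on a Categorical column would need a different column type or method; which replacement fits best depends on the use case and is not prescribed by the commit.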

