Release Modin 0.21.0 · modin-project/modin

Modin 0.21.0

This release includes many bug fixes, performance enhancements, and new features.

Key Features and Updates Since 0.20.0

Stability and Bugfixes
- FIX-#4828: allow dict_apply_builder to use keyword argument internal_indices (#5945)
- FIX-#5091: Handle pd.Grouper objects correctly (#6174)
- FIX-#5203: don't raise "AttributeError: 'list' object has no attribute '_query_compiler'" in join op (#5939)
- FIX-#5985: BUG: ArrowPeriodType and ArrowIntervalType are not supported by HDK (#5987)
- FIX-#5988: BUG: Concatenation of frames with strings is not supported by HDK (#5989)
- FIX-#5993: Fix documentation building in CI (#5994)
- FIX-#5997: Run build-docs CI job regardless of the files being changed (#5998)
- FIX-#6000: HDK: read_csv(): Do not parse dates if the parse_dates argument is not specified (#6001)
- FIX-#6022: support lazy import of modin.pandas module (#6023)
- FIX-#6037: Simplified filter node expression for ranges (#6038)
- FIX-#6053: align 'Series.str' signatures with pandas (#6054)
- FIX-#6069: Improve the way resample is handled at the API layer (#6179)
- FIX-#6070: Simplify implementation of shift (#6168)
- FIX-#6074: cap pyarrow<12 to fix CI (#6075)
- FIX-#6094: pin 'urllib3<2' for pip command in 'test-ray-master' job (#6178)
- FIX-#6095: Implement the to_csv() method in the HDK backend (#6099)
- FIX-#6097: Pass storage_options to the to_csv function of PandasOnRayIO class with fsspec (#6098)
- FIX-#6106: Fix API layer implementation of reindex_like (#6131)
- FIX-#6107: Allow pass through of tz_convert and tz_localize to QC if possible (#6137)
- FIX-#6109: Don't use join() when indicator is true (#6130)
- FIX-#6110: Generalize logic to test if an index is a MultiIndex (#6135)
- FIX-#6112: Ensure that truncate verifies that before <= after (#6134)
- FIX-#6113: Add QC Layer implementation for idxmin/max (#6170)
- FIX-#6114: Fix series groupby list of numpy methods (#6129)
- FIX-#6115: Check for _to_datetime attribute in pd.to_datetime (#6133)
- FIX-#6117: Add error checking at API level for diff (#6167)
- FIX-#6120: HDK read_csv(): Fixed parsing dates with nanosecond precision (#6121)
- FIX-#6146: Fix pivot when values=None (#6166)
- FIX-#6152: make numeric_only default to True (#6162)
- FIX-#6154: Ensure GroupBy.getitem preserves key order (#6164)
- FIX-#6155: Fully implement droplevel for axis=0 (#6180)
- FIX-#6175: Fix groupby agg columns for empty column partition (#6176)
- FIX-#6181: Do not ignore copy argument in tz_convert and tz_localize (#6182)
- FIX-#6183: Ensure array resets index and columns for all storage formats (#6185)
- FIX-#6184: Make Series.to_list return proper list (#6188)
- FIX-#6186: Don't use pandas extension types (#6187)
- FIX-#6194: Fix crashes on groupby.{pct_change,diff} (#6195)
- FIX-#6196: Align 'Series.cat' signatures with pandas (#6061)
- FIX-#6204: Use reset_index instead of insert in to_sql (#6205)
- FIX-#6172: Pass storage_options to the to_csv function of PandasOnUnidist class with fsspec (#6173)
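
Several of the fixes above bring Modin in line with pandas behavior. A minimal sketch of the reference semantics for three of them (#6154, #6184, #6112), written against plain pandas. Because `modin.pandas` is meant as a drop-in replacement, swapping the import should give the same results, assuming Modin 0.21.0 or later. This snippet does not exercise Modin itself.

```python
import pandas as pd  # swap for `import modin.pandas as pd` to exercise Modin

df = pd.DataFrame({"k": ["x", "y", "x"], "a": [1, 2, 3], "b": [10, 20, 30]})

# FIX-#6154: selecting columns on a GroupBy keeps the order the caller asked for.
summed = df.groupby("k")[["b", "a"]].sum()
print(list(summed.columns))  # ['b', 'a']

# FIX-#6184: Series.to_list returns a real Python list of Python scalars.
values = pd.Series([1, 2, 3]).to_list()
print(type(values).__name__, values)  # list [1, 2, 3]

# FIX-#6112: truncate rejects a `before` bound that is greater than `after`.
try:
    df.truncate(before=2, after=0)
    truncate_error = None
except ValueError as err:
    truncate_error = err
print(type(truncate_error).__name__)  # ValueError
```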

Performance enhancements
- PERF-#5835: Introduce lazy categorical proxy for pandas backend (#6055)
- PERF-#5840: Precompute dtypes cache for binary operations more often (#5949)
- PERF-#5841: Precompute dtypes for boolean setitem (#5952)
- PERF-#5999: Do not set Ray's runtime_env for a single-node case (#6028)
- PERF-#6122: Extract Feather's metadata without reading a whole file (#6123)
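
PERF-#5835 introduces a lazy categorical proxy, which defers work on categorical dtypes until it is actually needed. The general lazy-proxy idea can be sketched in plain Python: the object holds a callback and runs it only the first time the value is requested, then caches the result. `LazyCategories` and `scan_partitions` are hypothetical names chosen for illustration. This is not Modin's implementation.

```python
from typing import Callable, List, Optional


class LazyCategories:
    """Hypothetical proxy that computes categories only when first requested."""

    def __init__(self, compute: Callable[[], List[str]]) -> None:
        self._compute = compute
        self._categories: Optional[List[str]] = None

    @property
    def materialized(self) -> bool:
        return self._categories is not None

    @property
    def categories(self) -> List[str]:
        # Run the expensive computation once, then serve the cached result.
        if self._categories is None:
            self._categories = self._compute()
        return self._categories


scan_count = 0


def scan_partitions() -> List[str]:
    """Stand-in for an expensive pass over every partition's values."""
    global scan_count
    scan_count += 1
    partitions = [["b", "a"], ["c", "a"]]
    return sorted({value for part in partitions for value in part})


proxy = LazyCategories(scan_partitions)
print(proxy.materialized, scan_count)  # False 0
print(proxy.categories)                # ['a', 'b', 'c']
print(proxy.categories)                # ['a', 'b', 'c'], served from the cache
print(scan_count)                      # 1
```

The payoff is that operations which only pass the dtype along never trigger the scan.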

Refactor Codebase
- REFACTOR-#5844: remove inplace kwarg from query compiler clip arguments (#5954)
- REFACTOR-#5951: remove code duplication for to_pickle_distributed (#5950)
- REFACTOR-#5992: remove 'apply_license_header.py' as unused (#5990)
- REFACTOR-#6012: move experimental dispatchers under modin/experimental/... folder (#6011)
- REFACTOR-#6024: remove code duplication for to_* functions (#5953)
- REFACTOR-#6044: remove code duplication for 'get_objects_from_partitions' (#6045)
- REFACTOR-#6046: remove code duplication for 'progress_bar_wrapper' (#6047)
- REFACTOR-#6062: Add query compiler interfaces for expanding methods (#6064)
- REFACTOR-#6063: Add query compiler interfaces for some string methods. (#6088)
- REFACTOR-#6065: Use between_time in at_time (#6158)
- REFACTOR-#6066: Support rolling.{rank,quantile,sem} (#6084)
- REFACTOR-#6067: Simplify describe() query compiler interface (#6082)
- REFACTOR-#6068: Simplify info() call (#6087)
- REFACTOR-#6071: Push first and last down to query compiler. (#64) (#6125)
- REFACTOR-#6091: Push more of memory_usage down to query compiler. (#6092)
- REFACTOR-#6105: Explicitly pass default value of np.nan to Series.reindex (#6138)
- REFACTOR-#6108: Move implementation of pd.cut to QC layer (#6136)
- REFACTOR-#6116: Move groupby_ohlc implementation to QC layer (#6132)
- REFACTOR-#6119: #6118: Add query compiler methods for groupby diff, pct_change (#6128)
- REFACTOR-#6151: Get slicer without constructing pandas dataframe. (#6161)
- REFACTOR-#6159: Stop defaulting at API layer for a few more methods (#6160)
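
Many of the refactors above move logic from the API layer (`modin.pandas`) down to the query compiler. Others stop "defaulting to pandas", which means materializing the data as a pandas object, running the pandas method, and converting the result back. A toy model of that dispatch in plain Python follows. The class and method names (`ToyQueryCompiler`, `first_rows`, `default_to_reference`) are hypothetical and are not Modin's real interfaces.

```python
from typing import Any, Callable, Dict, List

Columns = Dict[str, List[Any]]

# Records which operations fell back to the slow reference path.
FALLBACK_LOG: List[str] = []


class ToyQueryCompiler:
    """Hypothetical backend layer holding data as a dict of column lists."""

    def __init__(self, columns: Columns) -> None:
        self.columns = columns

    def first_rows(self, n: int) -> "ToyQueryCompiler":
        # Implemented natively by the backend, i.e. "pushed down".
        return ToyQueryCompiler({name: col[:n] for name, col in self.columns.items()})

    def default_to_reference(
        self, op_name: str, func: Callable[[Columns], Columns]
    ) -> "ToyQueryCompiler":
        # Generic fallback: materialize a full copy (standing in for to_pandas()),
        # run the reference implementation on it, then wrap the result again.
        FALLBACK_LOG.append(op_name)
        materialized = {name: list(col) for name, col in self.columns.items()}
        return ToyQueryCompiler(func(materialized))


class ToyDataFrame:
    """Hypothetical API layer: checks arguments, then delegates to the backend."""

    def __init__(self, qc: ToyQueryCompiler) -> None:
        self._qc = qc

    def head(self, n: int = 5) -> "ToyDataFrame":
        if not isinstance(n, int):
            raise TypeError("n must be an int")  # API-level argument checking
        return ToyDataFrame(self._qc.first_rows(n))

    def reverse(self) -> "ToyDataFrame":
        # No backend method exists yet, so this defaults to the reference path.
        return ToyDataFrame(
            self._qc.default_to_reference(
                "reverse", lambda cols: {name: col[::-1] for name, col in cols.items()}
            )
        )

    def to_dict(self) -> Columns:
        return self._qc.columns


df = ToyDataFrame(ToyQueryCompiler({"a": [1, 2, 3], "b": [4, 5, 6]}))
print(df.head(2).to_dict())    # {'a': [1, 2], 'b': [4, 5]}
print(df.reverse().to_dict())  # {'a': [3, 2, 1], 'b': [6, 5, 4]}
print(FALLBACK_LOG)            # ['reverse']
```

Each method pushed into the query compiler removes one entry from the fallback path and its full materialization.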

Update testing suite
- TEST-#5956: Verify dtypes equality in tests (#5955)
- TEST-#5980: use cancel-in-progress only for PRs (#5917)
- TEST-#5991: add simple tests for read_orc, read_spss, json_normalize, read_xml, read_gbq (#5983)
- TEST-#6004: add more '# pragma: no cover' for io functions (#6002)
- TEST-#6006: test modin/test/test_partition_api.py on unidist and dask (#6003)
- TEST-#6009: use tmp_path fixture instead of ensure_clean_dir as pandas 2.0.0 does (#6008)
- TEST-#6010: add some more test directories into 'setup.cfg' (#6007)
- TEST-#6020: exclude '_version.py' from coverage (#6019)
- TEST-#6027: Test installing Unidist via pip in a clean environment, as we do for Dask and Ray (#6025)
- TEST-#6030: test the function parameters of Series.str accessor for pandas equivalence (#6033)
- TEST-#6031: test the function parameters of 'Series.dt' accessor for pandas equivalence (#6197)
- TEST-#6076: Use 2 cores for experimental groupby on dask (#6077)
- TEST-#6198: add 'pragma: no cover' for unidist and ray utils that are used in remote contexts (#6059)
- TEST-#6260: Increase test_io timeout (#6207)
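
TEST-#6030 and TEST-#6031 check that accessor methods expose the same parameters as their pandas counterparts. One way to express such a check with only the standard library is to compare `inspect.signature` results. `signature_mismatches`, `reference_split` and `candidate_split` are hypothetical names. The reference signature is illustrative and is not copied from pandas.

```python
import inspect
from typing import Callable, List


def signature_mismatches(reference: Callable, candidate: Callable) -> List[str]:
    """Describe parameter differences between two callables."""
    ref = inspect.signature(reference).parameters
    cand = inspect.signature(candidate).parameters
    problems: List[str] = []
    if list(ref) != list(cand):
        problems.append(f"names/order differ: {list(ref)} vs {list(cand)}")
    # Compare kind (positional vs keyword-only, etc.) and default per shared name.
    for name, r in ref.items():
        c = cand.get(name)
        if c is None:
            continue
        if r.kind != c.kind:
            problems.append(f"{name}: kind {r.kind.name} vs {c.kind.name}")
        if r.default != c.default:
            problems.append(f"{name}: default {r.default!r} vs {c.default!r}")
    return problems


def reference_split(self, pat=None, *, n=-1, expand=False, regex=None):
    """Hypothetical reference accessor method."""


def candidate_split(self, pat=None, n=-1, expand=False):
    """Hypothetical implementation under test: positional n, no regex."""


for problem in signature_mismatches(reference_split, candidate_split):
    print(problem)
```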

Documentation improvements
- DOCS-#5449: Add page for Modin interoperability with select third party libraries (#5517)
- DOCS-#6021: Add a section regarding reshuffling groupby to Modin's documentation (#6051)
- DOCS-#6078: correct default values for MODIN_CPUS and MODIN_NPARTITIONS (#6177)
- DOCS-#6079: Make 'experimental/index.html' accessible through the readthedocs website (#6080)

New Features
- FEAT-#5816: Implement '.split' method for axis partitions (#5856)
- FEAT-#5867: Introduce groupby implementation via range-partitioning (#5928)
- FEAT-#6014: Stop defaulting to pandas in groupby frontend for fill-like methods (#5996)
- FEAT-#6039: Implement Series.str through CachedAccessor (#6043)
- FEAT-#6040: implement 'Series.dt' through 'CachedAccessor' (#6056)
- FEAT-#6041: implement 'Series.cat' through 'CachedAccessor' (#6057)
- FEAT-#6144: Stop defaulting at API layer for a bunch of methods (#6145)
- FEAT-#6147: HDK: Arrow-based columns concatenation of frames with trivial index. (#6148)
- FEAT-#6153: Add API layer implementations for some stat methods. (#6156)
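
FEAT-#5867 adds a groupby implementation based on range-partitioning. The general technique works in four steps:

1. Sample the group keys.
2. Pick split points from the sample.
3. Route every row to the range its key falls in.
4. Aggregate each range independently.

Because a key can only land in one range, no cross-partition reduction is needed afterwards. A minimal stdlib sketch of that idea for a sum aggregation follows. The helper names are hypothetical, and Modin's actual implementation operates on dataframe partitions and is not reproduced here.

```python
import bisect
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, float]  # (group key, value)


def pick_boundaries(partitions: Sequence[Sequence[Row]], n_buckets: int,
                    sample_per_partition: int, seed: int = 0) -> List[str]:
    """Pick n_buckets - 1 split points from a small sample of each partition's keys."""
    rng = random.Random(seed)
    sample: List[str] = []
    for part in partitions:
        keys = [key for key, _ in part]
        sample.extend(rng.sample(keys, min(sample_per_partition, len(keys))))
    if not sample:
        return []
    sample.sort()
    step = len(sample) / n_buckets
    return [sample[int(step * i)] for i in range(1, n_buckets)]


def route_rows(partitions: Sequence[Sequence[Row]], boundaries: List[str]) -> List[List[Row]]:
    """Shuffle rows so that every key lands in exactly one range bucket."""
    buckets: List[List[Row]] = [[] for _ in range(len(boundaries) + 1)]
    for part in partitions:
        for key, value in part:
            buckets[bisect.bisect_right(boundaries, key)].append((key, value))
    return buckets


def groupby_sum(partitions: Sequence[Sequence[Row]], n_buckets: int = 2) -> Dict[str, float]:
    boundaries = pick_boundaries(partitions, n_buckets, sample_per_partition=2)
    result: Dict[str, float] = {}
    for bucket in route_rows(partitions, boundaries):
        # Buckets share no keys, so each one aggregates on its own with no
        # final cross-bucket reduction.
        local: Dict[str, float] = defaultdict(float)
        for key, value in bucket:
            local[key] += value
        result.update(local)
    return result


partitions = [
    [("a", 1.0), ("c", 2.0), ("b", 3.0)],
    [("c", 4.0), ("a", 5.0), ("d", 6.0)],
]
print(sorted(groupby_sum(partitions).items()))
# [('a', 6.0), ('b', 3.0), ('c', 6.0), ('d', 6.0)]
```

Sampling keeps the split-point computation cheap. The trade-off is that heavily skewed keys can leave buckets unbalanced, since all rows for one key must go to the same bucket.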

Contributors

@AndreyPavlenko
@RehanSD
@YarShev
@anmyachev
@arunjose696
@dchigarev
@devin-petersohn
@helmeleegy
@jkew
@labanyamukhopadhyay
@mdatre
@mvashishtha
@noloerino
@pyrito
@vnlitvinov
@naren-ponder
