Releases: G-Research/spark-extension
[2.12.0] - 2024-04-26
Fixed
- Diff change column should respect comparators (#238)
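To illustrate what this fix means, here is a minimal pure-Python sketch; it is not the library's implementation. It assumes that a column should appear in the change column only when the comparator configured for that column considers the two values different. The names `diff_row` and `epsilon`, the row layout and the "C"/"N" markers are hypothetical and chosen for illustration only.

```python
from typing import Any, Callable, Dict, List

# A comparator returns True when two values are considered equal.
Comparator = Callable[[Any, Any], bool]


def epsilon(tolerance: float) -> Comparator:
    """Hypothetical comparator: numbers are equal if they differ by at most `tolerance`."""
    return lambda left, right: abs(left - right) <= tolerance


def diff_row(left: Dict[str, Any], right: Dict[str, Any],
             comparators: Dict[str, Comparator]) -> Dict[str, Any]:
    """Compare two rows with identical columns; list only columns the comparators deem different."""
    default: Comparator = lambda l, r: l == r
    changed: List[str] = [
        column for column in left
        if not comparators.get(column, default)(left[column], right[column])
    ]
    # "C" marks a changed row, "N" an unchanged one (illustrative markers).
    return {"diff": "C" if changed else "N", "changes": changed}


left = {"id": 1, "price": 10.00, "name": "apple"}
right = {"id": 1, "price": 10.004, "name": "Apple"}

# With an epsilon comparator on "price", only "name" counts as changed.
result = diff_row(left, right, {"price": epsilon(0.01)})
print(result)
```

Without the comparator, `price` would also be reported as changed, which is the kind of inconsistency the fix above addresses.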
Changed
- Make create_temporary_dir work with pyspark-extension only (#222). This allows installing pip packages and Poetry projects via the pure Python spark-extension package (the Maven package is no longer required).
- Add map diff comparator to Python API (#226)
[2.11.0] - 2024-01-04
Added
[2.10.0] - 2023-09-27
[2.9.0] - 2023-08-23
[2.8.0] - 2023-05-24
[2.7.0] - 2023-05-05
Added
- Spark app to diff files or tables and write result back to file or table. (#160)
- Add null value count to parquetBlockColumns and parquet_block_columns. (#162)
- Add parallelism argument to Parquet metadata methods. (#164)
Changed
- Change data type of column name in parquetBlockColumns and parquet_block_columns to array of strings. Cast to string to get earlier behaviour (string column name). (#162)
[2.6.0] - 2023-04-11
Added
- Add reader for parquet metadata. (#154)
[2.5.0] - 2023-03-23
Added
This is the first version that releases Python packages to PyPI: https://pypi.org/project/pyspark-extension/
[2.4.0] - 2022-12-08
[2.3.0] - 2022-10-26
Added
- Add diffWith to Scala, Java and Python Diff API. (#109)
Changed
- Diff similar Datasets with ignoreColumns. Before, only similar DataFrames could be diffed with ignoreColumns. (#111)
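As a rough illustration of the ignoreColumns idea, the pure-Python sketch below excludes ignored columns from the comparison, so rows that differ only in those columns count as unchanged. This is not the library's code; the function `rows_differ` and the sample rows are hypothetical.

```python
from typing import Any, Dict, Iterable


def rows_differ(left: Dict[str, Any], right: Dict[str, Any],
                ignore_columns: Iterable[str] = ()) -> bool:
    """Return True if any non-ignored column has different values in the two rows."""
    ignored = set(ignore_columns)
    return any(left[column] != right[column]
               for column in left if column not in ignored)


left = {"id": 1, "value": "a", "updated_at": "2022-10-01"}
right = {"id": 1, "value": "a", "updated_at": "2022-10-26"}

# Comparing all columns reports a difference ...
differs_all = rows_differ(left, right)
# ... while ignoring the timestamp column does not.
differs_ignoring = rows_differ(left, right, ignore_columns=["updated_at"])
print(differs_all, differs_ignoring)
```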
Fixed
- Cache before writing via partitionedBy to work around SPARK-40588. Unpersist via UnpersistHandle. (#124)