rust-v0.17.3 (2024-05-01)
Implemented enhancements:
- Limit concurrent ObjectStore access to avoid resource limitations in constrained environments #2457
- How to get a DataFrame in Rust? #2404
- Allow checkpoint creation when partion column is "timestampNtz " #2381
- is there a way to make writing timestamp_ntz optional #2339
- Update arrow dependency #2328
- Release GIL in deltalake.write_deltalake #2234
- Unable to retrieve custom metadata from tables in rust #2153
- Refactor commit interface to be a Builder #2131
Fixed bugs:
- Handle rate limiting during write contention #2451
- regression : delta.logRetentionDuration don't seems to be respected #2447
- Issue writing to mounted storage in AKS using delta-rs library #2445
- TableMerger - when_matched_delete() fails when Column names contain special characters #2438
- Generic DeltaTable error: External error: Arrow error: Invalid argument error: arguments need to have the same data type - while merge data in to delta table #2423
- Merge on predicate throw error on date colum: Unable to convert expression to string #2420
- Writing Tables with Append mode errors if the schema metadata is different #2419
- Logstore issues on AWS Lambda #2410
- Datafusion timestamp type doesn't respect delta lake schema #2408
- Compacting produces smaller row groups than expected #2386
- ValueError: Partition value cannot be parsed from string. #2380
- Very slow s3 connection after 0.16.1 #2377
- Merge update+insert truncates a delta table if the table is big enough #2362
- Do not add readerFeatures or writerFeatures keys under checkpoint files if minReaderVersion or minWriterVersion do not satisfy the requirements #2360
- Create empty table failed on rust engine #2354
- Getting error message when running in lambda: message: "Too many open files" #2353
- Temporary files filling up _delta_log folder - increasing table load time #2351
- compact fails with merged schemas #2347
- Cannot merge into table partitioned by date type column on 0.16.3 #2344
- Merge breaks using logical datatype decimal128 #2343
- Decimal types are not checked against max precision/scale at table creation #2331
- Merge update+insert truncates a delta table #2320
- Extract add.stats_parsed with wrong type #2312
- Process fails without error message when executing merge #2310
- delta_rs don't seems to respect the row group size #2309
- Auth error when running inside VS Code #2306
- Unable to read deltatables with binary columns: Binary is not supported by JSON #2302
- Schema evolution not coercing with Large arrow types #2298
- Panic in deltalake_core::kernel::snapshot::log_segment::list_log_files_with_checkpoint::{{closure}} #2290
- Checkpoint does not preserve reader and writer features for the table protocol. #2288
- Z-Order with larger dataset resulting in memory error #2284
- Successful writes return error when using concurrent writers #2279
- Rust writer should raise when decimal types are incompatible (currently writers and puts table in invalid state) #2275
- Generic DeltaTable error: Version mismatch with new schema merge functionality in AWS S3 #2262
- DeltaTable is not resilient to corrupted checkpoint state #2258
- Inconsistent units of time #2256
- Partition column comparison is an assertion rather than if block with raise exception #2242
- Unable to merge column names starting from numbers #2230
- Merging to a table with multiple distinct partitions in parallel fails #2227
- cleanup_metadata not respecting custom logRetentionDuration #2180
- Merge predicate fails with a field with a space #2167
- When_matched_update causes records to be lost with explicit predicate #2158
- Merge execution time grows exponetially with the number of column #2107
- _internal.DeltaError when merging #2084
rust-v0.17.1 (2024-03-06)
Implemented enhancements:
- Get statistics metadata #2233
- add option to append only a subsets of columns #2212
- add documentation how to configure delta.logRetentionDuration #2072
- Add drop constraint #2070
- Add 0.16 deprecation warnings for DynamoDB lock #2049
Fixed bugs:
- cleanup_metadata not respecting custom logRetentionDuration #2180
- Rust writer panics on empty record batches #2253
- DeltaLake executed Rust: write method not found in DeltaOps #2244
- DELTA_FILE_PATTERN regex is incorrectly matching tmp commit files #2201
- Failed to create checkpoint with "Parquet does not support writing empty structs" #2189
- Error when parsing delete expressions #2187
- terminate called without an active exception #2184
- Now conda-installable on M1 #2178
- Add error message for parition_by check #2177
- deltalake 0.15.2 prints partitions_values and paths which is not desired #2176
- cleanup_metadata can potentially delete most recent checkpoint, corrupting table #2174
- Broken filter for newly created delta table #2169
- Hash for StructField should consider more than the name #2045
- Schema comparaison in writer #1853
- fix(python): sort before schema comparison #2209 (ion-elgreco)
- fix: prevent writing checkpoints with a version that does not exist in table state #1863 (rtyler)
Closed issues:
- Bug/Question: arrow's FixedSizeList is not roundtrippable #2162
Merged pull requests:
- fix: fixes panic on empty write #2254 (aersam)
- fix(rust): typo deletionvectors #2251 (ion-elgreco)
- fix(rust): make interval parsing compatible with plural form #2250 (ion-elgreco)
- chore: bump to 0.16 #2248 (ion-elgreco)
- feat: merge schema support for the write operation and Python #2246 (rtyler)
- fix: object_store 0.9.0 since 0.9.1 causes CI failure #2245 (aersam)
- chore(python): bump version #2241 (ion-elgreco)
- fix: fix ruff and mypy version and do formatting #2240 (aersam)
- feat(python, rust): timestampNtz support #2236 (ion-elgreco)
- chore: clean up some compilation failures and un-ignore some tests #2231 (rtyler)
- docs: fixing example in CONTRIBUTING.md #2224 (gacharya)
- perf: directly create projection instead of using DataFrame::with_column #2222 (emcake)
- chore: remove caches from github actions #2215 (rtyler)
- fix: is_commit_file should only catch commit jsons #2213 (emcake)
- chore: fix the Cargo.tomls to publish information properly on docs.rs #2211 (rtyler)
- fix(writer): retry storage.put on temporary network errors #2207 (qinix)
- fix: canonicalize config keys #2206 (emcake)
- docs: update README code samples for newer versions #2202 (jhoekx)
- docs: dask integration fix formatting typo #2196 (avriiil)
- fix: add data_type and nullable to StructField hash (#2045) #2190 (sonhmai)
- fix: removed panic in method #2185 (mightyshazam)
- feat: implement string representation for PartitionFilter #2183 (sonhmai)
- fix: correct map field names #2182 (emcake)
- feat: add comment to explain why assert has failed and show state #2179 (braaannigan)
- docs: include the 0.17.0 changelog #2173 (rtyler)
- fix(python): skip empty row groups during stats gathering #2172 (ion-elgreco)
- chore: 0.17.0 publish changes #2171 (rtyler)
- chore(python): bump version #2170 (ion-elgreco)
- chore: update all the package metadata for publication to crates.io #2168 (rtyler)
- fix: rm println in python lib #2166 (ion-elgreco)
- chore: cleanup minor clippies and other warns #2161 (rtyler)
- feat: implement clone for DeltaTable struct #2160 (mightyshazam)
- fix: allow loading of tables with identity columns #2155 (rtyler)
- fix: replace BTreeMap with IndexMap to preserve insertion order #2150 (roeap)
- fix: made generalize_filter less permissive, also added more cases #2149 (emcake)
- docs: add delta lake best practices #2147 (MrPowers)
- chore: shorten up the crate folder names in the tree #2145 (rtyler)
- fix(#2143): keep specific error type when writing fails #2144 (abaerptc)
- refactor(python): drop custom filesystem in write_deltalake #2137 (ion-elgreco)
- docs: use transparent logo in README #2132 (roeap)
- fix: order logical schema to match physical schema #2129 (Blajda)
- feat: expose stats schema on Snapshot #2128 (roeap)
- feat: update table config to contain new config keys #2127 (roeap)
- fix: clean-up paths created during tests #2126 (roeap)
- fix: prevent empty stats struct during parquet write #2125 (alexwilcoxson-rel)
- fix: temporarily skip s3 roundtrip test #2124 (roeap)
- fix: do not write empty parquet file/add on writer close; accurately … #2123 (alexwilcoxson-rel)
- docs: add dask page to integration docs #2122 (avriiil)
- chore: upgrade to DataFusion 35.0 #2121 (philippemnoel)
- fix(s3): restore working test for DynamoDb log store repair log on read #2120 (dispanser)
- fix: set partition values for added files when building compaction plan #2119 (alexwilcoxson-rel)
- fix: add missing pandas import #2116 (Tim-Haarman)
- chore: temporarily ignore the repair on update test #2114 (rtyler)
- docs: delta lake is great for small data #2113 (MrPowers)
- chore: removed unnecessary print statement from update method #2111 (LilMonk)
- fix: schema issue within writebuilder #2106 (universalmind303)
- docs: fix arg indent #2103 (wchatx)
- docs: delta lake file skipping #2096 (MrPowers)
- docs: move dynamo docs into new docs page #2093 (ion-elgreco)
- chore: bump python #2092 (ion-elgreco)
- feat: allow merge_execute to release the GIL #2091 (emcake)
- docs: how delta lake transactions work #2089 (MrPowers)
- fix: reinstate copy-if-not-exists passthrough #2083 (emcake)
- docs: make an overview tab visible in docs #2080 (r3stl355)
- docs: add usage guide for check constraints #2079 (hntd187)
- docs: update docs for rust print statement #2077 (skariyania)
- docs: add page on why to use delta lake #2076 (MrPowers)
- feat(rust, python): add drop constraint operation #2071 (ion-elgreco)
- refactor: add deltalake-gcp crate #2061 (ion-elgreco)
- fix: allow checkpoints to contain metadata actions without a createdTime value #2059 (rtyler)
- chore: bump version python #2047 (ion-elgreco)
- fix: ensure metadata cleanup do not corrupt tables without checkpoints #2044 (Blajda)
- docs: update docs for merge #2042 (Blajda)
- chore: update documentation for S3 / DynamoDb log store configuration #2041 (dispanser)
- feat: arrow backed log replay and table state #2037 (roeap)
- fix: properly deserialize percent-encoded file paths of Remove actions, to make sure tombstone and file paths match #2035 (sigorbor)
- fix: remove casts of structs to record batch #2033 (Blajda)
- feat(python, rust): expose custom_metadata for all operations #2032 (ion-elgreco)
- feat: refactor WriterProperties class #2030 (ion-elgreco)
- chore: update datafusion #2029 (roeap)
- refactor: increase metadata action usage #2027 (roeap)
- fix: github actions for releasing docs #2026 (r3stl355)
- feat: introduce schema evolution on RecordBatchWriter #2024 (rtyler)
- refactor: move azure integration to dedicated crate #2023 (roeap)
- fix: use temporary table names during the constraint checks #2017 (r3stl355)
- docs: add alterer #2014 (ion-elgreco)
- chore: version bump python release #2011 (ion-elgreco)
- fix: fix the test_restore_by_datetime test #2010 (r3stl355)
- feat(rust): add more commit info to most operations #2009 (ion-elgreco)
- feat(python): add schema conversion of FixedSizeBinaryArray and FixedSizeListType #2005 (balbok0)
- feat(python): expose large_dtype param in merge #2003 (ion-elgreco)
- docs: add writer properties to docs #2002 (ion-elgreco)
- chore: fix CI breaking lint issues #1999 (r3stl355)
- feat: implementation for replaceWhere #1996 (r3stl355)
- chore: refactoring AWS code out of the core crate #1995 (rtyler)
- feat(python): expose custom metadata to writers #1994 (ion-elgreco)
- docs: datafusion integration #1993 (MrPowers)
- fix: flakey gcs test #1987 (roeap)
- fix: implement consistent formatting for constraint expressions #1985 (Blajda)
- fix: case sensitivity for z-order #1982 (Blajda)
- feat(python): add writer_properties to all operations #1980 (ion-elgreco)
- refactor: trigger metadata retrieval only during DeltaTable.metadata #1979 (ion-elgreco)
- feat: retry with exponential backoff for DynamoDb interaction #1975 (dispanser)
- feat(python): expose add constraint operation #1973 (ion-elgreco)
- fix: properly decode percent-encoded file paths coming from parquet checkpoints #1970 (sigorbor)
- feat: omit unmodified files during merge write #1969 (Blajda)
- feat(python): combine load_version/load_with_datetime into load_as_version #1968 (ion-elgreco)
- fix: enable S3 integration tests to be configured via environment vars #1966 (dispanser)
- fix: handle empty table response in unity api #1963 (JonasDev1)
- docs: add auto-release when docs are merged to main #1962 (r3stl355)
- feat: cast list items to default before write with different item names #1959 (JonasDev1)
- feat: merge using partition filters #1958 (emcake)
- chore: relocate cast_record_batch into its own module to shed the datafusion dependency #1955 (rtyler)
- fix: respect case sensitivity on operations #1954 (Blajda)
- docs: add better installation instructions #1951 (MrPowers)
- docs: add polars integration #1949 (MrPowers)
- fix: add arrow page back #1944 (ion-elgreco)
- fix: remove the get_data_catalog() function #1941 (rtyler)
- chore: update runs-on value in python_release.yml #1940 (wjones127)
- docs: start how delta lake works #1938 (MrPowers)
- docs: add logo, dark mode, boost search #1936 (ion-elgreco)
- refactor: prefer usage of metadata and protocol fields #1935 (roeap)
- chore: update python version #1934 (wjones127)
- feat(python): expose create to DeltaTable class #1932 (ion-elgreco)
- docs: fix all examples and change overall structure #1931 (ion-elgreco)
- feat: update to include pyarrow-hotfix #1930 (dennyglee)
- fix: get rid of panic in during table #1928 (dimonchik-suvorov)
- fix(rust/python): optimize.compact not working with tables with mixed large/normal arrow #1926 (ion-elgreco)
- feat: extend write_deltalake to accept Deltalake schema #1922 (r3stl355)
- fix: fail fast for opening non-existent path #1917 (dimonchik-suvorov)
- feat: check constraints #1915 (hntd187)
- docs: delta lake arrow integration page #1914 (MrPowers)
- feat: add more info for contributors #1913 (r3stl355)
- fix: add buffer flushing to filesystem writes #1911 (r3stl355)
- docs: update docs home page and add pandas integration #1905 (MrPowers)
- feat: implement S3 log store with transactions backed by DynamoDb #1904 (dispanser)
- fix: prune each merge bin with only 1 file #1902 (haruband)
- docs: update python docs link in readme.md #1899 (thomasfrederikhoeck)
- docs: on append, overwrite, delete and z-ordering #1897 (MrPowers)
- feat: compare timestamp partition values as timestamps instead of strings #1895 (sigorbor)
- feat(python): expose rust writer as additional engine v2 #1891 (ion-elgreco)
- feat: add high-level checking for append-only tables #1887 (junjunjd)
- test: loading version 0 Delta table #1885 (dimonchik-suvorov)
- fix: improve catalog failure error message, add missing Glue native-tls feature dependency #1883 (r3stl355)
- refactor: simplify DeltaTableState #1877 (roeap)
- refactor: express log schema in delta types #1876 (roeap)
- docs: add Rust installation instructions #1875 (MrPowers)
- chore: clippy #1871 (roeap)
- fix: docs deployment action #1869 (r3stl355)
- docs: tell how to claim an issue #1866 (wjones127)
- feat: drop python 3.7 and adopt 3.12 #1859 (roeap)
- feat: create benchmarks for merge #1857 (Blajda)
- chore: add @ion-elgreco to python/ #1855 (rtyler)
- fix: compile error with lifetime issues on optimize (#1843) #1852 (dispanser)
- feat: implement issue auto-assign on take comment #1851 (r3stl355)
- docs: add docs on small file compaction with optimize #1850 (MrPowers)
- fix: checkpoint error with Azure Synapse #1848 (PierreDubrulle)
- feat(python): expose convert_to_deltalake #1842 (ion-elgreco)
- ci: adopt ruff format for formatting #1841 (roeap)
rust-v0.17.0 (2024-02-06)
The 0.17.0 release moves storage implementations into their own crates, such as
deltalake-aws. A consequence of that refactoring is that custom storage and
file scheme handlers must be registered/initialized at runtime. Storage
subcrates conventionally define a register_handlers function which performs
that task. Users may see errors such as:
thread 'main' panicked at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/deltalake-core-0.17.0/src/table/builder.rs:189:48:
The specified table_uri is not valid: InvalidTableLocation("Unknown scheme: s3")
- Users of the meta-crate (deltalake) can call the storage crate via: deltalake::aws::register_handlers(None); at the entrypoint for their code.
- Users who adopt core and storage crates independently (e.g. deltalake-aws) can register via: deltalake_aws::register_handlers(None);
The AWS, Azure, and GCP crates must all have their custom file schemes registered in this fashion.
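Why the "Unknown scheme: s3" error appears before registration can be illustrated with a toy model. The sketch below uses only the standard library and is not the deltalake implementation: the registry, the Factory type, open, and s3_factory are all hypothetical names invented for illustration. The only real call is the one quoted above, deltalake::aws::register_handlers(None);, which a real program would invoke once at its entrypoint.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Hypothetical stand-in for a storage factory (not a deltalake type).
type Factory = fn(&str) -> String;

/// Process-wide map from URL scheme to factory, created on first use.
fn registry() -> &'static Mutex<HashMap<String, Factory>> {
    static REGISTRY: OnceLock<Mutex<HashMap<String, Factory>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Toy equivalent of a storage crate's register_handlers function.
fn register_handlers(schemes: &[&str], factory: Factory) {
    let mut map = registry().lock().unwrap();
    for scheme in schemes {
        map.insert(scheme.to_string(), factory);
    }
}

/// Looks up the handler for the URI's scheme; fails if none was registered.
fn open(table_uri: &str) -> Result<String, String> {
    let scheme = table_uri.split("://").next().unwrap_or("");
    let map = registry().lock().unwrap();
    match map.get(scheme) {
        Some(factory) => Ok(factory(table_uri)),
        None => Err(format!("Unknown scheme: {scheme}")),
    }
}

/// Hypothetical factory that just describes the store it would build.
fn s3_factory(uri: &str) -> String {
    format!("s3 store for {uri}")
}

fn main() {
    // Before registration the scheme is unknown, as in the panic message above.
    println!("{:?}", open("s3://bucket/table"));
    // Registering once at the entrypoint makes the scheme resolvable.
    register_handlers(&["s3"], s3_factory);
    println!("{:?}", open("s3://bucket/table"));
}
```

In this model, as in the release notes, registration is a one-time side effect at startup; any table URI opened before it runs has no handler to resolve to.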
The locking mechanism is fundamentally different between deltalake v0.16.x and v0.17.0. Starting with this release, the deltalake and deltalake-aws crates rely on the same protocol for concurrent writes on AWS as the Delta Lake/Spark implementation.
Fundamentally, the DynamoDB table structure changes; the new structure is described in the project's DynamoDB log store documentation. The configuration of a Rust process should continue to use the AWS_S3_LOCKING_PROVIDER environment value of dynamodb. The new table must be specified with the DELTA_DYNAMO_TABLE_NAME environment or configuration variable, and that should name the new S3DynamoDbLogStore-compatible DynamoDB table.
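A process can check these two settings at startup before attempting any writes. The following is a minimal sketch using only the standard library; the helper name dynamo_table_name and its error messages are hypothetical and not part of deltalake. It takes a lookup closure so the same check works against environment variables or any other configuration source.

```rust
/// Hypothetical startup check: returns the DynamoDB table name when both
/// settings named in the release notes are present and valid.
fn dynamo_table_name(get: impl Fn(&str) -> Option<String>) -> Result<String, String> {
    // The locking provider must still be set to "dynamodb".
    match get("AWS_S3_LOCKING_PROVIDER").as_deref() {
        Some("dynamodb") => {}
        other => {
            return Err(format!(
                "AWS_S3_LOCKING_PROVIDER should be \"dynamodb\", got {other:?}"
            ))
        }
    }
    // The new table must be named explicitly and must not be empty.
    get("DELTA_DYNAMO_TABLE_NAME")
        .filter(|name| !name.is_empty())
        .ok_or_else(|| "DELTA_DYNAMO_TABLE_NAME is not set".to_string())
}

fn main() {
    // Read the settings from the process environment.
    match dynamo_table_name(|key| std::env::var(key).ok()) {
        Ok(name) => println!("using DynamoDB log store table {name}"),
        Err(msg) => eprintln!("configuration problem: {msg}"),
    }
}
```

Failing fast on a missing table name is a design choice of this sketch; it only validates that the values are present and does not contact AWS or verify the table's structure.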
Because locking is required to ensure safe, consistent writes, there is no iterative migration: 0.16 and 0.17 writers cannot safely coexist. The following steps should be taken when upgrading:
- Stop all 0.16.x writers
- Ensure writes are completed and the lock table is empty.
- Deploy 0.17.0 writers
Implemented enhancements:
- Expose the ability to compile DataFusion with SIMD #2118
- Updating Table log retention configuration with write_deltalake silently changes nothing #2108
- ALTER table, ALTER Column, Add/Modify Comment, Add/remove/rename partitions, Set Tags, Set location, Set TBLProperties #2088
- Docs: Update docs for check constraints #2063
- Don't ensure_table_uri when creating a table with_log_store #2036
- Exposing custom_metadata in merge operation #2031
- Support custom table properties via TableAlterer and write/merge #2022
- Remove parquet2 crate support #2004
- Merge operation that only touches necessary partitions #1991
- store userMetadata on write operations #1990
- Create Dask integration page #1956
- Merge: Filtering on partitions #1918
- Rethink the load_version and load_with_datetime interfaces #1910
- docs: Delta Lake + Arrow Integration #1908
- docs: Delta Lake + Polars integration #1906
- Rethink decision to expose the public interface in namespaces #1900
- Add documentation on how to build and run documentation locally #1893
- Add API to create an empty Delta Lake table #1892
- Implementing CHECK constraints #1881
- Check Invariants are respecting table features for write paths #1880
- Organize docs with single lefthand sidebar #1873
- Make sure invariants are handled properly throughout the codebase #1870
- Unable to use deltalake Schema in write_deltalake #1862
- Add a Rust-backed engine for write_deltalake #1861
- Run doctest in CI for Python API examples #1783
- [RFC] Use arrow for checkpoint reading and state handling #1776
- Expose Python exceptions in public module #1771
- Expose cleanup_metadata or create_checkpoint_from_table_uri_and_cleanup to the Python API #1768
- Expose convert_to_delta to Python API #1767
- Add high-level checking for append-only tables #1759
Fixed bugs:
- Row order no longer preserved after merge operation #2165
- Error when reading delta table with IDENTITY column #2152
- Merge on IS NULL condition doesn't work for empty table #2148
- JsonWriter converts structured parsing error into plain string #2143
- Pandas import error when merging tables #2112
- test_repair_on_update broken in main #2109
- WriteBuilder::with_input_execution_plan does not apply the schema to the log's metadata fields #2105
- MERGE logical plan vs execution plan schema mismatch #2104
- Partitions not pushed down #2090
- Cant create empty table with write_deltalake #2086
- Unexpected high costs on Google Cloud Storage #2085
- Unable to read s3 table: Unknown scheme: s3 #2065
- write_deltalake not respecting writer_properties #2064
- Unable to read/write tables with the "gs" schema in the table_uri in 0.15.1 #2060
- LockClient requiered error for S3 backend in 0.15.1 python #2057
- Error while writing Pandas DataFrame to Delta Lake (S3) #2051
- Error with dynamo locking provider on 0.15 #2034
- Conda version 0.15.0 is missing files #2021
- Rust panicking through Python library when a delete predicate uses a nullable field #2019
- No snapshot or version 0 found, perhaps /Users/watsy0007/resources/test_table/ is an empty dir? #2016
- Generic DeltaTable error: type_coercion in Struct column in merge operation #1998
- Constraint expr not formatted during commit action #1971
- .load_with_datetime() is incorrectly rounding to nearest second #1967
- vacuuming log files #1965
- Unable to merge uppercase column names #1960
- Schema error: Invalid data type for Delta Lake: Null #1946
- Python v0.14 wheel files not up to date #1945
- python Release 0.14 is missing Windows wheels #1942
- CI integration test fails randomly: test_restore_by_datetime #1925
- Merge data freezes indefenetely #1920
- Load DeltaTable from non-existing folder causing empty folder creation #1916
- Reoptimizes merge bins with only 1 file, even though they have no effect. #1901
- The Python Docs link in README.MD points to old docs #1898
- optimize.compact() fails with bad schema after updating to pyarrow 8.0 #1889
- Python build is broken on main #1856
- Checkpoint error with Azure Synapse #1847
- merge very slow compared to delete + append on larger dataset #1846
- get_add_actions fails with deltalake 0.13 #1835
- Handle PyArrow CVE-2023-47248 #1834
- Delta-rs writer hangs with to many file handles open (Azure) #1832
- Encountering NotATable("No snapshot or version 0 found, perhaps xxx is an empty dir?") #1831
- write_deltalake is not creating checkpoints #1815
- Problem writing tables in directory named with char ~ #1806
- DeltaTable Merge throws in merging if there are uppercase in Schema. #1797
- rust merge error - datafusion panics #1790
- expose use_dictionary=False when writing Delta Table and running optimize #1772
Closed issues:
- Is this print necessary? Can we remove this. #2110
- Azure concurrent writes #2069
- Fix docs deployment #1867
- Add a header in old docs and direct users to new docs #1865
rust-v0.16.5 (2023-11-15)
Implemented enhancements:
- When will upgrade object_store to 0.8? #1858
- No Official Help #1849
- Auto assign GitHub issues with a "take" message #1791
Fixed bugs:
- cargo clippy fails on core in main #1843
rust-v0.16.4 (2023-11-12)
Implemented enhancements:
- Unable to add deltalake git dependency to cargo.toml #1821
rust-v0.16.3 (2023-11-08)
Implemented enhancements:
Fixed bugs:
- Code Owners no longer valid #1794
- MERGE works incorrectly with partitioned table if the data column order is not same as table column order #1787
- errors when using pyarrow dataset as a source #1779
- Write to Microsoft OneLake failed. #1764
rust-v0.16.2 (2023-10-21)
rust-v0.16.1 (2023-10-21)
rust-v0.16.0 (2023-09-27)
Implemented enhancements:
- Expose Optimize option min_commit_interval in Python #1640
- Expose create_checkpoint_for #1513
- integration tests regularly fail for HDFS #1428
- Add Support for Microsoft OneLake #1418
- add support for atomic rename in R2 #1356
Fixed bugs:
- Writing with large arrow types (e.g. large_utf8), writes wrong partition encoding #1669
- [python] Different stringification of partition values in reader and writer #1653
- Unable to interface with data written from Spark Databricks #1651
- get_last_checkpoint does some unnecessary listing #1643
- PartitionWriter's buffer_len doesn't include incomplete row groups #1637
- Slack community invite link has expired #1636
- delta-rs does not appear to support tables with liquid clustering #1626
- Internal Parquet panic when using a Map type. #1619
- partition_by with "$" on local filesystem #1591
- ProtocolChanged error when perfoming append write #1585
- Unable to cargo update using git tag or rev on Rust 1.70 #1580
- NoMetadata error when reading detlatable #1562
- Cannot read delta table: Delta protocol violation #1557
- Update the CODEOWNERS to capture the current reviewers and contributors #1553
- [Python] Incorrect file URIs when partition values contain escape character #1533
- add documentation how to Query Delta natively from datafusion #1485
- Python: write_deltalake to ADLS Gen2 issue #1456
- Partition values that have been url encoded cannot be read when using deltalake #1446
- Error optimizing large table #1419
- Cannot read partitions with special characters (including space) with pyarrow >= 11 #1393
- ImportError: deltalake/_internal.abi3.so: cannot allocate memory in static TLS block #1380
- Invalid JSON in log record missing field schemaString for DLT tables #1302
- Special characters in partition path not handled locally #1299
Merged pull requests:
- chore: bump rust crate version #1675 (rtyler)
- fix: change partitioning schema from large to normal string for pyarrow<12 #1671 (ion-elgreco)
- feat: allow to set large dtypes for the schema check in write_deltalake #1668 (ion-elgreco)
- docs: small consistency update in guide and readme #1666 (ion-elgreco)
- fix: exception string in writer.py #1665 (sebdiem)
- chore: increment python library version #1664 (wjones127)
- docs: fix some typos #1662 (ion-elgreco)
- fix: more consistent handling of partition values and file paths #1661 (roeap)
- docs: add docstring to protocol method #1660 (MrPowers)
- docs: make docs.rs build docs with all features enabled #1658 (simonvandel)
- fix: enable offset listing for s3 #1654 (eeroel)
- chore: fix the incorrect Slack link in our readme #1649 (rtyler)
- fix: compensate for invalid log files created by Delta Live Tables #1647 (rtyler)
- chore: proposed updated CODEOWNERS to allow better review notifications #1646 (rtyler)
- feat: expose min_commit_interval to optimize.compact and optimize.z_order #1645 (ion-elgreco)
- fix: avoid excess listing of log files #1644 (eeroel)
- fix: introduce support for Microsoft OneLake #1642 (rtyler)
- fix: explicitly require chrono 0.4.31 or greater #1641 (rtyler)
- fix: include in-progress row group when calculating in-memory buffer length #1638 (BnMcG)
- chore: relax chrono pin to 0.4 #1635 (houqp)
- chore: update datafusion to 31, arrow to 46 and object_store to 0.7 #1634 (houqp)
- docs: update Readme #1633 (dennyglee)
- chore: pin the chrono dependency #1631 (rtyler)
- feat: pass known file sizes to filesystem in Python #1630 (eeroel)
- feat: implement parsing for the new domainMetadata actions in the commit log #1629 (rtyler)
- ci: fix python release #1624 (wjones127)
- ci: extend azure timeout #1622 (wjones127)
- feat: allow multiple incremental commits in optimize #1621 (kvap)
- fix: change map nullable value to false #1620 (cmackenzie1)
- Introduce the changelog for the last couple releases #1617 (rtyler)
- chore: bump python version to 0.10.2 #1616 (wjones127)
- perf: avoid holding GIL in DeltaFileSystemHandler #1615 (wjones127)
- fix: don't re-encode paths #1613 (wjones127)
- feat: use url parsing from object store #1592 (roeap)
- feat: buffered reading of transaction logs #1549 (eeroel)
- feat: merge operation #1522 (Blajda)
- feat: expose create_checkpoint_for to the public #1514 (haruband)
- docs: update Readme #1440 (roeap)
- refactor: re-organize top level modules #1434 (roeap)
- feat: integrate unity catalog with datafusion #1338 (roeap)
rust-v0.15.0 (2023-09-06)
Implemented enhancements:
- Configurable number of retries for transaction commit loop #1595
Fixed bugs:
- Unable to read table using VM Managed Identity on Azure #1462
- Unable to query by partition column #1445
Merged pull requests:
- fix: update python test #1608 (wjones127)
- chore: update datafusion to 30, arrow to 45 #1606 (scsmithr)
- fix: just make pyarrow 12 the max #1603 (wjones127)
- fix: support partial statistics in JSON #1599 (CurtHagenlocher)
- feat: allow configurable number of commit attempts #1596 (cmackenzie1)
- fix: querying on date partitions (fixes #1445) #1594 (watfordkcf)
- refactor: clean up arrow schema defs #1590 (polynomialherder)
- feat: add metadata for operations::write::WriteBuilder #1584 (abhimanyusinghgaur)
- feat: add metadata for deletion vectors #1583 (aersam)
- fix: remove alpha classifier #1578 (marcelotrevisani)
- refactor: use pa.table.cast in delta_arrow_schema_from_pandas #1573 (ion-elgreco)
rust-v0.14.0 (2023-08-01)
Implemented enhancements:
Fixed bugs:
- Excessive integration test sizes causing builds to fail #1550
- Slack invite link is not working #1530
Merged pull requests:
- fix: correct whitespace in delta protocol reader minimum version error message #1576 (polynomialherder)
- chore: move deps to [workspace.dependencies] #1575 (cmackenzie1)
- chore: update datafusion to 28 and arrow to 43 #1571 (cmackenzie1)
- ci: don't run benchmark in debug mode #1566 (wjones127)
- ci: install newer rust for macos python release #1565 (wjones127)
- feat: make find_files public #1560 (yjshen)
- feat!: bulk delete for vacuum #1556 (Blajda)
- chore: address some integration test bloat of disk usage for development #1552 (rtyler)
- docs: port docs to mkdocs #1548 (MrPowers)
- chore: disable incremental builds in CI for saving space #1545 (rtyler)
- fix: revert premature merge of an attempted fix for binary column statistics #1544 (rtyler)
- chore: increment python version #1542 (wjones127)
- feat: add restore command in python binding #1529 (loleek)
rust-v0.13.1 (2023-07-18)
Fixed bugs:
- Revert premature merge of an attempted fix for binary column statistics #1544
rust-v0.13.0 (2023-07-15)
Implemented enhancements:
- Add nested struct supports #1518
- Support FixedLenByteArray UUID statistics as a logical scalar #1483
- Exposing create_add in the API #1458
- Update features table on README #1404
- docs(python): show data catalog options in Python API reference #1347
- Add optimization to only list log files starting at a certain name #1252
- Support configuring parquet compression #1235
- parallel processing in Optimize command #1171
Fixed bugs:
- get_add_actions() MAX is not showing complete value #1534
- Can't get stats's minValues in add actions #1515
- Pyarrow is_null filter not working as expected after loading using deltalake #1496
- Can't write to table that uses generated columns #1495
- Json error: Binary is not supported by JSON when writing checkpoint files #1493
- _last_checkpoint size field is incorrect #1468
- Error when Z Ordering a larger dataset #1459
- Timestamp parsing issue #1455
- File options are ignored when writing delta #1444
- Slack Invite Link No Longer Valid #1425
- cleanup_metadata doesn't remove .checkpoint.parquet files #1420
- The test of reading the data from the blob storage located in Azurite container failed #1415
- The test of reading the data from the bucket located in Minio container failed #1408
- Datafusion: unreachable code reached when parsing statistics with missing columns #1374
- vacuum is very slow on Cloudflare R2 #1366
Closed issues:
- Expose Compression Options or WriterProperties for writing to Delta #1469
- Support out-of-core Z-order using DataFusion #1460
- Expose Z-order in Python #1442
Merged pull requests:
- chore: fix the latest clippy warnings with the newer rustc's #1536 (rtyler)
- docs: show data catalog options in Python API reference #1532 (omkar-foss)
- fix: handle nulls in file-level stats #1520 (wjones127)
- feat: add nested struct supports #1519 (haruband)
- fix: tiny typo in AggregatedStats #1516 (haruband)
- refactor: unify with_predicate for delete ops #1512 (Blajda)
- chore: remove deprecated table functions #1511 (roeap)
- chore: update datafusion and related crates #1504 (roeap)
- feat: implement restore operation #1502 (loleek)
- chore: fix mypy failure #1500 (wjones127)
- fix: avoid writing statistics for binary columns to fix JSON error #1498 (ChewingGlass)
- feat(rust): expose WriterProperties method on RecordBatchWriter and DeltaWriter #1497 (theelderbeever)
- feat: add UUID statistics handling #1484 (atefsaw)
- feat: expose create_add to the public #1482 (atefsaw)
- fix: add sizeInBytes to _last_checkpoint and change size to # of actions #1477 (cmackenzie1)
- fix(python): match Field signatures #1463 (guilhem-dvr)
- feat: handle larger z-order jobs with streaming output and spilling #1461 (wjones127)
- chore: increment python version #1449 (wjones127)
- chore: upgrade to arrow 40 and datafusion 26 #1448 (rtyler)
- feat(python): expose z-order in Python #1443 (wjones127)
- ci: prune CI/CD pipelines #1433 (roeap)
- refactor: remove LoadCheckpointError and ApplyLogError #1432 (roeap)
- feat: update writers to include compression method in file name #1431 (Blajda)
- refactor: move checkpoint and errors into separate module #1430 (roeap)
- feat: add z-order optimize #1429 (wjones127)
- fix: casting when data to be written does not match table schema #1427 (Blajda)
- docs: update README.adoc to fix expired Slack link #1426 (dennyglee)
- chore: remove no-longer-necessary build.rs for Rust bindings #1424 (rtyler)
- chore: remove the delta-checkpoint lambda which I have moved to a new repo #1423 (rtyler)
- refactor: rewrite redundant_async_block #1422 (cmackenzie1)
- fix: update cleanup regex to include checkpoint.parquet files #1421 (cmackenzie1)
- docs: update features table in README #1414 (ognis1205)
- fix: get_prune_stats returns homogenous ArrayRef #1413 (cmackenzie1)
- feat: explicit python exceptions #1409 (roeap)
- feat: implement update operation #1390 (Blajda)
- feat: allow concurrent file compaction #1383 (wjones127)
rust-v0.12.0 (2023-05-30)
Implemented enhancements:
- Release delta-rs 0.11.0 (next release after 0.10.0) #1362
- Support writing statistics for date columns in Rust #1209
Fixed bugs:
- Rust writer in operations makes a lot of data copies #1394
- Unable to read timestamp fields from column statistics #1372
- Unable to write custom metadata via configuration since version 0.9.0 #1353
- .get_add_actions() returns wrong column statistics when dataSkippingNumIndexedCols property of the table was changed #1223
- Ensure decimal statistics are written correctly in Rust #1208
Merged pull requests:
- feat: add list_with_offset to DeltaObjectStore #1410 (ognis1205)
- chore: type-check friendlier exports #1407 (roeap)
- chore: remove ancillary crates from the git tree #1406 (rtyler)
- chore: bump the version for the next release #1405 (rtyler)
- feat: more efficient parquet writer and more statistics #1397 (wjones127)
- perf: improve record batch partitioning #1396 (roeap)
- chore: bump datafusion to 25 #1389 (roeap)
- refactor!: remove DeltaDataType aliases #1388 (cmackenzie1)
- feat: vacuum with concurrent requests #1382 (wjones127)
- feat: add datafusion storage catalog #1381 (roeap)
- docs: updated schema.rs to use the right signature for decimal data type in documentation #1377 (rahulj51)
- fix: delete operation when partition and non partition columns are used #1375 (Blajda)
- fix: add conversion for string for Field::TimestampMicros (#1372) #1373 (cmackenzie1)
- fix: allow user defined config keys #1365 (roeap)
- ci: disable full debug symbol generation #1364 (roeap)
- fix: include stats for all columns (#1223) #1342 (mrjoe7)
rust-v0.11.0 (2023-05-12)
Implemented enhancements:
- Implement simple delete case #832
Merged pull requests:
- chore: update Rust package version #1346 (rtyler)
- fix: replace deprecated arrow::json::reader::Decoder #1226 (rtyler)
- feat: delete operation #1176 (Blajda)
- feat: add wasbs to known schemes #1345 (iajoiner)
- test: add some missing unit and doc tests for DeltaTablePartition #1341 (rtyler)
- feat: write command improvements #1267 (roeap)
- feat: added support for Databricks Unity Catalog #1331 (nohajc)
- fix: double url encode of partition key #1324 (mrjoe7)
rust-v0.10.0 (2023-05-02)
Implemented enhancements:
- Support Optimize on non-append-only tables #1125
Fixed bugs:
- DataFusion integration incorrectly handles partition columns defined "first" in schema #1168
- Datafusion: SQL projection returns wrong column for partitioned data #1292
- Unable to query partitioned tables #1291
Merged pull requests:
- chore: add deprecation notices for commit logic on DeltaTable #1323 (roeap)
- fix: handle local paths on windows #1322 (roeap)
- fix: scan partitioned tables with datafusion #1303 (roeap)
- fix: allow special characters in storage prefix #1311 (wjones127)
- feat: upgrade to Arrow 37 and Datafusion 23 #1314 (rtyler)
- Hide the parquet/json feature behind our own JSON feature #1307 (rtyler)
- Enable the json feature for the parquet crate #1300 (rtyler)
rust-v0.9.0 (2023-04-14)
Implemented enhancements:
- hdfs support #300
- Add decimal primitive type to document #1280
- Improve error message when filtering on non-existant partition columns #1218
Fixed bugs:
- Datafusion table provider: issues with timestamp types #441
- Not matching column names when creating a RecordBatch from MapArray #1257
- All stores created using DeltaObjectStore::new have an identical object_store_url #1188
Merged pull requests:
- Upgrade datafusion to 22 which brings arrow upgrades with it #1249 (rtyler)
- chore: df / arrow changes after update #1288 (roeap)
- feat: read schema from parquet files in datafusion scans #1266 (roeap)
- HDFS storage support via datafusion-objectstore-hdfs #1279 (iajoiner)
- Add description of decimal primitive to SchemaDataType #1281 (ognis1205)
- Fix names and nullability when creating RecordBatch from MapArray #1258 (balbok0)
- Simplify the Store Backend Configuration code #1265 (mrjoe7)
- feat: optimistic transaction protocol #632 (roeap)
- Write support for additional Arrow datatypes #1044 (chitralverma)
- Unique delta object store url #1212 (gruuya)
- improve err msg on use of non-partitioned column #1221 (marijncv)
rust-v0.8.0 (2023-03-10)
Implemented enhancements:
- feat(rust): support additional types for partition values #1170
Fixed bugs:
- File pruning does not occur on partition columns #1175
- Bug: Error loading Delta table locally #1157
- Deltalake 0.7.0 with s3 feature compliation error due to rusoto_dynamodb version conflict #1191
- Writing from a Delta table scan using WriteBuilder fails due to missing object store #1186
Merged pull requests:
- build(deps): bump datafusion #1217 (roeap)
- Implement pruning on partition columns #1179 (Blajda)
- feat: enable passing storage options to Delta table builder via Datafusion's CREATE EXTERNAL TABLE #1043 (gruuya)
- feat: typed commit info #1207 (roeap)
- add boolean, date, timestamp & binary partition types #1180 (marijncv)
- feat: extend configuration handling #1206 (marijncv)
- fix: load command for local tables #1205 (roeap)
- Enable passing Datafusion session state to WriteBuilder #1187 (gruuya)
- chore: increment dynamodb_lock version #1202 (wjones127)
- fix: update out-of-date doc about datafusion #1183 (xudong963)
- feat: move and update Optimize operation #1154 (roeap)
- add test for extract_partition_values #1159 (marijncv)
- fix typo #1166 (spebern)
- chore: remove star dependencies #1139 (wjones127)
rust-v0.7.0 (2023-02-11)
Implemented enhancements:
- Support FSCK REPAIR TABLE Operation #1092
- Expose the Delta Log in a DataFrame that's easy for analysis #1031
- Provide case-insensitive storage options in backend #999
- Support local file path in CreateBuilder::with_location() #998
- Save operational params in the same way with delta io #1054 (ismoshkov)
Fixed bugs:
- DeltaTable DataFusion TableProvider does not support filter pushdown #1064
- DeltaTable DataFusion scan does not prune files properly #1063
- deltalake.DeltaTable constructor hangs in Jupyter #1093
- Transaction log JSON formatting issue when writing data via Python bindings #1017
- crates.io entry is missing link to rustdoc documentation #1076
- URL Registered with ObjectStore registry is different from url in DeltaScan #1018
- Not able to connect to Azure Storage with client id/secret #977
- Deltalake 0.5 crate s3 feature dynamodb version mismatch #973
- Overwrite mode does not work with Azure #939
- Use Chrono without default features #914
- cargo test does not run due to tls conflict #985
- Azure SAS authorization fails with <AuthenticationErrorDetail>Signature fields not well formed. #910
Merged pull requests:
- Make rustls default across all packages #1097 (wjones127)
- Implement filesystem check #1103 (Blajda)
- refactor: move vacuum command to operations module #1045 (roeap)
- feat: enable passing storage options to Delta table builder via DataFusion's CREATE EXTERNAL TABLE #1043 (gruuya)
- feat: improve storage location handling #1065 (roeap)
- Fix to support UTC timezone #1022 (andrei-ionescu)
- feat: harmonize and simplify storage configuration #1052 (roeap)
- feat: expose function to get table of add actions #1033 (wjones127)
- fix: change unexpected field logging level to debug #1112 (houqp)
- fix: datafusion predicate pushdown and dependencies #1071 (roeap)
- fix: azure sas key url encoding #1036 (roeap)
- Add provisional workaround to support CDC #1039 #1042 (Fazzani)
- improve debuggability of json ser/de errors #1119 (houqp)
- Add an example of writing to a delta table with a RecordBatch #1085 (rtyler)
- minor: optimize partition lookup for vacuum loop #1120 (houqp)
- Add missing documentation metadata to Cargo.toml #1077 (johnbatty)
- add test for null_count_schema_for_fields #1135 (marijncv)
- add test for min_max_schema_for_fields #1122 (marijncv)
- add test for get_boolean_from_metadata #1121 (marijncv)
- add test for left_larger_than_right #1110 (marijncv)
- Add test for: to_scalar_value #1086 (marijncv)
- Fix typo in delta-inspect #1072 (byteink)
- chore: update datafusion #1114 (roeap)
rust-v0.6.0 (2022-12-16)
Implemented enhancements:
- Support Apache Arrow DataFusion 15 #1020
- Python package: Loosen version requirements for maturin #1004
- Remove Cargo.lock from library crates and add Cargo.lock to binary ones #1000
- More frequent Rust releases #969
- Thoughts on adding read_delta to pandas #869
- Add the support of the AWS_PROFILE environment variable for S3 #986 (fvaleye)
Fixed bugs:
- Azure SAS signatures ending in "=" don't work #1003
- Fail to compile deltalake crate, need to update dynamodb_lock in crates.io #1002
- error reading delta table to pandas: runtime dropped the dispatch task #975
- MacOS arm64 wheels are generated incorrectly #972
- Overwrite creates new file #960
- The written delta file has corrupted structure #956
- Write mode doesn't work with Azure storage #955
- Python: We don't error on reader protocol v2 #886
- Cannot open a deltatable in S3 using AWS_PROFILE based credentials from a local machine #855
Merged pull requests:
- Support DataFusion 15 #1021 (andrei-ionescu)
- fix truncating signature on SAS #1007 (damiondoesthings)
- Loosen version requirement for maturin #1005 (gyscos)
- Update .gitignore and add/remove Cargo.lock when appropriate #1001 (iajoiner)
- fix: get azure client secret from config #981 (roeap)
- feat: check invariants in write command #980 (roeap)
- Add a new release github action for Python binding: macos with universal2 wheel #976 (fvaleye)
- Bump version of the Python binding to 0.6.4 #970 (fvaleye)
- Handle pandas timestamps #958 (hayesgb)
- test(python): add azure integration tests #912 (wjones127)
* This Changelog was automatically generated by github_changelog_generator