Update _posts/2024-04-20-16.0.0-release.md
Co-authored-by: Alenka Frim <[email protected]>
raulcd and AlenkaF authored Apr 26, 2024
1 parent fa9e452 commit 8177fc6
Showing 1 changed file with 41 additions and 0 deletions.
41 changes: 41 additions & 0 deletions _posts/2024-04-20-16.0.0-release.md
@@ -131,6 +131,47 @@ Thanks for your contributions and participation in the project!

## Python notes

Compatibility notes:
* The umbrella issue for PyArrow compatibility with NumPy 2.0 has been closed ([GH-39532](https://github.com/apache/arrow/issues/39532)); the last related issues are included in the Arrow 16.0.0 release ([GH-41098](https://github.com/apache/arrow/issues/41098), [GH-39848](https://github.com/apache/arrow/issues/39848) and [GH-40376](https://github.com/apache/arrow/issues/40376)).
* PyArrow no longer uses pandas internals to create Block objects and instead uses the new pandas API with pandas version 3 [GH-35081](https://github.com/apache/arrow/issues/35081)
* Pandas compatibility code has been simplified now that old pandas and Python versions are no longer supported [GH-40720](https://github.com/apache/arrow/issues/40720)
* Deprecated `pyarrow.filesystem` legacy implementations have been removed [GH-20127](https://github.com/apache/arrow/issues/20127)

New features:
* Conversion of Arrow `Table` and `RecordBatch` to a `Tensor` (not the same as the [tensor extension array](https://arrow.apache.org/docs/dev/format/CanonicalExtensions.html#official-list)) is being developed in Arrow C++ with bindings in Python (umbrella issue: [GH-40058](https://github.com/apache/arrow/issues/40058)). This release adds `pyarrow.RecordBatch.to_tensor(...)`, which converts a `RecordBatch` to a row-major or column-major `Tensor` and can optionally write missing values as `NaN` in the result.
* `ListView` and `LargeListView` array formats are now supported by PyArrow ([GH-39812](https://github.com/apache/arrow/issues/39812), [GH-39855](https://github.com/apache/arrow/issues/39855), [GH-40205](https://github.com/apache/arrow/issues/40205), [GH-41039](https://github.com/apache/arrow/issues/41039), [GH-40266](https://github.com/apache/arrow/issues/40266))
* `BinaryView` and `StringView` are now supported in PyArrow ([GH-39651](https://github.com/apache/arrow/issues/39651), [GH-39852](https://github.com/apache/arrow/issues/39852), [GH-40092](https://github.com/apache/arrow/issues/40092))
* Support for Run-End Encoded arrays in PyArrow has been completed (conversion to numpy and pandas [GH-40659](https://github.com/apache/arrow/issues/40659), construction in `pa.array(...)` [GH-40273](https://github.com/apache/arrow/issues/40273))
* `AsofJoinNode` C++ functionality is now exposed in Python as `join_asof` [GH-34235](https://github.com/apache/arrow/issues/34235)
* Minimal Python bindings are added for `AzureFileSystem` [GH-39968](https://github.com/apache/arrow/issues/39968)
* `FixedSizeTensorScalar` class is added [GH-37484](https://github.com/apache/arrow/issues/37484)

Other improvements:
* Add ChunkedArray import/export to/from C [GH-39984](https://github.com/apache/arrow/issues/39984)
* `pyarrow.Field` and `pyarrow.ChunkedArray` can now be constructed from objects supporting the PyCapsule Arrow C Data Interface [GH-38010](https://github.com/apache/arrow/issues/38010)
* `requested_schema` is now supported in `__arrow_c_stream__` implementations [GH-40066](https://github.com/apache/arrow/issues/40066)
* Add low-level bindings for exporting/importing the C Device Interface [GH-39979](https://github.com/apache/arrow/issues/39979)
* A function to download and extract the timezone database on a Windows machine is added [GH-37328](https://github.com/apache/arrow/issues/37328)
* Missing methods are added to `pyarrow.RecordBatch` [GH-30915](https://github.com/apache/arrow/issues/30915)
* A dictionary is now also accepted in the `pyarrow.record_batch` factory function (as in `pyarrow.table`) [GH-40291](https://github.com/apache/arrow/issues/40291)
* Usage of scalar legacy cast has been removed [GH-40023](https://github.com/apache/arrow/issues/40023)
* The missing `byte_width` attribute is added to all `DataType` classes [GH-39277](https://github.com/apache/arrow/issues/39277)
* `FileInfo` instances can now be used to construct Dataset objects [GH-40142](https://github.com/apache/arrow/issues/40142)
* Support hashing for `FileMetaData` and `ParquetSchema` [GH-39780](https://github.com/apache/arrow/issues/39780)
* `force_virtual_addressing` is exposed in PyArrow [GH-39779](https://github.com/apache/arrow/issues/39779)

Relevant bug fixes:
* Calling `pyarrow.dataset.ParquetFileFormat.make_write_options` as a class method now emits a warning [GH-39440](https://github.com/apache/arrow/issues/39440)
* `ScalarMemoTable` is now initialized only when deduplication is enabled, which fixes high memory consumption when it is disabled [GH-40316](https://github.com/apache/arrow/issues/40316)
* Slicing an array backwards beyond the start now includes the first item ([GH-38768](https://github.com/apache/arrow/issues/38768) and [GH-40642](https://github.com/apache/arrow/issues/40642))
* A memory leak when creating an Arrow array from a Python list of dicts is fixed [GH-37989](https://github.com/apache/arrow/issues/37989)
* `FixedSizeListType` was not treated as a nested type and is now added to `_NESTED_TYPES` [GH-40171](https://github.com/apache/arrow/issues/40171)
* `max_chunksize` is now validated in `Table.to_batches` [GH-39788](https://github.com/apache/arrow/issues/39788)
* Raising `ValueError` in `_ensure_partitioning` in Dataset is fixed [GH-39579](https://github.com/apache/arrow/issues/39579)
* Python stacktrace is now attached to errors in `ConvertPyError` [GH-37164](https://github.com/apache/arrow/issues/37164)

## R notes

### New features: