C++ additions
pitrou committed Jul 17, 2024
1 parent 10c89aa commit dc6b2ba
Showing 1 changed file with 83 additions and 0 deletions.
83 changes: 83 additions & 0 deletions _posts/2024-07-16-17.0.0-release.md
@@ -55,6 +55,89 @@ Thanks for your contributions and participation in the project!

## C++ notes

- Half-float values can now be parsed and formatted correctly (GH-41089).
- Record batches can now be converted to row-major tensors, not only column-major (GH-40866).
- The CSV writer can now write large string arrays that exceed
2 GiB (GH-40270).
- A possible invalid memory access in `BooleanArray::true_count()` has been fixed (GH-41016).
- A new method `FlattenRecursively` allows recursively flattening nested list and
fixed-size list arrays (GH-41055).
- The scratch space in some `Scalar` subclasses is now immutable. This is required
for proper concurrent access to `Scalar` instances (GH-40069).
- Calling the `bit_width` or `byte_width` method of an extension type now defers
to the underlying storage type (GH-41353).
- Fixed a bug where `MapArray::FromArrays` would behave incorrectly if the given
offsets array has a non-zero offset (GH-40750).
- `MapArray::FromArrays` now accepts an optional null bitmap argument
(GH-41684).
- The `ARROW_NO_DEPRECATED_API` macro was unused and has been removed (GH-41343).
- Building with libc++ and C++20 enabled has been fixed (GH-43095).
- mimalloc is now preferred over jemalloc as the default memory pool (GH-43254).
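
To illustrate the row-major versus column-major distinction mentioned above (GH-40866), here is a minimal sketch of the two memory layouts. It does not use the Arrow API: `Columns` and `PackColumns` are hypothetical names, and the "record batch" is assumed to be a set of equal-length `double` columns with no nulls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a record batch: equal-length columns of doubles.
using Columns = std::vector<std::vector<double>>;

// Pack all columns into one contiguous buffer.
// row_major == true : element (r, c) is stored at index r * num_cols + c.
// row_major == false: element (r, c) is stored at index c * num_rows + r.
std::vector<double> PackColumns(const Columns& columns, bool row_major) {
  const std::size_t num_cols = columns.size();
  const std::size_t num_rows = num_cols == 0 ? 0 : columns[0].size();
  std::vector<double> out(num_rows * num_cols);
  for (std::size_t c = 0; c < num_cols; ++c) {
    assert(columns[c].size() == num_rows);  // all columns must have equal length
    for (std::size_t r = 0; r < num_rows; ++r) {
      const std::size_t index = row_major ? r * num_cols + c : c * num_rows + r;
      out[index] = columns[c][r];
    }
  }
  return out;
}
```

With columns `{1, 2, 3}` and `{10, 20, 30}`, the column-major buffer keeps each column contiguous, while the row-major buffer interleaves the columns so that each row is contiguous.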

### Acero

- The left anti join filter no longer crashes when the filter rows are empty (GH-41121).
- A race condition was fixed in the asof join (GH-41149).
- A potential stack overflow has been fixed (GH-41334, GH-41738).
- Potential crashes on very large data have been fixed (GH-41813, GH-43046).
- Potential data corruption on very large data has been fixed (GH-43202).

### Compute

- List views and maps are now supported by the `if_else`, `case_when` and
`coalesce` functions (GH-41418).
- List views are now supported by the functions `list_slice` (GH-42065),
`list_parent_indices` (GH-42235), `take` and `filter` (GH-42116).
- `list_flatten` can now flatten recursively, based on a new optional argument
(GH-41183, GH-41055).
- The `take` and `filter` functions have been made significantly faster on fixed-width
types, including fixed-size lists of fixed-width types (GH-39798).
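
The difference between flattening one level and flattening recursively can be sketched as follows. This is a conceptual illustration only, not how Arrow stores or flattens list arrays: `Nested`, `Leaf`, `List`, `FlattenOnce` and `FlattenRecursive` are hypothetical names, and nulls are ignored.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical nested value: either a leaf integer or a list of nested values.
struct Nested {
  bool is_leaf;
  int value;                     // meaningful only when is_leaf is true
  std::vector<Nested> children;  // meaningful only when is_leaf is false
};

Nested Leaf(int v) { return Nested{true, v, {}}; }
Nested List(std::vector<Nested> items) { return Nested{false, 0, std::move(items)}; }

// Remove exactly one level of nesting: [[a, b], [c]] becomes [a, b, c].
std::vector<Nested> FlattenOnce(const std::vector<Nested>& lists) {
  std::vector<Nested> out;
  for (const Nested& item : lists) {
    assert(!item.is_leaf);  // every element must itself be a list
    out.insert(out.end(), item.children.begin(), item.children.end());
  }
  return out;
}

// Remove every level of nesting, appending the leaf values to *out in order.
void FlattenRecursive(const Nested& node, std::vector<int>* out) {
  if (node.is_leaf) {
    out->push_back(node.value);
    return;
  }
  for (const Nested& child : node.children) {
    FlattenRecursive(child, out);
  }
}
```

On a list of lists of lists, `FlattenOnce` still returns lists, whereas `FlattenRecursive` keeps descending until only leaf values remain.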

### Dataset

- Repeated scanning of an encrypted Parquet dataset now works correctly (GH-41431).

### Filesystems

- Standard filesystem implementations are now tracked in a global registry which
also allows loading third-party filesystem implementations, for example from
runtime-loaded DLLs (GH-40342,
- Directory metadata operations on Azure filesystems are now more aligned with
the common expectations for filesystems (GH-41034).
- `CopyFile` is now supported for Azure filesystems with hierarchical namespace
enabled (GH-41095).
- Azure credentials can now be loaded explicitly from the environment (GH-39345),
or using the Azure CLI (GH-39344).
- A potential deadlock was fixed when closing an S3 output stream (GH-41862).
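
The idea of a global filesystem registry can be sketched as a map from URI scheme to factory function. The code below is an assumption-laden illustration, not Arrow's implementation or API: `FileSystem`, `Factory`, `Registry`, `RegisterFactory`, `FileSystemFromUri` and `ExampleFileSystem` are all hypothetical names, and the sketch omits the locking a real process-wide registry would need.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal filesystem interface (not arrow::fs::FileSystem).
class FileSystem {
 public:
  virtual ~FileSystem() = default;
  virtual std::string type_name() const = 0;
};

using Factory = std::function<std::unique_ptr<FileSystem>(const std::string& uri)>;

// Process-wide registry keyed by URI scheme. Not thread-safe in this sketch.
std::map<std::string, Factory>& Registry() {
  static std::map<std::string, Factory> registry;
  return registry;
}

// Returns false if a factory was already registered for the scheme.
bool RegisterFactory(const std::string& scheme, Factory factory) {
  assert(factory != nullptr);  // an empty factory could never be called
  return Registry().emplace(scheme, std::move(factory)).second;
}

// Returns nullptr when the URI has no scheme or no factory is registered for it.
std::unique_ptr<FileSystem> FileSystemFromUri(const std::string& uri) {
  const std::size_t pos = uri.find("://");
  if (pos == std::string::npos) return nullptr;
  auto it = Registry().find(uri.substr(0, pos));
  if (it == Registry().end()) return nullptr;
  return it->second(uri);
}

// Example of a third-party implementation that could register itself.
class ExampleFileSystem : public FileSystem {
 public:
  std::string type_name() const override { return "example"; }
};
```

A third-party library would call `RegisterFactory` once, for instance when it is loaded at runtime, after which URIs using its scheme resolve through the same lookup as the built-in implementations.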

### GPU

- Non-CPU data can now be pretty-printed (GH-41664).
- Non-CPU data with offsets, such as list and binary data, can now be properly
sent over IPC (GH-42198).

### IPC

- Flatbuffers serialization is now more deterministic (GH-40361).

### Parquet

- A crash was fixed when reading an invalid Parquet file where columns claim to
be of different lengths (GH-41317).
- Definition and repetition levels are now more strictly checked, avoiding later
crashes when reading an invalid Parquet file (GH-41321).
- A crash was fixed when reading an invalid encrypted Parquet file (GH-43070).
- Fixed a bug where the BYTE_STREAM_SPLIT decoder could behave incorrectly
when nulls are present in a column (GH-41562).
- Fixed a bug where `DeltaLengthByteArrayEncoder::EstimatedDataEncodedSize` could
return an invalid estimate in some situations (GH-41545).
- Delimiting records is now faster for columns with nested repetition (GH-41361).
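
For readers unfamiliar with BYTE_STREAM_SPLIT, the following sketch shows the byte layout the encoding produces for 32-bit floats: byte `k` of every value is gathered into stream `k`, and the streams are concatenated. It is not Arrow's decoder. The function names are hypothetical, the host is assumed to be little-endian so that the in-memory bytes match the on-disk order, and only the dense sequence of non-null values is covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Encode: byte k of value i is written to position k * n + i.
std::vector<std::uint8_t> ByteStreamSplitEncode(const std::vector<float>& values) {
  const std::size_t width = sizeof(float);
  const std::size_t n = values.size();
  std::vector<std::uint8_t> out(n * width);
  for (std::size_t i = 0; i < n; ++i) {
    std::uint8_t bytes[sizeof(float)];
    std::memcpy(bytes, &values[i], width);  // raw bytes of value i
    for (std::size_t k = 0; k < width; ++k) {
      out[k * n + i] = bytes[k];
    }
  }
  return out;
}

// Decode: gather byte k of value i back from position k * n + i.
std::vector<float> ByteStreamSplitDecode(const std::vector<std::uint8_t>& data) {
  const std::size_t width = sizeof(float);
  assert(data.size() % width == 0);  // input must hold whole values
  const std::size_t n = data.size() / width;
  std::vector<float> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    std::uint8_t bytes[sizeof(float)];
    for (std::size_t k = 0; k < width; ++k) {
      bytes[k] = data[k * n + i];
    }
    std::memcpy(&out[i], bytes, width);
  }
  return out;
}
```

Because null slots are not part of the encoded values, a decoder that fills a spaced output has to map each decoded value to its non-null position, which is the kind of bookkeeping the fix in GH-41562 concerns.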

### Substrait

- Support for more Arrow data types was added: some temporal types, half floats,
large string and large binary (GH-40695).

## C# notes

## Go Notes
