Website: Add blog post for 17.0.0 #537

Merged: 16 commits, Jul 19, 2024
173 changes: 173 additions & 0 deletions _posts/2024-07-16-17.0.0-release.md
@@ -0,0 +1,173 @@
---
layout: post
title: "Apache Arrow 17.0.0 Release"
date: "2024-07-16 00:00:00"
author: pmc
categories: [release]
---
<!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
-->


The Apache Arrow team is pleased to announce the 17.0.0 release. This covers
over 3 months of development work and includes [**331 resolved issues**][1]
on [**529 distinct commits**][2] from [**92 distinct contributors**][2].
See the [Install Page](https://arrow.apache.org/install/)
to learn how to get the libraries for your platform.

The release notes below are not exhaustive and only expose selected highlights
of the release. Many other bugfixes and improvements have been made: we refer
you to the [complete changelog][3].

## Community

Since the 16.0.0 release, Dane Pitkin has been invited to be a committer.
No new members have joined the Project Management Committee (PMC).

Thanks for your contributions and participation in the project!

## C Data Interface notes

- `ArrowDeviceArrayStream` can now be imported and exported (GH-40078).

## Arrow Flight RPC notes

- Flight SQL was formally stabilized (GH-39204).
- Flight SQL added a bulk ingestion command (GH-38255).
- The JDBC Flight SQL driver now accepts "catalog" as a connection parameter (GH-41947).
- "Stateless" prepared statements are now supported (GH-37220, GH-41262).
- Java added `FlightStatusCode.RESOURCE_EXHAUSTED` (GH-35888).
- C++ has some basic support for logging with OpenTelemetry (GH-39898).
Review comment (Member) on lines +53 to +60, suggesting one added line directly under the `## Arrow Flight RPC notes` heading, with the bullets above left unchanged:

**The experimental UCX backend will be removed in 18.0.0.**


## C++ notes

- Half-float values can now be parsed and formatted correctly (GH-41089).
- Record batches can now be converted to row-major tensors, not only column-major (GH-40866).
- The CSV writer can now write large string arrays larger than
2 GiB (GH-40270).
- A possible invalid memory access in `BooleanArray::true_count()` has been fixed (GH-41016).
- A new method `FlattenRecursively` allows recursively flattening nested list and
fixed-size list arrays (GH-41055).
- The scratch space in some `Scalar` subclasses is now immutable. This is required
for proper concurrent access to `Scalar` instances (GH-40069).
- Calling the `bit_width` or `byte_width` method of an extension type now defers
to the underlying storage type (GH-41353).
- Fixed a bug where `MapArray::FromArrays` would behave incorrectly if the given
offsets array has a non-zero offset (GH-40750).
- `MapArray::FromArrays` now accepts an optional null bitmap argument
(GH-41684).
- The `ARROW_NO_DEPRECATED_API` macro was unused and has been removed (GH-41343).
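
To make the row-major versus column-major distinction from the tensor conversion note concrete, here is a minimal plain-Python sketch. It does not use the Arrow API; `to_flat_buffer` is a hypothetical helper that only shows how the same columns end up ordered in a flat buffer under each layout.

```python
from typing import List, Sequence


def to_flat_buffer(columns: Sequence[Sequence[float]], row_major: bool) -> List[float]:
    """Flatten equal-length columns into one buffer in the requested layout.

    Hypothetical helper for illustration only; not part of Arrow.
    """
    n_rows = len(columns[0])
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same length")
    if row_major:
        # Values of one row are adjacent: r0c0, r0c1, r1c0, r1c1, ...
        return [col[r] for r in range(n_rows) for col in columns]
    # Values of one column are adjacent: r0c0, r1c0, ..., r0c1, r1c1, ...
    return [value for col in columns for value in col]


# A "record batch" with two columns and three rows.
columns = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
row_major = to_flat_buffer(columns, row_major=True)
col_major = to_flat_buffer(columns, row_major=False)
print(row_major)
print(col_major)
```

Column-major order matches the columnar layout of a record batch, while row-major order is what many array libraries assume by default, which is presumably why having both options is convenient.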

### Acero

- The left anti join filter no longer crashes when the filter rows are empty (GH-41121).
- A race condition was fixed in the asof join (GH-41149).
- A potential stack overflow has been fixed (GH-41334, GH-41738).
- A potential crash on very large data has been fixed (GH-41813).

### Compute

- List views and maps are now supported by the `if_else`, `case_when` and
`coalesce` functions (GH-41418).
- List views are now supported by the functions `list_slice` (GH-42065),
`list_parent_indices` (GH-42235), `take` and `filter` (GH-42116).
- `list_flatten` can now flatten recursively, controlled by a new optional argument
(GH-41183, GH-41055).
- The `take` and `filter` functions have been made significantly faster on fixed-width
types, including fixed-size lists of fixed-width types (GH-39798).
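
The difference between flattening one level and flattening recursively can be sketched in plain Python. This is an illustration of the semantics only, not the Arrow implementation: `flatten_once` and `flatten_recursively` are hypothetical names, `None` stands in for a null, and the sketch assumes uniformly nested input.

```python
from typing import Any, List


def flatten_once(values: List[Any]) -> List[Any]:
    """Remove one level of list nesting; null (None) lists contribute nothing."""
    out: List[Any] = []
    for item in values:
        if item is None:
            continue  # a null list has no child values
        out.extend(item)
    return out


def flatten_recursively(values: List[Any]) -> List[Any]:
    """Flatten until no list level remains; nulls at the leaf level are kept."""
    while any(isinstance(v, list) for v in values):
        values = flatten_once(values)
    return values


nested = [[[1, 2], [3]], None, [[4, None], None, []]]
one_level = flatten_once(nested)
all_levels = flatten_recursively(nested)
print(one_level)
print(all_levels)
```

Flattening one level still yields a list of lists here, whereas the recursive variant continues until only leaf values remain.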

### Dataset

- Repeated scanning of an encrypted Parquet dataset now works correctly (GH-41431).

### Filesystems

- Standard filesystem implementations are now tracked in a global registry which
also allows loading third-party filesystem implementations, for example from
runtime-loaded DLLs (GH-40342,
- Directory metadata operations on Azure filesystems are now more aligned with
the common expectations for filesystems (GH-41034).
- `CopyFile` is now supported for Azure filesystems with hierarchical namespace
enabled (GH-41095).
- Azure credentials can now be loaded explicitly from the environment (GH-39345),
or using the Azure CLI (GH-39344).
- A potential deadlock was fixed when closing an S3 output stream (GH-41862).

### GPU

- Non-CPU data can now be pretty-printed (GH-41664).
- Non-CPU data with offsets, such as list and binary data, can now be properly
sent over IPC (GH-42198).

### IPC

- Flatbuffers serialization is now more deterministic (GH-40361).

### Parquet

- A crash was fixed when reading an invalid Parquet file where columns claim to
be of different lengths (GH-41317).
- Definition and repetition levels are now more strictly checked, avoiding later
crashes when reading an invalid Parquet file (GH-41321).
- A crash was fixed when reading an invalid encrypted Parquet file (GH-43070).
- Fixed a bug where the BYTE_STREAM_SPLIT decoder could behave incorrectly
when nulls are present in a column (GH-41562).
- Fixed a bug where `DeltaLengthByteArrayEncoder::EstimatedDataEncodedSize` could
return an invalid estimate in some situations (GH-41545).
- Delimiting records is now faster for columns with nested repeating (GH-41361).
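
For readers unfamiliar with BYTE_STREAM_SPLIT, the following plain-Python sketch shows the idea for 32-bit floats: byte `k` of every value goes into stream `k`, and the streams are concatenated. In Parquet only non-null values are encoded, so a decoder has to place decoded values back into the non-null slots. The sketch models nulls with a simple list of booleans instead of definition levels, and all function names are hypothetical; it is not the Arrow decoder.

```python
import struct
from typing import List, Optional, Tuple


def bss_encode(values: List[float]) -> bytes:
    """Encode float32 values: stream k holds byte k of every value."""
    raw = [struct.pack("<f", v) for v in values]
    return b"".join(bytes(r[k] for r in raw) for k in range(4))


def bss_decode(data: bytes, count: int) -> List[float]:
    """Reassemble each value from byte i of each of the four streams."""
    return [
        struct.unpack("<f", bytes(data[k * count + i] for k in range(4)))[0]
        for i in range(count)
    ]


def encode_column(column: List[Optional[float]]) -> Tuple[List[bool], bytes]:
    """Only non-null values are encoded; validity is tracked separately."""
    validity = [v is not None for v in column]
    payload = bss_encode([v for v in column if v is not None])
    return validity, payload


def decode_column(validity: List[bool], payload: bytes) -> List[Optional[float]]:
    """Decode the dense values, then space them out according to validity."""
    dense = iter(bss_decode(payload, sum(validity)))
    return [next(dense) if ok else None for ok in validity]


column = [1.5, None, -2.0, None, 0.25]
validity, payload = encode_column(column)
decoded = decode_column(validity, payload)
print(len(payload), decoded)
```

The spacing step in `decode_column` is the part that has to account for nulls: the payload holds three values, but the column has five slots.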

### Substrait

- Support for more Arrow data types was added: some temporal types, half floats,
large string and large binary (GH-40695).

## C# notes

## Go notes

## Java notes

**Some changes are coming up in the next version, Arrow 18. Java 8 support will be removed. The version of the flight-core artifact with shaded gRPC will no longer be distributed.**

- Basic support for ListView (GH-41287) and StringView (GH-40339) has been added. These types should still be considered experimental.

Review comment (Member): cc @vibhatha anything else to add?

## JavaScript notes

## Python notes

## R notes

For more on what’s in the 17.0.0 R package, see the [R changelog][4].

## Ruby and C GLib notes

### Ruby

### C GLib

## Rust notes

The Rust projects have moved to separate repositories outside the
main Arrow monorepo. For notes on the latest release of the Rust
implementation, see the latest [Arrow Rust changelog][5].

[1]: https://github.com/apache/arrow/milestone/62?closed=1
[2]: {{ site.baseurl }}/release/17.0.0.html#contributors
[3]: {{ site.baseurl }}/release/17.0.0.html#changelog
[4]: {{ site.baseurl }}/docs/r/news/
[5]: https://github.com/apache/arrow-rs/tags