-
Notifications
You must be signed in to change notification settings - Fork 108
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add announcement for nanoarrow 0.2.0 release (#364)
Rendered version at: https://github.com/paleolimbot/arrow-site/blob/nanoarrow-0-2-0-release/_posts/2023-06-13-nanoarrow-0.120-release.md
- Loading branch information
1 parent
9ba0465
commit b8b2300
Showing
1 changed file
with
161 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,161 @@ | ||
--- | ||
layout: post | ||
title: "Apache Arrow nanoarrow 0.2 Release" | ||
date: "2023-06-22 00:00:00" | ||
author: pmc | ||
categories: [release] | ||
--- | ||
<!-- | ||
{% comment %} | ||
Licensed to the Apache Software Foundation (ASF) under one or more | ||
contributor license agreements. See the NOTICE file distributed with | ||
this work for additional information regarding copyright ownership. | ||
The ASF licenses this file to you under the Apache License, Version 2.0 | ||
(the "License"); you may not use this file except in compliance with | ||
the License. You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, software | ||
distributed under the License is distributed on an "AS IS" BASIS, | ||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
See the License for the specific language governing permissions and | ||
limitations under the License. | ||
{% endcomment %} | ||
--> | ||
|
||
The Apache Arrow team is pleased to announce the 0.2.0 release of | ||
Apache Arrow nanoarrow. This initial release covers 19 resolved issues from | ||
6 contributors. | ||
|
||
## Release Highlights | ||
|
||
- Addition of the Arrow [IPC stream reader extension](#ipc-stream-support) | ||
- Addition of the [Getting Started with nanoarrow](#getting-started-with-nanoarrow) | ||
tutorial | ||
- Improvements in reliability and platform test coverage of the [C library](#c-library) | ||
- Improvements in reliability and type support in the [R bindings](#r-bindings) | ||
|
||
See the | ||
[Changelog](https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.2.0/CHANGELOG.md) | ||
for a detailed list of contributions to this release. | ||
|
||
## IPC stream support | ||
|
||
This release includes support for reading schemas and record batches serialized | ||
using the | ||
[Arrow IPC format](https://arrow.apache.org/docs/format/Columnar.html#serialization-and-interprocess-communication-ipc). Based on the | ||
[flatcc](https://github.com/dvidelabs/flatcc) | ||
flatbuffers implementation, the nanoarrow IPC read support is implemented as | ||
an optional extension to the core nanoarrow library. The easiest way to get | ||
started is with the `ArrowArrayStream` provider using one of the built-in | ||
`ArrowIpcInputStream` implementations: | ||
|
||
```c | ||
#include <stdio.h> | ||
#include <stdbool.h> | ||
|
||
#include "nanoarrow_ipc.h" | ||
|
||
int main(int argc, char* argv[]) { | ||
FILE* file_ptr = freopen(NULL, "rb", stdin); | ||
|
||
struct ArrowIpcInputStream input; | ||
NANOARROW_RETURN_NOT_OK(ArrowIpcInputStreamInitFile(&input, file_ptr, false)); | ||
|
||
struct ArrowArrayStream stream; | ||
NANOARROW_RETURN_NOT_OK(ArrowIpcArrayStreamReaderInit(&stream, &input, NULL)); | ||
|
||
struct ArrowSchema schema; | ||
NANOARROW_RETURN_NOT_OK(stream.get_schema(&stream, &schema)); | ||
|
||
struct ArrowArray array; | ||
while (true) { | ||
NANOARROW_RETURN_NOT_OK(stream.get_next(&stream, &array)); | ||
if (array.release == NULL) { | ||
break; | ||
} | ||
} | ||
|
||
return 0; | ||
} | ||
``` | ||
Facilities for advanced usage are also provided via the low-level `ArrowIpcDecoder`, | ||
which takes care of the details of deserializing the flatbuffer headers and | ||
assembling buffers into an `ArrowArray`. The current implementation can read | ||
schema and record batch messages that contain any Arrow type that supported by | ||
the C data interface. The initial version can read both big and little endian | ||
streams originating from platforms of either endian. Dictionary encoding and | ||
compression are not currently supported. | ||
## Getting started with nanoarrow | ||
Early users of the nanoarrow C library respectfully noted that it was difficult | ||
to know where to begin. This release includes improvements in reference documentation | ||
but also includes a long-form tutorial for those just getting started with or | ||
considering adopting the library. You can find the tutorial at the | ||
[nanoarrow documentation site](https://apache.github.io/arrow-nanoarrow/dev/getting-started.html). | ||
## C library | ||
The nanoarrow 0.2.0 release also includes a number of bugfixes and improvements | ||
to the core C library, many of which were identified as a result of usage | ||
connected with development of the IPC extension. | ||
- Helpers for extracting/appending `Decimal128` and `Decimal256` elements | ||
from/to arrays were added. | ||
- The C library can now perform "full" validation to validate untrusted input | ||
(e.g., serialized IPC from the wire). | ||
- The C library can now perform "minimal" validation that performs all checks | ||
that do not access any buffer data. This feature was added to facilitate | ||
future support for the `ArrowDeviceArray` that was recently added as the | ||
Arrow C device interface. | ||
- Release verification on Ubuntu (x86_64, arm64), Fedora (x86_64), | ||
Archlinux (x86_64), Centos 7 (x86_64, arm64), Alpine (x86_64, arm64, s390x), | ||
Windows (x86_64), and MacOS (x86_64) were added to the continuous | ||
integration system. Linux verification is implemented using `docker compose` | ||
to facilitate local checks when developing features that may affect | ||
a specific platform. | ||
## R bindings | ||
The nanoarrow R bindings are distributed as the `nanoarrow` package on | ||
[CRAN](https://cran.r-project.org/). The 0.2.0 release of the R bindings includes | ||
improvements in type support, improvements in stability, and features required | ||
for the forthcoming release of Arrow Database Connectivity (ADBC) R bindings. | ||
Notably: | ||
- Support for conversion of union arrays to R objects was added to facilitate | ||
support for an ADBC function that returns such an array. | ||
- Support for adding an R-level finalizer to an `ArrowArrayStream` was added | ||
to facilitate safely wrapping a stream resulting from an ADBC call at the | ||
R level. | ||
## Python bindings? | ||
The nanoarrow 0.2.0 release does not include Python bindings, but improvements to the | ||
[unreleased draft bindings](https://github.com/apache/arrow-nanoarrow/tree/main/python) | ||
were added to facilitate discussion among Python developers regarding the useful | ||
scope of a potential future nanoarrow Python package. If one of those developers is | ||
you, feel free to | ||
[open an issue](https://github.com/apache/arrow-nanoarrow/issues/new/choose) | ||
or send a post to the | ||
[developer mailing list](https://lists.apache.org/[email protected]) | ||
to engage in the discussion. | ||
## Contributors | ||
This initial release consists of contributions from 6 contributors in addition | ||
to the invaluable advice and support of the Apache Arrow developer mailing list. | ||
Special thanks to David Li for reviewing nearly every PR in this release! | ||
``` | ||
$ git shortlog -sn d91c35b33c7b6ff94f5f929384879352e241ed71..apache-arrow-nanoarrow-0.2.0 | grep -v "GitHub Actions" | ||
61 Dewey Dunnington | ||
2 Dirk Eddelbuettel | ||
2 Joris Van den Bossche | ||
2 Kirill Müller | ||
2 William Ayd | ||
1 David Li | ||
``` |