Add announcement for nanoarrow 0.2.0 release #364

Merged (2 commits), Jun 23, 2023
161 changes: 161 additions & 0 deletions _posts/2023-06-13-nanoarrow-0.120-release.md
@@ -0,0 +1,161 @@
---
layout: post
title: "Apache Arrow nanoarrow 0.2 Release"
date: "2023-06-22 00:00:00"
author: pmc
categories: [release]
---
<!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
-->

The Apache Arrow team is pleased to announce the 0.2.0 release of
Apache Arrow nanoarrow. This release covers 19 resolved issues from
6 contributors.

## Release Highlights

- Addition of the Arrow [IPC stream reader extension](#ipc-stream-support)
- Addition of the [Getting Started with nanoarrow](#getting-started-with-nanoarrow)
tutorial
- Improvements in reliability and platform test coverage of the [C library](#c-library)
- Improvements in reliability and type support in the [R bindings](#r-bindings)

See the
[Changelog](https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.2.0/CHANGELOG.md)
for a detailed list of contributions to this release.

## IPC stream support

This release includes support for reading schemas and record batches serialized
using the
[Arrow IPC format](https://arrow.apache.org/docs/format/Columnar.html#serialization-and-interprocess-communication-ipc). Based on the
[flatcc](https://github.com/dvidelabs/flatcc)
flatbuffers implementation, the nanoarrow IPC read support is implemented as
an optional extension to the core nanoarrow library. The easiest way to get
started is with the `ArrowArrayStream` provider using one of the built-in
`ArrowIpcInputStream` implementations:

```c
#include <stdbool.h>
#include <stdio.h>

#include "nanoarrow_ipc.h"

int main(int argc, char* argv[]) {
  // Reopen stdin in binary mode so the IPC bytes are read unmodified
  FILE* file_ptr = freopen(NULL, "rb", stdin);

  struct ArrowIpcInputStream input;
  NANOARROW_RETURN_NOT_OK(ArrowIpcInputStreamInitFile(&input, file_ptr, false));

  struct ArrowArrayStream stream;
  NANOARROW_RETURN_NOT_OK(ArrowIpcArrayStreamReaderInit(&stream, &input, NULL));

  struct ArrowSchema schema;
  NANOARROW_RETURN_NOT_OK(stream.get_schema(&stream, &schema));

  struct ArrowArray array;
  while (true) {
    NANOARROW_RETURN_NOT_OK(stream.get_next(&stream, &array));
    if (array.release == NULL) {
      break;  // a released array signals the end of the stream
    }

    // Release each batch once it has been processed
    array.release(&array);
  }

  schema.release(&schema);
  stream.release(&stream);
  return 0;
}
```

Facilities for advanced usage are also provided via the low-level `ArrowIpcDecoder`,
which takes care of the details of deserializing the flatbuffer headers and
assembling buffers into an `ArrowArray`. The current implementation can read
schema and record batch messages that contain any Arrow type that is supported by
the C data interface. It can read both big-endian and little-endian
streams originating from platforms of either endianness. Dictionary encoding and
compression are not currently supported.
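
To give a sense of the framing a low-level decoder has to deal with before it
reaches the flatbuffer header, here is a minimal sketch that inspects the 8-byte
prefix of an encapsulated IPC message: a `0xFFFFFFFF` continuation marker followed
by a little-endian 32-bit metadata size, where a size of zero marks the end of the
stream. The helpers `read_le_uint32()` and `peek_message_prefix()` are hypothetical
names used for illustration and are not part of the nanoarrow API; the sketch also
assumes the current framing and does not handle the legacy format that omits the
continuation marker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a nanoarrow function): read a little-endian
 * uint32 byte by byte so the result is the same on hosts of either
 * endianness and for unaligned input. */
static uint32_t read_le_uint32(const uint8_t* p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) |
         ((uint32_t)p[3] << 24);
}

/* Hypothetical helper (not a nanoarrow function): inspect the 8-byte prefix
 * of an encapsulated message. Returns 0 and sets *metadata_size on success
 * (a size of 0 means end of stream) or -1 if the prefix is truncated or
 * malformed. */
static int peek_message_prefix(const uint8_t* data, size_t size,
                               int32_t* metadata_size) {
  assert(metadata_size != NULL);

  if (data == NULL || size < 8) {
    return -1; /* not enough bytes for marker + size */
  }

  if (read_le_uint32(data) != 0xFFFFFFFFu) {
    return -1; /* missing continuation marker */
  }

  uint32_t n = read_le_uint32(data + 4);
  if (n > (uint32_t)INT32_MAX) {
    return -1; /* the size is a signed 32-bit value and must not be negative */
  }

  *metadata_size = (int32_t)n;
  return 0;
}
```

A caller would read `metadata_size` more bytes to obtain the flatbuffer-encoded
header and only then learn how large the message body is. Reading the size byte by
byte keeps the sketch independent of the host's endianness.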

## Getting started with nanoarrow

Early users of the nanoarrow C library respectfully noted that it was difficult
to know where to begin. This release improves the reference documentation
and adds a long-form tutorial for those just getting started with or
considering adopting the library. You can find the tutorial at the
[nanoarrow documentation site](https://apache.github.io/arrow-nanoarrow/dev/getting-started.html).

## C library

The nanoarrow 0.2.0 release also includes a number of bugfixes and improvements
to the core C library, many of which were identified as a result of usage
connected with development of the IPC extension.

- Helpers for extracting/appending `Decimal128` and `Decimal256` elements
from/to arrays were added.
- The C library can now perform "full" validation to validate untrusted input
(e.g., serialized IPC from the wire).
- The C library can now perform "minimal" validation that performs all checks
that do not access any buffer data. This feature was added to facilitate
future support for the `ArrowDeviceArray` that was recently added as the
Arrow C device interface.
- Release verification on Ubuntu (x86_64, arm64), Fedora (x86_64),
Arch Linux (x86_64), CentOS 7 (x86_64, arm64), Alpine (x86_64, arm64, s390x),
Windows (x86_64), and macOS (x86_64) was added to the continuous
integration system. Linux verification is implemented using `docker compose`
to facilitate local checks when developing features that may affect
a specific platform.
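
The difference between the "minimal" and "full" validation levels described above
can be sketched for a string array with 32-bit offsets. This is an illustration
under simplifying assumptions rather than nanoarrow's implementation: the
`sketch_*` type and function names are hypothetical, and the real library exposes
its own API and covers every Arrow type. The minimal level only compares lengths
and buffer sizes, so it never dereferences a buffer (which may live on another
device), while the full level walks the offsets to make sure untrusted input
cannot cause an out-of-bounds read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only; nanoarrow's own API differs. */
enum sketch_validation_level { SKETCH_VALIDATE_MINIMAL, SKETCH_VALIDATE_FULL };

struct sketch_string_array {
  int64_t length;             /* number of strings */
  const int32_t* offsets;     /* length + 1 elements when length > 0 */
  int64_t offsets_size_bytes; /* size of the offsets buffer */
  const char* data;           /* concatenated string bytes */
  int64_t data_size_bytes;    /* size of the data buffer */
};

/* Returns 0 if the array passes the requested level of checks, -1 otherwise. */
static int sketch_validate(const struct sketch_string_array* a,
                           enum sketch_validation_level level) {
  assert(a != NULL);

  /* Minimal: inspect only lengths and buffer sizes, never buffer contents. */
  if (a->length < 0 || a->offsets_size_bytes < 0 || a->data_size_bytes < 0) {
    return -1;
  }
  if (a->length == 0) {
    return 0; /* an empty array has no offsets that need checking */
  }
  if (a->offsets_size_bytes / (int64_t)sizeof(int32_t) - 1 < a->length) {
    return -1; /* offsets buffer too small for length + 1 elements */
  }
  if (level == SKETCH_VALIDATE_MINIMAL) {
    return 0;
  }

  /* Full: read every offset. They must start at a non-negative value,
   * never decrease, and end inside the data buffer. */
  if (a->offsets == NULL || a->offsets[0] < 0) {
    return -1;
  }
  for (int64_t i = 0; i < a->length; i++) {
    if (a->offsets[i + 1] < a->offsets[i]) {
      return -1;
    }
  }
  if ((int64_t)a->offsets[a->length] > a->data_size_bytes) {
    return -1;
  }
  return 0;
}
```

In this sketch an array whose offsets are out of order passes the minimal level
but fails the full one, which is why the full level is the appropriate choice for
data received from an untrusted source.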

## R bindings

The nanoarrow R bindings are distributed as the `nanoarrow` package on
[CRAN](https://cran.r-project.org/). The 0.2.0 release of the R bindings includes
improvements in type support, improvements in stability, and features required
for the forthcoming release of Arrow Database Connectivity (ADBC) R bindings.
Notably:

- Support for conversion of union arrays to R objects was added to facilitate
support for an ADBC function that returns such an array.
- Support for adding an R-level finalizer to an `ArrowArrayStream` was added
to facilitate safely wrapping a stream resulting from an ADBC call at the
R level.

## Python bindings?

The nanoarrow 0.2.0 release does not include Python bindings, but improvements to the
[unreleased draft bindings](https://github.com/apache/arrow-nanoarrow/tree/main/python)
were added to facilitate discussion among Python developers regarding the useful
scope of a potential future nanoarrow Python package. If one of those developers is
you, feel free to
[open an issue](https://github.com/apache/arrow-nanoarrow/issues/new/choose)
or send a post to the
[developer mailing list](https://lists.apache.org/[email protected])
to engage in the discussion.

## Contributors

This release consists of contributions from 6 contributors in addition
to the invaluable advice and support of the Apache Arrow developer mailing list.
Special thanks to David Li for reviewing nearly every PR in this release!

```
$ git shortlog -sn d91c35b33c7b6ff94f5f929384879352e241ed71..apache-arrow-nanoarrow-0.2.0 | grep -v "GitHub Actions"
61 Dewey Dunnington
2 Dirk Eddelbuettel
2 Joris Van den Bossche
2 Kirill Müller
2 William Ayd
1 David Li
```