diff --git a/_posts/2023-06-13-nanoarrow-0.120-release.md b/_posts/2023-06-13-nanoarrow-0.120-release.md new file mode 100644 index 000000000000..a18a0ef9033a --- /dev/null +++ b/_posts/2023-06-13-nanoarrow-0.120-release.md @@ -0,0 +1,161 @@ +--- +layout: post +title: "Apache Arrow nanoarrow 0.2 Release" +date: "2023-06-22 00:00:00" +author: pmc +categories: [release] +--- + + +The Apache Arrow team is pleased to announce the 0.2.0 release of +Apache Arrow nanoarrow. This initial release covers 19 resolved issues from +6 contributors. + +## Release Highlights + +- Addition of the Arrow [IPC stream reader extension](#ipc-stream-support) +- Addition of the [Getting Started with nanoarrow](#getting-started-with-nanoarrow) + tutorial +- Improvements in reliability and platform test coverage of the [C library](#c-library) +- Improvements in reliability and type support in the [R bindings](#r-bindings) + +See the +[Changelog](https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.2.0/CHANGELOG.md) +for a detailed list of contributions to this release. + +## IPC stream support + +This release includes support for reading schemas and record batches serialized +using the +[Arrow IPC format](https://arrow.apache.org/docs/format/Columnar.html#serialization-and-interprocess-communication-ipc). Based on the +[flatcc](https://github.com/dvidelabs/flatcc) +flatbuffers implementation, the nanoarrow IPC read support is implemented as +an optional extension to the core nanoarrow library. The easiest way to get +started is with the `ArrowArrayStream` provider using one of the built-in +`ArrowIpcInputStream` implementations: + +```c +#include +#include + +#include "nanoarrow_ipc.h" + +int main(int argc, char* argv[]) { + FILE* file_ptr = freopen(NULL, "rb", stdin); + + struct ArrowIpcInputStream input; + NANOARROW_RETURN_NOT_OK(ArrowIpcInputStreamInitFile(&input, file_ptr, false)); + + struct ArrowArrayStream stream; + NANOARROW_RETURN_NOT_OK(ArrowIpcArrayStreamReaderInit(&stream, &input, NULL)); + + struct ArrowSchema schema; + NANOARROW_RETURN_NOT_OK(stream.get_schema(&stream, &schema)); + + struct ArrowArray array; + while (true) { + NANOARROW_RETURN_NOT_OK(stream.get_next(&stream, &array)); + if (array.release == NULL) { + break; + } + } + + return 0; +} +``` + +Facilities for advanced usage are also provided via the low-level `ArrowIpcDecoder`, +which takes care of the details of deserializing the flatbuffer headers and +assembling buffers into an `ArrowArray`. The current implementation can read +schema and record batch messages that contain any Arrow type that supported by +the C data interface. The initial version can read both big and little endian +streams originating from platforms of either endian. Dictionary encoding and +compression are not currently supported. + +## Getting started with nanoarrow + +Early users of the nanoarrow C library respectfully noted that it was difficult +to know where to begin. This release includes improvements in reference documentation +but also includes a long-form tutorial for those just getting started with or +considering adopting the library. You can find the tutorial at the +[nanoarrow documentation site](https://apache.github.io/arrow-nanoarrow/dev/getting-started.html). + +## C library + +The nanoarrow 0.2.0 release also includes a number of bugfixes and improvements +to the core C library, many of which were identified as a result of usage +connected with development of the IPC extension. + +- Helpers for extracting/appending `Decimal128` and `Decimal256` elements + from/to arrays were added. +- The C library can now perform "full" validation to validate untrusted input + (e.g., serialized IPC from the wire). +- The C library can now perform "minimal" validation that performs all checks + that do not access any buffer data. This feature was added to facilitate + future support for the `ArrowDeviceArray` that was recently added as the + Arrow C device interface. +- Release verification on Ubuntu (x86_64, arm64), Fedora (x86_64), + Archlinux (x86_64), Centos 7 (x86_64, arm64), Alpine (x86_64, arm64, s390x), + Windows (x86_64), and MacOS (x86_64) were added to the continuous + integration system. Linux verification is implemented using `docker compose` + to facilitate local checks when developing features that may affect + a specific platform. + +## R bindings + +The nanoarrow R bindings are distributed as the `nanoarrow` package on +[CRAN](https://cran.r-project.org/). The 0.2.0 release of the R bindings includes +improvements in type support, improvements in stability, and features required +for the forthcoming release of Arrow Database Connectivity (ADBC) R bindings. +Notably: + +- Support for conversion of union arrays to R objects was added to facilitate + support for an ADBC function that returns such an array. +- Support for adding an R-level finalizer to an `ArrowArrayStream` was added + to facilitate safely wrapping a stream resulting from an ADBC call at the + R level. + +## Python bindings? + +The nanoarrow 0.2.0 release does not include Python bindings, but improvements to the +[unreleased draft bindings](https://github.com/apache/arrow-nanoarrow/tree/main/python) +were added to facilitate discussion among Python developers regarding the useful +scope of a potential future nanoarrow Python package. If one of those developers is +you, feel free to +[open an issue](https://github.com/apache/arrow-nanoarrow/issues/new/choose) +or send a post to the +[developer mailing list](https://lists.apache.org/list.html?dev@arrow.apache.org) +to engage in the discussion. + +## Contributors + +This initial release consists of contributions from 6 contributors in addition +to the invaluable advice and support of the Apache Arrow developer mailing list. +Special thanks to David Li for reviewing nearly every PR in this release! + +``` +$ git shortlog -sn d91c35b33c7b6ff94f5f929384879352e241ed71..apache-arrow-nanoarrow-0.2.0 | grep -v "GitHub Actions" + 61 Dewey Dunnington + 2 Dirk Eddelbuettel + 2 Joris Van den Bossche + 2 Kirill Müller + 2 William Ayd + 1 David Li +```