chore(docs): Ensure roadmap is up-to-date (#615)
The roadmap contained some (very) outdated ideas regarding
future scope. This PR updates them with more recent text based on the
existing list of issues and implementation work that has happened over
the past few months.

---------

Co-authored-by: Benjamin Kietzman <[email protected]>
paleolimbot and bkietz authored Sep 19, 2024
1 parent 6118e24 commit 3488ff1
Showing 1 changed file with 42 additions and 39 deletions.
docs/source/roadmap.md: 81 changes (42 additions, 39 deletions)
@@ -27,60 +27,63 @@ have not yet been scheduled for implementation.
 ## C library
 
 - **Type coverage**: The C library currently provides support for all types that
-are available via the Arrow C Data interface. When the recently-added run-end
-encoded (REE) types and potentially forthcoming string view/list view types
-are available via the Arrow C Data interface, support should be added in
-nanoarrow as well.
-- **Array append**: The `ArrowArrayAppend*()` family of functions provide a means
-by which to incrementally build arrays; however, there is no built-in way to
-append an `ArrowArrayView`, potentially more efficiently appending multiple
-values at once. Among other things, this would provide a route to an
-unoptimized filter/take implementation.
+are available via the Arrow C Data interface except string view/list view types.
+Support for these should be added in nanoarrow as well
+([#583](https://github.com/apache/arrow-nanoarrow/issues/583),
+[#616](https://github.com/apache/arrow-nanoarrow/issues/616),
+[#510](https://github.com/apache/arrow-nanoarrow/issues/510)).
 - **Remove Arrow C++ dependency for tests**: The C library and IPC extension rely
 on Arrow C++ for some test code that was written early in the library's development.
 These tests are valuable to ensure compatibility between nanoarrow and Arrow C++;
 however, including them in the default test suite complicates release verification
 for some users and prevents testing in environments where Arrow C++ does not
-currently build (e.g., WASM, compilers without C++17 support).
+currently build (e.g., WASM, compilers without C++17 support)
+([#619](https://github.com/apache/arrow-nanoarrow/issues/619)).
+- **Test verbosity**: Tests for the C library were written before testing utilities
+in the `nanoarrow_testing` library were available (and before there was a
+`nanoarrow_testing` library in which to put new ones). As a result, some of them
+are very verbose and can be difficult to read, which can and should be improved
+([#577](https://github.com/apache/arrow-nanoarrow/issues/577),
+[#566](https://github.com/apache/arrow-nanoarrow/issues/566)).
 - **C++ integration**: The existing C++ integration is intentionally minimal;
 however, there are likely improvements that could be made to better integrate
-nanoarrow into existing C++ projects.
+nanoarrow into existing C++ projects
+([#599](https://github.com/apache/arrow-nanoarrow/issues/599)).
 - **Documentation**: As the C library and its user base evolves, documentation
-needs to be refined and expanded to support the current set of use cases.
-
-## IPC extension
-
-- **Write support**: The IPC extension currently provides support for reading
-IPC streams but not writing them.
-- **Dictionary support**: The IPC extension does not currently support reading
-dictionary messages an IPC stream.
-- **Compression**: The IPC extension does not currently support compressed streams.
-
-## Device extension
-
-This entire extension is currently experimental and awaiting use-cases that will
-drive future development.
+needs to be refined and expanded to support the current set of use cases
+([#187](https://github.com/apache/arrow-nanoarrow/issues/187),
+[#497](https://github.com/apache/arrow-nanoarrow/issues/497)).
+- **IPC Dictionary support**: The IPC extension does not currently support reading
+dictionary messages an IPC stream
+([#622](https://github.com/apache/arrow-nanoarrow/issues/622)).
+- **IPC Compression support**: The IPC extension does not currently support
+compressed streams using per-buffer compression, although streams can be compressed
+outside the nanoarrow library (e.g., gzip compression of the entire stream)
+([#621](https://github.com/apache/arrow-nanoarrow/issues/621))
 
 ## R bindings
 
-- **Type support**: The R bindings currently do not provide support for extension
-types and relies on Arrow C++ for some dictionary-encoded types.
+- **Conversion internals**: The initial implementation of conversion from
+Arrow data to R vectors was implemented in C and its verbosity makes it
+difficult to add support for new types. The internals should be refactored
+to make the conversion code easier to understand for new developers
+([#392](https://github.com/apache/arrow-nanoarrow/pull/392)).
+- **Type support**: The R bindings currently rely on the Arrow R package for
+conversion of some R types (e.g., list_of), and some types are not supported
+in nanoarrow nor the arrow R package (e.g., run-end encoding, list view, and
+string/binary view)
+([#617](https://github.com/apache/arrow-nanoarrow/issues/617)).
 - **ALTREP support**: A recent R release added enhanced ALTREP support such that
 types that convert to `list()` can defer materialization cost/allocation.
 Arrow sources that arrive in chunks (e.g., from a `Table` or `ChunkedArray`)
 currently can't be converted via any ALTREP mechanism and support could be
-added.
-- **IPC support**: The IPC reader is not currently exposed in the R bindings.
+added ([#219](https://github.com/apache/arrow-nanoarrow/issues/219)).
 
 ## Python bindings
 
-- **Packaging**: The Python bindings are currently unpublished (pypi or conda) and
-are not included in release verification.
-- **Element conversion**: There is currently no mechanism to extract an element
-of an `ArrowArrayView` as a Python object (e.g., an `int` or `str`).
-- **numpy/Pandas conversion**: The Python bindings currently expose the `ArrowArrayView`
-but do not provide a means by which to convert to popular packages such as
-numpy or Pandas.
-- **Creating arrays**: The Python bindings do not currently provide a means by
-which to create an `ArrowArray` from buffers or incrementally.
-- **IPC support**: The IPC reader is not currently exposed in the Python bindings.
+- **Type support**: The Python bindings do not currently support unions,
+string/binary view, or list view, or run-end-encoded types. When creating
+Arrow arrays from iterables of Python objects, some types are not yet
+supported (e.g., struct, list, datetime objects)
+([#618](https://github.com/apache/arrow-nanoarrow/issues/618),
+[#620](https://github.com/apache/arrow-nanoarrow/issues/620)).