feat(python): Add user-facing `Array` class (#396)
This PR implements `nanoarrow.Array`, which is essentially a `pyarrow.ChunkedArray`. It can represent a `Table`, `RecordBatch`, `ChunkedArray`, or `Array`. It doesn't yet interoperate cleanly with pyarrow's `ChunkedArray` (it will after the next pyarrow release, since `__arrow_c_stream__` was only just added there).

The user-facing class is backed by a Cython class, `CMaterializedArrayStream`, which manages some of the C-level details, such as resolving a chunk + offset when the array contains more than one chunk. An early version of this PR implemented `CMaterializedArrayStream` using C pointers (e.g., `ArrowArray* arrays`), but I decided that was too complex and went back to `List[CArray]`. I think this is also better for managing ownership (e.g., `CArray` instances that are no longer needed can be released by the garbage collector).

The `Array` class as implemented here is device-aware, although until we have non-CPU support this is difficult to test. The methods I added for this are basically stubs to demonstrate the intention.

This PR also implements `Scalar`, whose main purpose is testing and other non-performance-sensitive uses (like lazier reprs for very large items, or interactive inspection). Scalars may also be useful for working with arrays whose elements are very long strings or large arrays (e.g., geometry).

I also added some basic accessors like `buffer()` and `child()`, plus a few ways one might want to iterate over an `Array`, to make the utility of this class clearer.
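The chunk + offset resolution that `CMaterializedArrayStream` handles can be illustrated in pure Python. This is a minimal sketch, not nanoarrow's Cython implementation: `ChunkedView` and `resolve()` are hypothetical names, and each chunk is a plain Python list standing in for a `CArray`. It precomputes the cumulative start offset of every chunk and uses a binary search to find the chunk that owns a given logical index.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Sequence, Tuple


class ChunkedView:
    """Hypothetical stand-in for a materialized multi-chunk array.

    Each chunk is modelled as a plain sequence rather than a CArray.
    """

    def __init__(self, chunks: Sequence[Sequence[int]]) -> None:
        self._chunks: List[Sequence[int]] = list(chunks)
        # Chunk i covers logical indices [starts[i], starts[i + 1]).
        self._starts: List[int] = [0] + list(accumulate(len(c) for c in self._chunks))

    def __len__(self) -> int:
        return self._starts[-1]

    def resolve(self, i: int) -> Tuple[int, int]:
        """Map a logical index to (chunk index, offset within that chunk)."""
        n = len(self)
        if i < 0:
            i += n  # support negative indexing like a Python sequence
        if not 0 <= i < n:
            raise IndexError("index out of range")
        # The last chunk whose start offset is <= i owns the element;
        # bisect_right also skips over empty chunks correctly.
        chunk = bisect_right(self._starts, i) - 1
        return chunk, i - self._starts[chunk]

    def __getitem__(self, i: int) -> int:
        chunk, offset = self.resolve(i)
        return self._chunks[chunk][offset]
```

For example, with `ChunkedView([[0, 1, 2], [], [3, 4]])`, logical index 3 resolves to `(2, 0)` because the middle chunk is empty. The binary search keeps each lookup at O(log number-of-chunks), which matters mainly when an array is made of many small chunks.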
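The `Array` repr in the usage examples for this PR stays compact for large arrays: it prints a header with the type and length, then the first ten items, then a count of the remaining items. A minimal sketch of that truncation logic, assuming the items are already available as an ordinary Python sequence; `format_preview` and its `max_items` parameter are hypothetical names, not part of nanoarrow's API, and nanoarrow's own repr code may differ.

```python
from typing import Sequence


def format_preview(type_name: str, items: Sequence[object], max_items: int = 10) -> str:
    """Render a header plus at most ``max_items`` items, one per line.

    Hypothetical helper that mimics the shape of the nanoarrow.Array repr;
    it is not nanoarrow's implementation.
    """
    lines = [f"nanoarrow.Array<{type_name}>[{len(items)}]"]
    # repr() quotes strings ('abc') and leaves integers bare (42).
    lines.extend(repr(item) for item in items[:max_items])
    remaining = len(items) - max_items
    if remaining > 0:
        lines.append(f"...and {remaining} more items")
    return "\n".join(lines)
```

Calling `format_preview("int64", range(100))` yields the header line, the items `0` through `9`, and a final `...and 90 more items` line, matching the shape of the basic usage output.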
Basic usage:

```python
import nanoarrow as na

na.Array(range(100), na.int64())
```

```
nanoarrow.Array<int64>[100]
0
1
2
3
4
5
6
7
8
9
...and 90 more items
```

More involved example reading from an IPC stream:

```python
import nanoarrow as na
from nanoarrow.ipc import Stream

url = "https://github.com/apache/arrow-testing/raw/master/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_primitive.stream"
with Stream.from_url(url) as inp:
    array = na.Array(inp)

array.child(25)
```

```
nanoarrow.Array<string>[37]
'co矢2p矢m'
'w€acrd'
'kjd1dlô'
'pib矢d5w'
'6nnpwôg'
'ndj£h£4'
'ôôf4aµg'
'kwÂh£fr'
'°g5dk€e'
'r€cbmdn'
...and 27 more items
```

---------

Co-authored-by: Joris Van den Bossche <[email protected]>
1 parent: dc50114, commit: 7af6dff
13 changed files with 1,077 additions and 108 deletions.