
Initial alpha release for version 1.0.0

Pre-release
@nmcdonnell-kx nmcdonnell-kx released this 25 Feb 13:49

Apache Arrow's in-memory columnar format is a standardized, language-agnostic specification for representing structured, table-like datasets in memory. This format has a rich datatype system (including nested datatypes) designed to support the needs of analytic database systems, dataframe libraries, and more.

The arrowkdb integration enables kdb+ users to read and write Arrow tables created from kdb+ data using:

  • Parquet file format
  • Arrow IPC record batch file format
  • Arrow IPC record batch stream format
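
As an illustration, the table-oriented API can move a simple kdb+ table through each of these formats. This is a sketch based on the arrowkdb function names; the file paths are placeholders and `::` stands in for an empty options argument:

```q
// load the arrowkdb interface
\l q/arrowkdb.q

// a simple kdb+ table of longs and floats
table:([] col1:1 2 3j; col2:4.4 5.5 6.6f)

// Parquet file format
.arrowkdb.pq.writeParquetFromTable["table.parquet"; table; ::]
read_pq:.arrowkdb.pq.readParquetToTable["table.parquet"; ::]

// Arrow IPC record batch file format
.arrowkdb.ipc.writeArrowFromTable["table.arrow"; table; ::]
read_ipc:.arrowkdb.ipc.readArrowToTable["table.arrow"; ::]

// Arrow IPC record batch stream format (serialized to a kdb+ byte vector)
serialized:.arrowkdb.ipc.serializeArrowFromTable[table; ::]
read_stream:.arrowkdb.ipc.parseArrowToTable[serialized; ::]
```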

Currently Arrow supports over 35 datatypes, including concrete, parameterized and nested datatypes. Each Arrow datatype is mapped to a kdb+ type and arrowkdb can seamlessly convert between the two representations.
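For example, the datatype constructor functions cover all three kinds of datatype. This is a sketch using constructors from the arrowkdb datatype API; the exact printed output is not shown:

```q
// concrete datatype
i64:.arrowkdb.dt.int64[]

// parameterized datatype: fixed-width binary of 4 bytes
fsb:.arrowkdb.dt.fixed_size_binary[4i]

// nested datatype: a list of int64 values
lst:.arrowkdb.dt.list[.arrowkdb.dt.int64[]]

// display a datatype's Arrow representation
.arrowkdb.dt.printDatatype[lst]
```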

Separate APIs are provided so that an Arrow table can be created either from a kdb+ table (using an inferred schema) or from a constructed Arrow schema together with the table's list of array data.

  • Inferred schemas. If you are less familiar with Arrow or do not wish to use the more complex or nested Arrow datatypes, arrowkdb can infer the schema from a kdb+ table where each column in the table is mapped to a field in the schema.
  • Constructed schemas. Although inferred schemas are easy to use, they support only a subset of the Arrow datatypes and are considerably less flexible. Where more complex schemas are required, they should be constructed manually using the datatype/field/schema constructor functions that arrowkdb exposes, similar to the C++ Arrow library and PyArrow.
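
A constructed schema can be sketched as follows, building fields from datatypes, a schema from the fields, and then writing the schema together with its array data. Function names are taken from the arrowkdb API; the file path is a placeholder and `::` stands in for an empty options argument:

```q
// build fields from datatypes, then a schema from the fields
f1:.arrowkdb.fd.field[`col1; .arrowkdb.dt.int64[]]
f2:.arrowkdb.fd.field[`col2; .arrowkdb.dt.float64[]]
schema:.arrowkdb.sc.schema[(f1;f2)]

// array data: one kdb+ list per field, in schema order
array_data:(1 2 3j; 4.4 5.5 6.6f)

// write the schema and array data to a Parquet file
.arrowkdb.pq.writeParquet["constructed.parquet"; schema; array_data; ::]

// inspect the constructed schema
.arrowkdb.sc.printSchema[schema]
```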