
Releases: KxSystems/arrowkdb

Release candidate for 1.4.1

21 May 17:34
33f5c95

Note: the 1.4.1-rc.1 arrowkdb package was built against Apache Arrow version 9.0.0. If you have a different version of the libarrow runtime installed, it may be necessary to build arrowkdb from source in order to support that version (instructions to build arrowkdb from source are in the README.md).

Arrow only supports a single string array containing up to 2GB of data. If the kdb string/symbol list contains more than this amount of data, it has to be populated into an Arrow chunked array. Chunked arrays were already supported by arrowkdb when writing Arrow IPC files or streams, but not when writing Parquet files.

Therefore, in order to support the use of chunked arrays when writing Parquet files, the ARROW_CHUNK_ROWS option has been added to:

  • pq.writeParquet
  • pq.writeParquetFromTable

Note: This only applies to how kdb lists are chunked internally by the Parquet file writer. This is different from the row-groups configuration (set using PARQUET_CHUNK_SIZE), which controls how the Parquet file is structured when written.
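
For example, a minimal q sketch of chunked Parquet writing (the function name is from this release; the options-dictionary shape, sample table and file name are illustrative assumptions to check against the README):

  // Sample table whose payload column is a list of kdb strings
  tbl:([] id:til 1000000; payload:1000000#enlist "some repeated string data");
  // Ask the Parquet writer to populate Arrow chunked arrays of 100000 rows each
  options:(enlist `ARROW_CHUNK_ROWS)!enlist 100000;
  .arrowkdb.pq.writeParquetFromTable["chunked.parquet"; tbl; options];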

Release candidate for 1.4.0

12 Sep 13:55
72d7290

Note: the 1.4.0-rc.1 arrowkdb package was built against Apache Arrow version 9.0.0. If you have a different version of the libarrow runtime installed, it may be necessary to build arrowkdb from source in order to support that version (instructions to build arrowkdb from source are in the README.md).

Enhancements include:

  1. New COMPRESSION option to specify the codec to use when writing Parquet files, IPC files or IPC streams.
  2. Bug fix for handling float32 and float64 nulls when mapping to/from 0n and 0nf.
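
A hedged sketch of the COMPRESSION option when writing a Parquet file (the option name is from this release; the codec symbol, options shape and file name are assumptions to verify against the README):

  tbl:([] id:til 100; price:100?10f);
  // Request an assumed codec; the supported codec names depend on your libarrow build
  options:(enlist `COMPRESSION)!enlist `snappy;
  .arrowkdb.pq.writeParquetFromTable["compressed.parquet"; tbl; options];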

Release candidate for 1.3.0

04 Apr 09:03
c864f83

Note: the 1.3.0-rc.1 arrowkdb package was built against Apache Arrow version 9.0.0. If you have a different version of the libarrow runtime installed, it may be necessary to build arrowkdb from source in order to support that version (instructions to build arrowkdb from source are in the README.md).

Enhancements include:

  1. New APIs for reading and writing Apache ORC files (Linux and macOS only). This includes NULL support via the NULL_MAPPING and WITH_NULL_BITMAP options.
  2. When building from source, arrowkdb detects your libarrow version and selects C++14 (libarrow < 10.0) or C++17 (libarrow >= 10.0) as appropriate.
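
A sketch of an ORC round trip with null handling (the orc.writeOrcFromTable and orc.readOrcToTable names and the options-dictionary shape are assumptions based on the naming conventions used elsewhere in the interface):

  tbl:([] id:1 2 0N; price:1.1 0n 3.3);
  // Map kdb+ nulls to Arrow nulls for int64 and float64 columns
  null_mapping:`int64`float64!(0N;0n);
  options:(`NULL_MAPPING`WITH_NULL_BITMAP)!(null_mapping;1);
  .arrowkdb.orc.writeOrcFromTable["data.orc"; tbl; options];
  restored:.arrowkdb.orc.readOrcToTable["data.orc"; options];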

Release candidate for 1.2.0

15 Mar 08:56
0c4e452

Note: the 1.2.0-rc.1 arrowkdb package was built against Apache Arrow version 9.0.0. If you have a different version of the libarrow runtime installed, it may be necessary to build arrowkdb from source in order to support that version (instructions to build arrowkdb from source are in the README.md).

Enhancements include:

  1. Support for converting kdb+ nulls to Arrow nulls via a new NULL_MAPPING option when:
  • Reading and writing Parquet files
  • Reading and writing Arrow IPC files
  • Reading and writing Arrow IPC streams
  2. Support for reading the Arrow null bitmap as a separate structure via a new WITH_NULL_BITMAP option when:
  • Reading Parquet files
  • Reading Arrow IPC files
  • Reading Arrow IPC streams
  3. Arrow IPC files and streams can be written with chunking via a new ARROW_CHUNK_ROWS option
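
A sketch of reading a Parquet file with both new options (the option names are from this release; the mapping keys, call signature and result shape are assumptions):

  // Substitute kdb+ nulls for Arrow nulls in int64, float64 and utf8 columns
  null_mapping:`int64`float64`utf8!(0N;0n;"");
  options:(`NULL_MAPPING`WITH_NULL_BITMAP)!(null_mapping;1);
  // With WITH_NULL_BITMAP set, the result is assumed to pair the data
  // with the null bitmaps as a separate structure
  result:.arrowkdb.pq.readParquetToTable["data.parquet"; options];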

Release candidate for 1.1.0

01 Nov 15:04
73ef9fc

Note: the 1.1.0-rc.1 arrowkdb package was built against Apache Arrow version 9.0.0. If you have a different version of the libarrow runtime installed, it may be necessary to build arrowkdb from source in order to support that version (instructions to build arrowkdb from source are in the README.md).

Enhancements include:

  • Support multithreaded use of arrowkdb with peach
  • Add support for reading Parquet files with row groups (chunking)
  • Upgrade build to use libarrow and libparquet 9.0.0
  • Support latest v2 Parquet file formats

New functions:

  • pq.readParquetNumRowGroups
  • pq.readParquetRowGroups
  • pq.readParquetRowGroupsToTable
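
A sketch of reading a large Parquet file in row-group batches using the new functions (the argument order of filename, row-group indices, columns and options is an assumption to check against the README):

  num_groups:.arrowkdb.pq.readParquetNumRowGroups["big.parquet"];
  // Read the first two row groups, all columns (::), with default options (::)
  chunk:.arrowkdb.pq.readParquetRowGroupsToTable["big.parquet"; 0 1; ::; ::];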

Release candidate for 1.0.0

29 Jul 16:03
751d6e6

Note: the 1.0.0-rc.1 arrowkdb package was built against Apache Arrow version 5.0.0. If you have a different version of the libarrow runtime installed, it may be necessary to build arrowkdb from source in order to support that version (instructions to build arrowkdb from source are in the README.md).

Arrowkdb enhancements:

  1. Make the API more future-proof and extensible by adding an options parameter to the read and write functions where it was not already present:
  • pq.readParquetColumn
  • ipc.writeArrow
  • ipc.writeArrowFromTable
  • ipc.serializeArrow
  • ipc.serializeArrowFromTable
  • ipc.parseArrowData
  • ipc.parseArrowToTable
  • ar.prettyPrintArray
  • ar.prettyPrintArrayFromList
  • tb.prettyPrintTable
  • tb.prettyPrintTableFromTable
  2. Support mapping the Arrow decimal128 datatype to and from a kdb+ 9h list via the new option DECIMAL128_AS_DOUBLE.
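
A sketch of the new decimal128 mapping on read (the option name is from this release; the reader call, options shape and file name are illustrative assumptions):

  // Map Arrow decimal128 columns to kdb+ float (9h) lists
  options:(enlist `DECIMAL128_AS_DOUBLE)!enlist 1;
  tbl:.arrowkdb.pq.readParquetToTable["decimals.parquet"; options];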

Initial alpha release for version 1.0.0

25 Feb 13:49
Pre-release

At the heart of Apache Arrow is its in-memory columnar format, a standardized, language-agnostic specification for representing structured, table-like datasets in memory. This data format has a rich datatype system (including nested datatypes) designed to support the needs of analytic database systems, dataframe libraries, and more.

The arrowkdb integration enables kdb+ users to read and write Arrow tables created from kdb+ data using:

  • Parquet file format
  • Arrow IPC record batch file format
  • Arrow IPC record batch stream format

Currently Arrow supports over 35 datatypes including concrete, parameterized and nested datatypes. Each Arrow datatype is mapped to a kdb+ type and arrowkdb can seamlessly convert between both representations.

Separate APIs are provided where the Arrow table is either created from a kdb+ table using an inferred schema or from an Arrow schema and the table’s list of array data.

  • Inferred schemas. If you are less familiar with Arrow or do not wish to use the more complex or nested Arrow datatypes, arrowkdb can infer the schema from a kdb+ table where each column in the table is mapped to a field in the schema.
  • Constructed schemas. Although inferred schemas are easy to use, they support only a subset of the Arrow datatypes and are considerably less flexible. Where more complex schemas are required, they should be manually constructed using the datatype/field/schema constructor functions which arrowkdb exposes, similar to the C++ Arrow library and PyArrow.
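
A sketch contrasting the two approaches in q (the constructor names follow the sc/fd/dt namespaces described in the documentation; treat the exact calls as assumptions to verify against the README):

  // Inferred schema: derive the Arrow schema directly from a kdb+ table
  tbl:([] sym:`a`b`c; price:1.1 2.2 3.3);
  inferred:.arrowkdb.sc.inferSchema[tbl];

  // Constructed schema: build datatypes, fields and the schema explicitly
  sym_fd:.arrowkdb.fd.field[`sym; .arrowkdb.dt.utf8[]];
  price_fd:.arrowkdb.fd.field[`price; .arrowkdb.dt.float64[]];
  constructed:.arrowkdb.sc.schema[sym_fd,price_fd];
  .arrowkdb.sc.printSchema[constructed];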