v0.17.0 (2023-03-27)
Breaking changes:
- Changed async ipc writer to accept schema by value #1439 (ritchie46)
- Made `len/len_proxy` consistent with `Offsets` #1434 (ritchie46)
- Changed methods to slice arrays #1396 (jorgecarleitao)
New features:
- Added buffer interoperability with arrow-rs #1437 (tustvold)
- Added MapScalar #1428 (b41sh)
- Added support for JSON serialization of dictionary #1424 (ritchie46)
- Added support for MapArray read and write to parquet #1419 (b41sh)
Fixed bugs:
- `parquet_read` panics when working with `date64`s #1400
- Round Trip [Rust -> arrow2_convert -> Arrow -> Parquet -> Arrow -> Rust] #1376
- Parquet writes incorrect `List<u32>` #1368
- Slicing nullable list arrays into multiple parquet pages doesn't work #1356
- Reading parquet file with multiple row groups and nested nullable struct types panics #1249
- Changed encoded float::Inf as null in json #1427 (SimonSchneider)
- Fixed statistics writing flag and correct null_count in dictionaries #1414 (ritchie46)
- Fixed ahash dependency for wasm #1407 (hzuo)
- Fixed writing of sliced arrays to parquet #1397 (jorgecarleitao)
- Fixed writing nested parquet #1390 (jorgecarleitao)
Enhancements:
- Added interoperability with arrow-schema #1442 (tustvold)
- Updated dependencies #1441 (ritchie46)
- Updated multiversion and support wider registers #1440 (ritchie46)
- Added impl_mutable_array_mut_validity macro for mutable arrays #1435 (Arty-Maly)
- Re-exported the `bloom_filter` module from `parquet2` crate #1420 (ozgrakkurt)
- Updated base64 to 0.21 #1408 (WindSoilder)
- Added apply_validity and set_validity to mutable utf8 array #1406 (Arty-Maly)
- Added cast for FixedSizeBinary to (Large)Binary #1403 (ritchie46)
- Improved support for date64 written by pyarrow to parquet #1402 (jorgecarleitao)
- Simplified code #1401 (jorgecarleitao)
- Improved API of getting mutable from Buffer #1399 (jorgecarleitao)
- Simplified code via DRY #1398 (jorgecarleitao)
- Added `set_len` method to Buffer #1374 (haixuanTao)
Documentation updates:
v0.16.0 (2023-02-09)
Breaking changes:
- Made IPC writer take owned schema #1361 (ritchie46)
- Correctly update child-offsets in `GrowableUnion` #1360 (jleibs)
Fixed bugs:
- invalid written parquet file of nested structures. (Mixing list with structs) #1325
- Fix incorrect downcast in `estimated_size_bytes` #1351 (jleibs)
- fix(parquet): nested struct /list writing #1347 (ritchie46)
- Fixed csv infer_schema on empty fields #1342 (tripokey)
Enhancements:
- Added support for `take` of `FixedSizeListArray` #1386 (kylebarron)
- Renamed `factory` argument on parquet read functions to `reader_factory` #1380 (ozgrakkurt)
- Made some structs and functions public #1375 (b41sh)
- Added `Utf8Array::apply_validity` #1367 (Arty-Maly)
- Added set/get scratches #1363 (ritchie46)
- Amortized intermediate allocations in IPC writer #1362 (ritchie46)
- Improved clippy #1353 (jorgecarleitao)
Documentation updates:
- Fixed typo in `OffsetsBuffer` docs #1373 (DzenanJupic)
- Update README.md to fix capitalization and spelling #1338 (yerke)
Testing updates:
v0.15.0 (2022-12-18)
Breaking changes:
- Added values' capacity to `MutableBinaryArray::reserve` #1277
- Removed `from_data` from all arrays #1328 (jorgecarleitao)
- Added `Offsets` and `OffsetsBuffer` #1316 (jorgecarleitao)
- Bumped parquet2 dependency #1304 (ritchie46)
- Added data_pagesize_limit to write parquet pages #1303 (sundy-li)
- Bumped arrow-format to 0.8 #1298 (Xuanwo)
- Improved iterators #1270 (jorgecarleitao)
New features:
- Added `TryExtendFromSelf` #1278 (jorgecarleitao)
- Added support for JSON ser/de records layout #1275 (AnIrishDuck)
Fixed bugs:
- Parquet writes all values of sliced arrays? #1323
- Avro schema: Invalid record names #1269
- Fixed writing nested/sliced arrays to parquet #1326 (ritchie46)
- Fixed failing to accept dictionary full of nulls #1312 (ritchie46)
- Added support for Extension types in ffi #1300 (jondo2010)
- Fixed error in memory usage of sliced binary/list/utf8 arrays #1293 (ritchie46)
- Fixed descending ordering when specifying nulls first #1286 (sandflee)
- Added avro record names when converting arrow schema to avro #1279 (Samrose-Ahmed)
Enhancements:
- Fixed clippy #1336 (jorgecarleitao)
- Improved `UnionArray` #1331 (jorgecarleitao)
- Bumped json-deserializer version #1321 (universalmind303)
- Removed flushing during arrow IPC writing to improve performance when using a buffered writer #1318 (cyr)
- Improved performance of check_indexes #1313 (ritchie46)
- Improved performance of checking offsets `~-64-73%` #1305 (ritchie46)
- Added `reserve` to pushable containers in parquet extend_from_decoder #1301 (ritchie46)
- Optimized slicing #1285 (jorgecarleitao)
- Improved ZipValidity iterators #1284 (ritchie46)
- Added `MutableBinaryValuesArray` #1276 (jorgecarleitao)
Documentation updates:
- Fixed link from the API to the guide #1290 (datapythonista)
v0.14.2 (2022-10-05)
New features:
- Added MutableUtf8ValuesArray #1260 (jorgecarleitao)
Fixed bugs:
- Unnecessary println in library code #1263
Testing updates:
- Added test for `MutableUtf8Array::as_box` #1266 (jorgecarleitao)
v0.14.1 (2022-09-27)
Fixed bugs:
- Potentially unneeded call in Parquet repetition-level encoding #1254
- Potential bug in reading lists from avro? #1252
- Removed un-used code #1258 (jorgecarleitao)
- Fixed error reading unbounded Avro list #1253 (jorgecarleitao)
- Add missing call to `try_push_valid` for nested avro deserialization #1248 (shaeqahmed)
Enhancements:
- Bump json_deserializer version to 0.4.1 #1261 (cjermain)
- Fixed clippy for 1.60 #1259 (jorgecarleitao)
- Added `BinaryArray::into_mut` and double-ended support for its iterator #1255 (ozgrakkurt)
Testing updates:
- Improved test for nullable struct read from Avro #1250 (jorgecarleitao)
v0.14.0 (2022-09-12)
Breaking changes:
- Removed `Count` (parquet statistics) #1217 (jorgecarleitao)
- Exposed parquet indexed page filtering to `FileReader` #1216 (jorgecarleitao)
- Simpler IPC API #1208 (jorgecarleitao)
- Migrated Avro code to avro-schema repo #1199 (jorgecarleitao)
- Added support for decimal 256 #1194 (jorgecarleitao)
New features:
- Added support for decoding delta-length-encoded binary (parquet) #1228 (jorgecarleitao)
- Added support to read and write Parquet's delta-bitpacked (integer encoding) #1226 (jorgecarleitao)
- Added support for parquet sidecar to `FileReader` #1215 (jorgecarleitao)
- Write 64bit aligned IPC files #1201 (jorgecarleitao)
- Added support to mmap IPC format #1197 (jorgecarleitao)
- Added `MutableStructArray` #1196 (hohav)
Fixed bugs:
- Stack overflow in parquet RowGroupReader with groups_filter #1206
- fixed comparison and validity kernels #1243 (ritchie46)
- Fixed reading nested stats #1240 (jorgecarleitao)
- `FileSink` now closes the underlying writer. #1213 (samkaufman)
- Fixed JSON infer order #1212 (jorgecarleitao)
- Fixed StackOverflow in skipping many parquet row groups #1210 (jorgecarleitao)
- Fix escaped like wildcards #1204 (daniel-martinez-maqueda-sap)
- Removed println :( #1203 (jorgecarleitao)
Enhancements:
- Added schema to FileReader #1246 (jorgecarleitao)
- Simpler nested parquet read #1241 (jorgecarleitao)
- Removed unneeded code #1229 (jorgecarleitao)
- Improved `MutableStruct::push` #1223 (hohav)
- Reduced binary size #1221 (jorgecarleitao)
- Added utf8 <> binary cast #1220 (jorgecarleitao)
- split parquet compression backend features #1207 (ritchie46)
- Improved API of `mmap` #1205 (ritchie46)
- Added `MutableArray::reserve` #1202 (jorgecarleitao)
- Delayed dict #1185 (jorgecarleitao)
Documentation updates:
- Fixed guide and improved examples #1247 (jorgecarleitao)
- Added documentation on parquet compatibility under `TimeUnit`. #1238 (TurnOfACard)
- Fixed typo in error message for impl StructArray #1237 (knil-sama)
- Fixed incorrect command in doc for generating ORC files #1234 (poga)
- Improved github page generation #1233 (jorgecarleitao)
- Fix a typo in the docs #1225 (teymour-aldridge)
- Fix some doc links/typos #1211 (AnIrishDuck)
Testing updates:
- Fixed clippy warnings #1227 (jorgecarleitao)
- Updated integration test #1214 (jorgecarleitao)
v0.13.0 (2022-07-31)
Breaking changes:
- Made `nested` argument of `array_to_pages` non-owning #1174
- Replaced `Result` by `panic` in boolean comparison #1159 (jorgecarleitao)
- Improved dictionary invariants #1137 (jorgecarleitao)
- Change signature of PrimitiveScalar::value to return reference #1129 (ncpenke)
- Removed need to pass encodings by value #1123 (ritchie46)
- Removed unused `NativeType::to_ne_bytes` #1112 (jorgecarleitao)
- Avoid clone in `with_validity` #1104 (jorgecarleitao)
- Reduced need of `unsafe` in FFI #1100 (jorgecarleitao)
- Removed `Buffer::into_mut` and `make_mut` functions #1089 (jorgecarleitao)
- Renamed `Bitmap::null_count` to `Bitmap::unset_bits` #1087 (jorgecarleitao)
- Made `chunk_size` optional in parquet's `column_iter_to_arrays` #1055 (jorgecarleitao)
- Migrated from `Arc<dyn Array>` to `Box<dyn Array>` #1042 (jorgecarleitao)
New features:
- Added support to read ORC #1189 (jorgecarleitao)
- Added support for limit pushdown to IPC reading #1135 (jorgecarleitao)
- Added support to write and read Intervals from and to parquet #1122 (jorgecarleitao)
- Added support to write `FixedSizeBinary` to Avro #1118 (jorgecarleitao)
- Added support for projections in reading IPC streams #1097 (joshuataylor)
- Added support to write parquet `_metadata` sidecar #1063 (jorgecarleitao)
- Added cow APIs (2x-10x vs non-cow) #1061 (jorgecarleitao)
- Added support to read and write f16 #1051 (jorgecarleitao)
Fixed bugs:
- Fixed not-implemented error when reading plain, after-dict pages for fix-len-binary from parquet #1192 (jorgecarleitao)
- Fixed error in decoding nested multi-page columns from parquet #1188 (jorgecarleitao)
- Fixed error in counting items in nested parquet #1182 (jorgecarleitao)
- Fixed reading stats from int96 parquet #1181 (jorgecarleitao)
- Fixed limit pushdown in parquet #1180 (jorgecarleitao)
- use `FnOnce` for `PrimitiveArray::apply_validity` #1176 (ritchie46)
- release memory on predicate with 0% selectivity #1163 (ritchie46)
- Fixed error in reading `Struct<List<...>>` from parquet #1150 (jorgecarleitao)
- Fixed IPC projection #1149 (ritchie46)
- Fixed casting dictionary keys #1143 (ritchie46)
- Fixed reading arrays from parquet with required children #1140 (jorgecarleitao)
- Fixed panic in deserializing nested statistics #1139 (jorgecarleitao)
- Aligned name of `FixedSizeBinaryArray::values_iter` #1117 (jorgecarleitao)
- Fixed error in `FixedSizeListArray::new_null` #1114 (jorgecarleitao)
- Fixed panic in writing dictionaries to parquet #1113 (jorgecarleitao)
- Fixed error in reading chunked parquet #1108 (jorgecarleitao)
- Raise error when invalid fields are passed to flight #1093 (jorgecarleitao)
- Made IPC projection not sort projection #1082 (jorgecarleitao)
- Fixed error in chunked_mut bitmap #1081 (jorgecarleitao)
- Fixed panic in bitmap assign_mut #1078 (ritchie46)
- Panic-free read of IPC files #1075 (jorgecarleitao)
- Bumped parquet2 (minor) requirement #1071 (jorgecarleitao)
- Fixed divide by zero on reading empty row group #1062 (jorgecarleitao)
- Fixed missing validation of number of encodings passed when writing to parquet #1057 (jorgecarleitao)
Enhancements:
- Improved performance of reading Binary from parquet #1190 (ritchie46)
- Bumped to latest nightly #1186 (gyscos)
- Improved error message #1179 (jorgecarleitao)
- Added support to read and write nested dictionaries to parquet #1175 (jorgecarleitao)
- Added `MutableUtf8Array::into_data` #1170 (ritchie46)
- Added `Default` for `Utf8Array` #1169 (ritchie46)
- fix(parquet): allow to read other logical types from parquet #1168 (sundy-li)
- fix(parquet): enforce to use ParquetTimeUnit::Nanoseconds for PhysicalType::Int96 #1167 (sundy-li)
- Added constructor `MutableFixedSizeListArray::new_from` #1161 (hohav)
- Removed unneeded `Default` constraint #1157 (hohav)
- Improved checks to safety invariants in FFI #1154 (jorgecarleitao)
- Removed un-needed indirection #1153 (jorgecarleitao)
- Soften generic constraint of `Buffer` #1152 (sundy-li)
- Use ahash by default #1148 (ritchie46)
- Reduced bound checks #1142 (ritchie46)
- Moved `Bytes` to own crate #1141 (jorgecarleitao)
- Fixed clippy for 1.62 #1134 (Xuanwo)
- Cleaned example #1130 (jorgecarleitao)
- Removed `O(N)` clone in writing CSV #1128 (jorgecarleitao)
- Avoid zeroed allocation in reading avro #1127 (jorgecarleitao)
- Reduced allocations of reading bitmaps from IPC #1126 (jorgecarleitao)
- Improved performance of reading from IPC #1125 (jorgecarleitao)
- Improved parquet read performance #1124 (jorgecarleitao)
- Optimized write nulls to Avro #1119 (jorgecarleitao)
- Made `row_group::get_field_columns` public #1110 (ritchie46)
- Removed some panics reading invalid parquet files #1106 (jorgecarleitao)
- Reduced reallocations when reading from IPC (`~12%`) #1105 (ritchie46)
- Exposed utilities in `io::flight` #1094 (jorgecarleitao)
- Accept decoding parquet's `i64` into `u32` written by `pyarrow` #1090 (jorgecarleitao)
- Simplified code #1088 (jorgecarleitao)
- Removed un-necessary allocation in `assign_ops` #1085 (jorgecarleitao)
- Replaced some macros by generics #1084 (jorgecarleitao)
- Improved performance of `Bitmap::make_mut` with offset #1079 (jorgecarleitao)
- Implemented `Default` for `PrimitiveArray` #1073 (ritchie46)
- Expose share counts in `Buffer` #1072 (ritchie46)
- Added `compute::arity_assign` #1070 (jorgecarleitao)
- Improved performance in lexical write (~5%) #1067 (ritchie46)
- Added cast to/from `Null` from/to every type #1066 (jorgecarleitao)
- prevent unneeded offset check #1059 (ritchie46)
Documentation updates:
- Fixed parquet write example #1193 (rajasekarv)
- Improved docs #1164 (jorgecarleitao)
- Minor cleanup of internal namings #1160 (jorgecarleitao)
- Added example reading Avro produced by Kafka #1151 (jorgecarleitao)
- Updated license wording #1138 (jorgecarleitao)
- Fixed wrong package name in examples #1133 (Xuanwo)
- Improved example #1131 (jorgecarleitao)
- Added more tests #1111 (jorgecarleitao)
- Improved examples #1109 (jorgecarleitao)
- Improved internal docs #1107 (jorgecarleitao)
- Added notes about creating parquet files and submodules in the development documentation #1096 (joshuataylor)
- Improved docs for `BooleanArray` #1083 (jorgecarleitao)
- Added missing link to guide #1065 (jorgecarleitao)
- Improve Docs Readability #1054 (ryanrussell)
Testing updates:
- Temporary skip decimal256 integration tests #1198 (jorgecarleitao)
- Simplified code #1183 (jorgecarleitao)
- Made kafka schema_id `u32` in example #1162 (jorgecarleitao)
- Added more tests #1158 (jorgecarleitao)
- Bumped MIRI #1156 (jorgecarleitao)
- Simplified code in flight integration tests #1136 (jorgecarleitao)
- Added more tests for nested parquet #1121 (jorgecarleitao)
- Added more tests for reading and writing CSV #1120 (jorgecarleitao)
- Added test for scalar division #1115 (jorgecarleitao)
- Added more tests #1103 (jorgecarleitao)
- Enabled more integration tests with pyarrow #1102 (jorgecarleitao)
- Simplified `Bytes` (internal) #1099 (jorgecarleitao)
- Updated patch to arrow integration tests #1068 (jorgecarleitao)
- Added more tests #1064 (jorgecarleitao)
v0.12.0 (2022-06-05)
Breaking changes:
- Require one encoding per parquet column on write #1012
- Bumped parquet2 #1035 (jorgecarleitao)
- Improved performance of deserializing JSON (2x) #1024 (jorgecarleitao)
- Remove `from_trusted_len_*` from `Buffer` #1020 (jorgecarleitao)
- Bumped arrow-format #1011 (jorgecarleitao)
- Replace `fn Offset::is_large()` as `const Offset::IS_LARGE` #1002 (HaoYang670)
- Renamed `ArrowError` to `Error` #993 (jorgecarleitao)
New features:
- Added support to deserialize `MapArray` from parquet #1045 (jorgecarleitao)
- Added support for random access reads from IPC #1034 (jorgecarleitao)
- Added support for custom sort `build_compare_fn` #1016 (b41sh)
- Added support to write nested parquet #1007 (jorgecarleitao)
- Added support for deserializing JSON from iterator #989 (cjermain)
Fixed bugs:
- Writing of `ListArray` does not preserve all values #1008
- Write a two-dimensional list to parquet file failed #992
- Writing to Parquet fails for extension types that contain lists #830
- Fixed using lower limit than size of first parquet row group #1046 (arxra)
- Fixed error in consuming sliced `FixedSizedBinary` from c data interface (FFI) #1026 (jorgecarleitao)
- Fixed lexsort limit equal or greater than row_count #1021 (b41sh)
- Fixed error in reading nested parquet structs #1015 (jorgecarleitao)
- Fixed panic on debug print of invalid timezones #1013 (jorgecarleitao)
- Treat empty timezone string as no-timezone #1009 (dbr)
- Fixed encoding of `NaN` to json #990 (SimonSchneider)
- Fixed error in writing `ListArray` to parquet #984 (jorgecarleitao)
- Fixed decoding Binary Plain pages with dictionary pages #982 (aptr322)
Enhancements:
- Added `Debug` and `PartialEq` for `MapArray` #1043 (jorgecarleitao)
- Exposed compression levels for parquet #1041 (ritchie46)
- Added `.arced`/`.boxed` to arrays #1040 (jorgecarleitao)
- Added utility to create encodings #1018 (jorgecarleitao)
- Made `parquet_to_arrow_schema` public #1006 (martingallagher)
- Speeded up `min_max_boolean` for the case where all values are null #1005 (HaoYang670)
- Simplified `min_max_string` and `min_max_binary` #1004 (HaoYang670)
- Added support for Decimal in `build_compare` #998 (GPSnoopy)
- remove accidental quadratic null_count #991 (ritchie46)
- Aligns MutableDictionaryArray's with MutablePrimitiveArrays with TryPush #981 (TurnOfACard)
Documentation updates:
- Cleaned docs for BinaryArray #1047 (jorgecarleitao)
- Improved API docs for `MutableBitmap` #1025 (jorgecarleitao)
- Improved docs for `bitmap` #1022 (jorgecarleitao)
- Improved API docs for `PrimitiveArray` and `Utf8Array` #1017 (jorgecarleitao)
- Fixed dev guide #1003 (jorgecarleitao)
Testing updates:
- Added more tests #1029 (jorgecarleitao)
- Moved coverage reporting to `cargo-llvm-cov` #1028 (jorgecarleitao)
- Added more tests (increase coverage) #1027 (jorgecarleitao)
- Moved tests from lib to `tests` #1001 (jorgecarleitao)
- Allowed feature-specific test runs #985 (jorgecarleitao)
v0.11.2 (2022-05-05)
New features:
- Added support to append to existing IPC Arrow file #972 (jorgecarleitao)
- Added pop to utf8/binary/fixedSize MutableArray #966 (ygf11)
- Added support for union scalars #930 (ncpenke)
Fixed bugs:
- Added support to read nested binary from parquet #978 (jorgecarleitao)
- Fixed empty reader panic for NDJSON type infer #974 (Roberto-XY)
- Prevented SO in large parquet files #973 (ritchie46)
- Fixed API bug in `async` read of IPC metadata #969 (jorgecarleitao)
- Fixed writing required list to parquet #968 (jorgecarleitao)
Enhancements:
- Added support to deserialize LargeList and Uint data types from Parquet #979 (b41sh)
- Made reading of IPC dictionaries lazy #971 (jorgecarleitao)
- Allowed creating IPC `FileWriter` without writing to the file #970 (jorgecarleitao)
v0.11.1 (2022-04-27)
Fixed bugs:
v0.11.0 (2022-04-27)
Breaking changes:
- Refactored parquet statistics deserialization #962 (jorgecarleitao)
- Made GroupFilter `Send + Sync` #947 (jorgecarleitao)
New features:
- Added support for non-ordered projections to IPC reading #961 (jorgecarleitao)
- Added support for reading indexed parquet pages #923 (jorgecarleitao)
Fixed bugs:
- Parquet regression: `exceptions.ArrowErrorException: NotYetImplemented("Can't read Dictionary(UInt32, LargeUtf8, false) from parquet")` #955
- Reading Parquet binary column panics during deserialization 'attempt to subtract with overflow' #944
- Reading Parquet file written by pyarrow with `lz4` compression fails with `OutOfSpec("Thrift out of range")` #940
- Issues when trying to create a parquet file with FixedSizedListArray #691
- Fixed bug in writing csv with buffer resizing #965 (ritchie46)
- Fixed bug in reading binary parquet #945 (jorgecarleitao)
- Fixed error in writing fixedSizeListArray to parquet #941 (jorgecarleitao)
- Fixed support to read dict nested binary parquet #924 (jorgecarleitao)
Enhancements:
- Reduced memory usage in reading parquet #964 (jorgecarleitao)
- Simpler IPC code #939 (jorgecarleitao)
- don't allocate string when writing to csv #935 (ritchie46)
- Removed un-needed generic parameter #927 (jorgecarleitao)
- update to odbc-api 0.36.0 #925 (pacman82)
Documentation updates:
- Fixed example of parallel read via rayon #958 (jorgecarleitao)
- Fixed guide deployment #931 (jorgecarleitao)
- Typo fix #919 (bkmgit)
Testing updates:
- Fixed patch of integration tests #960 (jorgecarleitao)
- Added test for MapArray #942 (jorgecarleitao)
- Fixed wrong clippy warning #938 (jorgecarleitao)
v0.10.1 (2022-03-16)
New features:
- Added support to write `StructArray` to Avro #909 (jorgecarleitao)
- Added support to write `ListArray` to Avro #908 (jorgecarleitao)
Fixed bugs:
- Fixed error in `FixedSizeBinaryArray::new_null` #914 (jorgecarleitao)
Enhancements:
- remove csv dependency for csv-write #917 (ritchie46)
- Added `capacity` to some mutable arrays and tests #913 (jorgecarleitao)
- Support `sum`, `min` and `max` for extension and decimal #907 (jorgecarleitao)
Testing updates:
- Added more tests #910 (jorgecarleitao)
v0.10.0 (2022-03-12)
Breaking changes:
- Renamed `Ffi_ArrowArray` and `Ffi_ArrowSchema` #859
- Improved performance and stability of writing to CSV #866 (ritchie46)
- Simplified API for writing to JSON #864 (jorgecarleitao)
- Simplified API to import from FFI #854 (jorgecarleitao)
- Simplified compute (lower/upper) #847 (jorgecarleitao)
- Simplified inferring arrow schema from a parquet schema #819 (jorgecarleitao)
- Bumped parquet and aligned API to fit into it #795 (jorgecarleitao)
New features:
- Added `GrowableUnion` #902 (jorgecarleitao)
- Added cast to `months_days_ns` #900 (jorgecarleitao)
- Added support for `hash` of `month_day_ns` arrays #899 (jorgecarleitao)
- IPC sink types and IPC file stream #878 (dexterduck)
- implemented `futures::Sink` for parquet async writer #877 (dexterduck)
- Added `try_new` and `new` to all arrays #873 (jorgecarleitao)
- Added support for datatypes serde #858 (houqp)
- Added support to the Arrow C stream interface (read and write) #857 (jorgecarleitao)
- Support to read/write from/to ODBC #849 (jorgecarleitao)
- Added operators that include validities in comparisons #846 (ritchie46)
- Added support to read and write `Decimal128` to Avro #837 (potter420)
- Added support to read Arrow streams asynchronously #832 (jorgecarleitao)
- Added support to write `LargeUtf8` and `LargeBinary` to Avro #828 (illumination-k)
- Added support for pushdown projection in reading Avro #827 (jorgecarleitao)
- Added support to read Avro's structs #826 (jorgecarleitao)
- Added support to write largeUtf8/Binary to Avro #825 (jorgecarleitao)
- Added json serialization of timestamp/date32/date64 #814 (ritchie46)
- Added `BooleanArray::from_trusted_len_values_iter_unchecked` #799 (ritchie46)
- Added `MutableUtf8Array::extend_values` #798 (ritchie46)
- Added COW semantics to `Buffer`, `Bitmap` and some arrays #794 (ritchie46)
- Added support to read parquet row groups in chunks #789 (jorgecarleitao)
- Added scalar bitwise ops #788 (jorgecarleitao)
- Migrated to portable simd #747 (jorgecarleitao)
Fixed bugs:
- Fixed edge case in reading multiple parquet pages #904 (jorgecarleitao)
- Bug fix in offset for sliced unions #891 (ncpenke)
- Fix edge case in reading nested parquet #884 (jorgecarleitao)
- Fixed unsoundness of `#derive(Clone)` for FFI structs #882 (jorgecarleitao)
- Fixed json writing of dates and datetimes #867 (jorgecarleitao)
- Fixed reading parquet with timezone #862 (jorgecarleitao)
- Fixed error in writing compressed IPC arrow #855 (jorgecarleitao)
- Fixed wrong null_count when slicing a sliced Bitmap #848 (satlank)
- Fixed error in writing compressed IPC files #840 (jorgecarleitao)
- Fixed float to i128 cast #817 (houqp)
- fix unescaped '"' in json writing #812 (ritchie46)
- Fixed reading parquet binary dict page #791 (danburkert)
Enhancements:
- Add `FixedSizeBinaryScalar` #782
- Use more idiomatic versions #898 (jorgecarleitao)
- Added support for min/max for decimal #897 (jorgecarleitao)
- Made `FixedSizeList::try_push_valid` public and added `new_with_field` #887 (ncpenke)
- Added `MutableFixedList::mut_values` #886 (jorgecarleitao)
- Made IPC IO use `try_new` #879 (jorgecarleitao)
- expose `ListValuesIter` #874 (ritchie46)
- Bumped crc #856 (jorgecarleitao)
- DRY parquet reading #845 (jorgecarleitao)
- Refactored (internal) fmt #842 (jorgecarleitao)
- Bumped zstd #841 (jorgecarleitao)
- inline push #835 (ritchie46)
- Increased API consistency for COW and respective docs #833 (jorgecarleitao)
- Improved flexibility of reading parquet #820 (jorgecarleitao)
- Small improvement to deserializing fixed-len parquet statistics. #818 (jorgecarleitao)
- Added support for other timestamp units from parquet #803 (jorgecarleitao)
- More to `into_mut` implementations #801 (ritchie46)
- Added `FixedSizeListScalar` and `FixedSizeBinaryScalar` #786 (illumination-k)
- DRY parquet module #785 (jorgecarleitao)
Documentation updates:
- Improved documentation #860 (jorgecarleitao)
- Made crate `deny(missing_docs)` #808 (jorgecarleitao)
- Fixed doc for `Bitmap::set_bit` #802 (yjshen)
- Fixed `dyn Array::slice` docstring #792 (ritchie46)
Testing updates:
- Simpler code (DRY) #901 (jorgecarleitao)
- Fixed integration test #885 (jorgecarleitao)
- Simplified code to generate parquet files for tests #883 (jorgecarleitao)
- Removed un-needed `unsafe` #843 (jorgecarleitao)
- Added more tests #810 (jorgecarleitao)
- Reduced code duplication #805 (jorgecarleitao)
- upgrade to clap 3.0 #797 (Jimexist)
- Simplified avro reading and added more tests #737 (jorgecarleitao)
v0.9.1 (2022-01-19)
New features:
- Added support for compare dictionary-encoded with scalar #686 (jorgecarleitao)
Fixed bugs:
- Allowed passing `None` as ipc_fields in flight API #780 (jorgecarleitao)
Enhancements:
- Read dict binary from parquet #781 (jorgecarleitao)
- Added support to read and write float dict from parquet #778 (jorgecarleitao)
Testing updates:
- Fixed CI for SIMD #779 (jorgecarleitao)
v0.9.0 (2022-01-14)
Breaking changes:
- Added number of rows read in CSV inference #765 (jorgecarleitao)
- Refactored `nullif` #753 (jorgecarleitao)
- Migrated to latest parquet2 #752 (jorgecarleitao)
- Replace flatbuffers dependency by Planus #732 (jorgecarleitao)
- Simplified `Schema` and `Field` #728 (jorgecarleitao)
- Replaced `RecordBatch` by `Chunk` #717 (jorgecarleitao)
- Removed `Option` from fields' metadata #715 (jorgecarleitao)
- Moved dict_id to IPC-specific IO #713 (jorgecarleitao)
- Moved is_ordered from `Field` to `DataType::Dictionary` #711 (jorgecarleitao)
- Refactored JSON writing (5-10x) #709 (jorgecarleitao)
- Made Avro read API use `Block` and `CompressedBlock` #698 (jorgecarleitao)
- Simplified most traits #696 (jorgecarleitao)
- Replaced `Display` by `Debug` for `Array` #694 (jorgecarleitao)
- Replaced `MutableBuffer` by `std::Vec` #693 (jorgecarleitao)
- Simplified `Utf8Scalar` and `BinaryScalar` #660 (jorgecarleitao)
- Simplified Primitive and Boolean scalar #648 (jorgecarleitao)
New features:
- Add `and_scalar` and `or_scalar` for boolean_kleene #662
- Add `lower` and `upper` support for string #635
- Added support to cast decimal #761 (jorgecarleitao)
- Added support to deserialize JSON (!= NDJSON) #758 (jorgecarleitao)
- Added support to infer nested json structs #750 (jorgecarleitao)
- Added support to compare intervals #746 (jorgecarleitao)
- Added `any` and `all` kernel #739 (ritchie46)
- Added support to write Avro async #736 (jorgecarleitao)
- Added support to write interval to Avro #734 (jorgecarleitao)
- Added `and_scalar` and `or_scalar` for boolean kleene #723 (silathdiir)
- Added `and_scalar` and `or_scalar` for boolean #707 (silathdiir)
- Refactored JSON read to split IO-bounded from CPU-bounded tasks #706 (jorgecarleitao)
- Added more conversions from parquet #701 (jorgecarleitao)
- Added support for compressed Avro write #699 (jorgecarleitao)
- Added support to write to Avro #690 (jorgecarleitao)
- Added dynamic version of negation #685 (jorgecarleitao)
- Added support to read dictionary-encoded required parquet pages #683 (mdrach)
- Added `upper` #664 (Xuanwo)
- Added `lower` #641 (Xuanwo)
- Added support for `async` read of Avro #620 (jorgecarleitao)
Fixed bugs:
- Pyarrow and Arrow2 don't agree on Timestamp resolution #700
- Writing compressed dictionary in parquet corrupts the files #667
- Replaced assert by error in IPC read #748 (jorgecarleitao)
- Made all panics in IPC read errors #722 (jorgecarleitao)
- Fixed error in compare booleans #721 (jorgecarleitao)
- Fixed error in dispatching scalar arithmetics #682 (jorgecarleitao)
- Fixed error in reading negative decimals from parquet #679 (mdrach)
- Made IPC reader less restrictive #678 (jorgecarleitao)
- Fixed error in trait constraint in compute #665 (jorgecarleitao)
- Fixed performance regression of CSV reading #657 (jorgecarleitao)
- Fixed filter of predicate with validity #653 (ritchie46)
- Made `Scalar: Send+Sync` #644 (jorgecarleitao)
Enhancements:
- Feature: JSON IO? #712
- Simplified code #760 (jorgecarleitao)
- Added iterator of values of `FixedBinaryArray` #757 (jorgecarleitao)
- Remove un-needed `unsafe` #756 (jorgecarleitao)
- Replaced un-needed `unsafe` #755 (jorgecarleitao)
- Made IO `#[forbid(unsafe)]` #749 (jorgecarleitao)
- Improved reading nullable Avro arrays #727 (Igosuki)
- Allow to create primitive array by vec without extra memcopy #710 (sundy-li)
- Removed requirement of `use Array` to access primitives' `data_type` #697 (jorgecarleitao)
- Cleaned up trait usage and added forbid_unsafe to parts #695 (jorgecarleitao)
- Migrated from `avro-rs` to `avro-schema` #692 (jorgecarleitao)
- Added `MutablePrimitiveArray::extend_constant` #689 (jorgecarleitao)
- Do not write validity without nulls in IPC #688 (jorgecarleitao)
- DRY code via macro #681 (jorgecarleitao)
- Made `dyn Array` and `Scalar` usable in `#[derive(PartialEq)]` #680 (jorgecarleitao)
- Made IPC ZSTD-compressed consumable by pyarrow #675 (jorgecarleitao)
- Simplified trait bounds in arithmetics #671 (jorgecarleitao)
- Improved performance of reading utf8 required from parquet (-15%) #670 (jorgecarleitao)
- Avoid double utf8 checks on MutableUtf8 -> Utf8 #655 (jorgecarleitao)
- Made `Buffer::offset` public #652 (ritchie46)
- Improved performance in cast Primitive to Binary/String (2x) #646 (sundy-li)
- Made `Filter: Send+Sync` #645 (jorgecarleitao)
- Made API to create field accept `String` #643 (jorgecarleitao)
Documentation updates:
- Fixed clippy (coming from 1.58) #763 (jorgecarleitao)
- Described how to run part of the tests #762 (jorgecarleitao)
- Improved README #735 (jorgecarleitao)
- clarify boolean value in DataType::Dictionary #718 (ritchie46)
- readme typo #687 (max-sixty)
- Added example to read parquet in parallel with rayon #658 (jorgecarleitao)
- Added documentation to `Bitmap::as_slice` #654 (ritchie46)
Testing updates:
- Improved json tests #742 (jorgecarleitao)
- Added integration tests for writing compressed parquet #740 (jorgecarleitao)
- Updated patch for integration test #731 (jorgecarleitao)
- Added cargo check to benchmarks #730 (sundy-li)
- More tests to CSV writing #724 (jorgecarleitao)
- Added integration tests for other compressions with parquet from pyarrow #674 (jorgecarleitao)
- Bumped nightly in CI #672 (jorgecarleitao)
- Invalidate caches from CI. #656 (jorgecarleitao)
v0.8.1 (2021-11-27)
Fixed bugs:
v0.8.0 (2021-11-27)
Breaking changes:
- Made CSV write options use chrono formatting by default #624
- Add `compression` to `IpcWriteOptions` #570
- Made `cast` accept `CastOptions` parameter #569
- Simplified `ArrowError` #640 (jorgecarleitao)
- Use `DynComparator` for `lexsort` and `partition` #637 (yjshen)
- Split "compute" feature #634 (jorgecarleitao)
- Removed unneeded trait. #628 (jorgecarleitao)
- Sealed 2 traits to forbid downstream implementations #621 (jorgecarleitao)
- Simplified arithmetics compute #607 (jorgecarleitao)
- Refactored comparison `Operator` #604 (jorgecarleitao)
- Simplified dictionary indexes #584 (jorgecarleitao)
- Simplified IPC APIs #576 (jorgecarleitao)
- Simplified IPC stream writer / remove finish on drop from stream writer #575 (jorgecarleitao)
- Simplified trait in compute. #572 (jorgecarleitao)
- Compute: add partial option into CastOptions #561 (sundy-li)
- Introduced `UnionMode` enum #557 (simonvandel)
- Changed DataType::FixedSize*(i32) to DataType::FixedSize*(usize) #556 (simonvandel)
New features:
- Added support to write timestamps with timezones for CSV #623 (jorgecarleitao)
- Added support to read Avro files' metadata asynchronously #614 (jorgecarleitao)
- Added iterator for `StructArray` #613 (illumination-k)
- Added support to read snappy-compressed Avro #612 (jorgecarleitao)
- Added support to read decimal from csv #602 (jorgecarleitao)
- Added support to cast `NullArray` to all other types #589 (flaneur2020)
- Added support dictionaries in nested types over IPC #587 (jorgecarleitao)
- Added support to write Arrow IPC streams asynchronously #577 (jorgecarleitao)
- Added support to write compressed Arrow IPC (feather v2) #566 (jorgecarleitao)
- Added support for ffi for `FixedSizeList` and `FixedSizeBinary` #565 (jorgecarleitao)
- Added support for `async` csv reading. #562 (jorgecarleitao)
- Added support for `bitwise` operations #553 (1aguna)
- Added support to read `StructArray` from parquet #547 (jorgecarleitao)
Fixed bugs:
- Fixed error in reading nullable from Avro. #631 (jorgecarleitao)
- Fixed error in union FFI #625 (jorgecarleitao)
- Fixed error in computing projection in `io::ipc::read::reader::FileReader` #596 (illumination-k)
- Fixed error in compressing IPC LZ4 #593 (jorgecarleitao)
- Fixed growable of dictionaries negative keys #582 (ritchie46)
- Made substring kernel on utf8 take chars into account. #568 (ritchie46)
- Fixed error in passing sliced arrays via FFI #564 (jorgecarleitao)
Enhancements:
- Faster `take` with null values (2-3x) #633 (jorgecarleitao)
- Improved error message for missing feature in compressed parquet #632 (jorgecarleitao)
- Added `to` conversion to `FixedSizeBinary` #622 (ritchie46)
- Bumped `comfy-table` #618 (jorgecarleitao)
- Made `MutableArray` `Send + Sync` #617 (jorgecarleitao)
- Removed most of allocations in IPC reading #611 (jorgecarleitao)
- Speed up boolean comparison kernels (~3x) #610 (Dandandan)
- Improved performance of decimal arithmetics #605 (jorgecarleitao)
- Simplified traits and added documentation #603 (jorgecarleitao)
- Improved performance of `is_not_null`. #600 (jorgecarleitao)
- Added `len` to every array #599 (jorgecarleitao)
- Added support for `NullArray` at FFI. #598 (jorgecarleitao)
- Optimized `MutableBinaryArray` #597 (jorgecarleitao)
- Speedup/simplify bitwise operations (avoid extra allocation) #586 (Dandandan)
- Improved performance of `bitmap::from_trusted` (3x) #578 (jorgecarleitao)
- Made bitmap not cache null count #563 (jorgecarleitao)
- Avoided redundant checks in creating an `Utf8Array` from `MutableUtf8Array` #560 (jorgecarleitao)
- Avoid unnecessary allocations #559 (simonvandel)
- Surfaced errors in reading from avro #558 (jorgecarleitao)
Documentation updates:
- Simplified example #619 (jorgecarleitao)
- Made example of parallel parquet write be over multiple batches #544 (jorgecarleitao)
Testing updates:
- Cleaned up benches #636 (jorgecarleitao)
- Ignored tests code in coverage report #615 (yjhmelody)
- Added more tests #601 (jorgecarleitao)
- Mitigated `RUSTSEC-2020-0159` #595 (jorgecarleitao)
- Added more tests #591 (jorgecarleitao)
v0.7.0 (2021-10-29)
Breaking changes:
- Simplified reading parquet #532 (jorgecarleitao)
- Change IPC `FileReader` to own the underlying reader #518 (blakesmith)
- Migrate to `arrow_format` crate #517 (jorgecarleitao)
New features:
- Added read of 2-level nested lists from parquet #548 (jorgecarleitao)
- add dictionary serialization for csv-writer #515 (ritchie46)
- Added `checked_negate` and `wrapping_negate` for `PrimitiveArray` #506 (yjhmelody)
Fixed bugs:
- Fixed error in reading fixed len binary from parquet #549 (jorgecarleitao)
- Fixed ffi of sliced arrays #540 (jorgecarleitao)
- Fixed s3 example #536 (jorgecarleitao)
- Fixed error in writing compressed parquet dict pages #523 (jorgecarleitao)
- Validity taken into account when writing `StructArray` to json #511 (VasanthakumarV)
Enhancements:
- Bumped Prost and Tonic #550 (PsiACE)
- Speedup scalar boolean operations #546 (Dandandan)
- Added fast path for validating ASCII text (~1.12-1.89x improvement on reading ASCII parquet data) #542 (Dandandan)
- Exposed missing APIs to write parquet in parallel #539 (jorgecarleitao)
- improve utf8 init validity #530 (ritchie46)
- export missing `BinaryValueIter` #526 (yjhmelody)
Documentation updates:
- Added more IPC documentation #534 (HagaiHargil)
- Fixed clippy and fmt #521 (ritchie46)
Testing updates:
- Added more tests for `utf8` #543 (jorgecarleitao)
- Ignored RUSTSEC-2020-0071 and RUSTSEC-2020-0159 #537 (jorgecarleitao)
- Improved parquet read benches #533 (jorgecarleitao)
- Added fmt and clippy checks to CI. #522 (xudong963)
v0.6.2 (2021-10-09)
New features:
Fixed bugs:
- Do not check offsets or utf8 validity in ffi (#505) #510 (NilsBarlaug)
- Made `try_push_valid` public again #509 (ritchie46)
Enhancements:
v0.6.1 (2021-10-07)
Breaking changes:
- Bring `MutableFixedSizeListArray` to the spec used by the rest of the Mutable API #475
- Removed `ALIGNMENT` invariant from `[Mutable]Buffer` #449
- Un-nested `compute::arithmetics::basic` #461 (jorgecarleitao)
- Added more serialization options for csv writer. #453 (ritchie46)
- Changed validity from `&Option<Bitmap>` to `Option<&Bitmap>`. #431 (jorgecarleitao)
- Bumped parquet2 #422 (jorgecarleitao)
- Changed IPC `FileWriter` to own the `writer`. #420 (yjshen)
- Made `DynComparator` `Send+Sync` #414 (yjshen)
New features:
- Read Decimal from Parquet File #444
- Add IO read for Avro #401
- Added support to read Avro logical types, `List`, `Enum`, `Duration` and `Fixed`. #493 (jorgecarleitao)
- Added read `Decimal` from parquet #489 (potter420)
- Implement `BitXor` trait for `Bitmap` #485 (houqp)
- Added `extend`/`extend_unchecked` for `MutableBooleanArray` #478 (VasanthakumarV)
- expose `shrink_to_fit` to mutable arrays #467 (ritchie46)
- Added support for `DataType::Map` and `MapArray` #464 (jorgecarleitao)
- Extract parts of datetime #433 (VasanthakumarV)
- Added support to add an interval to a timestamp #417 (jorgecarleitao)
- Added support to read Avro. #406 (jorgecarleitao)
- Replaced own allocator by `std::Vec`. #385 (jorgecarleitao)
Fixed bugs:
- crash in parquet read #459
- Made writing stream to parquet require a non-static lifetime #471 (GrandChaman)
- Made importing from FFI `unsafe` #458 (jorgecarleitao)
- Fixed panic in division using nulls. #438 (jorgecarleitao)
- Fixed error writing dictionary extension to IPC #397 (jorgecarleitao)
- Fixed error in extending `MutableBitmap` #393 (jorgecarleitao)
Enhancements:
- Some `compare` function are not exported #349
- Investigate how to add support for timezones in timestamp #23
- Made `hash` work for extension type #487 (jorgecarleitao)
- Added `extend`/`extend_unchecked` for `MutableBinaryArray` #486 (VasanthakumarV)
- Improved inference and deserialization of CSV #483 (jorgecarleitao)
- Added `GrowableFixedSizeList` and improved `MutableFixedSizeListArray` #470 (jorgecarleitao)
- Added `MutableBitmap::shrink_to_fit` #468 (jorgecarleitao)
- Added `MutableArray::as_box` #450 (sd2k)
- Improved performance of sum aggregation via aligned loads (-10%) #445 (ritchie46)
- Removed `assert` from `MutableBuffer::set_len` #443 (ritchie46)
- Optimized `null_count` #442 (ritchie46)
- Improved performance of list iterator (- 10-20%) #441 (ritchie46)
- Improved performance of `PrimitiveGrowable` for nulls (-10%) #434 (jorgecarleitao)
- Allowed accessing validity without importing `Array` #432 (jorgecarleitao)
- Optimize hashing using `ahash` and `multiversion` (-30%) #428 (Dandandan)
- Improved performance of iterator of `Utf8Array` and `BinaryArray` (3-4x) #427 (jorgecarleitao)
- Improved performance of utf8 validation of large strings via `simdutf8` (-40%) #426 (Dandandan)
- Added reading of parquet required dictionary-encoded binary. #419 (jorgecarleitao)
- Add `extend`/`extend_unchecked` for `MutableUtf8Array` #413 (VasanthakumarV)
- Added support to extract hours and years from timestamps with timezone #412 (jorgecarleitao)
- Added `io_csv_read` and `io_csv_write` feature #408 (ritchie46)
- Improve `comparison` docs and re-export the array-comparing function #404 (HagaiHargil)
- Added support to read dict-encoded required primitive types from parquet #402 (Dandandan)
- Added `Array::with_validity` #399 (ritchie46)
Documentation updates:
- Improved documentation #491 (jorgecarleitao)
- Added more API docs. #479 (jorgecarleitao)
- Added more documentation #476 (jorgecarleitao)
- Improved documentation #462 (jorgecarleitao)
- Added example showing parallel writes to parquet (x num_cores) #436 (jorgecarleitao)
- Improved documentation #430 (jorgecarleitao)
- [0.5] The docs `io` module has no submodules #390
- Made docs be compiled with feature `full` #391 (jorgecarleitao)
Testing updates:
- DRY via macro. #477 (jorgecarleitao)
- DRY of type check and len check code in `compute` #474 (yjhmelody)
- Added property testing #460 (jorgecarleitao)
- Added fmt to CI. #455 (jorgecarleitao)
- Simplified CI #452 (jorgecarleitao)
- fix filter kernels bench #440 (ritchie46)
- Reduced number of combinations in feature tests. #429 (jorgecarleitao)
- Move tests from `src/compute/` to `tests/` #423 (VasanthakumarV)
- Skipped some feature permutations. #411 (jorgecarleitao)
- Added tests to some invariants of `unsafe` #403 (jorgecarleitao)
- Added support to read and write extension types to and from parquet #396 (jorgecarleitao)
- Fix testing of SIMD #394 (jorgecarleitao)
v0.5.3 (2021-09-14)
New features:
- Added support to read and write extension types to and from parquet #396 (jorgecarleitao)
Fixed bugs:
- Fixed error writing dictionary extension to IPC #397 (jorgecarleitao)
- Fixed error in extending `MutableBitmap` #393 (jorgecarleitao)
Enhancements:
- Added support to read dict-encoded required primitive types from parquet #402 (Dandandan)
- Added `Array::with_validity` #399 (ritchie46)
Testing updates:
- Fix testing of SIMD #394 (jorgecarleitao)
v0.5.1 (2021-09-09)
Documentation updates:
- [0.5] The docs `io` module has no submodules #390
- Made docs be compiled with feature `full` #391 (jorgecarleitao)
v0.5.0 (2021-09-07)
Breaking changes:
- Added `Extension` to `DataType` #361
- `MonthDayNano` added to enum `IntervalUnit` #360
- Make `io::parquet::write::write_*` return size of file in bytes #354
- Renamed `bitmap::utils::null_count` to `bitmap::utils::count_zeros` #342
- Made `GroupFilter` optional in parquet's `RecordReader` and added method to set it. #386 (jorgecarleitao)
- Removed `PartialOrd` and `Ord` of all enums in `datatypes` #379 (jorgecarleitao)
- Made `cargo` features not default #369 (jorgecarleitao)
- Prepare APIs for extension types #357 (jorgecarleitao)
New features:
- Added support for `async` parquet write #372 (GrandChaman)
- Add support to extension types in FFI #363 (jorgecarleitao)
- Added support for field's metadata via FFI #362 (jorgecarleitao)
- Added support for `Extension` (logical) type #359 (jorgecarleitao)
- Added support for compute to `BinaryArray` #346 (zhyass)
- Added support for reading binary from CSV #337 (jorgecarleitao)
- Added support for `MONTH_DAY_NANO` interval type #268 (jorgecarleitao)
Fixed bugs:
- Parquet read skips a few rows at the end of the page #373
- `parquet_read` fails when a column has too many rows with string values #366
- `parquet_read` panics with `index_out_of_bounds` #351
- Fixed error in `MutableBitmap::push_unchecked` #384 (jorgecarleitao)
- Fixed display of timestamp with tz. #375 (jorgecarleitao)
Enhancements:
- Added `extend_*values` to `MutablePrimitiveArray` #383 (ritchie46)
- Improved performance of writing to CSV (20-25%) #382 (jorgecarleitao)
- Bumped `lexical-core` #378 (jorgecarleitao)
- Fixed casting of utf8 <> Timestamp with and without timezone #376 (jorgecarleitao)
- Added `Send+Sync` to `MutableBuffer` #368 (jorgecarleitao)
- Improved performance of unary _not_ for aligned bitmaps (3x) #365 (jorgecarleitao)
- Reduced dependencies within `num` #353 (jorgecarleitao)
- Bumped to parquet2 v0.4 #352 (jorgecarleitao)
- Bumped tonic and prost in flight #344 (PsiACE)
- Improved null count calculation (5x) #343 (jorgecarleitao)
- Improved perf of deserializing integers from json (30%) #340 (jorgecarleitao)
- Simplified code of json schema inference #339 (jorgecarleitao)
Documentation updates:
- Moved guide examples to examples/ #387 (jorgecarleitao)
- Added more docs. #358 (jorgecarleitao)
- Improved API docs. #355 (jorgecarleitao)
Testing updates:
- Moved tests to `tests/` #389 (jorgecarleitao)
- Moved compute tests to tests/ #388 (jorgecarleitao)
- Added more tests. #380 (jorgecarleitao)
- Pinned nightly in SIMD tests #364 (jorgecarleitao)
- Improved benches for take #348 (jorgecarleitao)
- Made IPC integration tests run tests that are not run by arrow-rs #278 (jorgecarleitao)
v0.4.0 (2021-08-24)
Breaking changes:
- Change dictionary iterator of values from `Array`s of one element to `Scalar`s #335
s #335 - Align FFI API with arrow's C++ API #328
- Make `*_compare_scalar` not return `Result` #316
- Make `io::print`, `get_value_display` and `get_display` not return `Result` #286
- Add `MetadataVersion` to IPC interfaces #282
- Change `DataType::Union` to enable round trips in IPC #281
- Removed clone requirement in `StructArray -> RecordBatch` #307 (jorgecarleitao)
- Fixed error in reading a non-finished IPC stream. #302 (jorgecarleitao)
- Generalized ZipIterator to accept a `BitmapIter` #296 (jorgecarleitao)
New features:
- Added API to FFI `Field` #321 (jorgecarleitao)
- Added `compare_scalar` #317 (jorgecarleitao)
- Add `UnionArray` #283 (jorgecarleitao)
Fixed bugs:
- SliceIterator of last bytes is not correct #292
- Fixed error in displaying dictionaries with nulls in values #334 (jorgecarleitao)
- Fixed error in dict equality #333 (jorgecarleitao)
- Fixed small inconsistencies between `compute::cast` and `compute::can_cast` #295 (jorgecarleitao)
- Removed order implementation for `days_ms`/`Interval(DayTime)` #285 (jorgecarleitao)
Enhancements:
- Added support for remaining non-nested datatypes #336 (jorgecarleitao)
- Made `multiversion` and `lexical-core` optional #324 (jorgecarleitao)
- Improved performance of utf8 comparison (1.7x-4x) #322 (jorgecarleitao)
- Improved performance of boolean comparison (5x-14x) #318 (jorgecarleitao)
- Added trait `TryPush` #314 (jorgecarleitao)
- Added cast `date32 -> i64` and `date64 -> i32` #308 (ritchie46)
- Improved performance of comparison with SIMD feature flag (2x-3.5x) #305 (jorgecarleitao)
- Added support to read json to `BinaryArray` #304 (jorgecarleitao)
- Improved `MutableFixedSizeBinaryArray` #303 (jorgecarleitao)
- Improved `MutablePrimitiveArray` and `MutableUtf8Array` #299 (jorgecarleitao)
- Improved `MutableBooleanArray` #297 (jorgecarleitao)
- Improved performance of concatenating non-aligned validities (15x) #291 (jorgecarleitao)
- Added support for timestamps with tz and interval to `io::print::write` #287 (jorgecarleitao)
- Improved debug repr of buffers and bitmaps. #284 (jorgecarleitao)
- Cleaned up internals of json integration #280 (jorgecarleitao)
- Removed `serde_derive` dependency #279 (jorgecarleitao)
- Simplified IPC code. #277 (jorgecarleitao)
- Reduced dependencies from comfy-table and enabled `wasm` on `io_print` feature. #276 (jorgecarleitao)
- Improve performance of `rem_scalar/div_scalar` for integer types (4x-10x) #275 (ritchie46)
Documentation updates:
- Cleaned examples and docs from old API. #330 (jorgecarleitao)
- Improved documentation #306 (jorgecarleitao)
Testing updates:
- Improved naming of testing workflows #315 (jorgecarleitao)
- Added tests to scalar API #300 (jorgecarleitao)
- Made CSV and JSON tests not use files. #290 (jorgecarleitao)
- Moved tests to integration tests #289 (jorgecarleitao)
Closed issues:
- Make parquet_read_record support async #331
- Panic due to SIMD comparison #312
- Bitmap::mutable line 155 may Panic/segfault #309
- IPC's `StreamReader` may abort due to excessive memory by overflowing a `usize`d variable #301
- Improve performance of `rem_scalar/div_scalar` for integer types (4x-10x) #259
v0.3.0 (2021-08-11)
Breaking changes:
- Renamed `sum` to `sum_primitive` #273
- Moved trait `Index` from `array::Index` to `types::Index` #272
- Added optional `projection` to IPC FileReader #271
- Added optional `page_filter` to parquet's `RecordReader` and `get_page_iterator` #270
- Renamed parquets' `CompressionCodec` to `Compression` #269
New features:
- Added support for FFI of dictionary-encoded arrays #267 (jorgecarleitao)
- Added support for projection pushdown on IPC files #264 (jorgecarleitao)
- Added support to read parquet asynchronously #260 (jorgecarleitao)
- Added support to filter parquet pages. #256 (jorgecarleitao)
- Added wrapping_cast to cast kernels #254 (sundy-li)
- Added support to parquet IO on wasm32 #239 (jorgecarleitao)
- Added support to round-trip dictionary arrays on parquet #232 (jorgecarleitao)
- Added Scalar API #56 (jorgecarleitao)
Fixed bugs:
- Fixed error in computing remainder of chunk iterator #262 (jorgecarleitao)
- Fixed error in slicing bitmap. #250 (jorgecarleitao)
Enhancements:
- Improve the performance in cast kernel using AsPrimitive trait in generic dispatch #252
- Poor performance in `sort::sort_to_indices` with limit option in arrow2 #245
- Support loading Feather v2 (IPC) files with more than 1 million tables #231
- Migrated to parquet2 v0.3 #265 (jorgecarleitao)
- Added more tests to cast and min/max #253 (jorgecarleitao)
- Prettytable is unmaintained. Change to comfy-table #251 (PsiACE)
- Added IndexRange to remove checks in hot loops #247 (jorgecarleitao)
- Make merge_sort_slices MergeSortSlices public #243 (sundy-li)
Documentation updates:
- Added example and guide section on compute #242 (jorgecarleitao)
Closed issues:
- Allow projection pushdown to IPC files #261
- Add support to write dictionary-encoded pages #211
- Make IpcWriteOptions easier to find. #120
v0.2.0 (2021-07-30)
Breaking changes:
- Simplified `new` signature of growable API #238 (jorgecarleitao)
- Add support to merge sort with a limit #222 (sundy-li)
- Generalized sort to accept indices other than i32. #220 (jorgecarleitao)
- Added support for limited sort #218 (jorgecarleitao)
New features:
- Merge sort support limit option #221
- Introduce limit option to sort #215
- Added support for take of interval of days_ms #219 (jorgecarleitao)
- Added FFI for remaining types #213 (jorgecarleitao)
Fixed bugs:
- Filter operation on sliced utf8 arrays are incorrect #233
- Fixed error in slicing bitmap. #237 (jorgecarleitao)
- Fixed nested FFI. #212 (jorgecarleitao)
Enhancements:
- Avoid materialization of indices in filter_record_batch for single arrays #234
- Add integration tests for writing to parquet #80
- Short-circuited boolean evaluation in GrowableList #228 (ritchie46)
- Add extra inlining to speed up take #226 (Dandandan)
- Removed un-needed `unsafe` #225 (jorgecarleitao)
Documentation updates:
- Add documentation to guide #96
- Add git submodule command to correct the test doc #223 (sundy-li)
- Added badges to README #216 (sundy-li)
- Clarified differences with arrow crate #210 (alamb)
- Clarified differences with arrow crate #209 (alamb)
* This Changelog was automatically generated by github_changelog_generator