Releases: Bears-R-Us/arkouda
Release Notes v2022.04.15
Release Notes 2022-04-15
Major updates:
- Issue #1218 - Extends pdarray setops to work on multiple pdarrays
- Issue #1197 - Adds Segarray setops functionality
- Issue #1234 - Removes `ls_hdf` in favor of a generic `ls` which automatically handles hdf5 or parquet files
- Issue #1265 - Adds `to_upper/to_lower` and `is_upper/is_lower` functionality to Strings
- PRs #1222, #1227, #1233 - Add support for writing string Parquet files and an append mode for Parquet file writing
- Issue #1272 - Changes `ak.histogram` to behave like `akutil.hist`
- Issue #1256 - Moves `akutil.join` functionality into arkouda
- Issues #1133, #1210, and #1279 - Fix uint64 indexing and broadcasting errors
- Issue #1260 - Enables GroupBy with one or more boolean pdarrays
- Issue #1240 - Updates client dtype classes to be uint compatible and enables `ak.ip_address` to accept python lists
- Issue #1154 - Adds `dtype` parameter to `ak.array`
Minor fixes:
- Issue #1174 - Fixes randint range bug
- Issue #1190 - Updates `ak.DataFrame` to use the Index object
- Issues #1039 and #1241 - Improves performance for String methods (peel, stick, and substring search)
- Issue #1251 and PR #1243 - Updates README install instructions and adds documentation for the `--saveUsedModules` flag
- PRs #1219 and #1231 - Improves performance of parquet read and append
Auto-generated release notes
- Make regex calls compatible with chapel 1.26 by @ronawho in #1216
- Closes #1133 and #1210: Support indexing with uint64 by @pierce314159 in #1217
- Optimize calculation of Parquet column byte sizes for string reads by @bmcdonald3 in #1219
- Closes #1174 ak.randint() fails for ranges greater than 2**63 by @Ethan-DeBandi99 in #1214
- Closes #1154: dtype parameter for `ak.array` by @pierce314159 in #1223
- Closes #1209 fix deprecation warning by @Ethan-DeBandi99 in #1226
- Add initial support for writing string Parquet files by @bmcdonald3 in #1222
- Add append mode for Parquet file writing by @bmcdonald3 in #1227
- Bulk append values in Parquet and switch some places to int64 by @bmcdonald3 in #1231
- Closes #1232: Add Parquet string appending by @bmcdonald3 in #1233
- Unify HDF5 and Parquet ls calls into a single function by @bmcdonald3 in #1224
- Closes #1190 Index for Indexing DataFrame by @Ethan-DeBandi99 in #1235
- Fix append Parquet test when running with more than 10 locales by @bmcdonald3 in #1237
- Add documentation for saveUsedModules flag by @bmcdonald3 in #1243
- Closes #1039 Update peel/stick to use aggregation by @Ethan-DeBandi99 in #1228
- Closes #1218 extend pdarray setops to work for multiple arrays by @Ethan-DeBandi99 in #1225
- Revert "Closes #1218 extend pdarray setops to work for multiple arrays" by @mhmerrill in #1246
- Add null checking for strings by @bmcdonald3 in #1244
- Update null test file to be uncompressed by @bmcdonald3 in #1250
- Closes #1251 - Reconcile the README.md TOC sections describing install between Mac, Linux, and Windows by @Ethan-DeBandi99 in #1252
- Closes #1241: Use `computeOnSegments` for `substringSearch` by @pierce314159 in #1248
- Part of #1254: Adds typechecked to pdarraycreation methods by @pierce314159 in #1255
- Closes #1256 - `join.py` from akutil to arkouda by @Ethan-DeBandi99 in #1258
- Fix server crash when reading string columns from multiple locales by @bmcdonald3 in #1262
- 1240 Uint64 compatibility and other improvements to client dtype classes by @reuster986 in #1264
- Closes #1218 Extend `pdarray` setops to work for multiple arrays by @Ethan-DeBandi99 in #1266
- Closes #1265: Add `to/is_lower` and `to/is_upper` methods to `Strings` by @pierce314159 in #1274
- Closes #1260 - GroupBy w/ Booleans by @Ethan-DeBandi99 in #1270
- Closes #1267 - unexpected results with GroupBy() when 2nd element is string array by @Ethan-DeBandi99 in #1277
- Closes #1272 - akutil.hist move to ak.histogram. by @Ethan-DeBandi99 in #1276
- Resolves #1279 - Add `uint64` support for `broadcast` by @pierce314159 in #1283
Full Changelog: v2022.03.15...v2022.04.15
v2022.03.15
Highlights
- Lots of new functionality from akutil, including DataFrames, Series, Index, and SegArray
- New grouping API
- Lots of parquet improvements, including error handling and string read performance
What's Changed
- Closes #1110 legacy_placeholder removal by @Ethan-DeBandi99 in #1123
- more variations on the arkouda logo by @mhmerrill in #1143
- Fix quoting when getting vars from the Makefile by @ronawho in #1148
- Fix Parquet unit test version number test when Arrow version < 5 by @bmcdonald3 in #1150
- Fix crash when passing in bad column names to Parquet by @bmcdonald3 in #1157
- Remove trailing space for ak.get_datasets by @bmcdonald3 in #1159
- Closes #1121 uint64 hash return by @Ethan-DeBandi99 in #1147
- Closes #1145: Add `uint64` support for `ak.arange` by @pierce314159 in #1152
- Closes #1012 `.astype()` method by @Ethan-DeBandi99 in #1160
- Change ak.read_parquet to read all supported datasets by default by @bmcdonald3 in #1146
- Closes #1118 move segarray/dataframe from akutil by @Ethan-DeBandi99 in #1139
- Add support for reading and writing bool Parquet columns by @bmcdonald3 in #1156
- Add support for reading string columns from Parquet files by @bmcdonald3 in #1163
- Add support for reading and writing float64 Arrow columns by @bmcdonald3 in #1170
- Closes #1126 dtypes and util to arkouda by @Ethan-DeBandi99 in #1171
- Closes #1127 alignment to arkouda by @Ethan-DeBandi99 in #1172
- Get Parquet string reading offsets in C++ function by @bmcdonald3 in #1175
- Add UsedModules.cfg to .gitignore by @bmcdonald3 in #1179
- Add line to pass Arrow float to Chapel side by @bmcdonald3 in #1186
- Closes #1165 drop on axis by @Ethan-DeBandi99 in #1177
- Update Arkouda to work with CTypes by @bradcray in #1185
- Catch all Parquet errors and report them to the client by @bmcdonald3 in #1188
- Closes #1181: use apache/parquet-testing files by @reuster986 in #1183
- Update Arrow version to 7 and add snappy to install by @bmcdonald3 in #1196
- Add support for reading binary data columns in Parquet by @bmcdonald3 in #1195
- Closes #1176 doc version num by @Ethan-DeBandi99 in #1198
- Closes #1022 collapse unique and value_counts server messages by @Ethan-DeBandi99 in #1192
- Issue #1189: Fix type of H5Literate argument to H5_iter_order_t by @glitch in #1201
- Closes #1128 series/index into arkouda by @Ethan-DeBandi99 in #1182
- Closes #1155: `ak.cast` doesn't work with numpy dtypes by @pierce314159 in #1202
- Closes #1178 ak.DataFrame.from_pandas() by @Ethan-DeBandi99 in #1184
- Only read necessary elements in Chapel array for Parquet by @bmcdonald3 in #1204
- Closes #1207: Implement `ak.full` and `ak.full_like` by @pierce314159 in #1208
- Update optional Parquet test to account for string and binary changes by @bmcdonald3 in #1205
- Closes #1203 - `SegArray` - Allow Empty Segments by @Ethan-DeBandi99 in #1206
- Closes #1164 `ak.DataFrame` Testing Adjustments by @Ethan-DeBandi99 in #1191
- Consolidate and standardize grouping API by @reuster986 in #1212
Full Changelog: v2022.02.23...v2022.03.15
v2022.02.23
- Support for a few remaining operations on `uint64`
- Improvements and tuning of `in1d`
- Support for listing columns of parquet files via `ak.get_datasets()`
v2022.02.16
Highlights:
v2022.02.01
Highlights
- Modular build process allows building with external modules
- Optimized `ak.in1d()`
- Improved Parquet read performance
- Fixed precision issue affecting grouped sum on float64 and related bug in grouped AND/OR
What's Changed
- Update compatibility to numpy - 1.21.5 by @glitch in #1033
- Speed up gasnet unit tests by only using 1 thread per locale by @ronawho in #1035
- Bump `substring_search` problem size back to `10**8` by @ronawho in #1038
- fixes float precision bug #964 by @reuster986 in #1036
- Revert "fixes float precision bug #964" by @mhmerrill in #1043
- Add support for passing server args to runClient/run_benchmarks.py by @ronawho in #1041
- Closes #1031: Remove non-regex substring search by @pierce314159 in #1032
- Optimize small and medium int in1d operations by @ronawho in #1044
- Refactor binary operator function with doBinOp function by @bmcdonald3 in #1034
- Allow modules from outside the Arkouda source dir to be built into the server by @bmcdonald3 in #1047
- Parallelize Parquet file reading on-node by @bmcdonald3 in #1050
- Issue #1049: Adds capability to print server commands from the client. by @glitch in #1052
- Update modularization docs to more clearly specify absolute paths by @bmcdonald3 in #1056
- #964 Fix sum precision for real this time by @reuster986 in #1055
Full Changelog: v2022.01.20...v2022.02.01
Release Notes v2022.01.20
Release Notes 2022-01-20
Major updates:
- Issue #786 - Server side complex object support and Symbol Table Type Hierarchy
  Server-side objects are now managed as encapsulated complex objects rather than as individual pdarrays. This simplifies the message-passing layer: a complex object previously had to pass one id per component and now needs to pass only one. The first implementation of a complex object is the SegString (segmented string array). A basic type hierarchy for complex objects was also introduced, so there is a root type stored in the Symbol Table.
- Adds build support for Chapel version 1.25.1, which is now the recommended version (see PR#1027).
- Modular server builds & internal I/O code refactor, see PR#1017 and Issue #1005. Developers can now choose to exclude various modules in the server build process by commenting them out in `ServerModules.cfg` (we plan to improve this capability in future releases)
- Issue #963 & #940 - Performance improvements on regex and string search methods etc.
- Issue #985 (ongoing) - Parquet improvements regarding error handling, timestamps(Issue #1026), and performance; (PRs #1014, #992, #993, #1023, #1024, #1028)
- Issue #930 - Externally generated server tokens are now allowed.
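
The complex-object change described under Issue #786 can be pictured with a small, purely illustrative sketch in plain Python. None of these class or method names (`Entry`, `SegStringEntry`, `SymbolTable`, `add`, `lookup`) come from the Arkouda server, which is written in Chapel. The null-terminated byte layout is likewise an assumption made only for the illustration. The point is that the offsets and bytes of a segmented string array live behind a single id, and every stored object shares one root type.

```python
from itertools import count


class Entry:
    """Hypothetical root type for everything stored in the symbol table."""


class SegStringEntry(Entry):
    """Segmented string array: offsets and bytes bundled into one object.

    Assumption for this sketch: strings are stored back to back as
    null-terminated UTF-8, with one starting offset per string.
    """

    def __init__(self, strings):
        self.offsets = []
        self.chars = bytearray()
        for s in strings:
            self.offsets.append(len(self.chars))
            self.chars += s.encode("utf-8") + b"\x00"

    def __getitem__(self, i):
        start = self.offsets[i]
        # The segment ends where the next one starts, or at the end of the buffer.
        end = self.offsets[i + 1] if i + 1 < len(self.offsets) else len(self.chars)
        return self.chars[start:end - 1].decode("utf-8")  # drop the terminator


class SymbolTable:
    """Hypothetical table mapping one id to one (possibly complex) entry."""

    def __init__(self):
        self._entries = {}
        self._ids = count(1)

    def add(self, entry):
        if not isinstance(entry, Entry):
            raise TypeError("only Entry subclasses can be stored")
        name = f"id_{next(self._ids)}"
        self._entries[name] = entry
        return name

    def lookup(self, name, expected=Entry):
        entry = self._entries[name]
        if not isinstance(entry, expected):
            raise TypeError(f"{name} is not a {expected.__name__}")
        return entry


table = SymbolTable()
seg_id = table.add(SegStringEntry(["one", "two", "three"]))  # a single id
strings = table.lookup(seg_id, SegStringEntry)
```

With this shape, a client message only needs to carry `seg_id`. Without the wrapper it would have to carry separate ids for the offsets array and the bytes array.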
Minor fixes:
- Issue #933 Documentation fix and final removal of Read-the-docs in favor of Github pages
- Issue #990 A logic error in Categoricals involving in1d was fixed.
- Apache Arrow version info has been added to the server configuration information (PR#995)
- Server configuration information is now cached instead of being recreated on each call.
- Issue #973 - new benchmarks were added for various data distributions
- Issue #914 was fixed by changing string writes to HDF5 by using aggregators
Auto-generated release notes
- 786 SegString as single entry (commits grouped by type) & Complex Object in Symbol Table by @glitch in #830
- Fix # 786 Adds typing hierarchy regarding SymEntry types to Parquet code. by @glitch in #991
- Add Parquet support information in env file by @bmcdonald3 in #992
- Closes #933 by fixing docs and removing read-the-docs in favor of Github pages by @glitch in #1003
- Refresh akutil by @reuster986 in #996
- Add handling of errors in Parquet code and minor clean up by @bmcdonald3 in #993
- Add Arrow versioning to ak.get_config() call by @bmcdonald3 in #995
- Remove out of date comment about poor sort performance on IB by @ronawho in #998
- Closes #999: adds call to super.init for SegStringSymEntry. by @glitch in #1000
- Add in1d to list of benchmarks to run by @ronawho in #1002
- Cache the server config string instead of recreating it by @ronawho in #1004
- Closes #990: Logic error in Categorical.in1d and other methods by @pierce314159 in #1001
- Data distributions for sort benchmarking #973 by @reuster986 in #977
- update README.md with ArkoudaWeeklyCall by @mhmerrill in #1009
- Optimize Parquet reading by reading batches rather than creating a copy by @bmcdonald3 in #1014
- Drop in1d down to 1 trial by @ronawho in #1021
- Issue #914: Uses aggregator for SegString to HDF5 writes by @glitch in #1016
- Modularize build process by @bmcdonald3 in #1017
- Change Parquet C++ types from int to int64_t and reorganization by @bmcdonald3 in #1023
- 980 allow externally generated server tokens by @hokiegeek2 in #1025
- Add support for reading timestamps in Parquet files by @bmcdonald3 in #1024
- re-add sort distributions benchmark by @reuster986 in #1029
- Recommend Chapel 1.25.1 and use it for CI testing by @ronawho in #1027
- Optimize Parquet file writing with WriteBatch function by @bmcdonald3 in #1028
- Part of Issue #940: Simplify Regex Substring Search Methods by @pierce314159 in #1030
- Issue 1005 file io refactor by @glitch in #1007
Full Changelog: v2021.12.02...v2022.01.20
Release Notes v2021.12.02
Highlights
- Introduces optional support for Parquet, Issue #903
- Official move to Chapel 1.25.0 (with backwards compatibility for Chapel 1.24.x). See Issue #954 and
- General support for HDF5 1.10.x and 1.12.x (Arkouda 1-D pdarray read/write with HDF5) See Issue #975 and #979
- Chapel's `twoArrayRadixSort` is now a runtime sorting option, see #984
Minor changes
Regex functionality and Categorical perf fix
The two major updates in this release are:
- Full implementation of the python `re` API for regex match/search/split/findall
- Fixed a performance bug where it was taking inordinately long to display the head/tail of a Categorical with a large number of codes
Additionally, this release saw the complete transition to Chapel 1.25 in both the docs and CI.
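
For reference, the four operations named above look like this in Python's standard `re` module on a single string. This sketch only shows the standard-library behaviour that the release says it mirrors. How Arkouda exposes these calls on its distributed string arrays is not shown in these notes and is not assumed here.

```python
import re

pattern = re.compile(r"\d+")
text = "ab12cd345"

# match() only succeeds if the pattern matches at the start of the string.
at_start = pattern.match(text)

# search() scans forward and returns the first match anywhere in the string.
first = pattern.search(text)

# split() cuts the string at every match; a match at the very end
# leaves a trailing empty string.
parts = pattern.split(text)

# findall() returns every non-overlapping match as a list of strings.
found = pattern.findall(text)
```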
2021.10.07 fixes and improvements
Highlights
It's only been a week since the last release, but several important things have happened:
- #941 fixes a bug affecting correctness of the LSD radix sort in a corner case
- #945 implements element-wise bit operations like popcount, clz, and rotations
- #935 greatly improves performance for operations that allocate many small arrays
- #931 speeds up string regex operations by improving how string segments are localized
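
As a reminder of what the element-wise bit operations in #945 compute, here is a minimal pure-Python sketch on 64-bit values. The function names are illustrative only, and the 64-bit width is an assumption of the sketch. The Arkouda names and signatures for these operations are not given in these notes.

```python
MASK64 = (1 << 64) - 1  # treat every value as an unsigned 64-bit word


def popcount(x):
    """Number of set bits."""
    return bin(x & MASK64).count("1")


def clz(x):
    """Count of leading zero bits in a 64-bit word (64 for x == 0)."""
    return 64 - (x & MASK64).bit_length()


def rotl(x, n):
    """Rotate left by n bits within 64 bits."""
    x &= MASK64
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK64


def rotr(x, n):
    """Rotate right by n bits within 64 bits."""
    return rotl(x, 64 - (n % 64))


values = [0, 1, 0b1011, 1 << 63]

# "Element-wise" means the operation is applied to each entry independently.
popcounts = [popcount(v) for v in values]
leading_zeros = [clz(v) for v in values]
rotated_left = [rotl(v, 1) for v in values]
rotated_right = [rotr(v, 1) for v in values]
```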
This is an update of the (now deleted) v2021.10.06 release to incorporate the bug fix in #949. I could not reuse the tag, so I future-dated it.
v2021.09.30
2021-09-30 Release Notes
New Functionality
- Strings regex support (Issues #894, #910, #911, #917) using `re2` library.
  - Adds `findall`, `find_locations`, `match` functions with regex support
  - Adds regex support for `contains`, `endswith`, `startswith`, `flatten`, `peel`, `rpeel`
- Issue #822 : Adds hdf5 save/load functionality for Categoricals
- Issue #919 : Hashing for general arrays
Benchmarking updates
- Results for scalability & sorts benchmark runs were added to the `runs` directory
- Issue #930 speed up answer creation for flatten benchmark
Chapel related support/improvements
- Support for Chapel v1.25.0 by updating GenSymIO to use new `subprocess.exitCode` (see PR#916)
- Use of interleave-memory if available (see PR#913)
- PR#935 memTrack/memThreshold configuration change for performance improvement