
Releases: Bears-R-Us/arkouda

Release Notes v2022.04.15

15 Apr 14:44
563867b

Release Notes 2022-04-15

Major updates:

  • Issue #1218 - Extends pdarray setops to work on multiple pdarrays
  • Issue #1197 - Adds Segarray setops functionality
  • Issue #1234 - Removes ls_hdf in favor of a generic ls which automatically handles hdf5 or parquet files
  • Issue #1265 - Adds to_upper/to_lower and is_upper/is_lower functionality to Strings
  • PRs #1222, #1227, #1233 - Add support for writing string Parquet files and an append mode for Parquet file writing
  • Issue #1272 - Changes ak.histogram to behave like akutil.hist
  • Issue #1256 - Moves akutil.join functionality into arkouda
  • Issues #1133, #1210, and #1279 - Fix uint64 indexing and broadcasting errors
  • Issue #1260 - Enables GroupBy with one or more boolean pdarrays
  • Issue #1240 - Updates client dtype classes to be uint compatible and enables ak.ip_address to accept python lists
  • Issue #1154 - Adds dtype parameter to ak.array
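Issue #1218 extends set operations to several pdarrays at once. As a minimal sketch, and assuming the usual multi-key reading in which index i across all the input arrays forms one composite row, the plain-Python intersection below shows the intended semantics. It does not use or reproduce Arkouda's API; `multi_intersect` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def multi_intersect(a_cols: Sequence[Sequence[int]],
                    b_cols: Sequence[Sequence[int]]) -> List[Tuple[int, ...]]:
    """Hypothetical helper: intersect two multi-column collections.

    Each argument is a list of equal-length columns; row i is the tuple
    formed by taking element i from every column.
    """
    a_rows = set(zip(*a_cols))  # deduplicated rows of the first collection
    b_rows = set(zip(*b_cols))  # deduplicated rows of the second collection
    # Sorted unique rows present in both, like NumPy's intersect1d output
    return sorted(a_rows & b_rows)


a1, a2 = [1, 2, 3, 3], [10, 20, 30, 30]
b1, b2 = [3, 2, 5], [30, 99, 50]
print(multi_intersect([a1, a2], [b1, b2]))  # [(3, 30)]
```

The row (2, 20) is absent from the result even though 2 appears in both first columns, because the second columns differ. That row-wise matching is what distinguishes a multi-array set operation from running one set operation per array.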

Minor fixes:

  • Issue #1174 - Fixes randint range bug
  • Issue #1190 - Updates ak.DataFrame to use the Index object
  • Issues #1039 and #1241 - Improves performance for String methods (peel, stick, and substring search)
  • Issue #1251 and PR #1243 - Update README install instructions and add documentation for the --saveUsedModules flag
  • PRs #1219 and #1231 - Improve performance of Parquet reads and appends

Auto-generated release notes

Full Changelog: v2022.03.15...v2022.04.15

v2022.03.15

15 Mar 21:57
e479ed0

Highlights

  • Lots of new functionality from akutil, including DataFrames, Series, Index, and SegArray
  • New grouping API
  • Lots of Parquet improvements, including error handling and string read performance

What's Changed

Full Changelog: v2022.02.23...v2022.03.15

v2022.02.23

23 Feb 21:35
899e312
  • Support for a few remaining operations on uint64
  • Improvements and tuning of in1d
  • Support for listing columns of parquet files via ak.get_datasets()
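For readers unfamiliar with in1d: it returns a boolean array, the same length as its first argument, that marks which elements also occur in the second argument (the same contract as NumPy's in1d). The sketch below shows one common single-machine strategy, building a hash set from the second input and probing it once per element. It is only an illustration of the semantics, not the distributed algorithm these releases tune, and `in1d_sketch` is a hypothetical name.

```python
from typing import Hashable, List, Sequence


def in1d_sketch(a: Sequence[Hashable], b: Sequence[Hashable]) -> List[bool]:
    """Hypothetical helper: mark which elements of `a` occur anywhere in `b`."""
    lookup = set(b)  # build once: expected O(len(b))
    return [x in lookup for x in a]  # probe once per element of `a`


print(in1d_sketch([1, 2, 3, 4], [4, 2, 2]))  # [False, True, False, True]
```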

v2022.02.16

17 Feb 00:26
929f788

Highlights:

  • Full support for uint64 dtype (many PRs)
  • Improved performance of string hashing (#1060), casting (#1107), and comparison (#1116)
  • Optimizations to argsort (#1061 and #1063)
  • Resolved a major cause of "unknown symbol" errors (#1096)
  • Support for Snappy/RLE compression in Parquet writes (#1104)

v2022.02.01

01 Feb 22:56
4b59849

Highlights

  • Modular build process allows building with external modules
  • Optimized ak.in1d()
  • Improved Parquet read performance
  • Fixed precision issue affecting grouped sum on float64 and related bug in grouped AND/OR

What's Changed

Full Changelog: v2022.01.20...v2022.02.01

Release Notes v2022.01.20

20 Jan 14:51
697d868

Release Notes 2022-01-20

Major updates:

  • Issue #786 - Server side complex object support and Symbol Table Type Hierarchy
    Server-side objects are no longer managed as individual pdarrays but as encapsulated complex objects. This simplifies the message-passing layer: a complex object previously had to pass a separate id for each of its components and now passes only one. The first complex object implemented is SegString (segmented string array). A basic type hierarchy for complex objects was also introduced, so that every entry in the Symbol Table shares a common root type.
  • Adds build support for Chapel 1.25.1, which is now the recommended version (see PR #1027).
  • Modular server builds and an internal I/O code refactor (see PR #1017 and Issue #1005). Developers can now exclude individual modules from the server build by commenting them out in ServerModules.cfg (we plan to improve this capability in future releases).
  • Issues #963 and #940 - Performance improvements to regex, string search, and related string methods.
  • Issue #985 (ongoing) - Parquet improvements to error handling, timestamps (Issue #1026), and performance (PRs #1014, #992, #993, #1023, #1024, #1028).
  • Issue #980 - Externally generated server tokens are now allowed.
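To picture the Symbol Table change in Issue #786: a segmented string array has two components (per-string offsets and the concatenated bytes). Before, each component was its own server-side array with its own id; after, one composite entry holds both under a single id, and all entries derive from a root type. The Python model below is a hypothetical illustration of that idea only. The class names are loosely inspired by the SymEntry and SegStringSymEntry types mentioned in the PR titles but are not the Chapel server's actual types, and the NUL terminator after each string is a convention of this sketch.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List


class AbstractEntry:
    """Hypothetical root type: every object stored in the table derives from it."""


@dataclass
class ArrayEntry(AbstractEntry):
    values: List[int]  # a plain one-component array


@dataclass
class SegStringEntry(AbstractEntry):
    offsets: List[int]  # start position of each string within `data`
    data: bytes         # all strings concatenated, each followed by a NUL byte here


class SymbolTable:
    def __init__(self) -> None:
        self._entries: Dict[str, AbstractEntry] = {}
        self._counter = itertools.count()

    def add(self, entry: AbstractEntry) -> str:
        name = f"id_{next(self._counter)}"  # one id per object, however many components
        self._entries[name] = entry
        return name

    def lookup(self, name: str) -> AbstractEntry:
        return self._entries[name]


def make_segstring(strings: List[str]) -> SegStringEntry:
    offsets: List[int] = []
    buf = bytearray()
    for s in strings:
        offsets.append(len(buf))
        buf += s.encode() + b"\x00"
    return SegStringEntry(offsets, bytes(buf))


table = SymbolTable()
sid = table.add(make_segstring(["ab", "cde"]))  # a client message only needs `sid`
entry = table.lookup(sid)
print(sid, entry.offsets, entry.data)  # id_0 [0, 3] b'ab\x00cde\x00'
```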

Minor fixes:

  • Issue #933 - Documentation fix and final removal of Read the Docs in favor of GitHub Pages
  • Issue #990 - Fixed a logic error in Categoricals involving in1d.
  • Apache Arrow version info has been added to the server configuration information (PR #995).
  • Server configuration information is now cached instead of being recreated on each call.
  • Issue #973 - New benchmarks were added for various data distributions.
  • Issue #914 - String writes to HDF5 now use aggregators.

Auto-generated release notes

  • 786 SegString as single entry (commits grouped by type) & Complex Object in Symbol Table by @glitch in #830
  • Fix # 786 Adds typing hierarchy regarding SymEntry types to Parquet code. by @glitch in #991
  • Add Parquet support information in env file by @bmcdonald3 in #992
  • Closes #933 by fixing docs and removing read-the-docs in favor of Github pages by @glitch in #1003
  • Refresh akutil by @reuster986 in #996
  • Add handling of errors in Parquet code and minor clean up by @bmcdonald3 in #993
  • Add Arrow versioning to ak.get_config() call by @bmcdonald3 in #995
  • Remove out of date comment about poor sort performance on IB by @ronawho in #998
  • Closes #999: adds call to super.init for SegStringSymEntry. by @glitch in #1000
  • Add in1d to list of benchmarks to run by @ronawho in #1002
  • Cache the server config string instead of recreating it by @ronawho in #1004
  • Closes #990: Logic error in Categorical.in1d and other methods by @pierce314159 in #1001
  • Data distributions for sort benchmarking #973 by @reuster986 in #977
  • update README.md with ArkoudaWeeklyCall by @mhmerrill in #1009
  • Optimize Parquet reading by reading batches rather than creating a copy by @bmcdonald3 in #1014
  • Drop in1d down to 1 trial by @ronawho in #1021
  • Issue #914: Uses aggregator for SegString to HDF5 writes by @glitch in #1016
  • Modularize build process by @bmcdonald3 in #1017
  • Change Parquet C++ types from int to int64_t and reorganization by @bmcdonald3 in #1023
  • 980 allow externally generated server tokens by @hokiegeek2 in #1025
  • Add support for reading timestamps in Parquet files by @bmcdonald3 in #1024
  • re-add sort distributions benchmark by @reuster986 in #1029
  • Recommend Chapel 1.25.1 and use it for CI testing by @ronawho in #1027
  • Optimize Parquet file writing with WriteBatch function by @bmcdonald3 in #1028
  • Part of Issue #940: Simplify Regex Substring Search Methods by @pierce314159 in #1030
  • Issue 1005 file io refactor by @glitch in #1007

Full Changelog: v2021.12.02...v2022.01.20

Release Notes v2021.12.02

02 Dec 19:05
20906e9

Highlights

  • Introduces optional support for Parquet, Issue #903
  • Official move to Chapel 1.25.0 (with backwards compatibility for Chapel 1.24.x). See Issue #954 and
  • General support for HDF5 1.10.x and 1.12.x (Arkouda 1-D pdarray read/write with HDF5). See Issues #975 and #979
  • Chapel's twoArrayRadixSort is now a runtime sorting option, see #984

Minor changes

  • Performance improvements for substring search and regex, see Issue #963
  • Additional benchmarks
  • Minor fixes from h5ls removal, see #971

Regex functionality and Categorical perf fix

02 Nov 15:53
8256fe3

The two major updates in this release are:

  1. Full implementation of the Python re API for regex match/search/split/findall
  2. Fixed a performance bug where it was taking inordinately long to display the head/tail of a Categorical with a large number of codes

Additionally, this release saw the complete transition to Chapel 1.25 in both the docs and CI.
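The regex methods mirror Python's built-in `re` module applied element-wise to every string in an array. As a reference point for what match, search, split, and findall mean per element, here is a plain loop over a list using only the standard `re` module; it does not show Arkouda's own method signatures or return types.

```python
import re

strings = ["123abc", "x9y", "no digits"]
pattern = re.compile(r"\d+")

# match: does the pattern match at the start of each string?
matched = [pattern.match(s) is not None for s in strings]
# search: text of the first match anywhere in each string (None if absent)
searched = [m.group() if m else None for m in map(pattern.search, strings)]
# split: the pieces of each string between matches
pieces = [pattern.split(s) for s in strings]
# findall: every non-overlapping match in each string
found = [pattern.findall(s) for s in strings]

print(matched)   # [True, False, False]
print(searched)  # ['123', '9', None]
print(pieces)    # [['', 'abc'], ['x', 'y'], ['no digits']]
print(found)     # [['123'], ['9'], []]
```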

2021.10.07 fixes and improvements

06 Oct 16:41
7c18062

Highlights

It's only been a week since the last release, but several important things have happened:

  • #941 fixes a bug affecting correctness of the LSD radix sort in a corner case
  • #945 implements element-wise bit operations like popcount, clz, and rotations
  • #935 greatly improves performance for operations that allocate many small arrays
  • #931 speeds up string regex operations by improving how string segments are localized

This is an update of the (now deleted) v2021.10.06 release to incorporate the bug fix in #949. I could not reuse the tag, so I future-dated it.
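For reference, #945's element-wise bit operations compute the following on each 64-bit unsigned value. The sketch below emulates a fixed 64-bit width in plain Python with explicit masking. The helper names are hypothetical, returning 64 from clz(0) is this sketch's convention, and nothing here describes how Arkouda implements these operations on the server or which dtypes it accepts.

```python
MASK64 = (1 << 64) - 1  # emulate a fixed-width uint64


def popcount(x: int) -> int:
    """Number of set bits."""
    return bin(x & MASK64).count("1")


def clz(x: int) -> int:
    """Leading zero bits in a 64-bit word (64 for zero in this sketch)."""
    return 64 - (x & MASK64).bit_length()


def rotl(x: int, k: int) -> int:
    """Rotate left by k bits; bits shifted out on the left re-enter on the right."""
    x &= MASK64
    k %= 64
    return ((x << k) | (x >> (64 - k))) & MASK64


def rotr(x: int, k: int) -> int:
    """Rotate right by k bits, expressed as a left rotation by 64 - k."""
    return rotl(x, (64 - k % 64) % 64)


values = [0, 1, 0b1011, 1 << 63]
print([popcount(v) for v in values])  # [0, 1, 3, 1]
print([clz(v) for v in values])       # [64, 63, 60, 0]
print(rotl(1 << 63, 1), rotr(1, 1) == 1 << 63)  # 1 True
```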

v2021.09.30

30 Sep 16:38
db8c105

2021-09-30 Release Notes

New Functionality

  • Strings regex support (Issues #894, #910, #911, #917) using re2 library.
    • Adds findall, find_locations, match functions with regex support
    • Adds regex support for contains, endswith, startswith, flatten, peel, rpeel
  • Issue #822 : Adds hdf5 save/load functionality for Categoricals
  • Issue #919 : Hashing for general arrays
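A find_locations-style result can be pictured as three pieces: a match count per string, plus flat lists of match start positions and match lengths. The plain-Python sketch below computes that triple with the standard `re` module. `find_locations_sketch` is hypothetical, and reporting starts as positions within each string is an assumption of this sketch rather than a statement about Arkouda's return values.

```python
import re
from typing import List, Tuple


def find_locations_sketch(strings: List[str],
                          pattern: str) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical helper: per-string match counts plus flat starts and lengths."""
    regex = re.compile(pattern)
    counts: List[int] = []
    starts: List[int] = []
    lengths: List[int] = []
    for s in strings:
        matches = list(regex.finditer(s))  # non-overlapping matches, left to right
        counts.append(len(matches))
        for m in matches:
            starts.append(m.start())            # offset within this string
            lengths.append(m.end() - m.start())
    return counts, starts, lengths


counts, starts, lengths = find_locations_sketch(["a1b22", "c", "333"], r"\d+")
print(counts, starts, lengths)  # [2, 0, 1] [1, 3, 0] [1, 2, 3]
```

Keeping the per-string counts separate from the flat lists means the flat lists can be sliced back into per-string groups with a running sum of the counts.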

Benchmarking updates

  • Results for scalability and sort benchmark runs were added to the runs directory
    • 6bc8dd7 Add 06/21/21 hero sort runs for HPE Apollo
    • e4d4978 Add 06/07/21 scalability results for HPE Apollo
    • 22890ef Add 04/06/21 scalability results for Cray XC and HPE Apollo
  • Issue #930 - Speeds up answer creation for the flatten benchmark

Chapel related support/improvements

  • Support for Chapel v1.25.0 by updating GenSymIO to use the new subprocess.exitCode (see PR #916)
  • Use of interleave-memory if available (see PR #913)
  • PR #935 - memTrack/memThreshold configuration change to improve performance

Misc

  • Issue #904 - Fixes an issue with the tmp directory on test systems where /tmp may not be mounted.
  • Issue #924 - Avoids using pyzmq 22.3.0