This release introduces explicit support for netcoreapp2.1 and is mainly focused on performance improvements over v1.0.0. Some key points are:
- improved SIMD reduction #49
- loop unrolling "eats" more iterations #52
- extensive use of instruction-level parallelism (ILP) #57
- SIMD operations are done aligned to register size #60

Performance improved by about 25% to 200%, depending on data size, whether the computation is parallelized, and so on.
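
To illustrate the loop unrolling and instruction-level parallelism items above, here is a minimal scalar sketch of a combined average and variance computation. It is not the library's actual code: the class `StatsSketch`, the method `AverageAndVariance`, the unroll factor of 4, and the sum-of-squares formula for the (population) variance are all assumptions chosen for illustration, and the SIMD parts of the release are left out.

```csharp
using System;

// Hypothetical helper for illustration only, not part of the library's API.
public static class StatsSketch
{
    // Computes the average and the population variance in a single pass.
    public static (double Average, double Variance) AverageAndVariance(double[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (data.Length == 0) throw new ArgumentException("Data must not be empty.", nameof(data));

        // Four independent accumulator pairs: the additions within one
        // iteration do not depend on each other, so the CPU can overlap them.
        double sum0 = 0, sum1 = 0, sum2 = 0, sum3 = 0;
        double sq0 = 0, sq1 = 0, sq2 = 0, sq3 = 0;

        int i = 0;
        int unrolledEnd = data.Length - data.Length % 4;

        // Unrolled main loop: each iteration consumes four elements.
        for (; i < unrolledEnd; i += 4)
        {
            double a = data[i], b = data[i + 1], c = data[i + 2], d = data[i + 3];
            sum0 += a; sq0 += a * a;
            sum1 += b; sq1 += b * b;
            sum2 += c; sq2 += c * c;
            sum3 += d; sq3 += d * d;
        }

        // Remainder loop for the last 0 to 3 elements.
        for (; i < data.Length; i++)
        {
            sum0 += data[i];
            sq0 += data[i] * data[i];
        }

        // Reduce the partial results pairwise.
        double sum = (sum0 + sum1) + (sum2 + sum3);
        double sq = (sq0 + sq1) + (sq2 + sq3);

        double n = data.Length;
        double average = sum / n;
        // Assumed formula: Var = E[x^2] - E[x]^2 (simple, but not the most numerically robust).
        double variance = sq / n - average * average;
        return (average, variance);
    }
}
```

A call such as `StatsSketch.AverageAndVariance(new double[] { 1, 2, 3, 4, 5, 6 })` returns both values from one traversal of the data. Keeping several accumulators shortens the dependency chain per addition, which is the general idea behind the ILP item. The real implementation may differ in every detail.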
As an example, a graph for the combined computation of average and variance is shown.