BSL (the block-sorting lab) benchmark:
- Many experiments with CUDA MTF implementations. Depending on the data, the fastest one is mtf_scalar, mtf_2buffers or mtf_4by8 (see results.txt).
- CUDA MTF raw speed reached 700 MB/s on ENWIK8 data, i.e. 1.5-2 GB/s effective speed, since the preceding RLE stage removes 60-70% of the BWT output.
- CPU MTF algorithm by Eugene Shelwien: 150-200 MB/s raw speed, i.e. 500 MB/s effective speed per core.
- New rolling-hash based LZP preprocessing algorithm, up to 500 MB/s per core.
- Almost complete LZP+BWT/ST+RLE+MTF stack (only entropy coding is not yet implemented), which makes it possible to measure the speed/ratio of various stage combinations.
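
For readers unfamiliar with the MTF (move-to-front) stage mentioned above, here is a minimal reference sketch in Python. It is not the mtf_scalar, mtf_2buffers or mtf_4by8 code from the benchmark, and it is not Eugene Shelwien's algorithm; the function names mtf_encode and mtf_decode are hypothetical and the implementation is the plain textbook one, written for clarity rather than speed.

```python
def mtf_encode(data: bytes) -> bytes:
    """Replace each byte by its current rank in a move-to-front table."""
    table = list(range(256))      # rank -> symbol, starts as the identity
    out = bytearray()
    for sym in data:
        rank = table.index(sym)   # linear search for the symbol's rank
        out.append(rank)
        del table[rank]           # move the symbol to the front
        table.insert(0, sym)
    return bytes(out)


def mtf_decode(ranks: bytes) -> bytes:
    """Inverse transform: map each rank back to the symbol it denotes."""
    table = list(range(256))
    out = bytearray()
    for rank in ranks:
        sym = table.pop(rank)     # symbol currently stored at this rank
        out.append(sym)
        table.insert(0, sym)      # same move-to-front update as the encoder
    return bytes(out)
```

Runs of identical symbols, which are common in BWT output, turn into runs of zero ranks: mtf_encode(b"aaab") yields the ranks 97, 0, 0, 98.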
Radix-sort benchmark: measures the speed of the CUB radix sort with various parameters.
All GPU speeds are measured on a GF560Ti overclocked to 900 MHz. All CPU speeds are measured on a Haswell i7-4770.