-
Notifications
You must be signed in to change notification settings - Fork 26
/
changelog
88 lines (65 loc) · 2.51 KB
/
changelog
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
June, 2021
feature spline3d kernel for seismic RTM data
May, 2021
feature alternative compression path: RLE and sparsity-gather
April, 2021
optimize Lorenzo construction and reconstruction; scale to A100
optmize fast mode, using all `float32`
March, 2021
optmize coarsen thread-block use
February, 2021
feature partial-sum based Lorenzo decompression
optmize Huffman encoding output inverse-proportional to compression ratio
January, 2021
document update benchmark
December, 2020
feature add `nvcomp`
November, 2020
October, 2020
September, 2020
feature decouple zip and unzip
deploy `sm_70`, `sm_75` (and `sm_80`)
feature use cuSPARSE `prune2csr` and `csr2dense` to handle outlier
bugfix raise error when Huffman code is longer than 32 bits
bugfix histograming error
deploy revive pSZ
feature integrate parallel build Huffman codebook
document update help doc
document update published paper, "cuSZ"
document update acknowledgement
August, 2020
deploy `sm_75` for Turing.
July, 2020
document add a new NSF grant
June, 2020
bugfix compile with CUDA 9 + gcc 7.3
May, 2020
feature add `--skip huffman` and `--verify huffman` options
feature add binning as preprocessing
feature use `cuSparse` to transform `outlier` to dense format
feature add `argparse` to check and parse argument inputs
refactor add CUDA wrappers (e.g., `mem::create_CUDA_space`)
April, 2020
feature add concise and detailed help doc
deploy `sm_61` (e.g., P1000) and `sm_70` (e.g., V100) binary
feature add dry-run mode
refactor merge cuSZ and Huffman codec in driver program
optimize tune `blockDim` for 1D Lonrenzo, 2.7 GBps -> 16.8 GBps
deploy p2013histogram supersedes naive 2007 algorithm by default
feature add communication of equivalence calculation
feature use cooperative groups (CUDA 9 required) for canonical Huffman codebook
optimize faster initializing shared memory for PdQ, from 150 GBps to 200 GBps
feature add Huffman inflating/decoding
refactor merge 1,2,3-D cuSZ
feature set 32- and 64-bit as internal Huffman codeword representation
feature now use arbitrary multiple-of-8-bit for quantization code
feature switch to canonical Huffman code for decoding
March, 2020
optimize tuning thread number for Huffman deflating and inflating
feature change freely to 32bit intermediate Huffman code representation
demo add EXAFEL demo
feature switch to faster histogramming
February, 2020
feature SDRB suite metadata in `SDRB.hh`
feature visualize histogram (`pSZ`)
milestone `PdQ` for compression, Huffman encoding and deflating