Optional Dictionary based compression? #239

Open
JesseRMeyer opened this issue Oct 8, 2022 · 1 comment

@JesseRMeyer

Hi, thanks for the great library.

My use case involves compressing many small streams that share similar data distribution characteristics. LZ4 and ZSTD let you preload the compressor and decompressor state with a prebaked 'dictionary' of common matches, which can radically improve both ratio and timings. A simple and powerful tool.
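To illustrate why a preset dictionary helps with many small, similar inputs, here is a toy sketch (not LZ4, ZSTD, or libdeflate code): a brute-force greedy LZ77 parse over a buffer laid out as dictionary followed by data. Matches may point back into the dictionary, but the dictionary bytes themselves are never emitted. The function name `toy_count_literals` and the constants are illustrative assumptions; the window (32768), minimum match (3), and maximum match (258) are taken from DEFLATE's limits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_WINDOW    32768u /* DEFLATE's maximum back-reference distance */
#define TOY_MIN_MATCH 3u     /* DEFLATE's minimum match length */
#define TOY_MAX_MATCH 258u   /* DEFLATE's maximum match length */

/* Greedy brute-force LZ77 parse of buf[dict_len, total_len).
 * buf[0, dict_len) is the preset dictionary: it is never emitted,
 * but matches found while parsing the data may point back into it.
 * Returns how many bytes had to be emitted as literals. */
static size_t toy_count_literals(const uint8_t *buf, size_t dict_len,
                                 size_t total_len)
{
    assert(dict_len <= total_len);
    size_t literals = 0;
    size_t pos = dict_len;
    while (pos < total_len) {
        size_t best = 0;
        size_t start = pos > TOY_WINDOW ? pos - TOY_WINDOW : 0;
        /* Try every earlier position in the window, dictionary included. */
        for (size_t cand = start; cand < pos; cand++) {
            size_t len = 0;
            while (pos + len < total_len && len < TOY_MAX_MATCH &&
                   buf[cand + len] == buf[pos + len])
                len++;
            if (len > best)
                best = len;
        }
        if (best >= TOY_MIN_MATCH) {
            pos += best; /* would be one (distance, length) match */
        } else {
            literals++;  /* would be one literal byte */
            pos++;
        }
    }
    return literals;
}
```

For example, with the 29-byte dictionary `{"user":"","status":"active"}` placed in front of the 32-byte message `{"user":"bob","status":"active"}`, only the three bytes of `bob` remain literals; parsing the same message with no dictionary leaves 29 of its 32 bytes as literals.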

Is optional dictionary support compatible with the goals of this project? If so, is it a planned feature?

Best,
Jesse

@sisong

sisong commented Jan 4, 2024

I added streaming and multi-threading support to libdeflate; the code is at stream_mt (see also #335).
This work adds two new APIs, libdeflate_deflate_compress_block() and libdeflate_deflate_decompress_block(), which can be used for this request.

  1. First, create a 32 KiB text dictionary with another tool.
  2. Each time, append the short data you want to compress right after the dictionary data in one buffer, then call the compress function like this:

     compressed_code_nbytes = libdeflate_deflate_compress_block(c, in_dict_and_short, dict_nbytes, short_nbytes,
                                                                1, out_code, out_code_nbytes_avail, NULL);

  3. When decompressing, you must use the same dictionary: place it at the start of the uncompressed (output) buffer. After the decompress function returns, your decompressed short data will follow the dictionary data. Call the decompress function like this:

     err_ret = libdeflate_deflate_decompress_block(d, in_code, code_nbytes, out_dict_and_short, dict_nbytes,
                                                   out_short_nbytes, NULL, out_code_nbytes_avail,
                                                   LIBDEFLATE_STOP_BY_FINAL_BLOCK, NULL);
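The third step is why the dictionary has to sit in the output buffer: back-references near the start of the short data may point backwards into the dictionary, so the decoder must be able to read those bytes. The toy decoder below sketches that idea with a hypothetical token format (`struct toy_token`) and a hypothetical function name; it is not libdeflate's internal representation or the fork's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical token format for this sketch only:
 * len == 0 means "emit the literal byte lit";
 * len  > 0 means "copy len bytes starting dist bytes back". */
struct toy_token {
    uint16_t dist;
    uint16_t len;
    uint8_t lit;
};

/* Decodes tokens into out, which must already hold the dictionary in
 * out[0, dict_len). Decoded bytes are appended right after it.
 * Returns the number of decoded bytes (excluding the dictionary), or
 * (size_t)-1 if a token is invalid or out is too small. */
static size_t toy_decode_with_dict(const struct toy_token *tok, size_t ntok,
                                   uint8_t *out, size_t dict_len, size_t cap)
{
    assert(dict_len <= cap);
    size_t pos = dict_len; /* invariant: pos <= cap */
    for (size_t i = 0; i < ntok; i++) {
        if (tok[i].len == 0) {
            if (pos >= cap)
                return (size_t)-1;
            out[pos++] = tok[i].lit;
        } else {
            /* The source may start inside the dictionary region. */
            if (tok[i].dist == 0 || tok[i].dist > pos || cap - pos < tok[i].len)
                return (size_t)-1;
            /* Byte-by-byte copy so overlapping matches also work. */
            for (size_t k = 0; k < tok[i].len; k++, pos++)
                out[pos] = out[pos - tok[i].dist];
        }
    }
    return pos - dict_len;
}
```

For example, with `hello world` as an 11-byte dictionary, the tokens {dist 11, len 5} followed by the literal `!` decode to `hello!`, written immediately after the dictionary; without the dictionary in place, the first token would have nothing to copy from.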

If you use zlib, you can do the same thing with inflateSetDictionary() + inflate(); the compressed data stays compatible.
