ZSTD compression with dictionary causes odd errors #208

Open
alekepd opened this issue Jun 14, 2024 · 5 comments

Comments

@alekepd

alekepd commented Jun 14, 2024

When the "use_dict" flag is enabled in the compression parameters of an SChunk instance, appending data fails with an error.

The following code does not execute on my system:

import blosc2
import numpy as np

CHUNKSIZE = 2**12  # bytes per chunk
NCHUNKS = 5

coptions = blosc2.cparams_dflts.copy()
coptions["codec"] = blosc2.Codec.ZSTD # this is already the default
coptions["use_dict"] = 1


_rng = np.random.default_rng()


def _make_data() -> bytes:
    return _rng.random(CHUNKSIZE // 4, dtype=np.float32).tobytes()


data = [_make_data() for x in range(NCHUNKS)]

storage = blosc2.SChunk(
    chunksize=CHUNKSIZE, cparams=coptions, dparams=blosc2.dparams_dflts
)

for x in data:
    storage.append_data(x)

for index, x in enumerate(data):
    assert storage.decompress_chunk(index) == x

Instead, it leads to the following RuntimeError:

Traceback (most recent call last):
  File "/home/user/minimal_bug.py", line 26, in <module>
    storage.append_data(x)
  File "/home/user/env/lib/python3.9/site-packages/blosc2/schunk.py", line 298, in append_data
    return super(SChunk, self).append_data(data)
  File "blosc2_ext.pyx", line 1105, in blosc2.blosc2_ext.SChunk.append_data
RuntimeError: Could not append the buffer

If the above code is run with coptions["use_dict"] = 0, it executes successfully.

Do specific flags need to be set for shared dictionary compression to be successful, or does the sizing of stored data have different requirements?

python-blosc2 version: blosc2==2.3.2
python version: 3.9.18
platform: arch linux, conda based python install

@alekepd
Author

alekepd commented Jun 15, 2024

This behavior persists with Python 3.10 and python-blosc2 2.6.1. The corresponding line in the trace is then line 1110 of blosc2_ext.pyx.

@alekepd
Author

alekepd commented Jun 17, 2024

I do not see any case or test in this repository where this option is activated. Is it meant to be functional in the current release?

@FrancescAlted
Member

FrancescAlted commented Jun 17, 2024

We have not made any effort to make this functional, but a PR is always welcome.

@alekepd
Author

alekepd commented Jun 17, 2024

Understood. I will look at what would be required for a PR. Has the shared dict functionality been tested in c-blosc2?

@FrancescAlted
Member

Yes, I think so: https://github.com/Blosc/c-blosc2/blob/main/tests/test_dict_schunk.c
