Using memoryviews for decompression #96

jakirkham · 2020-05-01T17:52:14Z

Currently decompression requires bytes objects here (https://github.com/LLNL/zfp/blob/697dd5d96a7fef6d04e4b0fa23f109127e7c587c/python/zfpy.pyx#L332) and here (https://github.com/LLNL/zfp/blob/697dd5d96a7fef6d04e4b0fa23f109127e7c587c/python/zfpy.pyx#L248). This means that if users have a mmap or another Python object that otherwise acts like an array, they must coerce it to a bytes object, which requires a copy. To avoid this, it would be better if these functions took a memoryview (in particular uint8_t[::1]). This would still support a bytes object when passed and would still behave like an array in the code. Most importantly, it would allow users to pass these other array-like objects without having to copy to a bytes object first.

cc @halehawk @rabernat
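
As a rough illustration of the copy described above, here is a pure-Python sketch (not zfpy code). The helper name `as_byte_view` is hypothetical; it only shows that a memoryview over an mmap shares the mapping's memory, while `bytes(...)` produces an independent copy.

```python
import mmap


def as_byte_view(obj):
    """Return a flat unsigned-byte view of obj's buffer without copying.

    Hypothetical helper for illustration; it is not part of zfpy.
    """
    raw = memoryview(obj)
    if not raw.c_contiguous:
        raise ValueError("buffer must be C-contiguous")
    return raw.cast("B")


# Anonymous 16-byte mapping standing in for a memory-mapped compressed chunk.
mm = mmap.mmap(-1, 16)
mm[:4] = b"zfp\x00"

view = as_byte_view(mm)   # shares memory with the mapping
copied = bytes(mm)        # allocates a new 16-byte bytes object

# Change the mapping afterwards: the view sees it, the copy does not.
mm[0:1] = b"Z"
first_via_view = view[0]
first_via_copy = copied[0]
view_size = view.nbytes

# Views must be released before the mapping can be closed.
view.release()
mm.close()
```

The view tracking the later write is what makes it zero-copy; for large compressed chunks the `bytes(...)` call would double the memory held for the input.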

rabernat · 2020-05-01T17:54:59Z

Just wanted to leave my 👍 -- this would be a simple change with important performance benefits.

jakirkham · 2020-05-01T17:59:37Z

Also, apologies if this is already known to the authors here, but this doc provides some background on using memoryviews in Cython. As Ryan notes, this is a simple change (guessing ~4 lines after peeking at the code).

lindstro · 2020-05-01T18:50:24Z

Thanks for the suggestion. We will look into it and see what can be done.

halehawk · 2020-05-01T19:02:34Z

I understand your concern now. But zfp compresses integer, float, and double data to bytes; under what circumstances would users convert the bytes object to a mmap first? @jakirkham @rabernat

jakirkham · 2020-05-01T19:18:23Z

To be clear, we are discussing decompression, so it's not a bytes object being converted to an array-like. It's an array-like being converted to bytes that's the issue. Second, mmap is merely one array-like. There are others.

It depends on what Store users select in Zarr. If they use LMDB, this would come up for example. Potentially other backends use this. Additionally there has been interest in using memory mapping with directory storage.

Also it depends on what a user's compression/filter pipeline looks like for their data. If there are other steps that come before zfp that produce something else like a NumPy array, this may come up.

The main point here is that users turn to compression (at least in the Zarr case) because memory usage is a concern. So avoiding copies when they are not needed is important to keep memory usage to a minimum.

halehawk · 2020-09-17T16:32:31Z

@lindstro Do you have any update on this issue? "bytes" is not a good type for passing a buffer, since "In the case that the API only deals with byte strings, i.e. binary data or encoded text, it is best not to type the input argument as something like bytes, because that would restrict the allowed input to exactly that type and exclude both subtypes and other kinds of byte containers, e.g. bytearray objects or memory views." So if we want to pass a NumPy array to the decompress API, it always has to be converted to a bytes object first, which is inefficient for large buffers.
Could you please replace "bytes" in both decompress APIs with "uint8_t[::1]"?
I am trying to experiment with that in my fork, but I don't know how to test the modification. Could you please point me in the right direction?
@jakirkham @rabernat
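
The restriction quoted above can be mimicked in plain Python. This is only an analogy under stated assumptions: `total_strict` imitates an argument typed as exactly `bytes`, and `total_buffer` imitates an argument that accepts anything exposing the buffer protocol. Both function names are hypothetical and neither is zfpy code.

```python
from array import array


def total_strict(data):
    """Accept only exact bytes objects, like an argument typed as bytes."""
    if type(data) is not bytes:
        raise TypeError("expected bytes, got " + type(data).__name__)
    return sum(data)


def total_buffer(data):
    """Accept any object exposing a contiguous buffer, without copying it."""
    with memoryview(data) as view:
        return sum(view.cast("B"))


payload = b"\x01\x02\x03"

# Every byte container works with the buffer-based version.
results = [
    total_buffer(payload),
    total_buffer(bytearray(payload)),
    total_buffer(array("B", payload)),
    total_buffer(memoryview(payload)),
]

# The strict version rejects anything that is not exactly bytes.
try:
    total_strict(bytearray(payload))
    strict_rejected = False
except TypeError:
    strict_rejected = True
```

With the strict version a caller holding a `bytearray` or an array would have to call `bytes(...)` first, which is the copy this issue is trying to avoid.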

jakirkham · 2020-09-17T16:59:01Z

My guess is @lindstro, et al. would accept a PR @halehawk (if you want to give it a try 😉). My guess is this is a 3 line change.

In case it helps, uint8_t is defined here. So would require a cimport from libc.stdint. Though one could also just use unsigned char if that's easier.

jakirkham · 2020-09-17T17:01:38Z

Would add it looks like this script contains their CI build process. Maybe that provides a good starting place for building things locally?

lindstro · 2020-09-17T17:05:01Z

@halehawk Would be great if you could experiment with this and submit a PR. I assume your question about testing is directed at the numcodecs folks. If not, the zfpy tests are in zfp/tests/python on the develop branch.

halehawk · 2020-09-17T17:25:32Z

> My guess is @lindstro, et al. would accept a PR @halehawk (if you want to give it a try 😉). My guess is this is a 3 line change.
>
> In case it helps, uint8_t is defined here. So would require a cimport from libc.stdint. Though one could also just use unsigned char if that's easier.

@jakirkham Can I use "char *"? Can the return value of ensure_ndarray be passed as a char* to _decompress? Or do I need to get a pointer to the return value?

jakirkham · 2020-09-17T18:06:39Z

Would use uint8_t[::1]. That will still accept bytes objects, but will also accept NumPy arrays that are 1-D uint8 contiguous arrays.

Edit: It's also possible to cast raw pointers into uint8_t[::1]. This section of the Cython docs may help.
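
For readers unfamiliar with the `uint8_t[::1]` notation, the layout it describes (one dimension, unsigned 8-bit items, contiguous) can be checked from pure Python. This is a sketch with a hypothetical helper name, `fits_uint8_contiguous`; it checks layout only. Whether a read-only object such as bytes is accepted by a compiled Cython function additionally depends on whether the memoryview is declared const and on the Cython version.

```python
from array import array


def fits_uint8_contiguous(obj):
    """Check whether obj exposes a 1-D, contiguous buffer of unsigned bytes.

    Hypothetical helper: it inspects layout only and ignores writability.
    """
    try:
        view = memoryview(obj)
    except TypeError:
        return False  # obj does not support the buffer protocol
    with view:
        return (
            view.ndim == 1
            and view.format == "B"
            and view.itemsize == 1
            and view.c_contiguous
        )


checks = {
    "bytes": fits_uint8_contiguous(b"abcdef"),
    "bytearray": fits_uint8_contiguous(bytearray(b"abcdef")),
    "uint8 array": fits_uint8_contiguous(array("B", [1, 2, 3])),
    "float64 array": fits_uint8_contiguous(array("d", [1.0, 2.0])),
    "strided view": fits_uint8_contiguous(memoryview(b"abcdef")[::2]),
    "str": fits_uint8_contiguous("abcdef"),
}
```

The strided view fails because `[::1]` requires a unit stride, and the float64 array fails on item type; both would need a conversion (and therefore a copy) before being passed.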

halehawk · 2020-09-17T22:03:49Z

@jakirkham @lindstro I am done with the modification, and created a pull request.

jakirkham · 2020-09-17T22:19:24Z

Guessing that's PR ( #106 )? Thanks @halehawk! 😄 Made a couple minor suggestions.

halehawk · 2020-09-17T22:21:33Z

I saw that test_numpy failed on some builds. How can I see the error logs, @lindstro?

lindstro · 2020-09-18T03:42:31Z

Run ctest -V to see the full output. If the tests pass on your machine but not on Travis, then we might need to run an interactive Travis session to figure out which tests fail.

halehawk · 2020-09-18T14:47:40Z

It passed all tests on my machine, but it failed on all of your Travis Xenial builds, so I'd like to see an error log for those.

lindstro · 2020-09-18T15:11:48Z

I've got the error log on CDash:

test_advanced_decompression_checksum (__main__.TestNumpy) ... ERROR
test_advanced_decompression_nonsquare (__main__.TestNumpy) ... ERROR
test_different_dimensions (__main__.TestNumpy) ... ERROR
test_different_dtypes (__main__.TestNumpy) ... ERROR
test_utils (__main__.TestNumpy) ... ERROR

======================================================================
ERROR: test_advanced_decompression_checksum (__main__.TestNumpy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_numpy.py", line 76, in test_advanced_decompression_checksum
    **compression_kwargs
  File "zfpy.pyx", line 249, in zfpy._decompress (/home/travis/build/LLNL/zfp/build/python/zfpy.c:252)
  File "stringsource", line 616, in View.MemoryView.memoryview_cwrapper (/home/travis/build/LLNL/zfp/build/python/zfpy.c:616)
  File "stringsource", line 323, in View.MemoryView.memoryview.__cinit__ (/home/travis/build/LLNL/zfp/build/python/zfpy.c:323)
BufferError: Object is not writable.

======================================================================
ERROR: test_advanced_decompression_nonsquare (__main__.TestNumpy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_numpy.py", line 104, in test_advanced_decompression_nonsquare
    out= decompressed_array,
  File "zfpy.pyx", line 249, in zfpy._decompress (/home/travis/build/LLNL/zfp/build/python/zfpy.c:252)
  File "stringsource", line 616, in View.MemoryView.memoryview_cwrapper (/home/travis/build/LLNL/zfp/build/python/zfpy.c:616)
  File "stringsource", line 323, in View.MemoryView.memoryview.__cinit__ (/home/travis/build/LLNL/zfp/build/python/zfpy.c:323)
BufferError: Object is not writable.

======================================================================
ERROR: test_different_dimensions (__main__.TestNumpy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_numpy.py", line 24, in test_different_dimensions
    self.lossless_round_trip(c_array)
  File "test_numpy.py", line 17, in lossless_round_trip
    decompressed_array = zfpy.decompress_numpy(compressed_array)
  File "zfpy.pyx", line 333, in zfpy.decompress_numpy (/home/travis/build/LLNL/zfp/build/python/zfpy.c:332)
  File "stringsource", line 616, in View.MemoryView.memoryview_cwrapper (/home/travis/build/LLNL/zfp/build/python/zfpy.c:616)
  File "stringsource", line 323, in View.MemoryView.memoryview.__cinit__ (/home/travis/build/LLNL/zfp/build/python/zfpy.c:323)
BufferError: Object is not writable.

======================================================================
ERROR: test_different_dtypes (__main__.TestNumpy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_numpy.py", line 38, in test_different_dtypes
    self.lossless_round_trip(array)
  File "test_numpy.py", line 17, in lossless_round_trip
    decompressed_array = zfpy.decompress_numpy(compressed_array)
  File "zfpy.pyx", line 333, in zfpy.decompress_numpy (/home/travis/build/LLNL/zfp/build/python/zfpy.c:332)
  File "stringsource", line 616, in View.MemoryView.memoryview_cwrapper (/home/travis/build/LLNL/zfp/build/python/zfpy.c:616)
  File "stringsource", line 323, in View.MemoryView.memoryview.__cinit__ (/home/travis/build/LLNL/zfp/build/python/zfpy.c:323)
BufferError: Object is not writable.

======================================================================
ERROR: test_utils (__main__.TestNumpy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_numpy.py", line 186, in test_utils
    compressed_array,
  File "zfpy.pyx", line 333, in zfpy.decompress_numpy (/home/travis/build/LLNL/zfp/build/python/zfpy.c:332)
  File "stringsource", line 616, in View.MemoryView.memoryview_cwrapper (/home/travis/build/LLNL/zfp/build/python/zfpy.c:616)
  File "stringsource", line 323, in View.MemoryView.memoryview.__cinit__ (/home/travis/build/LLNL/zfp/build/python/zfpy.c:323)
BufferError: Object is not writable.

----------------------------------------------------------------------
Ran 5 tests in 17.626s

FAILED (errors=5)

jakirkham · 2020-09-18T15:24:44Z

Right, hence my question here ( #106 (comment) ).

halehawk · 2020-09-18T15:53:21Z

I got the same errors in the tests on my machine before I added const to uint8_t[::1]. But it looks like Xenial needs something different. It looks like your code tried to write to a const buffer during the decompression process. Should it only read the stream being decompressed?

halehawk · 2020-09-18T16:18:00Z

This could be the reason: "Cython always passes the PyBUF_WRITABLE flag to PyObject_GetBuffer(), even when it doesn't need write access. This causes read-only buffer objects to raise an exception."
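
The read-only property behind the "Object is not writable" errors can be observed from pure Python. This is an illustrative sketch, not the Cython code path itself: `exposes_writable_buffer` is a hypothetical helper that reports what a request for a writable buffer would run into.

```python
def exposes_writable_buffer(obj):
    """Report whether obj's buffer can be written through a memoryview."""
    with memoryview(obj) as view:
        return not view.readonly


compressed = b"\x00\x01\x02\x03"

# bytes objects export read-only buffers; bytearray exports a writable one.
bytes_writable = exposes_writable_buffer(compressed)
bytearray_writable = exposes_writable_buffer(bytearray(compressed))

# Attempting to write through a read-only view fails in Python as well.
try:
    memoryview(compressed)[0] = 255
    write_failed = False
except TypeError:
    write_failed = True
```

Since the compressed stream passed by the tests is a bytes object, any code path that insists on a writable buffer would reject it, which is consistent with the failures shown in the log.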

halehawk · 2020-09-18T16:49:41Z

This is solved in Cython 0.28, and I am using Cython 0.29.21. Can we find out which Cython version is being used on Xenial?

lindstro · 2020-09-18T17:05:32Z

I can't tell from the logs. You could add a line to travis.sh to find out.
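
One way such a check might look, as a sketch only: the snippet below uses the standard library's `importlib.metadata` (Python 3.8+), which may not match the Python available on the CI image. The helper name `installed_version` is hypothetical, and how it would be invoked from travis.sh is left open.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print the Cython version visible to this interpreter, if any.
cython_version = installed_version("Cython")
print("Cython version:", cython_version if cython_version else "not installed")
```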

halehawk · 2020-09-18T18:21:45Z

Let me try the Buffer object that is used in numcodecs.lzma. In that case, though, I will have to add more code to zfpy.pyx.

jakirkham · 2020-09-18T19:36:39Z

Would take a look at using fused types to allow dispatching between const and non-const variants. Here's an example in Cython's tests.

jakirkham · 2020-10-01T18:26:06Z

Closing now that PR ( #106 ) is in.

jakirkham mentioned this issue May 1, 2020

numcodecs.zfpy is ready zarr-developers/numcodecs#229

Merged

7 tasks

lindstro added the enhancement label May 1, 2020

lindstro added the help wanted label Sep 17, 2020

jakirkham closed this as completed Oct 1, 2020
