-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using memoryviews for decompression #96
Comments
Just wanted to leave my 👍 -- this would be a simple change with important performance benefits. |
Also apologies if this is already known to the authors here, but this doc provides some background on using memoryviews in Cython. As Ryan notes this is a simple change (guessing ~4 lines after peaking at the code). |
Thanks for the suggestion. We will look into it and see what can be done. |
I know your concern now. But zfp compressed integer, float and double data
to bytes, in what kind of circumstances users will convert the bytes object
to mmap first? @jakirkham @rabernat <https://github.com/rabernat>
…On Fri, May 1, 2020 at 11:52 AM jakirkham ***@***.***> wrote:
Currently decompression requires bytes objects here
<https://github.com/LLNL/zfp/blob/697dd5d96a7fef6d04e4b0fa23f109127e7c587c/python/zfpy.pyx#L332>
and here
<https://github.com/LLNL/zfp/blob/697dd5d96a7fef6d04e4b0fa23f109127e7c587c/python/zfpy.pyx#L248>.
This means if users have a mmap or other Python object that otherwise
acts like an array, they must coerce it to a bytes object, which requires
a copy. To avoid this it would be better if these functions took a
memoryview (in particular uint8_t[::1]). This would still support a bytes
object when passed and would still behaving like an array in the code. Most
importantly it would allow users to pass these other array-like objects
without having to copy to a bytes object first.
cc @halehawk <https://github.com/halehawk> @rabernat
<https://github.com/rabernat>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#96>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACAPEFA3FLQ2DVKVJV5OJVDRPMD53ANCNFSM4MXIDZ3A>
.
|
To be clear we are discussing decompression, so its not a It depends on what Also it depends on what a user's compression/filter pipeline looks like for their data. If there are other steps that come before zfp that produce something else like a NumPy array, this may come up. The main point here is that users turn to compression (at least in the Zarr case) because memory usage is a concern. So avoiding copies when they are not needed is important to keep memory usage to a minimum. |
@lindstro Do you have any update on this issue? "bytes" is not good as a type for passing a buffer, since "In the case that the API only deals with byte strings, i.e. binary data or encoded text, it is best not to type the input argument as something like bytes, because that would restrict the allowed input to exactly that type and exclude both subtypes and other kinds of byte containers, e.g. bytearray objects or memory views." So if we want to assign a numpy array to decompress API, it always has to convert to bytes object first which is not performance efficient with large buffer. |
My guess is @lindstro, et al. would accept a PR @halehawk (if you want to give it a try 😉). My guess is this is a 3 line change. In case it helps, |
Would add it looks like this script contains their CI build process. Maybe that provides a good starting place for building things locally? |
@halehawk Would be great if you could experiment with this and submit a PR. I assume your question about testing is directed at the numcodecs folks. If not, the zfpy tests are in |
@jakirkham Can I use "char *"? Can the return value by ensure_ndarray be passing as a char* to _decompress? Or do I need to get the pointer of the return value? |
Would use Edit: It's also possible to cast raw pointers into |
@jakirkham @lindstro I am done with the modification, and created a pull
request.
…On Thu, Sep 17, 2020 at 12:06 PM jakirkham ***@***.***> wrote:
Would use uint8_t[::1]. That will still accept bytes objects, but will
also accepts NumPy arrays that are 1-D uint8 contiguous arrays.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#96 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACAPEFBE6PJFGWYOJWHPLITSGJF37ANCNFSM4MXIDZ3A>
.
|
I saw test_numpy failed on some builds, how can I see the error
logs @lindstro
…On Thu, Sep 17, 2020 at 4:03 PM Haiying Xu ***@***.***> wrote:
@jakirkham @lindstro I am done with the modification, and created a pull
request.
On Thu, Sep 17, 2020 at 12:06 PM jakirkham ***@***.***>
wrote:
> Would use uint8_t[::1]. That will still accept bytes objects, but will
> also accepts NumPy arrays that are 1-D uint8 contiguous arrays.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <#96 (comment)>, or
> unsubscribe
> <https://github.com/notifications/unsubscribe-auth/ACAPEFBE6PJFGWYOJWHPLITSGJF37ANCNFSM4MXIDZ3A>
> .
>
|
Run |
It passed all tests on my machine. But it failed all your Travis xenial system, so I’d like an error log of it.
…Sent from my iPhone
On Sep 17, 2020, at 9:42 PM, Peter Lindstrom ***@***.***> wrote:
Run ctest -V to see the full output. If the tests pass on your machine but not on Travis, then we might need to run an interactive Travis session to figure out which tests fail.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
I've got the error log on CDash:
|
Right, hence my question here ( #106 (comment) ). |
I got the same errors from my machine tests before I added const to
uint8_t[::1]. But it looks like Xenial needs something different. It looks
like your codes tried to write to a const buffer during decompress process.
Should it only read the decompressed stream?
…On Fri, Sep 18, 2020 at 9:25 AM jakirkham ***@***.***> wrote:
Right, hence my question here ( #106 (comment)
<#106 (comment)> ).
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#96 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACAPEFAZLGUKU3MC2PV6PKLSGN3UZANCNFSM4MXIDZ3A>
.
|
Could be this reason "Cython always passes the PyBUF_WRITABLE flag to
PyObject_GetBuffer(), even when it doesn't need write access. This causes
read-only buffer objects to raise an exception."
…On Fri, Sep 18, 2020 at 9:52 AM Haiying Xu ***@***.***> wrote:
I got the same errors from my machine tests before I added const to
uint8_t[::1]. But it looks like Xenial needs something different. It looks
like your codes tried to write to a const buffer during decompress process.
Should it only read the decompressed stream?
On Fri, Sep 18, 2020 at 9:25 AM jakirkham ***@***.***>
wrote:
> Right, hence my question here ( #106 (comment)
> <#106 (comment)> ).
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <#96 (comment)>, or
> unsubscribe
> <https://github.com/notifications/unsubscribe-auth/ACAPEFAZLGUKU3MC2PV6PKLSGN3UZANCNFSM4MXIDZ3A>
> .
>
|
On Fri, Sep 18, 2020 at 10:17 AM Haiying Xu ***@***.***> wrote:
Could be this reason "Cython always passes the PyBUF_WRITABLE flag to
PyObject_GetBuffer(), even when it doesn't need write access. This causes
read-only buffer objects to raise an exception."
This is solved in cython 0.28, I am using cython 0.29.21. Can we know
which cython version is using on Xenial?
… On Fri, Sep 18, 2020 at 9:52 AM Haiying Xu ***@***.***> wrote:
> I got the same errors from my machine tests before I added const to
> uint8_t[::1]. But it looks like Xenial needs something different. It looks
> like your codes tried to write to a const buffer during decompress process.
> Should it only read the decompressed stream?
>
> On Fri, Sep 18, 2020 at 9:25 AM jakirkham ***@***.***>
> wrote:
>
>> Right, hence my question here ( #106 (comment)
>> <#106 (comment)> ).
>>
>> —
>> You are receiving this because you were mentioned.
>> Reply to this email directly, view it on GitHub
>> <#96 (comment)>, or
>> unsubscribe
>> <https://github.com/notifications/unsubscribe-auth/ACAPEFAZLGUKU3MC2PV6PKLSGN3UZANCNFSM4MXIDZ3A>
>> .
>>
>
|
I can't tell from the logs. You could add a line to travis.sh to find out. |
Let me try Buffer object that is used in numcodecs.lzma. But in that case,
I have to add more codes in zfpy.pyx.
…On Fri, Sep 18, 2020 at 11:05 AM Peter Lindstrom ***@***.***> wrote:
I can't tell from the logs. You could add a line to travis.sh to find out.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#96 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACAPEFBFRU43OZX24YZAON3SGOHOVANCNFSM4MXIDZ3A>
.
|
Would take a look at using |
Closing now that PR ( #106 ) is in. |
Currently decompression requires
bytes
objects here and here. This means if users have ammap
or other Python object that otherwise acts like an array, they must coerce it to abytes
object, which requires a copy. To avoid this it would be better if these functions took amemoryview
(in particularuint8_t[::1]
). This would still support abytes
object when passed and would still behaving like an array in the code. Most importantly it would allow users to pass these other array-like objects without having to copy to abytes
object first.cc @halehawk @rabernat
The text was updated successfully, but these errors were encountered: