Z5 performance #118
Comments
I think @aschampion designed the benchmarks for rust-n5 to match some benchmarks being done on the Java reference implementation of N5.
No worries, this is a perfectly legit question to ask here.
In general, the big advantage of z5 (or, to be more precise, of n5 / zarr) compared to HDF5 is that it supports parallel write operations. It is important to note that this only works if chunks are not accessed concurrently (in the case of z5, concurrent access to the same chunk leads to undefined behaviour). @clbarnes and I actually started to implement file locking for chunks quite a while back, but this has gone stale, mainly because there are several issues with file locking in general. If you want to know more about this, have a look at #65, #66 and #63. In terms of single-threaded performance, z5 and HDF5 are roughly equal (at some point I compared the Python bindings against h5py).
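To make the concurrency rule concrete, here is a minimal sketch of parallel writing with the C++ API, loosely following the writeSubarray pattern from the z5 README (recent versions); the file name, shapes and the four-block partition are made up for illustration. Each thread writes a disjoint, chunk-aligned block, so no chunk is ever touched by two writers:

```cpp
#include <cstddef>
#include <thread>
#include <vector>

#include "xtensor/xarray.hpp"
#include "xtensor/xbuilder.hpp"

#include "z5/factory.hxx"
#include "z5/filesystem/handle.hxx"
#include "z5/multiarray/xtensor_access.hxx"

int main() {
    // create a zarr container with one chunked float32 dataset
    z5::filesystem::handle::File f("parallel_example.zr");
    z5::createFile(f, true);  // true -> zarr format
    const std::vector<std::size_t> shape = {100, 100};
    const std::vector<std::size_t> chunks = {50, 50};
    auto ds = z5::createDataset(f, "data", "float32", shape, chunks);

    // each thread writes one chunk-aligned 50x50 block;
    // the blocks are disjoint, so no chunk is accessed concurrently
    auto writeBlock = [&ds](std::size_t offX, std::size_t offY) {
        xt::xarray<float> block = xt::ones<float>({50, 50});
        std::vector<std::size_t> offset = {offX, offY};
        z5::multiarray::writeSubarray<float>(ds, block, offset.begin());
    };

    std::thread t0(writeBlock, 0, 0);
    std::thread t1(writeBlock, 0, 50);
    std::thread t2(writeBlock, 50, 0);
    std::thread t3(writeBlock, 50, 50);
    t0.join(); t1.join(); t2.join(); t3.join();
    return 0;
}
```

If two of these blocks shared a chunk, the writes would race on that chunk file, which is exactly the undefined-behaviour case mentioned above.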
That would be awesome!
Besides the Java / N5 benchmarks that @clbarnes mentioned, here are the benchmarks I used while developing the library, which also compare against HDF5. Since then, I have started to set up an asv benchmark repository.
I will give you some feedback on this in the related issue #68.
Thanks for your explanations, @constantinpape and @clbarnes! I have been reading your comments and searching around these days.
It's great to see that z5 supports parallel writes; it might be a good fit for HPX, which can launch many threads and processes. It's also sad to see that HDF5 currently does not support parallel writes out of the box, and I just noticed that they are proposing a parallel design here: https://portal.hdfgroup.org/display/HDF5/Introduction+to+Parallel+HDF5. This also means that I probably cannot get a performance comparison between z5 and HDF5 in the context of parallel computing in the near future. Also, I am not sure how to solve the file locking issue at this point; I will need to look into it in detail later.
Thanks for providing these benchmarks! I am trying to see if I can measure some performance on the C++/C side, as my projects are mainly in these two languages.
My first answer was not quite precise with regard to parallel writing in HDF5. There is support for parallel writing, but it is not a feature that is enabled by default. As far as I am aware, there are two options to do this in HDF5, both with some downsides.
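For reference, the MPI-based route ("Parallel HDF5", as described in the page linked above) is presumably one of these options. Below is a minimal illustrative sketch of a collective parallel write with it (the file and dataset names are made up, and it requires an MPI-enabled HDF5 build); each rank writes its own non-overlapping hyperslab of a shared dataset:

```cpp
#include <vector>
#include <hdf5.h>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nranks = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    // open the file collectively with the MPI-IO driver
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("parallel_example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    // one shared 1D dataset, split evenly across ranks
    const hsize_t perRank = 1024;
    hsize_t dims[1] = {perRank * static_cast<hsize_t>(nranks)};
    hid_t filespace = H5Screate_simple(1, dims, NULL);
    hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_FLOAT, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    // each rank selects its own, non-overlapping hyperslab
    hsize_t offset[1] = {perRank * static_cast<hsize_t>(rank)};
    hsize_t count[1] = {perRank};
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(1, count, NULL);

    // collective write: all ranks participate in the same call
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    std::vector<float> data(perRank, static_cast<float>(rank));
    H5Dwrite(dset, H5T_NATIVE_FLOAT, memspace, filespace, dxpl, data.data());

    H5Pclose(dxpl); H5Sclose(memspace); H5Dclose(dset);
    H5Sclose(filespace); H5Fclose(file); H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}
```

One practical constraint of this route is that the whole application has to run under MPI in the first place.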
I see. Thanks for sharing and answering, @constantinpape.
Sorry for bringing up another performance issue. Could you please take a look at the question I asked here: xtensor-stack/xtensor#1695? Let me know if you have any suggestions. Thanks.
Thanks for bringing this up. I just came back from vacation; I had a quick look and I think I have some ideas. I will try to take a closer look and write something up tomorrow.
First, let me provide some context on the performance issue you brought up: the header in question contains functions to read / write a region of interest (ROI) from / into a dataset into / from an xtensor multiarray. I will focus on reading here; writing is mostly analogous. Reading the ROI works roughly as follows: the ROI is mapped to the set of chunks it overlaps, each of these chunks is read and decompressed into a flat buffer, and the relevant part of that buffer is copied into a view of the output array. For a given chunk there are two cases: either it is completely contained in the ROI, or it only partially overlaps it.
For the first case (complete overlap), I noticed that the naive way of copying via xtensor functionality,

```cpp
const auto bufView = xt::adapt(buffer, chunksShape);
view = bufView;
```

was a major bottleneck, so I implemented a function to copy from the buffer into the view myself. I see two options to deal with this:
Option 1 should be straightforward: I think the functions would only need to be changed a bit. Option 2 would be more interesting, though: so far I only tried the naive approach with xtensor, i.e. I did not specify the layout types for the views into the array and buffer. If we were to get rid of the custom copy functions completely, this would need to be benchmarked carefully, because I don't want to regress compared to the current performance.
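For concreteness, here is a small self-contained sketch of the kind of copy under discussion and of how one might benchmark it. This is not z5's actual implementation; the shapes, the hand-rolled loop and the timing harness are made up for illustration. It copies a flat, row-major chunk buffer into a view of a larger array once via xtensor assignment and once via a manual loop:

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

#include "xtensor/xarray.hpp"
#include "xtensor/xadapt.hpp"
#include "xtensor/xbuilder.hpp"
#include "xtensor/xview.hpp"

int main() {
    const std::size_t cx = 64, cy = 64, cz = 64;     // chunk shape
    const std::size_t sx = 128, sy = 128, sz = 128;  // output (ROI) shape

    // output array for the ROI and a flat, row-major chunk buffer
    xt::xarray<float> out = xt::zeros<float>({sx, sy, sz});
    std::vector<float> buffer(cx * cy * cz, 1.0f);
    const std::vector<std::size_t> chunkShape = {cx, cy, cz};

    // view of the part of the ROI this chunk covers (here: the first corner)
    auto view = xt::view(out, xt::range(0, cx), xt::range(0, cy), xt::range(0, cz));

    // 1) copy via xtensor: adapt the buffer to the chunk shape and assign
    auto t0 = std::chrono::steady_clock::now();
    const auto bufView = xt::adapt(buffer, chunkShape);
    view = bufView;
    auto t1 = std::chrono::steady_clock::now();

    // 2) manual copy: iterate over the chunk and write into the view directly
    for (std::size_t x = 0; x < cx; ++x) {
        for (std::size_t y = 0; y < cy; ++y) {
            for (std::size_t z = 0; z < cz; ++z) {
                view(x, y, z) = buffer[(x * cy + y) * cz + z];
            }
        }
    }
    auto t2 = std::chrono::steady_clock::now();

    std::cout << "xtensor assign: "
              << std::chrono::duration<double, std::milli>(t1 - t0).count() << " ms\n"
              << "manual loop:    "
              << std::chrono::duration<double, std::milli>(t2 - t1).count() << " ms\n";
    return 0;
}
```

Which of the two approaches wins will depend on how much layout information xtensor can exploit in the assignment, which is exactly what option 2 is about.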
Thanks for the updates. @halehawk and I will look into this soon.
Just FYI, my presentation of the summer intern project is online now: https://www2.cisl.ucar.edu/siparcs-2019-wei, where I report how we integrated Z5 into an earth model and compare the performance of Z5, netCDF4, and PnetCDF. I will keep you posted if we have any future publication.
Thanks for sharing this and great work!
Looking forward to it!
> I have one question: Which compression library did you use in z5 for the performance analysis (slides 10/11), and did you compare the compression ratios between z5 and netCDF? Also, did you compare the performance of z5 and PnetCDF when you don't use compression in z5 (compressor=raw)?

We use zlib for compression. The compression rate between z5 and netCDF is similar. No, I haven't tried a no-compression setting for z5 vs. PnetCDF (maybe we will try it later).
I tried the no-compression setting on z5 once and didn't get different results in the timing. Maybe the output size is fairly small (float numbers, 3*192*288 on each processor).
Ok, thanks for the follow-up.

> Maybe the output size is fairly small (float numbers, 3*192*288 on each processor).

That's indeed fairly small. From my experience, raw compression can bring quite a speed-up.
We used zlib, level=1 compression, but the compressed size is larger than that of netCDF4 using the same compression, with the same chunk size on both. I have not figured out the reason for the difference yet.
Hi,
I am not sure if this is the right place to ask this question. How do you see the performance of parallel I/O for Z5 in the context of distributed computing, compared with other I/O libraries (e.g. HDF5)? This is just a general question, because I am thinking of integrating Z5 into HPX (https://github.com/STEllAR-GROUP/hpx), a C++ Standard Library for Concurrency and Parallelism, in the near future. I would like to see if there are any performance benchmarks that I can refer to and any performance comparisons that I can make. This could become my graduate study project.
Over this summer, I have been working on a C API interface for Z5 (https://github.com/kmpaul/cz5test). The idea of this project is to test Z5's performance against another parallel I/O library. The project is still in progress; I would love to share the results with you once they are in place.
Any suggestions would be helpful. Thanks.