-
I wouldn't categorically expect Zarr performance to be better than HDF5 / NetCDF4 in all scenarios. As with all technology, there are tradeoffs with each. In particular, the details of your filesystem, chunking, and access patterns will affect the performance of each format in different ways. This paper gives a great overview.
-
This will write the array sequentially, i.e. with no parallelism. Zarr (the format) is designed so that chunks can be written in parallel, even though Zarr (the Python library) doesn't exploit this at all. I recommend writing the chunks in parallel (e.g., with dask) to get a more representative indication of performance.
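A minimal sketch of what such a dask-based parallel write could look like; the array shape, chunking, store path, and use of `da.random.randint` are my assumptions, not code from this thread:

```python
import dask.array as da

# 200 random uint8 frames of 600x500, one frame per chunk (illustrative choice)
frames = da.random.randint(0, 256, size=(200, 600, 500),
                           chunks=(1, 600, 500), dtype="uint8")

# to_zarr pushes all chunks through dask's scheduler, so chunks are written
# concurrently instead of one sequential Python call at a time
frames.to_zarr("frames.zarr", overwrite=True)
```

Whether this helps depends on the store and chunk size; with many tiny chunks, per-chunk overhead can still dominate.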
-
I agree that Zarr has, on average, similar performance to HDF5 and NetCDF4. But in my test it takes 45 seconds to write 200 frames of 600×500 uint8, while HDF5 takes 0.6 seconds. When I replace the sequential writing code with a dask version (converting the list of images into a Dask array with the right chunking), I get the same results.
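For reference, a sketch of the kind of conversion described here; the `images` list, chunk shape, and store path are assumptions for illustration:

```python
import dask.array as da
import numpy as np

# Stand-in for the list of 600x500 uint8 images (assumption)
images = [np.random.randint(0, 256, (600, 500), dtype="uint8") for _ in range(200)]

# Stack the frames into one (200, 600, 500) Dask array and rechunk it so each
# Zarr chunk holds a block of whole frames rather than a single frame
stack = da.stack([da.from_array(img, chunks=(600, 500)) for img in images])
stack = stack.rechunk((20, 600, 500))
stack.to_zarr("frames_from_list.zarr", overwrite=True)
```

At ~0.3 MB per frame, one-frame chunks are small, so enlarging the chunks is usually the first thing to try before blaming the scheduler.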
-
I have made a test to compare read/write performance between Zarr, NetCDF, and HDF5. The results show that Zarr performs very poorly against HDF5 and NetCDF. From what I've seen, Zarr is known for very good performance, and I'm wondering what part of the test code is slowing me down. Here is my code:
(code outline: a function to generate random image data; sample sizes from 5 to 200, generated on a logarithmic scale; a write section and a read section for each format; plotting of the results)
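Since only the outline survives here, the following is a guess at the write half of such a benchmark (file names, chunking, and the use of `zarr.open`/`h5py.File` are my assumptions, not the original code); the read half would mirror it with `mode="r"` and timed slicing:

```python
import time
import numpy as np
import zarr
import h5py

def generate_images(n_frames, height=600, width=500):
    """Generate n_frames of random uint8 image data."""
    return np.random.randint(0, 256, (n_frames, height, width), dtype="uint8")

# Sample sizes from 5 to 200 on a logarithmic scale
sample_sizes = np.unique(np.logspace(np.log10(5), np.log10(200), 10).astype(int))

for n in sample_sizes:
    data = generate_images(n)

    # Zarr write: chunk the whole stack at once so the assignment below is a
    # handful of chunk writes, not one store call per frame
    t0 = time.perf_counter()
    z = zarr.open("test.zarr", mode="w", shape=data.shape,
                  chunks=(n, 600, 500), dtype="uint8")
    z[:] = data
    zarr_write = time.perf_counter() - t0

    # HDF5 write via h5py
    t0 = time.perf_counter()
    with h5py.File("test.h5", "w") as f:
        f.create_dataset("images", data=data)
    hdf5_write = time.perf_counter() - t0

    print(f"{n:4d} frames  zarr: {zarr_write:.3f}s  hdf5: {hdf5_write:.3f}s")
```

A common source of a 45 s vs 0.6 s gap in loops like this is writing one frame per store call with tiny chunks; matching the chunk shape to the write pattern usually closes most of it.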