Investigate issues associated with .zarr format of new Parcels releases #384
The recipe at the moment puts the output in scratch:
and the Parcels docs recommend Zarr (https://docs.oceanparcels.org/en/latest/examples/tutorial_output.html#Reading-the-output-file), so maybe we can just add a note about this (i.e. why we are using scratch, plus a warning about the large number of files).
I think what Hannah was getting at is that we need to provide an example of how to postprocess the zarr files to reduce the file numbers before they are transferred to gdata. We recently had an example of a relatively small particle tracking project (only thousands of particles) that resulted in >4 million files stored on gdata. In that case, each particle had a separate file for each of lon, lat, depth, time etc. at EVERY time/position! I'm not sure if that's the default for Parcels now, because our notebook example uses an old Parcels version that saved in NetCDF. So, as Hannah says above, the first step is to check how many files the new default produces and what options there are for reducing that number if necessary.
Oh I see! That's definitely problematic.
Our example is up to date: we moved to Parcels 3 at the end of last year, when 'conda-analysis' moved over to Parcels 3, so the example is using zarr already.
Okay, well in that case maybe we just need to:
@anton-seaice is there anything else you think would be worth updating too?
Sounds good. It's also possible that changing the argument passed when creating the ParticleFile (https://docs.oceanparcels.org/en/latest/reference/particlefile.html#module-parcels.particlefile) would reduce the number of files produced, but that would need some experimentation.
This article makes a good point: storing tracks (i.e. lines) is more efficient in a vector format (e.g. GeoJSON, KML) than in a raster/array format (e.g. NetCDF). I don't know how much we want to mess with that, but whatever we do, compressing the output will most likely save a lot of space.
Aloha, I need to better understand how we can / could use this. I intend to tackle it here: Thomas-Moore-Creative/Climatology-generator-demo#12, where I plan to employ Zarr ZipStore. There are apparently some limitations and important details, but I can't speak to them fully until I try it myself. A few further throwaway comments for consideration:
@hrsdawson do you know if progress was made on this at the Hackathon? It doesn't seem like it would take too long to add this warning and link, then we can close this issue?
Newer versions of Parcels output trajectory data in zarr format rather than NetCDF. In some (all?) cases, this may lead to the creation of many, many files, clogging NCI projects on Gadi.
To do: