Add functionality to allow appending cubes to existing netcdf file #565
Comments
Thanks @rsignell-usgs for posting this issue! Yes, I completely agree ... being able to append to a NetCDF file as you process a streamed input would be very useful. It's the natural extension to the current NetCDF saving capability. Let's see if we can get this addressed for you ... would @rhattersley or @esc24 care to comment please? |
I'll second that - it's a very interesting problem. 😀 @esc24 and I have had a quick chat about it and a couple of options came to mind. The first, most well-defined, and simplest option is to provide a way to append a Cube to an existing netCDF file. This would check the metadata of the Cube against the metadata in the file and extend an existing variable where appropriate. For example (apologies for the boolean argument in this mock-up 😉):
The second option (which is just an exploration at this stage, and not an alternative to the first) is to create a single, all-encompassing empty result Cube where the data is defined by a function instead of a numpy array, just as it is for deferred loading.
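The first option above (check a Cube's metadata against the file, and extend an existing variable where it matches) can be sketched without Iris or netCDF at all. This is only a toy model under stated assumptions: `Variable`, `append_record` and the `merge` flag are hypothetical names invented for illustration, and "metadata" is reduced to a standard name plus units.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """Toy stand-in for a data variable in a file (not an Iris or netCDF type)."""
    standard_name: str
    units: str
    records: list = field(default_factory=list)


def append_record(file_vars, standard_name, units, record, merge=True):
    """Extend a variable whose metadata matches; otherwise add a new one.

    `file_vars` maps variable names to Variable objects (hypothetical layout).
    Returns the name of the variable that received the record.
    """
    if merge:
        for name, var in file_vars.items():
            if (var.standard_name, var.units) == (standard_name, units):
                var.records.append(record)
                return name
    # No match (or matching disabled): pick an unused variable name.
    name, n = standard_name, 0
    while name in file_vars:
        n += 1
        name = f"{standard_name}_{n}"
    file_vars[name] = Variable(standard_name, units, [record])
    return name


# Three appends with identical metadata end up in one variable.
file_vars = {}
for day in (1, 2, 3):
    append_record(file_vars, "air_temperature", "K", day)
```

A real implementation would also have to compare coordinates, attributes and dimension lengths before deciding that two pieces of data belong to the same variable.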
|
There may be situations where one would want to append a cube to a file but not do an implicit merge even if it were possible. For example:
    cubes = [day_one, day_two, day_three]
    for cube in cubes:
        iris.save(cube, 'myfile.nc', append=True, merge=False)
should result in three data variables so that when loaded you get back what you saved (three cubes) (I'm not proposing another boolean keyword, but it should illustrate what I mean).
    cubes = [temperature, pressure, humidity]
    for cube in cubes:
        iris.save(cube, 'myfile.nc', append=True)
is equivalent to |
Is this issue (#565) still open?
    cubes = [temperature, pressure, humidity]
I still can't append cubes (a list of different variables) to a netCDF file. Thanks, |
Yes, but I'm not aware of anyone working on a fix. Sorry! |
I've been meaning for ages to respond to this. A few months back I spent quite a while trying to implement an append, mostly focussed on @rhattersley's case #1. It's a natural use case, e.g. opening a file and adding "today's data" to it, especially as the necessary operation exists in netCDF (without requiring you to read and re-write all the existing data, even by streaming it).
I spent quite a while climbing the mountain of "decoupling the saving code from actual file operations", so that it instead produces an abstract representation of the required (new) data. However, when I then sought to use this to define a file-append function, I hit big problems in unambiguously relating the "new data" to the existing content in a file, and especially in guaranteeing that the proposed append operation would be correct and safe. Sample of WIP here: pp-mo#52
A key problem is that Iris identifies data by its CF identity, but we need to work with the actual variable and dimension names in the file. These were most likely generated by a previous Iris save, but a new save will not necessarily produce the same results (notably var- and dim-names and attributes), at least if you allow it to potentially add new variables and dims to an existing file.
So I concluded that, if we really want this, it would be far better based on a lower-level append operation acting on general netCDF files, with no CF concepts involved.
But equally, I have come to really doubt the wisdom of the whole idea ... |
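The ambiguity described above, between a CF identity and the variable names actually present in a file, can be illustrated with a toy lookup. This is a sketch only: the dictionary layout, the variable names and both function names are hypothetical, and no Iris or netCDF API is involved.

```python
def find_by_identity(file_vars, standard_name):
    """Return names of file variables whose 'standard_name' attribute matches."""
    return [name for name, attrs in file_vars.items()
            if attrs.get("standard_name") == standard_name]


def choose_append_target(file_vars, standard_name):
    """Return the single matching variable name, or refuse if the match is unsafe."""
    matches = find_by_identity(file_vars, standard_name)
    if len(matches) != 1:
        raise ValueError(
            f"{len(matches)} variables match {standard_name!r}; cannot append safely"
        )
    return matches[0]


# Hypothetical file content: two variables share one CF identity, so the
# identity alone does not say which variable new data should extend.
file_vars = {
    "air_temperature": {"standard_name": "air_temperature", "units": "K"},
    "air_temperature_0": {"standard_name": "air_temperature", "units": "K"},
    "air_pressure": {"standard_name": "air_pressure", "units": "Pa"},
}
```

Here `choose_append_target(file_vars, "air_pressure")` resolves to a single variable, while the same call for `"air_temperature"` raises, which is the kind of case where an automatic append could not be guaranteed correct.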
I'm going to finally close this, as I don't think we are going to do it. |
Currently, writing Iris data to netCDF creates a new file. In the case where we are using Iris to process a large amount of data (in my case a 30 year global hindcast), we need to be able to process timestep-by-timestep and append the processed result at each timestep to the existing netCDF file, as we don't have enough computer memory to hold all the timesteps at once. An example of what we would like to do is here, using NetCDF4 to append, but we would like to be able to accomplish this with Iris.
http://nbviewer.ipython.org/5777643
The relevant code snippet is: