Get GLM output into netCDF DSG format #31
Issue submitted here asking if it would be within scope of |
Starting to think about this again. Variables:
Coordinates:
Dims: |
Okay I regrouped w/ @jread-usgs on this, as it is again a priority. Here are some notes:
Re: how to group the output into netCDFs
Re: resolution of data storage
Re: inclusion of ice flags alongside temperature predictions
Plan for development
Working locally w/ subset of 1-5 lakes...
Then if all is working, test w/ all MN predictions |
Ok - @lindsayplatt , @jread-usgs. I wanted to provide an update here on my progress before I'm out for two weeks. This has been a back-burner item for some months now, but I did make some significant progress when other sprint tasks were completed or blocked. For both the GCM and NLDAS predictions, I've completed steps 1 - 4, 6, and 7 (see previous comment). I skipped step 5 b/c it was immediately apparent that the file size was going to be too large without some reduction in the resolution of predictions at depth. All of my code is in this branch on my fork. Here's a summary:

NetCDF dimensions
The code generates 3D netCDF files. The ice flags are stored as a 2D

Reduction of resolution of predictions at depth
The GLM output predictions are at 0.5m intervals. If we store all predictions at all depths, the netCDF depth dimension becomes very long, and we store many NA values for shallow lakes. Currently I am reducing the resolution of predictions at depth prior to packaging the predictions in the netCDF. For example, for the NLDAS netCDF, the depths are defined (in a somewhat hacky way for now) here, based on Andy's depths. The predictions are then subset to those depths here.

Testing netCDF build on Tallgrass
I went so far as to test the generation of the GCM netCDFs and NLDAS netCDF on Tallgrass with a subset of 1000 sites e.g., for NLDAS. Uncompressed, the NLDAS netCDF (with ice flag and temperature predictions for 1000 sites, at a restricted set of depths) is 3.8gb. The GCM netCDFs are each 5.4gb.

Testing netCDF compression on Tallgrass
Jordan noted that the

Testing extracting the predictions from the netCDF file
I did modify Dave's read_timeseries_dsg() code so that I could extract results from the 3D netCDF files. That code is in a script here - detached from the pipeline for now.
The code runs for the netCDF files I generated locally w/ a small # of sites, but I just tested it for the NLDAS netCDF I generated for 1000 sites on Tallgrass and the

When I'm back I'd be happy to test building and compressing a full NLDAS netCDF with predictions for all of the sites (at restricted depths) |
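For anyone following along, here is a minimal Python sketch of the depth-subsetting idea described above. It is not the code in the branch (that code is in R and is not reproduced here). The function name `subset_profile`, the depth list, and the two lakes are all made up for illustration. It only shows why a shared depth axis leaves NA values for shallow lakes and how keeping fewer depths shortens that axis.

```python
import math

def subset_profile(profile, keep_depths):
    """Return temperatures at keep_depths; NaN where the lake has no prediction.

    `profile` maps depth (m) -> predicted temperature. Hypothetical helper,
    not part of ncdfgeom or the pipeline.
    """
    return [profile.get(depth, math.nan) for depth in keep_depths]

# Illustrative depths only; NOT the restricted depths used in the branch.
keep_depths = [0.0, 1.0, 2.0, 5.0, 10.0]

# Two made-up lakes with predictions every 0.5 m, as in the GLM output.
deep_lake = {i * 0.5: 20.0 - 0.5 * i for i in range(21)}    # 0.0 to 10.0 m
shallow_lake = {i * 0.5: 22.0 - 0.5 * i for i in range(5)}  # 0.0 to 2.0 m

deep_subset = subset_profile(deep_lake, keep_depths)
shallow_subset = subset_profile(shallow_lake, keep_depths)

# Length of the depth axis before and after subsetting for the deep lake.
n_full = len(deep_lake)
n_kept = len(keep_depths)

# The shallow lake still needs NaN padding below its maximum depth.
n_missing_shallow = sum(math.isnan(v) for v in shallow_subset)

print(n_full, n_kept, n_missing_shallow)
print(deep_subset)
print(shallow_subset)
```

In this toy case the depth axis shrinks from 21 entries to 5, and the shallow lake is padded at only 2 depths rather than at every 0.5 m step below its bottom.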
Great summary for capturing the current state of this work. Looking forward to chatting when you get back 🌴 |
Quick update - Anthony was interested in this netCDF code briefly a couple of months ago, and in re-running my test scripts locally to refresh my memory it turned out that that |
I am testing the scaling of this code and approach beyond 1000 lakes on Tallgrass:
The job failed after 12.6 hrs with these messages BUT when I try to see the ones that failed, I get nothing
Check the ones that errored:
|
Ok the NLDAS netCDF for 5k lakes built in 1.5 hours and is 17.4 gb uncompressed. After compression it is 1.7 gb. |
Ok the NLDAS netCDF for 10k lakes built in 5.3 hours and is 37.8 gb uncompressed. After compression it is 3.4gb. |
That's a lot of hours but 3.4 gb is great! Will likely need to talk with Andy about splitting his 63k up, though. |
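As a rough back-of-the-envelope check on the numbers above, the Python sketch below computes the compression ratios from the two reported runs and extrapolates the compressed size to 63k lakes. The extrapolation assumes that "63k" refers to lakes and that compressed size scales linearly with lake count. Both are assumptions, not measurements, and the build times reported above clearly did not scale linearly.

```python
# Sizes (GB) as reported in this thread for the NLDAS netCDF on Tallgrass.
runs = {
    5_000: {"uncompressed_gb": 17.4, "compressed_gb": 1.7},
    10_000: {"uncompressed_gb": 37.8, "compressed_gb": 3.4},
}

def compression_ratio(run):
    """Uncompressed size divided by compressed size."""
    return run["uncompressed_gb"] / run["compressed_gb"]

def projected_compressed_gb(run, n_lakes_in_run, n_lakes_target):
    """Linear extrapolation of compressed size (an assumption, not a result)."""
    return run["compressed_gb"] / n_lakes_in_run * n_lakes_target

ratios = {n: round(compression_ratio(run), 1) for n, run in runs.items()}
projection_63k = round(projected_compressed_gb(runs[10_000], 10_000, 63_000), 1)

print(ratios)
print(projection_63k)
```

Both runs compress by roughly a factor of ten, and under the linear assumption a single 63k-lake file would land somewhere around 21 GB compressed, which is consistent with the suggestion to split it up.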
Currently the GLM output is being stored in feather files, with one feather file per lake-gcm combo (6 files per lake). For sharing on sciencebase, we are currently (per #20) zipping these feather files together by tile number (4 zip files in total).
Per Jordan's comments, we'd like to move to storing the output in netCDF DSG format. As GLM generates temperature profiles, this would mean adding another dimension for depth. That is not currently supported by the write_timeseries_dsg() function of ncdfgeom, but I will submit an issue there to see if it would be within scope of that function to add that functionality.