-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pipeline changes to run annual metrics for new GCM GLM outputs #59
Add pipeline changes to run annual metrics for new GCM GLM outputs #59
Conversation
…g this to freeze the version used for annual metrics)
… recipe - include options that allow some sites to not have all the model options
Small changes needed when actually trying to run the pipeline in Feb 2022 for GCMs
…lake-temperature-out into pipeline_new_projected_GLM
.Renviron
Outdated
@@ -1 +1,2 @@ | |||
R_LIBS_USER="/cxfs/projects/usgs/water/iidd/data-sci/lake-temp/lake-temperature-out/Rlib_3_6":"/opt/ohpc/pub/usgs/libs/gnu8/R/3.6.3/lib64/R/library" | |||
#R_LIBS_USER="/cxfs/projects/usgs/water/iidd/data-sci/lake-temp/lake-temperature-out/Rlib_3_6":"/opt/ohpc/pub/usgs/libs/gnu8/R/3.6.3/lib64/R/library" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just remove the commented line since this is versioned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yep!
full.names = I(TRUE)) | ||
depends: | ||
- caldera_access_date | ||
1_fetch/out/pb0_temp_projections.yml: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was going to suggest you combine this target with the list.files
target, but now I see you are using two base functions and this keeps you from writing a custom function. But something to consider in the future if you run into this since there isn't a clear benefit of having the two be separate (list files is the vector of file names, sc_indicate
makes it a hash table).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes. In targets
I could do both at the same time without a custom function 😄 Custom function might be helpful here since I can name it in a way that describes what's happening.
@@ -14,6 +14,10 @@ extract_morphometry <- function(config_fn) { | |||
return(morphometry_out) | |||
} | |||
|
|||
nml_to_morphometry <- function(in_ind) { | |||
purrr::map(readRDS(as_data_file(in_ind)), `[`, c("latitude", "longitude", "H", "A")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a check here to verify your functions that are related to morphometry, H/A still work the same way now that we've adjusted those all to have the elevations of the lakes taken into account (instead of all being 320). I'm 99% certain you are fine in all of these cases, but asking here in case you think it may be an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for raising this point. I have always just used H/A and then converted later to hypso using max H. I don't think that change impacts my uses here.
command: do_annual_metrics_multi_lake( | ||
final_target = target_name, | ||
site_file_yml = "1_fetch/out/pb0_temp_projections.yml", | ||
ice_file_yml = I(NULL), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assume this is because the ice data is now included in the temperature feathers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct
@@ -1,6 +1,8 @@ | |||
|
|||
calculate_annual_metrics_per_lake <- function(out_ind, site_id, site_file, ice_file, temp_ranges_file, morphometry_ind, verbose = FALSE) { | |||
|
|||
if(!file.exists(site_file)) stop("File does not exist. If running summaries for GCM output, try changing `caldera_access_date` and build again.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
rds_file <- as_data_file(ind) | ||
file_pattern <- "(?<modelid>.*)_(?<sitenum>nhdhr_.*)_annual_thermal_metrics.rds" # based on target_name of `calc_annual_metrics` step | ||
readRDS(rds_file) %>% | ||
mutate(model_id = str_match(basename(rds_file), file_pattern)[2]) %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary for a change, but wondering if tidry::extract()
is more appropriate in these situations compared to mutate(.. = str_match)
.
But I'm not sure what basename(rds_file)
looks like so maybe I am on the wrong track with this comment. Again, not a necessary change since I think these two options would do the same thing, just perhaps cleaner for extract
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always forget about extract()
! I am only wanting the second group, not a column per group, so I think I will leave in this case.
@@ -0,0 +1,30 @@ | |||
# Visualize output | |||
|
|||
plot_annual_metric_summaries <- function(target_name, in_file, target_dir, model_id_colname) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
skipping over reviewing this plotting function
This closes #57 |
Had to make a number of modifications to how the
do_annual_thermal_metrics()
task plan was generated so that we could have more than one per lake in order to capture all 6 of the GCMs. Main reason to keep one task per lake with multiple targets for each GCM rather than one task per lake per GCM was so that we only needed to load the lake's hypso file once and then use for everything in the same task. I was able to successfully run this to generate the first round of MN GCM annual thermal metrics. Take a look at3_summarize_ACCESS.CNRM.GFDL.IPSL.MIROC5.MRI_metric_tasks.yml
on Caldera to see the result of the task plan.The code currently shows the state that was used to do the first cut of this in Feb 2022 (except for the change in c32ed42 to avoid copying ALL of the GLM output feathers). In a near-future PR, I will add more sophisticated use of Tallgrass to take advantage of containerization (something I did not do this time around which might explain the slowness - I just asked for an allocation and kicked off
scmake()
). Eventually, will need to adjust to accept NetCDF inputs (see DOI-USGS/lake-temperature-process-models#31).