Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pipeline changes to run annual metrics for new GCM GLM outputs #59

Merged
merged 32 commits into from
May 11, 2022

Conversation

lindsayplatt
Copy link
Contributor

@lindsayplatt lindsayplatt commented May 11, 2022

Had to make a number of modifications to how the do_annual_thermal_metrics() task plan was generated so that we could have more than one per lake in order to capture all 6 of the GCMs. Main reason to keep one task per lake with multiple targets for each GCM rather than one task per lake per GCM was so that we only needed to load the lake's hypso file once and then use for everything in the same task. I was able to successfully run this to generate the first round of MN GCM annual thermal metrics. Take a look at 3_summarize_ACCESS.CNRM.GFDL.IPSL.MIROC5.MRI_metric_tasks.yml on Caldera to see the result of the task plan.

The code currently shows the state that was used to do the first cut of this in Feb 2022 (except for the change in c32ed42 to avoid copying ALL of the GLM output feathers). In a near-future PR, I will add more sophisticated use of Tallgrass to take advantage of containerization (something I did not do this time around which might explain the slowness - I just asked for an allocation and kicked off scmake()). Eventually, will need to adjust to accept NetCDF inputs (see DOI-USGS/lake-temperature-process-models#31).

lindsayplatt and others added 29 commits February 16, 2022 16:42
…g this to freeze the version used for annual metrics)
… recipe - include options that allow some sites to not have all the model options
Small changes needed when actually trying to run the pipeline in Feb 2022 for GCMs
…lake-temperature-out into pipeline_new_projected_GLM
@lindsayplatt lindsayplatt requested a review from jordansread May 11, 2022 18:38
.Renviron Outdated
@@ -1 +1,2 @@
R_LIBS_USER="/cxfs/projects/usgs/water/iidd/data-sci/lake-temp/lake-temperature-out/Rlib_3_6":"/opt/ohpc/pub/usgs/libs/gnu8/R/3.6.3/lib64/R/library"
#R_LIBS_USER="/cxfs/projects/usgs/water/iidd/data-sci/lake-temp/lake-temperature-out/Rlib_3_6":"/opt/ohpc/pub/usgs/libs/gnu8/R/3.6.3/lib64/R/library"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just remove the commented line since this is versioned?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yep!

full.names = I(TRUE))
depends:
- caldera_access_date
1_fetch/out/pb0_temp_projections.yml:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was going to suggest you combine this target with the list.files target, but now I see you are using two base functions and this keeps you from writing a custom function. But something to consider in the future if you run into this since there isn't a clear benefit of having the two be separate (list files is the vector of file names, sc_indicate makes it a hash table).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes. In targets I could do both at the same time without a custom function 😄 Custom function might be helpful here since I can name it in a way that describes what's happening.

@@ -14,6 +14,10 @@ extract_morphometry <- function(config_fn) {
return(morphometry_out)
}

nml_to_morphometry <- function(in_ind) {
purrr::map(readRDS(as_data_file(in_ind)), `[`, c("latitude", "longitude", "H", "A"))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a check here to verify your functions that are related to morphometry, H/A still work the same way now that we've adjusted those all to have the elevations of the lakes taken into account (instead of all being 320). I'm 99% certain you are fine in all of these cases, but asking here in case you think it may be an issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising this point. I have always just used H/A and then converted later to hypso using max H. I don't think that change impacts my uses here.

command: do_annual_metrics_multi_lake(
final_target = target_name,
site_file_yml = "1_fetch/out/pb0_temp_projections.yml",
ice_file_yml = I(NULL),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assume this is because the ice data is now included in the temperature feathers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct

@@ -1,6 +1,8 @@

calculate_annual_metrics_per_lake <- function(out_ind, site_id, site_file, ice_file, temp_ranges_file, morphometry_ind, verbose = FALSE) {

if(!file.exists(site_file)) stop("File does not exist. If running summaries for GCM output, try changing `caldera_access_date` and build again.")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

rds_file <- as_data_file(ind)
file_pattern <- "(?<modelid>.*)_(?<sitenum>nhdhr_.*)_annual_thermal_metrics.rds" # based on target_name of `calc_annual_metrics` step
readRDS(rds_file) %>%
mutate(model_id = str_match(basename(rds_file), file_pattern)[2]) %>%

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary for a change, but wondering if tidry::extract() is more appropriate in these situations compared to mutate(.. = str_match).

But I'm not sure what basename(rds_file) looks like so maybe I am on the wrong track with this comment. Again, not a necessary change since I think these two options would do the same thing, just perhaps cleaner for extract.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always forget about extract()! I am only wanting the second group, not a column per group, so I think I will leave in this case.

@@ -0,0 +1,30 @@
# Visualize output

plot_annual_metric_summaries <- function(target_name, in_file, target_dir, model_id_colname) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skipping over reviewing this plotting function

@lindsayplatt
Copy link
Contributor Author

This closes #57

@lindsayplatt lindsayplatt merged commit 2b4520b into DOI-USGS:main May 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants