Add pipeline changes to run annual metrics for new GCM GLM outputs #59

lindsayplatt · 2022-05-11T18:15:13Z

I had to make a number of modifications to how the do_annual_thermal_metrics() task plan is generated so that each lake could have more than one target, covering all 6 of the GCMs. The main reason for keeping one task per lake with a target for each GCM, rather than one task per lake per GCM, is that the lake's hypso file only needs to be loaded once and can then be reused for everything in the same task. I was able to run this successfully to generate the first round of MN GCM annual thermal metrics. Take a look at 3_summarize_ACCESS.CNRM.GFDL.IPSL.MIROC5.MRI_metric_tasks.yml on Caldera to see the resulting task plan.
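To illustrate the "one task per lake, one target per GCM" idea, here is a minimal base R sketch. It is not the pipeline's task-plan code: `load_hypso()`, `calc_metrics()`, `run_lake_task()`, the site ids, and the hypso values are all hypothetical stand-ins. Only the six GCM names are taken from the task file name above.

```r
# GCM names as they appear in the task file name; everything else is hypothetical.
gcm_names <- c("ACCESS", "CNRM", "GFDL", "IPSL", "MIROC5", "MRI")

# Stand-in loader for a lake's hypsography; counts how often it is called.
hypso_loads <- 0
load_hypso <- function(site_id) {
  hypso_loads <<- hypso_loads + 1
  data.frame(H = c(300, 305, 310), A = c(0, 5e4, 1e5))
}

# Placeholder "metric": just records which lake/GCM was processed.
calc_metrics <- function(site_id, gcm, hypso) {
  data.frame(site_id = site_id, model_id = gcm, n_hypso_rows = nrow(hypso))
}

# One task per lake: load the hypso once, then produce one result per GCM.
run_lake_task <- function(site_id, gcms) {
  hypso <- load_hypso(site_id)
  do.call(rbind, lapply(gcms, function(gcm) calc_metrics(site_id, gcm, hypso)))
}

site_ids <- c("nhdhr_0001", "nhdhr_0002")
all_metrics <- do.call(rbind, lapply(site_ids, run_lake_task, gcms = gcm_names))
```

With two lakes and six GCMs, the loader runs twice rather than twelve times, which is the saving described above.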

The code currently reflects the state that was used for the first cut of this in Feb 2022 (except for the change in c32ed42 to avoid copying ALL of the GLM output feathers). In a near-future PR, I will add more sophisticated use of Tallgrass to take advantage of containerization (something I did not do this time around, which might explain the slowness; I just asked for an allocation and kicked off scmake()). Eventually, this will need to be adjusted to accept NetCDF inputs (see DOI-USGS/lake-temperature-process-models#31).

jordansread · 2022-05-11T18:41:19Z

.Renviron

@@ -1 +1,2 @@
-R_LIBS_USER="/cxfs/projects/usgs/water/iidd/data-sci/lake-temp/lake-temperature-out/Rlib_3_6":"/opt/ohpc/pub/usgs/libs/gnu8/R/3.6.3/lib64/R/library"
+#R_LIBS_USER="/cxfs/projects/usgs/water/iidd/data-sci/lake-temp/lake-temperature-out/Rlib_3_6":"/opt/ohpc/pub/usgs/libs/gnu8/R/3.6.3/lib64/R/library"


Can you just remove the commented line since this is versioned?

jordansread · 2022-05-11T19:02:30Z

1_fetch.yml

+      full.names = I(TRUE))
+    depends: 
+      - caldera_access_date
+  1_fetch/out/pb0_temp_projections.yml:


Was going to suggest you combine this target with the list.files target, but now I see you are using two base functions and this keeps you from writing a custom function. But something to consider in the future if you run into this since there isn't a clear benefit of having the two be separate (list files is the vector of file names, sc_indicate makes it a hash table).

Ah, yes. In targets I could do both at the same time without a custom function 😄 Custom function might be helpful here since I can name it in a way that describes what's happening.
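As a rough sketch of what a combined "list + indicate" helper could look like, the base R function below lists files in a directory and returns a file-name-to-hash mapping. The name `list_and_hash_files()` is hypothetical, and it uses `tools::md5sum()` rather than scipiper, so it only approximates what `sc_indicate()` records.

```r
# Hypothetical helper: list files in a directory and return a named character
# vector of MD5 hashes (names are the file paths).
list_and_hash_files <- function(dir, pattern = NULL) {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  hashes <- tools::md5sum(files)
  stats::setNames(unname(hashes), files)
}

# Small demo on temporary files (file names are made up for illustration).
demo_dir <- file.path(tempdir(), "glm_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("a", file.path(demo_dir, "GLM_nhdhr_0001.feather"))
writeLines("b", file.path(demo_dir, "notes.txt"))

# Only the file matching the pattern is listed and hashed.
file_hashes <- list_and_hash_files(demo_dir, pattern = "\\.feather$")
```

A descriptive function name is the main benefit here: one target can both list and hash the files, and the name says what the target does.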

jordansread · 2022-05-11T19:04:46Z

2_process/src/process_helpers.R

@@ -14,6 +14,10 @@ extract_morphometry <- function(config_fn) {
  return(morphometry_out)
 }

+nml_to_morphometry <- function(in_ind) {
+  purrr::map(readRDS(as_data_file(in_ind)), `[`, c("latitude", "longitude", "H", "A"))


Just a check here to verify that your morphometry-related functions (H/A) still work the same way now that we've adjusted those to take the elevations of the lakes into account (instead of all being 320). I'm 99% certain you are fine in all of these cases, but asking here in case you think it may be an issue.

Thanks for raising this point. I have always just used H/A and then converted later to hypso using max H. I don't think that change impacts my uses here.
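A small sketch of why the elevation change should not matter, under the assumption that "converted later to hypso using max H" means depths are measured down from the highest elevation. The lake values, the extra `lake_name` field, and the `to_hypso()` helper are hypothetical.

```r
# Hypothetical NML-like list for two lakes; values are made up for illustration.
nml_list <- list(
  nhdhr_0001 = list(latitude = 46.1, longitude = -94.2,
                    H = c(310, 315, 320), A = c(0, 2e4, 5e4), lake_name = "A"),
  nhdhr_0002 = list(latitude = 45.3, longitude = -93.7,
                    H = c(400, 404, 410), A = c(0, 1e4, 3e4), lake_name = "B")
)

# Keep only the morphometry fields (base R equivalent of purrr::map(x, `[`, fields)).
fields <- c("latitude", "longitude", "H", "A")
morphometry <- lapply(nml_list, `[`, fields)

# Assumed conversion: depth of each level measured down from max(H).
to_hypso <- function(m) {
  data.frame(depths = max(m$H) - m$H, areas = m$A)
}
hypso <- lapply(morphometry, to_hypso)
```

Because the depths depend only on differences from `max(H)`, shifting every elevation in a lake by the same offset leaves the resulting hypso unchanged.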

jordansread · 2022-05-11T19:07:10Z

3_summarize.yml

+    command: do_annual_metrics_multi_lake(
+      final_target = target_name,
+      site_file_yml = "1_fetch/out/pb0_temp_projections.yml",
+      ice_file_yml = I(NULL),


Assume this is because the ice data is now included in the temperature feathers.

jordansread · 2022-05-11T19:07:40Z

3_summarize/src/annual_thermal_metrics.R

@@ -1,6 +1,8 @@

 calculate_annual_metrics_per_lake <- function(out_ind, site_id, site_file, ice_file, temp_ranges_file, morphometry_ind, verbose = FALSE) {

+  if(!file.exists(site_file)) stop("File does not exist. If running summaries for GCM output, try changing `caldera_access_date` and build again.")


jordansread · 2022-05-11T19:12:16Z

3_summarize/src/do_annual_thermal_metric_tasks.R

+    rds_file <- as_data_file(ind)
+    file_pattern <- "(?<modelid>.*)_(?<sitenum>nhdhr_.*)_annual_thermal_metrics.rds" # based on target_name of `calc_annual_metrics` step
+    readRDS(rds_file) %>% 
+      mutate(model_id = str_match(basename(rds_file), file_pattern)[2]) %>% 


Not necessary for a change, but wondering if tidyr::extract() is more appropriate in these situations compared to mutate(.. = str_match).

But I'm not sure what basename(rds_file) looks like so maybe I am on the wrong track with this comment. Again, not a necessary change since I think these two options would do the same thing, just perhaps cleaner for extract.

I always forget about extract()! I am only wanting the second group, not a column per group, so I think I will leave in this case.
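For reference, a base R sketch of pulling the model id out of a file name with the same kind of pattern. The example path is hypothetical (the thread does not show what `basename(rds_file)` looks like), and the named groups are written as plain groups so that `regexec()` works with its default regex engine.

```r
# Same structure as the pattern in the diff, with unnamed capture groups.
file_pattern <- "(.*)_(nhdhr_.*)_annual_thermal_metrics\\.rds"

# Hypothetical file path for illustration only.
rds_file <- "3_summarize/tmp/GFDL_nhdhr_0001_annual_thermal_metrics.rds"

# regmatches() returns c(full match, group 1, group 2) for the single input.
m <- regmatches(basename(rds_file), regexec(file_pattern, basename(rds_file)))[[1]]
model_id <- m[2]  # first capture group (second element, after the full match)
site_id <- m[3]   # second capture group
```

As with `str_match(...)[2]`, the second element of the match is the first capture group, since the first element holds the full match.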

jordansread · 2022-05-11T19:12:41Z

3_summarize/src/plot_annual_metric_summaries.R

@@ -0,0 +1,30 @@
+# Visualize output
+
+plot_annual_metric_summaries <- function(target_name, in_file, target_dir, model_id_colname) {


skipping over reviewing this plotting function

lindsayplatt · 2022-05-11T19:44:15Z

This closes #57

lindsayplatt and others added 29 commits February 16, 2022 16:42

For now, copy files from one caldera location to another to use (doin…

5090c69

…g this to freeze the version used for annual metrics)

add in a dummy date for controlling the copying behavior

a81f5fe

need copy fxn

7934ec2

read NML file and extract morphometry

c7bb31d

add methods for getting annual metrics for all GCMs to the task table…

cb32987

… recipe - include options that allow some sites to not have all the model options

add optional summary plots for GCM annual metrics

557010a

need data.table for our new, faster annual metrics method!

c5ac5c9

update .Renviron for tallgrass/denali

3af7d77

up the number of default cores

786850c

fix dummy date specification

98d0dbb

add new targets to 1_fetch depends

a6b96c3

remove note bc you can't comment next to lines in a command

b27fe85

update directory for copying files to one that exists already

7b77a72

add pattern to ensure we are only copying the GLM outputs

25c4320

temporary change to the directory (update after next re-build of GLM)

4a2201b

committing code changes from workaround when files were broken, just …

c91acf1

…to have a record

revert temporary workaround

060bf31

include data.table + don't retry + need unique tasks

a1ecc50

updated regex for matching data file name in plotting fxn

e76cf6c

delete test code

50bcfdb

tiny space/line deletions

aa06e95

be explicit about which plot to save

d44a025

include more description

cffceda

Merge pull request #9 from lindsayplatt/gcm_build_02_2022

c5085bf

Small changes needed when actually trying to run the pipeline in Feb 2022 for GCMs

Merge branch 'pipeline_new_projected_GLM' of github.com:lindsayplatt/…

5c81a0e

…lake-temperature-out into pipeline_new_projected_GLM

not going to download from SB

ab7b790

update folder in which to find GLM output files

09f8baa

switch from copying ALL of the GLM output feathers since it is so tim…

c32ed42

…e-intensive

add some caveats about this approach

e768cca

lindsayplatt requested a review from jordansread May 11, 2022 18:38

add helpful stop message in case files don't exist

74810d2

jordansread approved these changes May 11, 2022

lindsayplatt added 2 commits May 11, 2022 14:39

create custom function to list + indicate files from a directory

847ba30

don't need the old R_LIBS_USER path

feb4169

lindsayplatt merged commit 2b4520b into DOI-USGS:main May 11, 2022
