[Bug]: Failures on the Weekly test run #625

Closed
forsyth2 opened this issue Sep 30, 2024 · 9 comments · Fixed by #628
Labels
semver: bug Bug fix (will increment patch version)

Comments

@forsyth2
Collaborator

What happened?

I originally ran the 3 weekly tests (bundles, comprehensive_v2, comprehensive_v3) after merging #604/#617 on 7/31. The intent was to re-run these tests every single week, but I thought it was reasonable to only run them on weeks where pull requests (with code changes rather than say doc changes) were merged into zppy. After all, if no changes had been merged, what would be the point of running the extremely lengthy tests?

For #598, I was testing the latest sets (tc_analysis, enso_diags, streamflow [but apparently missing qbo]) added to the E3SM Diags CDAT migration (https://github.com/E3SM-Project/e3sm_diags/commits/cdat-migration-fy24), using min_case_e3sm_diags_cdat_migrated_sets. However, these three sets didn't show up on the viewer.

Upon testing on main, using weekly_comprehensive_v3, I found that tc_analysis still wasn't plotting (though enso_diags and streamflow were). I then ran all the weekly tests yielding the following results:

weekly_comprehensive_v3

Sets rendered

| sub-task | sets included in viewer | sets that were specified but were not included |
| --- | --- | --- |
| e3sm_diags > atm_monthly_180x360_aave | "lat_lon","enso_diags","diurnal_cycle","streamflow" | "tc_analysis","tropical_subseasonal" |
| e3sm_diags > atm_monthly_180x360_aave_mvm | "lat_lon", | N/A |
| e3sm_diags > lnd_monthly_mvm_lnd | "lat_lon_land" | N/A |

Why are tc_analysis and tropical_subseasonal missing in the rendering for model-vs-obs?

$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-main-20240907/v3.LR.historical_0051/post/scripts/
$ grep -in tc_analysis e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.o596145  

gives:

IndexError: list index out of range

and

$ grep -in tropical_subseasonal e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.o596145 

gives:

OSError: no files to open

Neither of these error messages is particularly enlightening.
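Part of the problem is that `grep -in` prints only the matching lines, so the exception shows up without its stack. A minimal sketch of pulling out the whole traceback instead, assuming the log marks each failure with an "Error in e3sm_diags.driver.<set>_driver" line followed by a standard Python traceback. `extract_tracebacks` is a hypothetical helper, not part of zppy or e3sm_diags.

```python
def extract_tracebacks(log_text, set_name):
    """Return every traceback block logged for the given diags set.

    Assumes each failure is logged as an "Error in e3sm_diags.driver.<set>_driver"
    line immediately followed by "Traceback (most recent call last):".
    """
    marker = f"Error in e3sm_diags.driver.{set_name}_driver"
    lines = log_text.splitlines()
    blocks = []
    i = 0
    while i < len(lines):
        has_traceback = i + 1 < len(lines) and lines[i + 1].startswith("Traceback")
        if marker in lines[i] and has_traceback:
            block = [lines[i], lines[i + 1]]
            j = i + 2
            # Stack frames are indented; the first unindented line is the exception.
            while j < len(lines) and lines[j].startswith(" "):
                block.append(lines[j])
                j += 1
            if j < len(lines):
                block.append(lines[j])
            blocks.append("\n".join(block))
            i = j
        i += 1
    return blocks
```

Calling `extract_tracebacks(open(log_path).read(), "tc_analysis")` on one of the `.o` files above would then give the frames leading up to the `IndexError`, which is usually where the useful context is.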

Image check failures

Test failed due to image check failures:
https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/test-main-20240907/v3.LR.historical_0051/image_check_failures_comprehensive_v3/

$ cd /lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/test-main-20240907/v3.LR.historical_0051/image_check_failures_comprehensive_v3
$ ls *_diff* | wc -l
136

136 differences from the expected results: apparently 7 MERRA2-related E3SM Diags diffs, 1 global-time-series diff, and 128 MPAS-Analysis diffs.
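The per-task breakdown above was done by eye. A rough sketch of automating it, assuming the diff files can be told apart by a substring in their names; the keywords passed in are hypothetical, since the actual file naming in the image_check_failures directory isn't shown here.

```python
from collections import Counter
from pathlib import Path


def count_diff_images(failures_dir, keywords=()):
    """Count '*_diff*' files, bucketed by the first keyword found in each name.

    Files matching none of the keywords are counted under "other".
    The total is roughly what `ls *_diff* | wc -l` reports.
    """
    counts = Counter()
    for path in Path(failures_dir).glob("*_diff*"):
        if not path.is_file():
            continue
        label = next((k for k in keywords if k in path.name), "other")
        counts[label] += 1
    return counts
```

`sum(count_diff_images(d).values())` gives the overall number, and passing something like `keywords=("MERRA2", "mpas")` (placeholders) would split it by task.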

weekly_comprehensive_v2

Sets rendered

sub-task sets included in viewer sets that were specified but were not included
| e3sm_diags > atm_monthly_180x360_aave | "lat_lon","diurnal_cycle","streamflow","tc_analysis" | "enso_diags" |
| e3sm_diags > atm_monthly_180x360_aave_mvm | "lat_lon", | N/A |
| e3sm_diags > lnd_monthly_mvm_lnd | "lat_lon_land" | N/A |

Why is enso_diags missing in the rendering for model-vs-obs?

$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v2_output/test-main-20240907/v2.LR.historical_0201/post/scripts/
$ grep -in enso_diags e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1852-1853.o596170

That does give:

IndexError: index 0 is out of bounds for axis 0 with size 0
RuntimeError: Requested years are outside of available sst obs records.

But didn't these years work before?? Why would they not now?

Image check failures

Test failed due to image check failures:
https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v2_www/test-main-20240907/v2.LR.historical_0201/image_check_failures_comprehensive_v2/

$ cd /lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v2_www/test-main-20240907/v2.LR.historical_0201/image_check_failures_comprehensive_v2
$ ls *_diff* | wc -l
12

12 differences from the expected, all of which are MERRA2-related E3SM Diags diffs

weekly_bundles

Sets rendered

sub-task sets included in viewer sets that were specified but were not included
| e3sm_diags > atm_monthly_180x360_aave | "polar","enso_diags","diurnal_cycle", | N/A |
| e3sm_diags > atm_monthly_180x360_aave_mvm | "polar","enso_diags","streamflow", | "tc_analysis" |

Why is tc_analysis missing in the rendering for mvm?

$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_bundles_output/test-main-20240907/v3.LR.historical_0051/post/scripts
$ grep -in tc_analysis bundle3.o597964 

Not particularly enlightening error messages:

RuntimeError: Neither does AODMOM nor the variables in [('AODMOM',)] exist in the file
IndexError: list index out of range

Image check failures

Test failed due to image check failures:
https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_bundles_www/test-main-20240907/v3.LR.historical_0051/image_check_failures_bundles/

$ cd /lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_weekly_bundles_www/test-main-20240907/v3.LR.historical_0051/image_check_failures_bundles
$ ls *_diff* | wc -l
1

1 difference from the expected: 1 global-time-series diff

Possible reasons for image check failures

As far as I can recall, the expected results were generated after the 7/31 merging of #604/#617. There have been no code changes merged to zppy main since then. The only thing that could be different is the E3SM Diags environment used, but even that wouldn't account for the MPAS-Analysis diffs or global-time-series diffs. And even then, I was using conda activate e3sm_diags_20240731, so the diags environment should have been identical to when the expected results were generated.

Lesson learned: always run these tests on a weekly basis (even if no code changes have been merged!), to catch environment changes (e.g., build versions or new releases of e3sm_diags and other packages).
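One way to make such environment drift visible is to save a snapshot of installed package versions alongside each weekly run and diff it against the previous one. A minimal sketch using only the standard library; `snapshot_environment` and `diff_snapshots` are hypothetical helpers, and this only covers Python distributions visible to the interpreter, not compiled or non-Python conda packages.

```python
from importlib import metadata


def snapshot_environment():
    """Return {package name: version} for every installed Python distribution."""
    snapshot = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            snapshot[name] = dist.version
    return snapshot


def diff_snapshots(old, new):
    """Return {name: (old version, new version)} for packages that changed.

    A version of None means the package is absent from that snapshot.
    """
    changes = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changes[name] = (old.get(name), new.get(name))
    return changes
```

Dumping `snapshot_environment()` to JSON next to the test output would give a record to compare when results change without any code changes having been merged.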

What machine were you running on?

Chrysalis

Environment

zppy main as of 9/24.

What command did you run?

zppy -c tests/integration/generated/test_weekly_comprehensive_v3_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_comprehensive_v2_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_bundles_chrysalis.cfg # Run 1
zppy -c tests/integration/generated/test_weekly_bundles_chrysalis.cfg # Run 2 (for second part)

Copy your cfg file

N/A

What jobs are failing?

N/A

What stack trace are you encountering?

N/A
@forsyth2 forsyth2 added the semver: bug Bug fix (will increment patch version) label Sep 30, 2024
@chengzhuzhang
Collaborator

@forsyth2 Could you provide some clarification on the weekly tests? When and how were the baseline results created? When did the test results start to deviate? The logs from each test would be helpful for troubleshooting.

@chengzhuzhang
Collaborator

chengzhuzhang commented Oct 1, 2024

Looking at the diffs for the global time-series plots (expected vs. actual), I think they are plotting different datasets...

For the run scripts, could you provide the absolute paths of the following?

zppy -c tests/integration/generated/test_weekly_comprehensive_v3_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_comprehensive_v2_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_bundles_chrysalis.cfg # Run 1
zppy -c tests/integration/generated/test_weekly_bundles_chrysalis.cfg # Run 2 (for second part)

@forsyth2
Collaborator Author

forsyth2 commented Oct 1, 2024

@chengzhuzhang Thanks, my responses below

Question 1

When and how were the baseline results created?

When?

Running ls -l inside /lcrc/group/e3sm/public_html/zppy_test_resources, I see that the expected results were updated on 7/31, which is when #604 and #617 were merged. #604's header also notes that expected results were updated. The one issue I could see arising here is if the expected results were generated between #604 and #617. I feel like I would have run the weekly test after merging #617, but that's the only possible inconsistency I can spot so far.

[Screenshot 2024-10-01 at 2 24 10 PM]

Looking at https://github.com/E3SM-Project/zppy/pull/617/files#diff-744a349991da0e2973e1caae3a0033f562ee29b8cb5accea97fd89f9772b13df (screenshot above), the only diff is that Diags are using the latest Diags code rather than latest Unified's Diags.

Assuming the expected results don't reflect the changes in #617, that could explain the diffs in Diags... but not in MPAS-Analysis or Global Time Series.

How?

Running ./tests/integration/generated/update_weekly_expected_files_chrysalis.sh, as specified on https://github.com/E3SM-Project/zppy/blob/main/tests/integration/generated/directions_chrysalis.md, under "Commands to run to replace outdated expected files"

Question 2

When did the test results start to deviate?

There have only been 2 commits since #604:

As mentioned in the header for this PR, I was operating under the rule of "run weekly tests once per week -- if anything merged into zppy that week." Due to the limited number of commits above, I haven't been running the weekly tests often, but rather the smaller demo cfg files (that I've been calling min_case cfgs) while working on zppy code changes.

Question 3

The logs from each test would be helpful for troubleshooting.

(Latest tests are from 9/27. It appears I mislabeled the directory as 0907 rather than 0927)

Main output for the latest tests:

/lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-main-20240907/v3.LR.historical_0051/post/scripts
/lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v2_output/test-main-20240907/v2.LR.historical_0201/post/scripts
/lcrc/group/e3sm/ac.forsyth2/zppy_weekly_bundles_output/test-main-20240907/v3.LR.historical_0051/post/scripts

Web output for the latest tests:

/lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/test-main-20240907/v3.LR.historical_0051
/lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v2_www/test-main-20240907/v2.LR.historical_0201
/lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_weekly_bundles_www/test-main-20240907/v3.LR.historical_0051

(Recall /lcrc/group/e3sm/public_html/ maps to https://web.lcrc.anl.gov/public/e3sm/)
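That mapping is a plain prefix swap, which can be written down as a small helper. A sketch, with `www_path_to_url` being a hypothetical name; the two prefixes are the ones stated above.

```python
WWW_ROOT = "/lcrc/group/e3sm/public_html/"
URL_ROOT = "https://web.lcrc.anl.gov/public/e3sm/"


def www_path_to_url(path):
    """Translate a path under the public_html tree to its public web URL."""
    if not path.startswith(WWW_ROOT):
        raise ValueError(f"{path!r} is not under {WWW_ROOT}")
    return URL_ROOT + path[len(WWW_ROOT):]
```

For example, the bundles www path listed above maps to https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_bundles_www/test-main-20240907/v3.LR.historical_0051.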

As for output from my earlier tests of #604 (possibly with/without #617):

Main output I no longer seem to have:

/lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/
/lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v2_output/
/lcrc/group/e3sm/ac.forsyth2/zppy_weekly_bundles_output/

The web output seems to remain however:

/lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/test-the-new-tests-v17/
/lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v2_www/test-the-new-tests-v17/
/lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_weekly_bundles_www/test-the-new-tests-v17

Let's look at
https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/test-the-new-tests-v17/v3.LR.historical_0051/, specifically the equivalent to the plots you linked in the following comment.

These match: Actual, 9/27 result

These match: Expected, 7/31 result

Question 4

I think they are plotting different datasets

Different data sets as in different variables, different years, different simulations? All of those appear to be consistent between the images (though certainly the plots are different).

Question 5

could you provide absolute path

I was using the cfg files that can be generated in the zppy repo itself. I've copied replicas to permanent paths below:

/home/ac.forsyth2/zppy_configs/issue_625/test_weekly_bundles_chrysalis.cfg 
/home/ac.forsyth2/zppy_configs/issue_625/test_weekly_comprehensive_v2_chrysalis.cfg
/home/ac.forsyth2/zppy_configs/issue_625/test_weekly_comprehensive_v3_chrysalis.cfg

For reference, files above were generated by:

$ cd /home/ac.forsyth2/ez/zppy
$ git checkout test_main_20240927

# Modify the UNIQUE_ID in tests/integration/utils.py
# We'll set `UNIQUE_ID = "test-main-20240907"` to get the same output/www directories as above.

$ python tests/integration/utils.py

# The changes above are the only changes from `main`

$ cd /home/ac.forsyth2/zppy_configs/issue_625
$ cp ../../ez/zppy/tests/integration/generated/test_weekly_bundles_chrysalis.cfg test_weekly_bundles_chrysalis.cfg
$ cp ../../ez/zppy/tests/integration/generated/test_weekly_comprehensive_v2_chrysalis.cfg test_weekly_comprehensive_v2_chrysalis.cfg
$ cp ../../ez/zppy/tests/integration/generated/test_weekly_comprehensive_v3_chrysalis.cfg test_weekly_comprehensive_v3_chrysalis.cfg

@chengzhuzhang
Collaborator

chengzhuzhang commented Oct 2, 2024

I'm visually comparing v17 and 907 (latest) for the global time series plots. Most if not all panels are different, including the original and land variable sets (I think the image checker only caught one png diff). By eye, I think the first 5 years (1985-1989) of the time series match between the two tests, but the second half (1990-1994) deviates.

@chengzhuzhang
Collaborator

@forsyth2 It is hard to analyze without logs from the v17 run (which created the expected results). Is it possible to re-run the v17 case?

@forsyth2
Collaborator Author

forsyth2 commented Oct 2, 2024

Thanks @chengzhuzhang, I have some avenues to explore now. (See specific action items for myself at end of comment)


When I was trying to debug what was going on initially, I ran comprehensive-v3 (not comprehensive-v2 or bundles) on the code from a couple different points in time. I couldn't figure out why the tests weren't passing (after all, the expected results were theoretically based off an identical run). So, from there, I went the route of running all the weekly tests on the latest main to check for issues (leading to the creation of this issue).

Here though, I've returned to those preliminary runs to dive deeper:

(scroll to right to see rest of table)

| Branch name | test-pre-617 | test-post-617 |
| --- | --- | --- |
| Last commits | Test pre-617, Weekly tests (#604) | Post-617 changes, Use latest diags in test cfgs (#617), Weekly tests (#604) |
| UNIQUE_ID = in utils.py | "test-main-pre-617" | "test-post-617" |
| Web results | https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/test-main-pre-617/ | https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/test-post-617/ |
| www = in cfgs | /lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/test-main-pre-617 | /lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/test-post-617 |
| output = in cfgs | /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-main-pre-617/v3.LR.historical_0051 | /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-post-617/v3.LR.historical_0051 |
| environment_commands in [default] ("" = use latest Unified) | "" | "" |
| environment_commands in [e3sm_diags] | N/A | "source /home/ac.forsyth2/miniconda3/etc/profile.d/conda.sh; conda activate e3sm_diags_20240731" |
| grep -v "OK" *status in {output}/posts/scripts | No errors | No errors |
| sets = for [e3sm_diags] > [[ atm_monthly_180x360_aave ]] | "lat_lon","enso_diags","diurnal_cycle","streamflow","tc_analysis","tropical_subseasonal", | "lat_lon","enso_diags","diurnal_cycle","streamflow","tc_analysis","tropical_subseasonal", |
| Diags sets that show up in web results | "lat_lon","enso_diags","diurnal_cycle","streamflow" | "lat_lon","enso_diags","diurnal_cycle","streamflow" |
| Missing sets in the viewer | "tc_analysis","tropical_subseasonal", | "tc_analysis","tropical_subseasonal", |
| Results of python -u -m unittest tests/integration/test_weekly.py, specifically test_comprehensive_v3_images | failed: mismatched_images contains 518 items | failed: mismatched_images contains 530 items |
| Image check failures can be seen at | https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/test-main-pre-617/v3.LR.historical_0051/image_check_failures_comprehensive_v3/ | https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/test-post-617/v3.LR.historical_0051/image_check_failures_comprehensive_v3/ |
| Image check failures in these tasks | mpas_analysis, global_time_series | e3sm_diags (MERRA2), mpas_analysis, global_time_series |

Conclusions:

  • tc_analysis and tropical_subseasonal never worked. I just didn't catch that those plots weren't being generated in the viewer, so they never produced any plots that could become expected results.
  • The Diags expected results were based on main between #604 and #617; that is, they're based on E3SM Unified 1.10, not the latest Diags code. (A Diags code change between the last Unified release and 7/31 must be responsible for those MERRA2 plot differences.)
  • The mpas_analysis and global_time_series tasks never matched the expected results. The only explanation I can think of is that, when generating the expected results, I either A) forgot to run the expected results updater script or B) hit a bug in that script such that the expected results didn't actually get updated.

Action items for me:

  • Debug why certain E3SM Diags sets won't plot (tc_analysis, tropical_subseasonal in v3; enso_diags in v2, tc_analysis in bundles). Is there data missing?
  • Determine if a commit in https://github.com/E3SM-Project/e3sm_diags/commits/main between the last Unified release and 7/31 would have caused the MERRA2 plots to differ.
  • Determine if the expected results updater script is working properly.

@forsyth2
Collaborator Author

forsyth2 commented Oct 3, 2024

Determine if a commit in https://github.com/E3SM-Project/e3sm_diags/commits/main between the last Unified release and 7/31 would have caused the MERRA2 plots to differ.

Looks like https://github.com/E3SM-Project/e3sm_diags/pull/830/files#diff-772d3a4a1276047ece4b1df6b0e3d5253bbcc3e88410bb8626740bd248a82251 would explain that diff, so I think this is fine. ✅

@forsyth2
Collaborator Author

forsyth2 commented Oct 4, 2024

Analysis

(Skip to bottom for conclusions)

I ran the full suite of weekly tests (bundles, comprehensive-v2, comprehensive-v3) on 2 points in the code history:

  1. zppy -- main as of 07/31, after Use latest diags in test cfgs #617 merged. e3sm_diags -- main as of 07/31.
  2. zppy -- main as of 10/03 (only one more commit: Add PR template #618). e3sm_diags -- main as of 10/03

Before each run, I copied the expected results to /lcrc/group/e3sm/public_html/zppy_test_resources_previous/:

  • Before Run 1: expected_results_until_20240731
  • Before Run 2: expected_results_until_20241003

Current expected results are in /lcrc/group/e3sm/public_html/zppy_test_resources/

Summary table (scroll right):

| | Post-617 v3-only run | 9/27 run | 10/3 Run 1 | 10/3 Run 2 |
| --- | --- | --- | --- | --- |
| Testing against which expected results? | original (now in expected_results_until_20240731) | original | original | 10/3 Run 1's results (now in expected_results_until_20241003) |
| branch | test-post-617 | test_main_20240927 | test-post-617-diag0731 | test-main20241003-diags1003 |
| utils.py UNIQUE_ID | test-post-617 | test-main-20240907 | test-post-617-diag0731 | test-main20241003-diags1003 |
| utils.py chrysalis diags_env | e3sm_diags_20240731 | e3sm_diags_20240731 (no change) | diags0731-tested1003 (theoretically identical) | e3sm_diags_1003 |
| *Requested Diags sets missing on v3 model-vs-obs | "tc_analysis","tropical_subseasonal" | "tc_analysis","tropical_subseasonal" | "tc_analysis","tropical_subseasonal" | "tc_analysis","tropical_subseasonal" |
| *Requested Diags sets missing on v2 model-vs-obs | N/A | "enso_diags" | "enso_diags" | "enso_diags" |
| *Requested Diags sets missing on bundles model-vs-model | N/A | "tc_analysis" | "tc_analysis" | "tc_analysis" |
| **Number of mismatched images according to the test, v3 | 530 | [didn't note] | 528 | 2 |
| **Number of _diff images in the image_check_failures web directory, v3 | 141 (e3sm_diags (MERRA2), mpas_analysis, global_time_series) | 136 (e3sm_diags (MERRA2), mpas_analysis, global_time_series) | 139 (e3sm_diags (MERRA2), mpas_analysis, global_time_series) | 2 (e3sm_diags (MERRA2)) |
| **Number of mismatched images according to the test, v2 | N/A | [didn't note] | 11 | 1 |
| **Number of _diff images in the image_check_failures web directory, v2 | N/A | 12 (e3sm_diags (MERRA2)) | 11 (e3sm_diags (MERRA2)) | 1 (e3sm_diags (MERRA2)) |
| **Number of mismatched images according to the test, bundles | N/A | [didn't note] | 3 | 0 |
| **Number of _diff images in the image_check_failures web directory, bundles | N/A | 1 (global time series) | 1 (global time series) | 0 |

Notes on earlier runs:

Web paths:

https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/<UNIQUE_ID>/v3.LR.historical_0051/
https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v2_www/<UNIQUE_ID>/v2.LR.historical_0201/
https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_bundles_www/<UNIQUE_ID>/v3.LR.historical_0051/

www paths:

/lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/<UNIQUE_ID>/
/lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v2_www/<UNIQUE_ID>/
/lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_weekly_bundles_www/<UNIQUE_ID>/

output paths:

/lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/<UNIQUE_ID>/v3.LR.historical_0051
/lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v2_output/<UNIQUE_ID>/v2.LR.historical_0201/
/lcrc/group/e3sm/ac.forsyth2/zppy_weekly_bundles_output/<UNIQUE_ID>/v3.LR.historical_0051/

Remaining action items for myself

  • *Get the missing sets to show up in the viewers.
  • **Address Clarify simple_image_name #358. (I think this wasn't prioritized because the tests do in fact check all images. The name collision only reduces the number of image diffs shown in the web directory).
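To illustrate how a name collision could make the web directory show fewer diffs than the test reports, here is a toy sketch. It is not zppy's actual simple_image_name logic; `flat_name`, `unique_name`, and the image paths are all hypothetical, chosen only to show the effect of flattening paths to a short name.

```python
from pathlib import PurePosixPath


def flat_name(image_path):
    """Hypothetical flattening: keep only the file name, dropping directories."""
    return PurePosixPath(image_path).name


def unique_name(image_path):
    """Collision-free alternative: fold every path component into the name."""
    return "_".join(PurePosixPath(image_path).parts)


# Made-up mismatched image paths; two share the same file name.
mismatched = [
    "taskA/region1/plot.png",
    "taskA/region2/plot.png",
    "taskB/other.png",
]

# Diffs written under the flattened names overwrite one another.
flattened = {flat_name(p) for p in mismatched}
distinct = {unique_name(p) for p in mismatched}
print(len(mismatched), len(flattened), len(distinct))
```

Here three mismatches would leave only two diff files under the flattened naming, while the path-derived naming keeps all three, which is the kind of gap seen between the two `**` rows in the table.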

Remaining mysteries

  • The updater script definitely works (after all, tests pass after running the updater script), so I really have no clue as to why the expected results didn't match up after merging Use latest diags in test cfgs #617, or at least Testing update #604.
  • Why do the first 3 run columns differ in the number of image failures? In theory, these ran on identical code/environment (potentially with or without Add PR template #618 included, but that didn't change functional code anyway).
  • Why did updating diags from 7/31 code to 10/3 code change 3 MERRA2 images? Nothing in the Diags commit history seems like it would affect that.

@forsyth2
Collaborator Author

forsyth2 commented Oct 8, 2024

*Get the missing sets to show up in the viewers.

v3 -- TC Analysis

debugging details
$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-main20241003-diags1003/v3.LR.historical_0051/post/scripts

$ grep -in tc_analysis e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.o600264 
2091:2024-10-03 16:15:16,821 [INFO]: tc_analysis_driver.py(generate_tc_metrics_from_te_stitch_file:164) >> 
2093:2024-10-03 16:15:16,821 [INFO]: tc_analysis_driver.py(generate_tc_metrics_from_te_stitch_file:165) >> ============================================
2094:2024-10-03 16:15:16,838 [INFO]: tc_analysis_driver.py(_calc_num_storms_and_max_len:226) >> Number of storms: 0
2095:2024-10-03 16:15:16,838 [INFO]: tc_analysis_driver.py(_calc_num_storms_and_max_len:227) >> Max length of storms: 0
2096:2024-10-03 16:15:16,839 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.tc_analysis_driver
2100:  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/tc_analysis_driver.py", line 91, in run_diag
2024-10-03 16:15:16,839 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.tc_analysis_driver
Traceback (most recent call last):
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/parameter/core_parameter.py", line 266, in _run_diag
    single_result = module.run_diag(self)
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/tc_analysis_driver.py", line 91, in run_diag
    test_data["metrics"] = generate_tc_metrics_from_te_stitch_file(test_te_file)
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/tc_analysis_driver.py", line 172, in generate_tc_metrics_from_te_stitch_file
    te_stitch_vars = _get_vars_from_te_stitch(lines, max_len, num_storms)
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/tc_analysis_driver.py", line 249, in _get_vars_from_te_stitch
    year_start = int(lines[0].split("\t")[2])
IndexError: list index out of range
    test_te_file = os.path.join(
        test_data_path,
        "cyclones_stitch_{}_{}_{}.dat".format(test_name, test_start_yr, test_end_yr),
    )
$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-main20241003-diags1003/v3.LR.historical_0051/post/atm/tc-analysis_1987_1988
$ ls
aew_hist_v3.LR.historical_0051_1987_1988.nc          connect_CSne30_v2.dat                             cyclones_stitch_v3.LR.historical_0051_1987_1988.dat
aew_stitch_5e-6_v3.LR.historical_0051_1987_1988.dat  cyclones_hist_v3.LR.historical_0051_1987_1988.nc  outCSne30.g
$ du cyclones_stitch_v3.LR.historical_0051_1987_1988.dat
0	cyclones_stitch_v3.LR.historical_0051_1987_1988.dat

$ du -sh *
528K	aew_hist_v3.LR.historical_0051_1987_1988.nc
0	aew_stitch_5e-6_v3.LR.historical_0051_1987_1988.dat
1.9M	connect_CSne30_v2.dat
272K	cyclones_hist_v3.LR.historical_0051_1987_1988.nc
0	cyclones_stitch_v3.LR.historical_0051_1987_1988.dat
1.5M	outCSne30.g
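The empty stitch file would explain the traceback: reading a zero-length file yields an empty list of lines, so `lines[0]` raises `IndexError`. A small sketch of a more explicit failure, assuming the year sits in the third tab-separated field of the first line as the traceback's `int(lines[0].split("\t")[2])` suggests. `first_year_from_stitch` is a hypothetical helper, not e3sm_diags code.

```python
def first_year_from_stitch(path):
    """Read the starting year from a stitch file, failing clearly if it is empty."""
    with open(path) as f:
        lines = f.readlines()
    if not lines:
        # Without this guard, lines[0] raises "IndexError: list index out of range".
        raise ValueError(f"{path} is empty, so there is nothing to parse")
    return int(lines[0].split("\t")[2])
```

With a check like this, the log would point directly at the zero-byte cyclones_stitch file rather than at a list index.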

$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-main20241003-diags1003/v3.LR.historical_0051/post/scripts
$ grep -in error tc_analysis_1987-1988.o600263 
# No errors show up

Let's compare with v2, which seemed to work.

$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v2_output/test-main20241003-diags1003/v2.LR.historical_0201/post/atm/tc-analysis_1852_1853
$ du -sh *
528K	aew_hist_v2.LR.historical_0201_1852_1853.nc
0	aew_stitch_5e-6_v2.LR.historical_0201_1852_1853.dat
1.9M	connect_CSne30_v2.dat
272K	cyclones_hist_v2.LR.historical_0201_1852_1853.nc
32K	cyclones_stitch_v2.LR.historical_0201_1852_1853.dat
1.5M	outCSne30.g

cyclones_stitch_v2.LR.historical_0201_1852_1853.dat is actually non-empty there. Why?
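A quick way to flag this before e3sm_diags runs is to list the zero-byte files in a tc-analysis output directory. A minimal sketch, assuming the 0 reported by du corresponds to a zero-byte file; `empty_files` is a hypothetical helper.

```python
from pathlib import Path


def empty_files(directory):
    """Return the sorted names of zero-byte regular files in a directory."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.stat().st_size == 0
    )
```

Going by the du output above, this would report the aew_stitch file in both the v3 and v2 directories, and the cyclones_stitch file only in v3.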

v3 -- Tropical Subseasonal

debugging details
$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-main20241003-diags1003/v3.LR.historical_0051/post/scripts

$ grep -in tropical_subseasonal e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.o600264
2079:2024-10-03 16:15:16,703 [INFO]: tropical_subseasonal_driver.py(calculate_spectrum:143) >> No files to open for U850 within 1987 and 1988 from /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-main20241003-diags1003/v3.LR.historical_0051/post/atm/180x360_aave/ts/daily/2yr.
2080:2024-10-03 16:15:16,703 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.tropical_subseasonal_driver
2084:  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/tropical_subseasonal_driver.py", line 47, in run_diag
2086:  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/tropical_subseasonal_driver.py", line 137, in calculate_spectrum
2107:2024-10-03 16:15:16,840 [INFO]: tropical_subseasonal_driver.py(calculate_spectrum:143) >> No files to open for PRECT within 1987 and 1988 from /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-main20241003-diags1003/v3.LR.historical_0051/post/atm/180x360_aave/ts/daily/2yr.
2108:2024-10-03 16:15:16,840 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.tropical_subseasonal_driver
2112:  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/tropical_subseasonal_driver.py", line 47, in run_diag
2114:  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/tropical_subseasonal_driver.py", line 137, in calculate_spectrum
2119:2024-10-03 16:15:16,840 [INFO]: tropical_subseasonal_driver.py(calculate_spectrum:143) >> No files to open for FLUT within 1987 and 1988 from /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-main20241003-diags1003/v3.LR.historical_0051/post/atm/180x360_aave/ts/daily/2yr.
2120:2024-10-03 16:15:16,841 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.tropical_subseasonal_driver
2124:  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/tropical_subseasonal_driver.py", line 47, in run_diag
2126:  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/tropical_subseasonal_driver.py", line 137, in calculate_spectrum
2024-10-03 16:15:16,703 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.tropical_subseasonal_driver
Traceback (most recent call last):
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/parameter/core_parameter.py", line 266, in _run_diag
    single_result = module.run_diag(self)
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/tropical_subseasonal_driver.py", line 47, in run_diag
    test, test_start, test_end = calculate_spectrum(
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/tropical_subseasonal_driver.py", line 137, in calculate_spectrum
    var = xr.open_mfdataset(glob.glob(f"{path}/{variable}_*.nc")).sel(
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/xarray/backends/api.py", line 1102, in open_mfdataset
    raise OSError("no files to open")
OSError: no files to open

2024-10-03 16:15:16,840 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.tropical_subseasonal_driver
Traceback (most recent call last):
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/parameter/core_parameter.py", line 266, in _run_diag
    single_result = module.run_diag(self)
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/tropical_subseasonal_driver.py", line 47, in run_diag
    test, test_start, test_end = calculate_spectrum(
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/tropical_subseasonal_driver.py", line 137, in calculate_spectrum
    var = xr.open_mfdataset(glob.glob(f"{path}/{variable}_*.nc")).sel(
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/xarray/backends/api.py", line 1102, in open_mfdataset
    raise OSError("no files to open")
OSError: no files to open

2024-10-03 16:15:16,841 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.tropical_subseasonal_driver
Traceback (most recent call last):
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/parameter/core_parameter.py", line 266, in _run_diag
    single_result = module.run_diag(self)
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/tropical_subseasonal_driver.py", line 47, in run_diag
    test, test_start, test_end = calculate_spectrum(
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/tropical_subseasonal_driver.py", line 137, in calculate_spectrum
    var = xr.open_mfdataset(glob.glob(f"{path}/{variable}_*.nc")).sel(
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/xarray/backends/api.py", line 1102, in open_mfdataset
    raise OSError("no files to open")
OSError: no files to open
        test, test_start, test_end = calculate_spectrum(
            parameter.test_data_path,
            variable,
            parameter.test_start_yr,
            parameter.test_end_yr,
        )
$ cd /home/ac.forsyth2/ez/zppy

$ git grep -n test_data_path
docs/source/dev_guide/new_diags_set.rst:179:        dc_param.test_data_path = 'climo_{{ climo_diurnal_subsection }}'
docs/source/dev_guide/new_diags_set.rst:200:        streamflow_param.test_data_path = 'rof_links'
zppy/templates/e3sm_diags.bash:291:param.test_data_path = '${climo_dir_primary}'
zppy/templates/e3sm_diags.bash:308:   param.test_data_path, param.reference_data_path = param.reference_data_path, param.test_data_path
zppy/templates/e3sm_diags.bash:330:land_param.test_data_path = '${climo_dir_primary_land}'
zppy/templates/e3sm_diags.bash:341:   land_param.test_data_path, param.reference_data_path = param.reference_data_path, param.test_data_path
zppy/templates/e3sm_diags.bash:350:enso_param.test_data_path = test_ts
zppy/templates/e3sm_diags.bash:368:   enso_param.test_data_path, enso_param.reference_data_path = enso_param.reference_data_path, enso_param.test_data_path
zppy/templates/e3sm_diags.bash:377:trop_param.test_data_path = '${ts_daily_dir}'
zppy/templates/e3sm_diags.bash:396:   trop_param.test_data_path, trop_param.reference_data_path = trop_param.reference_data_path, trop_param.test_data_path
zppy/templates/e3sm_diags.bash:406:qbo_param.test_data_path = test_ts
zppy/templates/e3sm_diags.bash:426:   qbo_param.test_data_path, qbo_param.reference_data_path = qbo_param.reference_data_path, qbo_param.test_data_path
zppy/templates/e3sm_diags.bash:436:ts_param.test_data_path = test_ts
zppy/templates/e3sm_diags.bash:450:   ts_param.test_data_path, ts_param.reference_data_path = ts_param.reference_data_path, ts_param.test_data_path
zppy/templates/e3sm_diags.bash:459:dc_param.test_data_path = '${climo_diurnal_dir_primary}'
zppy/templates/e3sm_diags.bash:473:   dc_param.test_data_path, dc_param.reference_data_path = dc_param.reference_data_path, dc_param.test_data_path
zppy/templates/e3sm_diags.bash:483:streamflow_param.test_data_path = '${ts_rof_dir_primary}'
zppy/templates/e3sm_diags.bash:502:   streamflow_param.test_data_path, streamflow_param.reference_data_path = streamflow_param.reference_data_path, streamflow_param.test_data_path
zppy/templates/e3sm_diags.bash:511:tc_param.test_data_path = "{{ output }}/post/atm/tc-analysis_${Y1}_${Y2}"
zppy/templates/e3sm_diags.bash:531:   tc_param.test_data_path, tc_param.reference_data_path = tc_param.reference_data_path, tc_param.test_data_path

$ git grep -n "trop_param.test_data_path"
zppy/templates/e3sm_diags.bash:377:trop_param.test_data_path = '${ts_daily_dir}'
zppy/templates/e3sm_diags.bash:396:   trop_param.test_data_path, trop_param.reference_data_path = trop_param.reference_data_path, trop_param.test_data_path

$ git grep -n ts_daily_dir zppy/templates/e3sm_diags.bash
zppy/templates/e3sm_diags.bash:210:ts_daily_dir={{ output }}/post/atm/{{ grid }}/ts/daily/{{ '%dyr' % (ts_num_years) }}
zppy/templates/e3sm_diags.bash:212:ts_daily_dir_ref={{ reference_data_path_ts_daily }}/{{ ts_num_years_ref }}yr
zppy/templates/e3sm_diags.bash:377:trop_param.test_data_path = '${ts_daily_dir}'
zppy/templates/e3sm_diags.bash:387:trop_param.reference_data_path = '${ts_daily_dir_ref}'
$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-main20241003-diags1003/v3.LR.historical_0051/post/scripts

$ grep "'output'" e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.settings 
  'output': '/lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-main20241003-diags1003/v3.LR.historical_0051',

$ grep grid e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.settings 
  'grid': '180x360_aave',

$ grep ts_num_years e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.settings 
  'ts_num_years': 2,
  'ts_num_years_ref': 5,
$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-main20241003-diags1003/v3.LR.historical_0051/post/atm/180x360_aave/ts/daily/2yr
-bash: cd: /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-main20241003-diags1003/v3.LR.historical_0051/post/atm/180x360_aave/ts/daily/2yr: No such file or directory

$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-main20241003-diags1003/v3.LR.historical_0051/post/atm/180x360_aave/ts/
monthly

There is only a monthly subdirectory. That is because the comprehensive v3 test does not have a daily ts subtask.
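
The directory that tropical_subseasonal tries to read can be reconstructed from the settings values shown above. This is a minimal sketch, assuming the `ts_daily_dir` template line in e3sm_diags.bash is rendered exactly as written; the helper name `render_ts_daily_dir` is hypothetical and not part of zppy.

```python
import os


def render_ts_daily_dir(output: str, grid: str, ts_num_years: int) -> str:
    # Mirrors the template line in zppy/templates/e3sm_diags.bash:
    # ts_daily_dir={{ output }}/post/atm/{{ grid }}/ts/daily/{{ '%dyr' % (ts_num_years) }}
    return f"{output}/post/atm/{grid}/ts/daily/{'%dyr' % ts_num_years}"


# Values copied from the .settings file greps above.
settings = {
    "output": "/lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-main20241003-diags1003/v3.LR.historical_0051",
    "grid": "180x360_aave",
    "ts_num_years": 2,
}
ts_daily_dir = render_ts_daily_dir(**settings)
print(ts_daily_dir)
# If this prints False, the glob in calculate_spectrum matches nothing
# and open_mfdataset raises "no files to open".
print(os.path.isdir(ts_daily_dir))
```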

But this leads to another question: why did e3sm_diags even run then? Shouldn't it have been blocked by a missing dependency?

In e3sm_diags.py:

                    if "tropical_subseasonal" in c["sets"]:
                        add_dependencies(
                            dependencies,
                            scriptDir,
                            "ts",
                            ts_daily_sub,
                            start_yr,
                            end_yr,
                            c["ts_num_years"],
                        )

and

                    if (
                        "ts_daily_subsection" in c.keys()
                        and c["ts_daily_subsection"] != ""
                    ):
                        ts_daily_sub = c["ts_daily_subsection"]
                    else:
                        ts_daily_sub = c["sub"]
$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-main20241003-diags1003/v3.LR.historical_0051/post/scripts
$ grep ts_daily_subsection e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.settings 
  'ts_daily_subsection': '',

That implies ts_daily_sub = ""

In zppy/utils.py:

def add_dependencies(
    dependencies: List[str],
    scriptDir: str,
    prefix: str,
    sub: str,
    start_yr: int,
    end_yr: int,
    num_years: int,
):
    y1: int = start_yr
    y2: int = start_yr + num_years - 1
    while y2 <= end_yr:
        dependencies.append(
            os.path.join(
                scriptDir,
                "%s_%s_%04d-%04d-%04d.status" % (prefix, sub, y1, y2, num_years),
            )
        )
        y1 += num_years
        y2 += num_years

That implies our dependency would be {scriptDir}/ts__1987-1988-0002.status
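
To check that name, the function above can be run on its own. This is a sketch that copies `add_dependencies` as quoted and calls it with an empty `sub`; the `"scripts"` directory is a placeholder, not the real scriptDir.

```python
import os
from typing import List


def add_dependencies(
    dependencies: List[str],
    scriptDir: str,
    prefix: str,
    sub: str,
    start_yr: int,
    end_yr: int,
    num_years: int,
) -> None:
    # One status file per num_years-long chunk between start_yr and end_yr.
    y1: int = start_yr
    y2: int = start_yr + num_years - 1
    while y2 <= end_yr:
        dependencies.append(
            os.path.join(
                scriptDir,
                "%s_%s_%04d-%04d-%04d.status" % (prefix, sub, y1, y2, num_years),
            )
        )
        y1 += num_years
        y2 += num_years


deps: List[str] = []
add_dependencies(deps, "scripts", "ts", "", 1987, 1988, 2)
names = [os.path.basename(d) for d in deps]
print(names)  # ['ts__1987-1988-0002.status']
```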

But:

$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-main20241003-diags1003/v3.LR.historical_0051/post/scripts
ts_atm_monthly_180x360_aave_1985-1986-0002.status  ts_land_monthly_1985-1986-0002.status     ts_rof_monthly_1985-1986-0002.status
ts_atm_monthly_180x360_aave_1987-1988-0002.status  ts_land_monthly_1987-1988-0002.status     ts_rof_monthly_1987-1988-0002.status
ts_atm_monthly_glb_1985-1989-0005.status           ts_lnd_monthly_glb_1985-1989-0005.status
ts_atm_monthly_glb_1990-1994-0005.status           ts_lnd_monthly_glb_1990-1994-0005.status
$ grep sub e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.settings 
  'climo_diurnal_subsection': 'atm_monthly_diurnal_8xdaily_180x360_aave',
  'climo_subsection': 'atm_monthly_180x360_aave',
  'input_subdir': 'archive/atm/hist',
  'output_format_subplot': [],
            'tropical_subseasonal'],
  'sub': 'atm_monthly_180x360_aave',
  'subsection': 'atm_monthly_180x360_aave',
  'ts_daily_subsection': '',
  'ts_subsection': '',

So it's using 'sub': 'atm_monthly_180x360_aave'. Why?

                    if (
                        "ts_daily_subsection" in c.keys()
                        and c["ts_daily_subsection"] != ""
                    ):
                        ts_daily_sub = c["ts_daily_subsection"]
                    else:
                        ts_daily_sub = c["sub"]

ts_daily_subsection is present in the keys, but only as the empty string: it is defined that way in default.ini and never overridden in tests/integration/template_weekly_comprehensive_v3.cfg.
The != "" check therefore fails, and the else branch sets ts_daily_sub = c["sub"].
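
The fallback can be reproduced in isolation. This is a sketch that wraps the branch from e3sm_diags.py in a hypothetical helper, `resolve_ts_daily_sub`, and feeds it the values from the settings file. The resulting status file name matches one that appears in the scripts listing above, which would explain why e3sm_diags was not blocked.

```python
from typing import Any, Dict


def resolve_ts_daily_sub(c: Dict[str, Any]) -> str:
    # Same condition as in e3sm_diags.py: an empty string falls through to c["sub"].
    if "ts_daily_subsection" in c.keys() and c["ts_daily_subsection"] != "":
        return c["ts_daily_subsection"]
    return c["sub"]


# Values taken from the .settings file for the 1987-1988 e3sm_diags task.
c = {"sub": "atm_monthly_180x360_aave", "ts_daily_subsection": ""}
ts_daily_sub = resolve_ts_daily_sub(c)

# Same format string that add_dependencies uses.
status_file = "%s_%s_%04d-%04d-%04d.status" % ("ts", ts_daily_sub, 1987, 1988, 2)
print(status_file)  # ts_atm_monthly_180x360_aave_1987-1988-0002.status
```

In other words, the dependency resolves to the monthly ts task rather than a daily one, so it is satisfied even though no daily time series exist.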

v2 -- ENSO Diags

debugging details
$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v2_output/test-main20241003-diags1003/v2.LR.historical_0201/post/scripts/
$ grep -n "ERROR.*enso_diags" e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1852-1853.o600289 
1914:2024-10-03 16:14:52,457 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.enso_diags_driver
1935:2024-10-03 16:14:52,558 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.enso_diags_driver
1958:2024-10-03 16:14:52,631 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.enso_diags_driver
1978:2024-10-03 16:14:52,702 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.enso_diags_driver
1998:2024-10-03 16:14:52,773 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.enso_diags_driver
2018:2024-10-03 16:14:52,844 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.enso_diags_driver
2066:2024-10-03 16:14:53,279 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.enso_diags_driver
2273:2024-10-03 16:15:09,322 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.enso_diags_driver
2293:2024-10-03 16:15:09,394 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.enso_diags_driver
2345:2024-10-03 16:15:13,185 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.enso_diags_driver
2365:2024-10-03 16:15:13,258 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.enso_diags_driver
2385:2024-10-03 16:15:13,331 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.enso_diags_driver
2024-10-03 16:14:52,457 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.enso_diags_driver
Traceback (most recent call last):
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/enso_diags_driver.py", line 50, in calculate_nino_index
    start_ind = numpy.where(sst_years == start)[0][0]
IndexError: index 0 is out of bounds for axis 0 with size 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/parameter/core_parameter.py", line 266, in _run_diag
    single_result = module.run_diag(self)
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/enso_diags_driver.py", line 447, in run_diag
    return run_diag_map(parameter)
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/enso_diags_driver.py", line 215, in run_diag_map
    ref_nino_index = calculate_nino_index(nino_region_str, parameter, ref=True)
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/enso_diags_driver.py", line 54, in calculate_nino_index
    raise RuntimeError(msg)
RuntimeError: Requested years are outside of available sst obs records.
$ grep -n "ERROR.*enso_diags" e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1852-1853.o600289 | wc -l
12
$ grep -n "RuntimeError: Requested years are outside of available sst obs records." e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1852-1853.o600289 | wc -l
12

All of the errors are about sst obs.

$ grep obs e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1852-1853.settings 
  'dc_obs_climo': '/lcrc/group/e3sm/public_html/e3sm_diags_test_data/unit_test_complete_run/obs/climatology',
  'obs_ts': '/lcrc/group/e3sm/diagnostics/observations/Atm/time-series/',
  'prefix': 'e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1852-1853',
  'reference_data_path': '/lcrc/group/e3sm/diagnostics/observations/Atm/climatology/',
  'run_type': 'model_vs_obs',
  'streamflow_obs_ts': '/lcrc/group/e3sm/diagnostics/observations/Atm/time-series/',
  'tag': 'model_vs_obs',
  'tc_obs': '/lcrc/group/e3sm/diagnostics/observations/Atm/tc-analysis/',

$ ls /lcrc/group/e3sm/diagnostics/observations/Atm/time-series/
ceres_ebaf_surface_v2.8  ceres_ebaf_toa_v2.8  COREv2_Flux  ERA-Interim  GPCP_v2.2  HadISST      ISLSCPII_GRDC    NOAA-20C        OMI-MLS
ceres_ebaf_surface_v4.0  ceres_ebaf_toa_v4.0  ERA5         GPCP_1DD     GPCP_v2.3  HadISST2     MERRA2           NOAA-OLR_Daily
ceres_ebaf_surface_v4.1  ceres_ebaf_toa_v4.1  ERA5_Daily   GPCP_OAFLux  GSIM       IMERG_Daily  MERRA2_Aerosols  OAFlux

$ grep -n enso_param e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1852-1853.bash 
238:enso_param = EnsoDiagsParameter()
239:enso_param.test_data_path = test_ts
240:enso_param.test_name = short_name
241:enso_param.test_start_yr = start_yr
242:enso_param.test_end_yr = end_yr
245:enso_param.reference_data_path = '/lcrc/group/e3sm/diagnostics/observations/Atm/time-series/'
246:enso_param.ref_start_yr = ref_start_yr
247:enso_param.ref_end_yr = ref_start_yr + 10
249:params.append(enso_param)

$ grep -n ref_start_yr e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1852-1853.bash 
209:ref_start_yr = 1850
246:enso_param.ref_start_yr = ref_start_yr
247:enso_param.ref_end_yr = ref_start_yr + 10
269:streamflow_param.ref_start_yr = "1986" # Streamflow gauge station data range from year 1986 to 1995
283:tc_param.ref_start_yr = "1979"

So the reference years appear to be 1850-1860, for which there are apparently no SST obs.

Let's compare to v3

$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-main20241003-diags1003/v3.LR.historical_0051/post/scripts

$ grep obs e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.settings 
  'dc_obs_climo': '/lcrc/group/e3sm/public_html/e3sm_diags_test_data/unit_test_complete_run/obs/climatology',
  'obs_ts': '/lcrc/group/e3sm/diagnostics/observations/Atm/time-series/',
  'prefix': 'e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988',
  'reference_data_path': '/lcrc/group/e3sm/diagnostics/observations/Atm/climatology/',
  'run_type': 'model_vs_obs',
  'streamflow_obs_ts': '/lcrc/group/e3sm/diagnostics/observations/Atm/time-series/',
  'tag': 'model_vs_obs',
  'tc_obs': '/lcrc/group/e3sm/diagnostics/observations/Atm/tc-analysis/',

So, obs is using the same directory as the v2 test: /lcrc/group/e3sm/diagnostics/observations/Atm/time-series/

$ grep -n enso_param e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.bash 
239:enso_param = EnsoDiagsParameter()
240:enso_param.test_data_path = test_ts
241:enso_param.test_name = short_name
242:enso_param.test_start_yr = start_yr
243:enso_param.test_end_yr = end_yr
246:enso_param.reference_data_path = '/lcrc/group/e3sm/diagnostics/observations/Atm/time-series/'
247:enso_param.ref_start_yr = ref_start_yr
248:enso_param.ref_end_yr = ref_start_yr + 10
250:params.append(enso_param)

$ grep -n ref_start_yr e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.bash 
210:ref_start_yr = 1985
247:enso_param.ref_start_yr = ref_start_yr
248:enso_param.ref_end_yr = ref_start_yr + 10
259:trop_param.ref_start_yr = 2001
282:streamflow_param.ref_start_yr = "1986" # Streamflow gauge station data range from year 1986 to 1995
296:tc_param.ref_start_yr = "1979"

But here we use the reference years 1985-1995.

So, it appears we need to change the ref start year to 1985.

Indeed,

$ cd /lcrc/group/e3sm/diagnostics/observations/Atm/time-series
$ ls HadISST2
sic_187001_201912.nc  sst_187001_201912.nc  ts_187001_201912.nc

seems to suggest data is only available from 1870 onward (January 1870 through December 2019).
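
The year check can be made explicit. This is a sketch that assumes the `<var>_<YYYYMM>_<YYYYMM>.nc` naming seen in the HadISST2 listing encodes the available record; both helper names are hypothetical, and whether enso_diags reads this exact file is not confirmed here.

```python
import re
from typing import Tuple


def years_from_filename(filename: str) -> Tuple[int, int]:
    # Parse the first and last year out of names like sst_187001_201912.nc.
    match = re.search(r"_(\d{4})\d{2}_(\d{4})\d{2}\.nc$", filename)
    if match is None:
        raise ValueError(f"Unrecognized file name: {filename}")
    return int(match.group(1)), int(match.group(2))


def ref_years_available(filename: str, ref_start_yr: int, ref_end_yr: int) -> bool:
    # True only if the whole requested range lies inside the file's record.
    first, last = years_from_filename(filename)
    return first <= ref_start_yr and ref_end_yr <= last


v2_ok = ref_years_available("sst_187001_201912.nc", 1850, 1850 + 10)
v3_ok = ref_years_available("sst_187001_201912.nc", 1985, 1985 + 10)
print(v2_ok, v3_ok)  # False True
```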

bundles -- TC Analysis

debugging details
$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_bundles_output/test-main20241003-diags1003/v3.LR.historical_0051/post/scripts

$ grep -in tc_analysis bundle3.o600308 
45:=== tc_analysis_1985-1986.bash ===
27725:=== tc_analysis_1987-1988.bash ===
56071:2024-10-03 17:22:56,772 [INFO]: tc_analysis_driver.py(generate_tc_metrics_from_te_stitch_file:164) >> 
56073:2024-10-03 17:22:56,772 [INFO]: tc_analysis_driver.py(generate_tc_metrics_from_te_stitch_file:165) >> ============================================
56074:2024-10-03 17:22:56,774 [INFO]: tc_analysis_driver.py(_calc_num_storms_and_max_len:226) >> Number of storms: 0
56075:2024-10-03 17:22:56,774 [INFO]: tc_analysis_driver.py(_calc_num_storms_and_max_len:227) >> Max length of storms: 0
56076:2024-10-03 17:22:56,774 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.tc_analysis_driver
56080:  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/tc_analysis_driver.py", line 91, in run_diag
56082:  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/tc_analysis_driver.py", line 172, in generate_tc_metrics_from_te_stitch_file
56084:  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/tc_analysis_driver.py", line 249, in _get_vars_from_te_stitch
2024-10-03 17:22:56,774 [ERROR]: core_parameter.py(_run_diag:269) >> Error in e3sm_diags.driver.tc_analysis_driver
Traceback (most recent call last):
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/parameter/core_parameter.py", line 266, in _run_diag
    single_result = module.run_diag(self)
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/tc_analysis_driver.py", line 91, in run_diag
    test_data["metrics"] = generate_tc_metrics_from_te_stitch_file(test_te_file)
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/tc_analysis_driver.py", line 172, in generate_tc_metrics_from_te_stitch_file
    te_stitch_vars = _get_vars_from_te_stitch(lines, max_len, num_storms)
  File "/home/ac.forsyth2/miniconda3/envs/e3sm_diags_1003/lib/python3.10/site-packages/e3sm_diags/driver/tc_analysis_driver.py", line 249, in _get_vars_from_te_stitch
    year_start = int(lines[0].split("\t")[2])
IndexError: list index out of range
    test_te_file = os.path.join(
        test_data_path,
        "cyclones_stitch_{}_{}_{}.dat".format(test_name, test_start_yr, test_end_yr),
    )
$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_bundles_output/test-main20241003-diags1003/v3.LR.historical_0051/post/atm/tc-analysis_1987_1988
$ ls
aew_hist_v3.LR.historical_0051_1987_1988.nc          connect_CSne30_v2.dat                             cyclones_stitch_v3.LR.historical_0051_1987_1988.dat
aew_stitch_5e-6_v3.LR.historical_0051_1987_1988.dat  cyclones_hist_v3.LR.historical_0051_1987_1988.nc  outCSne30.g

$ du -sh *
528K	aew_hist_v3.LR.historical_0051_1987_1988.nc
0	aew_stitch_5e-6_v3.LR.historical_0051_1987_1988.dat
1.9M	connect_CSne30_v2.dat
272K	cyclones_hist_v3.LR.historical_0051_1987_1988.nc
0	cyclones_stitch_v3.LR.historical_0051_1987_1988.dat
1.5M	outCSne30.g

Both stitch files are empty (0 bytes), which is why lines[0] raises the IndexError. We're in the same situation as in v3's tc_analysis problem.
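
The failure mode is easy to reproduce without the real data. This is a sketch that uses a temporary zero-byte file as a stand-in for cyclones_stitch_*.dat; the helper `stitch_file_is_empty` is hypothetical and not part of e3sm_diags.

```python
import os
import tempfile


def stitch_file_is_empty(path: str) -> bool:
    # A zero-byte file yields no lines, so indexing lines[0] raises IndexError.
    return os.path.getsize(path) == 0


# Create an empty stand-in for the stitch file.
with tempfile.NamedTemporaryFile(suffix=".dat", delete=False) as tmp:
    empty_path = tmp.name

with open(empty_path) as f:
    lines = f.readlines()

is_empty = stitch_file_is_empty(empty_path)
os.remove(empty_path)

print(is_empty, len(lines))  # True 0
```

A check like this before parsing would turn the IndexError into a clearer message, though it would not explain why the upstream tc_analysis task found no storms.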

Remaining action items

  • Change logic of e3sm_diags.py to always require ts_daily_subsection to be set for tropical_subseasonal.
                    if (
                        "ts_daily_subsection" in c.keys()
                        and c["ts_daily_subsection"] != ""
                    ):
                        ts_daily_sub = c["ts_daily_subsection"]
                    else:
                        ts_daily_sub = c["sub"]
  • Add daily ts subtask to the v3 test cfg, for tropical_subseasonal to work.
  • Change reference start year for v2 test cfg to be within observation range.
  • Remove duplicate appearance of the tc_analysis set in tests/integration/template_weekly_comprehensive_v2.cfg.
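
One possible shape for the first action item is shown here. It is only a sketch of a stricter check, not the change that was actually merged; the function name `require_ts_daily_subsection` and the error message are hypothetical.

```python
from typing import Any, Dict


def require_ts_daily_subsection(c: Dict[str, Any]) -> str:
    # Hypothetical stricter rule: no silent fallback to c["sub"].
    if "tropical_subseasonal" not in c["sets"]:
        return ""  # No daily ts dependency is needed for this task.
    ts_daily_sub = c.get("ts_daily_subsection", "")
    if ts_daily_sub == "":
        raise ValueError(
            "tropical_subseasonal requires ts_daily_subsection to be set"
        )
    return ts_daily_sub


# Configuration resembling the v3 test: the set is requested but the
# daily subsection is left at its empty default.
c = {"sets": ["lat_lon", "tropical_subseasonal"], "ts_daily_subsection": ""}
try:
    require_ts_daily_subsection(c)
    failed_early = False
except ValueError as err:
    failed_early = True
    print(err)
```

With a check along these lines, the misconfiguration would surface when zppy generates the scripts instead of later as "no files to open" inside e3sm_diags.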

Remaining mysteries

  • What's wrong with tc_analysis on v3 data/years?
