[Bug]: Failures on the Weekly test run #625
@forsyth2 Could you provide some clarification on the weekly test? When and how were the baseline results created? When did the test results start to deviate? The logs from each test would be helpful for troubleshooting.

---

Looking at the diffs for the global time-series plots (expected vs. actual), I think they are plotting different datasets... For the run scripts, could you provide the absolute paths of the following?

---
@chengzhuzhang Thanks, my responses are below.

Question 1

When? Running
Looking at https://github.com/E3SM-Project/zppy/pull/617/files#diff-744a349991da0e2973e1caae3a0033f562ee29b8cb5accea97fd89f9772b13df (screenshot above), the only diff is that Diags are using the latest Diags code rather than the latest Unified's Diags. Assuming the expected results don't reflect the changes in #617, that could explain the diffs in Diags... but not in MPAS-Analysis or Global Time Series.
How? Running

Question 2
There have only been 2 commits since #604:

As mentioned in the header for this PR, I was operating under the rule of "run weekly tests once per week -- if anything merged into zppy that week." Due to the limited number of commits above, I haven't been running the weekly tests often, but rather the smaller demo.

Question 3
(Latest tests are from 9/27. It appears I mislabeled the directory as 0907 rather than 0927.)

Main output for the latest tests:
Web output for the latest tests:
(Recall
As for output from my earlier tests of #604 (possibly with/without #617):
Main output I no longer seem to have:
The web output seems to remain, however:
Let's look at
These match: Actual, 9/27 result
These match: Expected, 7/31 result

Question 4
Different data sets as in different variables, different years, different simulations? All of those appear to be consistent between the images (though certainly the plots are different).

Question 5
I was using the
For reference, files above were generated by:
---
I'm visually comparing v17 and 0907 (latest) for the global time series plots. Most if not all panels are different, including the original and land variable sets (I think the image checker only caught one png diff). By eye, I think the first 5 years (1985-1989) in the time series match between the two tests, but the second half (1990-1994) deviates.

---
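For context, an image check of the kind discussed here boils down to counting pixels that differ between the expected and actual PNGs. A real checker would typically use Pillow; this is a simplified, dependency-free sketch of the idea, where images are plain lists of rows of (R, G, B) tuples:

```python
def fraction_pixels_differing(img_a, img_b, tol=0):
    """Return the fraction of pixel positions whose RGB values differ
    by more than `tol` in any channel. Both images are lists of rows of
    (R, G, B) tuples and must have identical dimensions."""
    if len(img_a) != len(img_b) or any(
        len(ra) != len(rb) for ra, rb in zip(img_a, img_b)
    ):
        raise ValueError("images must have identical dimensions")
    total = diff = 0
    for row_a, row_b in zip(img_a, img_b):
        for px_a, px_b in zip(row_a, row_b):
            total += 1
            if any(abs(a - b) > tol for a, b in zip(px_a, px_b)):
                diff += 1
    return diff / total if total else 0.0

# Tiny made-up 2x2 images: one pixel differs.
expected = [[(0, 0, 0), (255, 255, 255)], [(10, 10, 10), (20, 20, 20)]]
actual   = [[(0, 0, 0), (255, 255, 255)], [(10, 10, 10), (99, 99, 99)]]
print(fraction_pixels_differing(expected, actual))  # 0.25
```

A per-pixel tolerance (`tol`) is one way a checker can flag only one png while the plots still look visibly different overall, as described above.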
@forsyth2 it is hard to analyze without the logs from the v17 run (the one that created the expected results). Is it possible to re-run the v17 case?

---
Thanks @chengzhuzhang, I have some avenues to explore now. (See specific action items for myself at the end of this comment.)

When I was trying to debug what was going on initially, I ran comprehensive-v3 (not comprehensive-v2 or bundles) on the code from a couple of different points in time. I couldn't figure out why the tests weren't passing (after all, the expected results were theoretically based off an identical run). So, from there, I went the route of running all the weekly tests on the latest
Here though, I've returned to those preliminary runs to dive deeper: (scroll right to see the rest of the table)
Conclusions:
Action items for me:
---
Looks like https://github.com/E3SM-Project/e3sm_diags/pull/830/files#diff-772d3a4a1276047ece4b1df6b0e3d5253bbcc3e88410bb8626740bd248a82251 would explain that diff, so I think this is fine. ✅

---
Analysis (skip to bottom for conclusions)

I ran the full suite of weekly tests (bundles, comprehensive-v2, comprehensive-v3) on 2 points in the code history:
Before each run, I copied the expected results to
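That copy-before-each-run step can be scripted; here is a sketch, where all paths are `mktemp` stand-ins rather than the real LCRC expected-results locations:

```shell
# Sketch: snapshot the expected-results directory before regenerating it.
# $EXPECTED is a stand-in for the real directory, e.g. a
# zppy_test_resources/expected_comprehensive_v3 path on LCRC.
EXPECTED=$(mktemp -d)
printf 'baseline\n' > "$EXPECTED/plot.png"

# Date-stamped backup alongside the original, so each week's baseline is kept.
BACKUP="${EXPECTED}_backup_$(date +%Y%m%d)"
cp -r "$EXPECTED" "$BACKUP"
echo "snapshot saved to $BACKUP"
```

Keeping date-stamped snapshots like this makes it possible to answer "since when do the results deviate?" later, without re-running the lengthy tests.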
Current expected results are in
Summary table (scroll right):
Notes on earlier runs:
Web paths:
Remaining action items for myself
Remaining mysteries
---
v3 -- TC Analysis

debugging details
Let's compare with v2, which seemed to work.
v3 -- Tropical Subseasonal

debugging details
There is only a
But this leads to another question: why did
In
and
That implies
In
That implies our dependency would be
But:
So, it's using
We don't have

v2 -- ENSO Diags

debugging details
All of the errors are about sst obs.
So, the reference years look to be 1850-1860, for which there apparently are no sst obs?

Let's compare to v3:
So, obs is using the same directory as the v2 test:
But here we use the reference years 1985-1995. So, it appears we need to change the ref start year to 1985. Indeed,
seems to suggest data is only available after 1870.

bundles -- TC Analysis

debugging details
We're in the same situation as in v3's tc_analysis problem....

Remaining action items
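One action item implied by the ENSO analysis above (shifting the reference years past 1870, where sst obs begin) would be a cfg change along these lines. The section and key names here are assumptions written in zppy's ini-style config format, not copied from the actual test cfg:

```ini
[e3sm_diags]
  # Hypothetical subtask name; the real cfg's subtask names may differ.
  [[ atm_monthly_180x360_aave_mvo ]]
    sets = "enso_diags",
    # sst obs appear to start in 1870, so keep the reference
    # window inside the observed period
    ref_start_yr = 1985
    ref_final_yr = 1995
```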
Remaining mysteries
---
What happened?
I originally ran the 3 weekly tests (bundles, comprehensive_v2, comprehensive_v3) after merging #604/#617 on 7/31. The intent was to re-run these tests every single week, but I thought it was reasonable to only run them on weeks where pull requests (with code changes rather than, say, doc changes) were merged into zppy. After all, if no changes had been merged, what would be the point of running the extremely lengthy tests?
For #598, I was testing the latest sets (`tc_analysis`, `enso_diags`, `streamflow` [but apparently missing `qbo`]) added to the E3SM Diags CDAT migration (https://github.com/E3SM-Project/e3sm_diags/commits/cdat-migration-fy24), using `min_case_e3sm_diags_cdat_migrated_sets`. However, these three sets didn't show up on the viewer.

Upon testing on `main`, using `weekly_comprehensive_v3`, I found that `tc_analysis` still wasn't plotting (though `enso_diags` and `streamflow` were). I then ran all the weekly tests, yielding the following results:

weekly_comprehensive_v3

Sets rendered
Why are `tc_analysis` and `tropical_subseasonal` missing in the rendering for model-vs-obs?

gives:
and
gives:
Neither of these error messages is particularly enlightening.
Image check failures
Test failed due to image check failures:
https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/test-main-20240907/v3.LR.historical_0051/image_check_failures_comprehensive_v3/
136 differences from the expected. Appears to be 7 MERRA2-related E3SM Diags diffs, 1 global-time-series diff, and 128 MPAS-Analysis diffs.
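When triaging a failure count like this, it can help to tally the image-check failures by the tool that produced each plot. A small sketch; the directory layout and file names below are hypothetical, assuming the first path component under `image_check_failures_*` names the tool:

```python
from collections import Counter
from pathlib import Path

def tally_image_diffs(paths):
    """Group image-check failure paths by their top-level component.

    `paths` are file paths relative to the image_check_failures
    directory; the first path component is assumed to name the tool
    (e3sm_diags, mpas_analysis, ...) that produced the plot.
    """
    return dict(Counter(Path(p).parts[0] for p in paths))

# Made-up example paths illustrating the kind of breakdown reported above
# (7 E3SM Diags + 1 global-time-series + 128 MPAS-Analysis = 136).
failures = [
    "e3sm_diags/lat_lon/MERRA2/T-850.png",
    "global_time_series/net_toa.png",
    "mpas_analysis/sst/sst_index.png",
    "mpas_analysis/sose/temperature.png",
]
print(tally_image_diffs(failures))
```

A breakdown like this makes it quicker to see, e.g., that most diffs come from MPAS-Analysis rather than from the diags environment.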
weekly_comprehensive_v2

Sets rendered

Why is `enso_diags` missing in the rendering for model-vs-obs?

That does give:
But didn't these years work before?? Why would they not now?
Image check failures
Test failed due to image check failures:
https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v2_www/test-main-20240907/v2.LR.historical_0201/image_check_failures_comprehensive_v2/
12 differences from the expected, all of which are MERRA2-related E3SM Diags diffs
weekly_bundles

Sets rendered

Why is `tc_analysis` missing in the rendering for mvm?

Not particularly enlightening error messages:
Image check failures
Test failed due to image check failures:
https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_bundles_www/test-main-20240907/v3.LR.historical_0051/image_check_failures_bundles/
1 difference from the expected: 1 global-time-series diff
Possible reasons for image check failures
As far as I can recall, the expected results were generated after the 7/31 merging of #604/#617. There have been no code changes merged to `zppy` `main` since then. The only thing that could be different is the E3SM Diags environment used, but even that wouldn't account for the MPAS-Analysis diffs or global-time-series diffs. And even then, I was using `conda activate e3sm_diags_20240731`, so the diags environment should have been identical to when the expected results were generated.

Lesson learned: always run these tests on a weekly basis (even if no code changes have been merged!) to catch environmental changes, e.g., new build versions or new e3sm_diags/other package changes.
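One cheap way to act on that lesson is to snapshot the conda environment each week (`conda list --export`) and diff the snapshots before suspecting code. A minimal, dependency-free sketch of such a diff; the snapshot strings below are made-up examples, not real environment contents:

```python
def parse_conda_export(text):
    """Parse `conda list --export` output into {package: version}.
    Lines look like `numpy=1.26.4=py311h...` (name=version=build);
    comment lines start with `#`."""
    pkgs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split("=")
        if len(parts) >= 2:
            pkgs[parts[0]] = parts[1]
    return pkgs

def changed_packages(old_text, new_text):
    """Packages whose versions differ between two weekly snapshots
    (value is a (old_version, new_version) pair; None means absent)."""
    old, new = parse_conda_export(old_text), parse_conda_export(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)
    }

week1 = "# platform: linux-64\nnumpy=1.26.4=py311\nxarray=2024.6.0=pyhd8"
week2 = "# platform: linux-64\nnumpy=2.0.1=py311\nxarray=2024.6.0=pyhd8"
print(changed_packages(week1, week2))  # {'numpy': ('1.26.4', '2.0.1')}
```

Run weekly, a diff like this would distinguish "a package silently updated" from "zppy itself regressed" without re-running the full test suite.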
What machine were you running on?
Chrysalis
Environment
`zppy` `main` as of 9/24.

What command did you run?
Copy your cfg file
What jobs are failing?
What stack trace are you encountering?