[Bug]: Failures on the Weekly test run #625
@forsyth2 Could you provide some clarification on the weekly test? When and how were the baseline results created? When did the test results start to deviate? The logs from each test would be helpful for troubleshooting.

---

Looking at the diffs for the global time-series plots (expected vs. actual), I think they are plotting different datasets... For the run scripts, could you provide the absolute paths of the following?

---
@chengzhuzhang Thanks, my responses are below.

Question 1

When? Running
Looking at https://github.com/E3SM-Project/zppy/pull/617/files#diff-744a349991da0e2973e1caae3a0033f562ee29b8cb5accea97fd89f9772b13df (screenshot above), the only diff is that Diags are using the latest Diags code rather than the latest Unified's Diags. Assuming the expected results don't reflect the changes in #617, that could explain the diffs in Diags... but not in MPAS-Analysis or Global Time Series.
How? Running

Question 2
There have only been 2 commits since #604:

As mentioned in the header for this PR, I was operating under the rule of "run weekly tests once per week -- if anything merged into zppy that week." Due to the limited number of commits above, I haven't been running the weekly tests often, but rather the smaller demo.

Question 3
(Latest tests are from 9/27. It appears I mislabeled the directory as 0907 rather than 0927.)

Main output for the latest tests:
Web output for the latest tests:
(Recall
As for output from my earlier tests of #604 (possibly with/without #617):
Main output I no longer seem to have:
The web output seems to remain, however:
Let's look at
These match: Actual, 9/27 result
These match: Expected, 7/31 result

Question 4
Different data sets as in different variables, different years, different simulations? All of those appear to be consistent between the images (though certainly the plots are different).

Question 5
I was using the
For reference, files above were generated by:
---
I'm visually comparing v17 and 0907 (latest) for the global time series plots. Most if not all panels are different, including the original and land variable sets (I think the image checker only caught one png diff). By eye, I think the first 5 years (1985-1989) in the time series match between the two tests, but the second half (1990-1994) deviates.

---
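For context, an image check of the kind discussed here boils down to counting pixels that differ between the expected and actual PNGs. A real checker would typically use Pillow; this is a simplified, dependency-free sketch of the idea, where images are plain lists of rows of (R, G, B) tuples:

```python
def fraction_pixels_differing(img_a, img_b, tol=0):
    """Return the fraction of pixel positions whose RGB values differ
    by more than `tol` in any channel. Both images are lists of rows of
    (R, G, B) tuples and must have identical dimensions."""
    if len(img_a) != len(img_b) or any(
        len(ra) != len(rb) for ra, rb in zip(img_a, img_b)
    ):
        raise ValueError("images must have identical dimensions")
    total = diff = 0
    for row_a, row_b in zip(img_a, img_b):
        for px_a, px_b in zip(row_a, row_b):
            total += 1
            if any(abs(a - b) > tol for a, b in zip(px_a, px_b)):
                diff += 1
    return diff / total if total else 0.0

# Tiny made-up 2x2 images: one pixel differs.
expected = [[(0, 0, 0), (255, 255, 255)], [(10, 10, 10), (20, 20, 20)]]
actual   = [[(0, 0, 0), (255, 255, 255)], [(10, 10, 10), (99, 99, 99)]]
print(fraction_pixels_differing(expected, actual))  # 0.25
```

A per-pixel tolerance (`tol`) is one way a checker can flag only one png while the plots still look visibly different overall, as described above.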
@forsyth2 it is hard to analyze without the logs from the v17 run (the one that created the expected results). Is it possible to re-run the v17 case?

---
Thanks @chengzhuzhang, I have some avenues to explore now. (See specific action items for myself at the end of this comment.)

When I was trying to debug what was going on initially, I ran comprehensive-v3 (not comprehensive-v2 or bundles) on the code from a couple of different points in time. I couldn't figure out why the tests weren't passing (after all, the expected results were theoretically based off an identical run). So, from there, I went the route of running all the weekly tests on the latest
Here though, I've returned to those preliminary runs to dive deeper: (scroll right to see the rest of the table)
Conclusions:
Action items for me:
---
Looks like https://github.com/E3SM-Project/e3sm_diags/pull/830/files#diff-772d3a4a1276047ece4b1df6b0e3d5253bbcc3e88410bb8626740bd248a82251 would explain that diff, so I think this is fine. ✅

---
Analysis (skip to bottom for conclusions)

I ran the full suite of weekly tests (bundles, comprehensive-v2, comprehensive-v3) on 2 points in the code history:
Before each run, I copied the expected results to
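That copy-before-each-run step can be scripted; here is a sketch, where all paths are `mktemp` stand-ins rather than the real LCRC expected-results locations:

```shell
# Sketch: snapshot the expected-results directory before regenerating it.
# $EXPECTED is a stand-in for the real directory, e.g. a
# zppy_test_resources/expected_comprehensive_v3 path on LCRC.
EXPECTED=$(mktemp -d)
printf 'baseline\n' > "$EXPECTED/plot.png"

# Date-stamped backup alongside the original, so each week's baseline is kept.
BACKUP="${EXPECTED}_backup_$(date +%Y%m%d)"
cp -r "$EXPECTED" "$BACKUP"
echo "snapshot saved to $BACKUP"
```

Keeping date-stamped snapshots like this makes it possible to answer "since when do the results deviate?" later, without re-running the lengthy tests.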
Current expected results are in
Summary table (scroll right):
Notes on earlier runs:
Web paths:
Remaining action items for myself
Remaining mysteries
---
v3 -- TC Analysis

debugging details
Let's compare with v2, which seemed to work.
v3 -- Tropical Subseasonal

debugging details
There is only a
But this leads to another question: why did
In
and
That implies
In
That implies our dependency would be
But:
So, it's using
We don't have

v2 -- ENSO Diags

debugging details
All of the errors are about sst obs.
So, the reference years look to be 1850-1860, for which there apparently are no sst obs?

Let's compare to v3:
So, obs is using the same directory as the v2 test:
But here we use the reference years 1985-1995. So, it appears we need to change the ref start year to 1985. Indeed,
seems to suggest data is only available after 1870.

bundles -- TC Analysis

debugging details
We're in the same situation as in v3's tc_analysis problem....

Remaining action items
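One action item implied by the ENSO analysis above (shifting the reference years past 1870, where sst obs begin) would be a cfg change along these lines. The section and key names here are assumptions written in zppy's ini-style config format, not copied from the actual test cfg:

```ini
[e3sm_diags]
  # Hypothetical subtask name; the real cfg's subtask names may differ.
  [[ atm_monthly_180x360_aave_mvo ]]
    sets = "enso_diags",
    # sst obs appear to start in 1870, so keep the reference
    # window inside the observed period
    ref_start_yr = 1985
    ref_final_yr = 1995
```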
Remaining mysteries
---
What happened?
I originally ran the 3 weekly tests (bundles, comprehensive_v2, comprehensive_v3) after merging #604/#617 on 7/31. The intent was to re-run these tests every single week, but I thought it was reasonable to only run them on weeks where pull requests (with code changes rather than, say, doc changes) were merged into zppy. After all, if no changes had been merged, what would be the point of running the extremely lengthy tests?
For #598, I was testing the latest sets (`tc_analysis`, `enso_diags`, `streamflow` [but apparently missing `qbo`]) added to the E3SM Diags CDAT migration (https://github.com/E3SM-Project/e3sm_diags/commits/cdat-migration-fy24), using `min_case_e3sm_diags_cdat_migrated_sets`. However, these three sets didn't show up on the viewer.

Upon testing on `main`, using `weekly_comprehensive_v3`, I found that `tc_analysis` still wasn't plotting (though `enso_diags` and `streamflow` were). I then ran all the weekly tests, yielding the following results:

weekly_comprehensive_v3

Sets rendered
Why are `tc_analysis` and `tropical_subseasonal` missing in the rendering for model-vs-obs?

gives:
and
gives:
Neither of these error messages is particularly enlightening.
Image check failures
Test failed due to image check failures:
https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/test-main-20240907/v3.LR.historical_0051/image_check_failures_comprehensive_v3/
136 differences from the expected. Appears to be 7 MERRA2-related E3SM Diags diffs, 1 global-time-series diff, and 128 MPAS-Analysis diffs.
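When triaging a failure count like this, it can help to tally the image-check failures by the tool that produced each plot. A small sketch; the directory layout and file names below are hypothetical, assuming the first path component under `image_check_failures_*` names the tool:

```python
from collections import Counter
from pathlib import Path

def tally_image_diffs(paths):
    """Group image-check failure paths by their top-level component.

    `paths` are file paths relative to the image_check_failures
    directory; the first path component is assumed to name the tool
    (e3sm_diags, mpas_analysis, ...) that produced the plot.
    """
    return dict(Counter(Path(p).parts[0] for p in paths))

# Made-up example paths illustrating the kind of breakdown reported above
# (7 E3SM Diags + 1 global-time-series + 128 MPAS-Analysis = 136).
failures = [
    "e3sm_diags/lat_lon/MERRA2/T-850.png",
    "global_time_series/net_toa.png",
    "mpas_analysis/sst/sst_index.png",
    "mpas_analysis/sose/temperature.png",
]
print(tally_image_diffs(failures))
```

A breakdown like this makes it quicker to see, e.g., that most diffs come from MPAS-Analysis rather than from the diags environment.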
weekly_comprehensive_v2

Sets rendered

Why is `enso_diags` missing in the rendering for model-vs-obs?

That does give:
But didn't these years work before?? Why would they not now?
Image check failures
Test failed due to image check failures:
https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v2_www/test-main-20240907/v2.LR.historical_0201/image_check_failures_comprehensive_v2/
12 differences from the expected, all of which are MERRA2-related E3SM Diags diffs
weekly_bundles

Sets rendered

Why is `tc_analysis` missing in the rendering for mvm?

Not particularly enlightening error messages:
Image check failures
Test failed due to image check failures:
https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_bundles_www/test-main-20240907/v3.LR.historical_0051/image_check_failures_bundles/
1 difference from the expected: 1 global-time-series diff
Possible reasons for image check failures
As far as I can recall, the expected results were generated after the 7/31 merging of #604/#617. There have been no code changes merged to `zppy` `main` since then. The only thing that could be different is the E3SM Diags environment used, but even that wouldn't account for the MPAS-Analysis diffs or global-time-series diffs. And even then, I was using `conda activate e3sm_diags_20240731`, so the diags environment should have been identical to when the expected results were generated.

Lesson learned: always run these tests on a weekly basis (even if no code changes have been merged!) to catch environmental changes, e.g., new build versions or new e3sm_diags/other package changes.
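One cheap way to act on that lesson is to snapshot the conda environment each week (`conda list --export`) and diff the snapshots before suspecting code. A minimal, dependency-free sketch of such a diff; the snapshot strings below are made-up examples, not real environment contents:

```python
def parse_conda_export(text):
    """Parse `conda list --export` output into {package: version}.
    Lines look like `numpy=1.26.4=py311h...` (name=version=build);
    comment lines start with `#`."""
    pkgs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split("=")
        if len(parts) >= 2:
            pkgs[parts[0]] = parts[1]
    return pkgs

def changed_packages(old_text, new_text):
    """Packages whose versions differ between two weekly snapshots
    (value is a (old_version, new_version) pair; None means absent)."""
    old, new = parse_conda_export(old_text), parse_conda_export(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)
    }

week1 = "# platform: linux-64\nnumpy=1.26.4=py311\nxarray=2024.6.0=pyhd8"
week2 = "# platform: linux-64\nnumpy=2.0.1=py311\nxarray=2024.6.0=pyhd8"
print(changed_packages(week1, week2))  # {'numpy': ('1.26.4', '2.0.1')}
```

Run weekly, a diff like this would distinguish "a package silently updated" from "zppy itself regressed" without re-running the full test suite.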
What machine were you running on?
Chrysalis
Environment
`zppy` `main` as of 9/24.

What command did you run?
Copy your cfg file
What jobs are failing?
What stack trace are you encountering?