
After PR #81, Cases on GPUs using MPAS-A and non-simple physics fail at run-time #82

Open
gdicker1 opened this issue Nov 12, 2024 · 0 comments
Assignees: gdicker1
Labels
  • EW specific — This has to do with EarthWorks only - files, goals, code that probably won't be wanted upstream
  • external — Has to do with externals
  • invalid — This doesn't seem right
  • OpenACC — Involves OpenACC porting

Comments

@gdicker1 (Contributor)

While the changes in #81 re-enable MPAS-A OpenACC, further work is needed to get correct answers for cases with non-simple physics. #81 allows cases like FKESSLER to produce reasonable results, but cases like F2000dev or CHAOS2000dev fail: answers diverge from the accepted results, and runs eventually abort with a "NaN detected in the 'w' field" error. Other cases may still finish successfully, but their results will have diverged.

CPU results are unaffected.
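For context, the failure mode reported above can be illustrated with a minimal NaN sanity check. This is a generic Python sketch of the kind of check that produces the "NaN detected in the 'w' field" error, not the actual Fortran code in MPAS-A; all names here are illustrative:

```python
import numpy as np

def check_w_field(w):
    """Abort if any NaN appears in the vertical velocity field.

    Illustrative only: MPAS-A performs an equivalent check in Fortran
    during the dynamics timestep.
    """
    if np.isnan(w).any():
        raise RuntimeError("NaN detected in the 'w' field")

w = np.zeros((4, 4))          # a healthy field passes silently
check_w_field(w)

w[2, 3] = np.nan              # one corrupted value is enough to abort
try:
    check_w_field(w)
except RuntimeError as err:
    print(err)                # -> NaN detected in the 'w' field
```

Once answers diverge far enough on the GPU, such a check fires and ends the run, which matches the failure described in the steps below.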

Example steps to reproduce this problem

  1. Clone EarthWorks, using a version equivalent to tag ewm-2.3.010 or later.
  2. Create a case that uses GPUs and some non-simple CAM physics (e.g. F2000dev, which uses cam7 physics).
    • Using GPUs in EarthWorks and CESM is under active development; please ask if you are unsure how to request GPUs for a case.
  3. Run ./case.setup, ./case.build, and ./case.submit.
  4. The simulation will run for some time, then eventually fail either by exceeding the allotted walltime or with a "NaN detected in the 'w' field" error.
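The steps above can be sketched as a command sequence. This is a hedged outline, not a verified recipe: the repository URL, compset, and resolution are illustrative, the externals-checkout mechanism varies by EarthWorks version, and the GPU-request flags depend on the machine (ask the maintainers, per step 2):

```shell
# Clone EarthWorks at (or after) tag ewm-2.3.010 (URL assumed)
git clone https://github.com/EarthWorksOrg/EarthWorks.git
cd EarthWorks
git checkout ewm-2.3.010
# Fetch externals per the EarthWorks instructions for this version

# Create a GPU case with non-simple CAM physics
# (compset and resolution here are illustrative)
cd cime/scripts
./create_newcase --case F2000dev_gpu --compset F2000dev --res mpasa120
# ...configure the case to request GPUs for your machine...

cd F2000dev_gpu
./case.setup
./case.build
./case.submit
```

With these settings the run diverges from the accepted results and eventually fails as described in step 4.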

Links or other context

  • See this comment in #81 for a comparison of CHAOS2000dev runs on GPUs. Results become significantly different as soon as the 2nd "Dynamics timestep" output.
  • See this other comment in #81 for some GPU QPC6 comparisons. The QPC6 results stay closer to the baseline, but still diverge by the end of the simulation. 5-day QPC6 runs finish normally, i.e. without crashing or fatal errors.
gdicker1 added the invalid, external, EW specific, and OpenACC labels on Nov 12, 2024
gdicker1 self-assigned this on Nov 12, 2024
gdicker1 changed the title from "After PR #81, Cases on GPUs using MPAS-A and non-simple physics fail" to "After PR #81, Cases on GPUs using MPAS-A and non-simple physics fail at run-time" on Dec 4, 2024