
cam6_4_043: Make RRTMGP default radiation in CAM7 #1178

Merged: 12 commits, Oct 25, 2024

Conversation

brian-eaton (Collaborator) commented Oct 22, 2024

brian-eaton added the "answer changing" label Oct 22, 2024
brian-eaton requested a review from peverwhee October 22, 2024 16:46
brian-eaton self-assigned this Oct 22, 2024
cacraigucar (Collaborator):

@brian-eaton I thought I remembered the 13-month test also being added to help trap an "around the year boundary" error that we had and didn't have a test for at the time. I could be mistaken, though.

@@ -14,7 +14,7 @@

 <!-- Low top upper boundary conditions -->
 <ubc_specifier>'Q:H2O->UBC_FILE'</ubc_specifier>
-<ubc_file_path>atm/cam/chem/ubc/b.e21.BWHIST.f09_g17.CMIP6-historical-WACCM.ensAvg123.cam.h0zm.H2O.185001-201412_c230509cdf5.nc</ubc_file_path>
+<ubc_file_path>atm/cam/chem/ubc/b.e21.BWHIST.f09_g17.CMIP6-historical-WACCM.ensAvg123.cam.h0zm.H2O.1849-2014_c240604.nc</ubc_file_path>
Collaborator:

This file needs to be added to the svn inputdata repo. Please confirm that it has the mandatory metadata as described at: https://www2.cesm.ucar.edu/working_groups/Atmosphere/amwg_datasets.html

Collaborator (Author):

The updated ubc_file_path file has the required metadata and has been added to the svn inputdata repo.

brian-eaton (Collaborator, Author) commented Oct 23, 2024

> @brian-eaton I thought I remembered the 13-month test also being added to help trap an "around the year boundary" error that we had and didn't have a test for at the time. I could be mistaken, though.

This is something we routinely checked in the old CAM tests (pre-CIME) by starting the run an hour before midnight on Dec 31. Several of our current tests do have this feature. I have created a set of test mods to do this with the FLTHIST test that I'm adding in this PR.

Update: Unfortunately, adding a start date just before midnight on Dec 31 caused a run failure: "CICE clock not in sync with ESMF model clock". I will come back to this issue in a future PR.
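For reference, a year-boundary test mod of this kind might look roughly like the sketch below. This is not the test mod from this PR. It assumes the mod is a CIME `shell_commands` file that sets the standard `RUN_STARTDATE` and `START_TOD` case variables; the date `1999-12-31` is a hypothetical placeholder, since the FLTHIST start year is not stated here. As noted in the update above, this kind of start date currently triggers the CICE/ESMF clock error in FLTHIST.

```shell
#!/bin/sh
# Hypothetical shell_commands sketch for a "cross the year boundary" test mod.
# Assumption: the run should start one hour before midnight on Dec 31.
# START_DATE is a placeholder; substitute the compset's actual start year.
START_DATE="1999-12-31"

# START_TOD is the start time of day in seconds after 00:00.
# One hour before midnight: 24 h * 3600 s/h - 3600 s = 82800 s (23:00).
START_TOD=$(( 24 * 3600 - 3600 ))

if [ -x ./xmlchange ]; then
  # Inside a CIME case directory: apply the settings.
  ./xmlchange RUN_STARTDATE="$START_DATE"
  ./xmlchange START_TOD="$START_TOD"
else
  # Outside a case directory: only print what would be applied.
  echo "./xmlchange RUN_STARTDATE=$START_DATE"
  echo "./xmlchange START_TOD=$START_TOD"
fi
```

Starting at 23:00 means any run covering more than one hour of model time crosses into Jan 1, so even a short smoke test exercises the year-boundary code paths.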

peverwhee (Collaborator) left a comment:

One small request. Thanks @brian-eaton!

bld/namelist_files/namelist_defaults_cam.xml (resolved)
@@ -3,6 +3,6 @@
 ./xmlchange ROOTPE='0'
 ./xmlchange ROF_NCPL=`./xmlquery --value ATM_NCPL`
 ./xmlchange GLC_NCPL=`./xmlquery --value ATM_NCPL`
-./xmlchange CAM_CONFIG_OPTS=' -microphys mg3' --append
+./xmlchange CAM_CONFIG_OPTS=' -microphys mg3 -rad rrtmg' --append
Collaborator:

@brian-eaton Is there a reason not to use rrtmgp here?

Collaborator (Author):

The nvhpc test that uses these mods fails when radiation is set to either rrtmgp or rrtmgp_gpu; I tried both. The error looks like this:

deg0062.hsn.de.hpc.ucar.edu 110: Failing in Thread:1
deg0062.hsn.de.hpc.ucar.edu 110: Accelerator Fatal Error: call to cuStreamSynchronize returned error 700 (CUDA_ERROR_ILLEGAL_ADDRESS): Illegal address during kernel execution
deg0062.hsn.de.hpc.ucar.edu 110:  File: /glade/derecho/scratch/eaton/test-src/cam6_4_041_cam7/src/physics/rrtmgp/ext/rte-kernels/accel/mo_rte_solver_kernels.F90
deg0062.hsn.de.hpc.ucar.edu 110:  Function: sw_solver_2stream:573
deg0062.hsn.de.hpc.ucar.edu 110:  Line: 623

Since the purpose of this PR is just to change the default to rrtmgp for the LT and MT configurations, I didn't take the time to chase down the problem with this test; instead, I left it using rrtmg as it already was. I can update the test in a future PR, or in this one if you know what the problem is.

Collaborator:

Thanks for the update, Brian. I see where the problem is. This folder is used by the GPU regression test, and the EarthWorks team has found a problem running CAM with the default RRTMGP kernels on the GPU. To run it correctly, we need to either update the CAM->RRTMGP interface (as in EarthWorksOrg#25) or update the RRTMGP kernels (as in EarthWorksOrg/rte-rrtmgp@ac0f76e). Otherwise, the rrtmgp_gpu option is not expected to work properly here. I am surprised that the rrtmgp option does not work either, since it should only turn on the RRTMGP CPU code, but the GPU tests may be enabling some ACC directives unintentionally. Do you have an error message for the rrtmgp option? The error message posted here seems to come from the rrtmgp_gpu option.

Collaborator (Author):

Hi Jian. I got a similar error message with the rrtmgp option (which this PR makes the default radiation scheme for CAM7). Here is the failing test and a sample of the error output:

ERS_Ln9_G4-a100-openacc.ne30pg3_ne30pg3_mg17.F2000dev.derecho_nvhpc.cam-outfrq9s_mg3_default

deg0011.hsn.de.hpc.ucar.edu 61: Failing in Thread:1
deg0011.hsn.de.hpc.ucar.edu 61: Accelerator Fatal Error: call to cuStreamSynchronize returned error 700 (CUDA_ERROR_ILLEGAL_ADDRESS): Illegal address during kernel execution
deg0011.hsn.de.hpc.ucar.edu 61:  File: /glade/derecho/scratch/eaton/test-src/cam6_4_041_cam7/src/physics/rrtmgp/ext/rte-frontend/mo_rte_lw.F90
deg0011.hsn.de.hpc.ucar.edu 61:  Function: rte_lw:70
deg0011.hsn.de.hpc.ucar.edu 61:  Line: 312

I'm running regression tests for this PR now and plan to leave this test using rrtmg, as it has been. If you want to get this test running with the rrtmgp_gpu option, please open an issue.

Collaborator:

Thanks, Brian. The file you pointed to also has ACC directives, which means that even with the rrtmgp option, the GPU regression test will still enable some code on the GPU. I suspect it won't work properly if only part of the RRTMGP GPU code is activated. I agree that we can just use the rrtmg option here and address this problem in a separate issue. I just wanted to understand what is going on here. Thanks for your help and clarification.

peverwhee changed the title from "Make RRTMGP default radiation in CAM7" to "cam6_4_043: Make RRTMGP default radiation in CAM7" Oct 24, 2024
brian-eaton merged commit db75458 into ESCOMP:cam_development Oct 25, 2024
2 checks passed
Labels: answer changing
Projects: Status: Tag
4 participants