cam6_4_043: Make RRTMGP default radiation in CAM7 #1178
@brian-eaton I thought I remembered the 13 month test also being added to help trap an "around the year boundary" error that we had and didn't have a test for at the time. I could be mistaken though.
@@ -14,7 +14,7 @@

 <!-- Low top upper boundary conditions -->
 <ubc_specifier>'Q:H2O->UBC_FILE'</ubc_specifier>
-<ubc_file_path>atm/cam/chem/ubc/b.e21.BWHIST.f09_g17.CMIP6-historical-WACCM.ensAvg123.cam.h0zm.H2O.185001-201412_c230509cdf5.nc</ubc_file_path>
+<ubc_file_path>atm/cam/chem/ubc/b.e21.BWHIST.f09_g17.CMIP6-historical-WACCM.ensAvg123.cam.h0zm.H2O.1849-2014_c240604.nc</ubc_file_path>
This file needs to be added to the svn inputdata repo. Please confirm that it has the mandatory metadata as described at: https://www2.cesm.ucar.edu/working_groups/Atmosphere/amwg_datasets.html
Updated ubc_file_path. The file has the required metadata and has been added to the svn inputdata repo.
This is something we routinely checked in the old cam tests (pre-cime) by starting the run an hour before midnight on Dec 31. Several of our current tests do have this feature. I have created a set of test mods to do this with the FLTHIST test that I'm adding in this PR.
Update: Unfortunately, adding a start date just before midnight of Dec 31 ran into a run failure:
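For reference, the start-just-before-midnight setup described in this comment (not the failure output) can be sketched as shell test mods. This is a minimal sketch, assuming the standard CIME variables RUN_STARTDATE and START_TOD; the year shown and the helper name year_boundary_mods are hypothetical, and these are not the actual mods added in this PR. The commands are only printed so the sketch can be run outside a case directory.

```shell
#!/bin/sh
# Sketch only: print xmlchange commands for a start one hour before midnight on Dec 31.
# RUN_STARTDATE and START_TOD are standard CIME variables; the year is a placeholder.
start_date="0001-12-31"        # hypothetical year; choose one valid for the compset
start_tod=$(( 23 * 3600 ))     # 23:00 expressed as seconds since midnight

# Hypothetical helper: emit the two settings as xmlchange command lines.
year_boundary_mods() {
  echo "./xmlchange RUN_STARTDATE=${start_date}"
  echo "./xmlchange START_TOD=${start_tod}"
}

year_boundary_mods
```

With a start time of 23:00 on Dec 31, any run longer than one model hour steps across the year boundary.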
one small request - thanks @brian-eaton !
@@ -3,6 +3,6 @@
 ./xmlchange ROOTPE='0'
 ./xmlchange ROF_NCPL=`./xmlquery --value ATM_NCPL`
 ./xmlchange GLC_NCPL=`./xmlquery --value ATM_NCPL`
-./xmlchange CAM_CONFIG_OPTS=' -microphys mg3' --append
+./xmlchange CAM_CONFIG_OPTS=' -microphys mg3 -rad rrtmg' --append
@brian-eaton Is there a reason not to use rrtmgp here?
The nvhpc test that uses these mods fails when radiation is set to rrtmgp, or rrtmgp_gpu. I tried both. The error looks like this:
deg0062.hsn.de.hpc.ucar.edu 110: Failing in Thread:1
deg0062.hsn.de.hpc.ucar.edu 110: Accelerator Fatal Error: call to cuStreamSynchronize returned error 700 (CUDA_ERROR_ILLEGAL_ADDRESS): Illegal address during kernel execution
deg0062.hsn.de.hpc.ucar.edu 110: File: /glade/derecho/scratch/eaton/test-src/cam6_4_041_cam7/src/physics/rrtmgp/ext/rte-kernels/accel/mo_rte_solver_kernels.F90
deg0062.hsn.de.hpc.ucar.edu 110: Function: sw_solver_2stream:573
deg0062.hsn.de.hpc.ucar.edu 110: Line: 623
Since the purpose of this PR is just to change the default to rrtmgp for the LT and MT configurations I didn't take the time to chase down the problem with this test, but instead left it using rrtmg as it was already doing. I can update the test in a future PR, or in this one if you know what the problem is.
Thanks Brian for your update. I see where the problem is. This folder is used by the GPU regression test, and the EarthWorks team has identified a problem running CAM with the default RRTMGP kernels on the GPU. To run it correctly, we need to update either the CAM->RRTMGP interface (like this EarthWorksOrg#25) or the RRTMGP kernel (like this EarthWorksOrg/rte-rrtmgp@ac0f76e). Otherwise, the rrtmgp_gpu option is not expected to work properly here. I am surprised that the rrtmgp option is not working either, as it should only turn on the RRTMGP CPU code. But the GPU tests may turn on some ACC directives accidentally. Do you have an error message for the rrtmgp option? The error message posted here seems to come from the rrtmgp_gpu option.
Hi Jian. I got a similar error message with the rrtmgp option (which in this PR is the default radiation for cam7). The test that's failing, and a sample of the error output:
ERS_Ln9_G4-a100-openacc.ne30pg3_ne30pg3_mg17.F2000dev.derecho_nvhpc.cam-outfrq9s_mg3_default
deg0011.hsn.de.hpc.ucar.edu 61: Failing in Thread:1
deg0011.hsn.de.hpc.ucar.edu 61: Accelerator Fatal Error: call to cuStreamSynchronize returned error 700 (CUDA_ERROR_ILLEGAL_ADDRESS): Illegal address during kernel execution
deg0011.hsn.de.hpc.ucar.edu 61: File: /glade/derecho/scratch/eaton/test-src/cam6_4_041_cam7/src/physics/rrtmgp/ext/rte-frontend/mo_rte_lw.F90
deg0011.hsn.de.hpc.ucar.edu 61: Function: rte_lw:70
deg0011.hsn.de.hpc.ucar.edu 61: Line: 312
I'm running regression tests now for this PR and plan to leave this test as it has been, using rrtmg. It would be good if you could open an issue if you want to get this test running with the rrtmgp_gpu option.
Thanks Brian. The file you pointed to also has ACC directives, which means that even with the rrtmgp option, some code will still be enabled on the GPU by the GPU regression test. I guess it won't work properly if only part of the RRTMGP GPU code is activated. I agree that we can just use the rrtmg option here and address this problem in a separate issue. I just wanted to understand what is going on here. Thanks for your help and clarification.
Resolve turning RRTMG-P on by default in CAM7 + some namelist defaults #1143
Remove some tests that added rrtmgp to the cam7 configuration. Not needed since rrtmgp is now the default in cam7.
Remove test of old cam7 development configuration (32 levels) which is no longer needed.
Remove 13 month F2000climo test. This was originally created to make sure we didn't make changes that hurt the performance of our production configuration for CMIP6 simulations. This is no longer needed.
Resolve Create at least one CAM7 regression test on izumi #1154