Crash in FMS when UFS with LM4 is built with Debug flag #1288

Open
JustinPerket opened this issue Sep 5, 2024 · 3 comments
Labels
bug Something is not working

Comments

@JustinPerket

Describe the bug

There's an upcoming PR (ufs-community/ufs-weather-model#2146) to introduce the GFDL Land Model (LM4) into UFS. It crashes when UFS is built with the -DDEBUG=ON flag and linked against the FMS module provided by spack-stack. There is no crash when using an unoptimized build of FMS 2023.04 compiled with debug flags; however, no such build is available to UFS through the modules.

To Reproduce

# recreate failed test
git clone -b feature/LM4 --recursive git@github.com:JustinPerket/ufs-weather-model.git ufs-LM4
cd ufs-LM4/tests
# change the regression test to a debug build
# currently is : COMPILE | datm_cdeps_lm4 | intel | -DAPP=LND-LM4 | + hera orion gaea | fv3 |
# want: COMPILE | datm_cdeps_lm4 | intel | -DAPP=LND-LM4 -DDEBUG=ON | + hera orion gaea | fv3 |
sed -i 's|-DAPP=LND-LM4|-DAPP=LND-LM4 -DDEBUG=ON|g' lm4_tests.conf
# run LM4 regression tests, resulting in crash
./rt.sh -k -l lm4_tests.conf

The crash occurs in this WHERE statement in the FMS monin_obukhov interface:
https://github.com/NOAA-GFDL/FMS/blob/7f585284/monin_obukhov/include/monin_obukhov_inter.inc#L227

Expected behavior

I'll mostly quote @J-Lentz's explanation from an email:

because the [release build FMS module] is being used, the calculations inside the where clause in monin_obukhov_solve_zeta are speculatively executed without regard for which indices satisfy the masking condition, and in particular, calculations are performed for indices where division by zero occurs. As long as floating point exceptions are disabled, this is benign because the resulting NaN or infinity values are discarded due to the masking condition. But the FMS code inherits the floating point environment of the main program [UFS], and in particular, if [UFS] is built with the -fpe0 flag, then division by zero in the FMS code will trigger a fatal exception, regardless of whether FMS itself was built with -fpe0.
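The mechanism can be illustrated with a small pure-Python simulation. It is only an analogy, not FMS code: `ieee_divide`, `masked_divide_eager`, `masked_divide_guarded` and `FloatingPointTrap` are hypothetical names, the `trap` argument stands in for the main program being built with -fpe0, and the eager variant mimics an optimized build that evaluates the body of a WHERE block at every index before applying the mask.

```python
import math


class FloatingPointTrap(Exception):
    """Raised when a trapped floating-point exception occurs (analogue of -fpe0)."""


def ieee_divide(num, den, trap):
    """Divide two finite floats the way IEEE 754 hardware would.

    With trap=False, x/0 yields +/-inf and 0/0 yields NaN, as in the default
    floating-point environment. With trap=True, both cases abort instead,
    mimicking an executable built with exception trapping enabled.
    """
    if den == 0.0:
        if trap:
            # -fpe0 traps both divide-by-zero (x/0) and invalid (0/0) exceptions.
            raise FloatingPointTrap(f"division by zero: {num} / {den}")
        if num == 0.0:
            return math.nan  # 0/0 is an invalid operation
        # The sign of the infinity follows the numerator and the signed zero.
        return math.copysign(math.inf, num) * math.copysign(1.0, den)
    return num / den


def masked_divide_eager(num, den, mask, out, trap):
    """Optimized-build analogue of `where (mask) out = num / den`.

    Every quotient is computed first (as vectorized code may do); the mask is
    applied only when storing, so unmasked elements keep their old value.
    """
    quotients = [ieee_divide(n, d, trap) for n, d in zip(num, den)]
    return [q if m else o for q, m, o in zip(quotients, mask, out)]


def masked_divide_guarded(num, den, mask, out, trap):
    """Analogue of evaluating the WHERE body only where the mask is true."""
    return [ieee_divide(n, d, trap) if m else o
            for n, d, m, o in zip(num, den, mask, out)]


if __name__ == "__main__":
    num = [1.0, 2.0, 3.0]
    den = [2.0, 0.0, 4.0]
    mask = [d != 0.0 for d in den]  # the mask excludes the zero denominator
    out = [-1.0, -1.0, -1.0]

    # Default environment: the inf at index 1 is computed but discarded.
    print(masked_divide_eager(num, den, mask, out, trap=False))   # [0.5, -1.0, 0.75]
    # Guarded evaluation never divides by zero, so trapping is harmless.
    print(masked_divide_guarded(num, den, mask, out, trap=True))  # [0.5, -1.0, 0.75]
    # Eager evaluation with trapping enabled aborts, like the -fpe0 UFS build.
    try:
        masked_divide_eager(num, den, mask, out, trap=True)
    except FloatingPointTrap as exc:
        print("trapped:", exc)
```

The `trap` flag is supplied by the caller rather than fixed inside the division routine, which mirrors how FMS inherits the floating-point environment of the UFS executable instead of using its own build flags. Under this analogy, the crash goes away if either the masked-out divisions are never evaluated (roughly what the debug FMS build achieves, or what rewriting the FMS code could guarantee) or traps are not enabled.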

To avoid this issue, building UFS with the CMake flag -DDEBUG=ON would require a debug build of FMS to be available in the spack-stack environment. It would be great to see one provided for the newer FMS version in spack-stack 1.6.0 on Hera and Gaea (#1215).

System:
Reproduced on Hera and Gaea.

Additional context

I tested with my own debug build of FMS, using the same options as the spack-stack Lua module file:
-DGFS_PHYS=ON -DOPENMP=ON -DENABLE_QUAD_PRECISION=ON -DWITH_YAML=OFF -DCONSTANTS=GFS -D32BIT=ON -D64BIT=ON -DFPIC=ON -DUSE_DEPRECATED_IO=ON

and then added the debug flags:
-g -O0 -check -check noarg_temp_created -check nopointer -warn -warn noerrors -fpe0 -ftrapuv
I then unloaded the FMS module and set FMS_ROOT to this build, and the debug UFS-LM4 regression test ran without issue.

Note that because of NOAA-GFDL/FMS#1532, the handling of CMAKE_Fortran_FLAGS_DEBUG follows standard CMake conventions more closely starting with FMS 2024.02. With that version, the FMS CMake build options I used are simply:

CMAKE_FLAGS_FROM_SPACK_LUA="-DGFS_PHYS=ON -DOPENMP=ON -DENABLE_QUAD_PRECISION=ON -DWITH_YAML=OFF -DCONSTANTS=GFS -D32BIT=ON -D64BIT=ON -DFPIC=ON -DUSE_DEPRECATED_IO=ON"
CMAKE_FLAGS="-DCMAKE_BUILD_TYPE=Debug $CMAKE_FLAGS_FROM_SPACK_LUA"
@JustinPerket JustinPerket added the bug Something is not working label Sep 5, 2024
@climbfuji
Collaborator

I don't think we want a blanket debug fms for all applications in the unified environment. That can have real consequences on runtime. I am thinking that the correct course of action is to work with the FMS developers to address this problem by coding it differently and/or providing the correct flags and directives (for FMS and/or the UFS) to prevent this from happening when FMS is compiled in release mode. In the meantime, we can absolutely provide a debug FMS version in addition to the default release fms version on dedicated systems in spack-stack 1.8.0 (which will have [email protected]).

@JustinPerket
Author

JustinPerket commented Sep 5, 2024

Thanks Dom. Unless something else starts using FMS's Monin-Obukhov interface, the issue seems to be limited to GFDL LM4. Perhaps in my upcoming PR I could tweak UFS's CMakeLists.txt to use debug-built FMS libraries only when UFS is also built in debug mode and the LM4 app is selected?

@climbfuji
Collaborator

That's a good idea, yes.
