Changes to build UFS-WM on MacOS platform with clang@15/[email protected] #2551

Open: wants to merge 8 commits into develop


natalie-perlin
Collaborator

natalie-perlin commented Jan 7, 2025

Description:

Updates to build UFS WM on MacOSX platforms, Ventura or Sonoma OS, [email protected], [email protected]
openmpi/5.0.3 (4.1.6 was tested as well) is built as part of spack-stack-1.8.0.

Tested on three MacOS systems:
A: x86_64, Sonoma OS 14.7.2, Xcode 15.4, [email protected], [email protected]
B: M1, Sonoma OS 14.7.2, Xcode 15.4, [email protected], [email protected]
C: (NOAA AWS MacOS instance) M2, Ventura OS 13.6.9, xcode-select command-line tools only; [email protected], [email protected]

Files changed with the options added for MacOS:

  • ./modulefiles/ufs_macosx.gnu.lua (a new file, replacing the old ufs_macosx.gnu)
  • CMakeLists.txt
  • ./tests/compile.sh
  • ./tests/detect_machine.sh
  • ./tests/default_vars.sh
  • ./tests/run_compile.sh
  • ./tests/run_test.sh
  • ./tests/opnReqTest

Running the UFS-WM was tested as part of the UFS-SRW App, which successfully ran a standard community test. The corresponding PR in the UFS-SRW repo:
ufs-community/ufs-srweather-app#1171

TESTING THE BUILD:
NB: Set the path to your local spack-stack environment as the env. variable stackpath in ./modulefiles/ufs_macosx.gnu.lua, and adjust package/module versions if needed.

  1. Using the CMake method:
    i) load the modulefiles:
       module use $PWD/modulefiles
       module load ufs_macosx.gnu
    ii) set the env. variable:
       export CMAKE_FLAGS="-DAPP= ... -DCCPP_SUITES="
    iii) run the ./build.sh script, e.g.:
       ./build.sh 2>&1 | tee log.build.ufs.001

  2. Using the compile script ./tests/compile.sh, e.g.:
       cd ./tests
       ./compile.sh macosx "-DAPP=S2SWA -DCCPP_SUITES=FV3_GFS_v17_coupled_p8,FV3_GFS_v17_coupled_p8_ugwpv1" s2swa.gnu gnu YES NO 2>&1 | tee log.ufs.build_s2swa.txt
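Both methods take the same -DAPP/-DCCPP_SUITES string, where the CCPP suites form a comma-separated list with no spaces. As a minimal sketch, the string can be assembled as shown below. The helper name make_cmake_flags is hypothetical and is not part of the UFS-WM scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of UFS-WM): print
# "-DAPP=<app> -DCCPP_SUITES=<s1>,<s2>,..." for the given app and suites.
make_cmake_flags() {
  local app="$1"
  shift
  # With IFS=',' the quoted "$*" joins the remaining arguments with commas.
  local IFS=','
  printf '%s\n' "-DAPP=${app} -DCCPP_SUITES=$*"
}
```

For example, `export CMAKE_FLAGS="$(make_cmake_flags S2SWA FV3_GFS_v17_coupled_p8 FV3_GFS_v17_coupled_p8_ugwpv1)"` reproduces the flags from the compile.sh example before you run ./build.sh.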

Log files from the MacOS systems, built using the ./compile.sh script:
System A:
MacA.build_log.s2swa.txt
MacA_compile_s2swa.gnu_time.log.txt
MacA.modules.fv3_s2swa.gnu.lua.txt

System C:
MacC.build_log.s2swa.txt
MacC.compile_s2swa.gnu_time.log.txt
MacC.modules.fv3_s2swa.gnu.lua.txt

Priority:

(TBD)

Git Tracking

No regular testing of UFS-WM on MacOS systems is being done

This PR addresses the issues from #2371
and uses the solution proposed there.

UFSWM Blocking Dependencies:

Uses spack-stack-1.8.0 (#2453), except for mapl-2.40.3-esmf-8.6.0, which is required for the current ufs-wm build.

At the moment, all the modules are loaded in the ufs_macosx.gnu.lua file; ufs_common.lua is not used.


Changes

Library Changes/Upgrades:

Directions for building spack-stack

Detailed directions to build spack-stack-1.8.0 with the software versions used in this PR:
https://docs.google.com/document/d/1Z0L7eujZGtyeZRzcgguyZPsZpkwb2Om7UhFqQPVtxnE/edit?usp=sharing


Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • Jet
    • GaeaC5
    • GaeaC6
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

build.sh (outdated, resolved)
@jkbk2004
Collaborator

jkbk2004 commented Jan 7, 2025

@grantfirl new fv3 hash is NOAA-EMC/fv3atm@7d99880

No changes from the develop branch in the build.sh script.
@natalie-perlin
Collaborator Author

> @grantfirl new fv3 hash is NOAA-EMC/fv3atm@7d99880

Successfully tested the build of the code (S2SWA) with the hash 7d99880 on NOAA AWS MacOS instance.

@GeorgeVandenberghe-NOAA
Collaborator

@jkbk2004
Collaborator

@natalie-perlin do you think you can combine ufs_macosx.gnu into ufs_macosx.gnu.lua, similar to https://github.com/ufs-community/ufs-weather-model/blob/develop/modulefiles/ufs_wcoss2.intel.lua? Also, some test results or instructions would be helpful for people using a Mac, even in sequential mode.

@DeniseWorthen
Collaborator

DeniseWorthen commented Jan 13, 2025

@natalie-perlin I've been able to build the stack on my system (M2, Sonoma 14.5) with a few modifications for uwm (instead of srw). I'm having problems actually building the model w/ the stack, though; somewhere in the module loading (?).

I've merged in your branch w/ a few changes like so:

diff --git a/modulefiles/ufs_common.lua b/modulefiles/ufs_common.lua
index 062fa384..63c5b1df 100644
--- a/modulefiles/ufs_common.lua
+++ b/modulefiles/ufs_common.lua
@@ -6,7 +6,7 @@ local ufs_modules = {
   {["jasper"]          = "2.0.32"},
   {["zlib"]            = "1.2.13"},
   {["libpng"]          = "1.6.37"},
-  {["hdf5"]            = "1.14.0"},
+  {["hdf5"]            = "1.14.3"},
   {["netcdf-c"]        = "4.9.2"},
   {["netcdf-fortran"]  = "4.6.1"},
   {["parallelio"]      = "2.5.10"},
diff --git a/modulefiles/ufs_macosx.gnu.lua b/modulefiles/ufs_macosx.gnu.lua
index 62a39b0a..b472c258 100644
--- a/modulefiles/ufs_macosx.gnu.lua
+++ b/modulefiles/ufs_macosx.gnu.lua
@@ -2,7 +2,7 @@ help([[
 loads UFS Model prerequisites for MacOS clang/gcc ("gnu")
 ]])

-prepend_path("MODULEPATH", "/Users/username/spack-stack/spack-stack-1.8.0/envs/ufs-srw-env/install/modulefiles/Core")
+prepend_path("MODULEPATH", "/Users/max/spack-stack-1.8.0/envs/uwm.env.mymacos/install/modulefiles/Core")

 stack_gnu_ver=os.getenv("stack_apple_clang_ver") or "15.0.0"
 load(pathJoin("stack-apple-clang", stack_gnu_ver))
@@ -15,9 +15,9 @@ load(pathJoin("cmake", cmake_ver))

 local ufs_modules = {
   {["jasper"]          = "2.0.32"},
-  {["zlib"]            = "1.2.13"},
+  {["zlib-ng"]         = "2.1.6"},
   {["libpng"]          = "1.6.37"},
-  {["hdf5"]            = "1.14.0"},
+  {["hdf5"]            = "1.14.3"},
   {["netcdf-c"]        = "4.9.2"},
   {["netcdf-fortran"]  = "4.6.1"},
   {["parallelio"]      = "2.6.2"},
@@ -28,7 +28,6 @@ local ufs_modules = {
   {["g2"]              = "3.5.1"},
   {["g2tmpl"]          = "1.13.0"},
   {["ip"]              = "5.0.0"},
-  {["sp"]              = "2.5.0"},
   {["w3emc"]           = "2.10.0"},
   {["gftl-shared"]     = "1.9.0"},
   {["mapl"]            = "2.40.3-esmf-8.6.0"},
@@ -56,10 +55,7 @@ setenv("CMAKE_OSX_SYSROOT","OSX_SYSROOT")

 setenv("CFLAGS"," -Wno-implicit-function-declaration ")

-if mode() == "load" then
-  LmodMsgRaw([===[
-   Please export these env. variables after the module is successfully loaded:
-       > export LDFLAGS+=" -L${libjpeg_turbo_ROOT}/lib -ljpeg -Wl,-rpath,$libjpeg_turbo_ROOT}/lib -L${jasper_ROOT}/lib -ljasper -Wl,-rpath,${jasper_ROOT}/lib -L${libpng_ROOT}/lib -lpng -Wl,-rpath,${libpng_ROOT}/lib "
-  ]===])
-end
+local ldflags=os.getenv("LDFLAGS") or ""
+       setenv("LDFLAGS", ldflags .. " -L${libjpeg_turbo_ROOT}/lib -ljpeg -Wl,-rpath,$libjpeg_turbo_ROOT}/lib -L${jasper_ROOT}/lib -ljasper -Wl,-rpath,${jasper_ROOT}/lib -L${libpng_ROOT}/lib -lpng -Wl,-rpath,${libpng_ROOT}/lib ")
+
 whatis("Description: UFS build environment")

Steps to build:

max:~/ufs_on_osx/ufs_dw$ module use modulefiles/
max:~/ufs_on_osx/ufs_dw$ module load ufs_macosx.gnu
max:~/ufs_on_osx/ufs_dw$ module list

Currently Loaded Modules:
  1) stack-apple-clang/15.0.0  12) snappy/1.1.10           23) crtm-fix/2.4.0.1_emc  34) pflogger/1.14.0
  2) pmix/5.0.1                13) zstd/1.5.2              24) git-lfs/3.4.1         35) pigz/2.8
  3) openmpi/5.0.3             14) c-blosc/1.21.5          25) crtm/2.4.0.1          36) tar/1.34
  4) stack-openmpi/5.0.3       15) netcdf-c/4.9.2          26) g2/3.5.1              37) gettext/0.22.5
  5) curl/8.6.0                16) netcdf-fortran/4.6.1    27) g2tmpl/1.13.0         38) libxcrypt/4.4.35
  6) cmake/3.27.9              17) parallel-netcdf/1.12.3  28) ip/5.0.0              39) sqlite/3.43.2
  7) libjpeg/2.1.0             18) parallelio/2.6.2        29) w3emc/2.10.0          40) python/3.11.7
  8) jasper/2.0.32             19) esmf/8.6.0              30) gftl/1.14.0           41) mapl/2.40.3-esmf-8.6.0
  9) zlib-ng/2.1.6             20) llvm-openmp/18.1.0      31) gftl-shared/1.9.0     42) scotch/7.0.4
 10) libpng/1.6.37             21) fms/2024.02             32) fargparse/1.8.0       43) nccmp/1.9.0.1
 11) hdf5/1.14.3               22) bacio/2.4.1             33) yafyaml/1.4.0         44) ufs_macosx.gnu



max:~/ufs_on_osx/ufs_dw$ cd tests
 ./compile.sh macosx  "-DAPP=DATM-WAV -DDEBUG=ON" cdeps.gnu gnu YES NO 2>&1 | tee cdeps.log

Which gives

max:~/ufs_on_osx/ufs_dw/tests$ ./compile.sh macosx  "-DAPP=DATM-WAV -DDEBUG=ON" cdeps.gnu gnu YES NO 2>&1 | tee cdeps.log
+ SECONDS=0
++ realpath ./compile.sh
+ SCRIPT_REALPATH=/Users/max/ufs_on_osx/ufs_dw/tests/compile.sh
++ dirname /Users/max/ufs_on_osx/ufs_dw/tests/compile.sh
+ MYDIR=/Users/max/ufs_on_osx/ufs_dw/tests
+ readonly MYDIR
+ readonly ARGC=6
+ ARGC=6
+ [[ 6 -lt 2 ]]
+ MACHINE_ID=macosx
+ MAKE_OPT='-DAPP=DATM-WAV -DDEBUG=ON'
+ COMPILE_ID=cdeps.gnu
+ RT_COMPILER=gnu
+ clean_before=YES
+ clean_after=NO
+ BUILD_NAME=fv3_cdeps.gnu
++ cd /Users/max/ufs_on_osx/ufs_dw/tests/..
++ pwd
+ PATHTR=/Users/max/ufs_on_osx/ufs_dw
++ pwd
+ BUILD_DIR=/Users/max/ufs_on_osx/ufs_dw/tests/build_fv3_cdeps.gnu
+ [[ macosx == derecho ]]
+ BUILD_JOBS=8
+ set +x
./compile.sh: line 60: /Users/max/ufs_on_osx/ufs_dw/modulefiles/ufs_macosx.gnu: No such file or directory
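The `set -x` trace above shows how compile.sh maps its positional arguments: MACHINE_ID, MAKE_OPT, COMPILE_ID, RT_COMPILER, clean_before, clean_after. It also shows the values it derives from them, including the file it fails to source at line 60. The sketch below re-derives only those values, so you can predict the path before starting a build. It is a hypothetical helper, not the actual compile.sh code. It assumes PATHTR is the repository root, defaulting to the parent of the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the real compile.sh): re-derive the values that the
# trace above shows compile.sh computing from its positional arguments.
describe_compile_args() {
  if [[ $# -lt 4 ]]; then
    echo "usage: describe_compile_args MACHINE_ID MAKE_OPT COMPILE_ID RT_COMPILER [clean_before] [clean_after]" >&2
    return 1
  fi
  local MACHINE_ID="$1" MAKE_OPT="$2" COMPILE_ID="$3" RT_COMPILER="$4"
  # Repository root: taken from PATHTR if set, else assumed to be ../ of $PWD.
  local pathtr="${PATHTR:-$(cd .. && pwd)}"
  local BUILD_NAME="fv3_${COMPILE_ID}"
  echo "MAKE_OPT=${MAKE_OPT}"
  echo "BUILD_NAME=${BUILD_NAME}"
  echo "BUILD_DIR=$(pwd)/build_${BUILD_NAME}"
  # File that the error above reports as missing for MACHINE_ID=macosx.
  echo "SOURCED_FILE=${pathtr}/modulefiles/ufs_${MACHINE_ID}.${RT_COMPILER}"
}
```

Run from tests/ with the arguments from the trace, it reports a SOURCED_FILE ending in modulefiles/ufs_macosx.gnu, with no .lua suffix. That is the file the error message says is missing.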

@DeniseWorthen
Collaborator

@natalie-perlin I'm not sure what the thumbs up means. Is it for building the stack, or for the actual compile using the stack (which is failing)?

@natalie-perlin
Collaborator Author

> @natalie-perlin I've been able to build on my system (m2, sonoma 14.5) with a few modifications for uwm (instead of srw). I'm having problems actually building w/ the stack though, somewhere in the module loading (?).
>
> ./compile.sh: line 60: /Users/max/ufs_on_osx/ufs_dw/modulefiles/ufs_macosx.gnu: No such file or directory

@DeniseWorthen - thank you for testing!
It looks like the modulefile is expected to be a bash file, not a *.lua modulefile. ./tests/compile.sh does not load the modulefile; it sources it instead, with the following:

   macosx|linux)
    source "${PATHTR}/modulefiles/ufs_${MACHINE_ID}.${RT_COMPILER}"

Keeping ufs_macosx.gnu as a bash file, rather than converting it to a *.lua modulefile (as Jong @jkbk2004 suggested, I think), also solves the issue of environment variables being available in a bash modulefile but not in a *.lua file:
#2551 (comment)
#2551 (comment)
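One way to support both formats is to source a plain bash file when one exists, and otherwise fall back to `module use`/`module load` for a .lua modulefile. This is a sketch of a hypothetical helper, not what compile.sh currently does. It assumes Lmod's `module` command is available in the calling shell; in a non-interactive script, Lmod may need to be initialized first.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: load ufs_<machine>.<compiler> either by sourcing a bash
# file (what compile.sh does today) or, if only a .lua modulefile exists,
# through Lmod, since a Lua modulefile cannot be sourced by bash.
load_ufs_env() {
  local modules_dir="$1" machine_id="$2" compiler="$3"
  local name="ufs_${machine_id}.${compiler}"
  if [[ -f "${modules_dir}/${name}" ]]; then
    # Plain bash file: its exports land directly in the current shell.
    source "${modules_dir}/${name}"
  elif [[ -f "${modules_dir}/${name}.lua" ]]; then
    # Lua modulefile: must go through "module use" + "module load".
    module use "${modules_dir}"
    module load "${name}"
  else
    echo "neither ${name} nor ${name}.lua found in ${modules_dir}" >&2
    return 1
  fi
}
```

The bash-file branch comes first so that the current compile.sh behavior is unchanged wherever such a file exists.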

@natalie-perlin
Collaborator Author

@barlage - Are you using the older UFS model code, or the code from this PR?
The runtime error that you are seeing used to be caused by ./ufs-weather-model/CMakeLists.txt using Fortran as the linker language; it needs to be set to CXX for MacOS to link properly, see: #2371 (comment)

So yes, the previous discussion resolved that issue, but the current PR is the one that implements these changes, in particular to ./ufs-weather-model/CMakeLists.txt.
This PR addresses this issue!

@barlage
Collaborator

barlage commented Jan 14, 2025

@natalie-perlin I also added to the end of the previous comment, i.e., it looks like the CXX linker is being used.

Here's my set-up:

[~/src/models/macos_test/natalie_test]$ gitb
* dev_macosx 22d5dbc5 [origin/dev_macosx] Merge branch 'develop' into dev_macosx
[~/src/models/macos_test/natalie_test]$ gitr
origin  https://github.com/natalie-perlin/ufs-weather-model.git (fetch)
origin  https://github.com/natalie-perlin/ufs-weather-model.git (push)

@natalie-perlin
Collaborator Author

@barlage - Is it possible that other installed libraries are getting in the way? Contrary to the sequence of errors in #2371 (comment), there is no need to install llvm_openmp or to use the -DCMAKE_SHARED_LINKER_FLAGS="${llvm_openmp_ROOT}/lib/libomp.dylib" flag.

@natalie-perlin
Collaborator Author

@barlage - is there a chance to look at the spack-stack-1.8.0 build log and the ufs_model build log (with the BUILD_VERBOSE=1 option), to see the paths and libraries being linked against?

@barlage
Collaborator

barlage commented Jan 14, 2025

> @barlage - It is possible that other additional libraries installed are getting in the way? In contrary to the sequence of errors in #2371 (comment) , there is no need to install llvm_openmp and to use "-DCMAKE_SHARED_LINKER_FLAGS="${llvm_openmp_ROOT}/lib/libomp.dylib" flag.

I did not use any of these previous CMAKE flags.

I'll rebuild ufs_weather_model and upload the build logs.

@natalie-perlin
Collaborator Author

Adding the logs from the SRW runs based on the ATM-only UFS-WM; one machine uses openmpi/4.1.6, the other uses openmpi/5.0.3:

x86_64, Sonoma OS, openmpi/4.1.6: SRW_log.fcst.001.txt
M2, Ventura (NOAA AWS EC2 instance), openmpi/5.0.3: SRW_log.fcst.002.txt

@barlage
Collaborator

barlage commented Jan 14, 2025

> @barlage - is there a chance to look at the spack-stack-1.8.0 build log, and at the ufs_model build log (with the BUILD_VERBOSE=1 option), to see the paths and the libraries are being linked against?

log.build.ufs.002.txt

The spack-stack build log is too big so I put it here.

@natalie-perlin
Collaborator Author

> log.build.ufs.002.txt
>
> The spack-stack build log is too big so I put it here.

Thank you! Let me look through the logs for any clues.

@natalie-perlin
Collaborator Author

@barlage - Thank you for the logs! They look fine and as expected.

I've found the culprit that changes the runtime outcome: an additional LDFLAGS entry, "-Wl,-no_compact_unwind", added at a later stage in my testing.

The flag "-Wl,-no_compact_unwind" suppresses the ld warnings during the build but results in the runtime error. Without it, the linker emits several warnings, but the resulting ufs_model executable works fine.

There may be compiler/linker flags other than "-Wl,-no_compact_unwind" that suppress the numerous linker warnings, but so far the warnings are not causing any issues at runtime.
In the SRW App, even when this flag is used, all binaries other than ufs_model work fine.
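If LDFLAGS in an existing environment already contains the flag, a small sketch like the one below removes it before building and leaves the other entries intact. The helper name is hypothetical. It matches the flag only as a single whitespace-separated word, so a variant written with a space after the comma would not be caught.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop one linker flag (by default
# -Wl,-no_compact_unwind) from a whitespace-separated flags string.
strip_ld_flag() {
  local flags="$1" unwanted="${2:--Wl,-no_compact_unwind}"
  local -a words=() kept=()
  local w
  # Split on whitespace without glob expansion.
  read -r -a words <<< "$flags"
  for w in "${words[@]}"; do
    [[ "$w" == "$unwanted" ]] || kept+=("$w")
  done
  # Re-join the remaining words with single spaces.
  printf '%s\n' "${kept[*]}"
}
```

Usage, before running the build: `export LDFLAGS="$(strip_ld_flag "${LDFLAGS:-}")"`. Note that this also collapses repeated whitespace in LDFLAGS.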

I will update my PR shortly, after trying a couple more things.

The bottom line: ufs_macosx.gnu.lua should contain the following (note that no ldflags_add variable is used):

-- Append -L, -l and -rpath entries for the image libraries to LDFLAGS,
-- but only when all three package roots are known.
local libjpeg_ROOT = os.getenv("libjpeg_turbo_ROOT")
local jasper_ROOT = os.getenv("jasper_ROOT")
local libpng_ROOT = os.getenv("libpng_ROOT")
local ldflags0 = os.getenv("LDFLAGS") or ""

if jasper_ROOT and libpng_ROOT and libjpeg_ROOT then
   local ldflags1 = " -L" .. libjpeg_ROOT .. "/lib -ljpeg -Wl,-rpath," .. libjpeg_ROOT .. "/lib"
   local ldflags2 = " -L" .. jasper_ROOT .. "/lib -ljasper -Wl,-rpath," .. jasper_ROOT .. "/lib"
   local ldflags3 = " -L" .. libpng_ROOT .. "/lib -lpng -Wl,-rpath," .. libpng_ROOT .. "/lib"
   -- ldflags_add is intentionally not used (see the -Wl,-no_compact_unwind
   -- discussion above); referencing an undefined variable here would be nil.
   local ldflags = ldflags0 .. ldflags1 .. ldflags2 .. ldflags3
   setenv("LDFLAGS", ldflags)
end
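This thread also discusses keeping ufs_macosx.gnu as a bash file sourced by compile.sh. For that route, the same LDFLAGS logic might look like the sketch below. The function name is hypothetical. Like the Lua snippet, it assumes the loaded modules set libjpeg_turbo_ROOT, jasper_ROOT and libpng_ROOT.

```shell
#!/usr/bin/env bash
# Hypothetical bash counterpart of the Lua LDFLAGS snippet: append
# -L/-l/-rpath entries for libjpeg, jasper and libpng to LDFLAGS, but only
# when all three *_ROOT variables are set and non-empty.
append_image_lib_ldflags() {
  if [[ -z "${libjpeg_turbo_ROOT:-}" || -z "${jasper_ROOT:-}" || -z "${libpng_ROOT:-}" ]]; then
    return 0   # leave LDFLAGS untouched, as the Lua version does
  fi
  local pair root lib extra=""
  for pair in "${libjpeg_turbo_ROOT}:jpeg" "${jasper_ROOT}:jasper" "${libpng_ROOT}:png"; do
    root="${pair%:*}"   # text before the last ':' is the install prefix
    lib="${pair##*:}"   # text after the last ':' is the library name
    extra+=" -L${root}/lib -l${lib} -Wl,-rpath,${root}/lib"
  done
  export LDFLAGS="${LDFLAGS:-}${extra}"
}
```

As in the Lua version, -Wl,-no_compact_unwind is deliberately not added.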

@GeorgeVandenberghe-NOAA
Collaborator

@natalie-perlin
Collaborator Author

> What does this flag actually do at load time?
> -- George W Vandenberghe, Lynker Technologies at NOAA/NWS/NCEP/EMC

The flag "-Wl,-no_compact_unwind" is used at link time, or rather, it should not be used in this case. Below is the information I found on the warnings generated at the linking stage when this flag is not used. All the issues are related to additional C++ libraries required by ESMF. Note that all ESMF unit tests pass successfully, as they use the mpicxx linker.


The warning "ld: warning: could not create compact unwind" occurs during the linking phase on macOS when the linker (ld) cannot generate compact unwind(*) information for some parts of the code.

(*) Compact unwind information is a data structure used by macOS for efficient stack unwinding during:

  • Exception handling: For languages like C++ or Objective-C that use exceptions.
  • Debugging and profiling: Tools like lldb and Instruments rely on accurate unwind information.

This compact representation is a space-saving alternative to full unwind tables, allowing the system to perform faster and more efficient stack trace analysis.

Why the Warning Occurs:
The warning indicates that for certain parts of the code, the linker cannot create compact unwind information. This can happen for several reasons:

  • Inline Assembly: If the code includes inline assembly or uses constructs that do not conform to macOS’s calling conventions, the linker might struggle to generate compact unwind data.
  • Unsupported Compiler Options: Certain compiler flags or attributes (e.g., __attribute__((naked))) may prevent the generation of unwind tables.
  • Third-Party Libraries: If you're linking against libraries that lack proper unwind information, the linker may issue this warning.
  • Custom Stack Manipulation: Low-level stack manipulation (e.g., via assembly or setjmp/longjmp) can interfere with unwind generation.
  • Mismatched Architectures: Building for an architecture that doesn't fully support compact unwind (e.g., ARM64 vs. x86_64) may result in this issue.
  • Unusual Linker Behavior: Rare cases in which the linker simply cannot resolve certain code paths for compact unwind generation.

What Are the Consequences?

  • For Most Applications: This warning is usually harmless if you’re not relying on exception handling or stack unwinding in the affected code.
  • For Applications Using Exceptions or Debugging: The lack of compact unwind information could lead to:

-- Crashes or undefined behavior when exceptions are thrown.
-- Inaccurate stack traces in debugging tools.

How to Resolve the Warning
Here are the potential fixes, depending on your project:

  • Disable Compact Unwind Tables: If compact unwind information is unnecessary for your project:
    clang++ -Xlinker -no_compact_unwind -o my_program my_program.cpp
    This suppresses the generation of compact unwind tables and removes the warning.
  • Avoid Inline Assembly: Replace any inline assembly with high-level constructs (or ensure the assembly adheres to macOS ABI conventions).
  • Ensure Proper Compiler and Linker Flags: Use consistent flags like -fexceptions or -funwind-tables for parts of the code that require exception handling. Conversely, use -fno-exceptions and -fno-unwind-tables for parts that don’t.
  • Update Third-Party Libraries: Ensure that all libraries linked in your project are up-to-date and compatible with macOS’s unwind system.
  • Debug Verbosely: Use the -v flag during compilation and linking to identify the specific file or symbol causing the issue: clang++ -v -o my_program my_program.cpp
  • Check for Architecture Compatibility: Ensure that all components (source code, libraries, etc.) are built for the same target architecture.
  • Consult Apple Documentation: For low-level details on compact unwind and ABI requirements, consult Apple's macOS ABI documentation.
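To see how many of these warnings a given build produces (for example, when comparing builds with and without -Wl,-no_compact_unwind), you can scan a saved build log. This is a minimal sketch that assumes the warning text appears in the log verbatim as quoted above. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "could not create compact unwind" linker
# warnings in a saved build log (e.g. one written with "| tee log.build.ufs.001").
count_compact_unwind_warnings() {
  local log="$1"
  if [[ ! -r "$log" ]]; then
    echo "cannot read ${log}" >&2
    return 1
  fi
  # grep -c prints 0 but exits 1 when nothing matches; treat that as success.
  grep -c 'could not create compact unwind' "$log" || true
}
```

Usage: `count_compact_unwind_warnings log.build.ufs.001`.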

@barlage
Collaborator

barlage commented Jan 16, 2025

Good news! I just ran the first ever successful UFS simulation on my Mac laptop with the most recent @natalie-perlin commit (55e52e5), the only modification being the stackpath in ufs_macosx.gnu.lua. [as natalie noted above, you will get a ton of warnings in the build]

system: M3/Sonoma 14.7.2/Xcode 15.4/CLT 15.3
spack-stack: 1.8.0, ufs-weather-model-env
model build: -DAPP=ATM -D32BIT=ON -DCCPP_SUITES=FV3_GFS_v17_p8
model sim: C24, 1 day simulation

🥳
[Screenshot: UFSonMac]

@DeniseWorthen
Collaborator

@barlage That's great! I'm inspired...

A question--do you get innumerable pop up windows saying something about allowing incoming connections (something like that) when you run?

@barlage
Collaborator

barlage commented Jan 16, 2025

> @barlage That's great! I'm inspired...
>
> A question--do you get innumerable pop up windows saying something about allowing incoming connections (something like that) when you run?

@DeniseWorthen I didn't get anything like that with this simulation.

@natalie-perlin
Collaborator Author

> Good news! I just ran the first ever successful UFS simulation on my Mac laptop with the most recent @natalie-perlin commit (55e52e5), the only modification being the stackpath in ufs_macosx.gnu.lua. [as natalie noted above, you will get a ton of warnings in the build]

@barlage - thank you so much for confirming it worked well for you!!

@DeniseWorthen
Collaborator

@natalie-perlin I was able to compile on my system after your changes. I will try this weekend to actually run.

@DeniseWorthen
Collaborator

@natalie-perlin I'm now able to build and run the ultra-low resolution (C12-9deg) coupled configuration I've been working on. This is on my M2 Studio, Sonoma 14.5 (12 cores). There are some issues, though....

I ended up rebuilding my spack-stack 1.8, mostly because I ended up w/ two cmake installations and was not able to resolve it (even w/ spack uninstall /hash). In my rebuild, I believe I was able to follow essentially everything in the instructions, w/ a few tweaks to the site/packages to align w/ uwm (e.g., not building metplus).

Here are the issues I'm still seeing:

  1. Using compile.sh to build, I see several messages of this type at the start:

Lmod Warning: Syntax error in file: /Users/max/spack-stack-1.8.0/envs/uwm.env.mymacos/install/modulefiles/apple-clang/15.0.0/stack-openmpi/5.0.3.lua
 with command: setenv, one or more arguments are not strings.

While processing the following module(s):
    Module fullname      Module Filename
    ---------------      ---------------
    stack-openmpi/5.0.3  /Users/max/spack-stack-1.8.0/envs/uwm.env.mymacos/install/modulefiles/apple-clang/15.0.0/stack-openmpi/5.0.3.lua
    ufs_macosx.gnu       /Users/max/ufs_on_osx/ufs_dw/modulefiles/ufs_macosx.gnu.lua

But the compile still succeeds, so I'm not sure what exactly the issue is.

  2. Building in debug mode is very fast, but building in release mode just seems to hang after libfv3atm.a is created. Is there a better (faster) way to compile? The only thing that really works so far is to use -O1 for Release.

  3. I haven't yet succeeded in getting the model to run for more than a few timesteps, which took ~20 mins. @barlage I know you're running standalone, but how long did your 1-day sim take?

I also have available a DATM-S2SW configuration. I'll try w/ that tomorrow.

@natalie-perlin
Collaborator Author

natalie-perlin commented Jan 19, 2025


Denise, thanks for testing!

To answer your questions -

  1. This is an issue with the spack-stack Lmod stack-openmpi/5.0.3.lua module that needs to be fixed. It happens if you already have the module loaded and you load it again. It does not affect the build, so it is just a nuisance at the moment.

  2. Yes, building S2SWA is very slow at the final stage, and it takes longer on some machines than on others, while other configurations build faster on any machine. As the attached Mac[A,C]compile_s2swa_gnu_time.log.txt files show, the difference can be about 4 times:

MacA/x86_64, Sonoma:
Compile s2swa.gnu elapsed time 998 seconds. -DAPP=S2SWA -DCCPP_SUITES=FV3_GFS_v17_coupled_p8,FV3_GFS_v17_coupled_p8_ugwpv1 -DMPI=ON -DCMAKE_BUILD_TYPE=Release

MacC/M2, Ventura:
Compile s2swa.gnu elapsed time 4156 seconds. -DAPP=S2SWA -DCCPP_SUITES=FV3_GFS_v17_coupled_p8,FV3_GFS_v17_coupled_p8_ugwpv1 -DMPI=ON -DCMAKE_BUILD_TYPE=Release

As for testing the configuration builds, I have tested the following:
ATM, ATMW, ATMAERO, CMAQ, NG-GODAS, S2SW, S2SWA

  3. Running fully coupled simulations may need much more CPU power than the 8 CPUs on the systems I tested, vs. hundreds of CPUs on HPCs. It is very likely that a Mac as a Tier 2/3? system could be viewed as a tool for i) onboarding new users and learning to build, run, and test the UFS and the Apps, making this accessible to a wider audience, on par with general Linux (Windows is not yet in the picture); ii) estimating its practical use for different purposes, e.g., for academic community users to test the system before including it in any proposals involving HPC resources; and iii) development purposes: testing, building, and running/checking functionality while traveling or without depending tightly on access to HPC resources... (my 2 cents)
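The "about 4 times" figure follows directly from the two elapsed times quoted in item 2. The sketch below computes it, assuming each *_time.log.txt file contains a line in the format shown there. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. Assumes a compile-time log contains a line like
#   "Compile s2swa.gnu elapsed time 998 seconds. -DAPP=S2SWA ..."
elapsed_seconds() {
  # Print the integer N from the first "elapsed time N seconds" match.
  sed -n 's/.*elapsed time \([0-9][0-9]*\) seconds.*/\1/p' "$1" | head -n 1
}

time_ratio() {
  # Ratio a/b, rounded to two decimal places.
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}
```

With the two times quoted above, `time_ratio 4156 998` prints `4.16`. The M2/Ventura build took a little over four times as long as the x86_64/Sonoma build: roughly 69 minutes versus 17.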

@DeniseWorthen
Collaborator

DeniseWorthen commented Jan 19, 2025

Thanks. The new C12-9deg configuration for S2S is meant to run on only 11 tasks, even on HPCs. It's obviously not designed to test science, but I thought it might be small enough for what I have.

EDIT: I'm not sure what I did differently, but I am now able to run a 1-day C12-9deg coupled configuration

     ENDING DATE-TIME    JAN 19,2025  15:31:40.528   19  SUN   2460695
     PROGRAM ufs       HAS ENDED.
* . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * .
*****************RESOURCE STATISTICS*******************************
The total amount of wall time                        = 0.000000
The total amount of time in user mode                = 23.920812
The total amount of time in sys mode                 = 2.230397
*****************END OF RESOURCE STATISTICS*************************

@DeniseWorthen
Collaborator

DeniseWorthen commented Jan 20, 2025

@natalie-perlin I've also been able to build and run 1 day of my C12-9deg using my M4 (Sequoia 15.2, clang 16.0.0). I did set all the RELEASE flags to -O1 to get it to compile.

@barlage
Collaborator

barlage commented Jan 21, 2025

> I haven't yet succeeded in getting the model to run for more than a few timesteps, which took ~20mins. @barlage I know you're running standalone, but how long did your 1-day sim take?

@DeniseWorthen it seems you've had success, but for completeness, my 1-day C24 ATM run took about 1 minute. I'm using the minimum (I believe) of 7 processors and writing output at every 20-minute time step, but only a few variables.
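For scale: if the 20-minute interval is the model time step, as stated, a 1-day run is 24 * 60 / 20 = 72 steps. About a minute of wall time therefore works out to under a second per step. The arithmetic as a trivial sketch (the helper name is hypothetical):

```shell
#!/usr/bin/env bash
# Number of model time steps for a run of <hours> with a <dt_min>-minute step
# (integer arithmetic; warns if the step does not divide the run evenly).
steps_per_run() {
  local hours="$1" dt_min="$2"
  if (( (hours * 60) % dt_min != 0 )); then
    echo "warning: ${dt_min}-minute step does not divide ${hours} h evenly" >&2
  fi
  echo $(( hours * 60 / dt_min ))
}
```

`steps_per_run 24 20` prints `72`.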

@DeniseWorthen
Collaborator

@barlage Yes, I did reboot my system (for a different reason) but I'm not sure whether that was the "fix".

Also, these are the pop-ups I see:

[Screenshot 2025-01-18 at 1 57 42 PM]

I believe it has something to do w/ code-signing. It seems to be a "sticky" setting for the same executable name, so that if I allow it the first time, I don't get further pop-ups.

@DeniseWorthen
Collaborator

@natalie-perlin The only issue I saw w/ my M4 install was a lot of messages of this type:

  File "/opt/homebrew/Cellar/python@3.13/3.13.1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
OSError: [Errno 9] Bad file descriptor

@natalie-perlin
Collaborator Author

> Using compile.sh to build, I see several messages of this type at the start
> Lmod Warning: Syntax error in file: /Users/max/spack-stack-1.8.0/envs/uwm.env.mymacos/install/modulefiles/apple-clang/15.0.0/stack-openmpi/5.0.3.lua
> with command: setenv, one or more arguments are not strings.

@DeniseWorthen - note that a PR to fix the openmpi module issue has been submitted: JCSDA/spack-stack#1465
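Until that fix is available, one possible workaround in your own scripts is to skip loading a module that Lmod already reports as loaded, using Lmod's `module is-loaded` subcommand. This is a sketch, not part of spack-stack or UFS-WM, and the helper name is hypothetical. It relies on the observation above that the warning appears when stack-openmpi/5.0.3 is loaded a second time. It does not change what compile.sh itself loads.

```shell
#!/usr/bin/env bash
# Hypothetical workaround sketch: load a module only if Lmod does not already
# report it as loaded, to avoid re-loading stack-openmpi/5.0.3.
load_once() {
  local mod="$1"
  if module is-loaded "$mod"; then
    echo "skipping ${mod}: already loaded" >&2
  else
    module load "$mod"
  fi
}
```

Usage: `load_once stack-openmpi/5.0.3` in place of a plain `module load` in scripts that may run after the module is already loaded.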

@DeniseWorthen
Collaborator

@natalie-perlin Not to get ahead of ourselves, but do you have any ideas about the OSError: [Errno 9] Bad file descriptor messages I was seeing w/ 16.0.0?

@natalie-perlin
Collaborator Author

> File "/opt/homebrew/Cellar/python@3.13/3.13.1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/multiprocessing/process.py", line 108, in run
> self._target(*self._args, **self._kwargs)
> OSError: [Errno 9] Bad file descriptor

At which point in preparing the system dependencies for spack-stack, or in the spack-stack installation steps (as outlined in the Google Doc), does this message appear? What system is used (M4, OS ...?, clang 16.0)?

@DeniseWorthen
Collaborator

@natalie-perlin This was for M4 (Sequoia 15.2, clang 16.0.0), basically following the same steps as outlined in the document. The message appears 576 times in the install.log. I've uploaded the log to this location:

https://drive.google.com/file/d/1ivZjq95Wp1Qo1fJHXlVDB61Xgmt9TLfF/view?usp=drive_link


9 participants