
Increased bufr processing times on Orion following Rocky 9 upgrade #608

Open
RussTreadon-NOAA opened this issue Jul 10, 2024 · 17 comments

@RussTreadon-NOAA

GSI issue #771 and spack-stack issue #1166 document an increase in gsi.x wall time on Orion following the Rocky 9 upgrade. The addition of dclock timers to the GSI source code identified the routines which read observations from bufr files as being responsible for the increased wall clock time. The total time for gsi.x to process bufr dump files on Hercules was 168.035 seconds. The Orion run took 608.087 seconds to read the same set of bufr dump files. These timings are from the global GSI ctest.

As a second test, the debufr executable was used to process select gdas bufr dump files for 20240223 00Z. This is the case which the global GSI ctest runs. Five bufr files were processed using `time debufr -c $bufrfile`. Tabulated below are the real times for each $bufrfile as a function of machine; a sketch of the timing loop follows the table.

| $bufrfile | wcoss2 (dogwood) | hera | hercules | orion |
| --- | --- | --- | --- | --- |
| gdas.t00z.ahicsr.tm00.bufr_d | 2m37.066s | 3m13.345s | 2m40.209s | 7m21.408s |
| gdas.t00z.1bamua.tm00.bufr_d | 1m21.915s | 1m35.805s | 1m23.073s | 3m47.932s |
| gdas.t00z.gpsro.tm00.bufr_d | 1m25.025s | 1m42.836s | 1m24.657s | 3m43.182s |
| gdas.t00z.ompsn8.tm00.bufr_d | 0m5.371s | 0m6.574s | 0m5.484s | 0m14.685s |
| gdas.t00z.prepbufr | 5m41.836s | 6m50.757s | 7m45.313s | 22m9.255s |
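
For reference, a minimal sketch of how these per-file timings can be gathered in a loop; the dump directory is taken from the path quoted later in this thread, and the loop structure itself is an assumption:

```bash
#!/bin/bash
# Sketch of the per-file timing test described above (assumed layout).
module load bufr/11.7.0

# Dump directory as quoted later in this thread for 20240223 00Z
DUMPDIR=/work/noaa/rstprod/dump/gdas.20240223/00/atmos

for bufrfile in gdas.t00z.ahicsr.tm00.bufr_d gdas.t00z.1bamua.tm00.bufr_d \
                gdas.t00z.gpsro.tm00.bufr_d gdas.t00z.ompsn8.tm00.bufr_d \
                gdas.t00z.prepbufr; do
  echo "timing ${bufrfile}"
  # -c as used in the original tests; the tabulated values are the
  # "real" times reported by the shell's time keyword
  time debufr -c "${DUMPDIR}/${bufrfile}"
done
```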

Dogwood and Hercules timings are comparable except for prepbufr, where the Hercules debufr time is about 2 minutes greater than Dogwood's. The debufr times on Hera are slightly higher than on Dogwood. The Orion debufr times are roughly 3x to 4x greater than on the other machines.

It is not clear why bufr processing takes longer on Orion. GSI ctests ran noticeably faster on Orion prior to the Rocky 9 upgrade. I do not have debufr timings from Orion prior to the Rocky 9 upgrade.

This issue is opened to document this behavior. Hopefully the cause(s) for the increased run time can be identified and a solution developed to bring Orion bufr processing times in line with other machines.

@RussTreadon-NOAA (Author)

debufr was run in a batch script on each machine. The batch script loads bufr/11.7.0. The following debufr executables were used on each machine (a sketch of such a batch script follows the list):

  • wcoss2 (dogwood): /apps/ops/prod/libs/intel/19.1.3.304/bufr/11.7.0/bin/debufr
  • hera: /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.6.0/envs/gsi-addon-dev-rocky8/install/intel/2021.5.0/bufr-11.7.0-w62mmgj/bin/debufr
  • hercules: /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/gsi-addon-env/install/intel/2021.9.0/bufr-11.7.0-7qdgt6m/bin/debufr
  • orion: /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/gsi-addon-env-rocky9/install/intel/2021.9.0/bufr-11.7.0-clawunp/bin/debufr
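
For concreteness, a sketch of how such a batch script might select the per-machine executable; the scheduler directives and hostname matching are assumptions, while the paths are those listed above:

```bash
#!/bin/bash
#SBATCH --nodes=1           # scheduler directives are placeholders
#SBATCH --time=01:00:00

# Select the machine-specific debufr listed above (hostname matching
# is illustrative); otherwise fall back to the bufr/11.7.0 module.
case $(hostname -f) in
  *orion*)    debufr=/work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/gsi-addon-env-rocky9/install/intel/2021.9.0/bufr-11.7.0-clawunp/bin/debufr ;;
  *hercules*) debufr=/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/gsi-addon-env/install/intel/2021.9.0/bufr-11.7.0-7qdgt6m/bin/debufr ;;
  *hera*)     debufr=/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.6.0/envs/gsi-addon-dev-rocky8/install/intel/2021.5.0/bufr-11.7.0-w62mmgj/bin/debufr ;;
  *)          module load bufr/11.7.0 && debufr=$(command -v debufr) ;;
esac

time "$debufr" -c "$bufrfile"   # $bufrfile supplied by the caller
```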

@RussTreadon-NOAA (Author)

The debufr test was repeated on Orion using the Orion bufr/11.7.0 debufr, the Hercules bufr/11.7.0 debufr, and the Orion bufr/12.0.1 debufr executables. There was no appreciable change in the Orion timings with the different debufr executables.

| $bufrfile | orion bufr/11.7.0 debufr | hercules bufr/11.7.0 debufr | orion bufr/12.0.1 debufr |
| --- | --- | --- | --- |
| gdas.t00z.ahicsr.tm00.bufr_d | 7m16.483s | 7m17.608s | 7m17.237s |
| gdas.t00z.1bamua.tm00.bufr_d | 3m48.376s | 3m49.540s | 3m54.525s |
| gdas.t00z.gpsro.tm00.bufr_d | 3m46.107s | 3m43.427s | 3m44.194s |
| gdas.t00z.ompsn8.tm00.bufr_d | 0m14.483s | 0m14.469s | 0m14.903s |
| gdas.t00z.prepbufr | 22m10.262s | 22m5.952s | 22m5.952s |

@RussTreadon-NOAA (Author)

All of the tests above execute debufr on compute nodes via a batch script. The script was manually rerun on Orion interactive nodes using the Orion bufr/11.7.0 debufr. Interactive node times are comparable to compute node times.

| $bufrfile | interactive orion bufr/11.7.0 debufr |
| --- | --- |
| gdas.t00z.ahicsr.tm00.bufr_d | 7m11.996s |
| gdas.t00z.1bamua.tm00.bufr_d | 3m47.833s |
| gdas.t00z.gpsro.tm00.bufr_d | 3m42.075s |
| gdas.t00z.ompsn8.tm00.bufr_d | 0m14.302s |
| gdas.t00z.prepbufr | 22m9.804s |

@RussTreadon-NOAA (Author)

Compiler sensitivity

Orion Helpdesk suggested building bufr with intel-2023.2.4. NCEPLIBS-bufr develop at bdb497f was cloned on Orion, following the instructions in the README; a sketch of the build sequence is shown below.
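
A sketch of that build sequence, assuming the standard out-of-source CMake workflow from the NCEPLIBS-bufr README (directory names are illustrative):

```bash
# Clone NCEPLIBS-bufr and check out the commit noted above
git clone https://github.com/NOAA-EMC/NCEPLIBS-bufr.git
cd NCEPLIBS-bufr
git checkout bdb497f

# Standard CMake build per the README; ../install is an arbitrary prefix
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=../install ..
make -j4
make install
```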

The initial build had the following modules loaded in the Orion environment:

Currently Loaded Modules:
  1) contrib/0.1     3) rocoto/1.3.7    5) glx/1.4              7) zlib/1.2.13     9) qt/5.15.8
  2) noaatools/3.1   4) git-lfs/3.1.2   6) libxkbcommon/1.4.0   8) sqlite/3.39.4  10) cmake/3.26.3

The compilers /usr/bin/cc and /usr/bin/gfortran were used to build the package. Below are the compiler versions:

orion-login-2:/work/noaa/da/rtreadon/git/bufr/develop/build$ /usr/bin/cc --version
cc (GCC) 11.3.1 20220421 (Red Hat 11.3.1-2)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

orion-login-2:/work/noaa/da/rtreadon/git/bufr/develop/build$ /usr/bin/gfortran --version
GNU Fortran (GCC) 11.3.1 20220421 (Red Hat 11.3.1-2)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The bufr package built and installed without error. The resulting debufr was used to process /work/noaa/rstprod/dump/gdas.20240223/00/atmos/gdas.t00z.gpsro.tm00.bufr_d with the following timing:

orion-login-2:/work/noaa/da/rtreadon/git/bufr/develop/build$ time ../install/bin/debufr -c $gpsbufr

real    2m21.333s
user    2m8.440s
sys     0m12.510s

This is faster than what we previously saw using the bufr/11.7.0 debufr.

The above was repeated with intel-oneapi-compilers/2023.2.4. The modules below were loaded prior to the build; a sketch of the configure step follows the module list.

Currently Loaded Modules:
  1) contrib/0.1     3) rocoto/1.3.7    5) glx/1.4              7) zlib/1.2.13     9) qt/5.15.8     11) intel-oneapi-compilers/2023.2.4
  2) noaatools/3.1   4) git-lfs/3.1.2   6) libxkbcommon/1.4.0   8) sqlite/3.39.4  10) cmake/3.26.3
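
With that module loaded, a sketch of pointing the configure step at the oneAPI (icx/ifx) compilers; the CC/FC environment variables are standard CMake conventions, not taken from this thread:

```bash
module load intel-oneapi-compilers/2023.2.4

# Without these, CMake may again pick up /usr/bin/cc and /usr/bin/gfortran
export CC=icx
export FC=ifx

mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=../install ..
```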

The initial cmake failed with

-- The C compiler identification is IntelLLVM 2023.2.4
-- The Fortran compiler identification is IntelLLVM 2023.2.4
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /apps/spack-managed/gcc-11.3.1/intel-oneapi-compilers-2023.2.4-cuxy653alrskny7hlvpp6hdx767d3xnt/compiler/2023.2.4/linux/bin/icx - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /apps/spack-managed/gcc-11.3.1/intel-oneapi-compilers-2023.2.4-cuxy653alrskny7hlvpp6hdx767d3xnt/compiler/2023.2.4/linux/bin/ifx - skipped
-- 2023.2.4
CMake Error at CMakeLists.txt:16 (message):
  IntelLLVM (OneAPI) version must be 2024.2 or greater

Modify the top-level CMakeLists.txt as follows:

@@ -12,7 +12,7 @@ project(

 if(CMAKE_Fortran_COMPILER_ID STREQUAL "IntelLLVM")
   message(STATUS ${CMAKE_Fortran_COMPILER_VERSION})
-  if(CMAKE_Fortran_COMPILER_VERSION VERSION_LESS 2024.2)
+  if(CMAKE_Fortran_COMPILER_VERSION VERSION_LESS 2023.2)
     message(FATAL_ERROR "IntelLLVM (OneAPI) version must be 2024.2 or greater")
   endif()
 endif()

cmake and make were then rerun successfully. The resulting debufr was used to process the gpsro bufr file. The runtime increased:

orion-login-2:/work/noaa/da/rtreadon/git/bufr/develop_intel2023.2.4/build$ time ../install/bin/debufr -c $gpsbufr

real    4m11.930s
user    3m23.967s
sys     0m47.356s

This test indicates that debufr execution time on Orion varies with the compiler used to build the package. I'll stop here and let bufr and system experts take over.

@jbathegit (Collaborator)

Hi @jack-woollen, do you have an account on Orion, and if so would you mind taking a look at this from the NCEPLIBS-bufr perspective to see in what part(s) of the library code it may be taking the most time to run on that machine vs. the others?

@jack-woollen (Contributor)

Hi @jbathegit, this is what is shown in this and related issues opened concerning the orion slowdown:

  1. for debufr orion runs slower than wcoss2, hera, or hercules, by roughly 2-3 times.
  2. for operational gsi codes orion runs slower than wcoss2, hera, or hercules, by 3-4 times.
  3. compiling bufrlib with gfortran on orion nearly doubles debufr speed.
  4. both bufrlib(7) and bufrlib(12) show similar performance on orion.
  5. the problem became apparent after an OS upgrade.

Given that, I don't think this is solvable in the bufrlib or by using different compilers, although there is clearly sensitivity to compiler performance. It looks more like a disk transfer rate or program interrupt issue in system hardware and/or software. If so, the issue should be visible in other programs besides the gsi.
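
One way to separate a disk-transfer problem from a compute problem would be to stage a file in memory and compare syscall overhead; a hedged sketch (the /dev/shm staging and strace run are assumptions, the file path is from the tests above):

```bash
# Stage the bufr file on memory-backed storage to take disk reads
# out of the picture, then re-time the decode
cp /work/noaa/rstprod/dump/gdas.20240223/00/atmos/gdas.t00z.gpsro.tm00.bufr_d /dev/shm/
time debufr -c /dev/shm/gdas.t00z.gpsro.tm00.bufr_d

# Summarize syscall counts and time; an I/O or interrupt problem
# should show up in sys time and read() statistics rather than user time
strace -c debufr -c /dev/shm/gdas.t00z.gpsro.tm00.bufr_d
```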

@RussTreadon-NOAA (Author)

Nice summary @jack-woollen. I agree that the root cause is not bufrlib. The slowdown in gsi.x bufr processing is a symptom of a hardware, software, environment variable, stack size, etc. change during or after the Orion Rocky 9 upgrade.

@jbathegit (Collaborator)

Thanks @jack-woollen, and I agree that the library itself isn't the culprit here. My thinking had been that maybe if we could pinpoint what part(s) of the library seem to be causing a performance bottleneck on Orion, then that in turn might provide some clues to help the Orion sysadmin folks narrow down what Rocky 9 system settings may need to be tweaked. That was my main motivation for asking :-)

@jack-woollen (Contributor)

@jbathegit One question: why does the cmake require an Intel version of at least 2024.2? Just curious.

if(CMAKE_Fortran_COMPILER_ID STREQUAL "IntelLLVM")
  message(STATUS ${CMAKE_Fortran_COMPILER_VERSION})
  if(CMAKE_Fortran_COMPILER_VERSION VERSION_LESS 2024.2)
    message(FATAL_ERROR "IntelLLVM (OneAPI) version must be 2024.2 or greater")
  endif()
endif()

@jbathegit (Collaborator) commented Jul 11, 2024

As I understand it, that was the earliest version of the OneAPI compiler for which @AlexanderRichert-NOAA was able to get a successful build in the corresponding CI test in #536.

Either way, please note that this particular CMake block only applies to the newer Intel OneAPI (icx/ifx) compiler suite and not to the older classic Intel (icc/ifort) compiler suite. The latter is unchanged.

@jack-woollen (Contributor)

Jeff, it turns out that orion (and hercules) only have the intel compilers listed below available. So to compile 12.1.0 on either, you have to disable the check and not run ctest. I don't think it causes the timing problem, but it could cause something bad at some point. Also, when building 11.7.0 with the default intel compiler on orion, several of the ctests failed. I haven't tracked down the causes of those yet, but something is problematic over there.

/apps/spack-managed/modulefiles/linux-rocky9-x86_64/Core:
  intel-oneapi-advisor/2022.3.1
  intel-oneapi-compilers/2022.0.2
  intel-oneapi-compilers/2022.2.1 (L,D)
  intel-oneapi-compilers/2023.1.0
  intel-oneapi-compilers/2023.2.4
  intel-oneapi-compilers/2024.1.0
  intel-oneapi-inspector/2022.3.1
  intel-oneapi-mkl/2022.2.1
  intel-oneapi-mkl/2023.1.0 (D)
  intel-oneapi-mpi/2021.7.1
  intel-oneapi-vtune/2022.4.1

@jack-woollen (Contributor)

> Either way, please note that this particular CMake block only applies to the newer Intel OneAPI (icx/ifx) compiler suite and not to the older classic Intel (icc/ifort) compiler suite. The latter is unchanged.

@DavidHuber-NOAA Do you know if it is possible to link the older classic Intel (icc/ifort) compiler suite on orion and hercules in order to build bufrlib? And if so, how? Thanks.

@DavidHuber-NOAA

@jack-woollen I believe this is what is done by default when the intel-oneapi-compilers module is loaded. If that is not the case, then I do not know.

@DavidHuber-NOAA

I suspect I may have found an issue with the compilers. The Intel compiler libraries (at least for version 2023.1.0) on Hercules and Orion are bitwise identical. These compilers were compiled for the Ice Lake architecture (according to the documentation in the module file), which is the architecture used in Hercules' processors. Orion has an older architecture, Skylake. Since the compiler was compiled for a newer architecture, it may contain illegal and/or non-optimal instructions for Skylake processors. I will report this to the SAs as well.
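
If the architecture mismatch is the culprit, it can be checked directly on a node; a sketch, where avx512_vbmi2 is one example of an Ice Lake extension that Skylake lacks:

```bash
# Report the node's CPU model and its AVX-512 feature flags
lscpu | grep 'Model name'
grep -o 'avx512[a-z0-9_]*' /proc/cpuinfo | sort -u

# Ice Lake-only extensions such as avx512_vbmi2 should be absent on
# Orion's Skylake nodes; binaries emitting them would fault (SIGILL),
# while merely non-optimal code paths would just run slowly
```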

@RussTreadon-NOAA (Author)

Excellent detective work @DavidHuber-NOAA!

@DavidHuber-NOAA

@jack-woollen I see now that the intel-oneapi-compilers/2023.1.0 module loads the LLVM compilers by default. To get the classic compilers, load the module /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/unified-env-rocky9/install/modulefiles/Core/stack-intel/2021.9.0.lua.
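
For anyone following along, a sketch of loading that modulefile; the `module use` directory is derived from the .lua path above:

```bash
# Make the spack-stack Core modulefiles visible, then load stack-intel,
# which provides the classic icc/ifort 2021.9.0 compilers
module use /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/unified-env-rocky9/install/modulefiles/Core
module load stack-intel/2021.9.0

# Verify the classic compilers are on PATH
which ifort icc && ifort --version
```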

@jack-woollen (Contributor)

@DavidHuber-NOAA Thanks! I'll give it a try.
