Increased bufr processing times on Orion following Rocky 9 upgrade #608
Repeat the tests above on each machine; all the tests above execute.

Compiler sensitivity: the Orion Helpdesk suggested building bufr with a different compiler. The initial build had the following modules loaded in the Orion environment (module list not captured). The bufr package built and installed without error. Using the resulting executable is faster than what we previously saw with the original build. Repeat the above with the alternate compiler: modify the top-level build configuration, then successfully rerun the tests. This test indicates that the Orion timings are sensitive to the compiler used to build bufr.
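A minimal sketch of that rebuild-and-time experiment, assuming an Lmod-style environment; the module names, install prefix, and input file below are placeholders rather than the exact ones used in the test:

```sh
# Rebuild bufr with a chosen compiler module, then time debufr on a dump file.
module purge
module load cmake intel-oneapi-compilers          # compiler module under test
git clone https://github.com/NOAA-EMC/NCEPLIBS-bufr.git
cd NCEPLIBS-bufr && mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=$HOME/bufr-test ..
make -j4 && make install
time $HOME/bufr-test/bin/debufr -c /path/to/gdas.t00z.prepbufr   # placeholder file
```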
Hi @jack-woollen, do you have an account on Orion, and if so would you mind taking a look at this from the NCEPLIBS-bufr perspective to see in what part(s) of the library code it may be taking the most time to run on that machine vs. the others?
Hi @jbathegit. Given what is shown in this and the related issues opened concerning the Orion slowdown, I don't think this is solvable in the bufrlib or by using different compilers, although there is clearly sensitivity to compiler performance. It looks more like a disk transfer rate or program interrupt issue in the system hardware and/or software. If so, the issue should be visible in other programs besides the gsi.
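One way to probe the disk-transfer hypothesis independently of bufrlib is to time a raw sequential read of the same dump files on each machine. A minimal sketch, assuming a generic POSIX environment (the file path is a placeholder):

```sh
# Compare raw read throughput across machines, bypassing bufrlib entirely.
# If Orion is also 3-4x slower here, the bottleneck is I/O, not library code.
bufrfile=/path/to/gdas.t00z.prepbufr      # placeholder path
time dd if=$bufrfile of=/dev/null bs=1M   # sequential read rate
# Repeat on a freshly copied file to reduce page-cache effects.
```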
Nice summary @jack-woollen. I agree that the root cause is not bufrlib.
Thanks @jack-woollen, and I agree that the library itself isn't the culprit here. My thinking had been that maybe if we could pinpoint what part(s) of the library seem to be causing a performance bottleneck on Orion, then that in turn might provide some clues to help the Orion sysadmin folks narrow down what Rocky 9 system settings may need to be tweaked. That was my main motivation for asking :-)
@jbathegit One question: why does the cmake require intel version at least 2024.2? Just curious.

```cmake
if(CMAKE_Fortran_COMPILER_ID STREQUAL "IntelLLVM")
```
As I understand it, that was the earliest version of the OneAPI compiler for which @AlexanderRichert-NOAA was able to get a successful build in the corresponding CI test in #536. Either way, please note that this particular CMake block only applies to the newer Intel OneAPI (icx/ifx) compiler suite and not to the older classic Intel (icc/ifort) compiler suite. The latter is unchanged.
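For anyone hitting the version check, a quick way to see what a given machine actually provides is to query the modules and the compiler itself. A sketch, assuming an Lmod-style environment (module names vary by site):

```sh
# List Intel compiler modules and report the ifx version.
module avail intel 2>&1 | grep -i oneapi   # module avail writes to stderr
module load intel-oneapi-compilers         # assumed module name
ifx --version   # the CMake check requires >= 2024.2 for the IntelLLVM (icx/ifx) suite
```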
Jeff, it turns out that orion (and hercules) only have the intel compilers listed below available. So to compile 12.1.0 on either, you have to disable the check and not run ctest. I don't think it causes the timing problem, but it could cause something bad at some point. Also, when building 11.7.0 with the default intel compiler on orion, several of the ctests failed. I haven't tracked down the causes of those yet, but something is problematic over there.

```
------------ /apps/spack-managed/modulefiles/linux-rocky9-x86_64/Core ------------
(compiler module list not captured)
```
@DavidHuber-NOAA Do you know if it is possible to link the older classic Intel (icc/ifort) compiler suite on orion and hercules in order to build bufrlib? And if so, how? Thanks.
@jack-woollen I believe this is what is done by default when the intel-oneapi-compilers module is loaded. If that is not the case, then I do not know. |
I suspect I may have found an issue with the compilers. The Intel compiler libraries (at least for version 2023.1.0) on Hercules and Orion are bitwise identical. These compilers were built for the Ice Lake architecture (according to the documentation in the module file), which is the architecture used in Hercules' processors. Orion has an older architecture, Skylake. Since the compiler was built for a newer architecture, it may contain illegal and/or non-optimal instructions for Skylake processors. I will report this to the SAs as well.
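The mismatch is easy to confirm from a login node on each machine. A sketch using standard lscpu output (exact model strings will differ):

```sh
# Identify the CPU model on the current node.
lscpu | grep 'Model name'
# avx512vbmi2 is an Ice Lake-era instruction set that Skylake-SP lacks,
# so its absence is a quick tell that the node predates Ice Lake.
lscpu | grep -q avx512vbmi2 && echo "Ice Lake-class AVX-512 present" \
                           || echo "pre-Ice Lake CPU (e.g., Skylake)"
```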
Excellent detective work @DavidHuber-NOAA! |
@jack-woollen I see now that the intel-oneapi-compilers/2023.1.0 module loads the LLVM compilers by default. To get the classic compilers, load the classic-compilers module instead.
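A sketch of how that build might look, assuming the site provides a classic-compilers module (the module name below is a guess; verify with `module avail`):

```sh
# Select the classic Intel compilers, then point CMake at them.
module load intel-oneapi-compilers-classic   # assumed module name
export CC=icc CXX=icpc FC=ifort
cmake -DCMAKE_INSTALL_PREFIX=$HOME/bufr-classic /path/to/NCEPLIBS-bufr
make -j4 && ctest
```

Using ifort also sidesteps the 2024.2 minimum, since that check only applies to the IntelLLVM suite.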
@DavidHuber-NOAA Thanks! I'll give it a try. |
GSI issue #771 and spack-stack issue #1166 document an increase in `gsi.x` wall time on Orion following the Rocky 9 upgrade. The addition of `dclock` timers to the GSI source code identified the routines which read observations from bufr files as being responsible for the increased wall clock time. The total time for `gsi.x` to process bufr dump files on Hercules was 168.035 seconds. The Orion run used 608.087 seconds to read the same set of bufr dump files. These timings are from the global GSI ctest.

As a second test, the `debufr` executable was used to process select gdas bufr dump files for 20240223 00Z. This is the case which the global GSI ctest runs. Five bufr files were processed using `time debufr -c $bufrfile`. Tabulated below are the real times for each `$bufrfile` as a function of machine.

(per-file timing table not captured)
Dogwood and Hercules timings are comparable except for `prepbufr`. The Hercules `debufr` time for `prepbufr` is 2 minutes greater than on Dogwood. The `debufr` times are a bit higher on Hera than on Dogwood. The Orion `debufr` times are about 3x to 4x greater than on the other machines.

It is not clear why bufr processing takes longer on Orion. GSI ctests ran noticeably faster on Orion prior to the Rocky 9 upgrade. I do not have `debufr` timings from Orion prior to the Rocky 9 upgrade.

This issue is opened to document this behavior. Hopefully the cause(s) of the increased run times can be identified and a solution developed to bring Orion bufr processing times in line with the other machines.