Avoid linking to libirc.so in spack (parallel-netcdf), turn off crypt variant for Python, and update Orion site config to fix tar issue #1435

Open
wants to merge 11 commits into
base: develop

climbfuji
Collaborator

@climbfuji climbfuji commented Dec 24, 2024

Summary

  1. Applications built with the spack-stack packages esmf, parallelio, and parallel-netcdf have libirc.so dynamically linked, and applications linked against libirc.so fail to start up. See Avoid linking to Intel's libirc.so library (aka bad configure script of package parallel-netcdf) #1436. The spack PR that accompanies the changes suggested here fixes this by replacing libirc.so with libintlc.so in the parallel-netcdf build. See Bug fix in parallel-netcdf to avoid linking to libirc.so AND cherry-pick spack develop PR 48251 (conflict Intel Classic with [email protected]) spack#495.
  2. Turn off the crypt variant for Python. This variant leads to build errors with Intel in py-cryptography unless the external curl and openssl are removed, which is itself problematic.
  3. Add an external wget on Orion; the latest versions don't build with Intel on that machine.
  4. Also in the spack PR: add a conflict of [email protected] with the Intel Classic compilers. See Bug fix in parallel-netcdf to avoid linking to libirc.so AND cherry-pick spack develop PR 48251 (conflict Intel Classic with [email protected]) spack#495.
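
For anyone who wants to verify item 1 on their own build, here is a minimal sketch that filters `ldd`-style output for libirc.so. The function name `find_libirc` and the application path in the usage line are hypothetical and not part of this PR; the sketch assumes `ldd` and `grep` are available on the machine.

```shell
# Read `ldd`-style output on stdin and print only the lines that reference
# libirc.so. The exit status is 0 if at least one such line is found and
# non-zero otherwise, so it can be used directly in a conditional.
# libintlc.so (the replacement library) does not match this pattern.
find_libirc() {
  grep -E '(^|[[:space:]/])libirc\.so'
}

# Hypothetical usage (the path is a placeholder):
#   ldd /path/to/application | find_libirc && echo "still links libirc.so"
```

An empty result for an application rebuilt against the new environment would be consistent with the fix having taken effect, although it does not by itself prove that the application starts up.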

Testing

Please try to reproduce the problem reported in #1355 with the following environment (I couldn't):

module purge
module use /work/noaa/gmtb/dheinzel/spst-libirc/envs/ue-intel-2021.9.0/install/modulefiles/Core
module load stack-intel/2021.9.0
module load stack-intel-oneapi-mpi/2021.9.0
module load stack-python/3.11.7

In addition to the testing described in JCSDA/spack#495, I built the ufs-weather-model on Orion and ran one of the ATM-only regression tests. It ran to completion, but the results didn't match the baseline. This is expected, because many packages are newer in spack-stack develop than they are in spack-stack-1.6.0, which the UFS still uses.

Applications affected

All

Systems affected

Orion specifically, but basically all that use Intel compilers

Dependencies

Issue(s) addressed

Resolves #1355
Resolves #1436

Checklist

  • This PR addresses one issue/problem/enhancement, or has a very good reason for not doing so.
  • These changes have been tested on the affected systems and applications.
  • All dependency PRs/issues have been resolved and this PR can be merged.

@climbfuji changed the title from "DRAFT Update .gitmodules and submodule pointer for spack for code review an…" to "Avoid linking to libirc.so in spack (parallel-netcdf), update Orion site config" Dec 26, 2024
@climbfuji changed the title from "Avoid linking to libirc.so in spack (parallel-netcdf), update Orion site config" to "Avoid linking to libirc.so in spack (parallel-netcdf), update Orion site config to solve tar issue" Dec 26, 2024
@climbfuji changed the title from "Avoid linking to libirc.so in spack (parallel-netcdf), update Orion site config to solve tar issue" to "Avoid linking to libirc.so in spack (parallel-netcdf), update Orion site config to fix tar issue" Dec 26, 2024
@climbfuji force-pushed the feature/libirc_parallel_netcdf_and_scipy branch from 3365b2a to bc40b8f Compare December 26, 2024 23:09
@climbfuji changed the title from "Avoid linking to libirc.so in spack (parallel-netcdf), update Orion site config to fix tar issue" to "Avoid linking to libirc.so in spack (parallel-netcdf), turn off crypt variant for Python, and update Orion site config to fix tar issue" Dec 26, 2024
@climbfuji self-assigned this Dec 26, 2024
@climbfuji force-pushed the feature/libirc_parallel_netcdf_and_scipy branch from bc40b8f to 96de96a Compare December 27, 2024 14:42
@srherbener
Collaborator

I'm still running into the tar issue with an Intel build:

-- [download 99% complete]
-- [download 100% complete]
-- Checking if /work2/noaa/jcsda/herbener/jedi/build/test_data/3.1.1/fix_REL-3.1.1.2 already exists...
-- Untarring the downloaded file (~2 minutes) to /work2/noaa/jcsda/herbener/jedi/build/test_data/3.1.1
tar: Relink `/apps/spack-managed/gcc-11.3.1/intel-oneapi-compilers-2023.1.0-sb753366rvywq75zeg4ml5k5c72xgj72/compiler/2023.1.0/linux/compiler/lib/intel64_lin/libimf.so' with `/usr/lib64/libm.so.6' for IFUNC symbol `sincosf'
CMake Error at crtm/test/CMakeLists.txt:106 (message):
  Failed to untar the file.

I must have something wrong in my environment. I used this to load modules:

SPACK_STACK_INTEL_ENV=/work/noaa/gmtb/dheinzel/spst-libirc/envs/ue-intel-2021.9.0

# load modules
module purge
module use ${SPACK_STACK_INTEL_ENV}/install/modulefiles/Core
module load stack-intel/2021.9.0
module load stack-intel-oneapi-mpi/2021.9.0
module load stack-python/3.11.7

jedi-host-post-load() {
  module swap git-lfs git-lfs/3.1.2
}

# This is a fix for the issue where the spack-stack-1.8.0 udunits
# module does not get loaded properly. Without this workaround, the
# udunits module from the "spack-managed" gets loaded instead and
# ecbuild on jedi-bundle fails.
#
# Setting LMOD_TMOD_FIND_FIRST gets rid of the default marking
# of modules, and the modification of MODULEPATH makes sure
# that spack-stack-1.8.0 modules are found first before same
# named modules in other directories (ie, "spack-managed")
export LMOD_TMOD_FIND_FIRST=yes
module use $SPACK_STACK_INTEL_ENV/install/modulefiles/intel/2021.9.0

...
# Load JEDI modules
module load jedi-fv3-env
module load jedi-mpas-env
module load ewok-env
module load soca-env

# Optional host-specific post-load procedures
[ $(declare -f -F jedi-host-post-load) ] && jedi-host-post-load; unset -f jedi-host-post-load || echo "No post-load procedures"

@climbfuji
Collaborator Author

Ah well, so this is another library (libm) not libirc. I wonder if we have the same problem and solution in this case (i.e. we should link to something else instead).
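
To narrow down which commands hit this in a given environment, a small filter for the loader message can help. This is only a sketch: the function name `has_relink_error` is made up for illustration, and the pattern is taken from the single error line quoted above, so it assumes other occurrences use the same wording.

```shell
# Read command output on stdin and succeed (exit status 0) if it contains
# a dynamic-loader message of the form
#   Relink `<library>' with `<library>' for IFUNC symbol `<symbol>'
# Nothing is printed; only the exit status is meaningful.
has_relink_error() {
  grep -q -E 'Relink .+ with .+ for IFUNC symbol'
}

# Hypothetical usage, after loading the modules under test
# (the archive name is a placeholder):
#   tar -tf some_archive.tar 2>&1 | has_relink_error && echo "relink error seen"
```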

@climbfuji
Collaborator Author

@srherbener Is the information from my libirc bugfix sufficient for you to look into libm and fix that?

srherbener and others added 3 commits January 14, 2025 11:43
so that both gcc and intel builds will use the external zlib package.
Added config to use the external zlib for the orion Intel build.
@eap eap requested a review from RatkoVasic-NOAA January 21, 2025 17:23
@climbfuji
Collaborator Author

@RatkoVasic-NOAA This PR is now up to date with spack develop and includes the bugfix for building freetype with an external libz but spack-built pkgconfig.

@RatkoVasic-NOAA
Collaborator

Great! I'll test it on Orion.

@climbfuji
Collaborator Author

@srherbener The ectrans build failed for the oneapi-ifx workflow, even though this is oneapi@2024 (but ifx instead of ifort):

CRITICAL - Fortran compiler
/home/ubuntu/spack-stack/CI/actions-runner/_work/spack-stack/spack-stack/spack/lib/spack/env/oneapi/ifx
does not recognise Fortran flag '-fast-transcendentals -fp-model precise
-fp-speculation=safe'
Call Stack (most recent call first):
  /home/ubuntu/spack-stack/CI/actions-runner/_work/spack-stack/spack-stack/envs/ue-oneifx-2024.2.0-buildcache/install/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__s/oneapi/2024.2.0/ecbuild-3.7.2-o24tdwh/share/ecbuild/cmake/ecbuild_add_lang_flags.cmake:116 (ecbuild_critical)

I think we need to fix the ectrans conflict statement, but unfortunately that's not straightforward with the way it is currently written (can't test in a conflict statement if the fortran compiler is ifx, at least not yet). I am trying a few things ...
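
For reference while experimenting, the flag rejection can be probed outside of the ecbuild machinery with a sketch like the one below. The function name `compiler_accepts_flags` is hypothetical, and the check only looks at the compiler's exit status when compiling an empty Fortran program. That is an assumption: ecbuild's own test may be stricter (for example, it may also treat warnings as a rejection), so a pass here does not guarantee that ecbuild accepts the flags.

```shell
# Try to compile an empty Fortran program with the given compiler and flags.
# Usage: compiler_accepts_flags <compiler> <flag>...
# Returns the compiler's exit status (0 means the compile step succeeded),
# or 2 if no temporary directory could be created.
compiler_accepts_flags() {
  local compiler=$1
  shift
  local tmpdir rc
  tmpdir=$(mktemp -d) || return 2
  printf 'program flagcheck\nend program flagcheck\n' > "$tmpdir/flagcheck.f90"
  "$compiler" "$@" -c "$tmpdir/flagcheck.f90" -o "$tmpdir/flagcheck.o" >/dev/null 2>&1
  rc=$?
  rm -rf "$tmpdir"
  return "$rc"
}

# Hypothetical usage with the flags from the log above:
#   compiler_accepts_flags ifx -fast-transcendentals -fp-model precise -fp-speculation=safe \
#     || echo "flags rejected"
```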

@srherbener
Collaborator

@climbfuji thanks for the update and thanks for working on getting ectrans straightened out!

@climbfuji climbfuji requested a review from srherbener January 23, 2025 00:13
@climbfuji climbfuji marked this pull request as ready for review January 23, 2025 00:13
@RatkoVasic-NOAA
Collaborator

The Orion Intel installation went OK. I'll run GNU as well.

Collaborator

@RatkoVasic-NOAA RatkoVasic-NOAA left a comment

Tested on Orion both GNU and Intel. Both installations went OK.

Collaborator

@srherbener srherbener left a comment

I didn't get much more time to work on this and I'm still having some trouble concretizing, but I think that is an issue specific to my environment or something that I am doing wrong, since @RatkoVasic-NOAA was able to successfully build both Intel and GNU environments. For the sake of getting this in for spack-stack-1.9.0, I'm going to assume all is okay with the configuration (due to Ratko's success) and approve.

Labels
None yet
Projects
Status: In Progress
3 participants