
Add new machine: NOAA Ursa (Hera's follow-on) #1297

Open · ulmononian opened this issue Sep 12, 2024 · 22 comments

@ulmononian commented Sep 12, 2024

Is your feature request related to a problem? Please describe.
Hera will be replaced (timeline unknown to me) by Ursa, a dual GPU/CPU machine. I believe it will be a NOAA Tier-1 platform. It looks like it provides the following compiler/MPI modules:

  intel-oneapi-compilers: intel-oneapi-compilers/2024.2.1

  intel-oneapi-mkl: intel-oneapi-mkl/2024.2.1

  intel-oneapi-mpi: intel-oneapi-mpi/2021.13.1

Currently, we have access to an 8-node pre-TDS via Niagara. It has no network connectivity.

Describe the solution you'd like
Need to add a site config for Ursa and install spack-stack there.
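
For context, a tier-1 site config in spack-stack is a directory of YAML files under configs/sites/. A rough sketch of what ursa's directory might contain, modeled on the existing tier-1 sites (file names are typical; contents remain to be worked out on the machine):

configs/sites/tier1/ursa/
  compilers.yaml         # intel-oneapi 2024.2.1 compiler definitions
  packages.yaml          # system-provided externals (cmake, git, ...)
  packages_oneapi.yaml   # oneapi compiler / intel-oneapi-mpi providers
  mirrors.yaml           # local source mirror for the air-gapped pre-TDS
  modules.yaml           # module (lmod) generation settings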

@ulmononian

ufs-wm issue for porting to ursa: ufs-community/ufs-weather-model#2435

@rickgrubin-noaa

This comment is copied from the JCSDA / spack-stack slack channel.


Installing spack-stack @ release/1.8.0 (aka develop) on an air-gapped system

Two hosts -- niagara (internet access) and ursa (air-gapped) -- share a filesystem /path/to/shared so that both hosts can see the shared space.

A set of site-specific files are created for ursa in /path/to/shared/spack-stack/configs/sites/tier1/ursa

On ursa, spack can be initialized and an env can be created, but with default settings, spack cannot concretize the env because clingo cannot be built -- spack is expecting internet access to properly bootstrap itself.

As noted in the latest spack-stack docs, a source mirror is necessary so that the air-gapped system has access to said mirror for subsequent installations.

Following are steps taken to create a local source mirror -- on ursa, per the suggestion to

Create an environment as usual, activate it and run the concretization step (spack concretize), but do not start the installation yet.

With the following setup:

clone spack-stack on niagara to the shared space -- niagara and ursa can see the cloned repo.

  • initialize spack
    • . ./setup.sh
  • create an env
    • spack stack create env --compiler intel --dir /path/to/shared/envs --name unified-dev.ursa.intel --site ursa --template unified-dev
  • activate the env
    • spack env activate -p /path/to/shared/envs/unified-dev.ursa.intel
  • concretize the env
    • spack concretize 2>&1 | tee log.concretize

A first attempt to follow those steps in the latest spack-stack docs resulted in a failure to concretize

$ spack concretize 2>&1 | tee log.concretize
==> Installing gmake-4.4.1-uri5j7dm4txw4fqcdyngq66qlsnvzimp [1/21]
==> No binary for gmake-4.4.1-uri5j7dm4txw4fqcdyngq66qlsnvzimp found: installing from source
==> Error: FetchError: All fetchers failed for spack-stage-gmake-4.4.1-uri5j7dm4txw4fqcdyngq66qlsnvzimp
[...]
==> Error: cannot bootstrap the "clingo" Python module from spec "clingo-bootstrap@spack+python %gcc target=x86_64" due to the following failures:
github-actions-v0.5 raised RuntimeError: The binary index is empty
github-actions-v0.4 raised RuntimeError: The binary index is empty
spack-install raised InstallError: Terminating after first install failure: FetchError: All fetchers failed for spack-stage-gmake-4.4.1-uri5j7dm4txw4fqcdyngq66qlsnvzimp

as spack's github-actions for bootstrapping are enabled by default:

$ spack bootstrap list
Name: github-actions-v0.5 ENABLED

 Type: buildcache

 Info: 
  url: https://mirror.spack.io/bootstrap/github-actions/v0.5
  homepage: https://github.com/spack/spack-bootstrap-mirrors
  releases: https://github.com/spack/spack-bootstrap-mirrors/releases

 Description: 
  Buildcache generated from a public workflow using Github Actions.
  The sha256 checksum of binaries is checked before installation.
   

Name: github-actions-v0.4 ENABLED

 Type: buildcache

 Info: 
  url: https://mirror.spack.io/bootstrap/github-actions/v0.4
  homepage: https://github.com/spack/spack-bootstrap-mirrors
  releases: https://github.com/spack/spack-bootstrap-mirrors/releases

 Description: 
  Buildcache generated from a public workflow using Github Actions.
  The sha256 checksum of binaries is checked before installation.
   

Name: spack-install ENABLED

 Type: install

 Info: 
  url: https://mirror.spack.io

 Description: 
  Specs built from sources downloaded from the Spack public mirror.

Disabling the github-actions:

$ spack bootstrap disable github-actions-v0.5
==> "github-actions-v0.5" is now disabled and will not be used for bootstrapping
$ spack bootstrap disable github-actions-v0.4
==> "github-actions-v0.4" is now disabled and will not be used for bootstrapping

still resulted in a failure to concretize, perhaps (speculating here) because the third source, which installs from the spack public mirror, i.e.

spack-install ENABLED

remains enabled.
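
If that speculation is right, it could be tested by disabling that source as well (a hypothetical check, not tried here):

$ spack bootstrap disable spack-install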

Since the spack-stack repo is shared between niagara and ursa, the spack-install bootstrap method is left ENABLED, and a source bootstrap mirror is created from niagara:

Enable github-actions so that niagara can download bootstrap sources and create a mirror:

$ spack bootstrap enable github-actions-v0.5
$ spack bootstrap enable github-actions-v0.4

$ spack bootstrap mirror /path/to/shared/bootstrap
==> Adding "clingo-bootstrap@spack+python %gcc target=x86_64" and dependencies to the mirror at /path/to/shared/bootstrap/bootstrap_cache
==> Warning: Error while fetching [email protected]
 All fetchers failed for spack-stage-curl-8.7.1-gzgzlqxiqb23zoef5lnzkf74v5bvnz2b
==> Adding "[email protected]: %gcc target=x86_64" and dependencies to the mirror at /path/to/shared/bootstrap/bootstrap_cache
==> Adding "[email protected]: %gcc target=x86_64" and dependencies to the mirror at /path/to/shared/bootstrap/bootstrap_cache
==> Adding "gnuconfig" and dependencies to the mirror at /path/to/shared/bootstrap/bootstrap_cache

To register the mirror on the platform where it's supposed to be used, move "/path/to/shared/bootstrap" to its final location and run the following command(s):

 % spack bootstrap add --trust local-sources <final-path>/metadata/sources

NB: it's not clear to me whether the failure to fetch [email protected] will eventually prove fatal.

Note that moving "/path/to/shared/bootstrap" to its final location, as suggested, isn't necessary, since it's already in a space shared by niagara and ursa.

Now on ursa:

$ spack bootstrap add --trust local-sources /path/to/shared/bootstrap/metadata/sources
==> New bootstrapping source "local-sources" added in the "user" configuration scope
==> "local-sources" is now enabled for bootstrapping

$ spack bootstrap list
Name: local-sources ENABLED

  Type: install

  Info: 
    url: ../../bootstrap_cache

  Description: 
    Mirror with software needed to bootstrap Spack
[...]

$ spack -b find
-- linux-centos7-x86_64 / gcc@[...] ----------------------------
[...]  clingo-bootstrap@spack  [...]  [...]
==> 4 installed packages

# disable github-actions
$ spack bootstrap disable github-actions-v0.5
==> "github-actions-v0.5" is now disabled and will not be used for bootstrapping
$ spack bootstrap disable github-actions-v0.4
==> "github-actions-v0.4" is now disabled and will not be used for bootstrapping

# verify that the only bootstrapping source available is "local-sources"
$ cat ~/.spack/bootstrap.yaml 
bootstrap:
  trusted:
    github-actions-v0.5: false
    github-actions-v0.4: false
    local-sources: true
  sources:
  - name: local-sources
    metadata: /path/to/shared/bootstrap/metadata/sources

# also make sure that ~/.spack/linux/compilers.yaml DOES NOT EXIST
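# e.g. (one way to do so; not from the original log):
$ rm -f ~/.spack/linux/compilers.yaml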

Next, create and activate the env as noted above, and concretize:

$ spack stack create env [...]
$ spack env activate [...]
$ spack concretize 2>&1 | tee log.concretize
==> Warning: cannot detect libc from intel@=2024.2.1. The compiler will not be used during concretization.
==> Error: concretization failed for the following reasons:

  1. No valid value for variant 'mpi' of package 'esmf'
  2. Cannot set the required compiler: esmf%intel

Poking around online, reading through a related spack issue, and examining the output of commands requested there by spack developers:

$ /usr/bin/ldd --version
ldd (GNU libc) 2.34
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

$ /lib64/ld-linux-x86-64.so.2 --version
ld.so (GNU libc) stable release version 2.34.
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

so it appears that libc is available on the host. Also per suggestions in the issue, output from

$ spack python

for c in spack.compilers.all_compilers():
  print(c)
  print(c.compiler_verbose_output)
  print()

is voluminous, and can be made available if desired.

The compilers.yaml file for ursa:

$ cat configs/sites/tier1/ursa/compilers.yaml

compilers:
- compiler:
    spec: [email protected]
    paths:
      cc: /apps/oneapi/compiler/2022.0.2/linux/bin/intel64/icx
      cxx: /apps/oneapi/compiler/2022.0.2/linux/bin/intel64/icpc
      f77: /apps/oneapi/compiler/2022.0.2/linux/bin/intel64/ifort
      fc: /apps/oneapi/compiler/2022.0.2/linux/bin/intel64/ifort
    flags: {}
    operating_system: rocky9
    modules:
    - intel-oneapi-compilers/2024.2.1
    environment:
      prepend_path:
        PATH: '/usr/bin'
        LD_LIBRARY_PATH: '/usr/lib:/usr/lib64'
        CPATH: '/usr/include/c++/11'
    extra_rpaths: []
- compiler:
    spec: [email protected]
    paths:
      cc: /usr/bin/gcc
      cxx: /usr/bin/g++
      f77: /usr/bin/gfortran
      fc: /usr/bin/gfortran
    flags: {}
    operating_system: rocky9
    modules: []
    environment: {}
    extra_rpaths: []

There is not a system-provided module for GNU compilers.

Ideas?

@climbfuji commented Sep 16, 2024

Way too much for me to digest, but I will point out that your compiler config is wrong. Your paths (cc etc.) point to 2022.0.2, but you are using @2024.2.1. Also, you use icx for cc but icpc for cxx - why?

The convention (that works) is:

  1. If using icc, icpc, ifort (or ifx, but stay away from it for now), then the compiler is called intel@ and the version is determined by running icc --version - it is NOT the oneapi version.
  2. If using icx, icpx, ifort (or ifx, but stay away from it for now), then the compiler is called oneapi@ and the version is determined by running icx --version - which usually, but maybe not always, coincides with the oneapi version.
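
Following convention 2, a corrected compilers.yaml entry for ursa might look like the sketch below (paths are illustrative, not verified on the host -- check with which icx after loading the module):

compilers:
- compiler:
    spec: [email protected]
    paths:
      cc: /apps/oneapi/compiler/2024.2.1/bin/icx      # new C compiler
      cxx: /apps/oneapi/compiler/2024.2.1/bin/icpx    # new C++ compiler
      f77: /apps/oneapi/compiler/2024.2.1/bin/ifort   # classic Fortran, last supported version
      fc: /apps/oneapi/compiler/2024.2.1/bin/ifort
    operating_system: rocky9
    modules:
    - intel-oneapi-compilers/2024.2.1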

@rickgrubin-noaa

Thanks very much for pointing out the errors in compilers.yaml -- not sure how I crossed paths, as it were, but they are certainly incorrect.

And it's helpful to learn about the newer Intel compilers -- this is my first chance to use them.

@climbfuji

Might be a good idea to go straight to icx/icpx/ifort on ursa!

@rickgrubin-noaa

niagara (internet access) and ursa (air-gapped) share a filesystem /path/to/shared so that both hosts can see the shared space.

  • clone spack-stack on niagara to the shared space -- niagara and ursa can see the cloned repo.

  • ursa provides the [email protected] compiler suite, so add:

    • spack-stack/configs/sites/tier1/ursa/packages_oneapi.yaml
  • create source and binary bootstrap mirrors on niagara:

$ spack bootstrap mirror --binary-packages /path/to/shared/bootstrap
==> Adding "clingo-bootstrap@spack+python %gcc target=x86_64" and dependencies to the mirror at /path/to/shared/bootstrap/bootstrap_cache
==> Warning: Error while fetching [email protected]
  All fetchers failed for spack-stage-curl-8.7.1-gzgzlqxiqb23zoef5lnzkf74v5bvnz2b
==> Adding "[email protected]: %gcc target=x86_64" and dependencies to the mirror at /path/to/shared/bootstrap/bootstrap_cache
==> Adding "[email protected]: %gcc target=x86_64" and dependencies to the mirror at /path/to/shared/bootstrap/bootstrap_cache
==> Adding "gnuconfig" and dependencies to the mirror at /path/to/shared/bootstrap/bootstrap_cache
==> Adding binary packages from "https://github.com/spack/spack-bootstrap-mirrors/releases/download/v0.4/bootstrap-buildcache.tar.gz" to the mirror at /path/to/shared/bootstrap/bootstrap_cache

To register the mirror on the platform where it's supposed to be used, move "/path/to/shared/bootstrap" to its final location and run the following command(s):

  % spack bootstrap add --trust local-sources <final-path>/metadata/sources
  % spack bootstrap add --trust local-binaries <final-path>/metadata/binaries

NB: it's not clear to me whether the failure to fetch [email protected] will eventually prove fatal.

Note that moving "/path/to/shared/bootstrap" to its final location, as suggested, isn't necessary, since it's already in a space shared by niagara and ursa.

Now on ursa:

# disable github-actions
$ spack bootstrap disable github-actions-v0.5
==> "github-actions-v0.5" is now disabled and will not be used for bootstrapping
$ spack bootstrap disable github-actions-v0.4
==> "github-actions-v0.4" is now disabled and will not be used for bootstrapping

$ spack bootstrap add --trust local-sources /path/to/shared/bootstrap/metadata/sources
==> New bootstrapping source "local-sources" added in the "user" configuration scope
==> "local-sources" is now enabled for bootstrapping

$ spack bootstrap add --trust local-binaries /path/to/shared/bootstrap/metadata/binaries
==> New bootstrapping source "local-binaries" added in the "user" configuration scope
==> "local-binaries" is now enabled for bootstrapping

$ spack -b find
-- linux-centos7-x86_64 / gcc@[...] ----------------------------
[...]  clingo-bootstrap@spack  [...]  [...]
==> 4 installed packages

$ spack bootstrap list
Name: local-binaries ENABLED

  Type: buildcache

  Info: 
    url: ../../bootstrap_cache
    homepage: https://github.com/spack/spack-bootstrap-mirrors
    releases: https://github.com/spack/spack-bootstrap-mirrors/releases
    tarball: https://github.com/spack/spack-bootstrap-mirrors/releases/download/v0.4/bootstrap-buildcache.tar.gz

  Description: 
Buildcache copied from a public tarball available on Github. The sha256 checksum of binaries is checked before installation.

Name: local-sources ENABLED

  Type: install

  Info: 
    url: ../../bootstrap_cache

  Description: 
    Mirror with software needed to bootstrap Spack

Name: spack-install ENABLED

  Type: install

  Info: 
    url: https://mirror.spack.io

  Description: 
    Specs built from sources downloaded from the Spack public mirror.
    

Name: github-actions-v0.5 DISABLED

Name: github-actions-v0.4 DISABLED

$ cat ~/.spack/bootstrap.yaml
bootstrap:
  trusted:
    github-actions-v0.5: false
    github-actions-v0.4: false
    local-sources: true
    local-binaries: true
  sources:
  - name: local-binaries
    metadata: /collab1/data/Richard.Grubin/contrib/spack-stack-rocky9/bootstrap/metadata/binaries
  - name: local-sources
    metadata: /collab1/data/Richard.Grubin/contrib/spack-stack-rocky9/bootstrap/metadata/sources

# also make sure that ~/.spack/linux/compilers.yaml DOES NOT EXIST

Create the unified-dev env:

  • initialize spack
    • . ./setup.sh
  • create an env
    • spack stack create env --compiler oneapi --dir /path/to/shared/envs --name unified-dev.ursa.oneapi --site ursa --template unified-dev
  • activate the env
    • spack env activate -p /path/to/shared/envs/unified-dev.ursa.oneapi
  • concretize the env
    • spack concretize 2>&1 | tee log.concretize

Concretization is successful.

@rickgrubin-noaa

Following on with successful concretization:

spack mirrors for air-gapped systems

Steps 1, 2 are complete.

Step 3: git-lfs and python3 (v3.12.4) are available (via modules in a shared space, irrespective of internet connectivity). Compiler modules are different, and don't seem necessary, so they are not loaded.

Step 4: on niagara (connected host), in the same spack-stack repo as is available on ursa (air-gapped host), the air-gapped env is created and populated as described.

Step 5: no need to rsync (or similar) the source mirror, as it lives on a filesystem shared by both hosts.

Step 6: the source mirror is added and is listed at the top of output from spack mirror list
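
The add step was presumably of this form (mirror name and path illustrative):

$ spack mirror add local-source file:///path/to/shared/source_mirror
$ spack mirror list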

spack install begins but fails attempting to install libtirpc; a logfile is attached. It's noteworthy that spack-stack@release/1.8.0 with appropriate site config files appears to try to do "the right thing."

It is noted that ursa is likely not fully built out at this point, and that may be at play here. A help desk request has been submitted to install some utilities that spack-stack needs.

See: https://github.com/rickgrubin-noaa/spack-stack/tree/JCSDA/1297/configs/sites/tier1/ursa for details.

spack install logfile: log.install.txt

@rickgrubin-noaa

Making progress thanks to RDHPCS folks installing some requested tools / software.

CRTM failed to build; it requires fix files:

[...]
  crtm:
    #require: '+fix'

and concretizes as such:

==> Concretized [email protected]%oneapi
 -   nemiyuu  [email protected]%[email protected]+fix~ipo build_system=cmake build_type=Release generator=make arch=linux-rocky9-zen3
[...]
 -   vafxwrx      ^[email protected]_emc%[email protected]+big_endian~little_endian+netcdf build_system=generic arch=linux-rocky9-zen3

and attempts to install as:

==> Installing crtm-fix-2.4.0.1_emc-vafxwrxlqqbzmcg4ngigso3hgul7xzs7 [29/343]
==> No binary for crtm-fix-2.4.0.1_emc-vafxwrxlqqbzmcg4ngigso3hgul7xzs7 found: installing from source
==> Error: FetchError: All fetchers failed for spack-stage-crtm-fix-2.4.0.1_emc-vafxwrxlqqbzmcg4ngigso3hgul7xzs7

When creating the air-gapped source mirror:

[...]
==> Adding package [email protected]_emc to mirror
==> Warning: Error while fetching [email protected]_emc
  All fetchers failed for spack-stage-crtm-fix-2.4.0.1_emc-vafxwrxlqqbzmcg4ngigso3hgul7xzs7

What must I do with respect to access for [email protected]_emc so that I can add it to the source mirror?

@RatkoVasic-NOAA

@rickgrubin-noaa I got the same thing on Hera. I also created a mirror (from Hercules) and placed it here:
/contrib/spack-stack/mirror/
and changed the site/mirrors.yaml file to:
(/contrib/spack-stack/spack-stack-1.8.0/envs/ue-intel-2021.5.0/site/mirrors.yaml)

mirrors:
  local-source:
    fetch:
      url: file:///contrib/spack-stack/mirror
      access_pair:
      - null
      - null
      access_token: null
      profile: null
      endpoint_url: null
    push:
      url: file:///contrib/spack-stack/mirror
      access_pair:
      - null
      - null
      access_token: null
      profile: null
      endpoint_url: null

It concretized like this:
(/contrib/spack-stack/spack-stack-1.8.0/envs/ue-intel-2021.5.0/log.concretize)

==> Concretized [email protected]%intel
 -   fvuseal  [email protected]%[email protected]+fix~ipo build_system=cmake build_type=Release generator=make arch=linux-rocky8-haswell
 -   al2juhz      ^[email protected]_emc%[email protected]+big_endian~little_endian+netcdf build_system=generic arch=linux-rocky8-haswell

and installed like this:
(/contrib/spack-stack/spack-stack-1.8.0/envs/ue-intel-2021.5.0/log.install)

==> Installing crtm-fix-2.4.0.1_emc-al2juhzkilqssvyg75jsni664uhcknls [22/291]
==> No binary for crtm-fix-2.4.0.1_emc-al2juhzkilqssvyg75jsni664uhcknls found: installing from source
==> Fetching file:///contrib/spack-stack/mirror/_source-cache/archive/6e/6e4005b780435c8e280d6bfa23808d8f12609dfd72f77717d046d4795cac0457.tgz
==> No patches needed for crtm-fix
==> crtm-fix: Executing phase: 'install'

After that crtm-fix-2.4.0.1_emc created needed fix files.
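
If only the one package's source is missing from an existing mirror, a targeted top-up along these lines (run from a host that can actually fetch it, e.g. Hercules as above; path illustrative) should add just that tarball:

$ spack mirror create -d /contrib/spack-stack/mirror crtm-fix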

@rickgrubin-noaa commented Sep 20, 2024

pnetcdf fails to install; I'm quite confused:

==> Installing parallel-netcdf-1.12.3-oirtvs5yjc232ab7qajgvo5chycpmd7e [27/343]
==> No binary for parallel-netcdf-1.12.3-oirtvs5yjc232ab7qajgvo5chycpmd7e found: installing from source
==> Fetching file:///collab1/data/Richard.Grubin/git/EPIC/spack-stack/spack/var/spack/environments/air_gapped_mirror_env/mirror/_source-cache/archive/43/439e359d09bb93d0e58a6e3f928f39c2eae965b6c97f64e67cd42220d6034f77.tar.gz
==> No patches needed for parallel-netcdf
==> parallel-netcdf: Executing phase: 'autoreconf'
==> parallel-netcdf: Executing phase: 'configure'
[...]
configure: error: Directory '/apps/spack-2024-09/linux-rocky9-x86_64/oneapi-2024.2.1/intel-oneapi-mpi-2021.13.1-ss72gbndvat3oz22sa6lhmlbjkeabrn4/mpi/2021.13.1' specified in --with-mpi does not exist or is not a directory

and indeed that directory doesn't exist:

$ file /apps/spack-2024-09/linux-rocky9-x86_64/oneapi-2024.2.1/intel-oneapi-mpi-2021.13.1-ss72gbndvat3oz22sa6lhmlbjkeabrn4/mpi/2021.13.1
/apps/spack-2024-09/linux-rocky9-x86_64/oneapi-2024.2.1/intel-oneapi-mpi-2021.13.1-ss72gbndvat3oz22sa6lhmlbjkeabrn4/mpi/2021.13.1: cannot open `/apps/spack-2024-09/linux-rocky9-x86_64/oneapi-2024.2.1/intel-oneapi-mpi-2021.13.1-ss72gbndvat3oz22sa6lhmlbjkeabrn4/mpi/2021.13.1' (No such file or directory)

but deleting the trailing .1 from that path yields a directory that does exist:

$ file /apps/spack-2024-09/linux-rocky9-x86_64/oneapi-2024.2.1/intel-oneapi-mpi-2021.13.1-ss72gbndvat3oz22sa6lhmlbjkeabrn4/mpi/2021.13
/apps/spack-2024-09/linux-rocky9-x86_64/oneapi-2024.2.1/intel-oneapi-mpi-2021.13.1-ss72gbndvat3oz22sa6lhmlbjkeabrn4/mpi/2021.13: setgid, directory

packages_oneapi.yaml contains:

packages:
  all:
    compiler:: [[email protected]]
    providers:
      mpi:: [[email protected]]
      # Remove the next three lines to switch to intel-oneapi-mkl
      #blas:: [openblas]
      #fftw-api:: [fftw]
      #lapack:: [openblas]
  mpi:
    buildable: False
  intel-oneapi-mpi:
    externals:
    - spec: [email protected]%[email protected]
      modules:
      - intel-oneapi-mpi/2021.13.1
      prefix: /apps/spack-2024-09/linux-rocky9-x86_64/oneapi-2024.2.1/intel-oneapi-mpi-2021.13.1-ss72gbndvat3oz22sa6lhmlbjkeabrn4

which I think is correct.

$ module show intel-oneapi-mpi/2021.13.1 
-----------------------------------------------------------------------------------------------------------------------------------------------------------
   /apps/spack/modules/linux-rocky9-x86_64/Core/intel-oneapi-mpi/2021.13.1.lua:
-----------------------------------------------------------------------------------------------------------------------------------------------------------
whatis("Name : intel-oneapi-mpi")
whatis("Version : 2021.13.1")
[...]
family("mpi")
prepend_path("MODULEPATH","/apps/spack-2024-09/modules/linux-rocky9-x86_64/intel-oneapi-mpi/2021.13.1-aalgxls/Core")
setenv("LMOD_MPI_NAME","intel-oneapi-mpi")
setenv("LMOD_MPI_VERSION","2021.13.1-aalgxls")
[...]
setenv("I_MPI_ROOT","/apps/spack-2024-09/linux-rocky9-x86_64/gcc-11.4.1/intel-oneapi-mpi-2021.13.1-aalgxlsn6mj2wsdebjjv4fl4duddijzr/mpi/2021.13")

so I_MPI_ROOT is the existing directory shown above; spack, however, is choosing to use the directory with the trailing .1, .../mpi/2021.13.1.

Am I configuring something incorrectly?

@RatkoVasic-NOAA

You might try to fool spack in packages_oneapi.yaml:

packages:
  all:
    compiler:: [[email protected]]
    providers:
      mpi:: [[email protected]]
      # Remove the next three lines to switch to intel-oneapi-mkl
      #blas:: [openblas]
      #fftw-api:: [fftw]
      #lapack:: [openblas]
  mpi:
    buildable: False
  intel-oneapi-mpi:
    externals:
    - spec: [email protected]%[email protected]
      modules:
      - intel-oneapi-mpi/2021.13
      prefix: /apps/spack-2024-09/linux-rocky9-x86_64/oneapi-2024.2.1/intel-oneapi-mpi-2021.13.1-ss72gbndvat3oz22sa6lhmlbjkeabrn4
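
If the module-name trick doesn't change the derived path -- spack's intel-oneapi-mpi package appears to compute the MPI root as <prefix>/mpi/<spec version>, which would explain the trailing .1 -- another dodge might be to shave the patch level off the spec version instead (untested sketch; module and prefix as in the original):

  intel-oneapi-mpi:
    externals:
    - spec: [email protected]%[email protected]   # version shortened so <prefix>/mpi/2021.13 matches the on-disk directory
      modules:
      - intel-oneapi-mpi/2021.13.1
      prefix: /apps/spack-2024-09/linux-rocky9-x86_64/oneapi-2024.2.1/intel-oneapi-mpi-2021.13.1-ss72gbndvat3oz22sa6lhmlbjkeabrn4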

@rickgrubin-noaa

Update through 2024-10-04

OS

  • Rocky Linux 9.4 (Blue Onyx)

Compilers

  • [email protected] (latest version)
  • icx
  • ifort
    • this is the last supported version of ifort
    • will also attempt to build a stack with ifx
  • icpx

MPI

Currently waiting on a few RPMs (particularly related to X11) to be installed on the OS in order to satisfy packages that require x11-devel.

@RaghuReddy-NOAA

@rickgrubin-noaa
Please NOTE: Even though the pre-TDS does not have external connectivity, it is possible to do downloads by setting the following two environment variables to use the proxy:

export http_proxy=http://10.181.2.65:3128
export https_proxy=http://10.181.2.65:3128

Not all downloads may work, but the majority of them do with the above setting. Anything that uses HTTP/HTTPS should work fine. I just wanted to let you know!

We are still looking at the X11 issue above.

@rickgrubin-noaa commented Oct 8, 2024 via email

@ulmononian

@rickgrubin-noaa Please NOTE: Even though the pre-TDS does not have external connectivity, it is possible to do downloads by setting the following two environment variables to use the proxy:

export http_proxy=http://10.181.2.65:3128
export https_proxy=http://10.181.2.65:3128

Not all downloads may work, but the majority of them do with the above setting. Anything that uses HTTP/HTTPS should work fine. I just wanted to let you know!

We are still looking at the X11 issue above.

Thanks for looking into the X11 issue, @RaghuReddy-NOAA. It sounds like it would be helpful if this could be installed at the OS level, to keep things consistent with other tier-1 machines and allow us to maintain spack-stack configurations on ursa as we do on the other platforms. Please let us know whether you will be able to install this into the OS for ursa.

@rickgrubin-noaa

Currently blocked; I do not have a workaround for the following.

A source mirror exists and is used, and a number of packages that do their work (primarily patching) via the internet have been worked around; I don't see a way around this one at the moment:

configs/common/packages.yaml

[...]
py-cryptography:
  require: '+rust_bootstrap'

log.install

[...]
==> rust-bootstrap: Successfully installed rust-bootstrap-1.78.0-fcejx635hwjkgk7mbu56n6yfrjsjk74i
  Stage: 41.27s.  Install: 8.28s.  Post-install: 0.91s.  Total: 50.51s

[...]

==> Installing py-cryptography-38.0.1-fxoomsm4krjabklnvqycdlplevq7656x [284/341]
==> No binary for py-cryptography-38.0.1-fxoomsm4krjabklnvqycdlplevq7656x found: installing from source
[...]
running build_rust
 cargo rustc --lib --message-format=json-render-diagnostics --manifest-path src/rust/Cargo.toml --release -v --features pyo3/abi3-py36 pyo3/extension-module --crate-type cdylib --
   Updating crates.io index
 warning: spurious network error (3 tries remaining): [56] Failure when receiving data from the peer (CONNECT tunnel failed, response 403)
 warning: spurious network error (2 tries remaining): [56] Failure when receiving data from the peer (CONNECT tunnel failed, response 403)
 warning: spurious network error (1 tries remaining): [56] Failure when receiving data from the peer (CONNECT tunnel failed, response 403)
 error: failed to get `asn1` as a dependency of package `cryptography-rust v0.1.0 (/collab1/data/Richard.Grubin/git/EPIC/spack-stack/cache/build_stage/spack-stage-py-cryptography-38.0.1-fxoomsm4krjabklnvqycdlplevq7656x/spack-src/src/rust)`

 Caused by:
  download of config.json failed

 Caused by:
  failed to download from `https://index.crates.io/config.json`

 Caused by:
  [56] Failure when receiving data from the peer (CONNECT tunnel failed, response 403)
=============================DEBUG ASSISTANCE=============================
[...]
   Python: 3.11.7
   platform: Linux-5.14.0-427.24.1.el9_4.x86_64-x86_64-with-glibc2.34
   pip: 23.1.2
   setuptools: 63.4.3
   setuptools_rust: 1.6.0
   rustc: 1.78.0 (9b00956e5 2024-04-29)
   =============================DEBUG ASSISTANCE=============================

packages/py-cryptography/package.py

    [...]
    # To fix https://github.com/spack/spack/issues/29669
    # https://community.home-assistant.io/t/error-failed-building-wheel-for-cryptography/352020/14
    # We use CLI git instead of Cargo's internal git library
    # See reference: https://doc.rust-lang.org/cargo/reference/config.html#netgit-fetch-with-cli
    depends_on("git", type="build", when="@35:")

    def setup_build_environment(self, env):
        if self.spec.satisfies("@35:"):
            env.set("CARGO_NET_GIT_FETCH_WITH_CLI", "true")

@climbfuji

@AlexanderRichert-NOAA had to work around the rust install problem on acorn, if I remember correctly.

@rickgrubin-noaa

@rickgrubin-noaa Please NOTE: Even though the pre-TDS does not have external connectivity, it is possible to do downloads by setting the following two environment variables to use the proxy:

export http_proxy=http://10.181.2.65:3128
export https_proxy=http://10.181.2.65:3128

Not all downloads may work, but the majority of them do with the above setting. Anything that uses HTTP/HTTPS should work fine. I just wanted to let you know!

@RaghuReddy-NOAA could you provide an example whereby exporting these variables allows some external connectivity?

The particular case we are facing is that some spack-stack packages have build-time internet access requirements -- fetching something for a build, like a file directly from github, or a rust cargo file -- items that cannot necessarily be downloaded elsewhere, placed in a package's directory, and picked up by modifying package.py to look for them locally.

Short of perhaps creating a container that matches ursa's OS / CPU at a minimum, and building various packages into a build cache which can then be downloaded to ursa, being air-gapped -- while understandable -- is a blocker at the moment.
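
For the record, the build-cache route would look roughly like this (command names from recent spack's buildcache interface; untested for this setup). On a connected host or container matching ursa's OS/CPU:

# push built packages to a shared, unsigned build cache
$ spack buildcache push --unsigned /path/to/shared/build_cache py-cryptography

and on ursa:

$ spack mirror add local-cache file:///path/to/shared/build_cache
$ spack buildcache update-index /path/to/shared/build_cache
$ spack install --no-check-signature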

@RaghuReddy-NOAA

@rickgrubin-noaa Can you please confirm you are seeing the above errors even though you have set the following environment variables:

export http_proxy=http://10.181.2.65:3128
export https_proxy=http://10.181.2.65:3128

If you were using a cached mirror, it is possible it will work up to a point and then fail. Can you please confirm you had set the above, and if so, can you please point to the location of the logs (if possible)?

@rickgrubin-noaa commented Oct 23, 2024

@RaghuReddy-NOAA a cached mirror is relevant with respect to source -- and that's necessary for an air-gapped system. The issue in question is connectivity at build-time when a spack package attempts to retrieve something that is distinctly not part of a source mirror.

[Richard.Grubin @ ursa] $ hostname
nfe91

[Richard.Grubin @ ursa] $ grep http .bashrc
export http_proxy=http://10.181.2.65:3128
export https_proxy=http://10.181.2.65:3128

[Richard.Grubin@nfe01 spack-stack]$ env | grep http
https_proxy=http://10.181.2.65:3128
http_proxy=http://10.181.2.65:3128

install log file:
/collab1/data/Richard.Grubin/contrib/spack-stack-rocky9/envs/unified-dev.ursa.oneapi/log.install

[...]
==> Installing py-cryptography-38.0.1-fxoomsm4krjabklnvqycdlplevq7656x [247/314]
==> No binary for py-cryptography-38.0.1-fxoomsm4krjabklnvqycdlplevq7656x found: installing from source
==> Using cached archive: /collab1/data/Richard.Grubin/git/EPIC/spack-stack/cache/source_cache/_source-cache/archive/1d/1db3d807a14931fa317f96435695d9ec386be7b84b618cc61cfa5d08b0ae33d7.tar.gz
==> No patches needed for py-cryptography
==> py-cryptography: Executing phase: 'install'
[...]
  running build_rust
  cargo rustc --lib --message-format=json-render-diagnostics --manifest-path src/rust/Cargo.toml --release -v --features pyo3/extension-module pyo3/abi3-py36 --crate-type cdylib --
      Updating crates.io index
  warning: spurious network error (3 tries remaining): [7] Couldn't connect to server (Failed to connect to index.crates.io port 443 after 446 ms: Couldn't connect to server)
  warning: spurious network error (2 tries remaining): [7] Couldn't connect to server (Failed to connect to index.crates.io port 443 after 71 ms: Couldn't connect to server)
  warning: spurious network error (1 tries remaining): [7] Couldn't connect to server (Failed to connect to index.crates.io port 443 after 70 ms: Couldn't connect to server)
  error: failed to get `asn1` as a dependency of package `cryptography-rust v0.1.0 (/collab1/data/Richard.Grubin/git/EPIC/spack-stack/cache/build_stage/spack-stage-py-cryptography-38.0.1-fxoomsm4krjabklnvqycdlplevq7656x/spack-src/src/rust)`

  Caused by:
    download of config.json failed

  Caused by:
    failed to download from `https://index.crates.io/config.json`

  Caused by:
    [7] Couldn't connect to server (Failed to connect to index.crates.io port 443 after 66 ms: Couldn't connect to server)

This failure occurs because:

spack/var/spack/repos/builtin/packages/py-cryptography/package.py

[...]
    # To fix https://github.com/spack/spack/issues/29669
    # https://community.home-assistant.io/t/error-failed-building-wheel-for-cryptography/352020/14
    # We use CLI git instead of Cargo's internal git library
    # See reference: https://doc.rust-lang.org/cargo/reference/config.html#netgit-fetch-with-cli
    depends_on("git", type="build", when="@35:")

    def setup_build_environment(self, env):
        if self.spec.satisfies("@35:"):
            env.set("CARGO_NET_GIT_FETCH_WITH_CLI", "true")

That is, per The Cargo Book:

CARGO_NET_GIT_FETCH_WITH_CLI — Enables the use of the git executable to fetch

and that use of the "git executable to fetch" fails because ursa is air-gapped and the http/https proxies do not seem to make a difference.

It may be that the proxies will work if I manipulate some settings in ~/.gitconfig, or create a ~/.cargo/config with a proper entry, or set up an ssh agent for the cargo build. That said, because this particular package requires access to an external source at build time, it won't build without some other sort of intervention.
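
For reference, the settings alluded to would be along these lines (standard git and cargo proxy knobs; per the follow-up comment below, they did not help here):

# ~/.gitconfig
[http]
    proxy = http://10.181.2.65:3128

# ~/.cargo/config.toml
[http]
proxy = "http://10.181.2.65:3128"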

@rickgrubin-noaa

@RaghuReddy-NOAA

Any thoughts on how to move this forward?

Manipulations of ~/.gitconfig | ~/.cargo/config | ssh agent for cargo build have not been successful -- same error message with respect to connectivity.

@RaghuReddy-NOAA

@rickgrubin-noaa I am really sorry, I thought I had added a note to this issue; it looks like I forgot to hit the final "Comment" button...
In any case, at this point I think it will be best to wait until the actual Ursa-TDS (which will be called Oso) becomes available shortly. Of course, unlike this trial system (which doesn't have external network connectivity), Oso will have the same network connectivity as Hera.

Thank you for looking into this and identifying some of the issues we are running into. I will let you know when Oso is ready for testing, which will not be before mid-November.
